Interesting stats from Emerson Network Power and the Ponemon Institute, who surveyed over 450 IT operators, as reported by Data Center Knowledge. First, the facts:
- The average cost of data center downtime across industries was approximately $7,900 per minute. (A 41 percent increase from the $5,600 in 2010.)
- Average reported incident length was 86 minutes, resulting in average cost per incident of approximately $690,200. (In 2010 it was 97 minutes at approximately $505,500.)
- 91 percent of respondents reported an unplanned data center outage in the last two years (down from 95 percent in 2010).
While one would naturally expect the cost of downtime to rise, a 41 percent increase over three years is much steeper than anticipated. Inflation alone doesn’t explain such an increase, but the steady trend of businesses becoming more reliant on technology to drive customer interactions and business processes does. As a result, these outages are getting more attention throughout the business, which increases the pressure on IT to get ahead of issues.
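To put that increase in perspective, here is a quick back-of-the-envelope sketch using only the two per-minute figures quoted above. It assumes the growth compounded evenly over the three years, which the survey does not claim; under that assumption the rate works out to roughly 12 percent per year.

```python
# Per-minute downtime costs as quoted from the survey summary above.
cost_2010 = 5_600
cost_latest = 7_900
years = 3  # the interval between the two surveys, per the text

# Total increase over the whole period.
total_increase = cost_latest / cost_2010 - 1

# Equivalent compound annual growth rate.
# Assumption: growth was spread evenly across the period.
annual_rate = (cost_latest / cost_2010) ** (1 / years) - 1

print(f"Total increase: {total_increase:.1%}")
print(f"Equivalent annual rate: {annual_rate:.1%}")
```

Comparing that annual rate with whatever inflation figure applies to your own budget makes the gap easy to see.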
Better monitoring is clearly called for, and these figures make a compelling business case for such an investment. However, a few more metrics from the study should be considered before making that investment. One of the survey questions asked about the root causes of these outages, with some interesting results:
- UPS battery failure (55 percent)
- Accidental EPO/human error (48 percent)
- UPS capacity exceeded (46 percent)
- Cyber attack (34 percent)
- IT equipment failure (33 percent)
- Water incursion (32 percent)
- Weather related (30 percent)
- Heat related/CRAC failure (29 percent)
- UPS equipment failure (27 percent)
- PDU/circuit breaker failure (26 percent)
When evaluating your current monitoring strategy, or a potential investment, it is worth asking whether the approach can trace issues back to these most frequently cited causes of downtime.
The latest generation of monitoring tools has placed a heavy emphasis on application and user experience monitoring, with many vendors even claiming that application monitoring is all an organization needs. Yet application monitoring has no visibility into power, air, and the rest of the physical infrastructure, so it cannot trace most of the root causes identified above. Yes, you would get alerted that the application is down, but how does that help you trace the source? Monitoring tools should do more than tell you what’s down; they should aid in root-cause analysis and troubleshooting. As the old adage goes, “You can’t fix what you can’t see.” The most effective approach to monitoring must cover the entire service stack, particularly the network, power, and air, elements that are increasingly ignored until they fail.
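As a rough illustration of that visibility gap, the sketch below tags each cause from the survey list with the layer of the stack it belongs to and counts how many fall outside what an application monitor would typically observe. The cause names and percentages come from the list above; the layer assigned to each cause, and the choice of which layers count as invisible to application monitoring, are assumptions made for this example rather than anything the study states.

```python
# (cause, percent of respondents citing it, layer)
# The layer tags are illustrative assumptions, not part of the survey.
causes = [
    ("UPS battery failure", 55, "power"),
    ("Accidental EPO/human error", 48, "power"),
    ("UPS capacity exceeded", 46, "power"),
    ("Cyber attack", 34, "security"),
    ("IT equipment failure", 33, "it"),
    ("Water incursion", 32, "environment"),
    ("Weather related", 30, "environment"),
    ("Heat related/CRAC failure", 29, "cooling"),
    ("UPS equipment failure", 27, "power"),
    ("PDU/circuit breaker failure", 26, "power"),
]

# Assumption: an application monitor has no sensors in these layers.
physical_layers = {"power", "cooling", "environment"}

outside_app_view = [name for name, _, layer in causes if layer in physical_layers]

print(f"{len(outside_app_view)} of {len(causes)} cited causes are outside the application layer:")
for name in outside_app_view:
    print(f"  - {name}")
```

You could adapt the layer tags to your own environment and to the data sources your monitoring tools actually collect, then see which causes remain uncovered.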