Natural disasters like Winter Storm Jonas, which recently battered the East Coast, can leave even the best IT departments grappling with how well their data center disaster recovery plans will hold up. When these incidents happen, they can be brutal.
However, protecting your data against blizzards, flooding and earthquakes is hardly enough. Studies show that cybercrime has steadily risen to become one of the leading causes of data center outages, responsible for more than 22 percent of them in 2015, up from 18 percent in 2013 and 2 percent in 2010, according to a recent survey by the Ponemon Institute and Emerson Network Power. That upward trend makes it the fastest-growing cause of outages.
Worse yet, the cost of downtime is steadily increasing as well: the average cost per minute came in at $9,000 in 2015, up from $8,000 in 2013.
The study also revealed that UPS failure remains the top cause of data center downtime, responsible for 25 percent of outages in 2015. The third-leading cause, just behind cybercrime? Simple human error, which was to blame for 22 percent of outages in 2015. Weather came in fifth, behind mechanical system failure in fourth.
While there’s no way to completely eliminate your risk of unplanned data center downtime, there are several ways to minimize it. Here are a few measures you can implement:
Test your batteries. One of the leading causes of UPS failure, itself the No. 1 cause of data center downtime, is battery malfunction. Many UPS batteries are lead acid and perform best at a temperature between 71.6 and 77 degrees Fahrenheit; anything outside that range shortens battery life. Keep in mind, too, that during a power outage, whether caused by a natural disaster or anything else, your data center will run on those batteries, which also takes a toll on their life. After a power failure, test your batteries’ performance. On an ongoing basis, maintain them and replace them as necessary; ideally, they should be checked and tested every 12 months.
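To make the battery advice above concrete, here is a minimal sketch of a monitoring check that flags batteries outside the 71.6–77 degree Fahrenheit range or overdue for their annual test. The record format, IDs and function name are illustrative assumptions, not part of any UPS vendor’s API.

```python
# Hypothetical sketch: flag UPS batteries outside the recommended
# 71.6-77 F temperature range or overdue for their 12-month test.
# The battery-record format and thresholds here are assumptions.
from datetime import date, timedelta

TEMP_MIN_F = 71.6
TEMP_MAX_F = 77.0
TEST_INTERVAL = timedelta(days=365)  # check and test every 12 months


def battery_alerts(batteries, today=None):
    """Return human-readable alerts for a fleet of batteries.

    Each battery is a dict: {"id": str, "temp_f": float, "last_test": date}.
    """
    today = today or date.today()
    alerts = []
    for b in batteries:
        if not (TEMP_MIN_F <= b["temp_f"] <= TEMP_MAX_F):
            alerts.append(
                f"{b['id']}: temperature {b['temp_f']} F is outside the optimal range"
            )
        if today - b["last_test"] > TEST_INTERVAL:
            alerts.append(
                f"{b['id']}: last tested {b['last_test']}, overdue for annual test"
            )
    return alerts


fleet = [
    {"id": "ups-a1", "temp_f": 74.0, "last_test": date(2014, 12, 1)},
    {"id": "ups-b2", "temp_f": 82.5, "last_test": date(2015, 11, 15)},
]
for alert in battery_alerts(fleet, today=date(2016, 2, 1)):
    print(alert)
```

In practice the temperature and last-test values would come from your UPS management software or maintenance log rather than a hard-coded list.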
Minimize the risks caused by human error. With human mistakes ranking among the top causes of downtime events, it’s important to analyze all areas that are susceptible to those types of errors. That should include examining operating procedures for deficiencies; the assignment of employee and department responsibilities and tasks; and access and security measures. Get everything in writing, and follow up with training.
Test regularly. Between battery changes, human errors and an assortment of other factors, including software upgrades and hardware updates, your team will need to test your data center operations continually to confirm they are fully functional and that you’re able to recover from an event if necessary.
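One simple way to keep testing tied to change events, as described above, is to map each type of change to the disaster recovery checks it should trigger. The event names and check lists below are illustrative assumptions, not a standard.

```python
# Hypothetical sketch: map change events (battery swap, software upgrade,
# hardware update) to the DR checks they should trigger. The event names
# and required checks are illustrative, not drawn from any standard.
REQUIRED_CHECKS = {
    "battery_replacement": ["UPS load test", "runtime calibration"],
    "software_upgrade": ["failover drill", "backup restore test"],
    "hardware_update": ["failover drill", "monitoring alert test"],
}


def checks_for(events):
    """Return the deduplicated, ordered list of checks the events require."""
    seen, plan = set(), []
    for event in events:
        for check in REQUIRED_CHECKS.get(event, []):
            if check not in seen:
                seen.add(check)
                plan.append(check)
    return plan


print(checks_for(["battery_replacement", "software_upgrade"]))
```

A table like this, kept in writing alongside your runbooks, makes it harder for a change to slip through without the follow-up testing it warrants.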