Data centers are crucial to any company’s operations, whether they serve external customers or internal users. They are also evolving, growing in complexity while generating ever more data. A recent Cisco study predicts a 226% increase in global data center traffic, from 4.7 to 15.3 ZB, between 2015 and 2020, much of which will pass through the cloud. However, managed service providers will remind you that it is data centers that power the cloud, not some unseen computing force in the ether. To maintain the standard of “always up, always on” that is synonymous with cloud computing, and that is increasingly demanded of IT in general, the data center must be reliable and operational at all times.
As data centers evolve and become more advanced, so too should the tools that are built for them. The traditional approach of monitoring data centers results in reactive problem solving and more alerts than anyone can handle. As we move from reactively monitoring to proactively managing data centers, there will be fewer disruptions, faster resolution, and higher efficiencies. But in order to achieve meaningful gains in these metrics, the new approach and tools must support a critical functionality: automation.
IT departments everywhere are inundated with alerts from monitoring tools. These emails and texts always seem to arrive at inopportune times, whether in the middle of lunch or the middle of the night. Beyond handling the sheer number of alerts, the IT professional then has to deal with the issue behind each one. If an issue is not resolved swiftly, its effects can compound and cause headaches for those trying to resolve problems down the line. For example, the organization’s bottom line can be significantly impacted if the issue ultimately results in an outage.
Research from Ponemon Institute has calculated the average cost of a data center outage to be approximately $740,000, or $7,900 per minute, in 2016, a 38% increase from 2010. With a mean outage duration of 95 minutes, the downtime impacts not only direct customer business but also the use of internal resources and the company’s external reputation. With automation built into data center management, the system can act to resolve issues instead of merely sending alerts, which helps prevent costly outages.
For example, underestimating peak load can lead to system failures and unexpected downtime. Normally, breaching a static, pre-determined threshold would trigger an alert and require human intervention to keep the system online. However, if a data center management tool were able to increase the amount of available storage, for instance by drawing from a shared pool, these catastrophic system failures could be avoided.
Once the load returned below the threshold, the excess storage could be released back into the pool and business would continue as usual. These simple actions could be performed in the background, without human intervention. The ideal is to move past rules that trigger alerts and towards rules that trigger actions, with automation replacing human intervention. This leaves IT professionals free to perform higher-level functions and add more value to the organization.
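The storage example above can be sketched as a single rule that acts instead of alerting. This is only an illustration under stated assumptions: SharedPool, Volume, rebalance, the 80% threshold, and the 100 GB step are all hypothetical names and values, not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class SharedPool:
    """Hypothetical pool of spare storage, measured in GB."""
    free_gb: int


@dataclass
class Volume:
    """Hypothetical volume: a base allocation plus any borrowed capacity."""
    base_gb: int
    borrowed_gb: int = 0

    @property
    def capacity_gb(self) -> int:
        return self.base_gb + self.borrowed_gb


def rebalance(volume, pool, used_gb, threshold=0.8, step_gb=100):
    """Evaluate the rule once and return the action that was taken."""
    if used_gb / volume.capacity_gb > threshold:
        grant = min(step_gb, pool.free_gb)
        if grant == 0:
            return "alert"  # pool exhausted: fall back to a human
        pool.free_gb -= grant
        volume.borrowed_gb += grant
        return "expand"
    # Release only once usage fits under the threshold of the base
    # allocation, so the volume does not flap between expand and release.
    if volume.borrowed_gb and used_gb / volume.base_gb <= threshold:
        pool.free_gb += volume.borrowed_gb
        volume.borrowed_gb = 0
        return "release"
    return "none"
```

Note that the sketch still falls back to an alert when the pool is empty, since automation of this kind reduces human intervention rather than removing every case that needs it.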
Why stop at automation only within the same tool? There is a plethora of excellent IT-focused tools on the market today, and it would be impossible for one super-solution to capture all of their collective functionality. Luckily, the vast majority of tools offer access to their APIs, allowing outside programs to tap into their expertise.
A powerful example is the set of native applications that hardware manufacturers purpose-build for their data center equipment. To use these APIs effectively, next-generation IT tools must be able to execute scripts and push requests out to other programs. The ultimate goal is end-to-end, zero-touch, automated data center management that is governed and auditable. The modern enterprise, with its massive amounts of data and its expansive, interconnected networks, requires an IT management system that can take autonomous actions if it is to harness the power of the next-generation data center.
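As a rough sketch of what pushing a request out to another program might look like, the snippet below builds an HTTP call with Python’s standard library. The endpoint layout, action name, payload fields, and token are all hypothetical; a real vendor API will define its own, and the request is deliberately built but not sent.

```python
import json
import urllib.request


def build_action_request(base_url, action, payload, token):
    """Build (but do not send) a POST request for a hypothetical vendor API."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/actions/{action}",  # assumed URL layout
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


request = build_action_request(
    "https://vendor.example/api/v1",         # placeholder host
    "expand-volume",                         # hypothetical action name
    {"volume_id": "vol-42", "add_gb": 100},  # hypothetical payload
    token="PLACEHOLDER",
)
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```

Keeping the request construction separate from the send makes it easy to log or review exactly what the management tool intends to do before it acts.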
The peak load example above implied that the thresholds were static. This is how IT events are measured today, with phrases like “20% of max” being commonplace. But what if you expect a region of the data center to be busy at a certain time, such as your email server at 9 a.m. on a Monday morning? Because that load is expected, there is no need for automated action in this case.
To this end, prediction can greatly enhance the benefits of automation. Data center management tools with predictive capabilities use historical data gathered from the IT ecosystem to build models and produce expectations. These models show the predicted ebb and flow of variables in the data center, which allows for the detection of outliers, those rogue events that are not part of normal operation. The models become more accurate over time as more historical data accumulates. Acting only on outliers greatly reduces the volume of alerts, and therefore noise, and triggers automation only when action is truly required. In addition to improving the precision of automated actions, predictive capabilities can also save money through more accurate capacity planning and maintenance schedules for individual pieces of equipment.
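One simple way to picture such a model is a per-time-slot baseline with a z-score test. This is an assumed technique chosen for illustration, not the method of any specific tool, and build_baseline, is_outlier, and the limit of three standard deviations are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(history):
    """Summarize historical samples per time slot.

    history: iterable of (slot, value) pairs, where slot is any hashable
    label such as ("Mon", 9). Returns {slot: (mean, standard deviation)}.
    """
    by_slot = defaultdict(list)
    for slot, value in history:
        by_slot[slot].append(value)
    return {
        slot: (mean(values), stdev(values))
        for slot, values in by_slot.items()
        if len(values) >= 2  # stdev needs at least two samples
    }


def is_outlier(baseline, slot, value, z_limit=3.0):
    """Return True when value is more than z_limit deviations from the norm."""
    if slot not in baseline:
        return False  # no expectation yet, so do not act
    expected, spread = baseline[slot]
    if spread == 0:
        return value != expected
    return abs(value - expected) / spread > z_limit
```

With a history in which the email server sits near 90% load on Monday at 9 a.m. and near 11% on Monday at 3 a.m., a reading of 91% is normal in the first slot and an outlier in the second, which is exactly the distinction a static threshold cannot make.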
From a business perspective, increasing the amount of automation in the data center frees up IT professionals to pursue higher-value activities. Logs describing all automated actions and persistent changes are generated in parallel so they can be reviewed. These logs conform to compliance mandates and are instantly auditable so data centers running business solutions can take advantage of automation. Persistent triggering of the same alert, and subsequent automated action, can also be reported to an employee so a review of the rule can be undertaken.
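The logging and review loop described above could look something like the following sketch. AuditLog, its field names, and the review limit of three repeats are hypothetical, and a structure like this does not by itself satisfy any particular compliance mandate.

```python
import json
from collections import Counter
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical append-only record of automated actions."""

    def __init__(self, review_after=3):
        self.entries = []
        self.counts = Counter()
        self.review_after = review_after

    def record(self, rule, action, detail):
        """Append one entry; return True when the rule should be reviewed."""
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "rule": rule,
            "action": action,
            "detail": detail,
        })
        self.counts[rule] += 1
        # A rule that keeps firing is flagged so an employee can revisit it.
        return self.counts[rule] >= self.review_after

    def export(self):
        """Serialize the entries as JSON lines for later review."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)
```

Each automated action is written to the log at the moment it happens, and the return value of record gives the caller a simple hook for notifying a person once the same rule has fired repeatedly.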
We as a society reap the benefits of automation constantly in our everyday lives. As we move from monitoring data centers to managing them, we should leverage automation to reduce IT’s workload and better utilize existing infrastructure.
The next generation of data center tools will go beyond sending emails or texts and will be able to take action without human intervention. The results include fewer disruptions, faster resolution, and higher efficiencies, all on the same, or smaller, budget. And couldn’t we all use an alert-free lunch and a good night’s sleep?