Improvements Instead of Incidents: Optimize and Align IT Services


These days, managing a data center can be like working inside a pressure cooker. In a business environment that is increasingly reliant on the services that originate in data centers, the stakes get higher every day. Waves of emerging technologies over the last few decades have converged rapidly, leaving even the best of IT departments racing to stay on top. Virtualization, dynamic computing, cloud computing, big data, the Internet of Things—each major development turns up the heat, but budgets, staff, and skills often lag behind explosive growth in data center scale and complexity.

At the same time, both data and business management paradigms are shifting, putting service and customers at the heart of the enterprise. The experience of end users is paramount, so decision-making in the data center must be closely aligned with business objectives. Internal and external customers have more choices, and competition is fierce. Keeping costs low, and doing more with less, is a constant message. How can data center professionals keep up their ruthless drive for efficiency while simultaneously delivering seamless, on-demand service?

The answer lies in the power of advanced analytics to improve IT service through continuous optimization efforts. A recent Teamquest survey shows there is significant room for improvement: almost every respondent (93%) said that proper IT optimization and performance analysis would increase IT efficiency in their organization, yet fewer than a quarter (22%) said they were able to accurately predict incidents and their consequences. In such a high-stakes environment, flying blind just won’t suffice, no matter how skilled your technicians may be.

On average, IT managers address eight unexpected issues each week, including slowdowns and outages, availability issues, equipment failures, and underperforming applications. Competitive advantage goes to the enterprise that is free to spend time and resources on advancement—innovating products and services, increasing customer satisfaction, reducing risk and costs—over the enterprise that is constantly putting out fires. In the same survey, 74% of respondents who manage IT optimization initiatives claim direct benefits for their company, including reduced overall risk, better productivity, and fewer outages.

Very few companies have reached peak maturity when it comes to IT performance management, and no organization is ever finished optimizing. Thanks to ever-changing business and consumer needs and the nonstop emergence of new technologies, vendors, and approaches, there’s never a chance to rest on your laurels.

When planning a comprehensive service optimization program, where should your organization start? It won’t work to layer on an analytics program and hope for the best. To achieve useful and effective results, advanced analytics have to be closely integrated. Carefully selected machine, log, usage, power, and cost data from throughout the stack should be correlated, aligned with carefully defined business objectives, and supported by mature, automated processes.

Each enterprise has its own blend of strengths and weaknesses, not to mention its unique IT infrastructure deployment. Having a clear understanding of current status is an essential first step. This is the time to ask questions like:

  • What do we have and what is it doing?
  • If it’s doing something unexpected, why?
  • When will it break and why?
  • What should we do to prevent it?

A thorough assessment should identify priorities (e.g., mission critical services, data, and users) and component dependencies. The correlated data can be used to identify historical trends, which in turn informs forecasts of future component behavior. Matching these forecasts with established priorities and business objectives will help you identify the weak points that carry the most risk. Armed with this intelligence, you can begin to proactively avoid incidents, a key step in improving overall service levels.
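As a rough illustration of turning a historical trend into a forecast, the sketch below fits a straight line to evenly spaced utilization samples and estimates how many more sampling periods remain before a limit is crossed. It is a minimal example under stated assumptions, not a recommended forecasting method: the function names, the sample values, and the 90% limit are all hypothetical, and real component behavior is often not linear.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept for samples taken at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_limit(values: Sequence[float], limit: float) -> Optional[float]:
    """Estimate sampling periods after the last sample until the trend reaches `limit`.

    Returns None when the fitted trend is flat or falling, and 0.0 when the
    trend line has already reached the limit.
    """
    slope, intercept = fit_linear_trend(values)
    if slope <= 0:
        return None
    crossing_x = (limit - intercept) / slope
    return max(0.0, crossing_x - (len(values) - 1))


if __name__ == "__main__":
    # Hypothetical weekly disk utilization (%) for one component.
    weekly_disk_percent = [50, 55, 60, 65, 70]
    print(periods_until_limit(weekly_disk_percent, limit=90))
```

Comparing the resulting estimates across components, weighted by the priorities identified in the assessment, is one simple way to surface the weak points that carry the most risk.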

Applying advanced analytics to the complexity of data center operations is no stroll in the park. To get sustainable improvements from your efforts, it’s important to crawl before you walk. Gradually refining the supporting processes and skills ensures a balanced approach, where each level of maturity attained feeds off the capabilities mastered in the preceding levels. For example, access to empirical data about how the various components and layers of the infrastructure are being used is a basic requirement. Component numbers and interdependencies have vastly increased with the adoption of virtualization and software-defined data centers. Optimization efforts will be little more than a guessing game if you only have partial data.

Once you have detailed information about the behavior of the components underpinning your services, you can use descriptive analytics to monitor them and send alerts based on preset thresholds. This will enable you to identify and mitigate incidents in a more timely fashion. Even at this fairly simple reactive phase, automating your analytical capabilities is essential. Not only is automation necessary for keeping pace with the rate at which your infrastructure produces data, but it will also lead to consistent, repeatable procedures. You will have a better understanding of how the infrastructure is actually being used, so you can accurately address issues of under- or over-provisioning. After all, capacity management lies at the heart of both efficiency and service.
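Alerting on preset thresholds can be sketched in a few lines. The example below is only an illustration: the metric names, the warning and critical levels, and the `check_thresholds` function are hypothetical, and a production monitoring tool would add scheduling, deduplication, and notification on top of this.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Threshold:
    """Preset warning and critical levels for one metric (hypothetical names)."""
    metric: str
    warning: float
    critical: float


def check_thresholds(
    samples: Dict[str, float], thresholds: List[Threshold]
) -> List[Tuple[str, str, float]]:
    """Return (metric, severity, value) for every sample at or above a threshold.

    Metrics without a sample are skipped rather than treated as healthy or failed.
    """
    alerts = []
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            continue
        if value >= t.critical:
            alerts.append((t.metric, "critical", value))
        elif value >= t.warning:
            alerts.append((t.metric, "warning", value))
    return alerts


if __name__ == "__main__":
    thresholds = [
        Threshold("cpu_percent", warning=80, critical=95),
        Threshold("disk_percent", warning=85, critical=95),
    ]
    latest = {"cpu_percent": 97.0, "disk_percent": 86.0}
    for metric, severity, value in check_thresholds(latest, thresholds):
        print(f"{severity.upper()}: {metric} = {value}")
```

Keeping the thresholds as data rather than hard-coding them makes the check repeatable and easy to run automatically on every new batch of samples.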

At the next level, you will build in more proactive capabilities. Discovering patterns in the data you have collected and learning from the relationships between events provide richer context. Aggregating and retaining selected portions of collected data will allow you to perform more advanced analyses. You will need to work across traditional silos to integrate and federate more information sources: configuration data, facilities data (power, cooling, floor space), asset and costing data, service level agreements, business transaction volumes, and more.

At this stage, shift from a purely technical focus to a more holistic view that incorporates business objectives. Set priorities for incident response and service improvement. Understand how the business currently uses components, how it hopes to use them in the future, and what uses are most critical, most expensive, most likely to cause process bottlenecks, etc. A more comprehensive view will help direct your inquiries, while limiting your scope will keep your efforts focused and produce results more quickly. Framing the work around business priorities also enables you to communicate in terms that make the most sense to stakeholders and customers, bringing IT and business into closer collaboration.

In the next phases of IT service optimization, it’s time to move beyond predictions based on historical data. Creating and analyzing indicators of service health based on response times and latency will produce more accurate predictions for long-term planning. Likewise, scenarios involving non-linear growth caused by factors not represented in historical data require the use of sophisticated mathematical algorithms and advanced machine learning. Predictive analytics will allow you to assess and model various scenarios and base decisions on which projected outcome is the best fit for your business and resources. This type of approach is more resource-intensive and may not be practical in every case. Again, a solid understanding of the business context will allow you to use predictive capabilities to perform wider, more standard assessments and drill deeper only as needed and justified by business impact.
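To show why response time behaves non-linearly as load grows, the sketch below uses the textbook single-server queueing approximation (M/M/1), in which average response time equals service time divided by one minus utilization. This is a deliberately simple stand-in chosen for illustration, not the modeling approach of any particular tool. The service time, current utilization, and growth factors are hypothetical inputs.

```python
from typing import Dict, Optional, Sequence


def mm1_response_time(service_time_s: float, utilization: float) -> float:
    """Average response time of an M/M/1 queue: R = S / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    return service_time_s / (1 - utilization)


def project_scenarios(
    service_time_s: float,
    current_utilization: float,
    growth_factors: Sequence[float],
) -> Dict[float, Optional[float]]:
    """Project response time for each load growth factor.

    A value of None means the scenario saturates the resource (utilization
    reaches or exceeds 100%), so no stable response time exists.
    """
    results: Dict[float, Optional[float]] = {}
    for growth in growth_factors:
        utilization = current_utilization * growth
        if utilization < 1.0:
            results[growth] = mm1_response_time(service_time_s, utilization)
        else:
            results[growth] = None
    return results


if __name__ == "__main__":
    # Hypothetical service: 100 ms service time, 50% busy today.
    for growth, seconds in project_scenarios(0.1, 0.5, [1.0, 1.6, 2.0]).items():
        label = "saturated" if seconds is None else f"{seconds:.3f} s"
        print(f"load x{growth}: {label}")
```

Under these assumptions, a 60% increase in load raises response time by far more than 60%, which is the kind of effect that a straight-line extrapolation of historical utilization would miss.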

If you reach a truly advanced state of applying analytics, you will be able to move beyond capacity management and risk reduction into a comprehensive decision support role. The ability to see the whole, integrated picture combined with the ability to model predictions will put you in a position to influence how resources are being consumed. Prescriptive analytics begin to play a role at this level. By analyzing a set of scenarios and forecasting consequences, you can experiment with changes and choose the optimization efforts that deliver the most value with the least disruption. Ultimately, you can develop the capability to automatically rank and prescribe actions in response to predicted events. Imagine the value you could deliver through innovation and efficiency if you could drive all the smaller corrective actions involved in continuous improvement through automated, intelligent processes.
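Ranking candidate actions by value against disruption can be sketched as a simple scoring function. Everything here is an assumption made for illustration: the `Action` fields, the linear score, the `disruption_weight` parameter, and the sample figures are hypothetical, and a real prescriptive system would derive its estimates from the modeled scenarios rather than from hand-entered numbers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Action:
    """A candidate optimization action with estimated value and disruption."""
    name: str
    expected_value: float  # e.g. estimated cost or risk avoided, in arbitrary units
    disruption: float      # e.g. estimated downtime or change effort, same units


def rank_actions(actions: List[Action], disruption_weight: float = 1.0) -> List[Action]:
    """Sort actions from most to least attractive.

    The score is expected value minus weighted disruption; a higher
    `disruption_weight` favors low-impact changes.
    """
    def score(action: Action) -> float:
        return action.expected_value - disruption_weight * action.disruption

    return sorted(actions, key=score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Action("add storage to cluster A", expected_value=10, disruption=2),
        Action("rebalance virtual machines", expected_value=8, disruption=1),
        Action("migrate database tier", expected_value=12, disruption=9),
    ]
    for action in rank_actions(candidates):
        print(action.name)
```

Exposing the weight as a parameter lets the business decide how much disruption it is willing to trade for value, instead of burying that judgment in the code.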

If you stay closely aligned with the business and customer, you will find that along the way to maturing your IT service, you have achieved core operational enhancements: maximized resource utilization, responsiveness to business change, planning and budgeting accuracy, and better SLA performance. In other words, IT infrastructure isn’t the only part of the enterprise that benefits. Overall visibility into the business, its services, and its customers is significantly elevated.

There are myriad challenges to address in today’s data center, with new ones piling on at every turn. The drive for efficiency will only intensify due to resource scarcity, sustainability initiatives, and environmental regulations. You certainly can’t afford to under-utilize existing assets or invest in infrastructure you don’t actually need. The only way to get ahead and rise above is to arm your teams with intelligent analyses and automated incident response. Spending resources on one-off responses to unexpected events will drag businesses further behind, exposed to risk and unable to compete. Incrementally deploying the power of automated, advanced analytics throughout the data center builds in the processes and insight that create a strong and flexible foundation for the future, whatever it may hold.