The Death of Capacity Management as We Know It

The objective of “old school” capacity management was to ensure that each server had enough capacity to avoid performance issues, and it typically relied on trend-and-threshold techniques to accomplish this. But the rapid adoption of the cloud, and now OpenStack, means that supply and demand are much more fluid, and this form of capacity management is now obsolete. Unfortunately, many are ignoring this new reality and continuing to rely on what have quickly become the bad habits of capacity management. These old school methods and measures not only perpetuate an antiquated thought process, but also lead to, among other problems, low utilization and density. In a world of tightening IT budgets, these are problems that can’t be ignored.

The modern ecosystem of virtualized and cloud infrastructure is far more dynamic and sophisticated, and the management approaches many organizations use to govern it have not kept up. Relying on utilization trending and growth-centric models (which often means various Excel spreadsheets, Post-it notes and gut feelings) cannot accurately predict the future for complex “new school” environments. In fact, it can result in horribly inaccurate predictions of infrastructure requirements, as these approaches typically model only organic workload growth and ignore net new workloads that may be on the radar. In virtual and cloud environments, the largest impact on capacity tends to come from these incoming demands, which arise from new VM requests, application deployments and physical workloads being virtualized.
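To make the gap concrete, here is a minimal Python sketch that compares a trend-only projection with one that also counts known upcoming workloads. Everything in it is an illustrative assumption rather than a description of any particular tool: the `Booking` record, the `forecast_demand` function, the compound monthly growth model and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Booking:
    """Hypothetical reservation for a workload that has not arrived yet."""
    name: str
    start_month: int   # 1-based month in which the workload lands
    cpu_cores: float   # capacity the workload is expected to need


def forecast_demand(current_cores, monthly_growth_rate, bookings, months):
    """Return projected CPU demand per month: organic growth plus booked demand."""
    forecast = []
    for month in range(1, months + 1):
        # Organic growth: compound the current demand, as a trend model would.
        organic = current_cores * (1 + monthly_growth_rate) ** month
        # Booked demand: every reservation that has landed by this month.
        booked = sum(b.cpu_cores for b in bookings if b.start_month <= month)
        forecast.append(organic + booked)
    return forecast


# Made-up sample data: two net new workloads on the radar.
bookings = [
    Booking("new-app-rollout", start_month=2, cpu_cores=40),
    Booking("p2v-migration", start_month=3, cpu_cores=64),
]
trend_only = forecast_demand(200, 0.02, [], months=3)
with_bookings = forecast_demand(200, 0.02, bookings, months=3)

for month, (trend, full) in enumerate(zip(trend_only, with_bookings), start=1):
    print(f"month {month}: trend only {trend:.1f} cores, with bookings {full:.1f} cores")
```

With these sample inputs the two projections agree in the first month and then diverge as each booked workload lands, which is the demand a pure trend model never sees.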

Capacity management has traditionally been about looking to the past to predict the future, and many managers still underestimate the importance of capacity booking and placement analysis. The ability to reserve capacity in target hosting environments based on concrete knowledge of upcoming demands is becoming critical, both for users and for the groups that manage back-end capacity. Similarly, placement analysis is becoming indispensable for both planning and ongoing management, and determining exactly where a workload should be hosted is arguably one of the most important decisions in cloud environments. While these measures are increasingly important to data center operations, their power has yet to be leveraged in many IT organizations.

Additionally, many old school capacity managers take allocation analysis too lightly. While analyzing VM resource allocations is quite similar to sizing servers in traditional physical infrastructure, it can now be performed much more frequently, and has become more complex due to the emergence of cloud catalogs. While allocation analysis is on data center managers’ radar, it is rarely at the forefront of capacity management, and many environments carry operational risk due to VM sizing issues.

The Next Frontier

Ultimately it comes down to the fact that the next generation data center is much more complex. New management challenges include:

  • A much faster rate of change: Workload mobility and the introduction of new and transient workloads into virtual infrastructure result in a far more fluid environment. Decisions need to be made much more quickly, and often the environment is changing while the decision-making process is underway.
  • Forecasting is critical: In the past, capacity management was often synonymous with over-provisioning, because managers did not want to risk impacting server performance. As a result, cost savings and efficiency targets for virtualization initiatives have not been met. Infrastructure managers are now being asked to take a harder look at how they make these decisions, with the goal of driving up density and decreasing costs. The first step is to implement processes and systems that capture upcoming capacity demand, through both reservation systems and cloud request portals, in order to model requirements more accurately.
  • Shared infrastructure means competition for resources: With the ability to create high density environments, managing the allocation and sharing of server resources is a critical concern. Done poorly, this leads to either performance issues or excessive allocations and increased costs.  IT needs to optimize allocations in order to balance efficiency with performance risk.
  • Regulatory and policy compliance: Another challenging aspect of sharing infrastructure is ensuring that workload placements adhere to regulatory requirements and policies. For example, in financial institutions, the workloads of researchers and traders typically can’t go on the same physical systems. Beyond regulatory requirements, operational policies also play a role in placement and infrastructure design and require attention. Organizations typically have resiliency policies that require workloads and their failover counterparts to be kept on separate physical hosts, or disaster recovery (DR) policies that require capacity to be automatically reserved in remote sites.
  • Workload placement: How virtual workloads are placed or fit together on physical infrastructure directly determines how much infrastructure is required. This is conceptually similar to the game of Tetris, in which game pieces of different shapes and sizes need to be fit together properly to make the best possible use of the game board. Virtual workloads are no different from those game pieces, and servers are the game boards. By understanding their patterns and personalities, you can fit them together in a way that safely increases density. Poorly placed workloads, on the other hand, either increase performance risk or leave capacity stranded, wasting resources and valuable IT budget.
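The placement and policy points above can be illustrated with a small Python sketch. It is only one simple way to approach the problem, assuming a first-fit-decreasing heuristic, two resource dimensions (CPU and memory) and a single policy type in which workloads sharing a `group` label, such as a server and its failover counterpart, must land on different hosts. The `Workload` and `Host` classes, the `place` function and the sample sizes are hypothetical, and real placement analysis would also weigh utilization patterns over time.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    cpu: float
    mem_gb: float
    group: str = ""   # hypothetical anti-affinity group; empty means no policy


@dataclass
class Host:
    name: str
    cpu: float
    mem_gb: float
    placed: list = field(default_factory=list)

    def fits(self, w):
        """True if the workload fits in remaining capacity and breaks no policy."""
        used_cpu = sum(p.cpu for p in self.placed)
        used_mem = sum(p.mem_gb for p in self.placed)
        # Policy check: members of the same group may not share a host.
        conflict = bool(w.group) and any(p.group == w.group for p in self.placed)
        return (used_cpu + w.cpu <= self.cpu
                and used_mem + w.mem_gb <= self.mem_gb
                and not conflict)


def place(workloads, hosts):
    """First-fit decreasing: place the largest workloads first; return what did not fit."""
    unplaced = []
    for w in sorted(workloads, key=lambda w: (w.cpu, w.mem_gb), reverse=True):
        target = next((h for h in hosts if h.fits(w)), None)
        if target is None:
            unplaced.append(w)
        else:
            target.placed.append(w)
    return unplaced


# Made-up sample data: two identical hosts and five workloads.
hosts = [Host("host-a", cpu=16, mem_gb=64), Host("host-b", cpu=16, mem_gb=64)]
workloads = [
    Workload("db-primary", 8, 32, group="db-pair"),
    Workload("db-standby", 8, 32, group="db-pair"),
    Workload("web-1", 4, 8),
    Workload("web-2", 4, 8),
    Workload("batch", 6, 16),
]
unplaced = place(workloads, hosts)

for h in hosts:
    print(h.name, [w.name for w in h.placed])
print("unplaced:", [w.name for w in unplaced])
```

Sorting the largest pieces first is a common heuristic for this kind of packing problem, though it does not guarantee the densest possible result. The policy check sits next to the capacity check so that a placement is rejected for either reason before any capacity is consumed.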

All these factors necessitate a more intelligent approach to capacity management. Traditional approaches simply don’t allow organizations to handle these various factors and determine where to place workloads, how to allocate resources to individual workloads, and how much infrastructure is really required. Organizations need to evolve their thinking about the management challenges that exist in order to effectively address them. They also need to completely change the way they look at capacity management if they want to achieve the benefits of virtualization and cloud.

About the author:

Andrew Hillier is CTO and co-founder of CiRBA.