The overwhelming movement to cloud computing and big data is creating compelling efficiencies for businesses. However, this growing revolution is also introducing previously unseen levels of complexity for IT managers.
Open source software and commodity x86 hardware have started to level the playing field for businesses of all sizes. By adopting open source systems, companies are no longer locked into proprietary solutions from big vendors. Smart IT leaders can now design and deploy their own custom systems with open source software and less costly commodity hardware.
However, while the economics are attractive, this transition is difficult because it requires configuring an order of magnitude more settings and parameters in software: everything that was previously contained in the proprietary hardware/software packages of traditional server, storage and networking products.
In addition, clustered compute and storage functions introduce an additional layer of management complexity. These clusters offer exponentially greater flexibility, allowing users to dynamically configure computing power and storage capacity to fit rapidly evolving workloads. Their benefits, however, can be overshadowed by the difficulty of managing them.
Big data and cloud architectures do not fit into traditional IT frameworks, so entirely new technologies are needed to unlock their potential rewards.
The Emergence of Hadoop and OpenStack
Apache Hadoop is one such technology, and the one most commonly associated with big data. Hadoop is an open source software framework for the distributed processing of large data sets across clusters of commodity hardware. Hadoop does not, however, cover the management of the bare metal, OS, disk and network infrastructure below it.
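To make the distributed processing model concrete, here is a minimal sketch (an illustration of mine, not part of the original article) of the MapReduce pattern at Hadoop's core, simulated in plain Python: a mapper emits key/value pairs, a shuffle step groups them by key, and a reducer aggregates each group. On a real cluster, Hadoop runs these phases in parallel across many commodity machines.

```python
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) pairs, as a Hadoop map task would for word count.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Group values by key; on a real cluster this happens across nodes.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    # Sum the counts for one word, as a Hadoop reduce task would.
    return key, sum(values)

lines = ["big data needs big infrastructure", "big clusters need automation"]
pairs = (pair for line in lines for pair in mapper(line))
counts = dict(reducer(k, v) for k, v in shuffle(pairs).items())
print(counts["big"])  # 3
```

In production, equivalent logic would be submitted to a Hadoop cluster (for example via Hadoop Streaming), with the framework handling distribution, shuffling and fault tolerance.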
Another example is the OpenStack project, which encompasses new technology approaches for cloud implementations and shares the cluster infrastructure management problem presented by Hadoop. The stated mission of OpenStack is to produce a ubiquitous open source cloud computing platform that meets the needs of private clouds regardless of size.
The OpenStack project includes modules for computing, networking, and storage. With OpenStack, most functions of a typical data center are disaggregated from expensive proprietary hardware and incorporated into software. OpenStack allows data centers to manage all of the services for computing, networking and storage through the software abstraction of hardwired functions, enabling scale-out cloud infrastructure at a fraction of the cost of proprietary solutions.
These open source projects give IT managers much greater flexibility to quickly modify business processes and introduce new applications. The lower cost and scale-out architecture of Hadoop is a major benefit. Legacy technologies can process very large data sets, but not nearly fast enough, and at a much higher cost. For instance, in fraud detection for a financial services company, it's critical to analyze intrusions on the spot and take action immediately.
The Drive to Automate Infrastructure Tasks
Making the transition from legacy IT infrastructure to a software-defined methodology is becoming a central initiative for every business today. The economic incentives of moving to an open cloud model are just too great to ignore.
Companies can save 70-80% on their infrastructure budgets by swapping out their expensive, purpose-built server, storage and network systems for cheaper x86 components and adopting virtualization abstracted by an open cloud API.
To efficiently scale open cloud infrastructure, there are three key capabilities your management system must provide:
- Automation – The only way to maximize your ROI when investing in an open cloud or big data solution is to ensure that your IT employees remain productive. This means automating as much maintenance work as possible, leaving your IT workforce free to create new applications rather than care for and feed infrastructure.
- Integration – One of the largest deterrents blocking organizations from migrating to open cloud APIs is the complexity of integrating with older legacy systems. To keep running legacy programs (and add unknown new ones in the future), your infrastructure must integrate with a wide range of the leading hardware, operating systems and applications.
- Scalability – Enterprises are migrating to open computing models including Hadoop and OpenStack because their businesses are growing. Your infrastructure should be able to grow at the same pace as your company. Deploying individual servers is a challenge with linear complexity, but deploying clusters is a challenge with exponential complexity. This complexity can inhibit growth as you scale – your infrastructure must remain flexible enough, yet robust enough, to rapidly add additional capacity.
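The linear-versus-exponential contrast above can be made concrete with a toy model (an assumption for illustration, not a formula from the article): standalone servers need roughly one independent configuration per machine, while clustered nodes also need their pairwise relationships (network, replication, membership) configured, which grows quadratically with node count.

```python
def standalone_items(n):
    # One independent configuration per server: linear growth.
    return n

def cluster_items(n):
    # Each node plus every pairwise relationship between nodes:
    # n + n*(n-1)/2, i.e. quadratic growth in cluster size.
    return n + n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, standalone_items(n), cluster_items(n))
```

Even this conservative quadratic model shows the blow-up: a 100-node cluster involves 5,050 items in this model versus 100 for standalone servers, roughly fifty times the configuration state for the same machine count.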
Investing in Big Infrastructure to Make Big Progress
With cloud computing and big data on the rise, success hinges on wider market adoption of big infrastructure solutions. According to IDC, private cloud revenues will reach $22.2 billion by the end of this year. Big data is poised to reach $16.1 billion, with infrastructure components accounting for 45% of the big data industry and growing faster than any other segment.
Such growth is forecast to increase further in 2014, especially after Microsoft announced a recent plan to launch a $1.1 billion initiative to build the world’s largest data center in Iowa, spanning more than 6 million square feet – that’s larger than the Pentagon.
However, enterprises that migrate to OpenStack or Hadoop without upgrading their related infrastructure management services will experience slow delivery, degraded performance or, in some cases, even production outages in the data center. We call this problem the "pain curve," first described by Dr. Greg Bruno, co-founder and vice president of engineering at StackIQ, who asserts that the systems in every data center have a tipping point where the pain of managing IT puts the business at great risk. Hitting this "pain threshold" means a company has increased its data center complexity faster than its underlying infrastructure management solution can keep up, like a car that has run out of roadway.
Most data center administrators have this issue under control for traditional enterprise applications, which generally have a linear complexity curve as they scale. However, cluster applications such as Hadoop and OpenStack have an exponential complexity curve. Starting out small may be manageable, but we’ve seen top-tier enterprise customers blow through their pain threshold time and time again as they scale.
Automation Is the Key to Simplifying IT Management
The widespread benefits of cloud computing and big data are now obvious. These new IT architectures enable companies to slice and dice massive troves of data and uncover the hidden patterns behind business outcomes.
Business intelligence software allows decision-makers to cut through the noise to find actionable information based on customized queries. Companies are using predictive analytics to forecast sales trends, fine-tune targeted marketing campaigns, and balance their supply chains with their demand outputs.
Yet managing these complicated clusters requires an integrated platform to keep data centers running smoothly in the background. The most cost-effective solution involves an open source framework using Hadoop and OpenStack, supported by a management system that can efficiently automate, integrate and scale all the related infrastructure components.
Tim McIntire, CEO and Co-Founder of StackIQ.