Your Data and the Cloud

Choosing when to leverage cloud infrastructure is a decision that should not be taken lightly. First and foremost, it calls for a data-first approach, not an application-first approach. Leveraging data is what drives a business.

I have been saying for a number of years that a data center is a data center, is a data center. The “cloud” is just someone else’s data center. There are a few issues that should be considered when debating cloud as part of a business strategy.

The first is security. It used to be the No. 1 concern about going to the cloud. In the last few years, however, it has fallen by the wayside and is no longer the hot topic it once was, because cloud infrastructures have been shown to be rather secure. Nonetheless, this issue shouldn’t be ignored, as there is more than just system security to be concerned with. The second issue is managing a data center in the cloud. This has become a hotter topic as more people consider cloud deployments, and it should be part of your own planning, especially if you don’t want to be locked into a single cloud provider. The final issue is the cost of running on-premise versus in the cloud.

There are two approaches to running in the cloud. The first is to tightly couple the data center to the cloud provider, and the second is to stay decoupled from it. Provider services such as Amazon S3 or Google Bigtable provide value to customers, but they can also be viewed as ways to keep a customer tightly connected, making it difficult to pick up and move to another provider. The second approach is to leverage the facilities of your big data solution, which can provide feature consistency across on-premise and cloud deployments. A converged data platform is a major part of an ideal solution here, because it can run on any hosting provider and on-premise at the same time. This decouples your applications and data from the cloud provider, allowing you to choose on-premise or whichever cloud providers best meet your elastic compute and storage needs. This puts the power into your hands and effectively forces cloud providers into a deregulated utility provider model where you as the customer have the leverage.
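As a rough illustration of what staying decoupled can look like in application code, the sketch below hides storage behind a small interface so that business logic never calls a provider-specific API directly. Every name in it (`ObjectStore`, `InMemoryStore`, `LocalDirStore`, `archive_report`) is hypothetical and invented for this example; it does not represent any particular platform or vendor SDK.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class ObjectStore(ABC):
    """Hypothetical storage interface the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(ObjectStore):
    """Stand-in backend that keeps objects in a dictionary."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class LocalDirStore(ObjectStore):
    """Stand-in backend that writes each object to a file in one directory."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def archive_report(store: ObjectStore, name: str, body: str) -> str:
    """Business logic: depends only on the interface, not on a backend."""
    key = f"report-{name}.txt"
    store.put(key, body.encode("utf-8"))
    return key


if __name__ == "__main__":
    # The same application function runs unchanged against either backend.
    for store in (InMemoryStore(), LocalDirStore(tempfile.mkdtemp())):
        key = archive_report(store, "q1", "revenue up")
        print(type(store).__name__, store.get(key).decode("utf-8"))
```

Swapping where the data lives then means writing one more backend class, while code like `archive_report` stays untouched. A tightly coupled design would instead scatter provider calls throughout the application.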


Security

Let’s focus on two types of security situations. The first is keeping people out of your data center. Given good management practices, this is reasonably easy to address, and it’s worth noting that cloud providers have done a nice job of giving their customers a warm and fuzzy feeling about running in the cloud. The second, and I would argue the more important, issue is securing your data within your data center. Regardless of where the data center exists, most organizations have policies and perhaps even government regulations that they must adhere to.

Big data platform users have had to pay strict attention to data security, as data is the heart of any enterprise. Whether data is in files, a database, or streaming events, each location must be considered. Whether to bind to a cloud provider’s services is a choice each organization must make for itself. Given the option of a converged data platform, it is very easy to apply security in one place and no longer have to worry about how to secure your data elsewhere, regardless of which provider you choose. The problem with a security model that is tightly coupled to the cloud is that the providers have no desire to standardize; they want to lock you in. Keep security at the forefront when making choices in the cloud.


Management

The management of resources in the cloud from a data perspective is about more than just spinning servers up and down. The services you need to build your business solutions are what give your business an edge. Consider data replication across the globe, perhaps even requirements for omni-directional replication, either from a NoSQL JSON database or from streaming events. Global, strong consistency and even location awareness for business applications are absolute musts. Most importantly, with the need to run in multiple locations, whether only in Amazon or across providers, a global single namespace is a requirement in order to view multisite clusters as one logical cluster.

Consider these capabilities when choosing an enterprise stack to meet your current and future needs.

How data is managed will continue to evolve, and the ability to abstract or blend your data going forward is a certainty. Taking this whole cloud concept a step further, what if your business needed to run servers in a car trunk? How would you manage that? It’s guaranteed that cloud providers won’t be able to provide their solution in a car trunk.


Cost

There are many costs to consider when going to the cloud. The comparison starts with upfront capital expenditures versus monthly operating expenses. The ease of getting started in the cloud cannot be matched on-premise. Ongoing data storage in the cloud will add up quickly if that data is not being used for operational purposes. Storage costs on-premise are considerably lower, but data center costs, electricity usage, and upfront hardware purchases must be considered. An always-on platform in the cloud can rack up CPU costs rather quickly compared with running always-on hardware on-premise.
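The trade-off between upfront capital expenditure and monthly operating expense can be framed as a simple break-even calculation. The sketch below is a minimal illustration only: the function name `breakeven_months` and all dollar figures are hypothetical, and it deliberately ignores factors such as hardware refresh cycles, staffing, and discounts.

```python
import math


def breakeven_months(capex, onprem_monthly, cloud_monthly):
    """Months until cumulative on-premise cost no longer exceeds cloud cost.

    Returns None when the monthly cloud bill is not higher than the
    on-premise running cost, so the upfront spend is never recovered.
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(capex / monthly_saving)


if __name__ == "__main__":
    # Hypothetical figures: $120,000 of hardware, $2,000/month to run it
    # on-premise, versus a $7,000/month always-on cloud deployment.
    print(breakeven_months(120_000, 2_000, 7_000))
```

With these made-up inputs the upfront purchase pays for itself after two years. A workload that is needed for only a few months, or that can be switched off most of the time, would tilt the comparison toward the cloud.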

When considering the optimal way to leverage resources, whether on-premise or in the cloud, the benefits of the Zeta Architecture start to shine through. The Zeta Architecture is an enterprise architectural construct, not unlike the Lambda architecture, that enables simplified business processes and defines a scalable way to increase the speed of integrating data into the business. At its core is global resource utilization, which enables you to squeeze the most out of the hardware being used. The biggest waste in the cloud is when new servers are spun up and then underutilized. Web servers typically utilize only about 10% of the resources allocated to them. If you have three dedicated web servers each running at that level, roughly 90% of their total capacity sits idle; even a single server carrying all three workloads would be about 70% idle. That waste can be reduced by keeping costs and resource utilization top of mind.
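The utilization arithmetic is easy to check. The sketch below assumes three equal-size servers, each running at the roughly 10% utilization quoted above, and compares leaving them dedicated with running the same combined load on a single shared server. The function names are hypothetical and exist only for this illustration.

```python
def fleet_utilization(loads):
    """Return (used, idle) fractions of total capacity for equal-size servers.

    `loads` holds each server's utilization as a fraction of one server.
    """
    used = sum(loads) / len(loads)
    return used, 1.0 - used


def consolidated_utilization(loads, servers):
    """Return (used, idle) fractions if the same load ran on `servers` machines."""
    used = sum(loads) / servers
    return used, 1.0 - used


if __name__ == "__main__":
    web_loads = [0.10, 0.10, 0.10]  # assumed: three web servers at ~10% each
    dedicated = fleet_utilization(web_loads)
    shared = consolidated_utilization(web_loads, servers=1)
    print(f"three dedicated servers: {dedicated[0]:.0%} used, {dedicated[1]:.0%} idle")
    print(f"one shared server: {shared[0]:.0%} used, {shared[1]:.0%} idle")
```

Under these assumptions the dedicated fleet leaves about 90% of its capacity idle. Placing all three workloads on one machine leaves about 70% of that machine free for other work, which is the kind of headroom global resource utilization aims to reclaim.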

Final Thoughts

When deciding what is best for your business, think about everything that matters to you, and then think about everything that matters to a cloud provider. Make sure you are looking out for No. 1, because after all, your business is the most important part of this equation.

Take a company such as Netflix into consideration. The company is world-renowned for streaming movies and has publicly announced that it runs completely on Amazon in a rather tightly coupled manner. The company does, however, have a full backup and disaster recovery (DR) copy of its data stored on Google. While this is a fine situation, one may imagine that there could be better options to consider which would deliver significantly greater benefit to its business. Imagine for a moment if it were leveraging both Amazon and Google just as utility providers for its infrastructure.

Netflix could run a converged data platform on any cloud infrastructure of its choosing. Instead of having a dead copy of data to recover from and having to be concerned about a DR event, it could be running all its services on multiple providers simultaneously, removing the need to worry about a disaster scenario. If there were a catastrophic failure with one provider, users would be moved over to the other provider without ever knowing. To be clear, however, Netflix built its infrastructure before a converged data platform was an option. It has built and developed internal processes and components to handle disaster recovery. Companies looking at leveraging cloud infrastructures now have even more options.

Everyone should consider a multi-data center approach to managing their data. Whether in the cloud, on-premise, or both at the same time, leveraging someone else’s infrastructure to meet business needs is a great luxury to have. Running in multiple data centers is also the easiest way to make disaster recovery events an afterthought.

