If one considers the inception of the system known as “Amazon Web Services” (AWS) to be the onset of what has become known as the public cloud, then today the cloud is more than 10 years old. After 10 years, the cloud industry would have you believe that it is mature and stable, safe and secure, and, most importantly, fit to use as an essential infrastructure component of a modern 21st-century business. Yet insiders remember the cliché of caveat emptor, or buyer beware.
Many providers of cloud services market the idea that all critical computing functions should be run using their public cloud services because this paradigm is the future and the future is now. While we do share that long-term vision, the reality is less impressive, and the solution is not yet complete. Amazon itself does not run 100% of its critical business systems in the AWS Public Cloud, a fact that was revealed in The Wall Street Journal article, “Cloud-Computing Kingpins Slow to Adapt to Own Movement.” This is also true for Google, Microsoft, and other top cloud providers.
Cloud Outages in 2016
Every major cloud provider had a significant outage in 2016. For example, in January, the Verizon cloud scheduled its cloud to go offline for 40 hours for system maintenance. Ironically, this effort was to prevent future outages.
In February and March, the Google Compute Engine had an outage due to connectivity issues. In March, millions of people couldn’t buy music, books, or applications from Apple. In March, Microsoft Azure in the central U.S. had two of its public cloud services go down for more than 2 hours due to connectivity issues. Within 24 hours of that incident, a second incident on Azure affected users on the East Coast and took down virtual machines, websites, and other cloud services.
In April, Starbucks had 7,000 stores in the U.S. and 2,000 stores in Canada affected by a cloud outage. Stores did not reopen until the next day. In May, Apple suffered a 7-hour outage that affected some 40% of the world’s 500 million iCloud users. Finally, in July, an AWS glitch affected Netflix and Pinterest.
Cloud Lessons Learned by the Early Adopters
Lesson 1
Public clouds by design are not built to be highly redundant. Both Apple and Starbucks put all their eggs in one basket. So when their cloud provider had an outage, their business was disrupted. They learned the hard lesson that public clouds may not be highly redundant.
Lesson 2
You must architect a hybrid cloud solution that spans multiple cloud infrastructures. Morgan Stanley estimates that Apple spends $1 billion annually with AWS. Common industry rumors are that Apple intends to return much of its computing to its own data centers. Also, earlier this year, it was publicly reported that Apple will be spending more than $400 million a year on the Google Compute Engine.
What is clear is that Apple is moving to a hybrid cloud approach, so it is not reliant on a single cloud infrastructure. The move to Google will split Apple’s cloud infrastructure between two major providers. A wise approach to optimizing cloud architectures may be to treat all cloud resources as a commodity by utilizing multiple low-cost providers. A hybrid cloud that spans multiple providers is your best defense against business disruption due to infrastructure outages.
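To make the multi-provider idea concrete, here is a minimal Python sketch of failover across two providers. It is an illustration under stated assumptions, not a production design: the provider names and the `primary` and `secondary` functions are hypothetical stand-ins for real service calls, and a real deployment would also need health checks, data replication, and timeouts.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersDown(Exception):
    """Raised when no provider in the list could serve the request."""


def call_with_failover(
    providers: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, result)
    from the first one that answers without raising."""
    errors: List[str] = []
    for name, fetch in providers:
        try:
            return name, fetch()
        except Exception as exc:  # any failure counts as an outage here
            errors.append(f"{name}: {exc}")
    raise AllProvidersDown("; ".join(errors))


# Hypothetical providers: the first simulates an outage, the second works.
def primary() -> str:
    raise ConnectionError("region unavailable")


def secondary() -> str:
    return "ok"


served_by, result = call_with_failover(
    [("provider_a", primary), ("provider_b", secondary)]
)
print(served_by, result)
```

In this sketch the request is served by the second provider because the first one fails. If every provider fails, the caller gets a single `AllProvidersDown` error that lists each failure.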
Netflix is one of the few companies to have implemented a successful and redundant cloud architecture within a single provider. It utilizes Chaos Monkey, which was developed by Netflix engineers to constantly test the resiliency and recoverability of its AWS environment. Netflix, in our opinion, is the exception to the rule. One could also argue that Netflix was forced to develop Chaos Monkey because it was such an early adopter of the cloud and had no other good options at the time, which brings us to the next lesson learned.
Lesson 3
Don’t try to develop an enterprise-wide cloud strategy without help from a qualified provider of infrastructure services. Many companies mistake early success in deploying test and development systems into the cloud for the experience needed to deploy production workloads into the cloud. The reality is that only the largest companies with the biggest IT budgets and internal resources are capable of deploying an enterprise-wide cloud strategy into a public or hybrid cloud on their own. The advice is to seek help from the experts. Look for infrastructure vendors who broker multiple solutions.
Lesson 4
Not all infrastructure should be “bought by the drink.” The idea of acquiring resources when they are needed can seem attractive. Buying by the proverbial drink may be efficient and effective—until there is a “run at the bar.” Architecting the right cloud infrastructure requires an understanding of the service-level requirements of the client, the capabilities of the technology infrastructure, and the cost model associated with that infrastructure. Many companies have experienced “sticker shock” when they picked up their monthly tab for all those drinks.
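The cost-model point can be illustrated with a small Python calculation. This is a sketch with made-up numbers: the hourly rate and the flat monthly fee below are hypothetical and are not taken from any provider's price list. Prices are kept in whole cents to avoid rounding surprises.

```python
# Hypothetical prices, in cents, chosen only for illustration.
ON_DEMAND_RATE_CENTS_PER_HOUR = 10   # "by the drink"
COMMITTED_FEE_CENTS_PER_MONTH = 5000  # flat fee for a dedicated resource


def on_demand_cost(hours: int) -> int:
    """Monthly cost in cents when paying per hour of use."""
    return hours * ON_DEMAND_RATE_CENTS_PER_HOUR


def break_even_hours() -> float:
    """Hours per month at which both pricing models cost the same."""
    return COMMITTED_FEE_CENTS_PER_MONTH / ON_DEMAND_RATE_CENTS_PER_HOUR


def cheaper_option(hours: int) -> str:
    """Name the cheaper pricing model for a given monthly usage."""
    if on_demand_cost(hours) < COMMITTED_FEE_CENTS_PER_MONTH:
        return "on-demand"
    return "committed"


for hours in (200, 720):  # light use vs. running all month (30 days)
    print(hours, on_demand_cost(hours) / 100, cheaper_option(hours))
print("break-even hours:", break_even_hours())
```

Under these assumed prices, paying by the drink wins for a workload that runs 200 hours a month, while a workload that runs around the clock costs more on demand than under the flat fee. The break-even point is where the monthly tab starts to surprise.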
Lesson 5
You are responsible for backing up your data. While some public cloud service providers advise customers to abandon their own data centers and trust them to manage business-critical computing infrastructure, the contract often still holds the customer responsible for situations such as data loss.
In addition, every cloud provider makes a point of marketing the level of security and compliance that it offers. For example, some of the certifications listed for Microsoft Azure include HIPAA/HITECH, PCI-DSS, SOC-1, SOC-2, SOC-3, UK G-Cloud, and FedRAMP. Although impressive, Microsoft’s list of industry-verified certifications is conceptually incomplete. Scrutinizing the fine print of the contract reveals that the customer retains responsibility for ensuring systems and applications compliance. Even if a cloud provider has a list of impressive security certifications, the actual responsibility for security may fall squarely on the customer’s shoulders.
Lesson 6
Look for cloud providers that will share responsibility with you for critical tasks such as backup and security. Not all cloud providers are equal. Attempting to force a large cloud services provider to change its contract is difficult and not likely to succeed unless you have the buying power of an equally imposing company such as Apple. However, this is not true of all providers. When you work with a smaller cloud broker, you can still get access to public cloud products and, at the same time, the broker may take on a higher level of responsibility.
Lesson 7
Look for cloud providers that will put “skin in the game.” Identify cloud providers with well-documented service-level agreements and real self-imposed penalties for failure to meet those requirements. When you trust your business to someone else, you want to make sure that penalties for failure exist. In the event of a failure, the cloud provider should bear a cost equal to the cost borne by the customer.
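When comparing service-level agreements, it helps to work out what a given outage would actually earn you in penalties. The Python sketch below is an assumption-laden illustration: the credit tiers and the monthly bill are hypothetical and do not describe any real provider's agreement. It computes monthly availability from downtime and maps it to a service credit.

```python
HOURS_IN_MONTH = 720  # assumes a 30-day month


def availability_percent(total_hours: int, downtime_hours: int) -> float:
    """Share of the period, in percent, during which the service was up."""
    return 100 * (total_hours - downtime_hours) / total_hours


def credit_percent(availability: float) -> int:
    """Hypothetical credit tiers, as a percent of the monthly bill."""
    if availability >= 99.95:
        return 0
    if availability >= 99.0:
        return 10
    return 25


def credit_amount(monthly_bill: int, downtime_hours: int) -> float:
    """Credit owed to the customer for the given downtime."""
    pct = credit_percent(availability_percent(HOURS_IN_MONTH, downtime_hours))
    return monthly_bill * pct / 100


# A hypothetical 10,000 monthly bill with 7 and 40 hours of downtime.
for down in (0, 7, 40):
    print(down, round(availability_percent(HOURS_IN_MONTH, down), 2),
          credit_amount(10_000, down))
```

Comparing the resulting credit with your own estimate of what those hours of downtime would cost your business shows quickly whether the provider's penalty is real skin in the game or a token gesture.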
Lesson 8
Choose a vendor that offers a single pane of glass to monitor and manage it all. It was hard enough to manage infrastructure when it was all under one roof. If you were to take a typical organization and map its technology into four quadrants (off-premise, on-premise, corporate managed, and non-corporate-managed), it would become apparent just how complex the world that our infrastructure resides in has become.
Managing your infrastructure across silos is not efficient or effective. As more and more of your infrastructure falls outside your physical walls, it is increasingly important to find cloud providers that offer a single tool to monitor and manage it all.
Lesson 9
Choose a cloud vendor that offers a single bill. Just as it is ineffective to monitor and manage across silos, managing costs with multiple bills is also inefficient and ineffective. Many cloud brokers today offer a single bill across all the infrastructure they supply, providing billing services that are similar to the way that your mobile phone carrier analyzes your bill for cost savings.
The Bottom Line on Cloud
The cloud is now 10 years old, but it is still evolving. As with any new technology, it brings new capabilities and new challenges with it. The better you understand the risks, the better your ability to reap the rewards that the cloud can bring.