6 Key Cloud Takeaways from Amazon’s S3 Outage

Mar 1, 2017

By Joyce Wells

Amazon's Simple Storage Service (S3) outage prompted observations and reflections from industry experts about the need for proactive monitoring of cloud services, the requirement to diversify with multi-cloud strategies, and even the possibility of "too-big-to-fail" safeguards for large cloud services providers. The disruption, which took place on Feb. 28, affected many AWS customers for more than four hours, impacting websites and other services. As Amazon later explained, the service disruption in the Northern Virginia (US-EAST-1) Region was caused by simple human error.

Here is a roundup of some of the dominant themes expressed in executives' comments:

Communication - Brands need to proactively monitor their websites from an end-user perspective. If they discover a problem quickly, they can take steps to mitigate its impact. A company can deploy a multi-cloud strategy and shift traffic away from vendors that are experiencing issues. Another key lesson is the need for a crisis communication plan. By proactively communicating when and why an outage occurs, and what mitigating steps are being taken, a brand can come out of an outage stronger. If a problem happens, a quick response, clear communication, and transparency can increase a brand's trustworthiness in the eyes of consumers. – Carmen Carey, CEO of Apica
Contingency Planning - Cloud infrastructures are so stable that we are forgetting what we would do if there were a problem. It’s like your home internet, which works pretty well 99% of the time, and then once in a while you have a disaster. For this reason, Apple and Starbucks no longer rely on a single public cloud provider. The other thought that jumps out at me is the memory of the financial institutions that almost took down the U.S. economy during the financial crisis of 2007-2008. I wonder whether, as more people move to the cloud, this similarly opens the door to regulating cloud services, although I hate the thought of regulation. – Michael Corey, Business & Technology Advisor
Multi-Cloud Strategies - All businesses need a multi-cloud strategy so they can adapt quickly when one of their cloud vendors experiences a failure. – Chip Childers, CTO, Cloud Foundry
HA and DR - This is another example of the need for high availability and disaster recovery systems that protect businesses, whether large or small. And while industry leaders are often thought of as invulnerable, they too are susceptible to issues such as failed migrations and upgrades, system failures, and human error. There are plenty of solutions available to protect businesses from unplanned events in on-premises, cloud, and hybrid environments. – Edward Vesely, EVP and CMO of Vision Solutions
Independent Monitoring - Don’t rely on your third-party provider to tell you when they are down. The old adage of not putting all your eggs in one basket applies to decoupling monitoring from hosting to ensure that you get early warning and are ready to react in the case of a future service disruption. Whether your business-critical apps are hosted on AWS, Azure, GCP, or any other cloud or hosting platform, you need to be monitoring those apps. External monitoring is key to knowing immediately when something is down and to identifying which app is causing the problem. – Denis Goodwin, Product Marketing Manager at SmartBear
Plan for Outages - Companies using AWS (or any cloud) for production workloads need to plan for outages just as they would on-premises. AWS has tremendous redundancy built into its infrastructure, but it is still susceptible to failure, as we witnessed. Implementing a solid DR/BC strategy when migrating to the cloud is a must. Specific to AWS, that strategy should include a multi-region design that can help guard against these types of regional failures. There is a big misconception about the cloud and the level of redundancy it provides. That redundancy is there to take advantage of, but it needs to be understood and architected properly from the start of any engagement. – Daniel Robinson, Senior Engagement Manager, TriCore Solutions

Comments have been edited and condensed.
