As one of the most widely used hosting platforms, with a market share of more than 40 percent, Amazon Web Services (AWS) took a big hit yesterday afternoon when Amazon S3, its web-based object storage service used for content storage, backup and recovery, and other cloud services, suffered a widespread outage.
Increased error rates in Amazon S3 caused outages for numerous websites and web applications, making stored content unreachable and services unusable for nearly 5 hours. The problems appeared to originate in AWS’s data center in Ashburn, Virginia, and affected Amazon S3 in the US East (Northern Virginia) region.
Affected websites and services included Slack, Giphy, Alexa, Wix websites, Adobe Cloud, and Expedia, as well as IoT hardware such as connected lightbulbs, Nest thermostats, Amazon Echo, and more.
AWS spent most of yesterday repairing Amazon S3 and its service health dashboard. Ironically, the AWS health dashboard relies on Amazon S3 to store its status graphics, so for part of the day the indicators misrepresented the actual health of the affected services. Amazon S3 was operating normally again by about 2 PM PST, once retrieval, listing, and deletion of existing objects had fully recovered.
According to SimilarTech, 148,213 websites and 121,761 domains rely on Amazon S3 as a content hosting provider. Because public cloud services are fallible, no provider can guarantee 100% availability. In response to this outage, Amazon is encouraging customers to adopt hybrid IT.
"The question should never be ‘if we have an outage’ but ‘when we have an outage.’ Everyone goes down eventually," says Rachel Bair, Director of Hosting for Unleashed. "Amazon is a very solid hosting provider and has more money invested in its infrastructure than almost any of its competitors. The reality is that every provider can go down. The lesson we all need to learn is how well our systems are set up for a critical failure, and what our contingency plan, or lack thereof, costs."
Unleashed believes redundancy is essential for all mission-critical technology systems. Our team encourages users to take the following steps: 1) analyze how your infrastructure is currently set up, 2) consider how a true outage would affect your business, and 3) determine how long an outage can last before it becomes catastrophic. These questions can be complicated and confusing to answer. If you have any questions about cloud services or disaster recovery planning for your organization, our hosting team is happy to help!