What Led To The 15-Hour AWS Outage — And Could It Happen Again?
Times Now
Monday turned into a frustrating day for users of large platforms such as Snapchat, Canva, and Roblox after an Amazon Web Services (AWS) outage. DownDetector, a website that tracks outages, said it received around 11 million user complaints in total on Monday. The outage disrupted apps across many categories, including editing, food delivery, gaming, streaming, and some financial services. DownDetector's figures also show that around 15,000 AWS users were affected. The AWS status page stated, 'We can confirm increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region.' The outage began at 12:30 PM Indian Standard Time, and among the first platforms affected were Snapchat, Canva, Fortnite, and Roblox.
What Was The Reason Behind The Amazon Web Services Outage?
The problem began with an error in the internal network of Amazon EC2, which in turn affected services including SQS, DynamoDB, and Amazon Connect. The company said, 'The root cause is an underlying internal subsystem responsible for monitoring the health of our network load balancers.' In an earlier update, Amazon had said, 'We have identified a potential root cause for error rates for the DynamoDB APIs in the US-EAST-1 Region. Based on our investigation, the issue appears to be related to DNS resolution of the DynamoDB API endpoint in US-EAST-1. We are working on multiple parallel paths to accelerate recovery. This issue also affects other AWS Services in the US-EAST-1 Region. Global services or features that rely on US-EAST-1 endpoints, such as IAM updates and DynamoDB Global tables, may also be experiencing issues. During this time, customers may be unable to create or update Support Cases. We recommend customers continue to retry any failed requests. We will continue to provide updates as we have more information to share, or by 2:45 AM.'
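Amazon's advice to "continue to retry any failed requests" is usually implemented with capped exponential backoff plus random jitter, so that thousands of clients do not hammer a recovering service at the same moment. The Python sketch below illustrates that idea. The helper name `call_with_retries`, its parameters, and the choice of `ConnectionError`/`TimeoutError` as retryable errors are all hypothetical assumptions for illustration; this is not Amazon's implementation or an AWS SDK API.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.2, max_delay=5.0,
                      retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Hypothetical helper: call operation(), retrying failures with
    capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Exponential backoff ceiling: 0.2s, 0.4s, 0.8s, ... capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so clients spread out.
            sleep(random.uniform(0, cap))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_lookup():
        # Simulated endpoint that fails twice before recovering.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise ConnectionError("endpoint unreachable")
        return "ok"

    # Skip real sleeping in this demo by passing a no-op sleep function.
    print(call_with_retries(flaky_lookup, sleep=lambda s: None))  # prints "ok"
    print(attempts["n"])  # prints 3
```

In practice, AWS SDKs such as boto3 already ship configurable retry behaviour, so a hand-rolled loop like this mainly serves to show the idea rather than replace the SDK's own mechanism.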
Could It Happen Again?
Amazon will likely be more cautious after an error big enough to knock many of the world's most popular apps offline. But readers should know that the risk can never be completely eliminated, because systems this complex will always be exposed to technical faults and human error. What the outage is more likely to trigger is a push among firms and platforms that rely on AWS toward multi-region and multi-cloud setups that reduce their dependence on a single region or provider.
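At its simplest, a multi-region setup means that when a request to the primary region fails, the application tries a standby region instead. The Python sketch below shows only that client-side fallback step. The function `call_with_failover` and the region/operation pairs passed to it are hypothetical, and the sketch leaves out the harder parts of a real deployment, such as keeping data replicated across regions and making sure the standby does not itself depend on the failing region.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def call_with_failover(regions: Sequence[Tuple[str, Callable[[], T]]]) -> Tuple[str, T]:
    """Hypothetical helper: try each (region_name, operation) pair in order
    and return (region_name, result) for the first one that succeeds."""
    errors: List[str] = []
    for region, operation in regions:
        try:
            return region, operation()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{region}: {exc}")
    # Every region failed: report all collected errors together.
    raise RuntimeError("all regions failed: " + "; ".join(errors))


if __name__ == "__main__":
    def primary():
        # Simulated primary region that cannot be reached.
        raise ConnectionError("DNS resolution failed")

    def standby():
        # Simulated healthy standby region.
        return "order #1 saved"

    print(call_with_failover([("us-east-1", primary), ("us-west-2", standby)]))
    # prints ('us-west-2', 'order #1 saved')
```

Ordering the list puts a preferred primary first; a multi-cloud variant would follow the same pattern with operations that target different providers.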