Amazon Web Services (AWS) has confirmed that its systems are recovering following a major outage that disrupted dozens of websites and applications including Facebook, Snapchat, Amazon, Coinbase and Robinhood, The Wall Street Journal reports. The Journal was itself among several media organisations affected by the outage.

The outage, which began around 3:00 AM Eastern Time, originated in the AWS US-EAST-1 region, centred on Northern Virginia, and affected major retailers, airlines, social media applications, financial services companies and productivity tools. Sites including Slack, United Airlines, AI tool Perplexity and video games Fortnite and Roblox experienced disruptions.

AWS traced the problem to DynamoDB, its managed database service, which many websites and applications rely on to store and retrieve data. AWS has more than one million customers across retail, financial services, media and entertainment sectors, with clients including Disney+, Zoom, Airbnb, Lyft, Dropbox and Nike.

The company identified the root cause at 2:01 AM Pacific Daylight Time as a DNS resolution issue affecting the DynamoDB API endpoint in US-EAST-1. AWS stated it was “working on multiple parallel paths to accelerate recovery” and said the issue was also affecting other services in the region.
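For readers wondering what a DNS resolution failure looks like from a customer's side, the following is a minimal sketch rather than anything AWS has published. It assumes the regional endpoint follows AWS's public naming pattern (`dynamodb.us-east-1.amazonaws.com`); the function name `resolve_addresses` and the injectable `resolver` parameter are hypothetical and exist only to make the check easy to test.

```python
import socket

# Assumed hostname, based on AWS's public regional endpoint naming pattern.
DYNAMODB_ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def resolve_addresses(hostname, port=443, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses a hostname resolves to, or [] on failure.

    `resolver` is a hypothetical hook: it defaults to the standard library's
    getaddrinfo and can be swapped for a stub in tests.
    """
    try:
        results = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The name could not be resolved. Clients see this kind of error
        # when DNS for an endpoint is broken, even if the servers are up.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address for both IPv4 and IPv6.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve_addresses(DYNAMODB_ENDPOINT)` on a machine with network access returns the addresses currently published for the endpoint; an empty list indicates the name did not resolve, which is the symptom clients would have observed before the fix.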

Early signs of recovery

Engineers applied initial mitigations at 2:22 AM PDT with early signs of recovery appearing for some impacted services. By 3:35 AM PDT, AWS confirmed “the underlying DNS issue has been fully mitigated, and most AWS Service operations are succeeding normally now.”

However, requests to launch new EC2 instances, and services that launch EC2 instances such as ECS, continued to experience increased error rates. AWS recommended that customers launch EC2 instances without targeting a specific Availability Zone and configure Auto Scaling Groups to use multiple zones.
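The multi-zone recommendation can be illustrated with a short sketch. It is not AWS's own guidance in code form: the helper name `multi_az_group_params`, the group and template names, and the subnet IDs are all hypothetical, and the sketch assumes each subnet sits in a different Availability Zone, which the code cannot verify. The dictionary keys match the parameters accepted by the `create_auto_scaling_group` call in boto3, the AWS SDK for Python.

```python
def multi_az_group_params(name, launch_template_name, subnet_ids,
                          min_size=1, max_size=3):
    """Build request parameters for an Auto Scaling group spanning several subnets.

    Assumption: each subnet ID belongs to a different Availability Zone.
    """
    unique_subnets = list(dict.fromkeys(subnet_ids))  # drop duplicates, keep order
    if len(unique_subnets) < 2:
        raise ValueError("provide at least two distinct subnets to span multiple zones")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {
            "LaunchTemplateName": launch_template_name,
            "Version": "$Latest",
        },
        "MinSize": min_size,
        "MaxSize": max_size,
        # Auto Scaling expects the subnets as one comma-separated string.
        "VPCZoneIdentifier": ",".join(unique_subnets),
    }


# Hypothetical names and subnet IDs, for illustration only.
params = multi_az_group_params(
    "web-fleet", "web-template", ["subnet-aaa", "subnet-bbb"]
)
```

With boto3 installed and credentials configured, the parameters could be passed as `boto3.client("autoscaling").create_auto_scaling_group(**params)`. Because the request names subnets in more than one zone rather than pinning a single one, the service is free to place instances wherever capacity is available.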

Some services, including CloudTrail and Lambda, continued working through backlogs of events following initial recovery. AWS reported elevated polling delays for Lambda Event Source Mappings for SQS, which affected features that depend on Lambda’s SQS polling capabilities, including Organisation policy updates.

Global services and features relying on US-EAST-1 endpoints, including IAM updates and DynamoDB Global Tables, also experienced issues during the outage before recovering at 3:03 AM PDT.

The AWS infrastructure underpins millions of websites and platforms, providing cloud computing services such as servers and storage to major companies globally. The service is the largest cloud computing provider in the United States.
