AWS Declares Major Outage Resolved After Nearly 24 Hours of Disruption

Amazon Web Services (AWS), the world’s largest cloud computing provider, has officially marked a widespread outage in its US-EAST-1 region as resolved, following nearly a full day of cascading failures that disrupted services for millions worldwide.

The incident, which began late on October 19, 2025, and persisted until early afternoon on October 20, highlighted the fragility of global internet infrastructure reliant on AWS’s backbone.

By 3:01 PM PDT, AWS confirmed all services had returned to normal operations, though some backlogs in data processing for tools like AWS Config and Redshift were expected to clear within hours.

The outage originated from DNS resolution issues affecting the DynamoDB API endpoint in US-EAST-1, AWS’s busiest region, located in Northern Virginia.

At 11:49 PM PDT on October 19, elevated error rates and latencies emerged across multiple services, initially pinpointed to DynamoDB, a core database service powering everything from user data to application backends.

Engineers identified the root cause by 12:26 AM PDT on October 20, linking it to a faulty DNS update that prevented applications from locating server IP addresses, akin to a broken phonebook for the internet.
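For readers who want to see what a failed lookup looks like from the client side, the following is a minimal sketch, not AWS's tooling. It uses Python's standard `socket.getaddrinfo` to resolve the public regional hostname `dynamodb.us-east-1.amazonaws.com`. The helper name `resolve_endpoint` and its `resolver` parameter are illustrative assumptions, added so the lookup can be swapped out for testing.

```python
import socket

# Public regional endpoint hostname for DynamoDB in US-EAST-1.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def resolve_endpoint(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique IP addresses for hostname, or [] if DNS fails.

    `resolver` is a hypothetical injection point (defaults to the real
    socket.getaddrinfo) so the lookup can be replaced in tests.
    """
    try:
        results = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # The "broken phonebook" case: the name cannot be turned into an
        # address, so a client has nowhere to send its request.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in results})
```

Calling `resolve_endpoint(ENDPOINT)` returns a list of addresses when DNS is healthy and an empty list when the lookup fails, which is the condition applications hit during the outage even though the underlying servers were still running.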

This failure triggered a domino effect: EC2 instance launches stalled due to DynamoDB dependencies, Network Load Balancer health checks failed, and connectivity broke for services like Lambda, SQS, and CloudWatch.

100+ AWS Services Impacted

The blast radius was immense, impacting over 100 AWS services and spilling over to consumer-facing platforms.

Popular apps such as Snapchat, Fortnite, Roblox, and Coinbase went offline, with users unable to log in or access features amid surging complaints on DownDetector.

Gaming services like Epic Games’ Fortnite reported server downtimes, while financial platforms including Venmo and banking apps from Lloyds and Halifax in the UK faced login hurdles.

Even Amazon’s own ecosystem suffered: Prime Video buffering spiked, Ring doorbells lost remote access, and e-commerce checkouts faltered.

AI startup Perplexity attributed its disruptions directly to the AWS issue, with CEO Aravind Srinivas noting on X that funds remained safe but access was blocked.

Government agencies, airlines like Delta, and media outlets including Disney+ and The New York Times also logged interruptions, underscoring AWS’s roughly 33% share of the cloud infrastructure market.

Critics pointed to the 75-minute diagnostic delay and initial “all clear” status page messages as transparency shortfalls, echoing past AWS critiques on outage notifications.

No cyberattack was suspected; it stemmed from an internal update error in a foundational service.

AWS Response

AWS’s response involved parallel mitigations: flushing DNS caches, throttling EC2 launches to stabilize subsystems, and scaling up polling rates for SQS queues tied to Lambda.

By 2:24 AM PDT, the core DynamoDB DNS fix was deployed, yielding early recovery signs, though network issues lingered into the morning.

Temporary throttles on operations like asynchronous Lambda invocations helped prioritize critical paths, with full EC2 launch restoration by 2:48 PM PDT.

Global features dependent on US-EAST-1, such as IAM updates and DynamoDB Global Tables, also rebounded, allowing support case creations to resume.

AWS promised a detailed post-incident summary, emphasizing ongoing backlog processing for analytics in Connect and Redshift.

Experts like those at ThousandEyes noted no external network anomalies, confirming the issue’s internal nature and rapid recovery post-mitigation.

As services return to normal, affected users should try their operations again and check the AWS Health Dashboard for updates.
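For teams automating those retries, a common pattern is exponential backoff with jitter, which spaces attempts out so that recovering services are not hit by every client at once. The sketch below is a generic illustration of that pattern, not guidance published by AWS. The function name `retry_with_backoff`, its parameters, and the default values are all assumptions chosen for the example.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits a random time between 0 and an exponentially growing cap
    ("full jitter") between attempts. `sleep` is injectable for testing.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the last error to the caller.
                raise
            # Cap doubles each attempt: base, 2*base, 4*base, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

In practice `operation` would wrap a single request, and production code would typically catch only errors known to be transient rather than every exception.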

About Cybernoz

Security researcher and threat analyst with expertise in malware analysis and incident response.