
Cloudflare published a comprehensive report detailing the causes of a major network failure that disrupted global internet traffic for several hours, affecting millions of users and various services.
The outage, which began at 11:20 UTC, stemmed from an internal configuration error rather than any cyber threat, underscoring the vulnerabilities in even the most robust cloud infrastructures.
This incident echoes recent disruptions at competitors like Microsoft Azure and Amazon Web Services, raising alarms about the fragility of global digital reliance.
Cloudflare’s troubles began with a routine update to permissions in its ClickHouse database cluster, intended to enhance security for distributed queries.
At 11:05 UTC, the change made metadata for underlying tables in the ‘r0’ database visible to additional users, and a Bot Management query did not account for the newly visible duplicates, pulling each column’s metadata twice and bloating a critical feature file to roughly double its expected size.
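To make the duplication concrete, here is a minimal Rust sketch under invented names (ColumnMeta, visible_columns, and the table and column strings are all hypothetical, and ClickHouse itself is not involved): a metadata query that omits a database filter returns each feature column once per visible database, so when a second copy of the tables becomes visible, the row count doubles.

```rust
// Illustrative sketch only: the types, table names, and column names below are
// invented, not ClickHouse's schema or Cloudflare's query.

#[derive(Debug, Clone)]
struct ColumnMeta {
    database: String,
    table: String,
    column: String,
}

/// Stand-in for the rows a metadata query returns once both the "default"
/// and "r0" copies of the underlying tables are visible to the query's user.
fn visible_columns() -> Vec<ColumnMeta> {
    let mut rows = Vec::new();
    for db in ["default", "r0"] {
        for col in ["feature_a", "feature_b", "feature_c"] {
            rows.push(ColumnMeta {
                database: db.to_string(),
                table: "http_requests_features".to_string(),
                column: col.to_string(),
            });
        }
    }
    rows
}

fn main() {
    let all_rows = visible_columns();
    // Unfiltered: every feature column appears twice, so the generated
    // feature file doubles in size.
    println!("unfiltered feature rows: {}", all_rows.len()); // 6

    // Restricting the query to a single database restores the expected count.
    let filtered: Vec<&ColumnMeta> = all_rows
        .iter()
        .filter(|row| row.database == "default")
        .collect();
    println!("filtered feature rows:   {}", filtered.len()); // 3
}
```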
This file, refreshed every five minutes so Cloudflare’s machine-learning models can keep pace with evolving bot threats, now exceeded the software’s hardcoded limit of 200 features, triggering panics in the core proxy system known as FL.
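The panic itself can be sketched in a few lines of hypothetical Rust; MAX_FEATURES, load_features, and the file format below are assumptions for illustration, not Cloudflare’s actual implementation. The point is that a limit sized for “normal” files becomes a crash when the caller treats exceeding it as impossible and unwraps the error.

```rust
// Hypothetical sketch of the failure mode, not Cloudflare's code.

const MAX_FEATURES: usize = 200; // capacity preallocated for the ML feature set

#[derive(Debug)]
struct Feature {
    name: String,
    weight: f64,
}

#[derive(Debug)]
enum ConfigError {
    TooManyFeatures { got: usize, limit: usize },
}

/// Parse a freshly propagated feature file (one "name,weight" pair per line).
fn load_features(raw: &str) -> Result<Vec<Feature>, ConfigError> {
    let features: Vec<Feature> = raw
        .lines()
        .filter_map(|line| {
            let (name, weight) = line.split_once(',')?;
            Some(Feature {
                name: name.to_string(),
                weight: weight.trim().parse().ok()?,
            })
        })
        .collect();

    if features.len() > MAX_FEATURES {
        // An upstream duplication bug can double the row count and push a
        // normally valid file over this limit.
        return Err(ConfigError::TooManyFeatures {
            got: features.len(),
            limit: MAX_FEATURES,
        });
    }
    Ok(features)
}

fn main() {
    // Simulate the bloated file: twice the expected number of rows.
    let bloated: String = (0..400).map(|i| format!("feature_{i},0.5\n")).collect();

    // Treating "too many features" as unreachable turns a bad configuration
    // push into a panic that takes down the whole proxy worker.
    let _features = load_features(&bloated).unwrap(); // panics here
}
```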
Investigators initially mistook the failures for a massive DDoS attack, a suspicion reinforced when Cloudflare’s external status page coincidentally went down, and the fluctuating symptoms deepened the confusion as good and bad files alternated during the cluster’s gradual rollout.
The Bot Management module, essential for scoring automated traffic, halted request processing, cascading errors through the network. In the newer FL2 proxy, this caused outright 5xx HTTP errors; older FL versions defaulted bot scores to zero, potentially blocking legitimate traffic for customers using bot-blocking rules.
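The difference between the two proxy generations can be illustrated with a hedged sketch; the Request and BotError types and the handler functions are invented, not Cloudflare’s FL or FL2 code. The newer path surfaces the module failure as a 5xx response, while the legacy path silently defaults the bot score to 0, which a customer’s bot-blocking rule then treats as automated traffic.

```rust
// Hypothetical sketch of the two behaviors described above.

struct Request {
    path: String,
}

#[derive(Debug)]
enum BotError {
    ConfigUnavailable,
}

/// Stand-in for the Bot Management module failing to load its feature file.
fn score_request(_req: &Request) -> Result<u8, BotError> {
    Err(BotError::ConfigUnavailable)
}

/// Newer proxy path: the module failure surfaces as an HTTP 5xx.
fn handle_new_proxy(req: &Request) -> u16 {
    match score_request(req) {
        Ok(_score) => 200,
        Err(_) => 500,
    }
}

/// Legacy proxy path: the score silently defaults to 0 (i.e. "definitely a
/// bot"), so a customer rule that blocks low scores rejects real users.
fn handle_legacy_proxy(req: &Request, block_threshold: u8) -> u16 {
    let score = score_request(req).unwrap_or(0);
    if score < block_threshold {
        403
    } else {
        200
    }
}

fn main() {
    let req = Request { path: "/login".to_string() };
    println!("request to {}", req.path);
    println!("newer proxy:  HTTP {}", handle_new_proxy(&req)); // 500
    println!("legacy proxy: HTTP {}", handle_legacy_proxy(&req, 30)); // 403
}
```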
The blackout hit core services hard, delivering error pages to users accessing Cloudflare-protected sites and spiking latency due to resource-intensive debugging.
Turnstile CAPTCHA failed entirely, blocking logins; Workers KV saw elevated errors, indirectly crippling dashboard access and authentication via Cloudflare Access.
Email Security temporarily lost some spam-detection accuracy, though no customer data was compromised, and configuration updates lagged. By 17:06 UTC, full recovery was achieved after halting bad-file propagation, rolling back to a known-good version, and restarting the proxies.
Cloudflare’s CEO, Matthew Prince, expressed sincere apologies, describing the incident as “deeply painful” and unacceptable for a major internet service provider. The company identified this as its worst core traffic outage since 2019.
Massive Outages Across Cloud Giants
This incident fits a concerning pattern of configuration-related failures among major cloud providers.
Just weeks prior, on October 29, 2025, Azure suffered a global outage caused by a faulty tenant configuration change in its Front Door CDN, disrupting Microsoft 365, Teams, and Xbox for hours and affecting airlines such as Alaska Airlines.
Similarly, AWS endured a 15-hour blackout on October 20 in its US-East-1 region, where DNS issues in DynamoDB rippled to EC2, S3, and services like Snapchat and Roblox.
A smaller AWS-related e-commerce hiccup hit Amazon.com on November 5, stalling checkouts amid holiday preparations. Experts warn that these incidents reflect over-dependence on centralized providers, where a single misstep can “break the internet,” as has happened repeatedly in 2025.
To prevent future incidents, Cloudflare is hardening its file ingestion processes to guard against malformed, internally generated inputs. It is also adding global kill switches for individual features, preventing error reporting from consuming excessive system resources, and reviewing failure modes across its proxy modules.
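As one way to picture the “guard against malformed inputs” goal, here is a hedged Rust sketch with invented names (FeatureFile, validate, apply_update): a newly propagated file that fails validation is rejected, and the proxy keeps serving the last known-good version instead of panicking.

```rust
// Hedged sketch of "treat internally generated config like untrusted input";
// not Cloudflare's implementation.

const MAX_FEATURES: usize = 200;

#[derive(Debug, Clone)]
struct FeatureFile {
    version: u64,
    features: Vec<(String, f64)>,
}

#[derive(Debug)]
enum ValidationError {
    Empty,
    TooManyFeatures(usize),
}

fn validate(file: &FeatureFile) -> Result<(), ValidationError> {
    if file.features.is_empty() {
        return Err(ValidationError::Empty);
    }
    if file.features.len() > MAX_FEATURES {
        return Err(ValidationError::TooManyFeatures(file.features.len()));
    }
    Ok(())
}

/// Accept the candidate file only if it validates; otherwise keep the current
/// version and surface the error for operators instead of crashing.
fn apply_update(current: FeatureFile, candidate: FeatureFile) -> FeatureFile {
    match validate(&candidate) {
        Ok(()) => candidate,
        Err(err) => {
            eprintln!(
                "rejecting feature file v{}: {:?}; keeping v{}",
                candidate.version, err, current.version
            );
            current
        }
    }
}

fn main() {
    let good = FeatureFile {
        version: 41,
        features: (0..150).map(|i| (format!("f{i}"), 0.5)).collect(),
    };
    let bloated = FeatureFile {
        version: 42,
        features: (0..300).map(|i| (format!("f{i}"), 0.5)).collect(),
    };

    let active = apply_update(good, bloated);
    println!("serving feature file v{}", active.version); // stays on v41
}
```

The design goal in this sketch is simply that a bad configuration push degrades to “keep serving the previous file” rather than taking the proxy down.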
Although the outage was not caused by malicious intent, it is a clear reminder that as cloud ecosystems expand, so does the importance of operational precision.
