Inside The Cloudflare Outage Of November 18, 2025

A major Cloudflare outage struck on 18 November 2025, beginning at 11:20 UTC and spreading across its global network within minutes. Although the issue initially looked like a large-scale Cloudflare cyberattack, it was later confirmed to be an internal configuration error that disrupted the company’s core traffic-routing systems.

According to Cloudflare, the disruption began when one of the company’s database systems generated incorrect data and published it across the network. The problem stemmed from altered permissions in a ClickHouse database cluster, which inadvertently caused the system to output duplicate rows into a “feature file” used by Cloudflare’s Bot Management module. The feature file, normally stable in size, doubled unexpectedly. Once this oversized file propagated across Cloudflare’s machines, the software responsible for distributing global traffic encountered a hard limit and failed.

This internal malfunction translated into widespread HTTP 5xx errors for users trying to reach websites that rely on Cloudflare’s network. A screenshot shared by the company showed the generic error page millions of users saw during the outage.

Because the symptoms resembled a hyper-scale DDoS attack, and with recent “Aisuru” attack campaigns fresh in mind, Cloudflare initially feared a cyberattack on its own network. The company later clarified that “the issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind.” Once engineers traced the failure to the faulty feature file, they halted its propagation and reinserted an earlier, stable version.

Core traffic began recovering by 14:30 UTC, and Cloudflare reported full restoration of all systems by 17:06 UTC. “Given Cloudflare’s importance in the Internet ecosystem, any outage of any of our systems is unacceptable,” the company wrote, noting that the incident was “deeply painful to every member of our team.”

Why the System Failed During the Cloudflare Outage 

The root cause of the Cloudflare outage was a permissions change applied at 11:05 UTC. Cloudflare engineers were in the process of improving how distributed queries run in ClickHouse. Historically, internal processes assumed that metadata queries returned results only from the “default” database; the permissions change allowed those queries to surface metadata from the underlying “r0” database as well.
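
How such a query can double its output is easy to model in miniature. The sketch below is a simplified Rust illustration, not Cloudflare’s actual ClickHouse query; the table and column names are invented. It shows how a metadata lookup that filters only on table name returns every column twice once a second database’s metadata becomes visible.

```rust
// Minimal sketch (not Cloudflare's actual query): a metadata lookup that
// filters only on table name returns duplicates once a second database's
// metadata becomes visible. All names here are invented for illustration.

#[derive(Debug, Clone)]
struct ColumnMeta {
    database: &'static str,
    table: &'static str,
    column: &'static str,
}

// Metadata visible to the query, before and after the permissions change.
fn visible_columns(r0_visible: bool) -> Vec<ColumnMeta> {
    let mut rows = vec![
        ColumnMeta { database: "default", table: "http_features", column: "feature_a" },
        ColumnMeta { database: "default", table: "http_features", column: "feature_b" },
    ];
    if r0_visible {
        // After the change, metadata for the underlying "r0" copies of the
        // same tables is also returned.
        rows.push(ColumnMeta { database: "r0", table: "http_features", column: "feature_a" });
        rows.push(ColumnMeta { database: "r0", table: "http_features", column: "feature_b" });
    }
    rows
}

// The bug: filtering on table name only, never on database name, so each
// column appears once per visible database.
fn build_feature_list(r0_visible: bool) -> Vec<&'static str> {
    visible_columns(r0_visible)
        .into_iter()
        .filter(|m| m.table == "http_features")
        .map(|m| m.column)
        .collect()
}

fn main() {
    println!("before change: {} features", build_feature_list(false).len()); // 2
    println!("after change:  {} features", build_feature_list(true).len());  // 4, doubled
}
```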

A machine learning–related query, used to build the Bot Management feature configuration file, combined metadata from both locations without filtering database names. The oversight caused the file to double in size as duplicate features were added. Bot Management modules preallocate memory based on a strict feature limit of 200 entries; the malformed file exceeded this threshold, triggering a Rust panic within the proxy system. 
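
The hard limit is the other half of the failure. The following sketch is an assumed model, not Cloudflare’s proxy code: a loader sized for a fixed feature count rejects an oversized configuration, and a caller that unwraps the error instead of handling it turns that rejection into a panic.

```rust
// Minimal sketch (an assumed model, not Cloudflare's proxy code): a loader
// sized for a hard feature limit, plus a caller that unwraps the error and
// therefore panics when a duplicated file exceeds that limit.

const MAX_FEATURES: usize = 200; // the preallocation is sized for this many entries

#[derive(Debug)]
struct FeatureLimitExceeded {
    got: usize,
    limit: usize,
}

fn load_features(names: &[String]) -> Result<Vec<String>, FeatureLimitExceeded> {
    if names.len() > MAX_FEATURES {
        // Reject rather than grow: downstream buffers are preallocated
        // for at most MAX_FEATURES entries.
        return Err(FeatureLimitExceeded { got: names.len(), limit: MAX_FEATURES });
    }
    let mut features = Vec::with_capacity(MAX_FEATURES);
    features.extend_from_slice(names);
    Ok(features)
}

fn main() {
    // A healthy file: well under the limit.
    let normal: Vec<String> = (0..120).map(|i| format!("feature_{i}")).collect();
    assert!(load_features(&normal).is_ok());

    // The duplicated file: every feature appears twice, pushing the count to 240.
    let duplicated: Vec<String> = normal.iter().chain(normal.iter()).cloned().collect();

    // Unwrapping here aborts request handling with a panic -- the failure
    // mode the write-up attributes to the proxy.
    let _features = load_features(&duplicated).unwrap();
}
```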

Because Cloudflare’s core proxy (called FL, or “Frontline”) touches nearly every request on the network, the failure cascaded quickly. The newer version of the proxy system, FL2, also encountered 5xx errors. Legacy FL systems did not crash, but they produced invalid bot scores, defaulting everything to zero and potentially leading to false positives for customers who blocked bot traffic. 
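
The zero-score fallback explains the false positives. In the minimal sketch below (hypothetical scores and threshold, not Cloudflare’s rule engine), a customer rule that blocks low bot scores ends up blocking a legitimate visitor once the broken module defaults every score to zero.

```rust
// Minimal sketch (hypothetical scores and threshold, not Cloudflare's rule
// engine): when the scoring module fails and defaults to 0, any customer rule
// that blocks low bot scores blocks legitimate visitors too.

fn bot_score(module_healthy: bool, real_score: u8) -> u8 {
    if module_healthy {
        real_score
    } else {
        0 // the failure mode described for legacy FL: everything scores as a bot
    }
}

// A typical customer rule: block anything at or below the threshold.
fn should_block(score: u8, block_threshold: u8) -> bool {
    score <= block_threshold
}

fn main() {
    let legitimate_visitor_score = 87;

    // Healthy module: the visitor passes.
    assert!(!should_block(bot_score(true, legitimate_visitor_score), 10));

    // Broken module: the same visitor is scored 0 and blocked -- a false positive.
    assert!(should_block(bot_score(false, legitimate_visitor_score), 10));
}
```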

Systems Impacted 

The Cloudflare outage disrupted multiple services: 

  • Core CDN and security services returned widespread HTTP 5xx errors. 
  • Turnstile, Cloudflare’s verification system, failed to load, preventing many users from logging into the Cloudflare dashboard. 
  • Workers KV experienced a sharp increase in error rates until engineers applied a bypass patch at 13:04 UTC, stabilizing dependent services.
  • Cloudflare Access experienced authentication failures from the start of the incident. Existing sessions remained valid, but new attempts failed and returned error pages. 
  • Email Security continued processing email but temporarily lost access to an IP reputation source, slightly reducing spam-detection accuracy. 

Cloudflare also noted latency spikes across its CDN during the incident as debugging and observability tools consumed excess CPU while attempting to analyze the errors. 

Complicating the investigation further, Cloudflare’s external status page briefly went offline despite being hosted entirely outside Cloudflare’s network. The coincidence deepened internal suspicion that an attacker might be targeting multiple systems simultaneously and reinforced early fears of a potential Cloudflare cyberattack, a theory that was later dismissed.

Post-Incident Actions and Next Steps 

After restoring service, Cloudflare implemented a series of fixes: strengthening configuration protection, improving kill-switch controls, refining proxy error handling, and preventing diagnostic tools from overwhelming system resources. The company described the event as its most serious outage since 2019, noting that while it briefly raised concerns about a potential cyberattack on Cloudflare, the root cause was purely internal.

Events like this highlight the value of proactive threat intelligence. Cyble, ranked #1 globally in Cyber Threat Intelligence Technologies on Gartner Peer Insights, provides AI-native, autonomous threat detection and attack-surface visibility. To assess your organization’s exposure and strengthen resilience, book a personalized demo or start a free External Threat Assessment today.


