Web traffic protection specialist Cloudflare has issued an apology to its customers after an unspecified outage affecting its Access zero-trust platform downed multiple public-facing services, including OpenAI’s ChatGPT, outage information aggregator DownDetector and social media platform X.
The issue at Cloudflare, which is well known for its frontline cyber defensive work in blocking distributed denial of service (DDoS) attacks, first came to wider attention at around 11.20am in the UK – approximately 6.20am on the east coast of the US – and was initially described by Cloudflare as an “internal service degradation” causing intermittent impacts to some services.
At approximately 1.00pm UTC, Cloudflare was forced to take further steps affecting UK users specifically, pulling its WARP proxy tunnelling client offline, meaning users in London trying to access the internet via WARP saw their connections fail.
In an update made at 1.13pm UTC (8.13am EST), Cloudflare said: “We have made changes that have allowed Cloudflare Access and WARP to recover. Error levels for Access and WARP users have returned to pre-incident rates. We have re-enabled WARP access in London.”
A Cloudflare spokesperson told Computer Weekly: “We saw a spike in unusual traffic to one of Cloudflare’s services beginning at 11.20am UTC. That caused some traffic passing through Cloudflare’s network to experience errors. We do not yet know the cause of the spike in unusual traffic. We are all hands on deck to make sure all traffic is served without errors. After that, we will turn our attention to investigating the cause of the unusual spike in traffic.”
In a second statement issued at 3.30pm, Cloudflare said: “The root cause of the outage was a configuration file that is automatically generated to manage threat traffic. The file grew beyond an expected size of entries and triggered a crash in the software system that handles traffic for a number of Cloudflare’s services.
“There is no evidence that this was the result of an attack or caused by malicious activity. We expect that some Cloudflare services will be briefly degraded as traffic naturally spikes post incident but we expect all services to return to normal in the next few hours. A detailed explanation will be posted soon on blog.cloudflare.com.
“Given the importance of Cloudflare’s services, any outage is unacceptable. We apologise to our customers and the Internet in general for letting you down today. We will learn from today’s incident and improve,” they said.
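The failure mode described in the statement, an automatically generated file outgrowing the size its consumer expects, can be illustrated with a minimal sketch. This is not Cloudflare's code, and nothing here reflects how its systems are actually built. The one-entry-per-line file format, the entry limit and every name below (ConfigHolder, parse_entries, ConfigTooLargeError) are hypothetical, chosen only to show one common defence: reject an oversized file and keep serving the last configuration that passed validation, rather than crash.

```python
from dataclasses import dataclass, field


class ConfigTooLargeError(ValueError):
    """Raised when a generated file holds more entries than the loader allows."""


def parse_entries(text: str, max_entries: int) -> list[str]:
    """Parse one entry per non-blank line (an assumed format), enforcing a cap."""
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    if len(entries) > max_entries:
        raise ConfigTooLargeError(
            f"{len(entries)} entries exceeds limit of {max_entries}"
        )
    return entries


@dataclass
class ConfigHolder:
    """Holds the last configuration that passed validation (hypothetical)."""

    max_entries: int
    active: list[str] = field(default_factory=list)
    rejected: int = 0

    def reload(self, text: str) -> bool:
        """Apply a new file if it is valid; otherwise keep the previous one."""
        try:
            candidate = parse_entries(text, self.max_entries)
        except ConfigTooLargeError:
            # Count the rejection so it can be alerted on, but do not crash.
            self.rejected += 1
            return False
        self.active = candidate
        return True


holder = ConfigHolder(max_entries=3)
holder.reload("rule-a\nrule-b\n")                  # accepted
holder.reload("rule-a\nrule-b\nrule-c\nrule-d\n")  # oversized, so rejected
```

After the second call, holder.active still holds the two entries from the first file and holder.rejected is 1. The trade-off in this kind of design is that the service keeps running on stale rules until the generator is fixed, which is why the rejection is counted rather than silently ignored.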
Repeating patterns
Cloudflare’s period of downtime, albeit brief, comes in the wake of other high-profile outages at tech giants such as Amazon Web Services (AWS) and Microsoft Azure, which caused chaos at multiple downstream organisations.
Graeme Stewart, head of public sector at Check Point, said the upside of such large platforms was clear: their scale keeps costs low, gives smaller organisations access to enterprise-grade performance and, in Cloudflare’s case, improves accessibility to security tools. However, he added, the downside is just as clear.
“When a platform of this size slips, the impact spreads far and fast, and everyone feels it at once,” he said.
“During today’s outage, news sites, payments, public information pages and community services all froze. That was not because each organisation failed on its own. It was because a single layer they all rely on stopped responding. People saw a simple error page, but the disruption reached into the systems that hold up essential services.”
Stewart added: “From a cyber security view, this is the part that matters. Any platform that carries this much of the world’s traffic becomes a target. Even an accidental outage creates noise and uncertainty that attackers know how to use. If an incident of this scale were deliberately triggered, the disruption would spread across countries that use these platforms to communicate with the public and deliver essential services.”
Once again, said Stewart, users were paying the price for a lack of choice in the industry and the concentration of huge amounts of global traffic into a few large providers.
“Large platforms bring benefits, but events like today show the cost of that decision. Until there is real diversity and redundancy in the system, each outage will hit people harder than it should,” he said.
This article was edited at 3.35pm on Tuesday 18 November to incorporate additional information about the outage from Cloudflare.
