Cloudflare Details 1.1.1.1 Service Outage Incident


On June 27, 2024, Cloudflare experienced a disruption of its 1.1.1.1 DNS resolver service.

The incident, which lasted several hours, was caused by a combination of BGP (Border Gateway Protocol) hijacking and a route leak.

Users around the world were affected: some could not reach the service at all, while others experienced high latency.

Incident Timeline

The incident began at 18:51 UTC when AS267613 (Eletronet) started announcing the 1.1.1.1/32 prefix to its peers and upstream providers.

This unauthorized announcement caused traffic meant for Cloudflare’s 1.1.1.1 DNS resolver to be misrouted.

Shortly after, at 18:52 UTC, AS262504 (Nova) leaked the 1.1.1.0/24 prefix upstream to AS1031 (Peer-1 Global Internet Exchange).

This leak was further propagated by AS1031, significantly widening the impact of the incident.

Cloudflare’s internal monitoring systems detected the issue at 20:03 UTC, prompting immediate action.

By 20:08 UTC, Cloudflare had disabled a partner peering location with AS267613 and engaged with the network to address the problem.

The route leak was resolved at 02:28 UTC on June 28, 2024, when AS262504 stopped its unauthorized announcements.
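To put the timeline in perspective, the short sketch below computes the elapsed time between the timestamps given above. The event names are illustrative labels. Note that the article does not state when the /32 hijack itself stopped, so the last figure measures only the span from the first hijacked announcement to the end of the route leak, not necessarily the full incident.

```python
from datetime import datetime, timezone

# Timestamps taken from the incident timeline (all UTC).
# The dictionary keys are illustrative labels, not official event names.
EVENTS = {
    "hijack_start": datetime(2024, 6, 27, 18, 51, tzinfo=timezone.utc),
    "leak_start": datetime(2024, 6, 27, 18, 52, tzinfo=timezone.utc),
    "detected": datetime(2024, 6, 27, 20, 3, tzinfo=timezone.utc),
    "peering_disabled": datetime(2024, 6, 27, 20, 8, tzinfo=timezone.utc),
    "leak_resolved": datetime(2024, 6, 28, 2, 28, tzinfo=timezone.utc),
}


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed between two named events."""
    delta = EVENTS[end] - EVENTS[start]
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    print("time to detection:", minutes_between("hijack_start", "detected"), "min")
    print("detection to first mitigation:",
          minutes_between("detected", "peering_disabled"), "min")
    total = minutes_between("hijack_start", "leak_resolved")
    print(f"hijack start to leak resolution: {total // 60} h {total % 60} min")
```

Run as a script, this reports 72 minutes to detection, 5 minutes from detection to the first peering change, and 7 h 37 min from the first hijacked announcement to the end of the leak.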

Until then, users continued to be affected: some experienced high latency, and others could not reach the 1.1.1.1 service at all.

Technical Analysis

BGP Hijacking and Route Leaks

The incident's root cause was a combination of BGP hijacking and a route leak.

BGP hijacking occurs when a network announces IP prefixes it does not own, causing traffic to be misrouted.

In this case, AS267613 announced the 1.1.1.1/32 prefix, which multiple networks, including at least one Tier 1 provider, accepted.

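Because a /32 is more specific than Cloudflare's legitimate 1.1.1.0/24 announcement, routers that accepted it preferred it under longest-prefix matching, and traffic destined for 1.1.1.1 was blackholed.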
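Why a single /32 can override a legitimate /24 comes down to longest-prefix matching: when several routes cover a destination, a router forwards along the most specific one. The sketch below uses Python's standard `ipaddress` module and a toy two-entry routing table. The table is hypothetical; AS13335 is Cloudflare's autonomous system number, but real routers hold many more routes and apply BGP policy before forwarding.

```python
import ipaddress
from typing import Optional

# Hypothetical two-entry routing table: prefix -> where the route leads.
# The /24 stands in for Cloudflare's legitimate announcement and the /32
# for the hijacked more-specific route.
ROUTES = {
    ipaddress.ip_network("1.1.1.0/24"): "AS13335 (legitimate origin)",
    ipaddress.ip_network("1.1.1.1/32"): "AS267613 (hijacked /32)",
}


def lookup(dst: str) -> Optional[str]:
    """Return the route chosen for dst using longest-prefix match."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None  # no covering route in this toy table
    best = max(matches, key=lambda net: net.prefixlen)  # most specific wins
    return ROUTES[best]


if __name__ == "__main__":
    print(lookup("1.1.1.1"))  # the /32 is more specific, so it wins
    print(lookup("1.1.1.2"))  # only the /24 covers this address
```

The hijack therefore only needed to cover the one address users actually query; neighboring addresses in 1.1.1.0/24 still followed the legitimate route.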

Route leaks, on the other hand, occur when a network incorrectly announces prefixes it has learned from one provider to another provider.

AS262504 leaked the 1.1.1.0/24 prefix to AS1031, which propagated it further, exacerbating the impact.
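The definition of a route leak follows from the conventional (Gao-Rexford) export rules that most networks apply: routes learned from a customer may be announced to anyone, while routes learned from a peer or a provider should only be announced to customers. The sketch below encodes that rule; RFC 7908 classifies the various ways it can be violated. The relationships in the example are hypothetical, since the article does not say from which neighbor AS262504 learned 1.1.1.0/24.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def export_allowed(learned_from: Rel, send_to: Rel) -> bool:
    """Conventional export rule.

    Routes learned from a customer may be announced to any neighbor.
    Routes learned from a peer or provider may only go to customers;
    announcing them anywhere else is a route leak.
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return send_to is Rel.CUSTOMER


if __name__ == "__main__":
    # Hypothetical scenario: a network learns a prefix from a peer or a
    # provider and re-announces it upstream to another provider.
    for learned in (Rel.PEER, Rel.PROVIDER):
        ok = export_allowed(learned, Rel.PROVIDER)
        print(f"learned from {learned.value}, sent to provider:",
              "allowed" if ok else "route leak")
```

Once the leaking network's upstream (here AS1031) accepts and re-propagates such a route, the leaked path can compete with legitimate paths across a much larger part of the Internet.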

The incident affected users in various countries, including Germany and the United States.

Some users could not reach the 1.1.1.1 DNS resolver, while others experienced high latency.

Overall, the share of affected users was relatively low: less than 1% in countries such as the UK and Germany.

However, the disruption was significant for those who relied on the 1.1.1.1 service for DNS resolution.

To mitigate the impact, Cloudflare disabled peering in multiple locations with AS267613 and engaged with all networks involved in the incident.

This included discussions with at least one Tier 1 transit provider that had accepted the unauthorized blackhole route.

Cloudflare is committed to improving its detection and response mechanisms for similar incidents in the future.

This includes advocating for the adoption of RPKI (Resource Public Key Infrastructure) for route origin and AS path validation.

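RPKI helps limit the spread of hijacked BGP prefixes by letting IP prefix holders publish cryptographically signed records (Route Origin Authorizations, or ROAs) stating which AS may originate their prefixes, so that other networks can reject announcements that do not match.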
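Route origin validation, as specified in RFC 6811, classifies each announcement as valid, invalid, or not-found by comparing its prefix and origin AS against the ROAs that cover it. The sketch below implements that classification over an in-memory list. The ROA shown (1.1.1.0/24, origin AS13335, maximum length /24) is an assumption for illustration; check actual RPKI data before relying on it. Note that the hijacked announcement is invalid whatever the maximum length, because its origin AS does not match.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ROA:
    """Route Origin Authorization: prefix, maximum length, authorized origin AS."""
    prefix: ipaddress.IPv4Network
    max_length: int
    origin_as: int


# Assumed ROA for illustration only; not taken from live RPKI data.
ROAS = [ROA(ipaddress.ip_network("1.1.1.0/24"), 24, 13335)]


def validate(prefix: str, origin_as: int, roas: List[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found' (RFC 6811)."""
    route = ipaddress.ip_network(prefix)
    covering = [r for r in roas if route.subnet_of(r.prefix)]
    if not covering:
        return "not-found"  # no ROA covers this prefix
    for roa in covering:
        if roa.origin_as == origin_as and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"  # covered by a ROA, but origin or length does not match


if __name__ == "__main__":
    print(validate("1.1.1.0/24", 13335, ROAS))    # valid
    print(validate("1.1.1.1/32", 267613, ROAS))   # invalid: wrong origin, too specific
    print(validate("192.0.2.0/24", 64500, ROAS))  # not-found: no covering ROA
```

Origin validation alone does not catch the route leak: the leaked 1.1.1.0/24 still carried Cloudflare's own origin AS, so it would be classified as valid. Detecting leaks requires checking the AS path as well.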

Cloudflare continues to work within the Internet community to encourage the adoption of best practices for BGP security.

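This includes promoting Autonomous System Provider Authorization (ASPA) objects for BGP, which help detect route leaks by letting each network publish a signed list of its authorized upstream providers, so that receiving networks can check whether an AS path follows legitimate customer-to-provider relationships.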
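To illustrate how ASPA records expose a leak, the sketch below checks a route received from a customer or peer: walking from the origin toward the receiver, every hop should go from a customer up to one of its attested providers. It is loosely modeled on the upstream case of the IETF SIDROPS ASPA verification procedure and is deliberately simplified; it omits downstream paths, AS_SET handling, and neighbor checks. The ASPA records and AS numbers (from the RFC 5398 documentation range) are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical ASPA records: customer AS -> set of its authorized providers.
ASPA: Dict[int, Set[int]] = {
    64496: {64497},  # origin network; its only upstream is 64497
    64497: {64498},  # 64497's only upstream is 64498
}


def hop(customer: int, provider: int) -> str:
    """Check one hop: is `provider` an attested provider of `customer`?"""
    if customer not in ASPA:
        return "no-attestation"
    return "provider" if provider in ASPA[customer] else "not-provider"


def verify_upstream(as_path: List[int]) -> str:
    """Simplified ASPA check for a route received from a customer or peer.

    `as_path` is in BGP order: neighbor first, origin last.
    """
    # Walk from the origin outward, collapsing AS prepending (repeated ASNs).
    from_origin: List[int] = []
    for asn in reversed(as_path):
        if not from_origin or from_origin[-1] != asn:
            from_origin.append(asn)
    results = [hop(a, b) for a, b in zip(from_origin, from_origin[1:])]
    if "not-provider" in results:
        return "invalid"  # some hop goes "up" to a non-provider: likely a leak
    if "no-attestation" in results:
        return "unknown"  # an AS on the path has published no ASPA
    return "valid"


if __name__ == "__main__":
    print(verify_upstream([64497, 64496]))         # valid
    print(verify_upstream([64499, 64497, 64496]))  # invalid: 64499 is not 64497's provider
    print(verify_upstream([64497, 64510]))         # unknown: 64510 has no ASPA
```

In the invalid example, 64497 passed the route down to a customer (64499), which then re-announced it upward; the receiving provider can detect this because 64497's ASPA does not list 64499 as a provider.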

The Cloudflare 1.1.1.1 service outage on June 27, 2024, highlighted the vulnerabilities in the current BGP system.

While the actions of external networks were beyond Cloudflare’s control, the company is taking proactive steps to enhance its security measures and reduce the risk of similar incidents in the future.

Users are encouraged to check if their ISPs are enforcing RPKI origin validation.

Cloudflare remains dedicated to providing reliable and secure DNS resolution services and will continue to work towards improving the resilience of its network infrastructure.
