In a recent preliminary Post-Incident Review (PIR), cybersecurity firm CrowdStrike provided a detailed account of the events that led to a massive global IT outage on July 19, 2024.
The incident affected millions of Windows systems worldwide and was traced back to a problematic Rapid Response Content configuration update released as part of the Falcon platform’s regular operations.
On Friday, July 19, 2024, at 04:09 UTC, CrowdStrike issued a content configuration update for its Windows sensor. This update aimed to gather telemetry on potential novel threat techniques but inadvertently caused Windows systems running sensor version 7.11 and above to crash.
The crash resulted in the infamous Blue Screen of Death (BSOD) on affected systems. Only Windows hosts that were online and received the update between 04:09 UTC and 05:27 UTC were affected; Mac and Linux hosts were not impacted.
By 05:27 UTC, CrowdStrike had identified the defect and reverted the problematic update. Systems that came online after this time or did not connect during the affected window were not impacted.
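As a rough illustration of the exposure criteria described above, the Python sketch below checks whether a single host fell inside the affected population. The function names, the host fields, and the treatment of 05:27 UTC as an exclusive cutoff are hypothetical simplifications, not CrowdStrike tooling.

```python
from datetime import datetime, timezone

# Window during which the faulty update was being served, per the PIR (UTC).
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '7.11.18110' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))


def was_exposed(os_name: str, sensor_version: str,
                online_from: datetime, online_until: datetime) -> bool:
    """Hypothetical check: could this host have received the faulty update?"""
    if os_name != "windows":
        return False  # Mac and Linux hosts were not impacted
    if parse_version(sensor_version) < (7, 11):
        return False  # only sensor version 7.11 and above was affected
    # The host must have been online at some point inside the window;
    # a host that first came online at 05:27 or later is treated as safe.
    return online_from < WINDOW_END and online_until > WINDOW_START
```

A host that was online from 03:00 to 06:00 UTC with sensor 7.11 is flagged, while a Linux host, a Windows host on 7.10, or a Windows host that only came online at 05:27 is not.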
What Went Wrong and Why?
CrowdStrike’s Falcon platform utilizes two types of security content configuration updates: Sensor Content and Rapid Response Content.
Sensor Content includes on-sensor AI and machine learning models. It ships as part of each sensor release and undergoes extensive quality assurance, including automated and manual testing. It is not updated dynamically from the cloud; customers control its rollout through Sensor Update Policies.
Rapid Response Content, on the other hand, is designed to respond quickly to emerging threats and is dynamically updated. It involves behavioral pattern-matching operations and is delivered as Template Instances that configure the sensor’s runtime behavior.
The July 19 issue stemmed from a Rapid Response Content update containing an undetected error. A bug in the Content Validator allowed a problematic Template Instance to pass validation and be deployed. When the sensor processed this instance, it performed an out-of-bounds memory read, triggering an exception that could not be handled gracefully and crashed Windows.
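The PIR does not publish the Content Validator or the sensor's interpreter, so the following is only a toy model, assuming a Template Instance is a list of criteria that each inspect one runtime input by index. It illustrates why a validator must check every referenced index against the number of inputs actually supplied: the toy interpreter reads `inputs[field_index]`, which in a memory-unsafe language such as C would be an out-of-bounds read, while Python raises `IndexError` instead. All class and function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    field_index: int  # hypothetical: which runtime input this criterion inspects
    pattern: str      # "*" means "match anything" (wildcard)


@dataclass(frozen=True)
class TemplateInstance:
    name: str
    criteria: tuple[Criterion, ...]


def validate(instance: TemplateInstance, inputs_supplied: int) -> list[str]:
    """Hypothetical validator: return a list of problems; empty means it passes."""
    problems = []
    for c in instance.criteria:
        # Reject any criterion that points outside the inputs the sensor provides.
        if not 0 <= c.field_index < inputs_supplied:
            problems.append(
                f"{instance.name}: field {c.field_index} outside 0..{inputs_supplied - 1}"
            )
    return problems


def matches(instance: TemplateInstance, inputs: list[str]) -> bool:
    """Toy interpreter: every non-wildcard criterion must equal its input value.

    In C, inputs[c.field_index] with a bad index would be an out-of-bounds
    read; in Python it raises IndexError instead.
    """
    return all(c.pattern == "*" or inputs[c.field_index] == c.pattern
               for c in instance.criteria)
```

For example, an instance that references field 3 when only three inputs (indices 0 to 2) are supplied is rejected by `validate`; if it skipped validation and went straight to `matches`, it would raise `IndexError` as soon as it reached that criterion.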
Timeline of Events
- February 28, 2024: Sensor version 7.11, introducing a new IPC Template Type to detect attacks using Named Pipes, was released.
- March 5, 2024: The IPC Template Type passed a stress test and was validated for use.
- March 5 – April 24, 2024: Several IPC Template Instances were deployed without incident.
- July 19, 2024: Two additional IPC Template Instances were deployed. Because of a bug in the Content Validator, one of them passed validation despite containing problematic content data.
The incident had a widespread impact: Microsoft estimated that about 8.5 million Windows devices were affected worldwide, including systems in critical sectors such as banking, healthcare, and emergency services. CrowdStrike's shares dropped nearly 30%, and the company faced significant scrutiny from customers and regulators.
CrowdStrike has since implemented several measures to prevent similar incidents in the future. These include:
- Enhancing the Content Validator with additional checks.
- Improving testing mechanisms for Rapid Response Content, including local developer testing, rollback testing, stress testing, and fault injection techniques.
- Adopting a staggered (canary) deployment strategy, in which updates go first to a small subset of hosts and reach the full fleet only after they prove stable.
- Giving customers greater control over when and where Rapid Response Content updates are delivered, and providing detailed release notes for each update.
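CrowdStrike has not described how its canary rollout is implemented. The sketch below shows the general technique under assumed parameters: the stage fractions (1%, 10%, 50%, 100% of the fleet) and the maximum tolerated failure rate (0.1%) are hypothetical values, and the `deploy` callback stands in for whatever health signal a real system would collect.

```python
from typing import Callable


def staged_rollout(hosts: list[str],
                   deploy: Callable[[str], bool],
                   stages: tuple[float, ...] = (0.01, 0.10, 0.50, 1.0),
                   max_failure_rate: float = 0.001) -> tuple[bool, int]:
    """Hypothetical canary rollout to progressively larger fractions of a fleet.

    deploy(host) returns True if the host stays healthy after the update.
    Returns (completed, hosts_updated); halts after the first stage whose
    failure rate exceeds max_failure_rate.
    """
    done = 0
    for fraction in stages:
        # Number of hosts that should have the update once this stage finishes.
        target = min(len(hosts), max(1, round(len(hosts) * fraction)))
        batch = hosts[done:target]
        failures = sum(1 for h in batch if not deploy(h))
        done = max(done, target)
        if batch and failures / len(batch) > max_failure_rate:
            return False, done  # stop before the update reaches more hosts
    return True, done
```

With 1,000 hosts and an update that crashes every host it reaches, this rollout halts after the first stage with 10 hosts updated instead of 1,000. The trade-off is speed: each stage adds delay before the update covers the whole fleet, which matters for content meant to counter fast-moving threats.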
Statement from CrowdStrike CEO
George Kurtz, CrowdStrike's founder and CEO, apologized to all affected customers and partners and emphasized the company's commitment to transparency and continuous improvement. He stated that the Falcon platform's core systems were not compromised and that the company was fully mobilized to restore customer systems and prevent future disruptions.
The preliminary PIR explains the technical causes of the July 19, 2024, incident and outlines the steps CrowdStrike is taking to improve the reliability and security of its Rapid Response Content updates.
CrowdStrike said a full Root Cause Analysis would follow, with further findings and recommendations for keeping its services stable and secure.