CrowdStrike released a Preliminary Post Incident Review (PIR) on the faulty Falcon update explaining that a bug allowed bad data to pass its Content Validator and cause millions of Windows systems to crash on July 19, 2024.
The cybersecurity company explained that the issue was caused by a problematic content configuration update meant to gather telemetry on new threat techniques.
After passing the Content Validator, the update didn’t go through additional verifications due to trust in previous successful deployments of the underlying Inter-Process Communication (IPC) Template Type. Therefore, it wasn’t caught before it reached online hosts running Falcon version 7.11 and later.
The company realized the error and reverted the update within an hour.
However, by then, it was too late. Approximately 8.5 million Windows systems, if not more, suffered an out-of-bounds memory read and crashed when the Content Interpreter processed the new configuration update.
Inadequate testing
CrowdStrike uses configuration data called IPC Template Types that allows the Falcon sensor to detect suspicious behavior on devices where the software is installed.
IPC Templates are delivered through regular content updates that CrowdStrike calls ‘Rapid Response Content.’ This content adjusts the sensor’s detection capabilities to find new threats by changing only its configuration data, without requiring a full sensor update.
In this case, CrowdStrike attempted to push a new configuration to detect malicious abuse of Named Pipes in common C2 frameworks.
While CrowdStrike has not specifically named the C2 frameworks it targeted, some researchers believe the update attempted to detect new named pipe features in Cobalt Strike. BleepingComputer contacted CrowdStrike on Monday about whether Cobalt Strike detections caused the issues but did not receive a response.
According to the company, the new IPC Template Type and the corresponding Template Instances tasked with implementing the new configuration were thoroughly tested using stress testing techniques.
These tests covered resource utilization, system performance impact, event volume, and adverse system interactions.
The Content Validator, a component that checks and validates Template Instances, checked and approved three individual instances, which were pushed on March 5, April 8, and April 24, 2024, without a problem.
On July 19, two additional IPC Template Instances were deployed, with one containing the faulty configuration, which the Content Validator missed due to a bug.
CrowdStrike says that, because of the baseline trust built up by the previous tests and successful deployments, no additional testing, such as dynamic checks, was performed. As a result, the bad update reached clients and caused the massive global IT outage.
New measures
CrowdStrike is implementing several additional measures to prevent similar incidents in the future.
Specifically, the firm listed the following additional steps when testing Rapid Response Content:
- Local developer testing
- Content update and rollback testing
- Stress testing, fuzzing, and fault injection
- Stability testing
- Content interface testing
Moreover, additional validation checks will be added to the Content Validator, and error handling in the Content Interpreter will be improved to avoid such mistakes leading to inoperable Windows machines.
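The two safeguards described above, stricter validation before release and defensive error handling at interpretation time, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the names `TemplateType`, `TemplateInstance`, `validate_instance`, and `interpret_field`, as well as the field counts, are illustrative assumptions and do not reflect CrowdStrike's actual code or data formats.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TemplateType:
    """Hypothetical schema: how many input fields an instance must supply."""
    name: str
    field_count: int


@dataclass(frozen=True)
class TemplateInstance:
    """Hypothetical content entry that is matched against the schema above."""
    type_name: str
    fields: List[str]


def validate_instance(ttype: TemplateType, inst: TemplateInstance) -> List[str]:
    """Validator side: return a list of problems; an empty list means accepted."""
    problems = []
    if inst.type_name != ttype.name:
        problems.append(f"type mismatch: {inst.type_name!r} != {ttype.name!r}")
    if len(inst.fields) != ttype.field_count:
        problems.append(
            f"expected {ttype.field_count} fields, got {len(inst.fields)}"
        )
    return problems


def interpret_field(inst: TemplateInstance, index: int) -> Optional[str]:
    """Interpreter side: bounds-check every access and fail safe.

    Returning None lets the caller skip the rule instead of reading
    past the end of the supplied data.
    """
    if 0 <= index < len(inst.fields):
        return inst.fields[index]
    return None


if __name__ == "__main__":
    ipc = TemplateType(name="ipc", field_count=3)
    good = TemplateInstance("ipc", ["pipe_name", "process", "action"])
    bad = TemplateInstance("ipc", ["pipe_name", "process"])

    print(validate_instance(ipc, good))  # no problems reported
    print(validate_instance(ipc, bad))   # field-count mismatch reported
    print(interpret_field(bad, 2))       # out-of-range access is refused
```

The point of keeping both checks is defense in depth: if a validator bug lets a malformed instance through, the interpreter still refuses the out-of-range access rather than reading memory it does not own.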
For Rapid Response Content deployment, the following changes are planned:
- Implement a staggered deployment strategy, starting with a small canary deployment before gradually expanding.
- Improve monitoring of sensor and system performance during deployments, using feedback to guide a phased rollout.
- Provide customers with more control over the delivery of Rapid Response Content updates, allowing them to choose when and where updates are deployed.
- Offer content update details via release notes, which customers can subscribe to for timely information.
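As a rough illustration of the staggered, canary-first deployment idea in the list above, the following Python sketch rolls an update out in growing stages and rolls everything back if a stage looks unhealthy. It is a simplified sketch under stated assumptions: the function `staged_rollout`, its parameters, and the stage fractions are hypothetical and are not taken from CrowdStrike's tooling.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.0,
) -> bool:
    """Deploy in growing stages; roll back all updated hosts on a bad stage.

    stage_fractions are cumulative, e.g. (0.01, 0.1, 1.0) means a 1% canary,
    then 10% of hosts, then every host.
    """
    updated: List[str] = []
    done = 0
    for fraction in stage_fractions:
        # Always include at least one host so the canary stage is never empty.
        target = max(1, int(len(hosts) * fraction)) if hosts else 0
        batch = hosts[done:target]
        for host in batch:
            deploy(host)
            updated.append(host)
        done = max(done, target)

        # Gate the next stage on the health of the hosts just updated.
        failures = sum(1 for host in batch if not is_healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            for host in updated:
                rollback(host)
            return False
    return True
```

With a policy like this, a configuration that crashes the canary hosts is stopped after the first stage, so only a small fraction of the fleet is ever exposed.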
CrowdStrike has promised to publish a more detailed root cause analysis once its internal investigation is completed.