Microsoft has confirmed CrowdStrike’s analysis of the root cause of the July 19 global Windows outage – and outlined plans to work with anti-malware vendors to help prevent similar events in the future.
In a July 27 blog post by David Weston, Microsoft’s VP for Enterprise and OS Security, the software giant outlined four initiatives for helping anti-malware vendors roll out updates more safely:
- Offering rollout guidance, best practices, and technologies “to make it safer to perform updates to security products.”
- Reducing the need for kernel drivers to access security data.
- Providing improved isolation and anti-tampering capabilities with technologies like Virtualization-based security (VBS) enclaves.
- Enabling zero trust approaches like high integrity attestation, which can determine the security state of a machine based on the health of Windows native security features.
Microsoft’s response to the CrowdStrike outage also emphasizes support for Rust, a memory-safe programming language, as a way “for security tools to detect and respond to emerging threats safely and securely.”
Weston’s blog post is the latest post-mortem on the faulty CrowdStrike update that brought down 8.5 million Windows machines around the world in what was possibly the largest cyber incident of all time.
Microsoft Confirms CrowdStrike Root Cause
Weston’s blog post confirms CrowdStrike’s version of the causes of the global “blue screen of death” outage before getting into Microsoft’s plans for making updates safer.
“Our observations confirm CrowdStrike’s analysis that this was a read-out-of-bounds memory safety error in the CrowdStrike developed CSagent.sys driver,” Weston wrote. Such errors “can lead to widespread availability issues when not combined with safe deployment practices.”
He said CSagent.sys is a file system filter driver, a type of driver that anti-malware agents use to receive notifications about file operations such as the creation or modification of a file, which is useful for scanning downloads and other new files.
File system filters can also be used as a signal for monitoring system behavior. “CrowdStrike noted in their blog that part of their content update was changing the sensor’s logic relating to data around named pipe creation,” he wrote. “The File System filter driver API allows the driver to receive a call when named pipe activity (e.g., named pipe creation) occurs on the system that could enable the detection of malicious behavior.”
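To make the class of bug concrete, here is a minimal sketch of how a bounds-checked read behaves in Rust, the memory-safe language Microsoft's post points to. It is not CrowdStrike's code and does not model a kernel driver; the function `read_field` and the `fields` data are hypothetical names used only for illustration. The point is that a request for an element past the end of a buffer is reported to the caller instead of reading adjacent memory.

```rust
/// Hypothetical helper: read one field from content supplied by an update.
/// Illustrative only; it does not reflect any vendor's actual implementation.
fn read_field(fields: &[u32], index: usize) -> Option<u32> {
    // `get` compares the index with the slice length and returns None
    // when the index is out of range, so no memory past the buffer is read.
    fields.get(index).copied()
}

fn main() {
    let fields = [10, 20, 30];

    // In-range read succeeds.
    println!("{:?}", read_field(&fields, 1)); // prints Some(20)

    // Out-of-range read is surfaced as None for the caller to handle.
    println!("{:?}", read_field(&fields, 20)); // prints None
}
```

A caller that receives `None` can reject the malformed content and keep running, which is the kind of failure handling an unchecked read in kernel mode does not allow.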
Kernel Usage Important But Not Always Necessary
Microsoft generally defended the use of kernel drivers, citing their ability to provide system-wide visibility, to load early enough to detect threats like boot kits and rootkits (which can load before user-mode applications), and to monitor events such as file creation, deletion, or modification. Weston said kernel activity can also trigger callbacks that let drivers decide when to block activities like file or process creation, and many vendors use drivers in the NDIS driver class to collect network information in the kernel.
Microsoft noted tamper resistance and performance benefits too, but added, “There are many scenarios where data collection and analysis can be optimized for operation outside of kernel mode and Microsoft continues to partner with the ecosystem to improve performance and provide best practices to achieve parity outside of kernel mode.”
“It is possible today for security tools to balance security and reliability,” Weston wrote. Security vendors can use “minimal sensors” that run in kernel mode for data collection and enforcement, limiting exposure to availability issues.
Other key product functionality – managing updates, parsing content, and other operations – “can occur isolated within user mode where recoverability is possible. This demonstrates the best practice of minimizing kernel usage while still maintaining a robust security posture and strong visibility.” He included a diagram showing where those functions might run.
Best Practices for Windows Security and Stability
Weston also mentioned a number of best practices that can improve Windows security and availability. App Control for Business and VBS memory integrity are two of the more noteworthy.
App Control for Business (formerly Windows Defender Application Control) can be used to allow only trusted and business-critical apps. “Your policy can be crafted to deterministically and durably prevent nearly all malware and ‘living off the land’ style attacks. It can also specify which kernel drivers are allowed by your organization to durably guarantee that only those drivers will load on your managed endpoints.”
VBS offers memory integrity with a specific allow list policy to further protect the Windows kernel. “Combined with App Control for Business, memory integrity can reduce the attack surface for kernel malware or boot kits,” Weston wrote. “This can also be used to limit any drivers that might impact reliability on systems.”
Running as Standard User and using Device Health Attestation (DHA) are other important controls.
Microsoft CrowdStrike Response Could Involve MVI
Microsoft engages with third-party security vendors through the Microsoft Virus Initiative (MVI) “to define reliable extension points and platform improvements, as well as share information about how to best protect our customers.”
Presumably MVI will be involved in efforts to improve Windows reliability and availability in the wake of the CrowdStrike outage.