First-Ever Rowhammer Attack Targeting NVIDIA GPUs
Researchers from the University of Toronto have unveiled the first successful Rowhammer attack on an NVIDIA GPU, specifically targeting the A6000 model equipped with GDDR6 memory.
Dubbed “GPUHammer,” this exploit builds on the decade-old Rowhammer vulnerability, traditionally associated with the DDR and LPDDR DRAM attached to CPUs.
The attack demonstrates how repeatedly activating memory rows can induce bit flips in cells of adjacent rows, potentially allowing data corruption or privilege escalation.
NVIDIA has acknowledged the research, emphasizing that it reinforces existing mitigations rather than introducing new threats.
Research Exposes Vulnerabilities in GPU Memory
The study was conducted with System-Level Error-Correcting Code (SYS-ECC) disabled. It highlights the risks in unmitigated environments but also confirms that enabling SYS-ECC neutralizes the attack, underscoring the importance of robust memory protection in high-performance computing.
The Rowhammer phenomenon exploits the physical density of modern DRAM chips, where electrical interference from aggressive row activations can disturb neighboring cells, leading to unintended bit flips.
While this has been a known issue in CPU memory for years, its extension to GPUs marks a significant escalation, given the parallel processing demands of graphics accelerators.
NVIDIA’s response details how their GPU and SoC products adhere to industry standards for GDDR, LPDDR, and HBM memory, yet susceptibility varies by DRAM type, platform design, and configuration.
For instance, the attack’s success on the A6000, part of the Ampere architecture, with ECC disabled points to potential risks in workstation and data center setups where multi-tenant access could enable cross-process exploitation.
Researchers showed that enabling SYS-ECC not only detects and corrects single-bit errors but also thwarts the multi-bit flips induced by Rowhammer, providing a layered defense when combined with hardware features like On-Die ECC (OD-ECC).
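To make the "detects and corrects single-bit errors" idea concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code in Python. It uses a toy extended Hamming(8,4) code on 4-bit values. This is an illustrative assumption only: the function names are hypothetical, and the actual code and word size used by SYS-ECC on NVIDIA hardware are not described in the source.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword.

    Toy layout (assumption for illustration): index 0 is an overall parity
    bit, indices 1, 2, 4 are Hamming parity bits, indices 3, 5, 6, 7 are data.
    """
    code = [0] * 8
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes total parity even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # The syndrome is the XOR of the positions of all set bits; it is zero
    # for a valid codeword and equals the flipped position after one flip.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: treat as a single flip and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(code)
    for pos in positions:
        out[pos] ^= 1
    return out
```

For example, `decode(flip(encode(0b1011), 5))` recovers `0b1011` with status `'corrected'`, while flipping two bits yields `(None, 'uncorrectable')`: the error is flagged rather than silently passed on as valid data.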
NVIDIA’s Mitigation Strategies
NVIDIA strongly advocates for enabling SYS-ECC across a wide array of products to mitigate Rowhammer risks, including Blackwell-based systems like the HGX and DGX series (GB200, B200, B100), Ada architectures such as the L40S and RTX 6000, Hopper lines including H100 and H200, Ampere models like A100 and RTX A6000, Jetson devices like AGX Orin Industrial, Turing GPUs such as T4 and RTX 8000, and even Volta-era Tesla V100.
This feature is enabled by default on Hopper and Blackwell data center GPUs, offering out-of-the-box protection for enterprise environments.
Additionally, newer DRAM generations starting with DDR4, LPDDR5, HBM3, and GDDR7 incorporate OD-ECC, which operates transparently at the die level to correct internal errors and indirectly bolsters resistance to Rowhammer without user intervention.
Products supporting OD-ECC span Blackwell’s RTX 50 series and HGX platforms, as well as Hopper’s H100 and GH200, enhancing overall memory integrity amid shrinking process nodes.
For heightened security, NVIDIA recommends professional and data center-grade hardware over consumer GPUs, particularly in multi-tenant scenarios where simultaneous GPU access could facilitate attacks between users.
Risk assessment should factor in tenancy models, as single-tenant setups inherently limit exploitation opportunities.
SYS-ECC can be enabled out-of-band through the baseboard management controller, using Redfish APIs or NVIDIA SMBPBI, or in-band from the host with tools such as nvidia-smi.
Detailed guides are available through NVIDIA’s partner portals, ensuring administrators can verify and set ECC modes efficiently.
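As a rough sketch of the in-band route, the Python snippet below calls nvidia-smi to report the current and pending ECC mode of each GPU and lists those where ECC is not active. The helper names and the sample output are hypothetical; the query fields `ecc.mode.current` and `ecc.mode.pending` are standard nvidia-smi properties, but the exact output can vary by driver version, so treat this as a starting point rather than NVIDIA's documented procedure.

```python
import subprocess

# Standard nvidia-smi query; the choice of fields is an assumption of this sketch.
ECC_QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,ecc.mode.current,ecc.mode.pending",
    "--format=csv,noheader",
]


def parse_ecc_report(csv_text):
    """Parse 'index, name, current, pending' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        if len(parts) < 4:
            continue  # skip blank or unexpected lines
        rows.append({
            "index": parts[0],
            # Re-join the middle fields in case a GPU name contains a comma.
            "name": ", ".join(parts[1:-2]),
            "current": parts[-2],
            "pending": parts[-1],
        })
    return rows


def gpus_without_ecc(rows):
    """Return the GPUs whose current ECC mode is anything but 'Enabled'."""
    return [row for row in rows if row["current"] != "Enabled"]


def query_ecc():
    """Run nvidia-smi on this host and return the parsed ECC report."""
    result = subprocess.run(ECC_QUERY, capture_output=True, text=True, check=True)
    return parse_ecc_report(result.stdout)


if __name__ == "__main__":
    for gpu in gpus_without_ecc(query_ecc()):
        print(f"GPU {gpu['index']} ({gpu['name']}): ECC {gpu['current']}, "
              f"pending {gpu['pending']}")
```

On GPUs that support it, an administrator can then request ECC with `nvidia-smi -i <index> -e 1`. The change shows up as the pending mode and typically takes effect only after a reboot or GPU reset. GPUs without ECC support report `[N/A]` for both fields.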
This notice, initially released on July 10, 2025, serves as a proactive reminder amid evolving threats, encouraging users to leverage these mitigations to safeguard against Rowhammer-style vulnerabilities in GPU ecosystems.
As GPU computing powers AI and HPC workloads, such advancements in attack research drive continuous improvements in hardware resilience, keeping pace with the relentless march of memory technology.