vLLM Vulnerability Enables Remote Code Execution Via Malicious Payloads

A high-severity memory corruption vulnerability in vLLM (versions 0.10.2 and later) allows attackers to achieve remote code execution through the Completions API endpoint by sending maliciously crafted prompt embeddings.

The vulnerability resides in the tensor deserialization process within vLLM’s entrypoints/renderer.py at line 148.

When processing user-supplied prompt embeddings, the system loads serialized tensors using torch.load() without adequate validation checks.
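The pattern at issue can be illustrated with a minimal sketch (a simplification for illustration, not vLLM's exact code): a base64-encoded, client-supplied payload is deserialized with torch.load() and later densified, and weights_only=True on its own does not verify that a sparse tensor's indices are in bounds.

```python
import base64
import io

import torch


def load_prompt_embeds(encoded: str) -> torch.Tensor:
    # Simplified illustration of the risky flow: the payload is fully
    # attacker-controlled. weights_only=True restricts which objects the
    # pickle stream may create, but it does not check sparse tensor indices.
    raw = base64.b64decode(encoded)
    tensor = torch.load(io.BytesIO(raw), weights_only=True)
    # For a malicious sparse tensor with out-of-range indices, this
    # densification step is where the out-of-bounds write can occur.
    return tensor.to_dense()
```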

The Vulnerability Explained

A change introduced in PyTorch 2.8.0 disabled sparse tensor integrity checks by default, creating an attack vector for malicious actors.

With those checks disabled, attackers can craft sparse tensors whose indices fall outside the declared shape, triggering an out-of-bounds memory write during the to_dense() conversion.
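One way to restore that validation, assuming the deserialized object may be a sparse COO tensor, is to rebuild it with PyTorch's invariant checking enabled so that out-of-range indices raise an exception before to_dense() ever runs. This is a defensive sketch, not the official patch:

```python
import torch


def validate_sparse_embeds(t: torch.Tensor) -> torch.Tensor:
    # Only sparse COO tensors carry explicit index lists that need checking;
    # ordinary dense (strided) tensors pass through untouched.
    if t.layout != torch.sparse_coo:
        return t
    # check_invariants=True asks PyTorch to verify that every index lies
    # within the declared shape, so a forged payload fails here with an
    # exception instead of corrupting memory during to_dense().
    return torch.sparse_coo_tensor(
        t._indices(), t._values(), t.shape, check_invariants=True
    )
```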

This memory corruption can cause the vLLM server to crash and potentially enable arbitrary code execution within the server process.

CVE ID: CVE-2025-62164
Severity: High
CVSS Score: 8.8
Affected Product: vLLM (pip)
Affected Versions: ≥ 0.10.2

This vulnerability affects all deployments running vLLM as a server, particularly those deserializing untrusted or model-provided payloads.

Any user with API access can exploit this flaw to cause a denial of service and potentially gain remote code execution.

The attack requires no special privileges, making it accessible to both authenticated and unauthenticated users, depending on the API configuration.

Organizations using vLLM in production environments, cloud deployments, or shared infrastructure face significant risk, as successful exploitation could compromise the entire server and adjacent systems.

The vLLM project has addressed this vulnerability in pull request #27204. Users should immediately upgrade to the patched version.

As a temporary mitigation, administrators should restrict API access to trusted users only and implement input validation layers that inspect prompt embeddings before they reach the vLLM processing pipeline.
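A minimal sketch of such a validation layer, assuming the deployment only ever expects dense floating-point embeddings, is shown below; the helper name and size cap are illustrative and not part of vLLM:

```python
import base64
import io

import torch

MAX_EMBED_BYTES = 32 * 1024 * 1024  # illustrative payload size cap


def screen_prompt_embeds(encoded: str) -> torch.Tensor:
    # Illustrative pre-filter to run before a payload reaches vLLM.
    raw = base64.b64decode(encoded, validate=True)
    if len(raw) > MAX_EMBED_BYTES:
        raise ValueError("prompt_embeds payload too large")
    tensor = torch.load(io.BytesIO(raw), weights_only=True)
    # Reject anything that is not an ordinary dense float tensor, so no
    # sparse index metadata can be forged in the first place.
    if (
        not isinstance(tensor, torch.Tensor)
        or tensor.layout != torch.strided
        or not tensor.is_floating_point()
    ):
        raise ValueError("prompt_embeds must be a dense floating-point tensor")
    return tensor
```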

The vulnerability was discovered and responsibly disclosed by the AXION Security Research Team, highlighting the importance of coordinated vulnerability disclosure in the AI infrastructure ecosystem.
