A newly disclosed security flaw has placed millions of AI servers at risk after researchers identified a critical vulnerability in vLLM, a widely deployed Python package for serving large language models.
The issue, tracked as CVE-2026-22778 (GHSA-4r2x-xpjr-7cvv), enables remote code execution (RCE) when a malicious video URL is submitted to a vulnerable vLLM API endpoint. The vulnerability affects vLLM versions 0.8.3 through 0.14.0 and was patched in version 0.14.1.
The disclosure was released as breaking news and is still developing; additional technical details are expected as the investigation continues. Given vLLM’s scale of adoption, reportedly exceeding three million downloads per month, the impact of CVE-2026-22778 is considered severe.
What Is vLLM and Why CVE-2026-22778 Matters
vLLM is a high-throughput, memory-efficient inference engine for serving large language models in production environments. It is commonly used to address performance bottlenecks associated with traditional LLM serving, including slow inference, poor GPU utilization, and limited concurrency. Compared to general-purpose local runners such as Ollama, vLLM is more often deployed in high-load environments where scalability and throughput are critical.
Because vLLM is often exposed through APIs and used to process untrusted user input, a flaw like CVE-2026-22778 significantly expands the attack surface. Any organization running vLLM with video or multimodal model support enabled is potentially affected. OX customers identified as vulnerable were notified and instructed to update their deployments.
Impact: Full Server Takeover via Remote Code Execution
CVE-2026-22778 allows attackers to achieve RCE by sending a specially crafted video link to a vLLM multimodal endpoint. Successful exploitation can result in arbitrary command execution on the underlying server. From there, attackers may exfiltrate data, pivot laterally within the environment, or fully compromise connected systems.
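For context, a multimodal request to a vLLM OpenAI-compatible server typically passes the video by URL inside the chat payload, which is what makes the URL attacker-controllable. The sketch below is illustrative only: the host, model name, and URL are placeholders, and the exact content schema can vary across vLLM versions and models.

```python
import requests

# Illustrative multimodal chat-completions request to a vLLM
# OpenAI-compatible server. Host, model name, and video URL are
# placeholders; the key point is that the server fetches and decodes
# the video referenced by a client-supplied URL.
payload = {
    "model": "some-video-capable-model",  # hypothetical model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this clip."},
                {
                    "type": "video_url",
                    "video_url": {"url": "https://example.com/clip.mp4"},
                },
            ],
        }
    ],
}

resp = requests.post(
    "http://vllm-host:8000/v1/chat/completions", json=payload, timeout=60
)
print(resp.status_code)
```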


The vulnerability does not require authentication beyond access to the exposed API, making internet-facing deployments particularly at risk. Because vLLM is commonly used in clustered or GPU-backed environments, the blast radius of a single exploited instance may extend well beyond one server.
Technical Analysis
CVE-2026-22778 is exploited through a chain that combines an information disclosure bug with a heap overflow, ultimately leading to remote code execution. According to OX Security, the first stage bypasses ASLR protections through a memory disclosure: when an invalid image is submitted to a multimodal vLLM endpoint, the Python Imaging Library (PIL) raises an error indicating it cannot identify the image file.
In vulnerable versions, this error message includes a heap memory address. That address is located before libc in memory, reducing the ASLR search space and making exploitation more reliable. The patched code sanitizes these error messages to prevent leaking heap addresses.
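To illustrate the leak: PIL's UnidentifiedImageError embeds the repr() of the file object it failed to parse, and for an in-memory buffer that repr contains a heap address. The sketch below shows both the leaky default message and a regex-based scrubbing wrapper; the scrubbing is an illustrative mitigation, not the actual vLLM patch.

```python
import io
import re

from PIL import Image, UnidentifiedImageError


def describe_leak() -> str:
    """Show how an unparseable image yields an error containing a heap address."""
    try:
        Image.open(io.BytesIO(b"not a real image"))
    except UnidentifiedImageError as exc:
        # Typical message: "cannot identify image file <_io.BytesIO object at 0x7f...>"
        # The 0x... value is the address of the BytesIO object on the heap.
        return str(exc)
    return ""


def sanitized_open(data: bytes) -> Image.Image:
    """Illustrative mitigation: strip hex addresses before re-raising (not the real fix)."""
    try:
        return Image.open(io.BytesIO(data))
    except UnidentifiedImageError as exc:
        scrubbed = re.sub(r"0x[0-9a-fA-F]+", "0x<redacted>", str(exc))
        raise ValueError(f"invalid image: {scrubbed}") from None


if __name__ == "__main__":
    print(describe_leak())
```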
With the leaked address available, the attacker proceeds to the second vulnerability. vLLM relies on OpenCV for video decoding, and OpenCV bundles FFmpeg 5.1.x. That FFmpeg release contains a heap overflow flaw in its JPEG2000 decoder.
JPEG2000 images use separate buffers for color channels: a large buffer for the Y (luma) channel and smaller buffers for the U and V (chroma) channels. The decoder incorrectly trusts the image’s cdef (channel definition) box, allowing channels to be remapped without validating buffer sizes. This means large Y channel data can be written into a smaller U buffer.
Because the attacker controls both the image geometry and the channel mapping, they can precisely control how much data overflows and which heap objects are overwritten. By abusing internal JPEG2000 headers and crafting specific channel values, the overflow can overwrite adjacent heap memory, including function pointers. Execution can then be redirected to a libc function such as system(), resulting in full RCE.
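The decoder bug itself lives in FFmpeg's C code, but the class of mistake is easy to model: a remapped component is copied into a per-channel buffer sized for a different, subsampled component without checking that it fits. The Python sketch below is a simplified model of that missing check, not the FFmpeg implementation.

```python
# Simplified model of the JPEG2000 channel-remapping flaw: buffers are sized
# per channel (Y full resolution, U/V subsampled), but the cdef-style mapping
# is trusted without validating that the source data fits the destination.

def allocate_buffers(width: int, height: int) -> dict[str, bytearray]:
    return {
        "Y": bytearray(width * height),                 # full-resolution luma
        "U": bytearray((width // 2) * (height // 2)),   # subsampled chroma
        "V": bytearray((width // 2) * (height // 2)),
    }


def copy_component_unsafe(dst: bytearray, src: bytes) -> None:
    # Vulnerable pattern: writes len(src) bytes with no bounds check.
    # In C this overruns the heap allocation; Python merely grows the
    # bytearray, but a real decoder would corrupt adjacent memory.
    dst[: len(src)] = src


def copy_component_safe(dst: bytearray, src: bytes) -> None:
    # Patched pattern: reject a mapped component that is larger than the
    # buffer the channel mapping points it at.
    if len(src) > len(dst):
        raise ValueError("component larger than destination buffer")
    dst[: len(src)] = src


buffers = allocate_buffers(64, 64)
luma_sized_data = bytes(64 * 64)                    # Y-sized component data...
copy_component_safe(buffers["Y"], luma_sized_data)  # fits the luma buffer
try:
    copy_component_safe(buffers["U"], luma_sized_data)  # ...remapped into U: rejected
except ValueError as err:
    print("blocked:", err)
```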
Affected Versions and Recommended Actions
The following vLLM Python package versions are affected:
- Affected versions: vLLM >= 0.8.3 and < 0.14.1
- Fixed version: vLLM 0.14.1
Organizations are strongly advised to update immediately to vLLM 0.14.1, which includes an updated OpenCV release addressing the JPEG2000 decoder flaw. If upgrading is not immediately feasible, disabling video model functionality in production environments is recommended until patching can be completed.
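A quick way to confirm exposure is to compare the installed package version against the fixed release. The short check below is a hedged helper: it assumes the standard `vllm` distribution name and uses the `packaging` library for version comparison.

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

FIRST_AFFECTED = Version("0.8.3")
FIXED = Version("0.14.1")

try:
    installed = Version(version("vllm"))
except PackageNotFoundError:
    print("vllm is not installed in this environment")
else:
    if FIRST_AFFECTED <= installed < FIXED:
        print(
            f"vllm {installed} is in the affected range for CVE-2026-22778; "
            f"upgrade with: pip install --upgrade 'vllm>={FIXED}'"
        )
    else:
        print(f"vllm {installed} is not in the affected range")
```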
CVE-2026-22778 demonstrates how vulnerabilities in third-party media processing libraries can cascade into critical RCE flaws in AI infrastructure. For teams operating vLLM at scale, prompt remediation and careful review of exposed multimodal endpoints are essential to reducing risk.
