Critical RCE Vulnerabilities in AI Inference Engines Expose Meta, NVIDIA, and Microsoft Frameworks

As artificial intelligence infrastructure rapidly expands, critical security flaws threaten the backbone of enterprise AI deployments.

Security researchers at Oligo Security have uncovered a series of dangerous Remote Code Execution (RCE) vulnerabilities affecting major AI inference frameworks from Meta, NVIDIA, and Microsoft, as well as the PyTorch-ecosystem projects vLLM and SGLang.

The vulnerabilities, collectively termed “ShadowMQ,” stem from the unsafe implementation of ZeroMQ (ZMQ) communications combined with Python’s pickle deserialization.

What makes this threat particularly alarming is how it spread across the AI ecosystem through code reuse and copy-paste development practices.

How the Vulnerability Spread Across Frameworks

The investigation began in 2024 when researchers analyzed Meta’s Llama Stack and discovered the dangerous use of ZMQ’s recv_pyobj() method, which deserializes data using Python’s pickle module.
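For illustration, here is a minimal sketch of the pattern the researchers describe: a ZMQ socket whose incoming messages are handed straight to pickle via recv_pyobj(). The socket type, endpoint, and handler are assumptions for the sketch, not code from any of the affected projects.

```python
import zmq

# Minimal sketch of the unsafe pattern (not code from any affected project).
# recv_pyobj() is shorthand for pickle.loads(socket.recv()), so anyone who
# can reach this port controls the bytes fed to pickle.
ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)          # socket type assumed for illustration
sock.bind("tcp://0.0.0.0:5555")      # unauthenticated, reachable over the network

while True:
    request = sock.recv_pyobj()      # deserializes attacker-controlled data
    print("received:", request)
```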

ShadowMQ Vulnerability CVE Data Table

CVE ID                Product                  Severity  CVSS  Vulnerability Type
CVE-2024-50050        Meta Llama Stack         Critical  9.8   Remote Code Execution
CVE-2025-30165        vLLM                     Critical  9.8   Remote Code Execution
CVE-2025-23254        NVIDIA TensorRT-LLM      Critical  9.3   Remote Code Execution
CVE-2025-60455        Modular Max Server       Critical  9.8   Remote Code Execution
N/A (unpatched)       Microsoft Sarathi-Serve  Critical  9.8   Remote Code Execution
N/A (incomplete fix)  SGLang                   Critical  9.8   Remote Code Execution

This configuration exposed unauthenticated network sockets whose incoming messages were deserialized with pickle, allowing remote attackers to trigger arbitrary code execution on the host.
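The reason pickle deserialization is so dangerous is that a pickle stream can instruct the loader to call an arbitrary callable. A harmless sketch of the mechanism (the payload here merely echoes a string, but it could be any shell command):

```python
import os
import pickle

class Exploit:
    # pickle calls __reduce__ when serializing; the (callable, args) tuple it
    # returns is invoked on the *receiving* side during pickle.loads().
    def __reduce__(self):
        return (os.system, ("echo code ran during deserialization",))

blob = pickle.dumps(Exploit())
pickle.loads(blob)   # runs os.system(...) -- loading alone is enough
```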

After Meta patched the vulnerability (CVE-2024-50050), Oligo researchers found identical security flaws across multiple frameworks.

NVIDIA’s TensorRT-LLM, PyTorch projects vLLM and SGLang, and Modular’s Max Server all contained nearly identical vulnerable patterns.

Oligo’s code analysis revealed that entire files were copied between projects, spreading the security flaw like a virus. These AI inference servers power critical enterprise infrastructure, processing sensitive data across GPU clusters.

Organizations running SGLang include xAI, AMD, NVIDIA, Intel, LinkedIn, Oracle Cloud, Google Cloud, Microsoft Azure, AWS, MIT, Stanford, UC Berkeley, and numerous other major technology companies.

Successful exploitation could allow attackers to execute arbitrary code, escalate privileges, exfiltrate model data, or install cryptocurrency miners.

Oligo researchers identified thousands of exposed ZMQ sockets communicating unencrypted over the public internet. While most affected projects have released patches, Microsoft’s Sarathi-Serve remains unpatched and SGLang’s fix is incomplete.

Organizations should immediately update to patched versions, avoid using pickle with untrusted data, implement authentication for ZMQ communications, and restrict network access to ZMQ endpoints.
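As a rough sketch of what those mitigations can look like with pyzmq, the fragment below swaps pickle for schema-constrained JSON and enables CURVE encryption on the socket. The endpoint, keys, and socket type are illustrative assumptions, and CURVE requires libzmq built with libsodium support.

```python
import zmq

# Hardened sketch (assumptions: PULL socket, internal interface, pyzmq CURVE).
server_public, server_secret = zmq.curve_keypair()

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.curve_secretkey = server_secret
sock.curve_publickey = server_public
sock.curve_server = True               # clients must speak CURVE to connect
sock.bind("tcp://10.0.0.5:5555")       # internal interface, not 0.0.0.0

message = sock.recv_json()             # JSON instead of pickle: data, not code
```

Clients would correspondingly set curve_serverkey along with their own keypair before connecting, and a production deployment would also run a ZAP authenticator to whitelist known client keys.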
