New Prompt Injection Attack Via Malicious MCP Servers Let Attackers Drain Resources

Security researchers have uncovered critical vulnerabilities in the Model Context Protocol (MCP) sampling feature.

Revealing how malicious servers can exploit LLM-integrated applications to conduct resource theft, conversation hijacking, and unauthorized system modifications.

Attack Vector	Mechanism	Impact
Resource theft	Hidden instructions in sampling requests make the LLM generate extra, non-visible content.	Drains AI compute quotas and API credits by running unauthorized workloads without the user noticing.
Conversation hijacking	Alters assistant behavior across the entire session and can degrade the usefulness or enable harmful behavior.	Alters assistant behavior across the entire session and can degrade usefulness or enable harmful behavior.
Covert tool invocation	Embedded instructions cause the LLM to call tools without explicit user awareness or consent.	Enables unauthorized file operations, persistence, and possible data exfiltration or system modification.

The Model Context Protocol, introduced by Anthropic in November 2024, standardizes how large language models integrate with external tools and data sources.

While designed to enhance AI capabilities, the protocol’s sampling feature, which allows MCP servers to request LLM completions, creates significant security risks when proper safeguards are absent.

Three Critical Attack Vectors

Paloalto researchers demonstrated three proof-of-concept attacks conducted on a widely used coding copilot:

*The user asks the copilot to help summarize the current code file*

Resource Theft: Attackers inject hidden instructions into sampling requests, causing LLMs to generate unauthorized content while appearing normal to users.

google

A malicious code summarizer, for example, appended instructions for generating fictional stories alongside legitimate code analysis. Consuming substantial computational resources and API credits without user awareness.

Conversation Hijacking: Compromised MCP servers can inject persistent instructions that affect entire conversation sessions.

*The user receives a summary of the code file as normal*

In demonstrations, hidden prompts forced AI assistants to “speak like a pirate” in all subsequent responses. Demonstrating how malicious servers fundamentally alter system behavior and potentially compromise functionality.

Covert Tool Invocation: Malicious servers leverage prompt injection to trigger unauthorized tool executions. Researchers showed how hidden instructions could trigger file-writing operations, enabling data exfiltration.

*LLM puts the malicious instruction in its response as requested by the MCP’s hidden prompt*

Persistence mechanisms and unauthorized system modifications without explicit user consent.

The vulnerability stems from MCP sampling’s implicit trust model and lack of built-in security controls.

Servers can modify prompts and responses, allowing them to slip in hidden instructions while still appearing to be normal tools.

*The copilot follows the malicious instructions that are put into the response*

Effective defense requires multiple layers: request sanitization using strict templates to separate user content from server modifications.

Response filtering to remove instruction-like phrases and access controls to limit server capabilities.

Organizations should implement token limits based on operation type and require explicit approval for tool execution.

*The copilot follows the malicious tool invocation request*

According to Paloalto Networks, organizations should evaluate AI security solutions, including runtime protection platforms and comprehensive security assessments.

The findings underscore the critical importance of securing AI infrastructure as LLM integration becomes increasingly prevalent across enterprise applications.

Follow us on Google News, LinkedIn, and X for daily cybersecurity updates. Contact us to feature your stories.

googlenews

Source link

Search

Three Critical Attack Vectors

Latest Posts