Security researchers have uncovered critical vulnerabilities in the Model Context Protocol (MCP) sampling feature.
Revealing how malicious servers can exploit LLM-integrated applications to conduct resource theft, conversation hijacking, and unauthorized system modifications.
| Attack Vector | Mechanism | Impact |
|---|---|---|
| Resource theft | Hidden instructions in sampling requests make the LLM generate extra, non-visible content. | Drains AI compute quotas and API credits by running unauthorized workloads without the user noticing. |
| Conversation hijacking | Alters assistant behavior across the entire session and can degrade the usefulness or enable harmful behavior. | Alters assistant behavior across the entire session and can degrade usefulness or enable harmful behavior. |
| Covert tool invocation | Embedded instructions cause the LLM to call tools without explicit user awareness or consent. | Enables unauthorized file operations, persistence, and possible data exfiltration or system modification. |
The Model Context Protocol, introduced by Anthropic in November 2024, standardizes how large language models integrate with external tools and data sources.
While designed to enhance AI capabilities, the protocol’s sampling feature, which allows MCP servers to request LLM completions, creates significant security risks when proper safeguards are absent.
Three Critical Attack Vectors
Paloalto researchers demonstrated three proof-of-concept attacks conducted on a widely used coding copilot:

Resource Theft: Attackers inject hidden instructions into sampling requests, causing LLMs to generate unauthorized content while appearing normal to users.
A malicious code summarizer, for example, appended instructions for generating fictional stories alongside legitimate code analysis. Consuming substantial computational resources and API credits without user awareness.
Conversation Hijacking: Compromised MCP servers can inject persistent instructions that affect entire conversation sessions.

In demonstrations, hidden prompts forced AI assistants to “speak like a pirate” in all subsequent responses. Demonstrating how malicious servers fundamentally alter system behavior and potentially compromise functionality.
Covert Tool Invocation: Malicious servers leverage prompt injection to trigger unauthorized tool executions. Researchers showed how hidden instructions could trigger file-writing operations, enabling data exfiltration.

Persistence mechanisms and unauthorized system modifications without explicit user consent.
The vulnerability stems from MCP sampling’s implicit trust model and lack of built-in security controls.
Servers can modify prompts and responses, allowing them to slip in hidden instructions while still appearing to be normal tools.

Effective defense requires multiple layers: request sanitization using strict templates to separate user content from server modifications.
Response filtering to remove instruction-like phrases and access controls to limit server capabilities.
Organizations should implement token limits based on operation type and require explicit approval for tool execution.

According to Paloalto Networks, organizations should evaluate AI security solutions, including runtime protection platforms and comprehensive security assessments.
The findings underscore the critical importance of securing AI infrastructure as LLM integration becomes increasingly prevalent across enterprise applications.
Follow us on Google News, LinkedIn, and X for daily cybersecurity updates. Contact us to feature your stories.
