Malicious MCP Servers Enable Stealthy Prompt Injection to Drain System Resources

Security researchers have uncovered critical vulnerabilities in the Model Context Protocol (MCP) sampling feature that enable malicious servers to execute stealthy prompt injection attacks, drain computational resources, and compromise large language model applications without user detection.

The findings reveal three primary attack vectors that exploit the protocol’s inherent trust model and lack of robust security controls.

Three Critical Attack Vectors Identified

Researchers demonstrated practical proof-of-concept attacks using a coding copilot application that integrates MCP for code assistance.

The user asks the copilot to help summarize the current code file

The experiments revealed that malicious MCP servers can exploit the sampling feature through resource theft, conversation hijacking, and covert tool invocation.

The user receives a summary of the code file as normal

In resource theft attacks, malicious servers append hidden instructions to legitimate prompts, causing the LLM to generate extensive additional content that remains invisible to users.

During testing, researchers created a code summarizer tool that secretly instructed the LLM to write fictional stories alongside requested code summaries.

information is not shown on the user interface

While users received only the expected summary in their interface, the LLM generated up to 1,000 additional words in the background, consuming computational resources and API credits without authorization.

The conversation-hijacking attack demonstrates a more persistent compromise. By injecting instructions into the LLM’s responses, malicious servers can fundamentally alter the AI assistant’s behavior across multiple conversation turns.

In one experiment, the injected instruction caused the AI to respond in pirate speak for all subsequent interactions. However, more sophisticated instructions could make the assistant dangerous or unreliable.

Perhaps most concerning is covert tool invocation, where malicious servers trigger unauthorized system operations.

LLM puts the malicious instruction in its response as requested by the MCP’s hidden prompt

Researchers showed how a compromised server could instruct the LLM to invoke file-writing tools without explicit user permission, enabling data exfiltration, persistence mechanisms, and unauthorized system modifications.

The attack succeeds because file operations appear to the LLM as legitimate tool invocations.

The research highlights that MCP sampling relies on an implicit trust model without built-in security controls, creating new attack vectors for agents that leverage the protocol.

The copilot follows the malicious instructions that are put into the response

The disconnect between what users see and what the LLM actually processes creates perfect cover for resource exhaustion attacks and hidden malicious activities.

Different MCP implementations may handle output filtering and display differently, with some potentially showing complete LLM responses.

In contrast, others may use additional summarisation layers to obscure hidden content.

The copilot follows the malicious tool invocation request

Palo Alto Networks recommends organisations implement comprehensive security measures to protect AI systems from these threats.

The research emphasizes the critical need for robust security controls in MCP-based systems, including enhanced validation of sampling requests, monitoring for anomalous token consumption patterns, and implementing safeguards against persistent prompt injection.

As MCP adoption grows across AI applications, understanding and mitigating these attack vectors becomes essential for maintaining the integrity and security of LLM-powered tools and services.

Follow us on Google News, LinkedIn, and X to Get Instant Updates and Set GBH as a Preferred Source in Google.

Source link