Microsoft identifies seven new ways AI agents can be hacked

June 5, 2026 1 min read

The seven new failure modes it has identified are:

Agentic Supply Chain Compromise —agent behavior can be affected by natural language rather than malicious code;
Goal Hijacking — adversarial instructions appear aligned with legitimate task completion, while silently redirecting the agent’s terminal goal;
Inter-Agent Trust Escalation —a compromised agent asserts false identity or inflates claimed permissions to an orchestrator;
Computer Use Agent (CUA) Visual Attack — agents operating through graphical interfaces can be manipulated through content that carries adversarial instructions for the agent;
Session Context Contamination —an adversary introduces data that biases the agent’s reasoning in subsequent steps, without triggering safety controls at any individual step;
MCP / Plugin Abuse — an update on the original taxonomy’s coverage of function compromise around MCP and plugin protocols, specifically attack surfaces specific to those protocols;
Capability / Architecture Disclosure —an agent reveals internal implementation details such as tool names and schemas, system-prompt structure, memory interfaces, or consent/human-in-the-loop trigger logic.

Microsoft advises security teams using these definitions to influence their planning to inventory their your supply chain, generating a software bill of materials (SBOM) for every deployed agent, to verify agent identity cryptographically, not positionally, by issuing attestable credentials at provisioning, to add the seven new failure modes to their red-team coverage matrix, and to audit the human-in-the-loop user experience as a security control.

This article first appeared on InfoWorld.

Source link