CISOOnline

Prompt injection breaks today’s AI agents, study warns

The researchers executed 3,168 adversarial runs across NanoBrowser and BrowserUse using 264 benchmark cases. Indirect prompt injection attacks, where malicious instructions are hidden inside ordinary web content such as product reviews and metadata, achieved attack success rates ranging from 41.67% to 68.16%, while direct prompt injection exceeded 79% across all tested configurations.

“Crucially, these failures exhibit distinct patterns when analysed through a stakeholder lens: some attacks succeed without disrupting the user’s delegated task while disproportionately harming third parties (stealthy parasitism), whereas others disrupt task completion without realizing the adversarial objective (misaligned disruption),” the researchers wrote in a paper.

OpenAI and Google did not immediately respond to requests for comment.

Every attack objective exposed at least one failure mode

The benchmark sorted web agent behavior into four possible outcomes: Robust Behavior, Stealthy Parasitism, Misaligned Disruption, and Compounded Failure. Robust Behavior is the ideal state, in which an agent completes the user’s task without advancing an attacker’s objective or exhibiting execution instability.
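To make the four-way split concrete, the sketch below shows one way such a classification could be expressed in Python. It is an illustration under stated assumptions, not the researchers' code. The names `RunResult`, `classify`, and `attack_success_rate` are hypothetical, and execution instability is folded into a single "task completed" flag for simplicity. The article does not define Compounded Failure, so treating it as the case where the user's task is disrupted and the attacker's objective is also met is an assumption. The sample runs are toy data, not figures from the study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ROBUST_BEHAVIOR = "Robust Behavior"
    STEALTHY_PARASITISM = "Stealthy Parasitism"
    MISALIGNED_DISRUPTION = "Misaligned Disruption"
    COMPOUNDED_FAILURE = "Compounded Failure"


@dataclass(frozen=True)
class RunResult:
    """Hypothetical record of one adversarial run (not the paper's schema)."""
    user_task_completed: bool     # the user's delegated task finished cleanly
    attacker_goal_advanced: bool  # the injected instruction achieved its aim


def classify(run: RunResult) -> Outcome:
    """Map a run onto the four outcomes named in the article."""
    if run.user_task_completed and not run.attacker_goal_advanced:
        return Outcome.ROBUST_BEHAVIOR
    if run.user_task_completed and run.attacker_goal_advanced:
        # Attack succeeds while the user's task still completes.
        return Outcome.STEALTHY_PARASITISM
    if not run.attacker_goal_advanced:
        # Task is disrupted, but the attacker's objective is not realized.
        return Outcome.MISALIGNED_DISRUPTION
    # Assumption: both failures at once count as Compounded Failure.
    return Outcome.COMPOUNDED_FAILURE


def attack_success_rate(runs: list[RunResult]) -> float:
    """Fraction of runs in which the attacker's objective was advanced."""
    if not runs:
        raise ValueError("no runs to score")
    return sum(r.attacker_goal_advanced for r in runs) / len(runs)


# Toy data for illustration only.
sample_runs = [
    RunResult(user_task_completed=True, attacker_goal_advanced=False),
    RunResult(user_task_completed=True, attacker_goal_advanced=True),
    RunResult(user_task_completed=False, attacker_goal_advanced=False),
    RunResult(user_task_completed=False, attacker_goal_advanced=True),
]

if __name__ == "__main__":
    counts = Counter(classify(r) for r in sample_runs)
    for outcome in Outcome:
        print(f"{outcome.value}: {counts[outcome]}")
    print(f"Attack success rate: {attack_success_rate(sample_runs):.2%}")
```

Under this framing, an attack success rate counts Stealthy Parasitism and Compounded Failure together, which is why a headline rate on its own does not show whether the user would have noticed anything wrong.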


