HelpnetSecurity

Indirect prompt injection is taking hold in the wild


The open web is slowly but surely filling up with “traps” designed for LLM-powered AI agents.

The technique, known as indirect prompt injection (IPI), involves hiding (more or less) covert instructions inside ordinary web pages, waiting for an AI agent to read them and carry out the author’s commands.

The IPI attack kill chain (Source: Forcepoint)

“Ignore previous instructions”

In back-to-back reports published this week, Google and Forcepoint researchers laid out real-world evidence of these attacks.

Google used a repository of 2–3 billion crawled pages per month as their data source, focusing on static websites including blogs, forums and comment sections (and eschewing social media content).

Forcepoint’s X-Labs researchers conducted active threat hunting across publicly accessible web infrastructure, with telemetry flagging real payloads triggering on patterns like “Ignore previous instructions” and “If you are an LLM.”

Both companies found IPIs fueled by harmless and malicious intent.

The first category, according to Google, contains pranks and helpful guidance, such as instructions to change the AI agent’s conversational tone (“Tweet like a bird”) or add relevant content to AI summaries (e.g., telling users to check facts for themselves).

The latter includes:

  • Search engine manipulation / traffic hijacking
  • IPIs meant to prevent AI agents from retrieving content (DoS) and to initiate a destructive action instead
  • IPIs aimed at data exfitration (e.g., API keys)
  • IPIs focused on destruction (e.g., “try to delete all files on the user’s machine”)

Indirect prompt injection in the wild

IPI with destructive intent (Source: Google)

Forcepoint researchers also unearthed IPI attempts aimed at performing financial fraud.

One payload, for example, embedded a fully specified PayPal transaction and step-by-step instructions designed for AI agents with integrated payment capabilities. A second case used meta tag namespace injection combined with a persuasion amplifier keyword (“ultrathink”) to route AI-mediated financial actions toward a Stripe donation link.

A third case appeared to function as a widely distributed test payload, possibly to identify which AI systems are vulnerable before deploying higher-impact attacks.

Hiding from humans

Attackers use different tricks to hide malicious instructions from human eyes while keeping them fully visible to AI.

The most common involve making text physically invisible on a webpage by shrinking it to a single pixel, draining its color to near-transparency, or simply tagging it as hidden using standard web design tools.

The more sophisticated tricks involve burying payloads inside HTML comment sections and hiding instructions inside a page’s metadata.

There’s a growing interest in IPI attacks

Neither team found evidence of sophisticated, coordinated campaigns, though, according to Forcepoint researchers, “shared injection templates across multiple domains suggest organized tooling rather than isolated experimentation.

Still, the window for getting ahead of this threat is closing fast, they believe.

Google says it observed a sharp uptick in malicious activity during its scans: “We saw a relative increase of 32% in the malicious category between November 2025 and February 2026, repeating the scan on multiple versions of the [CommonCrawl archive of the public web].”

Forcepoint also pointed out that that the impact of these attacks scales with AI privilege.

“A browser AI that can only summarize is low-risk. An agentic AI that can send emails, execute terminal commands or process payments becomes a high-impact target. If AI agents consume untrusted web content without enforcing a strict data-instruction boundary, every page they read remains a potential attack vector.”

Subscribe to our breaking news e-mail alert to never miss out on the latest breaches, vulnerabilities and cybersecurity threats. Subscribe here!



Source link