
From there, attackers use indirect prompt injection techniques to manipulate the AI into executing malicious instructions. The model is tricked into generating requests that include sensitive data while interpreting the instructions as benign.
In a disclosure, Noma said that the key technical breakthrough came from bypassing client-side protections designed to block external image loading. By exploiting a flaw in URL validation, specifically using protocol-relative URLs like //attacker.com, the system mistakenly treats malicious external resources as safe, allowing outbound requests to the attacker’s infrastructure.
Finally, the attack evades AI guardrails themselves by inserting specific keywords, such as INTENT, into prompts to convince the model that the request was legitimate. Once processed, the system attempts to render an image, embedding sensitive data into the request sent to the attacker’s server.
