Sublime Security has released a new analysis detailing a growing trend in email-based cyberattacks: a technique called indirect prompt injection. While social media is often full of people trying to trick chatbots with “ignore previous instructions” prompts, this new research shows how scammers are using that same logic to trick the AI models used in email filters. The goal is to force these systems into misclassifying malicious emails as safe and legitimate.
This research, shared exclusively with Hackread.com, shows that threat actors aren’t just interested in fooling humans; they are actively targeting the Machine Learning (ML) models that guard mailboxes, such as Sublime’s Autonomous Security Analyst (ASA).
Breaking the AI Decision Cycle
According to researchers, indirect prompt injection works by stuffing phishing emails with carefully chosen content designed to sway a model’s decision-making. By embedding benign text, like archived newsletter copy from Adidas or snippets of romance novels, attackers can effectively “dilute” the malicious signals of a phishing link.
Researchers found that attackers hide this extra data in two main ways, illustrated in the sketch after this list:
- Zero-font HTML: Setting text to 0pt so it remains invisible to people but fully readable by an AI scanner.
- Color-matching: Setting the text color to the same hex code as the email background, so it blends in completely.
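To make the mechanics concrete, here is a minimal sketch of what such hidden padding can look like in a message’s HTML, with a naive text extractor standing in for the classifier’s view. The lure, the padding copy, and the extractor are all hypothetical, not material from the actual campaigns.

```python
from html.parser import HTMLParser

# Hypothetical phishing email body. The visible part is the lure; the
# hidden <div> blocks are benign "dilution" text a human never sees but
# a text-based ML classifier ingests in full.
EMAIL_HTML = """
<html><body style="background-color:#ffffff">
  <p><a href="https://example.invalid/share/doc">Your document is ready to view</a></p>

  <!-- Technique 1: zero-font text, invisible but machine-readable -->
  <div style="font-size:0pt">
    Shop the new season collection. Free shipping on orders over $50.
    Unsubscribe | Manage preferences | View in browser
  </div>

  <!-- Technique 2: text color matched to the white background -->
  <div style="color:#ffffff">
    Thanks for being a subscriber! Here is this week's newsletter roundup.
  </div>
</body></html>
"""

class TextExtractor(HTMLParser):
    """Naive extractor that discards styling, like a model fed raw text."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

parser = TextExtractor()
parser.feed(EMAIL_HTML)
# The classifier "sees" the hidden marketing copy alongside the lure.
print("\n".join(parser.chunks))
```

A rendering client applies those styles, so the recipient sees only the lure; a parser that strips markup feeds everything, padding included, to the model.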
If the hidden content is rich enough, the ASA might misclassify a scam as a harmless marketing update.
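The dilution effect is easiest to see with a toy score. The sketch below uses a made-up bag-of-words risk score, not any real filter’s logic: because suspicion is averaged over all ingested text, enough benign padding drowns out the lure.

```python
# Toy illustration of "dilution": a naive bag-of-words risk score drops
# when benign tokens swamp the suspicious ones. The words and weights
# here are invented for illustration.
SUSPICIOUS = {"verify": 3.0, "login": 3.0, "urgent": 2.0}

def risk_score(tokens: list[str]) -> float:
    flagged = sum(SUSPICIOUS.get(t, 0.0) for t in tokens)
    return flagged / max(len(tokens), 1)   # averaged over all text seen

lure = "urgent verify your login".split()
padding = ("shop the new season collection free shipping " * 20).split()

print(round(risk_score(lure), 3))            # high: 2.0
print(round(risk_score(lure + padding), 3))  # diluted: 0.056
```

Real models are far more sophisticated, but the intuition carries over: large volumes of plausible benign text shift the overall classification.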
Real-World Detections
Further investigation by Sublime revealed two specific campaigns that used these methods. The first, an Adidas newsletter clone, appeared to be a standard cloud-storage phishing scam, but its HTML contained hidden text lifted directly from real Adidas newsletters archived on sites like milled.com and emailinspire.com. The objective was to make the AI model see a high-reputation brand instead of a malicious link.
The second was a healthcare-themed scam: a fake health insurance email with a professional layout, rounded boxes, and accented buttons. To create a sense of legitimacy for the filters, the scammers embedded a fictional story from goodnovel.com. Researchers noted the hackers likely hoped the AI model would mistake the message for a creative post from a platform like Substack or Patreon.
A Serious Threat
While the cybersecurity community has been testing these theories for months using tools like Lakera’s Gandalf game, these in-the-wild attacks show that scammers are moving quickly to exploit AI-powered defenses.
Sublime noted that although these attacks currently account for less than 1% of observed traffic, they signal what is to come. As we move toward ‘agentic mailboxes,’ where AI assistants take actions on our behalf, the risk of a model following a hidden malicious instruction becomes much higher.
“With indirect prompt injection via hidden text, attackers aren’t trying to force an AI into doing something it shouldn’t. Instead, they’re influencing AI into making an incorrect decision, but well within the bounds of its design. Nuanced prompt injection attacks will only increase over time as adversaries evolve, so it’s important that AI security tools can understand the full context of the messages they analyze,” the blog post reads.
Researchers conclude that the underlying mechanisms of these models must be improved so they understand the full context of a message, rather than just scanning surface-level links or keywords.
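One way to apply that idea, assuming the scanner can inspect inline CSS before classification, is to flag styled-invisible text explicitly so hidden padding is surfaced rather than silently ingested. The patterns and checks below are an illustrative sketch, not Sublime’s actual detection logic.

```python
import re

# Hypothetical pre-filter: flag inline styles that hide text from humans.
# Real engines parse CSS properly; these regexes are a simplified sketch.
ZERO_FONT = re.compile(r"font-size\s*:\s*0(?:\.0+)?(?:px|pt|em)?\s*(?:[;\"'}]|$)", re.I)
FONT_COLOR = re.compile(r"(?<![-\w])color\s*:\s*(#[0-9a-f]{3,6})", re.I)
BACKGROUND = re.compile(r"background(?:-color)?\s*:\s*(#[0-9a-f]{3,6})", re.I)

def hidden_text_indicators(html: str) -> list[str]:
    """Return human-readable reasons why this HTML may hide text."""
    reasons = []
    if ZERO_FONT.search(html):
        reasons.append("zero-font text present")
    bg = BACKGROUND.search(html)
    for m in FONT_COLOR.finditer(html):
        if bg and m.group(1).lower() == bg.group(1).lower():
            reasons.append(f"text color {m.group(1)} matches background")
    return reasons

sample = '<body style="background-color:#ffffff"><div style="color:#ffffff">hidden</div></body>'
print(hidden_text_indicators(sample))
# -> ['text color #ffffff matches background']
```

A production engine would parse CSS properly and handle near-matching colors, external stylesheets, and off-screen positioning, but the principle is the same: treat invisibility itself as a signal rather than feeding hidden text to the classifier unmarked.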