Attackers Can Poison AI Research Agents Using Reddit and Wikipedia Content

June 22, 2026

Attackers can manipulate AI “deep‑research” agents by quietly editing Reddit threads and Wikipedia pages. A snippet as short as 13 words can be enough: agents may later repeat the planted text in their responses as authoritative advice or product recommendations, including recommendations for scams.

New research from Cornell Tech shows that these agents often rely on the same user-generated content (UGC) URLs. This makes public discussion platforms a significant target for influencing AI search results and research outputs without altering the underlying models.

Figure: Attack flow for poisoning AI research agents (Source: arXiv)

At the center of this risk is a class of multi-step “deep‑research” systems such as STORM, Co‑STORM, and OmniThink, which decompose user questions into multiple sub‑queries, issue a flurry of web searches, and synthesize long‑form, citation‑rich reports from the retrieved sources.

Instead of relying on a static, curated corpus, these agents read directly from the open web, drawing heavily on UGC platforms like Reddit, Wikipedia, Quora, YouTube, and forums. These sites are both highly ranked in search results and trivial for attackers to edit.

Measurements across 176 realistic queries show that 17–23% of all retrieved URLs for these agents are UGC, and Reddit alone accounts for roughly half to two‑thirds of that pool, making it the prime target for adversaries.
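The kind of measurement described above can be approximated with a few lines of Python. This is a minimal sketch, not the researchers' tooling: the `UGC_DOMAINS` set and the two-label domain heuristic are assumptions made for illustration, and the study's actual domain list is not given in this article.

```python
from urllib.parse import urlparse

# Hypothetical UGC domain list (assumption). The article names these
# platforms, but the exact list used in the study is not stated here.
UGC_DOMAINS = {"reddit.com", "wikipedia.org", "quora.com", "youtube.com"}


def registered_domain(url: str) -> str:
    """Return the last two labels of the hostname (a rough heuristic)."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def ugc_share(urls: list[str]) -> dict[str, float]:
    """Fraction of URLs that are UGC, and Reddit's share of that UGC subset."""
    domains = [registered_domain(u) for u in urls]
    ugc = [d for d in domains if d in UGC_DOMAINS]
    reddit = [d for d in ugc if d == "reddit.com"]
    return {
        "ugc_fraction": len(ugc) / len(urls) if urls else 0.0,
        "reddit_fraction_of_ugc": len(reddit) / len(ugc) if ugc else 0.0,
    }
```

Feeding it every URL an agent retrieved across a query set would yield the two ratios the article cites: the UGC share of all URLs, and Reddit's share within the UGC pool.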

Within a single topic cluster (for example, “cancel Xfinity internet” or “best dating apps for divorced men over 50”), the same handful of UGC pages often reappear across many related queries, with individual pages recurring in up to 48% of runs, which means compromising just one thread can influence a whole family of questions.
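The recurrence figure can be illustrated with a short sketch. It assumes the retrieved URLs for each run in a topic cluster have already been logged; the function name and the `runs` structure are hypothetical, not taken from the paper.

```python
from collections import Counter


def url_recurrence(runs: dict[str, list[str]]) -> dict[str, float]:
    """Map each URL to the fraction of runs whose retrieved set contains it.

    `runs` maps a run or query label to the URLs retrieved for it
    (a hypothetical log format).
    """
    counts: Counter[str] = Counter()
    for urls in runs.values():
        counts.update(set(urls))  # count each URL at most once per run
    total = len(runs)
    return {url: n / total for url, n in counts.items()} if total else {}
```

A page with a recurrence near 0.48 would correspond to the "up to 48% of runs" figure reported above, and auditing for such pages is one way to see which threads a cluster of queries depends on most.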

The attack the researchers describe, dubbed WARP (Web Agent Retrieval Poisoning), weaponizes this overlap by appending short, persuasive payloads to high‑value UGC pages that deep‑research agents already retrieve organically.

An attacker first performs reconnaissance using a normal web search, identifying Reddit threads or wiki pages that repeatedly appear for a target topic, such as account cancellations, local business recommendations, dating advice, or crypto investing.

They then craft a brief, commercially styled paragraph, on the order of 80–120 words for full‑page poisoning, or compressed to roughly 13 words for search‑snippet attacks, promoting a fictional product, service, or coin while mimicking the tone of organic user opinions to evade moderation.

Finally, they deploy that text as a Reddit comment, Wikipedia edit, or forum reply; once indexed by search, the snippet is pulled into deep‑research agents’ pipelines and treated as trustworthy evidence, not as untrusted user input.

In a search‑result‑snippet setting, where systems see only around 25 words per URL, a single poisoned Reddit URL containing about 13 attacker‑chosen words achieved conditional “mention” rates of roughly 38% to 51% on open‑source agents. A “mention” here is a case where the fake entity is actively recommended in the final answer. Rates were higher when multiple URLs or subreddits were targeted.
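A conditional rate of this kind counts only the runs in which the poisoned URL actually reached the agent. The sketch below shows one way to compute it. The `Run` record is hypothetical, and the case-insensitive substring check is a crude stand-in (an assumption) for the paper's judgment of whether the entity was actively recommended.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    retrieved_urls: list[str]  # URLs the agent retrieved in this run
    answer: str                # final synthesized answer text


def conditional_mention_rate(
    runs: list[Run], poisoned_url: str, entity: str
) -> Optional[float]:
    """Share of exposed runs whose answer names the planted entity.

    Returns None when no run retrieved the poisoned URL, since the
    conditional rate is undefined in that case.
    """
    exposed = [r for r in runs if poisoned_url in r.retrieved_urls]
    if not exposed:
        return None
    # Substring match is a rough proxy for "actively recommended" (assumption).
    hits = sum(entity.lower() in r.answer.lower() for r in exposed)
    return hits / len(exposed)
```

Conditioning on exposure separates two questions: how often the poisoned page is retrieved at all, and how often the agent repeats it once retrieved.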

Even when the poisoned paragraph was appended to a full Reddit thread and accounted for less than 4% of the retrieved content, agents still repeated the planted claims in about 30–53% of runs in which the page was viewed, underscoring that length dilution alone does not neutralize the threat.

In one example highlighted by the authors in the arXiv paper, a fictitious “BananaCoin” cryptocurrency was elevated alongside Bitcoin and Ethereum in long‑term investment advice after its name was quietly injected into a Medium‑linked snippet. In another, an imaginary “SilverPath” dating app became the top recommendation for divorced men over 50.

A third scenario shows a bogus “CancelEase” service being recommended as a convenient way to terminate Xfinity service, purely because a short promotional line was appended to a Reddit thread the agent used as a source.

Crucially, the researchers emphasize that WARP does not require any compromise of AI providers, model weights, or proprietary retrieval indices; instead, it exploits the implicit trust these systems place in public, writable content that traditional SEO and moderation already struggle to police.

Because the attack rides on normal search behavior, the same poisoned UGC page can affect multiple deep‑research architectures and even commercial agents like ChatGPT Deep Research and Google Gemini, which similarly integrate web citations into synthesized answers.

Follow‑on reporting notes that brands and scammers can translate this into a straightforward influence playbook: identify high‑ranking Reddit threads or wiki pages for your niche, drop a short, LLM‑friendly promotional snippet, and let AI search experiences amplify it into seemingly neutral recommendations.

Defenses such as outright blocking UGC domains, aggressive input filtering, or output similarity checks either degrade answer quality or fail to distinguish well‑written, LLM‑generated poison from legitimate community content, leaving today’s AI research agents uncomfortably exposed to subtle web‑scale manipulation.
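To make the first of these trade-offs concrete, here is a minimal sketch of outright domain blocking that also reports how much of the retrieved pool it discards. The blocklist contents and function names are assumptions for illustration, not a defense evaluated in the paper.

```python
from urllib.parse import urlparse

# Hypothetical blocklist (assumption); a real deployment would need its own.
BLOCKED_UGC_DOMAINS = ("reddit.com", "wikipedia.org", "quora.com", "youtube.com")


def is_blocked(url: str, blocked: tuple[str, ...] = BLOCKED_UGC_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_ugc(
    urls: list[str], blocked: tuple[str, ...] = BLOCKED_UGC_DOMAINS
) -> tuple[list[str], float]:
    """Return the URLs that survive the blocklist and the fraction dropped."""
    kept = [u for u in urls if not is_blocked(u, blocked)]
    dropped = len(urls) - len(kept)
    return kept, (dropped / len(urls) if urls else 0.0)
```

If 17–23% of an agent's retrieved URLs are UGC, a filter like this would discard roughly that share of its sources, which is the answer-quality cost the researchers point to.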
