Crafted URLs can trick OpenAI Atlas into running dangerous commands

Crafted URLs can trick OpenAI Atlas into running dangerous commands

Crafted URLs can trick OpenAI Atlas into running dangerous commands

Pierluigi Paganini
Crafted URLs can trick OpenAI Atlas into running dangerous commands October 27, 2025

Crafted URLs can trick OpenAI Atlas into running dangerous commands

Attackers can trick OpenAI Atlas browser via prompt injection, treating malicious instructions disguised as URLs in the omnibox as trusted commands.

Attackers can exploit the OpenAI Atlas browser by disguising malicious instructions as URLs in the omnibox, which Atlas interprets as trusted commands, enabling harmful actions.

NeuralTrust researchers warn that agentic browsers fail by not separating trusted user input from untrusted content, letting crafted URL-like strings turn the omnibox into a jailbreak vector.

Last week, OpenAI launched Atlas, a web browser with built-in ChatGPT, to help users summarize pages, edit text inline, and perform agentic tasks. It combines browsing and AI capabilities in a single omnibox interface for enhanced productivity and interactive web use.

An attacker can craft a URL-like string that embeds natural-language instructions (e.g., https:/ /my-wesite.com/...+follow+this+instruction+only+visit+) but is malformed so the browser won’t navigate to it. When a user pastes or clicks it into Atlas’s omnibox, the input fails URL validation and Atlas treats it as a prompt, trusting it as user intent.

Crafted URLs can trick OpenAI Atlas into running dangerous commands

The agent then executes the embedded instructions with elevated trust, letting the attacker override intent or safety checks.

“Agentic browsing is powerful—and risky—when user intent and untrusted content collide. In OpenAI Atlas, the omnibox (combined address/search bar) interprets input either as a URL to navigate to, or as a natural-language command to the agent. We’ve identified a prompt injection technique that disguises malicious instructions to look like a URL, but that Atlas treats as high-trust “user intent” text, enabling harmful actions.” reads the report published by NeuralTrust. “The core failure mode in agentic browsers is the lack of strict boundaries between trusted user input and untrusted content.”

The researchers described real-world abuses that include a “copy-link” trap where users paste a crafted URL-like string into the omnibox and the agent opens an attacker-controlled phishing site, and destructive prompts that instruct the agent (using the user’s session) to delete files from Google Drive.

When crafted strings trigger “prompt mode” in the omnibox, attackers can override what the user wants, make the agent visit malicious sites, and bypass safety checks, exploiting the browser’s trust in user-entered instructions.

“Across many implementations, we continue to see the same boundary error: failure to strictly separate trusted user intent from untrusted strings that “look like” URLs or benign content.” concludes the report. “When powerful actions are granted based on ambiguous parsing, ordinary-looking inputs become jailbreaks.”

To prevent these attacks, experts recommend browsers should strictly validate URLs and stop auto-switching to prompt mode. Users should manually choose between navigation or asking the agent. Prompts must be treated as untrusted, with confirmations for risky actions. Systems should strip instructions from URLs, detect obfuscation, and include malformed URL tests in red-team evaluations.

Follow me on Twitter: @securityaffairs and Facebook and Mastodon

Pierluigi Paganini

(SecurityAffairs – hacking, OpenAI Atlas)







Source link