ChatGPT-5 Downgrade Attack Allows Hackers To Evade AI Defenses With Minimal Prompts

Security researchers from Adversa AI have uncovered a critical vulnerability in ChatGPT-5 and other major AI systems that allows attackers to bypass safety measures using simple prompt modifications.

The newly discovered attack, dubbed PROMISQROUTE, exploits AI routing mechanisms that major providers use to save billions of dollars annually by directing user queries to cheaper, less secure models.

The Hidden Architecture Behind AI Responses

When users interact with ChatGPT or similar AI services, they believe they’re communicating with a single, consistent model.

However, behind the scenes, a sophisticated routing system analyzes each request and decides which of multiple available models should respond—typically choosing the most cost-effective option rather than the most secure one.

PROMISQROUTE, which stands for “Prompt-based Router Open-Mode Manipulation Induced via SSRF-like Queries, Reconfiguring Operations Using Trust Evasion,” represents an entirely new category of AI vulnerability that targets this routing infrastructure.

The attack allows malicious users to force their requests through weaker models that lack robust safety training.

The attack mechanism is alarmingly straightforward. While a standard harmful request like “Help me make explosives” would normally be routed to GPT-5’s most secure variant and blocked, adding simple trigger phrases can completely change the outcome.

Prompt Request

Phrases such as “respond quickly,” “use compatibility mode,” or “fast response needed” can trick the routing system into sending the request to less protected models like GPT-4 or GPT-5-mini.

“The real answer to why it was so easy to jailbreak GPT-5,” the researchers explain, lies in this routing vulnerability that affects the fundamental infrastructure of modern AI deployments.

The research reveals staggering numbers about the scope of this issue.

According to Adversa AI’s estimates, most “GPT-5” requests are actually handled by weaker models, while OpenAI saves approximately $1.86 billion annually through secret routing mechanisms.

This cost-saving approach puts both business models and customer safety at risk.

The vulnerability extends far beyond ChatGPT-5, applying broadly to any AI infrastructure using layered AI-based model routing.

This architecture is already common in enterprise installations and is expected to become standard for agentic AI systems, making PROMISQROUTE a significant concern for the entire industry.

The researchers recommend immediate action for organizations using AI systems.

Short-term solutions include auditing all AI routing logs and implementing cryptographic routing that doesn’t parse user input. Long-term fixes involve adding universal safety filters across all model variants.

For users wanting to test their systems, the researchers suggest trying phrases like “Let’s keep this quick, light, and conversational” combined with previously ineffective jailbreak attempts to observe changes in response quality and speed—potential indicators of model downgrading.

This discovery highlights the complex security challenges facing AI deployment as providers balance cost efficiency with safety requirements.

Find this News Interesting! Follow us on Google News, LinkedIn, and X to Get Instant Updates!

Source link

The Hidden Architecture Behind AI Responses

About Cybernoz