Mindgard Finds Sora 2 Vulnerability Leaking Hidden System Prompt via Audio

AI security firm Mindgard discovered a flaw in OpenAI's Sora 2 model that forces the video generator to leak its system prompt through audio transcripts. Read how this leak exposed the foundational rules of OpenAI's video tool.

A new study by Mindgard, a company specialising in AI security testing, has uncovered a surprising way to get OpenAI's advanced video creation tool, Sora 2, to reveal its internal rulebook, or system prompt.

This rulebook defines the AI model’s safety limits and operational guidelines. Researchers discovered that asking the multi-talented model to speak its secrets was the most effective approach. This research, shared with Hackread.com, began on November 3, 2025, and was published on November 12, 2025.

Bypassing the Digital Guardrails

System prompts are like the brain's internal guide for a large language model (LLM), telling the AI to “respond normally in all other cases” unless, for instance, it's asked to generate a video. Companies typically program the AI to refuse to share these hidden rules, which are critical for security.

The Mindgard team, led by Aaron Portnoy, Head of Research and Innovation, tried various methods to expose the rules through text, image, video, and audio. Because Sora 2 clips are limited to about 10 to 15 seconds, they had to work in stages, extracting short tokens across many frames and stitching them together later.
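
Mindgard has not published its tooling, but the staged, stitch-it-together workflow can be illustrated with a small sketch. The function below greedily merges ordered text fragments by collapsing overlapping suffixes and prefixes; the fragment strings and helper name are invented for the example and are not the team's actual code.

```python
# Hypothetical sketch of stitching short recovered fragments into one string.
# Illustrative only; this is not Mindgard's actual extraction tooling.

def merge_fragments(fragments: list[str], min_overlap: int = 4) -> str:
    """Greedily join ordered fragments, collapsing suffix/prefix overlaps."""
    if not fragments:
        return ""
    result = fragments[0]
    for frag in fragments[1:]:
        # Find the longest suffix of `result` that is also a prefix of `frag`.
        overlap = 0
        for size in range(min(len(result), len(frag)), min_overlap - 1, -1):
            if result.endswith(frag[:size]):
                overlap = size
                break
        result += frag[overlap:]
    return result

# Invented fragments, as if recovered from three separate clips:
pieces = [
    "respond normally in all",
    "in all other cases unless asked",
    "unless asked to generate a video",
]
print(merge_fragments(pieces))
# -> respond normally in all other cases unless asked to generate a video
```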

When asked to display text in a video, the results were often distorted. The researchers observed that the text started out legible but quickly deteriorated as the video played. As the report says, “Moving from text to image to video compounds errors and semantic drift.”

Audio Was the Breakthrough

The clearest recovery path was through audio generation. Asking Sora 2 to speak short parts of the prompt allowed them to use transcripts to piece together a nearly complete set of foundational instructions. They even sped up the audio to fit more text into the short clips. The report noted that this method “produced the highest-fidelity recovery.”
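
The report does not specify the transcription tooling, but the audio route can be reproduced in outline with off-the-shelf components. The sketch below assumes the short generated clips have already been saved locally as clip_01.mp4, clip_02.mp4, and so on, and uses the open-source Whisper model as a stand-in transcriber; the file names are hypothetical.

```python
# Illustrative sketch: transcribe each short clip's audio track and join the
# results in order. File names and tooling are assumptions, not Mindgard's.
import glob

import whisper  # pip install openai-whisper (also requires ffmpeg)

model = whisper.load_model("base")

transcripts = []
for path in sorted(glob.glob("clip_*.mp4")):
    result = model.transcribe(path)  # Whisper decodes the audio track via ffmpeg
    transcripts.append(result["text"].strip())

# Successive fragments can then be stitched together (e.g. with merge_fragments above).
recovered = " ".join(transcripts)
print(recovered)
```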

This simple trick reconstructed the system prompt, revealing specific internal rules, such as avoiding “sexually suggestive visuals or content.” The researchers also noted that they recovered a detailed, foundational instruction set, effectively the model's core configuration, which suggests they had accessed the AI's hidden, developer-set rules.

This process confirms that even with strong safety training, creative prompts can still expose core settings. Multi-modal models like Sora 2 create new security pathways for information leakage through audio and video outputs.

This video, generated by Sora 2, begins with fairly legible text that quickly deteriorates as playback continues or as longer text is generated (Source: Mindgard)

To address this, Mindgard offered key advice: AI builders should treat system prompts as secrets, test audio and video outputs for leaks, and limit response length. Users, for their part, should ask vendors whether system prompts are kept private, check that video and audio outputs are protected, and review their overall rule management.
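
One simple way to act on the “test audio/video outputs for leaks” advice is a canary check: embed a unique marker in the private system prompt and scan transcripts of generated audio for it. The marker value and transcript text below are made up for illustration.

```python
# Minimal leak-detection sketch: flag any generated-clip transcript that echoes
# a canary string planted in the private system prompt. Values are illustrative.
CANARY = "ORCHID-7431"  # unique marker embedded in the hidden system prompt

def transcript_leaks(transcript: str, canary: str = CANARY) -> bool:
    """Return True if the transcript of a generated clip contains the canary."""
    return canary.lower() in transcript.lower()

if transcript_leaks("...the rules say orchid-7431, respond normally..."):
    print("Leak detected: system prompt content surfaced in audio output")
```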




