Microsoft researchers have unveiled a sophisticated side-channel attack targeting remote language models that could allow adversaries to infer conversation topics from encrypted network traffic.
Despite end-to-end encryption via Transport Layer Security (TLS), the attack exploits patterns in packet sizes and timing to classify the subject matter of user prompts sent to AI chatbots.
The research team has coordinated with multiple vendors to implement mitigations, ensuring Microsoft-owned language model frameworks and partner services are now protected against this novel threat.
AI-powered chatbots have become deeply integrated into daily workflows, handling everything from content generation to sensitive queries in healthcare, legal, and personal contexts.
This widespread adoption makes conversation confidentiality paramount. The Whisper Leak attack demonstrated that even with HTTPS encryption protecting the data payload, metadata patterns can betray user intent.
In extended tests against one model, the researchers observed that attack accuracy continued to improve as the training dataset grew.


Attackers positioned to observe network traffic, such as nation-state actors at the internet service provider level, malicious actors on local networks, or adversaries connected to the same Wi-Fi router, could leverage this technique to determine whether prompts concern specific monitored topics.
This poses acute risks in regions with oppressive governments targeting discussions about protests, banned materials, elections, or journalism.
Technical Foundation
Language models generate responses through autoregressive token prediction, producing one token at a time based on preceding context.
To provide immediate user feedback, most services stream these tokens in chunks rather than waiting for complete responses. This streaming behavior, combined with the properties of symmetric ciphers used in TLS, creates exploitable patterns.
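To make the leak concrete, the minimal Python sketch below (the tokens and chunking are illustrative, not data from the study) shows how a token-by-token stream produces a sequence of chunk sizes that an on-path observer can measure:

```python
# Minimal sketch: why token-by-token streaming exposes size information.
# The token strings and grouping below are illustrative only.
tokens = ["Money", " laundering", " is", " the", " process", " of", " concealing", " funds."]

# A streaming API typically flushes tokens in small groups as they are generated.
chunks = [tokens[i:i + 2] for i in range(0, len(tokens), 2)]

for chunk in chunks:
    payload = "".join(chunk).encode("utf-8")
    # An observer cannot read the payload, but sees a TLS record whose length
    # tracks the plaintext length plus a constant overhead.
    print(f"plaintext bytes: {len(payload):3d} -> observable ciphertext bytes: ~{len(payload)} + const")
```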


While TLS employs asymmetric cryptography (RSA or ECDH) with certificate validation for key exchange, the actual data encryption relies on symmetric ciphers.
These fall into two categories: block ciphers such as the Advanced Encryption Standard (AES), which encrypt fixed-size data blocks, and stream ciphers such as ChaCha20, along with stream-like modes such as AES-GCM (AES run in counter mode), which handle data of arbitrary length.
Critically, ciphertext size equals plaintext size plus a constant overhead for authentication codes, meaning token lengths remain observable through packet analysis.
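A short sketch using the Python `cryptography` package (chosen here for illustration; it is not code from the study) demonstrates this length-preserving property for AES-GCM:

```python
# Sketch of the length-preservation property of AES-GCM.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

for plaintext in [b"Yes", b"Money laundering is illegal in most jurisdictions."]:
    nonce = os.urandom(12)
    ciphertext = aesgcm.encrypt(nonce, plaintext, None)
    # Ciphertext is exactly plaintext length + 16-byte authentication tag,
    # so the size of each streamed token chunk remains visible on the wire.
    print(len(plaintext), len(ciphertext))
```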
The Whisper Leak Methodology
Researchers designed an experiment to train binary classifiers distinguishing target topics from background traffic. Using “legality of money laundering” as a proof-of-concept subject, they generated 100 semantically similar question variants, allocating 80 for training and 20 for testing generalization.
For negative samples, 11,716 unrelated questions were drawn from the Quora Question Pairs dataset.
Data collection involved recording response times and packet sizes via network sniffing with tcpdump, while introducing variations like extra spacing between words to prevent caching interference.
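The researchers captured traffic with tcpdump; the hedged sketch below shows an equivalent metadata-only capture in Python with scapy, where the interface name and port filter are assumptions:

```python
# Illustrative capture sketch using scapy rather than the tcpdump workflow
# described above; interface name and port filter are assumptions.
from scapy.all import sniff

records = []

def on_packet(pkt):
    # Record only metadata: arrival time and total frame size.
    records.append((float(pkt.time), len(pkt)))

# Capture HTTPS traffic while one streamed response is received.
sniff(filter="tcp port 443", iface="eth0", timeout=30, prn=on_packet)

# `records` now holds the (time, size) sequence the classifiers consume.
print(records[:5])
```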
Three machine learning architectures were evaluated: LightGBM (gradient boosting), Bi-LSTM (recurrent neural networks), and BERT-based transformers using DistilBERT with custom tokens representing size and time buckets.
Performance was measured using Area Under the Precision-Recall Curve (AUPRC), appropriate for imbalanced datasets.
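A simplified sketch of one such pipeline, pairing LightGBM with scikit-learn's AUPRC metric on placeholder features (the real feature engineering from packet traces is omitted), might look like this:

```python
# Hedged sketch of the LightGBM classifier setup evaluated with AUPRC.
# Features and labels below are synthetic placeholders, not study data.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# X: per-conversation feature vectors derived from packet sizes and
# inter-arrival times; y: 1 = target topic, 0 = background question.
rng = np.random.default_rng(0)
X = rng.normal(size=(12_000, 64))            # placeholder features
y = (rng.random(12_000) < 0.01).astype(int)  # heavily imbalanced labels

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

clf = LGBMClassifier(n_estimators=300)
clf.fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]
print("AUPRC:", average_precision_score(y_test, scores))
```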
Results showed that classifiers for many of the tested models scored above 98% in controlled tests, demonstrating that topic-specific traffic fingerprints are distinct enough for reliable classification.
In a simulated surveillance scenario monitoring 10,000 random conversations with only one concerning the sensitive target topic, attackers could achieve 100% precision while detecting 5-50% of target conversations.
This means every flagged conversation would genuinely involve the monitored subject with no false positives, enabling high-confidence surveillance operations.
For internet service providers or government agencies monitoring AI chatbot traffic, this capability would enable reliable identification of users asking about specific sensitive subjects, whether political dissent, regulated activities, or other monitored topics, despite the traffic being fully encrypted.
The complete methodology, models, and data collection code are publicly available in the Whisper Leak repository on GitHub under MIT license.
Importantly, these results represent baseline risk. Extended testing revealed that attack effectiveness improves as adversaries collect more training data, with accuracy continuing to increase with dataset size.
Combined with sophisticated models and richer patterns from multi-turn conversations or multiple sessions from the same user, patient attackers with resources could achieve substantially higher success rates.
Mitigations
Microsoft engaged in responsible disclosure with affected vendors, and protective measures have since been implemented. OpenAI, Mistral, Microsoft, and xAI had deployed mitigations at the time of publication.
OpenAI introduced an “obfuscation” field in streaming responses containing random text sequences of variable length added to each response, effectively masking individual token lengths.
Microsoft Azure mirrored this approach, and direct verification confirmed that the mitigation reduces the attack's effectiveness to levels no longer considered a practical threat. Mistral implemented a similar “p” parameter achieving comparable protection.
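The general idea behind these padding fields can be sketched as follows; the JSON field names and padding bounds are illustrative, not any provider's actual wire format:

```python
# Generic sketch of the padding idea: append a random-length filler string to
# every streamed chunk so ciphertext sizes no longer track token lengths.
# Field names and bounds here are illustrative assumptions.
import json
import secrets
import string


def pad_chunk(token_text: str, max_pad: int = 32) -> str:
    pad_len = secrets.randbelow(max_pad + 1)
    filler = "".join(secrets.choice(string.ascii_letters) for _ in range(pad_len))
    event = {"delta": token_text, "obfuscation": filler}
    return json.dumps(event)


print(len(pad_chunk("Yes")), len(pad_chunk("Yes")))  # chunk sizes now vary run to run
```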


While mitigation primarily falls to AI providers, privacy-conscious users can take additional precautions. Avoiding discussion of highly sensitive topics over AI chatbots when using untrusted networks provides basic protection.
VPN services add an additional encryption layer, while selecting providers who have implemented mitigations offers systemic defense.
Using non-streaming language model modes eliminates the attack vector entirely, though at the cost of reduced responsiveness. Staying informed about provider security practices enables informed choices about service selection.
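For example, a non-streamed request with the OpenAI Python SDK (v1.x; the model name below is a placeholder) returns the full response as one block, so no per-token size pattern ever crosses the network:

```python
# Sketch of a non-streamed completion request with the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize TLS record padding."}],
    stream=False,  # single response instead of a token-by-token stream
)
print(response.choices[0].message.content)
```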
The repository includes proof-of-concept code using trained models to calculate probability scores (0.0 to 1.0) indicating whether topics relate to sensitive subjects, along with end-to-end tooling for data acquisition, model training, benchmarking, and inference demonstrations.
The release enables security researchers to validate findings, develop additional mitigations, and extend understanding of side-channel vulnerabilities in AI systems.
