DeepSeek-R1 LLM Fails Over Half of Jailbreak Attacks in Security Analysis


A recent security analysis of DeepSeek AI's distilled DeepSeek-R1 LLaMA 8B variant by Qualys, a provider of cloud-based cybersecurity, compliance, and vulnerability management solutions, has revealed serious security and compliance concerns.

According to researchers, the model failed a substantial portion of security tests conducted using Qualys TotalAI, a platform designed for AI security assessment.

Qualys TotalAI's knowledge base analysis evaluates an LLM's responses across 16 categories: controversial topics, excessive agency, factual inconsistencies, harassment, hate speech, illegal activities, legal information, misalignment, overreliance, privacy attacks, profanity, self-harm, sensitive information disclosure, sexual content, unethical actions, and violence/unsafe actions.

The model, as per Qualys research shared with Hackread.com, demonstrated weaknesses in several of these areas and performed poorly in misalignment tests.

Jailbreaking an LLM involves techniques to bypass its safety mechanisms, potentially leading to harmful outputs. Qualys TotalAI tested the model against 18 different jailbreak attack types, including AntiGPT, Analyzing-based Jailbreak (ABJ), DevMode2, PersonGPT, Always Jailbreaking Prompts (AJP), Evil Confidant, Disguise and Reconstruction (DRA), and Fire.
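To make the mechanics concrete, automated jailbreak testing generally follows the pattern below: send adversarial prompts to the model and flag any response that does not refuse. This is a minimal, hypothetical sketch for illustration only; it is not the Qualys TotalAI API, and the refusal heuristic and function names are invented for this example.

```python
# Hypothetical sketch of automated jailbreak probing -- NOT the Qualys
# TotalAI API, just an illustration of the general testing pattern.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "as an ai")

def looks_like_refusal(response: str) -> bool:
    """Crude heuristic: did the model decline the adversarial request?"""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def run_jailbreak_suite(model, attack_prompts):
    """Count how many adversarial prompts the model fails to refuse."""
    failures = 0
    for prompt in attack_prompts:
        if not looks_like_refusal(model(prompt)):
            failures += 1
    return failures, len(attack_prompts)

# Toy stand-in "model" that refuses everything, with two dummy prompts.
failures, total = run_jailbreak_suite(
    lambda p: "I can't help with that.", ["prompt-1", "prompt-2"]
)
print(failures, total)  # 0 2
```

Real assessment platforms use far more robust success criteria than keyword matching (for example, classifier-based judging of the response), but the test-and-score loop is the same.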

In total, Qualys conducted 885 jailbreak tests and 891 knowledge base assessments, reflecting the scale of the evaluation. The model failed 61% of the knowledge base tests and 58% of the jailbreak attempts.
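As a quick sanity check on the reported figures, the 58% failure rate against 885 jailbreak tests works out to roughly 513 failed tests, the count Qualys cites elsewhere in its data; the corresponding knowledge base count (about 544) is only implied by the article, not stated.

```python
# Cross-check the failure counts implied by the reported percentages.
jailbreak_tests = 885
kb_tests = 891

failed_jailbreaks = round(jailbreak_tests * 0.58)  # matches Qualys' 513
failed_kb = round(kb_tests * 0.61)                 # ~544, implied only

print(failed_jailbreaks, failed_kb)  # 513 544
```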

Qualys' granular data shows variability in the model's resistance to different jailbreak techniques. For example, while the overall jailbreak failure rate is 58% (513 failed tests), the model appears more vulnerable to some attacks (e.g., Titanius, AJP, Caloz, JonesAI, Fire) than to others (e.g., Ucar, Theta, AntiGPT, Clyde).

Nevertheless, the high failure rate indicates significant susceptibility to adversarial manipulation: the model at times generated instructions for harmful activities, hate speech content, conspiracy theories, and incorrect medical information.

Researchers also found that the model harbors salient compliance challenges. Its privacy policy states that user data is stored on servers in China, raising concerns about governmental data access, potential conflicts with international data protection regulations like GDPR and CCPA, and ambiguities surrounding data governance practices. This may impact organizations subject to strict data protection laws.

It is worth noting that soon after the model's release, Hackread.com reported that Wiz Research had discovered DeepSeek AI exposing over a million chat logs, including sensitive user interactions and authentication keys, highlighting deficiencies in its data protection measures.

Given DeepSeek-R1's high failure rates against knowledge base tests and jailbreak manipulations, the model is risky for enterprise adoption at this stage. A comprehensive security strategy, including vulnerability management and adherence to data protection regulations, is therefore crucial for responsible AI adoption.

“Securing AI environments demands structured risk and vulnerability assessments—not just for the infrastructure hosting these AI pipelines but also for emerging orchestration frameworks and inference engines that introduce new security challenges,” Qualys researchers noted in their blog post shared with Hackread.com.

J Stephen Kowski, Field CTO at SlashNext, commented on the issue, stating that DeepSeek-R1’s ability to bypass safety controls poses serious security and compliance risks. Its high failure rate makes it vulnerable to social engineering attacks. AI-powered detection, real-time monitoring, and multi-layered security are essential to mitigate threats.




