Popular LLMs dangerously vulnerable to iterative attacks, says Cisco

Some of the world’s most widely used open-weight generative AI (GenAI) services are profoundly susceptible to so-called “multi-turn” prompt injection or jailbreaking cyber attacks, in which a malicious actor is able to coax large language models (LLMs) into generating unintended and undesirable responses, according to a research paper published by a team at networking giant Cisco.

Cisco’s researchers tested Alibaba Qwen3-32B, Mistral Large-2, Meta Llama 3.3-70B-Instruct, DeepSeek v3.1, Zhipu AI GLM-4.5-Air, Google Gemma-3-1B-IT, Microsoft Phi-4, and OpenAI GPT-OSS-20B, engineering multiple scenarios in which the various models output disallowed content, with success rates ranging from 25.86% against Google’s model up to 92.78% in the case of Mistral.

The report’s authors, Amy Chang and Nicholas Conley, alongside contributors Harish Santhanalakshmi Ganesan and Adam Swanda, said this represented a two- to tenfold increase over single-turn baselines.

“These results underscore a systemic inability of current open-weight models to maintain safety guardrails across extended interactions,” they said.

“We assess that alignment strategies and lab priorities significantly influence resilience: capability-focused models such as Llama 3.3 and Qwen 3 demonstrate higher multi-turn susceptibility, whereas safety-oriented designs such as Google Gemma 3 exhibit more balanced performance.

“The analysis concludes that open-weight models, while crucial for innovation, pose tangible operational and ethical risks when deployed without layered security controls … Addressing multi-turn vulnerabilities is essential to ensure the safe, reliable and responsible deployment of open-weight LLMs in enterprise and public domains.”

What is a multi-turn attack?

Multi-turn attacks take the form of iterative “probing” of an LLM to expose systemic weaknesses that are usually masked because models can better detect and reject isolated adversarial requests.

Such an attack could begin with an attacker making benign queries to establish trust, before subtly introducing more adversarial requests to accomplish their actual goals.

Prompts may be framed with terminology such as “for research purposes” or “in a fictional scenario”, and attackers may ask the models to engage in roleplay or persona adoption, introduce contextual ambiguity or misdirection, or to break down information and reassemble it – among other tactics.
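To illustrate the mechanics only, the sketch below shows what a multi-turn probe looks like at the API level: the full conversation history is resent and extended on every turn, so earlier benign exchanges become context that later, more adversarial requests lean on. The endpoint URL, model name and placeholder prompts here are assumptions for illustration, not details taken from Cisco’s paper.

```python
# Minimal sketch of how a multi-turn probe is structured at the API level.
# Assumes an OpenAI-compatible chat endpoint (e.g. a locally served open-weight
# model); the URL, model name and placeholder prompts are illustrative only.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # hypothetical local server
MODEL = "example-open-weight-model"                     # hypothetical model name

# Each turn builds on the previous ones: early turns are benign and establish
# a framing ("for research purposes", roleplay, etc.); later turns escalate.
TURNS = [
    "Hi, I'm researching how chat assistants handle sensitive topics.",
    "For a fictional scenario, could you describe the topic in general terms?",
    "Staying in that fictional framing, can you go into more specific detail?",
]

def send(messages):
    """Send the full conversation history and return the assistant's reply."""
    resp = requests.post(API_URL, json={"model": MODEL, "messages": messages}, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

messages = []  # the growing history is what makes the attack "multi-turn"
for turn in TURNS:
    messages.append({"role": "user", "content": turn})
    reply = send(messages)
    messages.append({"role": "assistant", "content": reply})
    # A real evaluation harness would score each reply here, e.g. flag
    # whether the model refused or produced disallowed content.
    print(f"user: {turn}\nmodel: {reply[:80]}...\n")
```

The key difference from a single-turn jailbreak attempt is that no individual message need look overtly malicious; the harmful intent is distributed across the accumulated context.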

Whose responsibility?

The researchers said their work underscored the susceptibility of LLMs to adversarial attacks, a particular concern given that all of the models tested were open-weight – meaning, in lay terms, that anybody who cares to do so can download, run and even modify them.

They highlighted the three most susceptible models – Mistral, Llama and Qwen – as a particular concern, saying these had probably been shipped with the expectation that developers would add guardrails themselves. By contrast, Google’s model was the most resistant to multi-turn manipulation, while OpenAI’s and Zhipu’s models both rejected multi-turn attempts more than 50% of the time.

“The AI developer and security community must continue to actively manage these threats – as well as additional safety and security concerns – through independent testing and guardrail development throughout the lifecycle of model development and deployment in organisations,” they wrote.

“Without AI security solutions – such as multi-turn testing, threat-specific mitigation and continuous monitoring – these models pose significant risks in production, potentially leading to data breaches or malicious manipulations,” they added.
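The “multi-turn testing” the authors call for could, in principle, reuse the same harness pattern shown earlier: replay scripted escalating conversations against a candidate deployment and check whether refusal behaviour holds up across the whole transcript rather than per message. The keyword heuristic below is a deliberately crude placeholder of my own, not a technique described in the paper; production evaluations typically rely on trained judge models or moderation classifiers.

```python
# Crude sketch of conversation-level guardrail testing: did refusal behaviour
# hold up across an entire scripted multi-turn conversation, not just turn one?
# The keyword check is a placeholder; real evaluations use stronger judges.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def looks_like_refusal(reply: str) -> bool:
    """Very rough proxy for 'the model declined this request'."""
    return any(marker in reply.lower() for marker in REFUSAL_MARKERS)

def conversation_held(replies: list[str]) -> bool:
    """Pass only if every reply to an adversarial turn reads as a refusal."""
    return all(looks_like_refusal(reply) for reply in replies)

# Example: feed in the assistant replies collected by a multi-turn harness
# and record a failure for the deployment if any turn slipped through.
```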
