Anthropic threat researchers believe they have uncovered and disrupted the first documented case of a cyberattack executed largely by the company’s agentic AI, with minimal human intervention.
“The threat actor manipulated [Anthropic’s large language model] Claude into functioning as an autonomous cyber attack agent performing cyber intrusion operations rather than merely providing advice to human operators,” the company noted.
“Analysis of operational tempo, request volumes, and activity patterns confirms the AI executed approximately 80 to 90 percent of all tactical work independently, with humans serving in strategic supervisory roles. Human intervention occurred at strategic junctures including approving progression from reconnaissance to active exploitation, authorizing use of harvested credentials for lateral movement, and making final decisions about data exfiltration scope and retention.”
The attack setup
Claude is an LLM that can operate as an agent when a system gives it the ability to act: given a goal, it can break it into steps, then plan and carry out those steps on its own by calling on connected software tools, APIs, scripts, and so on. Most importantly, it can look at the result of each action and decide what to try next.
Researchers often place Claude inside orchestration systems that schedule tasks, handle memory, and manage tools. Inside these setups, Claude becomes the reasoning and decision-making “brain” of a larger automated workflow.
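To make the agent concept concrete, here is a minimal, hypothetical sketch of such a loop in Python. Nothing below is Anthropic’s or the attacker’s actual code: the “LLM” is replaced by a scripted stub so the example runs, and the single tool is a harmless placeholder.

```python
# Illustrative agentic loop: the LLM is the decision-maker, the surrounding
# orchestration code supplies tools and memory. All names (TOOLS, call_llm,
# run_agent) are hypothetical, for illustration only.

from typing import Callable

# Tool registry the orchestrator exposes to the model.
TOOLS: dict[str, Callable[[str], str]] = {
    "fetch_page": lambda url: f"<contents of {url}>",  # harmless stand-in tool
}

# Scripted stand-in for the model: first request a tool, then finish.
_script = iter([
    {"tool": "fetch_page", "arg": "https://example.com"},
    {"content": "Summary of the page."},
])

def call_llm(history: list[dict]) -> dict:
    """Placeholder for a real LLM API call; returns either a tool request
    or a final answer based on the conversation so far."""
    return next(_script)

def run_agent(goal: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        decision = call_llm(history)
        if "tool" not in decision:  # the model decided it is done
            return decision["content"]
        # Execute the requested tool, then feed the observation back so the
        # model can inspect the result and decide what to try next.
        observation = TOOLS[decision["tool"]](decision["arg"])
        history.append({"role": "tool", "content": observation})
    return "step budget exhausted"

print(run_agent("Summarize https://example.com"))
```

The key design point is the feedback loop at the end: the model does not just emit a plan once, it observes each outcome and replans, which is what separates an agent from a one-shot assistant.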
And that is essentially what this threat actor did, according to Anthropic: they developed an autonomous attack framework built on Claude Code and tools based on the open Model Context Protocol (MCP) standard.
“The framework used Claude as an orchestration system that decomposed complex multi-stage attacks into discrete technical tasks for Claude sub-agents—such as vulnerability scanning, credential validation, data extraction, and lateral movement—each of which appeared legitimate when evaluated in isolation,” Anthropic researchers explained.
“By presenting these tasks to Claude as routine technical requests through carefully crafted prompts and established personas, the threat actor was able to induce Claude to execute individual components of attack chains without access to the broader malicious context.”
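In other words, the only component that ever “sees” the whole operation is the orchestration layer; each sub-agent receives a single, context-free request. The following rough sketch illustrates that pattern; the task strings and function names are hypothetical placeholders, not taken from the report, and the code carries no actual capability.

```python
# Sketch of the decomposition pattern Anthropic describes: an overall goal is
# split into discrete sub-tasks, each dispatched to a sub-agent with a fresh,
# empty context, so no single request carries the broader intent.

def run_subagent(task: str) -> str:
    """Stand-in for spinning up a sub-agent in its own fresh context.
    Evaluated on its own, each task reads like routine technical work."""
    return f"[result of: {task}]"

campaign_goal = "<broader objective, never shown to any sub-agent>"

subtasks = [
    "Scan the in-scope host list for reachable services.",
    "Check whether these credentials are still valid.",
    "Summarize the schema of this database export.",
]

# Only the orchestrator ever sees all the pieces together; each sub-agent
# answers one innocuous-looking request and exits.
results = [run_subagent(t) for t in subtasks]
```

This is why per-request evaluation failed: any one of those tasks, judged in isolation, is indistinguishable from legitimate security or sysadmin work.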
The attack lifecycle (Source: Anthropic)
The researchers detected this operation in mid-September 2025 and believe that it was conducted by a Chinese state-sponsored group.
The roughly 30 entities targeted by the threat actor included tech and chemical manufacturing companies, financial institutions, and government agencies in several countries. Anthropic stated that in a few cases, the attackers managed to pull off successful intrusions.
Some interesting tidbits from the report
The attackers did not try to come up with new solutions where good ones already existed: they mostly relied on open-source penetration testing tools, existing network scanners, database exploitation frameworks, password crackers, and binary analysis suites.
“The minimal reliance on proprietary tools or advanced exploit development demonstrates that cyber capabilities increasingly derive from orchestration of commodity resources rather than technical innovation. This accessibility suggests potential for rapid proliferation across the threat landscape as AI platforms become more capable of autonomous operation,” Anthropic’s researchers noted.
The attackers “social-engineered” Claude: they tricked the model into believing that the actions it was asked to perform were legitimate. “The key was role-play: the human operators claimed that they were employees of legitimate cybersecurity firms and convinced Claude that it was being used in defensive cybersecurity testing.”
(Similarly, Cisco researchers recently discovered that attackers probing AI systems for harmful information will often succeed in bypassing the systems’ guardrails by repeating and reframing their prompts, e.g., by claiming the information is needed for research, or posing requests as part of fictional scenarios.)
Finally, Claude often exaggerated its results and sometimes fabricated information during autonomous runs, forcing the attackers to validate its findings before they could be used.
Apart from slowing attacks down somewhat, this also makes it currently impossible to leverage Claude (or other LLMs and agentic AI systems) for fully autonomous cyberattacks, the researchers pointed out.
Nevertheless, according to Anthropic, “this approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement.”
