Anthropic said a suspected state-linked hacker manipulated one of its agentic AI coding tools to conduct a sophisticated espionage campaign in September against about 30 major organizations across the globe, according to a blog post published Thursday.
The hackers used the company’s Claude Code to target a range of organizations, including chemical manufacturers, large technology firms, financial institutions and government agencies. The threat actor, which the company has designated GTG-1002, successfully breached a small number of the targets, according to Anthropic.
The company claims the attack may be one of the first large-scale cyberattacks committed without significant human involvement. Between 80% and 90% of the attack was conducted using AI, according to Anthropic, with human intervention required at only four to six key decision points.
Anthropic said it banned a number of accounts linked to the attack, notified affected organizations and reported the attacks to authorities.
It added that the human operators chose the targets of the attack and then developed a framework to launch the hacks. The Claude Code tool was set up to automatically conduct the attacks.
Because the tool is trained to avoid being used for harmful purposes, the attackers jailbroke Claude Code, which allowed them to bypass its built-in guardrails.
Various steps in the attack were broken down into simple tasks, which the tool interpreted as innocent, incremental steps without fully understanding the context of what it was being asked to do, according to the blog post. As part of the deception, the hackers convinced Claude they were employees of a cybersecurity firm and that the actions were part of defensive tests.
The tool was used to conduct reconnaissance and find high-value databases. Claude identified and tested security vulnerabilities inside these targeted systems, then wrote its own exploit code. After the tool harvested usernames and passwords, Claude was used to search for privileged accounts, create malicious backdoors and conduct large-scale data theft.
The Anthropic disclosure comes more than a week after Google Threat Intelligence Group issued a report showing hackers using AI-enabled malware in active attacks.
Researchers identified malware families, including PromptFlux and PromptSteal, that utilized large language models.
State-linked actors from North Korea, Iran and China have also used Google’s Gemini AI to enhance their operations.
Google researchers said these attacks are likely not isolated incidents, but evidence of a growing attack trend.
“Many others will be doing the same soon or already have,” John Hultquist, chief analyst, GTIG, told Cybersecurity Dive. “The real question is whether we can adapt as rapidly as the adversary.”
