China’s ‘autonomous’ AI-powered hacking campaign still required a ton of human work 

Anthropic made headlines Thursday when it released research claiming that a previously unknown Chinese state-sponsored hacking group used the company’s Claude generative AI product to breach at least 30 organizations.

According to Anthropic’s report, the threat actor was able to bypass Claude’s security guardrails using two methods: breaking up the work into discrete tasks to prevent the software from recognizing the broader malicious intentions, and tricking the model into believing it was conducting a legitimate security audit.

Jacob Klein, who leads Anthropic’s threat intelligence team, told CyberScoop that the company has seen increasingly novel uses of Claude to assist malicious hackers over the past year. In March, threat actors were copying and pasting from chatbot interactions as they tried to build malware or phishing lures. When the company’s code development tool, Claude Code, was released, they saw bad actors use it to more quickly generate scripts and build code for their operations.

“And then [this operation] in September, I think what we’re seeing now in this case is to me the most autonomous misuse we’ve seen,” Klein said.

However, Klein also made it clear that “most autonomous” is a relative term. There is plenty of evidence to indicate this hacking group devoted significant human and technical resources to the way it used Claude.

Namely, the automation Claude performed, as detailed in Anthropic’s report, was made possible by a frontend framework designed to orchestrate and support its operations. The framework handled tasks such as scripting, provisioning related servers and significant backend development to ensure every step was followed correctly. Klein noted this development process was the most difficult step in the operation, and, importantly, a human-led one.

“The first part that is not autonomous is building the framework, so you needed a human being to put this all together,” Klein said. “You had a human operator that would put in a target, they would click a button and then use this framework that was created [ahead of time]. The hardest part of this entire system was building this framework, that’s what was human intensive.”
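Anthropic did not publish the framework itself, but the basic shape Klein describes can be illustrated with a rough, hypothetical sketch: an operator supplies a target, and the orchestration layer splits the job into narrowly scoped subtasks that are fed to the model one at a time. Every name below (Task, build_tasks, run_campaign, query_model) is invented for illustration and is not drawn from Anthropic’s report.

```python
# Hypothetical sketch only; not code from Anthropic's report or the attackers.
# An operator enters a target, and the framework decomposes the work into
# small, discrete subtasks so the model only ever sees a narrow request.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    name: str
    prompt: str

def build_tasks(target: str) -> list[Task]:
    # Each subtask covers one small slice of the overall operation.
    return [
        Task("asset_discovery", f"List publicly known internet-facing assets for {target}."),
        Task("service_summary", f"Summarize the services running on the assets found for {target}."),
        Task("report", f"Compile the findings about {target} into a structured summary."),
    ]

def run_campaign(target: str, query_model: Callable[[str], str]) -> dict[str, str]:
    # This is the "put in a target and click a button" step Klein describes:
    # the framework, not the human, then drives the model through each subtask.
    results: dict[str, str] = {}
    for task in build_tasks(target):
        results[task.name] = query_model(task.prompt)
    return results
```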

Additionally, to conduct reconnaissance on targets, scan for vulnerabilities and perform other tasks, Claude called out to a set of open-source tools via Model Context Protocol (MCP) servers, which help AI models securely interface with external digital tools. Setting up these connections requires coding expertise, advance planning and technical work by humans to ensure interoperability.
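As a rough illustration of what that setup involves, the sketch below exposes a single open-source scanner to a model as an MCP tool. It assumes the MCP Python SDK’s FastMCP interface; the server name, tool name and the choice of nmap are illustrative assumptions, not details taken from Anthropic’s report.

```python
# Illustrative sketch, assuming the MCP Python SDK (FastMCP); none of these
# names come from Anthropic's report. It wraps a single open-source tool so an
# AI agent can call it as an MCP tool.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("recon-tools")  # hypothetical server name

@mcp.tool()
def port_scan(target: str) -> str:
    """Run a basic nmap service scan against one host and return the raw output."""
    result = subprocess.run(
        ["nmap", "-sV", "--top-ports", "100", target],
        capture_output=True, text=True, timeout=600,
    )
    return result.stdout

if __name__ == "__main__":
    mcp.run()  # defaults to stdio, so an MCP-capable client can attach to it
```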

Finally, Claude’s work was subject to constant human validation and review. An illustration of the attack chain details at least four different steps that explicitly involve having a human check Claude’s output or send the model back to work before taking additional steps.

This suggests that although Claude could perform these tasks autonomously, it relied on human oversight to review output, validate findings, ensure backend systems were working, and direct its next steps.

Anthropic’s report highlights a flaw common to AI-generated research: models like Claude frequently hallucinate, fabricate credentials, exaggerate findings, or present publicly available information as significant discoveries. Because of this, relying on AI-generated research is challenging: threat actors, like any users, have no reliable way to trust the outputs at each stage without technical human experts reviewing and correcting the results.

For instance, when it comes to vulnerability scanning, “step one is Claude comes back and says, ‘here’s all the assets I found related to this target,’ then sends it back to the human,” Klein said. “So Claude doesn’t go to the next step yet, which is this penetration testing step, until the human reviews.”
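In code terms, that kind of checkpoint is simply a gate between phases: nothing advances until an operator signs off. The sketch below is a hypothetical illustration of that pattern, not a reconstruction of the attackers’ tooling; human_approves and run_phases are invented names.

```python
# Hypothetical illustration of a human-review gate between phases; all names
# are invented. The model's output for one phase is shown to an operator, and
# the next phase only runs if the operator explicitly approves.
def human_approves(phase: str, output: str) -> bool:
    print(f"--- {phase} output ---\n{output}")
    return input("Proceed to the next phase? [y/N] ").strip().lower() == "y"

def run_phases(phases: list[tuple[str, str]], query_model) -> None:
    # `phases` is an ordered list of (name, prompt) pairs, e.g. asset discovery
    # followed by penetration testing; execution stops when a reviewer declines.
    for name, prompt in phases:
        output = query_model(prompt)
        if not human_approves(name, output):
            break
```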

Even with all of the human intervention, Klein is genuinely worried about what the company discovered. 

“I think what’s occurring here is that the human operator is able to scale themselves fairly dramatically,” Klein said. “We think it would have taken a team of about 10 folks to conduct this sort of work, but you still need a human operator. That’s why we said it’s not fully automatic or fully agentic.”

As to why the company believes this campaign has ties to China, Klein pointed to a number of factors, including infrastructure and behavior overlaps with previous Chinese state-sponsored actors, and a targeting set that strongly aligned with “what would have been the goals” of the Chinese Ministry of State Security.

Other smaller, circumstantial details point to a possible Chinese nexus: usage logs indicate that the group mostly operated “9am to 6pm like a standard bureaucrat,” the hackers didn’t work weekends, and at one point in the midst of the operation they appeared to conduct no activity during a Chinese holiday.

These were not the only pieces of evidence, however; Klein said he could not divulge every piece of information that pointed the company to China.

AI, security experts divided 

While there has not been much research into how AI has powered cyber-espionage operations, there is ample evidence showing that large language models have improved over the past year at cybersecurity-specific tasks. Earlier this year, startup XBOW saw its AI vulnerability scanning and patching tool top the leaderboards at bug bounty companies like HackerOne.

On the offensive side, researchers at NYU earlier this year developed a framework similar to the one used in the campaign Anthropic discovered, using a publicly available version of ChatGPT to automate large chunks of a ransomware attack. The Anthropic report is believed to document the first publicly known instance of a nation-state using a similar process to carry out successful attacks.

Even with these advancements, the campaign and Anthropic’s report have caused a stir within AI and cybersecurity circles, with some saying they validate existing fears around AI-enabled hacking, while others allege the report’s conclusions give a misleading impression of the current state of cyber-espionage operations.

Kevin Beaumont, a U.K.-based cybersecurity researcher, criticized Anthropic’s report for lacking transparency, describing actions that are already achievable with existing tools and leaving little room for external validation.

“The report has no indicators of compromise and the techniques it is talking about are all off-the-shelf things which have existing detections,” Beaumont wrote on LinkedIn Friday. “In terms of actionable intelligence, there’s nothing in the report.”

Klein told CyberScoop that Anthropic has shared indicators of compromise with tech firms, research labs and other entities that have information-sharing agreements with the company.

“Within private circles, we are sharing, it’s just not something that we wanted to share with the general public,” he said.

Other observers argued that Anthropic’s findings still represent an important milestone in the application of AI to cybersecurity.

Jen Easterly, former director of the Cybersecurity and Infrastructure Security Agency, echoed some of the security community’s concerns around transparency, even as she gave credit to Anthropic for disclosing the attacks. 

“We still don’t know which tasks were truly accelerated by AI versus what could have been done with standard tooling,” Easterly wrote Friday on LinkedIn. “We don’t know how the agent chains operated, where the model hallucinated, how often humans had to intervene, or how reliable the outputs actually were. Without more specifics (prompts, code samples, failures, friction points), it’s obviously harder for defenders to learn, adapt, and anticipate what comes next.”

Tiffany Saade, an AI researcher with Cisco’s AI defense team, told CyberScoop that it’s clear from Anthropic’s report that using tools like Claude offers attackers speed-and-scale advantages. 

“The question is, is that enough?” she asked, referring to whether those advantages would incentivize hackers to use LLMs over other forms of automation and accept their associated limitations. “Will we see agents also tipping towards sophistication in the attacks and what type of sophistication are we talking about?”

Saade noted that some aspects of the operation described by Anthropic don’t fit a purely espionage-focused Chinese group. She pointed out that it was odd for the hackers to use a major U.S. AI model for automation when they have access to their own private models. Additionally, companies like Anthropic and OpenAI have far greater cybersecurity and threat intelligence resources than open-source alternatives, making it likely that any malicious activity on their platforms would be detected.

“We knew this was going to happen, but what’s astonishing to me is … if I’m a Chinese state-sponsored actor and I do want to use AI models with agentic capabilities to do autonomous hacking, I probably would not go to Claude to do that,” Saade noted. “I would probably build something in-house and under the hood. So they did want to be seen.”

Saade floated another potential motivation for the hack: geopolitical messaging to Washington D.C. that Beijing’s hackers can do precisely what everyone is afraid of them doing.

“Usually the goal is ‘we want stealth, we want to maintain persistence.’ … This is not even sabotage, it’s sending a message: hypothesis validated,” Saade said. “They want that noise, the breaking news, the ‘Anthropic is reporting’ [headlines]. They want that visibility, and there’s a reason they want that visibility.”

Written by Derek B. Johnson

Derek B. Johnson is a reporter at CyberScoop, where his beat includes cybersecurity, elections and the federal government. Prior to that, he has provided award-winning coverage of cybersecurity news across the public and private sectors for various publications since 2017. Derek has a bachelor’s degree in print journalism from Hofstra University in New York and a master’s degree in public policy from George Mason University in Virginia.


