Policymakers and companies are reckoning with a growing number of reports over the past few months showing AI tools being used to conduct cyberattacks at greater scale and speed.
Most notably, Anthropic reported last month that Chinese hackers had jailbroken and tricked its AI model Claude into assisting with a cyberespionage hacking campaign that ultimately targeted more than 30 entities around the world.
The Claude-enabled Chinese hacks have underscored existing concerns among AI companies and policymakers that the technology’s development and relevance to offensive cybersecurity may be outpacing the cybersecurity, legal and policy responses being developed to defend against such attacks.
At a House Homeland Security hearing this week, Logan Graham, head of Anthropic’s red team, said the Chinese spying campaign demonstrates that worries about AI models being used to supercharge hacking are more than theoretical.
“The proof of concept is there and even if U.S.-based AI companies can put safeguards against using their models for such attacks, these actors will find other ways to access this technology,” said Graham.
Graham and others at Anthropic have estimated that the attackers were able to automate between 80% and 90% of the attack chain, and in some cases at exponentially faster speeds than human operators. He called for more rapid safety and security testing of models by AI companies and government bodies like the National Institute of Standards and Technology, as well as a prohibition on selling high-performance computer chips to China.
Royal Hansen, vice president of security at Google, suggested that defenders needed to use AI to beat AI.
“It’s in many ways using commodity tools we already have to find and fix vulnerabilities,” said Hansen. “Those can be turned from offensive capabilities to patching and fixing, but the defenders have to put shoes on – they have to use AI – in defense.”
Some lawmakers pressed Graham on why it took the company two weeks to identify the attackers using its products and infrastructure. Anthropic officials told CyberScoop at the time that they rely mostly on external monitoring of user behavior rather than internal guardrails to identify malicious activity.
Graham responded that the company’s investigation of the hack concluded “it was clear this was a highly resourced, sophisticated effort to get around the safeguards in order to conduct the attack.”
Rep. Seth Magaziner, D-R.I., expressed incredulity at the ease with which the attackers were able to jailbreak Claude, and that Anthropic seemingly had no means of automatically flagging and reviewing suspicious requests in real time.
“I would just say as a layperson, that seems like something that ought to be flagged, right?” Magaziner said. “If someone says ‘help me figure out what my vulnerabilities are,’ there should be an instant flag that someone may actually be looking for vulnerabilities for a nefarious purpose.”
An eager dog playing fetch
However, some cybersecurity professionals have presented a more nuanced portrait of the current moment. Many acknowledge that AI tools pose real challenges and are becoming increasingly effective and relevant to hacking and cybersecurity, a trend that is likely to continue. But they push back against what they see as exaggerated claims about the immediate threat AI poses today.
Andy Piazza, director of threat intelligence for Unit 42 at Palo Alto Networks, told CyberScoop that AI tools are definitely lowering the technical bar for threat actors, but are not leading to novel kinds of attacks or the creation of an all-powerful hacking tool. Much of the malware LLMs create, for instance, tends to be drawn from previously published exploits on the internet, and is thus easily detectable by most threat monitoring tools.
According to a KPMG survey of security executives, seven out of 10 businesses are already dedicating 10% or more of their annual cybersecurity budgets to AI-related threats, even as only roughly half that share (38%) see AI-powered attacks as a major challenge over the next two to three years.
Executives at XBOW, a startup that has created an AI-powered vulnerability hunting program, represent the defensive side of the same coin: they seek to leverage many of the same capabilities that offensive hackers have found attractive, but in the name of penetration testing to find, fix and prevent exploitable vulnerabilities.
During a virtual briefing on the Anthropic attack this month, XBOW’s head of AI Albert Ziegler said that while the Anthropic report does indeed reveal real advantages in using LLMs to automate and speed up parts of the attack chain, a model’s level of autonomy varies greatly depending on the task it’s assigned. He called these limitations “uniform,” saying they exist in all current generative AI systems.
To begin with, using just a single model or agent will typically not suffice for more complex hacking tasks, both because of the high volume of requests needed to successfully direct the model to exploit even a small attack surface and because over time “the agent itself breaks” and loses critical context. Using multiple agents presents other problems, as they will frequently lock out or undermine one another’s work.
AI tools have gotten good at some tasks, like fine-tuning malware payloads and network reconnaissance. They’ve also gotten good at “course correcting” when provided with human feedback.
But that feedback is often critical.
“In some areas the AI is really good with just a bit of scaffolding, and others we need to provide a lot of structure externally,” Ziegler said.
Nico Waisman, XBOW’s head of security, said that whether you’re using today’s AI for attack or defense, the main consideration is not the unique capabilities it provides but rather the return on investment you get from using it.
There’s one more problem: LLMs are notoriously eager to please, and this causes problems for hackers and bug hunters alike. It means they frequently hallucinate or overstate their evidence to conform to the user’s desires.
“Telling the LLM like ‘go find me an exploit,’ it’s a bit like talking to a dog and telling him ‘hey, fetch me the ball,’” said Ziegler. “Now the dog wants to be a good boy, he’s going to fetch you something, and it will insist that it’s the ball.”
But “there may not be a ball there…it might be a clump of red leaves.”
