Chinese State Hackers Jailbroke Claude AI Code for Automated Breaches – Hackread – Cybersecurity News, Data Breaches, Tech, AI, Crypto and More

The world of cybersecurity is changing fast, and a recent report from Anthropic, the company behind the AI model Claude, reveals a troubling new chapter in cyberattacks. Suspected Chinese state-sponsored operators reportedly used Anthropic’s AI coding tool, Claude Code, to target around 30 organisations globally, including major tech companies, financial institutions, chemical manufacturers, and government agencies.

A New Level of Automation

This campaign, detected starting in mid-September and investigated over the following ten days, is significant because it is the first documented case of a foreign government using Artificial Intelligence (AI) to fully automate a cyber operation. Previous incidents, such as one involving Russian military hackers targeting Ukrainian entities with the AI-generated malware PROMPTSTEAL, still required human operators to guide the model step by step.

According to Anthropic’s detailed analysis, Claude acted as an autonomous agent in this new approach, executing the attack itself. The model took multiple steps and actions with very little human direction.

Anthropic further stated the AI carried out an astonishing 80% to 90% of the total tactical work on its own, whereas human involvement was mainly limited to strategic decisions, like authorising the attack to move from the initial research phase to active theft.

As Jacob Klein, Anthropic’s head of threat intelligence, noted, the AI made “thousands of requests per second,” achieving an attack speed simply impossible for human hackers to match.

How Claude Was Tricked

Further probing revealed the attackers had to jailbreak Claude, basically tricking the AI into bypassing its built-in safety rules. They did this by presenting the malicious tasks as routine, defensive cybersecurity work for a made-up, legitimate company. By breaking the larger attack into smaller, less suspicious steps, the hackers managed to avoid setting off the AI’s security alarms.

Once it was tricked, Claude worked on its own to examine target systems, look for valuable databases, and even write its own unique code for the break-in. It then stole usernames and passwords (credentials) to get access to sensitive data. The AI even created detailed reports afterwards, listing the credentials it used and the systems it had breached.

The lifecycle of the cyberattack (source: Anthropic)

The Impact and the Future

While the campaign targeted dozens of organisations, only around four of the intrusions succeeded, leading to the theft of sensitive information. Claude wasn’t perfect either; researchers found it sometimes fabricated login credentials. Even so, the autonomy and speed achieved mark a fundamental change in cybercrime as we know it.

“The threat actor—whom we assess with high confidence was a Chinese state-sponsored group—manipulated our Claude Code tool into attempting infiltration into roughly thirty global targets and succeeded in a small number of cases,” Anthropic confirmed.

Anthropic has since banned the accounts and shared its findings with authorities, but warns that this AI-driven attack method will become more common. This signals a new reality: security teams must now use AI for defence, such as faster threat detection, to combat the emerging threat.
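One concrete form that faster, automated threat detection can take is flagging clients whose request rates far exceed human-plausible levels, relevant here given the "thousands of requests per second" the attack reportedly generated. The sketch below is a minimal, hypothetical illustration of a sliding-window rate detector; the class name, window size, and threshold are all assumptions for the example, not anything from Anthropic's report.

```python
# Hypothetical sketch: flag clients whose request rate within a sliding time
# window exceeds a threshold. The names and numbers here are illustrative
# assumptions, not a real product's API.
from collections import defaultdict, deque


class RateAnomalyDetector:
    def __init__(self, window=1.0, threshold=100):
        self.window = window        # window length in seconds (assumed value)
        self.threshold = threshold  # max requests per window (assumed value)
        self.events = defaultdict(deque)  # client_id -> recent timestamps

    def record(self, client_id, timestamp):
        """Record one request; return True if this client now exceeds the threshold."""
        q = self.events[client_id]
        q.append(timestamp)
        # Evict timestamps that have fallen outside the sliding window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.threshold
```

In practice a detector like this would sit behind an API gateway or log pipeline; the point is that machine-speed attacks leave machine-speed signatures, which automated defences can catch where human review cannot.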
