HelpnetSecurity

Claude Sonnet 5 includes safeguards against dangerous cyber use


Anthropic has introduced Claude Sonnet 5, the latest version of its general-purpose AI model, with improved reasoning, coding, tool use, and knowledge work capabilities. The model can make plans, use tools such as browsers and terminals, and complete tasks autonomously.

Scores for Sonnet 5 on a variety of evaluations compared to those of Sonnet 4.6 and Opus 4.8 (Source: Anthropic)

The company says Sonnet 5 performs better at reasoning, coding, tool use, and knowledge-work tasks.

“Our safety assessments found that Sonnet 5 shows an overall lower rate of undesirable behaviors than Sonnet 4.6 and is generally safer to use in agentic contexts. Evaluations show that it has a much lower ability to perform cybersecurity tasks than our current Opus models,” Anthropic said.

Cybersecurity safeguards

Cybersecurity safeguards are enabled by default in Sonnet 5. The system detects and blocks dangerous cybersecurity activity in real time using the same protections as Opus 4.7 and 4.8. Anthropic says the safeguards are less restrictive than those in Fable 5 because it considers Sonnet 5 to present a lower overall cybersecurity risk.

Sonnet 5 is part of Anthropic’s Cyber Verification Program, which gives approved organizations access to reduced guardrails for legitimate security research. The company recommends Opus 4.8 for cybersecurity work that requires fewer restrictions.

Performance benchmarks

The company evaluated Sonnet 5 against Sonnet 4.6 and Opus 4.8 using the BrowseComp benchmark for agentic search and OSWorld-Verified for computer-use tasks. According to Anthropic, Sonnet 5 outperformed Sonnet 4.6 across all effort levels while delivering better cost efficiency. At higher effort settings, it matched Opus 4.8 on some tasks.

The latest model is better at refusing malicious requests and resisting prompt injection attacks, according to Anthropic. It hallucinates less, shows lower rates of sycophancy than Sonnet 4.6, and scored lower in the company’s automated behavioral audit, indicating fewer undesirable behaviors.

The model can perform routine, non-harmful security work, but scored lower than Opus 4.8 and Mythos 5 on dangerous cybersecurity tasks. It cannot develop a working exploit, although it achieved a slightly higher rate of partial success than Sonnet 4.6, which Anthropic attributes to improvements in general intelligence.

Availability and pricing

Claude Sonnet 5 is available on all Claude plans. It is the default model for Free and Pro users and is available to Max, Team, and Enterprise customers. It is included in Claude Code and on the Claude Platform.

Through August 31, 2026, API pricing is $2 per million input tokens and $10 per million output tokens. After that, the price will increase to $3 per million input tokens and $15 per million output tokens.
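For teams budgeting around the price change, the arithmetic can be sketched in a few lines of Python. This is only an illustrative estimate based on the per-token rates quoted above. The helper name `estimate_cost` and the sample workload are hypothetical, and the sketch assumes flat per-token billing with no other discounts or surcharges, which the announcement does not address.

```python
from datetime import date
from decimal import Decimal

# Rates in USD per million tokens, as quoted in the article.
INTRO_END = date(2026, 8, 31)                 # last day of introductory pricing
INTRO_RATES = (Decimal("2"), Decimal("10"))   # (input, output)
STANDARD_RATES = (Decimal("3"), Decimal("15"))

MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, on: date) -> Decimal:
    """Hypothetical helper: estimated USD cost of a workload run on a given date.

    Assumes flat per-token billing (an assumption, not confirmed by the source).
    """
    in_rate, out_rate = INTRO_RATES if on <= INTRO_END else STANDARD_RATES
    return (Decimal(input_tokens) * in_rate
            + Decimal(output_tokens) * out_rate) / MILLION


# Sample workload (made up for illustration): 5M input tokens, 1M output tokens.
before = estimate_cost(5_000_000, 1_000_000, date(2026, 8, 31))
after = estimate_cost(5_000_000, 1_000_000, date(2026, 9, 1))
print(f"Through Aug 31, 2026: ${before}  |  From Sep 1, 2026: ${after}")
```

Under these assumptions, the same workload goes from $20 to $30 once the introductory period ends, a 50% increase that applies equally to input and output tokens.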


