CISOOnline

Anthropic releases Mythos-class Fable 5 model with safeguards for cyber risks

According to Dianne Penn, Anthropic’s head of product management, research, and labs, the goal was to make Mythos-level intelligence broadly available without exposing users to the risks that previously kept the technology restricted. “We wanted to be able to provide this level of intelligence for general users in a safe manner,” Penn told The Wall Street Journal.

Safeguards may be broader than Anthropic suggests

When Anthropic released Mythos in April, it argued that the model’s capabilities in areas such as vulnerability discovery and offensive cybersecurity created risks that justified restricting access to around 50 recipients. Just a week ago, Anthropic announced it was expanding Mythos access to 150 organizations.

Now Anthropic says it has developed safeguards robust enough to support a broader release. Those safeguards work by routing certain categories of requests — including cybersecurity, biology, chemistry, and model-distillation-related queries — to the less capable Claude Opus 4.8. Anthropic says these fallbacks occur in fewer than 5% of sessions, meaning most users will effectively interact with the full Mythos-class model during ordinary use.

Early testing by security researchers suggests the cyber safeguards may be broader than Anthropic’s description implies. Rob T. Lee, chief AI officer and chief of research at SANS Institute, tells CSO that his routine cybersecurity tasks involving incident response, detection, and basic forensic workflows were automatically routed from Fable 5 to Opus 4.8 during his initial testing. If those observations hold up under wider testing, they could indicate that Anthropic’s classifiers flag cybersecurity-related requests in general rather than attempting to distinguish between benign and malicious cyber activity.


