National Cyber Warfare Foundation (NCWF)

Claude Fable 5 offers Mythos-level performance for most tasks with safeguards on sensitive topics. Anthropic claims testing found no universal jailbreaks. Whether that actually holds up in practice is harder to predict.

The post Anthropic’s new model is Mythos on a leash appeared first on CyberScoop.

Earlier this year, Anthropic executives said that their new AI model, Claude Mythos, had such powerful capabilities for harm that they would not release it publicly.

On Tuesday, the company said it was making an altered version of Mythos available to the public, promising “new guardrails” that thwart the model’s best-in-class performance in hacking and bioweapons research.

Anthropic said Claude Fable 5 was the “same underlying model” as Mythos, but its responses for certain topics like cybersecurity and biology will be drawn from a previous Claude Opus model that is already public.

“Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage,” the company said in a draft blog sent to CyberScoop ahead of the announcement. “We’ve therefore launched the model with safeguards that route queries on a narrow set of topics to our next-most-capable model, Claude Opus 4.8.”

Anthropic also said they subjected Fable 5 to both internal and external red team testing for common model vulnerabilities, like jailbreaking. Anthropic said these tests identified no known “universal” jailbreaking techniques, but does not specify if partial jailbreaking techniques were discovered.

The company is betting that won’t change when Fable 5 is made available to the broader public, but it’s worth noting that cybersecurity researchers have consistently found ways to jailbreak older AI models.

“The uplift from Mythos-level capabilities is valuable to many adversaries—for instance, those who could financially gain from cyberattacks—and we therefore expect them to be motivated to try to circumvent our safety measures,” the company wrote.

Anthropic is changing its data retention policies for Fable and Mythos models, keeping all user traffic for 30 days on both its own platforms and third-party services. This 30-day window aligns with a White House executive order that created a voluntary framework for AI companies to share frontier models with the government before public release. The company says the retained data won’t be used to train new Claude models or for “any non-safety-related-purpose.”

Most organizations are still deciding whether to adopt AI into their IT and cybersecurity ecosystem. But models like Mythos can scan for vulnerabilities, chain together exploits, and steal data from a victim network in minutes. Automation in hacking existed before AI, but experts have said frontier models like Mythos OpenAI’s Daybreak can allow even low-level cybercriminals to wreak havoc.

While Anthropic cited its commitment to developing safe and secure AI in its reasons for not publicly releasing Mythos, many organizations have been clamoring for access, and its enhanced cybersecurity functions in cybersecurity and other areas have been the subject of congressional hearings, national security papers and White House executive orders.

Releasing a limited version of the model in Fable 5 represents an attempt to split the difference between those two desires. Anthropic said it would release follow up benchmarks and assets for the model.

So what can Fable 5 do?

Anthropic said it’s possible the restrictions built into Fable will make it harder for the model to fulfill both malicious and legitimate user requests.

“Because we have prioritized safety, we’ve deliberately tuned the safeguards to be cautious, and they are still stricter than would be ideal—for example, sometimes benign requests will trigger our classifiers,” the company wrote. “We recognize that this will be frustrating to some users, and our aim is to reduce false positives as we update and refine the safeguards after launch.”

If Fable 5 draws its cybersecurity and biology answers entirely from Claude Opus 4.8, it will still provide users with impressive – though not unique – dual use cybersecurity capabilities.

According to the system card published for Opus 4.8, the model is a slight improvement on previous models like 4.7 in the realm of cybersecurity but was “generally much less capable than Mythos Preview.”

Opus 4.8 was tested on its ability to write complete end-to-end exploits and build exploit primitives that provide attackers with the ability to execute arbitrary code. It averaged a score just 5 out of 16 in proficiency, compared to Mythos Preview which scored closer to 10.

Without safety guardrails in place, Opus 4.8 can still reproduce nearly 80% of previously discovered vulnerabilities in real open-source software projects when given a high level description of the weakness. The system card said Anthropic’s unspecified safeguards whittle this success rate down to 1%.

Another test assessing Opus’ ability to develop exploits for the popular Firefox browser found that, again without guardrails, the model could identify a full working exploit 8.8% of the time and a partial working exploit 68.8% of the time.

The company also said that members of Project Glasswing – a consortium of public and private businesses given access to a preview version of Mythos – will be able to upgrade to the latest full model, Claude Mythos 5, to continue their work. Access to Mythos 5 will be expanded over time “through a more systematic trusted-access program” including federal agencies.

The post Anthropic’s new model is Mythos on a leash appeared first on CyberScoop.

Source: CyberScoop
Source Link: https://cyberscoop.com/anthropic-claude-fable-5-release-mythos-guardrails/

Anthropic’s new model is Mythos on a leash

So what can Fable 5 do?

Comments