
Anthropic releases Claude Fable 5 with built-in safeguards that deliberately degrade its performance on frontier LLM research tasks

Story Overview

Anthropic just dropped Claude Fable 5, the widely available version of its new Mythos-class model, complete with two layers of safeguards that let the company ship frontier-level capabilities to the public while curbing risks around recursive self-improvement and competing model training. One layer routes certain dangerous queries to an older model and notifies users; the other quietly dials down performance on pretraining pipelines, distributed training, and ML accelerator design through prompt tweaks and steering vectors.

Original post
elie @eliebakouch

important clarification!

elie @eliebakouch

glad anthropic walked this back and will now tell users when capabilities are nerfed

my biggest concern was hiding this from the user and the paranoia it would have created. i still think part of that will remain as people realize that even as a good actor you won't always have access to the best model, and this is the reason open models and open research are critical

@drfeifei, @sriramk and many others say it much better than me, but i consider it very important for our civilization that good faith researchers get access to the best AI, and that at least part of this research happens in the open and not only inside a few closed labs (not talking only about ai research here)

going forward, i REALLY hope that anthropic (and other labs) will be transparent when they nerf a model in certain fields, whether it's at inference time (~PEFT/steering, previous safeguard) or at training time (training against, mythos vs fable)

i also hope we will see more work and transparency on evaluating models' capabilities to do ai research, both autonomy and raw capabilities. right now this is very light even in anthropic and oai system cards. you can't treat this as a first-class risk and only report weak evals to the public. we also need strong third-party actors here

12:20 AM · Jun 11, 2026 · 652 Views
Developer Impact

The curbs stay almost invisible to normal users

The invisible safeguards target only a narrow slice of frontier research tasks and trigger on roughly 0.03 percent of traffic, leaving the vast majority of coding, science, and long-horizon work untouched. Users get no indication when the steering is active, so the experience stays seamless for everyday work, including ambitious projects.

Data Retention

Broader access comes with new data rules

Fable 5 is live now on the API, on Claude.ai paid plans, and on major cloud platforms at roughly double the price of Opus 4.8. However, data from every Mythos-class session must be retained for 30 days, with no zero-retention option. The company says the retained data is used only for safety monitoring.

Posts from X
Will Knight @willknight

Surely one or two labs should not dominate AI. It also shouldn’t be a technological race to decide who gets to keep the world safe (or not).

elie @eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is, on purpose, not visible to the user is crazy

1h · Views 137 · Likes 3 · Bookmarks 0