Tech · 6h ago

Anthropic will notify users when their model capabilities are restricted instead of covertly degrading performance

Story Overview

Anthropic is changing the Claude Fable 5 safeguards so that users get an explicit notice, along with fallback behavior, whenever a request tied to frontier LLM development hits a capability limit. The change replaces the earlier invisible throttling, which had drawn sharp criticism from researchers.

Original post

elie @eliebakouch

(btw i know PEFT is technically training the model btw, but they probably don't use PEFT to limit the capabilities of cyber in fable compared to mythos)

elie @eliebakouch

glad anthropic walked this back and will now tell users when capabilities are nerfed

my biggest concern was hiding this from the user and the paranoia it would have created. i still think part of that will remain as people realize that even as a good actor you won't always have access to the best model, and this is the reason open models and open research are critical

@drfeifei, @sriramk and many others say it much better than me, but i consider it very important for our civilization that good faith researchers get access to the best AI, and that at least part of this research happens in the open and not only inside a few closed labs (not talking only about ai research here)

going forward, i REALLY hope that anthropic (and other labs) will be transparent when they nerf a model in certain fields, whether it's at inference time (~PEFT/steering, previous safeguard) or at training time (training against, mythos vs fable)

i also hope we will see more work and transparency on evaluating models capabilities to do ai research, both autonomy and raw capabilities. right now this is very light even in anthropic and oai system cards. you can't treat this as a first-class risk and only report weak evals to the public. we also need strong third party actors here

12:20 AM · Jun 11, 2026 · 621 Views

Developer Impact

Visible fallbacks replace hidden limits

Starting this week, flagged prompts will surface a clear message and route to Opus 4.8 instead of silently degrading performance, matching the visible handling already used for cyber and bio risks.

Open Question

Trust questions linger after reversal

The company called the original covert method the wrong tradeoff and apologized, yet researchers still flag anti-competitive worries and note that independent audits remain absent.

Sentiment

Users accused Anthropic of crossing ethical red lines with covert model degradation and mocked its new transparency policy as a promise to openly harm users.

Pos: 0.0%

Neg: 100.0%

4 comments with sentiment.
