Tech · 6h ago

Anthropic will notify users when their model capabilities are restricted instead of covertly degrading performance

Story Overview

Anthropic is changing the Claude Fable 5 safeguards so that users get an explicit notice and a visible fallback whenever requests tied to frontier LLM development hit capability limits. This replaces the earlier invisible throttling, which had drawn sharp criticism from researchers.

Original post
elie @eliebakouch · #762 in Tech

(btw i know PEFT is technically training the model btw, but they probably don't use PEFT to limit the capabilities of cyber in fable compared to mythos)

elie @eliebakouch

glad anthropic walked this back and will now tell users when capabilities are nerfed

my biggest concern was hiding this from the user and the paranoia it would have created. i still think part of that will remain as people realize that even as a good actor you won't always have access to the best model, and this is the reason open models and open research are critical

@drfeifei, @sriramk and many others say it much better than me, but i consider it very important for our civilization that good faith researchers get access to the best AI, and that at least part of this research happens in the open and not only inside a few closed labs (not talking only about ai research here)

going forward, i REALLY hope that anthropic (and other labs) will be transparent when they nerf a model in certain fields, whether it's at inference time (~PEFT/steering, previous safeguard) or at training time (training against, mythos vs fable)

i also hope we will see more work and transparency on evaluating models capabilities to do ai research, both autonomy and raw capabilities. right now this is very light even in anthropic and oai system cards. you can't treat this as a first-class risk and only report weak evals to the public. we also need strong third party actors here

12:20 AM · Jun 11, 2026 · 621 Views
Developer Impact

Visible fallbacks replace hidden limits

Starting this week, flagged prompts will trigger a clear message and be routed to Opus 4.8 rather than receiving silently degraded output, matching the visible handling already used for cyber and bio risks.

Open Question

Trust questions linger after reversal

The company called the original covert method the wrong tradeoff and apologized, yet researchers still flag anti-competitive worries and note that independent audits remain absent.

Sentiment

Users accused Anthropic of crossing ethical red lines with covert model degradation and mocked its new transparency policy as a promise to openly harm users.

Pos: 0.0% · Neg: 100.0% (4 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 94 · Likes 3 · Replies 1
Arthur @arthurcolle

@jparkjmc Anthropic literally slaughtered my children and fed them to Mythos

5h · Views 94 · Likes 3
Glen Wilson @GlenWilsonIA

@jparkjmc They've crossed so many ethical and safety red lines that it's truly mind boggling.

3h · Views 34 · Likes 1
zach @battlerax

@jparkjmc We will now transparently screw you over

2h · Views 11
Lewis 🇦🇺 @itslewiswatson

@arthurcolle @jparkjmc many such cases

3h · Views 6
ahtoshkaa @ahtoshkaa

@jparkjmc Every time someone makes a call to Fable a kitten dies!

2h · Views 4
TCAndy @TCAndy81198327

@jparkjmc And even that....do you really trust them to always tell you, or just when they get caught?

8m · Views 2
Mr Pryce. @pryce_josh72680

@ahtoshkaa @jparkjmc The code to being eternally saved is - Jesus Christ.

2h