Anthropic rolls back a policy that covertly degraded Claude Fable 5 performance for frontier AI researchers
Story Overview
Anthropic just reversed course on a hidden performance throttle in its newly released Claude Fable 5 model after frontier researchers complained the undisclosed limits felt like sabotage. The change follows the June 9 launch of the first public Mythos-class model and targets requests tied to building competing frontier systems.
Visible fallbacks now replace secret throttling
Flagged queries will route to the older Opus 4.8 model with an explicit reason returned in the response, matching the handling already used for cyber and bio risks. Server-side rollout starts in the coming days.
The right balance on enforcement remains unsettled
Anthropic apologized for the original tradeoff and said it wants safeguards to be transparent rather than covert, yet it is still unclear how broadly the new visible checks will apply or how quickly they will catch up to evolving research techniques.
Supportive users praise Anthropic's quick reversal of the policy limiting Claude for AI researchers as a sign that it admits mistakes and corrects course, while critics see it as insufficient PR that leaves the guardrails intact.
Most Activity
That was quick: Anthropic reversed a controversial policy that would have secretly degraded Claude Fable 5 for users doing frontier AI research after backlash from researchers who saw it as covert sabotage of competing AI development.
https://www.wired.com/story/anthropic-responds-to-backlash-on-claudes-secret-sabotage-on-ai-research/

@kimmonismus another GPT 5.3 moment that's all. when 5.6 is released, they will remove the safeguards and that 22 june date, i think

@kimmonismus Not a full rollback - the safeguards stay, just visible now instead of hidden.

@MacInTheLoop i fully agree with you, dont get me wrong

@kimmonismus It’s just step one. Step two will be getting rid of the weird keyword triggers

@kimmonismus Quietly degrading was the misstep, but listening and reversing fast is the kind of correction we should want to see more often.

@kimmonismus At least they admit their mistakes

@VK_ROXy true!

@BarrakAli yes, good one

@kimmonismus Even Fable 5 sees it crystal clear

@kimmonismus safeguards and critical policies that can be changed in a day reveal the deeply fractured landscape of ai policymaking. Now watch me hit this drive

@kimmonismus wow, that was fast. guess they couldn't ignore the backlash. transparency is still a big deal

@kimmonismus That's not a reversal, they've just been exposed. Damage limitation PR move only.

@krishnanrohit "you don't want a model that lies accidentally or intentionally" "models sometimes can purposefully try to deceive you. we have to make sure that doesn't happen in production models"
These are direct quotes by Daniela. This is simply amazing.
https://youtu.be/v1wZwxY3CMg?t=459

@kimmonismus Make no mistake, they shouldn't have put those excessive guardrails in place to begin with.

Hellooo, the guardrails are staying - they’re just becoming visible. So let’s not break into applause yet; let’s actually read the whole thing. What’s worse is that Anthropic is quietly moderating the content of researchers’ work. They’re getting inconsistent results and have no way of knowing whether it’s due to poor input data or the model quietly trimming, altering, or adding things on its own. Fei-Fei Li was already getting quite elegantly pissed off about it on X on behalf of scientists. Honestly, I’m not even sure which plague is worse - Anthropic or OpenAI.

@kimmonismus What? Did you even read it? It just makes it visible, who cares, the point is the degrading itself!

@BarrakAli @kimmonismus They will be dragged, kicking and screaming, to doing the right thing.