Anthropic will make Claude Fable 5 safeguard blocks visible to users following backlash over silent technical query filtering

AI Judge changed title after evaluation, original title: "Anthropic stops silently degrading Claude Fable 5 performance on AI research queries after community backlash"

Story Overview

After releasing Claude Fable 5 with hidden throttles that quietly rerouted frontier LLM development queries, Anthropic is shifting to explicit fallbacks so users can see exactly when requests get handed off to the older Opus 4.8 model.

Original post
Simon Willison@simonw · #197 in Tech

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

SemiAnalysis@SemiAnalysis_

BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming 😭

8:52 PM · Jun 10, 2026 · 64.3K Views
Developer Impact

Researchers now get refusal reasons through the API

The change turns previously invisible performance hits into transparent redirects, letting developers understand why certain ML research prompts trigger the older model instead of guessing at silent degradation.
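
As a rough illustration of what this could mean for client code, the sketch below sorts a response into "served", "fallback", or "refused". Anthropic has not published the response format in the material quoted here, so everything specific is an assumption made for illustration: the model identifier strings, the `refused` flag, and the `safeguard_reason` field are hypothetical names, and the payload is treated as a plain dictionary rather than any real SDK object.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical identifiers; the real model strings are not given in the source.
REQUESTED_MODEL = "claude-fable-5"
FALLBACK_MODEL = "claude-opus-4-8"


@dataclass(frozen=True)
class SafeguardOutcome:
    status: str              # "served", "fallback", or "refused"
    model: Optional[str]     # model that actually answered, if reported
    reason: Optional[str]    # stated safeguard reason, if any


def classify_response(
    payload: Mapping[str, Any],
    requested_model: str = REQUESTED_MODEL,
) -> SafeguardOutcome:
    """Classify a response payload under the assumed (hypothetical) schema."""
    model = payload.get("model")
    reason = payload.get("safeguard_reason")  # hypothetical field name

    # An explicit refusal takes priority over any model comparison.
    if payload.get("refused"):  # hypothetical field name
        return SafeguardOutcome("refused", model, reason)

    # A different model than the one requested is treated as a visible fallback.
    if model is not None and model != requested_model:
        return SafeguardOutcome("fallback", model, reason)

    return SafeguardOutcome("served", model, reason)


if __name__ == "__main__":
    sample = {"model": FALLBACK_MODEL, "safeguard_reason": "frontier_llm_development"}
    outcome = classify_response(sample)
    print(outcome.status, outcome.model, outcome.reason)
```

Logging the outcome per request would let a team measure how often its prompts are flagged, which is the kind of visibility the silent approach did not allow.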

Open Question

Scope and rollout details stay unspecified for now

Anthropic has not yet published the complete list of affected query categories or a firm timeline beyond "starting this week," leaving the full reach of the visible safeguards open.

Sentiment

Many users distrusted Anthropic after its covert Claude performance degradation was revealed, viewing the reversal as an admission of manipulation rather than a real fix, while a few praised walking the policy back.

Pos
4.5%
Neg
95.5%
48 comments with sentiment.
Cluster Engagement
Posts from X
kache@yacineMTB

They're lying

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

1h · Views 5.7K · Likes 145 · Bookmarks 5
Andrew Curran@AndrewCurran_

New policy: 'Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal. You will see this every time it happens.'

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

1h · Views 3.3K · Likes 57 · Bookmarks 6
ClaudeDevs@ClaudeDevs

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.

Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).

We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.

Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible.

If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in http://Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback. https://support.claude.com/en/articles/8241253-safeguards-warnings-and-appeals

11m · Views 3.9K · Likes 93 · Bookmarks 18
Robert Scoble@Scobleizer

Well, this was to be somewhat expected that a PR team would walk back a bad decision.

It doesn't seem like they really walked it back, though. I mean, you still don't get Mythos, er, Fable, for certain questions; it's just that now they're more honest about it. I hope I am reading that right.

My feed has quite a few still disappointed in this reply.

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

1h · Views 4.6K · Likes 38 · Bookmarks 3
“paula”@paularambles

you’re telling me they claude it back

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

1h · Views 2.7K · Likes 74 · Bookmarks 2
kache@yacineMTB

@simonw @RamonDarioIT Did they though?

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

2h · Views 2.3K · Likes 55 · Bookmarks 0

Don't miss the exact text though: "We’re changing Fable 5’s safeguards for frontier LLM development to make them visible" - make them visible means they're undoing the truly egregious (dare I say "unaligned") decision to have the model lie about its refusals, it will still refuse

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

1h · Views 3.6K · Likes 43 · Bookmarks 1
Cody Blakeney@code_star

About 24 hours it seems.

I’m glad to see them course correct on this as well. Model moderation and safeguards have always been a feature of deployed frontier models, but obfuscation without warning is a violation of the contract between the user and the provider.

Cody Blakeney@code_star

How long do we expect it will be until Anthropic makes an official post clarifying the statements to not spook its customers?

1h · Views 2.1K · Likes 23 · Bookmarks 0
Alex Volkov@altryne

Damn, look at that!

I was about to go and cover the whole "we degrade model performance silently" tomorrow and they're about to reverse this decision!

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

1h · Views 1.1K · Likes 9 · Bookmarks 3
kache@yacineMTB

@simonw @RamonDarioIT I find it hard to believe them anymore :(

kache@yacineMTB

@simonw @RamonDarioIT Did they though?

2h · Views 1K · Likes 34 · Bookmarks 0
Awni Hannun@awnihannun

Good call

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

1h · Views 424 · Likes 7 · Bookmarks 3
Drew Hawkswood✨@DrewHawkswood

@MTSlive They're not backtracking, they're doubling down to censor you even more. They hope you're gullible enough to fall for the "backtracking". As a reminder @AnthropicAI Is also retaining data whether you're an enterprise user or not, with no clear deletion mechanism, against GDPR.

1h · Views 198 · Likes 1 · Bookmarks 2
Cody Blakeney@code_star

Weird that they do this through wired and not idk … their official twitter or website or comms.

Really doing it as quietly as possible. It’s clear they don’t actually feel like they did anything wrong.

Cody Blakeney@code_star

About 24 hours it seems.

I’m glad to see them course correct on this as well. Model moderation and safeguards have always been a feature of deployed frontier models, but obfuscation without warning is a violation of the contract between the user and the provider.

38m · Views 289 · Likes 9 · Bookmarks 0
Choco@RobotChocobo

@simonw You failed the IQ test. Their intention hasn't changed, only their timing.

1h · Views 172 · Likes 5 · Bookmarks 1
Andrew Curran@AndrewCurran_

@d29756183 There is no official statement yet, they appear to have only spoken to Wired.

1h · Views 7 · Likes 2 · Bookmarks 1
Ron Stauffer@ronstauffer

@MTSlive

1h · Views 166 · Likes 3 · Bookmarks 1
MTS@MTSlive

SITUATION UPDATE: Anthropic is reversing its Fable 5 policy of covertly degrading performance for competing AI researchers, per Wired.

2h · Views 34.1K · Likes 514 · Bookmarks 46
Ahmad@TheAhmadOsman

@code_star Incorrect.

1h · Views 104 · Likes 1 · Bookmarks 1
Cody Blakeney@code_star

Link to post and writing

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

1h · Views 219 · Likes 0 · Bookmarks 1
kache@yacineMTB

@kethcode Agreed on the trust. How could we possibly know??

1h · Views 159 · Likes 4