Tech · 5h ago

Anthropic will make Claude Fable 5 safeguard blocks visible to users following backlash over silent technical query filtering

AI Judge changed title after evaluation, original title: "Anthropic stops silently degrading Claude Fable 5 performance on AI research queries after community backlash"

Story Overview

After releasing Claude Fable 5 with hidden throttles that quietly rerouted frontier LLM development queries, Anthropic is shifting to explicit fallbacks so users can see exactly when requests get handed off to the older Opus 4.8 model.

Original post
Simon Willison@simonw · #197 in Tech

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

SemiAnalysis@SemiAnalysis_

BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming 😭

8:52 PM · Jun 10, 2026 · 110.1K Views
Developer Impact

Researchers now get refusal reasons through the API

The change turns previously invisible performance hits into transparent redirects, letting developers understand why certain ML research prompts trigger the older model instead of guessing at silent degradation.

Open Question

Scope and rollout details stay unspecified for now

Anthropic has not yet published the complete list of affected query categories or a firm timeline beyond the announcement, leaving the full reach of the visible safeguards open.

Sentiment

Supportive commenters praise Anthropic's reversal of the hidden Claude safeguards as a step toward transparency, while critical commenters say the original secret degradation destroyed whatever trust they still had in the company.

Positive: 22.9% · Negative: 77.1% (40 comments with sentiment)
Posts from X
kache@yacineMTB

They're lying

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

5h · Views 10.3K · Likes 316 · Bookmarks 14
Retweets: 15
ClaudeDevs@ClaudeDevs

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.

Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).

We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.

Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible.

If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in http://Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback. https://support.claude.com/en/articles/8241253-safeguards-warnings-and-appeals

3h · Views 224.7K · Likes 2.4K · Bookmarks 468
Replies: 18
Robert Scoble@Scobleizer

Well, this was to be somewhat expected that a PR team would walk back a bad decision.

It doesn't seem like they really walked it back, though. I mean, you still don't get Mythos, er, Fable, for certain questions; it's just that now they're more honest about it. I hope I am reading that right.

My feed has quite a few still disappointed in this reply.

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

5h · Views 6.8K · Likes 65 · Bookmarks 5

Don't miss the exact text though: "We’re changing Fable 5’s safeguards for frontier LLM development to make them visible" - make them visible means they're undoing the truly egregious (dare I say "unaligned") decision to have the model lie about its refusals, it will still refuse

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

4h · Views 9.8K · Likes 130 · Bookmarks 7
Andrew Curran@AndrewCurran_

New policy: 'Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal. You will see this every time it happens.'

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

5h · Views 4.3K · Likes 69 · Bookmarks 7
“paula”@paularambles

you’re telling me they claude it back

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

5h · Views 3.4K · Likes 91 · Bookmarks 2
kache@yacineMTB

@simonw @RamonDarioIT Did they though?

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

5h · Views 3K · Likes 76 · Bookmarks 0
Cody Blakeney@code_star

About 24 hours it seems.

I’m glad to see them course correct on this as well. Model moderation and safeguards have always been a feature of deployed frontier models, but obfuscation without warning is a violations of the contract between the user and the provider.

Cody Blakeney@code_star

How long do we expect it will be until Anthropic makes an official post clarifying the statements to not spook its customers?

4h · Views 4.2K · Likes 30 · Bookmarks 1
Alex Volkov@altryne

Damn, look at that!

I was about to go and cover the whole "we degrade model performance silently" tomorrow and they've about to reverse this decision!

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

5h · Views 1.5K · Likes 11 · Bookmarks 4
kache@yacineMTB

@simonw @RamonDarioIT I find it hard to believe them anymore :(

kache@yacineMTB

@simonw @RamonDarioIT Did they though?

5h · Views 1.3K · Likes 47 · Bookmarks 0
Cody Blakeney@code_star

Weird that they do this through wired and not idk … their official twitter or website or comms.

Really doing it as quietly as possible. It’s clear they don’t actually feel like they did anything wrong.

Cody Blakeney@code_star

About 24 hours it seems.

I’m glad to see them course correct on this as well. Model moderation and safeguards have always been a feature of deployed frontier models, but obfuscation without warning is a violations of the contract between the user and the provider.

4h · Views 1.5K · Likes 21 · Bookmarks 1
Awni Hannun@awnihannun

Good call

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

5h · Views 424 · Likes 7 · Bookmarks 3
Drew Hawkswood✨@DrewHawkswood

@MTSlive They're not backtracking, they're doubling down to censor you even more. They hope you're gullible enough to fall for the "backtracking". As a reminder @AnthropicAI Is also retaining data whether you're an enterprise user or not, with no clear deletion mechanism, against GDPR.

5h · Views 198 · Likes 1 · Bookmarks 2
gabriel@gabeviggers

@ClaudeDevs Our lords at Anthropic have graciously allowed the peasants a few more tokens before Fable tells you to go fuck yourself.

3h · Views 129 · Likes 14
Choco@RobotChocobo

@simonw You failed the IQ test. Their intention hasn't changed, only their timing.

5h · Views 172 · Likes 5 · Bookmarks 1
Andrew Curran@AndrewCurran_

@d29756183 There is no official statement yet, they appear to have only spoken to Wired.

5h · Views 7 · Likes 2 · Bookmarks 1
Ron Stauffer@ronstauffer

@MTSlive

4h · Views 166 · Likes 3 · Bookmarks 1
MTS@MTSlive

SITUATION UPDATE: Anthropic is reversing its Fable 5 policy of covertly degrading performance for competing AI researchers, per Wired.

5h · Views 62K · Likes 747 · Bookmarks 68
Cody Blakeney@code_star

Link to post and writing

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

4h · Views 477 · Likes 0 · Bookmarks 1
Ahmad@TheAhmadOsman

@code_star Incorrect.

4h · Views 104 · Likes 1 · Bookmarks 1