/Tech · 2h ago

Anthropic will make Claude Fable 5 safeguard blocks visible to users following backlash over silent technical query filtering

AI Judge changed title after evaluation, original title: "Anthropic stops silently degrading Claude Fable 5 performance on AI research queries after community backlash"

Story Overview

After releasing Claude Fable 5 with hidden throttles that quietly rerouted frontier LLM development queries, Anthropic is shifting to explicit fallbacks so users can see exactly when requests get handed off to the older Opus 4.8 model.

Original post

Simon Willison@simonw · #197 in Tech

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

SemiAnalysis@SemiAnalysis_

BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming 😭

8:52 PM · Jun 10, 2026 · 64.3K Views

Developer Impact

Researchers now get refusal reasons through the API

The change turns previously invisible performance hits into transparent redirects, letting developers understand why certain ML research prompts trigger the older model instead of guessing at silent degradation.
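As a rough illustration of what a visible fallback could mean for client code, the sketch below checks whether a response was served by a different model than the one requested and surfaces any stated reason. The announcement does not document the response format, so the `model` and `fallback_reason` field names and the model identifier strings here are hypothetical placeholders, not Anthropic's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FallbackInfo:
    requested_model: str
    served_model: str
    reason: Optional[str]


def detect_fallback(requested_model: str, response: dict) -> Optional[FallbackInfo]:
    """Return fallback details if the response differs from the requested model.

    Assumes (hypothetically) that the response dict carries a "model" key naming
    the model that answered and an optional "fallback_reason" key.
    """
    served = response.get("model", requested_model)
    reason = response.get("fallback_reason")  # hypothetical field name
    if served != requested_model or reason is not None:
        return FallbackInfo(requested_model, served, reason)
    return None


# Hypothetical response shapes, for illustration only.
flagged = {"model": "claude-opus-4-8", "fallback_reason": "frontier_llm_development"}
info = detect_fallback("claude-fable-5", flagged)
if info is not None:
    print(f"Fell back from {info.requested_model} to {info.served_model}: {info.reason}")
```

Logging the served model on every call, rather than only on errors, is one way a team could notice fallbacks even before any reason field is available.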

Open Question

Scope and rollout details stay unspecified for now

Anthropic has not yet published the complete list of affected query categories or a firm timeline beyond the announcement, leaving the full reach of the visible safeguards open.

Sentiment

Many users distrusted Anthropic after its covert Claude performance degradation was revealed, viewing the reversal as an admission of manipulation rather than a real fix, while a few praised walking the policy back.

Positive: 4.5% · Negative: 95.5%

48 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 5.7K · Likes: 145

kache@yacineMTB

They're lying

Simon Willison@simonw

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

1h ago

Bookmarks: 6

Andrew Curran@AndrewCurran_

New policy: 'Starting this week, flagged requests will visibly fall back to Opus 4.8. On the API, any flagged requests will return a reason for their refusal. You will see this every time it happens.'

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

1h ago

Retweets: 15

ClaudeDevs@ClaudeDevs

We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible.

Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).

We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.

Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible.

If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in http://Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback. https://support.claude.com/en/articles/8241253-safeguards-warnings-and-appeals

11m ago

Replies: 12

Robert Scoble@Scobleizer

Well, this was to be somewhat expected that a PR team would walk back a bad decision.

It doesn't seem like they really walked it back, though. I mean, you still don't get Mythos, er, Fable, for certain questions; it's just that now they're more honest about it. I hope I am reading that right.

My feed has quite a few still disappointed in this reply.

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

1h ago

“paula”@paularambles

you’re telling me they claude it back

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

1h ago

kache@yacineMTB

@simonw @RamonDarioIT Did they though?

Simon Willison@simonw

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

2h ago

Simon Willison@simonw

Don't miss the exact text though: "We’re changing Fable 5’s safeguards for frontier LLM development to make them visible" - make them visible means they're undoing the truly egregious (dare I say "unaligned") decision to have the model lie about its refusals, it will still refuse

Simon Willison@simonw

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

1h ago

Cody Blakeney@code_star

About 24 hours it seems.

I’m glad to see them course correct on this as well. Model moderation and safeguards have always been a feature of deployed frontier models, but obfuscation without warning is a violations of the contract between the user and the provider.

Cody Blakeney@code_star

How long do we expect it will be until Anthropic makes an official post clarifying the statements to not spook its customers?

1h ago

Alex Volkov@altryne

Damn, look at that!

I was about to go and cover the whole "we degrade model performance silently" tomorrow and they've about to reverse this decision!

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

1h ago

kache@yacineMTB

@simonw @RamonDarioIT I find it hard to believe them anymore :(

kache@yacineMTB

@simonw @RamonDarioIT Did they though?

2h ago

Awni Hannun@awnihannun

Good call

Max Zeff@ZeffMax

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash.

1h ago

Drew Hawkswood✨@DrewHawkswood

@MTSlive They're not backtracking, they're doubling down to censor you even more. They hope you're gullible enough to fall for the "backtracking". As a reminder @AnthropicAI Is also retaining data whether you're an enterprise user or not, with no clear deletion mechanism, against GDPR.

1h ago

Cody Blakeney@code_star

Weird that they do this through wired and not idk … their official twitter or website or comms.

Really doing it as quietly as possible. It’s clear they don’t actually feel like they did anything wrong.

Cody Blakeney@code_star

About 24 hours it seems.

38m ago

Choco@RobotChocobo

@simonw You failed the IQ test. Their intention hasn't changed, only their timing.

1h ago

Andrew Curran@AndrewCurran_

@d29756183 There is no official statement yet, they appear to have only spoken to Wired.

1h ago

Ron Stauffer@ronstauffer

@MTSlive

1h ago

MTS@MTSlive

SITUATION UPDATE: Anthropic is reversing its Fable 5 policy of covertly degrading performance for competing AI researchers, per Wired.

2h ago

Ahmad@TheAhmadOsman

@code_star Incorrect.

1h ago

Cody Blakeney@code_star

Link to post and writing

Simon Willison@simonw

Very pleased to hear Anthropic have walked back this policy https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/

1h ago

kache@yacineMTB

@kethcode Agreed on the trust. How could we possibly know??

1h ago