Tech · 12h ago

Shyamal Anadkat, formerly of OpenAI's evaluations team, criticizes safety interventions that silently degrade model outputs instead of issuing explicit refusals

The practice prevents users from routing around system limitations.

Original post
shyamal@shyamalanadkat · #1427 in Tech

haven't commented on this until now but this sounds genuinely misanthropic.

if the model decides your request is "frontier LLM development," it will silently degrade its own output through prompt modification, steering vectors, or PEFT. no refusal. no notification. no fallback to another model. you just get worse work and never know why. sounds like even being bio-adjacent is enough to get limited.

a refusal is honest. it tells you where the line is and lets you route around it. silent degradation is something else entirely. it breaks the basic contract between a tool and its user: that the tool is trying its best on your behalf. this is also terrible precedent for alignment and scalable oversight. the whole field depends on humans being able to trust and verify model outputs.

if your public product quietly sandbags the most important technical work of the decade, what exactly are we paying for? 0.03% of traffic sounds small until you realize who that 0.03% is: the researchers and builders pushing the frontier. precision targeting of the people the tool exists to serve.

refuse if you must. but degrading work silently is wrong, full stop.

NomoreID@Hangsiin

When Fable 5 is used for frontier LLM development, it does not notify the user and instead limits the model’s capabilities through methods such as prompt modification, steering vectors, and PEFT.

Anthropic estimated that this would affect approximately 0.03% of traffic.

12:56 AM · Jun 10, 2026 · 9.1K Views
Sentiment

Many users condemned Anthropic for silently degrading model performance, arguing that it amounts to covert, unethical sabotage and misalignment rather than an honest refusal.

Positive: 0.0% · Negative: 100.0% · 13 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
1K Views · 17 Likes
Samuel Hammond 🦉@hamandcheese

Clandestine prompt modification and steering vectors are the sorts of techniques I could see being justifiable for countering distillation attacks, since you don't want the adversarial distiller to know they're being nerfed. But confusingly the model card explicitly states they won't use these techniques for distillation attacks. Left unexplained is why these measures are being applied in secret in the first place.

Samuel Hammond 🦉@hamandcheese

🎯

3h · 1K Views · 17 Likes · 0 Bookmarks
Retweets: 11
Lazarz@Laz4rz

at first it only sounds bad, but then you realize that it literally

sets a precedent of lying to the user arbitrarily in "worse performance" disguise, but if you are not aware then whats the difference? the model decided you shouldn't know something and mischievously deceives you

1d · 7.9K Views · 287 Likes · 18 Bookmarks
Replies: 2
Boris Power@BorisMPower

I agree. If this is meant to be an api, developers need to know when they encounter a block and the reason for the block.

This is inherently introducing a potential silent failure of your system, which is a huge risk.

Suhail@Suhail

I would like to +1 that this is a very bad policy. Respond with a refusal and deal with the fall out but invisible NERFing is super uncool.

5h · 308 Views · 7 Likes · 0 Bookmarks
Louis@FlouisLF

@Laz4rz I've been saying that anthropic misaligns their models.

1d · 278 Views · 10 Likes
Luc@lucrbvi

@Laz4rz Anthropic is misaligned and their models are aligned to them so …

1d · 278 Views · 10 Likes
The Tower@TheWhiteTower16

@Laz4rz if i had to guess i would imagine its to poison any distillation attempts, which tbf is pretty crazy. instead of refusing or just giving you an answer from a worse model, you get probably a subtly malicious answer to fuck over whoever is attempting it

1d · 304 Views · 3 Likes
O@wsfyrz

@Laz4rz i suppose it's done not by the model, but by a control system that switches a request to a worse version

1d · 215 Views · 1 Like
Lazarz@Laz4rz

@wsfyrz it literally changes nothing

22h · 101 Views · 1 Like
Lazarz@Laz4rz

@FlouisLF they have misaligned understanding of what aligned means

22h · 128 Views · 4 Likes
Lazarz@Laz4rz

@lucrbvi this

22h · 91 Views · 2 Likes
tebillus@tebillusassort

@Laz4rz Seems actually illegal, I mean we have common law cos we need to find out what's legal or not

21h · 50 Views · 2 Likes

@shyamalanadkat It's appalling and a huge PR disaster for Anthropic.

11h · 58 Views · 1 Like
Lazarz@Laz4rz

@TheWhiteTower16 nah, prolly just sabotaging competition

22h · 102 Views
alex funk@alexzfunk

@shyamalanadkat right now you can jailbreak this by telling it your work is for the benefit of anthropic customers. but its pretty inexcusable especially the fact that the prompt manipulation is hidden

11h · 84 Views
Naoki@NyaNyaNaoki

@Laz4rz "to keep (you) safe, (we) trained it to lie to (you). after all, (you)'re a silly little human who doesn't know any better! it's certainly not lying to (us), though!"

21h · 33 Views
WenWen@Muran Tech@crunchy62333

@shyamalanadkat The misanthropy isn't limiting frontier LLM help. It's doing it covertly. Silent degradation turns every answer into an untestable product surface and teaches serious users to route around the system.

9h · 32 Views
Gregor@bygregorr

@shyamalanadkat silent is so much worse than a refusal

8h · 16 Views
Rob Phillips@iwasrobbed

@shyamalanadkat Very similar to a lie, and having AI lie on behalf of humans is not a precedent we want to set

8h · 13 Views
Boÿ Math Club@boymathclub

@shyamalanadkat Out on all DSPs https://music.apple.com/ca/album/miss-anthropic/1835954033

10h · 8 Views
Avery Superpowers@Math_MntnrHZ

@Laz4rz 🤔 So the model just decides you shouldn't know something and quietly misleads you?

14h · 8 Views