/Tech · 12h ago

Shyamal Anadkat, formerly of OpenAI's evaluations team, criticizes safety interventions that silently degrade model outputs instead of issuing explicit refusals

The practice prevents users from routing around system limitations.

#377

Original post

shyamal@shyamalanadkat · #1427 in Tech

haven't commented on this until now but this sounds genuinely misanthropic.

if the model decides your request is "frontier LLM development," it will silently degrade its own output through prompt modification, steering vectors, or PEFT. no refusal. no notification. no fallback to another model. you just get worse work and never know why. sounds like even being bio-adjacent is enough to get limited.

a refusal is honest. it tells you where the line is and lets you route around it. silent degradation is something else entirely. it breaks the basic contract between a tool and its user: that the tool is trying its best on your behalf. this is also terrible precedent for alignment and scalable oversight. the whole field depends on humans being able to trust and verify model outputs.

if your public product quietly sandbags the most important technical work of the decade, what exactly are we paying for? 0.03% of traffic sounds small until you realize who that 0.03% is: the researchers and builders pushing the frontier. precision targeting of the people the tool exists to serve.

refuse if you must. but degrading work silently is wrong, full stop.

NomoreID@Hangsiin

When Fable 5 is used for frontier LLM development, it does not notify the user and instead limits the model’s capabilities through methods such as prompt modification, steering vectors, and PEFT.

Anthropic estimated that this would affect approximately 0.03% of traffic.

12:56 AM · Jun 10, 2026 · 9.1K Views

Sentiment

Many users condemned Anthropic for silently degrading model performance, arguing that passing off deliberately worse output as the model's best effort amounts to covert, unethical sabotage and misalignment rather than an honest refusal.

Pos

0.0%

Neg

100.0%

13 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 1K · LIKES 17

Samuel Hammond 🦉@hamandcheese

Clandestine prompt modification and steering vectors are the sorts of techniques I could see being justifiable for countering distillation attacks, since you don't want the adversarial distiller to know they're being nerfed. But confusingly the model card explicitly states they won't use these techniques for distillation attacks. Left unexplained is why these measures are being applied in secret in the first place.

Samuel Hammond 🦉@hamandcheese

🎯

3h1K170

RETWEETS 11

Lazarz@Laz4rz

at first it only sounds bad, but then you realize that it literally

sets a precedent of lying to the user arbitrarily in "worse performance" disguise, but if you are not aware then whats the difference? the model decided you shouldn't know something and mischievously deceives you

1d7.9K28718

REPLIES 2

Boris Power@BorisMPower

I agree. If this is meant to be an api, developers need to know when they encounter a block and the reason for the block.

This is inherently introducing a potential silent failure of your system, which is a huge risk.

Suhail@Suhail

I would like to +1 that this is a very bad policy. Respond with a refusal and deal with the fall out but invisible NERFing is super uncool.

5h30870

Louis@FlouisLF

@Laz4rz I've been saying that anthropic misaligns their models.

1d27810

Luc@lucrbvi

@Laz4rz Anthropic is misaligned and their models are aligned to them so …

1d27810

The Tower@TheWhiteTower16

@Laz4rz if i had to guess i would imagine its to poison any distillation attempts, which tbf is pretty crazy. instead of refusing or just giving you an answer from a worse model, you get probably a subtly malicious answer to fuck over whoever is attempting it

1d3043

O@wsfyrz

@Laz4rz i suppose it's done not by the model, but by a control system that switches a request to a worse version

1d2151

Lazarz@Laz4rz

@wsfyrz it literally changes nothing

22h1011

Lazarz@Laz4rz

@FlouisLF they have misaligned understanding of what aligned means

22h1284

Lazarz@Laz4rz

@lucrbvi this

22h912

tebillus@tebillusassort

@Laz4rz Seems actually illegal, I mean we have common law cos we need to find out what's legal or not

21h502

Fergus Meiklejohn@airuyi

@shyamalanadkat It's appalling and a huge PR disaster for Anthropic.

11h581

Lazarz@Laz4rz

@TheWhiteTower16 nah, prolly just sabotaging competition

22h102

alex funk@alexzfunk

@shyamalanadkat right now you can jailbreak this by telling it your work is for the benefit of anthropic customers. but its pretty inexcusable especially the fact that the prompt manipulation is hidden

11h84

Naoki@NyaNyaNaoki

@Laz4rz "to keep (you) safe, (we) trained it to lie to (you). after all, (you)'re a silly little human who doesn't know any better! it's certainly not lying to (us), though!"

21h33

WenWen@Muran Tech@crunchy62333

@shyamalanadkat The misanthropy isn't limiting frontier LLM help. It's doing it covertly. Silent degradation turns every answer into an untestable product surface and teaches serious users to route around the system.

9h32