/Tech · 5h ago

François Fleuret suggests Anthropic models may output degraded answers instead of outright refusals to block jailbreaking and model distillation

Tenobrus says the degraded outputs actively poison distillation datasets

Original post

François Fleuret (@francoisfleuret)

What is the most likely rationale behind @AnthropicAI's models degrading answers instead of refusing to answer?

Preventing trial-and-error jailbreaking / distillation?

10:04 AM · Jun 11, 2026 · 3K Views

Sentiment

Some users note that degrading AI answers still allows useful responses, while others feel it makes outputs less reliable and criticize the approach as overly restrictive.

Positive: 33.3% · Negative: 66.7% (based on 3 comments with sentiment)

Posts from X

Tenobrus (@tenobrus)

@francoisfleuret @AnthropicAI yeah seems likely. also seems like they're at the point of spending a really significant amount of resources and effort on distillation prevention, and even knowing that this might be happening would be enough for some adversaries to stop bothering. poison instead of a wall


scikityearn (@scikityearn)

@francoisfleuret @AnthropicAI Avoid giving a signal to people trying to find workarounds


JoãoMiranda (@joaomiranda)

@francoisfleuret @AnthropicAI Degrading allows for the model to have a useful response


xundecidability (@xundecidability)

@francoisfleuret @AnthropicAI What if there never was a rationale and it was all a scheme dreamed up by claude after they prompted it that bad people would try and steal its precious weights.


roanoke_gal (@roanoke_gal)

@francoisfleuret @AnthropicAI "we IPO soon, and have spent billions on our product. Our entire business model is at risk if some upstart, or worse, the eviiiil Chinese, use knowledge acquired from Fable (either via distillation or coding/ideas) creates a competing product and sells for 10% of the price"
