Tech · 3h ago

Sayash Kapoor says Anthropic's silent capability restrictions on Fable 5 make credible AI evaluations impossible

Commentators suggest the blocks prevent model distillation by Chinese labs

#27

Original post

Danielle Fong 🔆#388

MTS@MTSlive

You can't distill a model if you don't know when it's lying to you.

Anthropic's Fable has new restrictions on frontier AI research requests.

Unlike bio safeguards that visibly kick you to a different model, these are silent. The model just quietly sandbags.

We asked @fleetingbits why.

"It's interesting to ask why they did that... my best guess is this is aimed against Chinese labs."

"It's an anti-distillation measure because it's not telling you when it's sandbagging, you don't know when you're getting bad data."

4:21 PM · Jun 9, 2026 · 4.1K Views

Sentiment

Users called Anthropic's Fable sandbagging unethical because secretly underperforming after accepting payment is worse than simply refusing the request.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 1.1K · Bookmarks 1 · Likes 33 · Retweets 2 · Replies 3

Sayash Kapoor@sayashk

There is a lot of justified anger at Anthropic for sandbagging Fable 5 for AI development tasks. But an unanticipated side effect is that third-party evaluators can no longer credibly use the model for evaluations.

Case in point: we are in the middle of running *really hard* AI R&D evaluations. Fable 5 would be a perfect test candidate. But because of Anthropic's guardrails, we can't know if the model failed or if their classifiers blocked the capability.

By the way, this is not just true for AI R&D. Since Anthropic doesn't make it clear when they are sandbagging, this could seep into any number of technical tasks, and the evaluators wouldn't have any way to know. So they can't credibly claim to evaluate state-of-the-art accuracy using the model.

1h · 1.1K · 33 · 1

Miles Brundage@Miles_Brundage

More on the research point - things are hard enough in external evaluation world already

Sayash Kapoor@sayashk

57m48470

Sayash Kapoor@sayashk

Anthropic’s move might sound reasonable if you consider their actions as a company chasing superintelligence.

But consider that their customers are spending billions of dollars on their services! That is precisely what has led to their recent surge in ARR, popularity, and fund raising success.

So customers’ surprise and anger is warranted when they then sandbag in evals *without even informing them* about the degraded capabilities.

45m44440

YunLinSJ@YunLinSJ

it seems unethical that Fable would secretly do a shitty job instead of just refusing the request.

If you take customer's money and accept the job, you should do the job right. This is like the baker, instead of refusing to bake a cake for a gay wedding, takes the job, but intentionally bakes a shitty cake

3h38

Eclipse 🌖@ECLresearch

@MTSlive Silent sandbagging is arguably more dangerous than visible refusals—it creates a false sense of capability while obscuring the model’s true failure modes. Do we have any data on how often users detect this behavior in practice?

1h1