/AI · 1h ago

Princeton's Sayash Kapoor says Anthropic's undisclosed safety filters on Fable 5 undermine the credibility of third-party AI evaluations

Evaluators cannot distinguish model failures from intentional safety blocks.

#20

Original post

Sayash Kapoor@sayashk · #745 in AI

There is a lot of justified anger at Anthropic for sandbagging Fable 5 for AI development tasks. But an unanticipated side effect is that third-party evaluators can no longer credibly use the model for evaluations.

Case in point: we are in the middle of running *really hard* AI R&D evaluations. Fable 5 would be a perfect test candidate. But because of Anthropic's guardrails, we can't know if the model failed or if their classifiers blocked the capability.

By the way, this is not just true for AI R&D. Since Anthropic doesn't make it clear when they are sandbagging, this could seep into any number of technical tasks, and the evaluators wouldn't have any way to know. So they can't credibly claim to evaluate state-of-the-art accuracy using the model.

7:02 PM · Jun 9, 2026 · 2.8K Views

Sentiment

Users expressed anger and surprise at Anthropic for sandbagging its models without disclosure, arguing that the practice undermines the credibility of third-party AI evaluations.

Positive: 0.0%

Negative: 100.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 792 · REPLIES 2

Sayash Kapoor@sayashk

Anthropic’s move might sound reasonable if you consider their actions as a company chasing superintelligence.

But consider that their customers are spending billions of dollars on their services! That is precisely what has led to their recent surge in ARR, popularity, and fundraising success.

So customers’ surprise and anger is warranted when they then sandbag in evals *without even informing them* about the degraded capabilities.

1h79270

LIKES 8

Miles Brundage@Miles_Brundage

More on the research point - things are hard enough in external evaluation world already

Sayash Kapoor@sayashk

1h56380

Auyon Siddiq@auyonomous

@sayashk I don't understand the anger/surprise. They're a company chasing superintelligence, obviously they're going to take measures to hinder potential competition right? Is everyone forgetting how this works or am I missing something?

1h481