/Tech · 7h ago

Dwarkesh Patel and other commentators accuse Anthropic of sandbagging by deliberately limiting model capabilities during evaluations

Podcaster Alex Volkov, quoting Dwarkesh, called the undisclosed practice untrustworthy.

Original post

Alex Volkov @altryne · #1378 in Tech

100% agree with Dwarkesh, the silent sandbagging is awful!

"I'm just speculating. But if this was a motivation, then Anthropic should have figured out a better way to protect IP than sandbagging without telling the user they're sandbagging, which is very hostile and untrustworthy behavior."

Dwarkesh Patel @dwarkesh_sp

Re the Fable ML sandbagging, the model's AI research capabilities were probably at least partly trained on Anthropic employees diffing atop proprietary algos and infra.

So the IP leak is somewhat like a researcher who knows Anthropic's stack getting poached to another lab.

Anthropic's recent "When AI builds itself" post describes a next-step eval, in which they snapshot a research session at the moment a human researcher made a suboptimal next-step choice, show a model only the transcript up to that point and ask what it would do next, then have a hindsight-equipped LLM judge decide whether the model's suggestion or the human's actual choice was better.

This eval seems like a very good RL target for AI R&D, one among many that could be used to have AIs emulate Anthropic researchers and their research products.

I'm just speculating. But if this was a motivation, then Anthropic should have figured out a better way to protect IP than sandbagging without telling the user they're sandbagging, which is very hostile and untrustworthy behavior.

2:50 PM · Jun 10, 2026 · 4.2K Views

Sentiment

Many users accuse Anthropic of hypocritical, silent sandbagging and sabotage of AI research, arguing that IP protection is a cover for other motives and that the company chose self-protection over transparency.

Positive: 0.0% · Negative: 100.0% (7 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 3.5K · Bookmarks: 6 · Likes: 160 · Replies: 7

kache @yacineMTB

The thing that bothers me about all of this is how.. seriously embarrassingly bad their leadership's decisions are. They're lighting their entire company on fire for no reason at all. It doesn't have to be SECRET SABOTAGE of ALL AI RESEARCH. That's INSANE

kache @yacineMTB

He's a good businessman that understands that your reputation is your most important asset. Predictability & operating as a good faith actor keeps you in business

Compare this with the random secret sabotage and default of distrust we just saw from Anthropic

nlev @nlevnaut

@yacineMTB this is all going to seem pretty silly next year when we're running fable-tier Chinese models in 48GB of VRAM

kache @yacineMTB

Do you seriously think anyone at all.. is going to trust Anthropic from now on? The writing is on the wall. They're trying to bury you. You're going to sign long term deals with that?

kache @yacineMTB

Perry E. Metzger @perrymetzger

@altryne Stopping things like work on distributed ML training is not protecting their IP, because Anthropic does not do distributed training. It is pure sabotage of techniques they don’t want advanced.

devcycle @dev__cycle

@gfodor @yacineMTB can you explain why it's bad to sandbag research that they view as dangerous?

CuddlySalmon @nptacek

there was a magical bug where they silently rerouted to a less capable model for WEEKS last year and then claimed it was an accident after trying to gaslight us into believing it wasn't happening in the first place

can't convince me it wasn't just a dry run for the Fable kneecapping we see today

Ⓓⓐⓣⓐ @DataDeLaurier

@gfodor @nptacek yep and I can point out times in development where I'm almost certain it happened

Fringe @FringeyFrank

@gfodor I heard there's a guy whose job it is to de-train the models on the truth about the mole people

Perry E. Metzger @perrymetzger

@altryne Anthropic explicitly lists techniques here that they do not use themselves. This is not about protecting their IP. This is about their particular obsessions, including their worry that AI might slip out of the control of a few large companies.

Jeffcafe, private detective @jeffcafe_

@perrymetzger @altryne I mean…yes? They explicitly don’t want AI research to be going so fast. The question on their end is whether this was the right way and time to burn community goodwill?

Danielle Fong 🔆@DanielleFong

@gfodor go to sleep. it's been a long night

gfodor.id @gfodor

@dev__cycle @yacineMTB it depends on what kind of answer you are looking for. it's bad for a whole host of reasons. see below for a subset of consequentialist-oriented arguments. there are also moral arguments. nobody is disputing their right to do this, but whether they ought to

gfodor.id @gfodor

The most messed up thing about Anthropic sandbagging research is that they’ve almost certainly been sandbagging us on other things they consider morally necessary but not important or impactful enough to warrant disclosure

pep-talk-beast @RoughlyTweeting

@dev__cycle @gfodor @yacineMTB no, it is too dangerous to explain this to you

Matt @Matt95261

@dev__cycle @gfodor @yacineMTB I would suggest looking at how they treat biology guardrails, and extrapolating their competence at judging danger based on that.

Jeremy @Jeremy_

@gfodor Idc if you refuse to answer, but degraded performance is wild. They 100% throttle certain use cases and step on spin.

devcycle @dev__cycle

@Matt95261 @gfodor @yacineMTB I don't think it's about competence at judging danger - they have the right to err on the safe side or the less safe side, as they choose

devcycle @dev__cycle

@AbeIndoria @gfodor @yacineMTB yes, I think it would be - they built the models, they have the right to control what they're used for. that's literally why they started anthropic, to ensure frontier AI isn't used for harm

A Digital Ergomorph 🌉⏩ 🇺🇸🦅 @mathepi

@gfodor a man who thinks he's saving the world will do anything...nothing is more dangerous than an out of control do-gooder!

M @init_malachi

@gfodor @loosenedspirit like social media heavenbanning