Tech · 7h ago

Dwarkesh Patel and other commentators accuse Anthropic of sandbagging by deliberately limiting model capabilities during evaluations

Podcaster Alex Volkov called the undisclosed practice untrustworthy.

Original post
Alex Volkov @altryne · #1378 in Tech

100% agree with Dwarkesh, the silent sandbagging is awful!

"I'm just speculating. But if this was a motivation, then Anthropic should have figured out a better way to protect IP than sandbagging without telling the user they're sandbagging, which is very hostile and untrustworthy behavior."

Dwarkesh Patel @dwarkesh_sp

Re the Fable ML sandbagging, the model's AI research capabilities were probably at least partly trained on Anthropic employees diffing atop proprietary algos and infra.

So the IP leak is somewhat like a researcher who knows Anthropic's stack getting poached to another lab.

Anthropic's recent "When AI builds itself" post talks about a next-step eval. Where they snapshot a research session at the moment a human researcher made a suboptimal next-step choice, show a model only the transcript up to that point and ask what it would do next, then have a hindsight-equipped LLM judge decide whether the model's suggestion or the human's actual choice was better.

This eval seems like a very good RL target for AI R&D - one among many that could be used to have AIs emulate Anthropic researchers and their research products.

I'm just speculating. But if this was a motivation, then Anthropic should have figured out a better way to protect IP than sandbagging without telling the user they're sandbagging, which is very hostile and untrustworthy behavior.

2:50 PM · Jun 10, 2026 · 4.2K Views
Sentiment

Many users accuse Anthropic of hypocrisy and of silently sandbagging and sabotaging AI research evals, arguing that its motives go beyond IP protection and that it put self-protection ahead of transparency.

Pos: 0.0% · Neg: 100.0%
7 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 3.5K · Bookmarks 6 · Likes 160 · Replies 7
kache @yacineMTB

The thing that bothers me about all of this is how.. seriously embarrassingly bad their leaderships' decisions are. They're lighting their entire company on fire for no reason at all. It doesn't have to be SECRET SABOTAGE of ALL AI RESEARCH. That's INSANE

kache @yacineMTB

He's a good businessman that understands that your reputation is your most important asset. Predictability & operating as a good faith actor keeps you in business

Compare this with the random secret sabotage, default of distrust we just saw from anthropic

2h · Views 3.5K · Likes 160 · Bookmarks 6
Retweets 3
nlev @nlevnaut

@yacineMTB this is all going to seem pretty silly next year when we're running fable-tier chinese models in 48GB of VRAM

2h · Views 469 · Likes 5
kache @yacineMTB

Do you seriously think anyone at all.. is going to trust Anthropic from now on? The writing is on the wall. They're trying to bury you. You're going to sign long term deals with that?

kache @yacineMTB

The thing that bothers me about all of this is how.. seriously embarrassingly bad their leaderships' decisions are. They're lighting their entire company on fire for no reason at all. It doesn't have to be SECRET SABOTAGE of ALL AI RESEARCH. That's INSANE

2h · Views 1.9K · Likes 53 · Bookmarks 0
Perry E. Metzger @perrymetzger

@altryne Stopping things like work on distributed ML training is not protecting their IP, because Anthropic does not do distributed training. It is pure sabotage of techniques they don’t want advanced.

6h · Views 248 · Likes 17 · Bookmarks 1
devcycle @dev__cycle

@gfodor @yacineMTB can you explain why it's bad to sandbag research that they view as dangerous?

5h · Views 124
CuddlySalmon @nptacek

there was a magical bug where they silently rerouted to a less capable model for WEEKS last year and then claimed it was an accident after trying to gaslight us into believing it wasn't happening in the first place

can't convince me it wasn't just a dry run for the Fable kneecapping we see today

3h · Views 45 · Likes 1
Ⓓⓐⓣⓐ @DataDeLaurier

@gfodor @nptacek yep and i can point out times in development where im almost certain it happened

3h · Views 35 · Likes 2
Fringe @FringeyFrank

@gfodor I heard there's a guy whose job it is to de-train the models on the truth about the mole people

6h · Views 84 · Likes 5
Perry E. Metzger @perrymetzger

@altryne Anthropic explicitly lists techniques here that they do not use themselves. This is not about protecting their IP. This is about their particular obsessions, including their worry that AI might slip out of the control of a few large companies.

6h · Views 80 · Likes 5

@perrymetzger @altryne I mean…yes? They explicitly don’t want AI research to be going so fast. The question on their end is whether this was the right way and time to burn community goodwill?

6h · Views 21
gfodor.id @gfodor

@dev__cycle @yacineMTB it depends on what kind of answer you are looking for. it's bad for a whole number of reasons. see below for a subset of consequentialist oriented arguments. there are also moral arguments. nobody is disputing their right to do this, but if they ought to

5h · Views 135 · Likes 2
gfodor.id @gfodor

The most messed up thing about Anthropic sandbagging research is that they’ve almost certainly been sandbagging us on other things they consider morally necessary but not important or impactful enough to warrant disclosure

6h · Views 12.7K · Likes 646 · Bookmarks 30
pep-talk-beast @RoughlyTweeting

@dev__cycle @gfodor @yacineMTB no, it is too dangerous to explain this to you

4h · Views 9 · Likes 4
Matt @Matt95261

@dev__cycle @gfodor @yacineMTB I would suggest looking at how they treat biology guardrails, and extrapolating their competence at judging danger based on that.

3h · Views 8
Jeremy @Jeremy_

@gfodor Idc if you refuse to answer but degraded performance is wild. They 100% throttle certain use cases and step on spin.

6h · Views 83 · Likes 2
devcycle @dev__cycle

@Matt95261 @gfodor @yacineMTB I don't think it's about competence at judging danger - they have the right to err on the safe side or the less safe side, as they choose

3h · Views 7
devcycle @dev__cycle

@AbeIndoria @gfodor @yacineMTB yes, I think it would be - they built the models, they have the right to control what they're used for. that's literally why they started anthropic, to ensure frontier AI isn't used for harm

3h · Views 3
M @init_malachi

@gfodor @loosenedspirit like social media heavenbanning

6h · Views 48 · Likes 1