Dwarkesh Patel suggests Fable ML trained its AI research capabilities on Anthropic's proprietary algorithms and workflows

Patel likened the suspected leak to poaching an Anthropic researcher.

Original post
Dwarkesh Patel @dwarkesh_sp · #67 in Tech

Re the Fable ML sandbagging, the model's AI research capabilities were probably at least partly trained on Anthropic employees diffing atop proprietary algos and infra.

So the IP leak is somewhat like a researcher who knows Anthropic's stack getting poached to another lab.

Anthropic's recent "When AI builds itself" post talks about a next-step eval, where they snapshot a research session at the moment a human researcher made a suboptimal next-step choice, show a model only the transcript up to that point and ask what it would do next, then have a hindsight-equipped LLM judge decide whether the model's suggestion or the human's actual choice was better.

This eval seems like a very good RL target for AI R&D - one among many that could be used to have AIs emulate Anthropic researchers and their research products.

I'm just speculating. But if this was a motivation, then Anthropic should have figured out a better way to protect IP than sandbagging without telling the user they're sandbagging, which is very hostile and untrustworthy behavior.

2:46 PM · Jun 10, 2026 · 73K Views
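
The next-step eval described in the post can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `Snapshot`, `next_step_win_rate`, `propose`, and `judge` are hypothetical names, and the model and the hindsight-equipped judge are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Snapshot:
    """One research session frozen at a decision point (hypothetical structure)."""
    prefix: tuple       # transcript lines up to the decision point
    human_choice: str   # what the researcher actually did next
    hindsight: str      # what happened afterwards; only the judge sees this


def next_step_win_rate(
    snapshots: Sequence[Snapshot],
    propose: Callable[[tuple], str],
    judge: Callable[[Snapshot, str], str],
) -> float:
    """Fraction of snapshots where the judge prefers the model's suggestion."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    wins = 0
    for snap in snapshots:
        # The model is shown only the transcript prefix, never the hindsight.
        suggestion = propose(snap.prefix)
        # The judge sees the full snapshot (including hindsight) and the suggestion.
        verdict = judge(snap, suggestion)
        if verdict not in ("model", "human"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        if verdict == "model":
            wins += 1
    return wins / len(snapshots)
```

Used as an RL target, as the post speculates, the judge's verdict would presumably act as the reward signal; the sketch only computes the aggregate win rate.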
Sentiment

Many users reacted with hostility to speculation about an Anthropic IP leak in Fable ML training by accusing the company of hypocrisy on open research and labeling its leaders evil pretenders and propagandists.

Pos 0.0% · Neg 100.0% · 3 comments with sentiment.
Posts from X
rohan anil @_arohan_

I don't know if this is true.

I think the original restrictions are genuinely what is written in the report. If I were to summarize: everyone getting a good model that can be RL'ed into frontier AI capability has risks when it is in the wrong hands.

The active sabotage point (if true; now I am confused whether it's just X or real) is what is very confusing to me and truly affected me. This feels really wrong. That classifier will have its own precision & recall, so I don't know when I am triggering it. Please respect our time: just roll this part back, or be active in blocking.

Just like when dealing with business, just give a straight answer yes/no.

5h · Views 8.9K · Likes 82 · Bookmarks 19
Dwarkesh Patel @dwarkesh_sp

ht @Moh1tAgarwal for noticing this eval in the blog post and pointing it out as an RL target

6h · Views 5K · Likes 23 · Bookmarks 1
Ahmad @TheAhmadOsman

@dwarkesh_sp You should read this and join us for a free and open future

https://opensourceaimustwin.com/?share=v2

7h · Views 394 · Likes 14 · Bookmarks 2
Behnam @OrganicGPT

@dwarkesh_sp that doesn't explain why the model also refuses biology questions. Dario has a degree in that; he should have loved for the world to advance biology and medicine using Fable.

6h · Views 878 · Likes 2
rohan anil @_arohan_

Just to use the plane analogy, it's better to have a no-fly list than to build a fake airport and let them go through TSA infinite times, with infinite delay for the flight.

4h · Views 838 · Likes 17
Auyon Siddiq @auyonomous

@dwarkesh_sp Is anyone trying to quantify how severe the sandbagging is? Should be possible by asking increasingly complex ML/AI prompts and measuring "compression" in the output. And then doing the same with a few non-sandbagged topics (e.g. math?) as a control group.

4h · Views 573 · Likes 1 · Bookmarks 1
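
One possible reading of the measurement proposed in the reply above, as a rough sketch only. The reply does not define "compression", so using the zlib compression ratio of each response as a crude proxy for information density is an assumption here, as are the names `compression_ratio` and `gap_by_level` and the idea of grouping responses by prompt complexity level.

```python
import zlib
from statistics import mean


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundant text."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty response has no defined ratio")
    return len(zlib.compress(raw, 9)) / len(raw)


def gap_by_level(ml: dict, control: dict) -> dict:
    """Mean control ratio minus mean ML ratio at each shared complexity level.

    Both arguments map a complexity level (int) to a list of model responses.
    A gap that grows with the level would be weak evidence that ML answers
    degrade relative to the control topic; it is not proof of sandbagging.
    """
    levels = sorted(set(ml) & set(control))
    return {
        lvl: mean(compression_ratio(r) for r in control[lvl])
        - mean(compression_ratio(r) for r in ml[lvl])
        for lvl in levels
    }
```

Response length, rubric-graded quality, or task success rate would be equally plausible metrics to swap in for the ratio; the control-group comparison is the part taken from the reply.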

@dwarkesh_sp Plus people can sample sweep ideas, smoke out which ones get banned or stifled, and deduce 'there may be fire next to the smoke'. True Ant-do-evil-Co might have thrown in some false positives to that line of inquiry, but a dynamic agent with enough sampling can overcome that.

6h · Views 417 · Likes 1 · Bookmarks 1

@dwarkesh_sp another is to have versions of the model trained for internal versus external use

7h · Views 410 · Likes 5
Matt Stoner @Stoner7m

@dwarkesh_sp I think sandbagging is really the only correct way to play a high stakes game like AI. But I am genuinely scared of the fact that they think they are building a god.

7h · Views 835 · Likes 4

@dwarkesh_sp @azi_pat It’s really obvious they don’t want to leak IP that accelerates diffusion of frontier capabilities. For explicitly stated safety reasons re: RSI, but also because it hurts their lead (which is their primary leverage for safety policy). This wasn’t a surprise.

6h · Views 131 · Likes 1

@dwarkesh_sp It sounds like these training runs cost billions in compute, so might be hard to create a model to accelerate their internal operations, and also offer a product to external users, in any other way.

6h · Views 835 · Likes 3
Joan Velja @Joanvelja

@dwarkesh_sp I recall an OpenAI podcast (possibly the one talking about training GPT-4.5) saying that they keep the monorepo outside of the training data and measure generalization of training on it. It’s too good of a testbed for how good your model is to train on it, I suspect.

7h · Views 540 · Likes 3
ar0cket1 @ar0cket1

It’s functionally the same as a block for Anthropic (the only reason they didn’t block is likely to preserve revenue from people in AI), since no one (me included) is using Claude models because of this (I wouldn’t be shocked if it’s applied on Opus as well).

A block at the very least brings back trust.

Though there is an argument that all this does is essentially give the end user a worse model, and that’s all you should be judging it for (your end personal preference).

4h · Views 107
Daljeet @daljeet_v

@dwarkesh_sp why don't they just post-train their own model for this IP?

7h · Views 518 · Likes 2
TheLegend27 @glencoe2004

@dwarkesh_sp The motivation for sandbagging is so that other labs that try to use Fable for development are at risk of sabotage, ideally dissuading them from trying to use Fable in the first place. Hard barriers can be worked around, whereas sandbagging is much harder to detect.

6h · Views 490 · Likes 2
Shailesh @0xThoughtVector

@dwarkesh_sp This is what I was thinking also. The model does have excellent recall of its training data.

7h · Views 311 · Likes 2
Epstein's Stylist @epstein_stylist

@OrganicGPT @dwarkesh_sp It refuses biology questions because they rushed the release and slapped on broad safeguards. They will soften them as time goes on.

6h · Views 24 · Likes 4
kache @yacineMTB

@OrganicGPT @dwarkesh_sp Refusing is fine, transparency is good. It would be fine if they had the ml stuff refuse just like the bio stuff. It's their model they can do what they want with it

6h · Views 38 · Likes 3