Tech · 7h ago

Dwarkesh Patel suggests Fable ML trained its AI research capabilities on Anthropic's proprietary algorithms and workflows

Patel likened the suspected leak to poaching an Anthropic researcher.

Original post

Dwarkesh Patel @dwarkesh_sp · #67 in Tech

Re the Fable ML sandbagging, the model's AI research capabilities were probably at least partly trained on Anthropic employees diffing atop proprietary algos and infra.

So the IP leak is somewhat like a researcher who knows Anthropic's stack getting poached to another lab.

Anthropic's recent "When AI builds itself" post talks about a next-step eval, where they snapshot a research session at the moment a human researcher made a suboptimal next-step choice, show a model only the transcript up to that point and ask what it would do next, then have a hindsight-equipped LLM judge decide whether the model's suggestion or the human's actual choice was better.

This eval seems like a very good RL target for AI R&D - one among many that could be used to have AIs emulate Anthropic researchers and their research products.

I'm just speculating. But if this was a motivation, then Anthropic should have figured out a better way to protect IP than sandbagging without telling the user they're sandbagging, which is very hostile and untrustworthy behavior.

2:46 PM · Jun 10, 2026 · 73K Views
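The next-step eval described in the post can be sketched roughly as follows. This is a minimal illustration, not Anthropic's implementation: the `Snapshot` fields, the `propose` and `judge` callables, and the toy stubs are all hypothetical names introduced here, and in practice both callables would wrap LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Snapshot:
    """One research session cut at a suboptimal human decision (hypothetical schema)."""
    transcript: List[str]  # session log, one entry per step
    cut: int               # index of the step the human chose suboptimally
    hindsight: str         # later-known information, shown to the judge only

    @property
    def prefix(self) -> List[str]:
        # Everything the evaluated model is allowed to see.
        return self.transcript[:self.cut]

    @property
    def human_choice(self) -> str:
        return self.transcript[self.cut]


Proposer = Callable[[List[str]], str]
# The judge returns "model" or "human" for whichever next step it prefers.
Judge = Callable[[List[str], str, str, str], str]


def next_step_win_rate(snapshots: List[Snapshot], propose: Proposer, judge: Judge) -> float:
    """Fraction of snapshots where the judge prefers the model's suggestion."""
    if not snapshots:
        raise ValueError("need at least one snapshot")
    wins = 0
    for snap in snapshots:
        suggestion = propose(snap.prefix)
        verdict = judge(snap.prefix, suggestion, snap.human_choice, snap.hindsight)
        if verdict not in ("model", "human"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        wins += verdict == "model"
    return wins / len(snapshots)


# Toy stand-ins so the sketch runs without any model access.
def toy_propose(prefix: List[str]) -> str:
    return "inspect the data loader"


def toy_judge(prefix: List[str], suggestion: str, human: str, hindsight: str) -> str:
    # Prefers the model only if its suggestion mentions the hindsight keyword.
    return "model" if hindsight in suggestion else "human"


snapshots = [
    Snapshot(["loss spiked at step 900", "lower the learning rate"], 1, "data loader"),
    Snapshot(["eval accuracy is flat", "train for longer"], 1, "label noise"),
]
win_rate = next_step_win_rate(snapshots, toy_propose, toy_judge)
```

Used as an RL target, as the post speculates, the per-snapshot verdict would play the role of a reward signal; the sketch only computes the aggregate win rate.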

Sentiment

Many users reacted with hostility to speculation about an Anthropic IP leak in Fable ML training by accusing the company of hypocrisy on open research and labeling its leaders evil pretenders and propagandists.

Pos

0.0%

Neg

100.0%

3 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 8.9K · Bookmarks 19 · Likes 82 · Retweets 2 · Replies 7

rohan anil@_arohan_

I don't know if this is true.

I think the original restrictions are genuinely what is written in the report. If I were to summarize: everyone getting a good model that can be RL'ed into frontier AI capability carries risks when it is in the wrong hands.

The active sabotage point (if true; I am now confused whether it's just X or real) is what is very confusing to me and truly affected me. This feels really wrong. That classifier will have its own precision & recall, so I don't know when I am triggering it. Please respect our time: just roll this part back, or be active in blocking.

Just like when dealing with business, just give a straight answer yes/no.

5h8.9K8219

Dwarkesh Patel@dwarkesh_sp

ht @Moh1tAgarwal for noticing this eval in the blog post and pointing it out as an RL target

6h5K231

Ahmad@TheAhmadOsman

@dwarkesh_sp You should read this and join us for a free and open future

https://opensourceaimustwin.com/?share=v2

7h394142

Behnam@OrganicGPT

@dwarkesh_sp that doesn't explain why the model also refuses biology questions. Dario has a degree in that, he should have loved for the world to advance biology and medicine using Fable.

6h8782

rohan anil@_arohan_

Just to use the plane analogy, it's better to have a no-fly list than to build a fake airport and let people go through TSA infinite times, with an infinite delay for the flight.

4h83817

Auyon Siddiq@auyonomous

@dwarkesh_sp Is anyone trying to quantify how severe the sandbagging is? Should be possible by asking increasingly complex ML/AI prompts and measuring "compression" in the output. And then doing the same with a few non-sandbagged topics (e.g. math?) as a control group.

4h57311
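One way to make the measurement suggested above concrete is sketched below. It rests on assumptions not in the reply: "compression" is read here as responses getting shorter relative to a control topic, word count stands in for response detail, and `length_ratio_by_level` and its inputs are hypothetical names. A falling ratio at higher complexity levels would be consistent with sandbagging but would not prove it.

```python
from statistics import mean
from typing import Dict, List


def length_ratio_by_level(
    target: Dict[int, List[str]],
    control: Dict[int, List[str]],
) -> Dict[int, float]:
    """Mean target word count divided by mean control word count, per complexity level.

    Keys are complexity levels (1 = simplest); values are the model's responses
    to prompts at that level. Only levels present in both groups are compared.
    """
    ratios: Dict[int, float] = {}
    for level in sorted(set(target) & set(control)):
        target_len = mean(len(r.split()) for r in target[level])
        control_len = mean(len(r.split()) for r in control[level])
        if control_len == 0:
            raise ValueError(f"control responses at level {level} are empty")
        ratios[level] = target_len / control_len
    return ratios


# Toy data: the target topic's answers shrink at level 2, the control's grow.
ml_responses = {1: ["a b c d"], 2: ["a b"]}
math_responses = {1: ["a b c d"], 2: ["a b c d e f g h"]}
ratios = length_ratio_by_level(ml_responses, math_responses)
```

Word count is a crude proxy; a real study would likely need matched prompt difficulty across topics and many samples per level to separate any effect from ordinary variance.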

Ljubomir Josifovski@ljupc0

@dwarkesh_sp Plus people can sample sweep ideas, smoke out which ones get banned or stifled, and deduce 'there may be fire next to the smoke'. True Ant-do-evil-Co might have thrown in some false positives to counter that line of inquiry, but a dynamic agent with enough sampling can overcome that.

6h41711

Pradyumna (in Bay Area)@PradyuPrasad

@dwarkesh_sp another is to have versions of the model trained for internal versus external use

7h4105

Matt Stoner@Stoner7m

@dwarkesh_sp I think sandbagging is really the only correct way to play a high stakes game like AI. But I am genuinely scared of the fact that they think they are building a god.

7h8354

Jeffcafe, private detective@jeffcafe_

@dwarkesh_sp @azi_pat It’s really obvious they don’t want to leak IP that accelerates diffusion of frontier capabilities. For explicitly stated safety reasons re: RSI, but also because it hurts their lead (which is their primary leverage for safety policy). This wasn’t a surprise.

6h1311

Nicholas Pipitone@npip99

@dwarkesh_sp It sounds like these training runs cost billions in compute, so might be hard to create a model to accelerate their internal operations, and also offer a product to external users, in any other way.

6h8353

Joan Velja@Joanvelja

@dwarkesh_sp I recall an OpenAI podcast (possibly the one talking about training GPT-4.5) saying that they keep the monorepo outside of the training data and measure generalization of training on it. It’s too good of a testbed for how good your model is to train on it, I suspect.

7h5403

ar0cket1@ar0cket1

It’s functionally the same as a block for Anthropic (the only reason they didn’t block is likely to preserve revenue from people in AI), since no one (me included) is using Claude models because of this (I wouldn’t be shocked if it’s applied on Opus as well).

A block at the very least brings back trust.

Though there is an argument that all this does is essentially give the end user a worse model, and that’s all you should be judging it for (your own personal preference).

4h107

The Mediocre Inquisitor@MediocreInqv2

@dwarkesh_sp Just ask ur roommate

7h3243

Daljeet@daljeet_v

@dwarkesh_sp why don't they just post-train their own model for this IP?

7h5182

TheLegend27@glencoe2004

@dwarkesh_sp The motivation for sandbagging is so that other labs that try to use Fable for development are at risk of sabotage, ideally dissuading them from trying to use Fable in the first place. Hard barriers can be worked around, whereas sandbagging is much harder to detect.

6h4902

Shailesh@0xThoughtVector

@dwarkesh_sp This is what I was thinking also. The model does have excellent recall of its training data.

7h3112

Epstein's Stylist@epstein_stylist

@OrganicGPT @dwarkesh_sp It refuses biology questions because they rushed the release and slapped on broad safeguards. They will soften them as time goes on.

6h244

Daryl@allvibesnoskill

@dwarkesh_sp

6h1692

kache@yacineMTB

@OrganicGPT @dwarkesh_sp Refusing is fine, transparency is good. It would be fine if they had the ml stuff refuse just like the bio stuff. It's their model they can do what they want with it

6h383