AI · 1d ago

ARC Prize halts private ARC-AGI evaluations of Anthropic's Fable 5 over data retention policy conflicts

Anthropic's Mythos-class models do not support zero-data retention

Original post · Seth Lazar · #1070
ARC Prize@arcprize

We had early access to Anthropic’s Fable 5, but did not run verified Semi-Private ARC-AGI-1/2/3 evals due to their new data-retention terms for Mythos-class models.

We’re working with Anthropic to keep ARC verification data private. Scores will come once we can run them safely.

10:28 AM · Jun 9, 2026 · 215.9K Views
Sentiment

Positive users praise the ARC Prize for delaying Fable 5 ARC-AGI evaluations to uphold integrity against Anthropic's restrictive data terms, whereas negative users voice disappointment at the delay and criticize the terms as draconian.

Positive: 47.2% · Negative: 52.8% (14 comments with sentiment)
Cluster Engagement
Posts from X
Views 39.6K · Bookmarks 17 · Likes 171 · Replies 6
Greg Kamradt@GregKamradt

We had access to Fable over the past few days

We were able to run it against public data but couldn't do semi-private (our private verification set) due to the new data retention policies

We're working with them to figure out a way to keep verification data private to ensure that ARC-AGI benchmarks continue to give us signal

fwiw, it did well on public data - we'll share once we're able to run semi private

1d · Views 39.6K · Likes 171 · Bookmarks 17
Retweets 8
Mark Saroufim@marksaroufim

The “Frontier lab” label has conveniently expanded to mean almost any team writing software.

An inference startup writing kernels, any sass company building evals, a student researcher working on parallelism

Now they’re all expected to accept being snooped on and nerfed.

10h · Views 8K · Likes 168 · Bookmarks 13

@arcprize Yep, this seems like a big deal for enterprises (?)

23h · Views 11.3K · Likes 103 · Bookmarks 17
Mike Knoop@mikeknoop

Anthropic's new data retention policies don't mesh well with enterprise users who need zero-data retention (ZDR). ARC also leverages ZDR to run our benchmarks without risk of exposure of private dataset.

23h · Views 11.9K · Likes 118 · Bookmarks 9

Insanely short-termist attitude

19h · Views 12.1K · Likes 101 · Bookmarks 9
Matt Mazur@mhmazur

Here's a link to Anthropic's new support doc about data retention practices for Mythos-class models for anyone curious:

https://support.claude.com/en/articles/15425996-data-retention-practices-for-mythos-class-models

From the intro paragraph:

> To ensure we’re responsibly deploying Mythos-class models, we are requiring limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered.

23h · Views 7.2K · Likes 29 · Bookmarks 10
Guilherme O'Tina@guilhermeotina

@arcprize ARC is one of the few places left doing independent verification at this level. if their test data cant stay private under mythos terms, the only public fable scores come from anthropics own blog. that shifts what any benchmark claim actually means

18h · Views 2.3K · Likes 27
Greg Kamradt@GregKamradt

For previous models we had zero-data retention in place but that isn't the case anymore

I'm confident there is a solution going forward

1d · Views 2.1K · Likes 19 · Bookmarks 1
Casper Hansen@casper_hansen_

@arcprize How should we read this? If anyone ever uses their new model, they automatically have the right to use your data?

21h · Views 3.6K · Likes 13 · Bookmarks 1
Conor@jconorgrogan

@arcprize Thank you for standing up against this draconian closed bs

20h · Views 1.6K · Likes 21

@1slimewell @arcprize Yep, from my understanding, I think you need to turn on Retention if you want to use it through API

23h · Views 536 · Likes 1 · Bookmarks 1
maxwell@1slimewell

@goncalo_canhoto @arcprize Holy shit is this including the API?

23h · Views 608 · Likes 3
Greg Kamradt@GregKamradt

@casper_hansen_ @arcprize No, that isn't the case per their docs

They explain it well (over 3 docs though) here

https://support.claude.com/en/articles/15425996-data-retention-practices-for-mythos-class-models

20h · Views 576 · Likes 6

@arcprize They said they made it "safe". Did that impact ZDR policies?

23h · Views 5.1K · Likes 3
Max For AI@MaxForAI

@arcprize Do you expect his score to improve significantly?

23h · Views 3K · Likes 2
seijin@david_saint_

@arcprize That means never? Because they say they won't train on the data, but you not running it means you don't trust them.

23h · Views 2.7K · Likes 2
kanver@kanver_

@arcprize Will you evaluate current leading Chinese models, such as Qwen 3.7 Max, Kimi K2.6 or Minimax 3 on Arc-AGI-1/2? Would be useful

23h · Views 3.2K · Likes 1
Luci Pars@parsluci

@arcprize

1d · Views 1.8K · Likes 1
Matija Grcic@matijagrcic

@arcprize interesting way to avoid arc benchmark.

21h · Views 364 · Likes 2