Anthropic Merges Fable And Mythos Eval Results, Risking Inflated Scores · Digg

/Tech · 1h ago

Original post

banteg@banteg

anthropic has published merged eval numbers for fable/mythos. not only they picked the better result of the two (this could overstate results up to +3%), they also used mythos as a fallback for fable refusals, which are shown here as 0%.

1:25 PM · Jun 9, 2026 · 9K Views
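To make the post's concern concrete, here is a minimal sketch with made-up numbers. It is not Anthropic's methodology, and the item-level outcomes, function names, and the 10-item benchmark are all hypothetical. It shows how substituting a second model's answer wherever the first model refuses can produce a merged score higher than either model earns alone, while the reported refusal rate drops to 0%.

```python
from typing import List, Optional, Sequence

# Hypothetical per-item outcome: True = pass, False = fail, None = refusal.
Outcome = Optional[bool]


def accuracy(outcomes: Sequence[Outcome]) -> float:
    """Share of items passed; a refusal (None) counts as a failure."""
    return sum(1 for o in outcomes if o is True) / len(outcomes)


def refusal_rate(outcomes: Sequence[Outcome]) -> float:
    """Share of items the model refused to answer."""
    return sum(1 for o in outcomes if o is None) / len(outcomes)


def with_fallback(primary: Sequence[Outcome],
                  fallback: Sequence[Outcome]) -> List[Outcome]:
    """Replace each refusal in `primary` with the fallback model's outcome."""
    return [f if p is None else p for p, f in zip(primary, fallback)]


# Made-up results on an imaginary 10-item benchmark (assumption, not real data).
fable = [True, True, None, False, True, None, True, True, False, True]
mythos = [True, False, True, True, False, True, False, True, True, True]

fable_alone = accuracy(fable)              # 6 of 10 passed
mythos_alone = accuracy(mythos)            # 7 of 10 passed
best_of_two = max(fable_alone, mythos_alone)

merged = with_fallback(fable, mythos)
merged_score = accuracy(merged)            # 8 of 10 passed

print(f"fable alone:   {fable_alone:.0%} (refusals {refusal_rate(fable):.0%})")
print(f"mythos alone:  {mythos_alone:.0%}")
print(f"best of two:   {best_of_two:.0%}")
print(f"with fallback: {merged_score:.0%} (refusals {refusal_rate(merged):.0%})")
```

In this toy data the fallback score (80%) beats both individual scores (60% and 70%). That is the kind of inflation the post describes, although the real size of the effect depends on data not shown here.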

Sentiment

Reactions were split: some users defended the model as good, while others dismissed it as useless investor bait for an upcoming IPO or said the problems go beyond the merged Fable/Mythos benchmark reporting.

Pos

33.3%

Neg

66.7%

3 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 449 | Replies 3

Dwayne@CtrlAltDwayne

@banteg Completely useless model. It's investor bait for the IPO they have coming.

1h | Views 449 | Likes 4

Likes 5 | Retweets 1

iloccorb@dotnet_enjoyer

@banteg It gets worse. This is how it pushes back when you mention these

1h | Views 219 | Likes 5

Andrey 🦃 Petrov@shazow

@banteg time to add a new benchmark

1h | Views 305 | Likes 4

Wariohead@Wariohead_

@CtrlAltDwayne @banteg How is it useless? Have you even tried it?

1h | Views 35 | Likes 1

X@TeachLearnGrow2

@CtrlAltDwayne @banteg Come on we need more commentary than that. How is it a useless model?!

1h | Views 33

Krakovia@krakovia_evm

@CtrlAltDwayne @banteg Nah it's good.

1h | Views 24 | Likes 1

88@hydra88

@banteg don't ask how but i got the lil shid running

1h | Views 12
