
Claude Mythos Dominates Coding Benchmarks But Ties GPT On Physics Research Eval

Original post

Mythos/Fable destroys everything else in everything all at once, with large gaps. Except this "everything" is 20 different evals of agentic coding. On CritPt, they are merely on par with 5.5 (and below 5.5-Pro). Similar in vision. Gigantic next-gen model, made for ONE thing.

Lately I generally feel more sympathy for OpenAI. Here they are, trying to RLVR toward actual scientific AGI that'll solve problems directly, as on CritPt. And their great safety-conscious competition: code, code, B2B, B2B, attack, exploit, chyna hawkery, RSI. Not equal.

2:05 PM · Jun 9, 2026 · 16.7K Views
Sentiment

Some users voiced optimism that 5.6 will compete fairly well with Fable, while others criticized Fable's vision as hilariously bad and worse than Qwen 3.6, and one insulted Dario as petty.

Positive: 25.0% · Negative: 75.0% (4 comments with sentiment)
Posts from X

Similar story with vision. OpenAI's big model will be much more of a shock.


1h · Views 1.4K · Likes 30 · Bookmarks 3

It stands to reason that OpenAI is much more spiritually (and demographically) Chinese than Anthropic, which is basically committed to Total CCP Death. Except general STEM intelligence is not "goofy ahh ideas". We'll see if scaling that wins over scaling muh agentic coding alone.


55m · Views 1.1K · Likes 13 · Bookmarks 5
Anime fan @badboy999654

@teortaxesTex It's worse than qwen 3.6 in vision 💀

1h · Views 41
cqk @cqkten

@teortaxesTex But what will they have by the time GPT-6 is out 🧐

1h · Views 19
Jake Halloran @jakehalloran1

@teortaxesTex The vision is like hilariously bad versus everything else for sure

1h · Views 47
Paul Marin @paulmarin90

@teortaxesTex Dario is the Xiaoren King.

1h · Views 21
Anime fan @badboy999654

@teortaxesTex See

1h · Views 10
cqk @cqkten

@teortaxesTex I do think 5.6 will actually compete fairly well with Fable

1h · Views 7