Claude Mythos Dominates Coding Benchmarks But Ties GPT On Physics Research Eval

Original post

Mythos/Fable destroys everything else in everything all at once, with large gaps. Except this "everything" is 20 different evals of agentic coding. On CritPt, they are merely equal with 5.5 (and below 5.5-Pro.) Similar in vision. Gigantic next gen model, made for ONE thing.

Generally I feel more sympathy for OpenAI lately. Here they are, trying to RLVR towards actual scientific AGI that'll solve problems directly, as on CritPt. And their great safety-conscious competition: code, code, B2B, B2B, attack, exploit, chyna hawkery, RSI. Not equal.

2:05 PM · Jun 9, 2026 · 39.6K Views
Sentiment

One user expected 5.6 to compete fairly well with Fable, while others criticized the model's vision as hilariously bad and worse than Qwen 3.6, and one insulted Dario as petty.

Positive 25.0% · Negative 75.0%
4 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 3.6K · Bookmarks 17 · Retweets 2 · Replies 2

it stands to reason that OpenAI is much more spiritually (and demographically) Chinese than Anthropic, which is basically committed to Total CCP Death. Except general STEM intelligence is not "goofy ahh ideas". We'll see if scaling that wins over scaling muh agentic coding alone.

22h · Views 3.6K · Likes 37 · Bookmarks 17

Similar story with vision. OpenAI's big model will be much more of a shock.

23h · Views 2.4K · Likes 46 · Bookmarks 3
Anime fan @badboy999654

@teortaxesTex It's worse than qwen 3.6 in vision 💀

23h · Views 41
cqk @cqkten

@teortaxesTex But what will they have by the time GPT-6 is out 🧐

22h · Views 19
Jake Halloran @jakehalloran1

@teortaxesTex The vision is like hilariously bad versus everything else for sure

23h · Views 47
Paul Marin @paulmarin90

@teortaxesTex Dario is the Xiaoren King.

22h · Views 21
Anime fan @badboy999654

@teortaxesTex See

22h · Views 10
cqk @cqkten

@teortaxesTex I do think 5.6 will actually compete fairly well with Fable

22h · Views 7