/Tech · 23h ago

Claude Mythos Dominates Coding Benchmarks But Ties GPT On Physics Research Eval


Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · #440 in Tech

Mythos/Fable destroys everything else in everything all at once, with large gaps. Except this "everything" is 20 different evals of agentic coding. On CritPt, they are merely equal with 5.5 (and below 5.5-Pro.) Similar in vision. Gigantic next gen model, made for ONE thing.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

Generally I feel more sympathy for OpenAI lately. Here they are, trying to RLVR towards actual scientific AGI that'll solve problems directly, as on CritPt. And their great safety-conscious competition: code, code, B2B, B2B, attack, exploit, chyna hawkery, RSI. Not equal.

2:05 PM · Jun 9, 2026 · 39.6K Views

Sentiment

Some users voiced optimism that Claude 5.6 will compete well in coding benchmarks, while many criticized its vision as hilariously bad and worse than Qwen's, and called Dario petty.

Pos: 25.0%

Neg: 75.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 3.6K · Bookmarks: 17 · Retweets: 2 · Replies: 2

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

it stands to reason that OpenAI is much more spiritually (and demographically) Chinese than Anthropic, which is basically committed to Total CCP Death. Except general STEM intelligence is not "goofy ahh ideas". We'll see if scaling that wins over scaling muh agentic coding alone.