/AI · 1h ago

Claude Mythos Dominates Coding Benchmarks But Ties GPT On Physics Research Eval

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · #421 in AI

Mythos/Fable destroys everything else in everything all at once, with large gaps. Except this "everything" is 20 different evals of agentic coding. On CritPt, they are merely equal with 5.5 (and below 5.5-Pro.) Similar in vision. Gigantic next gen model, made for ONE thing.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Generally I feel more sympathy for OpenAI lately. Here they are, trying to RLVR towards actual scientific AGI that'll solve problems directly, as on CritPt. And their great safety-conscious competition: code, code, B2B, B2B, attack, exploit, chyna hawkery, RSI. Not equal.

2:05 PM · Jun 9, 2026 · 16.7K Views

Sentiment

Some users were optimistic that Claude 5.6 will compete well on coding benchmarks, while many criticized its vision capabilities as hilariously bad and worse than Qwen's, and called Dario petty.

Pos: 25.0%

Neg: 75.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 1.4K · Likes: 30

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

similar story with vision. OpenAI's big model will be much more of a shock.

Bookmarks: 5 · Retweets: 1 · Replies: 1

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

it stands to reason that OpenAI is much more spiritually (and demographically) Chinese than Anthropic, which is basically committed to Total CCP Death. Except general STEM intelligence is not "goofy ahh ideas". We'll see if scaling that wins over scaling muh agentic coding alone.