Anthropic Opus 4.8 Boosts Math and Coding but Trails GPT-5.5

Original post
Mikhail Parakhin (@MParakhin) · #906 in AI

OK, going to call it. Spent a lot of time with Opus 4.8: 1) It is a big step forward. The base model is still inferior to GPT-5.5, but they dramatically upped the thinking budget (for Max) - makes all the difference 2) Instruction following is still worse than GPT-5.5 xhigh 3) Coding, math, reasoning - better! It's not at the Pro level (of course), but the first Anthropic model I can genuinely use for math/ML. Codex app is much better (especially on Windows), but, until 5.6 arrives, I switched to Claude Code as the main system. Hearing great things about 5.6 though!

12:54 PM · May 30, 2026 · 82.3K Views
Posts from X
Views 3.3K · Bookmarks 4 · Likes 25 · Retweets 2 · Replies 2

And I was right! Toloka Arena finished testing - Claude 4.8 did take the first place. And, just as I saw, it did so through higher reasoning budgets - look at the number of tokens used.

1h · Views 3.3K · Likes 25 · Bookmarks 4