OK, going to call it. Spent a lot of time with Opus 4.8:
1) It is a big step forward. The base model is still inferior to GPT-5.5, but they dramatically upped the thinking budget (for Max), and that makes all the difference.
2) Instruction following is still worse than GPT-5.5 xhigh.
3) Coding, math, reasoning - better! It's not at the Pro level (of course), but it's the first Anthropic model I can genuinely use for math/ML.
The Codex app is much better (especially on Windows), but until 5.6 arrives, I've switched to Claude Code as my main system. Hearing great things about 5.6 though!
Anthropic Opus 4.8 Boosts Math and Coding but Trails GPT-5.5
And I was right! Toloka Arena finished testing, and Claude 4.8 did take first place. And, just as I saw, it did so through higher reasoning budgets - look at the number of tokens used.