Claude Sonnet 5's upgrades are not uniform across every skill, e.g. it's weaker than Sonnet 4.6 on CyberGym 🤔
CyberGym tests vulnerability discovery and exploit-finding behavior, not general reasoning or normal coding.
Anthropic also explicitly said in its announcement blog that Sonnet 5 was not deliberately trained for cyber tasks, so its cyber ability likely comes from general intelligence rather than targeted optimization.
So whatever Sonnet 5 scores on CyberGym likely reflects general reasoning rather than specialized exploit skill.
---
From the Claude Sonnet 5 system card
And Claude Sonnet 5 just launched.
Closes the gap with Opus 4.8, and is cheap until August.
This makes agentic AI much cheaper: $2 per 1M input tokens and $10 per 1M output tokens through Aug-26, rising to $3 input and $15 output per 1M after that.
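To see what that price difference means for a real bill, here is a rough sketch in Python. The per-1M prices are the ones quoted above; the workload (50M input tokens, 10M output tokens) and the names `PROMO`, `STANDARD` and `cost_usd` are made up for illustration, and the math ignores anything like caching or batch discounts.

```python
# Prices in USD per 1M tokens, as quoted in the post.
PROMO = {"input": 2.00, "output": 10.00}      # launch pricing
STANDARD = {"input": 3.00, "output": 15.00}   # pricing after the launch window


def cost_usd(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Return the cost in USD for a given token count at the given per-1M prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical agentic workload: 50M input tokens, 10M output tokens.
promo_cost = cost_usd(50_000_000, 10_000_000, PROMO)
standard_cost = cost_usd(50_000_000, 10_000_000, STANDARD)
savings = 1 - promo_cost / standard_cost

print(f"launch pricing:   ${promo_cost:,.2f}")
print(f"standard pricing: ${standard_cost:,.2f}")
print(f"savings:          {savings:.0%}")
```

Since both the input and output prices drop by the same ratio (2/3 and 10/15), the launch price works out to about a third off for any mix of input and output tokens.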
Anthropic calls Sonnet 5 its “most agentic Sonnet model yet.”
On agentic coding (SWE-bench Pro), Sonnet 5 hits 63.2%, versus 58.1% for Sonnet 4.6 and 69.2% for Opus 4.8.
But in knowledge work, Sonnet 5 slightly beats Opus 4.8, even though Opus is known for tough judgment and deep research tasks.








