Among other findings the benchmarks shows that NVIDIA's GB300 outperforms H200 by a whopping factor of 20 in terms of running concurrent agents per megawatt.
3/4
(DeepSeek V3.2, GLM 4.7, and Kimi K2.5) prompted to resolve issues in real public code repositories. All trajectories include interleaved reasoning and tool calls.
2/4