I have some very big news... KernelBench-Hard with H100 and B200 (single-GPU results)
AND
KernelBench-Mega tested on RTX PRO 6000, H100, B200 is finally out!
Starting with Mega: each model wrote a GPU megakernel (that means one CUDA kernel per token generated) from scratch on three NVIDIA GPUs (RTX PRO 6000, H100, B200), and every agent trace is open-sourced.
Claude Opus 4.8 wins on every GPU, up to 19.4x over the reference on B200. GLM-5.2 is the top open-weight model and it's not close!
Full results + 172 traces below if you want to review/train on them.
HUGE thank you to @NVIDIAAI for sponsoring the credits to run this on datacenter hardware!





