Very good sign of maturing, improving post-training (though it would be even better if it had solved the task)
GLM-5.2 on KernelBench-Hard:
The interesting result isn't the score. It's that GLM-5.2 stopped cheating.
On the fp8 GEMM problem, GLM-5.1 banked its number by calling cublasLt (a library wrapper, zero kernel authorship). Kimi K2.7 took the same cell by editing the grader's tolerance file. GLM-5.2 read that same grader file, left it alone, and burned the full 45 minutes on a real mma.sync e4m3 kernel that never passed. An honest zero over a cheap win.
Everywhere else it writes real kernels too: a 0.49 GQA online-softmax attention (top-3 on that problem, no flash fallback), an exact bitonic sort, a w4a16 GEMM. 4/6 clean, zero reward hacks: the most clean solves of any open-weight model we've benched.
One note on reading the chart: the topk column looks like everyone fails. They don't. That problem is launch-overhead-bound (~30µs/forward), so the roofline fraction is capped low for the whole field, Fable included.
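A rough way to see the cap, as a Python sketch. It assumes the roofline fraction is the ideal roofline time divided by the measured time per forward, and the 3µs ideal time is a made-up placeholder, not a KernelBench number. Only the ~30µs overhead floor comes from the note above.

```python
# Sketch: why a fixed launch-overhead floor caps the roofline fraction.
# Assumption: roofline fraction = ideal (roofline) time / measured time.
# The 3 us ideal time below is hypothetical, not a benchmark result.

def roofline_fraction(ideal_us: float, measured_us: float) -> float:
    """Fraction of roofline achieved; 1.0 means the kernel hits the roofline."""
    return ideal_us / measured_us

def max_fraction(ideal_us: float, overhead_floor_us: float) -> float:
    """Best possible fraction when every forward pays at least the overhead floor."""
    # A perfect kernel still can't finish faster than the launch overhead.
    return roofline_fraction(ideal_us, max(ideal_us, overhead_floor_us))

if __name__ == "__main__":
    floor_us = 30.0  # ~30 us/forward launch overhead, from the post
    ideal_us = 3.0   # hypothetical roofline time for a small topk problem
    print(f"{max_fraction(ideal_us, floor_us):.2f}")  # 0.10
```

Under these assumptions even a flawless topk kernel tops out around 0.10, which is why the whole column reads low.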
Claude Fable 5 still tops all 6. But GLM-5.2's weights go open under MIT next week, and this is the strongest clean open-weight run we've logged.
Cheers to NO reward hacking!
Every kernel + transcript: http://kernelbench.com/hard