GLM 5.2 is one *of the* greatest gap reductions ever, but I think it is *the* greatest show of benchmark solidity from an open model claiming SoTA ever. Normally, you have some variety of the bad old Qwen pattern: headline benchmarks are SoTA+, new OOD ones are ≈8 months behind, and real experience is spiky, competitive in places, but usually ≈1 year behind, and sometimes utterly falling apart. Knock on it and hear the hollow sound. Yes, even DeepSeek. Not so here. There's no progressive decay. It's "Opus 4.5-4.7ish" throughout, in anything of value that you throw at it. It is the first truly, completely solid Chinese model. A phase change, I hope.
Beyond the megakernel, there is a 6-problem hard CUDA/Triton deck. Speedup is measured over torch.compile (a strong baseline, not naive PyTorch). Paged attention is where torch.compile falls down and a real kernel runs away with it: Opus 4.8 hits 56.8x on B200.
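
For anyone wondering what "speedup over torch.compile" means arithmetically, here is a minimal sketch of how I'd assume such a number is computed: baseline time divided by kernel time, each taken as a median over repeated runs. The names `median_ms` and `speedup` are hypothetical helpers, not the deck's actual harness, and the timer here is plain wall-clock; a real GPU measurement would also need device synchronization (or CUDA events) around each call.

```python
import statistics
import time
from typing import Callable


def median_ms(fn: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Median wall-clock time of fn() in milliseconds.

    Hypothetical helper: uses a CPU timer only. For GPU kernels the call
    would have to be bracketed by a device synchronize, which is omitted here.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are discarded (caches, lazy compilation, etc.)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def speedup(baseline_ms: float, kernel_ms: float) -> float:
    """Speedup of the candidate kernel relative to the baseline.

    Assumption: the baseline is the torch.compile'd reference, so a value
    of 2.0 means the kernel takes half the baseline's time.
    """
    if baseline_ms <= 0 or kernel_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / kernel_ms
```

Usage would look like `speedup(median_ms(run_compiled), median_ms(run_kernel))`, where `run_compiled` and `run_kernel` are placeholder callables for the two implementations. The point of using the compiled baseline as the denominator is that any speedup above 1x is a win over something already optimized, not over eager-mode PyTorch.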




