I am waiting for @htihle's WeirdML results: it's one of the few truly hard, fully private benchmarks that have really humbled Chinese models so far. GLM-5.1 is the best they could do so far. 5.1 was, I think, a smaller step up from 5.0 than 5.2 is from 5.1. I brazenly predict 0.725.
When Chinese bros have more compute, starting in H2 2026… they won't race ahead, because American bros (at least the smart ones) have been investing lots of compute into experiments, so they'll have an easier time with large-scale training. This remains a close competition.