All things considered, I think GLM 5.2 puts the current gap at roughly 7 months. What is remarkable is this: the gap being *much greater* on hard private evals had led some people to assume that compute is the primary differentiator, yet GLM 5.2 narrows those gaps just as well.
I was interviewed for this piece in The Economist, in which I pushed back against the idea that Chinese models are only 4 months behind the frontier. The gap is likely quite a bit larger on real-world tasks, even though GLM 5.2 is a really strong model and an important update for me.
