GLM-5.2 results were sus, so I looked into how the models post-train
and it's slop, the results would be useless in the real world
it's just another benchmark that GLM bros hillclimbed
mind you, GLM-5 was in 22nd place and then a few months later it's suddenly in 1st
part of the problem is the benchmark itself: there are no hidden evals, and models are training one model for one eval at a time, so they're kind of encouraged to build overfit slop
GLM-5.2 is 5x cheaper than Opus 4.8 and 11x cheaper than Fable 5, yet it tops PostTrainBench.
That’s exciting because lower costs make personalized intelligence economically viable. Every company and country should be able to own models trained on its own data and have sovereignty over it. The future is millions of models, each crafted around the data, values, and decisions of the people who rely on them.