GLM 5.2 (max) scores 70.1% on WeirdML, narrowly beating Gemini 3 Pro, which is from 7 months ago.
It uses ~22k output tokens on average, compared to ~12k for the (high) setting. This gives a fairly clear but modest increase in score (about 3 percentage points, 70.1% vs 67.3%), suggesting that results scale with output tokens.
Runs without thinking are under way.

GLM 5.2 (high) scores 67.3% on WeirdML, placing it between Opus 4.5 and Gemini 3 Pro.
This is a much higher score than I expected, and GLM 5.2 (max), still running, could score even better.
It looks like a very solid model!