GLM 5.2 (max) scores 70.1% on WeirdML benchmark, requiring 22,000 output tokens per run · Digg