💥NEW: Despite impressive performance on PostTrainBench and InferenceBench, GLM-5.2 still has a high hallucination rate on HalluHard when used without web search (74.8% vs. 46.8% for GPT-5.4-Thinking).
HalluHard update: We’ve added GLM-5.2, using adaptive thinking with maximum reasoning effort, to our leaderboard. Despite its impressive performance on other benchmarks, GLM-5.2 still hallucinates frequently on our challenging multi-turn benchmark.


