There is a simple reason why Gemini is so much worse than GPT or Claude:
engineers at OpenAI or Anthropic can read incoming user queries. All the data is visible.
But at Google there are tons of privacy restrictions preventing people from looking at data.
They're basically building a model blind.
GLM-5.2 (Max) by @Zai_org ranks #10 on the new Agent Arena leaderboard, closely matching Claude-Opus-4.8 (non-thinking), and is the #1 open model by a wide margin!
In Agent Arena, we measure models on millions of real-world, long-horizon agentic tasks from a global community of users. Models can access web search, filesystem, and terminal tools to complete complex workflows. The leaderboard measures model performance on outcomes relative to the average model using a causal tracing methodology.
Compared to GLM-5.1, GLM-5.2 (Max) climbs from #13 to #10. Its clearest gains are in confirmed task success and in user praise vs. complaint. Bash capabilities and tool hallucination remain stable. There is a tradeoff in steerability compared to the previous model (-6.0% vs. +1.2%).
GLM-5.2 remains the same price as GLM-5.1: $1.4/$4.4 per million input/output tokens, with a 1M context window.
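For a rough sense of what that pricing means per task, here is a minimal sketch. It assumes flat per-token billing at the quoted rates with no caching discounts or tiering, which the post does not confirm; the helper name and the token counts are hypothetical.

```python
# Prices quoted in the post, in USD per one million tokens.
INPUT_USD_PER_MTOK = 1.4
OUTPUT_USD_PER_MTOK = 4.4


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper).

    Assumes flat per-token pricing; real billing may differ.
    """
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical agentic task: 200k tokens read, 20k tokens generated.
cost = estimate_cost_usd(200_000, 20_000)
print(f"${cost:.3f}")
```

Under these assumptions, input tokens dominate the bill for long-horizon tasks that read far more than they write.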
Huge congrats to @Zai_org on the incredible release!
See thread for details on how GLM-5.2 (Max) performs across 5 different signals.




