Tech · 2h ago

Z.ai's GLM-5.2 Max takes third on Artificial Analysis Index, delivering 112 tokens per second for $0.41 per task

Story Overview

Z.ai's GLM-5.2 Max places third on the Artificial Analysis Index, pairing strong reasoning scores with practical speed: it generates 112 tokens per second at roughly 41 cents per indexed task. The standard variant also edges ahead of Gemini 3.5 Flash on the same board.

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · #501 in Tech

> GLM-5.2 scores 1524 on GDPval-AA v2. In retrospect I should have known that GLM also works on GDPval, so there was no reason to expect it to flop there. Yeah, people saying 51-52 were right. This is about fair. It is stronger than Gemini 3.5 Flash.

Artificial Analysis@ArtificialAnlys

Z ai’s GLM-5.2 is the new leading open weights model on the Artificial Analysis Intelligence Index scoring 51 and it sits on the Pareto frontier of Intelligence vs Cost per Task

@Zai_org’s GLM-5.2 is the same size as GLM-5.1 (744B total / 40B active parameters) but scores 11 points higher on the Intelligence Index v4.1, placing ahead of MiniMax-M3 (44) and DeepSeek V4 Pro (max, 44). On the first-party API it is priced in line with GLM-5.1 at $1.4/$4.4/$0.26 per 1M input/output/cache hit tokens

Key results:

➤ GLM-5.2 is the leading open weights model on the Intelligence Index v4.1. At 51, it leads MiniMax-M3 (44), DeepSeek V4 Pro (max, 44) and Kimi K2.6 (43)

➤ Improvements across most evaluations, particularly scientific reasoning: GLM-5.2 gains over GLM-5.1 on most evaluations, led by scientific reasoning on CritPt (+16 points to 21%) and HLE (+12 points to 40%), alongside AA-LCR (+9 points to 71%), tau3 banking (+15 points to 27%) and SciCode (+7 points to 50%). TerminalBench v2.1 also improves (+16 points to 78%) and GPQA Diamond gains 3 points to 89%

➤ Leading open weights model on GDPval-AA v2 and competitive with proprietary models: GLM-5.2 scores 1524 on GDPval-AA v2, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro (max, 1328). This impressive result places GLM-5.2 in-line with proprietary models including GPT-5.5 (xhigh reasoning). GDPval-AA v2 builds on the original GDPval-AA by baselining Elo to human performance at 1000, introducing a rotating panel of frontier-model judges, and raising the turn limit from 100 to 250 for longer-horizon agent trajectories

➤ GLM-5.2 uses more output tokens per task than other leading open weights models: the model uses 43k output tokens per Intelligence Index task, up from GLM-5.1 (26k) and above MiniMax-M3 (24k), Kimi K2.6 (35k) and DeepSeek V4 Pro (max, 37k)

➤ On the Intelligence vs. Cost per Task Pareto Frontier: GLM-5.2 is on the Pareto frontier of the Intelligence vs Cost per Task chart, with the lowest cost per task among models at its intelligence level. GLM-5.2 costs ~$0.46 per task, compared to GLM-5.1 ($0.25), Kimi K2.6 ($0.31), MiniMax-M3 ($0.18) and DeepSeek V4 Pro (max, $0.05)

Additional Model Details:

➤ License: MIT

➤ Size: 744B total parameters, 40B active parameters, equivalent to GLM-5.1

➤ Context window: 1M tokens, up from 200K on GLM-5.1

➤ Pricing: $1.4/$0.26/$4.4 per 1M input/cache hit/output tokens

➤ Availability: Alongside Z ai's first-party API, GLM-5.2 is available across third-party providers including @DeepInfra, @novita_labs, @nebiusai, @parasailnetwork, @SiliconFlowAI, @gmi_cloud, @Baseten and @FireworksAI_HQ

6:09 AM · Jun 17, 2026 · 4.9K Views
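
The per-token prices quoted in the post can be turned into a per-request estimate with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the dictionary layout are assumptions, and the only figures taken from the post are the three list prices and the 43k output tokens per Intelligence Index task.

```python
# List prices per 1M tokens as quoted in the post (USD).
PRICE_PER_M = {"input": 1.4, "cache_hit": 0.26, "output": 4.4}


def estimate_cost(input_tokens: int, cache_hit_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted list prices."""
    total = (
        input_tokens * PRICE_PER_M["input"]
        + cache_hit_tokens * PRICE_PER_M["cache_hit"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total / 1_000_000


# Output-token component only, using the reported 43k output tokens per task.
output_only = estimate_cost(0, 0, 43_000)
print(f"${output_only:.4f}")  # prints $0.1892
```

The post does not report input or cache-hit token counts per task, so this covers only the output component of the reported per-task cost; the remainder would come from input and cached tokens.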

Developer Impact

Open weights keep closing the gap

With roughly 744 billion total parameters (40 billion active) and an MIT license, the model gives developers a self-hostable option that leads other openly available weights on the composite intelligence metrics without requiring proprietary API keys.

Performance Watch

Metrics worth watching on real workloads

The reported throughput and sub-dollar cost per task place the model on the current Intelligence vs. Cost per Task Pareto frontier, though its exact placement can shift as the index moves its emphasis toward longer agentic evaluations.
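
For readers unfamiliar with the term, a model is on the Pareto frontier when no other model is at least as intelligent and at least as cheap while being strictly better on one of the two. A minimal sketch of that check follows, using only the four models whose index score and cost per task both appear in the quoted post. The `Model` type and `pareto_frontier` function are hypothetical names, and the real chart includes many more models, so the result here says nothing about the full leaderboard.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    intelligence: float   # Intelligence Index score, higher is better
    cost_per_task: float  # USD per task, lower is better


def dominates(a: Model, b: Model) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.intelligence >= b.intelligence and a.cost_per_task <= b.cost_per_task
    strictly_better = a.intelligence > b.intelligence or a.cost_per_task < b.cost_per_task
    return no_worse and strictly_better


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no other model in the list dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


# Figures as reported in the quoted post; this subset is illustrative only.
MODELS = [
    Model("GLM-5.2", 51, 0.46),
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro (max)", 44, 0.05),
    Model("Kimi K2.6", 43, 0.31),
]

frontier_names = [m.name for m in pareto_frontier(MODELS)]
print(frontier_names)
```

Within this subset, GLM-5.2 stays on the frontier because nothing else reaches its score, which is the sense in which a higher cost per task can still be Pareto-optimal.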

Sentiment

Users with a positive view are excited that GLM-5.2 tops the open-weight rankings and rivals top models on performance and cost, while users with a negative view dismiss the benchmarks as unreliable or unrepresentative.

Positive: 74.2%

Negative: 25.8%

18 comments with sentiment.

Via ARTIFICIALANALYSIS.AI

Posts from X

Most Activity

Views 7.6K · Bookmarks 16 · Likes 199 · Retweets 13 · Replies 16

Chubby♨️@kimmonismus

GLM-5.2 max is currently the third best model available, across both open and proprietary options.

And that's fantastic. Open source is fundamentally important and must continue to hold a strong position so that everyone has open alternatives. And I say that with conviction.

1h7.6K19916

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

«Sonnet-medium» lol No, we're not dealing with Sonnet level here. China bros want the frontier.

2h1.3K240

Chubby♨️@kimmonismus

https://artificialanalysis.ai/models/glm-5-2

1h2.7K71

Umid KD@UmidKD

@teortaxesTex Wait, what? Sonnet? Medium? Who said that? For me it has been nearly indistinguishable from Opus 4.8 so far.

2h932

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@UmidKD one tard did

2h684

Milan@influenist

@teortaxesTex It's a pity they're showing around this bench. Would rather see SWE or smth.

2h62

Kick@Kick4642

@kimmonismus But are you impressed?

1h53

Vishal Singh 🥑@vishalsingh2972

@kimmonismus Even from a price-point view (tokens/$), they have an unbeatable cost advantage.

1h26

Philow🇬🇬@Phi10w

@teortaxesTex They beat the US on video gen, they will def come for frontier

2h13

MecComp@engMecComp

@teortaxesTex have you tried this? for me it's better than Opus and GPT

2h801

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@influenist it's Opus class on those too

2h611

Chubby♨️@kimmonismus

@calhim7 hmmm good call. Let me check!

1h421

egesea@egesea009

@kimmonismus Open source doesn't need to be #1 to matter. Its greatest value may be forcing everyone else to keep improving. Competition isn't just about creating winners. It's about preventing permanent winners.

39m52