Tech · 5h ago

Open-source GLM-5.2 hits 318 tokens per second, tripling the speed of proprietary frontier models

Databricks CEO Ali Ghodsi highlighted the breakthrough for agentic workloads.

Original post

Ali Ghodsi @alighodsi

Congrats! Open source GLM model is really a game changer! Extremely fast, cheap, and high quality!

Dmytro Dzhulgakov @dzhulgakov

you may have heard that glm-5.2 at 280 token/s is cool, how about 318

and we still have room to go

8:16 AM · Jun 25, 2026 · 17.4K Views

Sentiment

Many users praised the open-source GLM-5.2's speed of 318 tokens per second and its usefulness for building apps, while some criticized its quantization, token efficiency, and deployment uptime.

Positive: 85.0% · Negative: 15.0% (based on 25 comments with sentiment)

Posts from X

Yuchen Jin @Yuchenj_UW

@jietang Databricks 🩷 GLM

naeem @identity_matrix

@jietang This one is a must read for you

How we built the world’s fastest API for GLM-5.2 https://share.google/VTQ84a7BtWc8eN6om

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

@jietang B300s bro

jietang @jietang

318 tps.....crazy..... how can you make this happen... we have to work harder even more...

Ali Ghodsi @alighodsi

@heng_yan Oh you're right! Key point being that open source GLM is over 300 tokens per second. This matters for agents (we all hate waiting 10 minutes for responses). Proprietary frontier models are at best at 100 tokens per second. So 3x speedup really matters for agentic workloads.
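
The arithmetic behind that post can be sketched roughly in Python. This is a back-of-the-envelope illustration only: it assumes decode speed is the sole cost (no prefill, network, or tool-call latency), and the function names and the 60,000-token workload are hypothetical. Only the 100 and 318 tokens-per-second figures come from the thread.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream `output_tokens` at a constant decode rate (assumption: no other latency)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def speedup(fast_tps: float, slow_tps: float) -> float:
    """Ratio of two decode rates."""
    return fast_tps / slow_tps


# Rates quoted in the thread.
PROPRIETARY_TPS = 100.0  # "at best" figure quoted for proprietary frontier models
GLM_TPS = 318.0          # reported GLM-5.2 rate

# Hypothetical workload: whatever a 100 tokens/s model emits in 10 minutes.
workload_tokens = int(PROPRIETARY_TPS * 10 * 60)

slow_s = generation_seconds(workload_tokens, PROPRIETARY_TPS)
fast_s = generation_seconds(workload_tokens, GLM_TPS)

print(f"workload: {workload_tokens} tokens")
print(f"at {PROPRIETARY_TPS:.0f} tok/s: {slow_s / 60:.1f} min")
print(f"at {GLM_TPS:.0f} tok/s: {fast_s / 60:.1f} min")
print(f"speedup: {speedup(GLM_TPS, PROPRIETARY_TPS):.2f}x")
```

Under these assumptions, a 10-minute wait shrinks to a little over 3 minutes. Real agentic runs also spend time on prompt processing and tool calls, so the end-to-end gain would likely be smaller than the raw decode ratio.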

Heng Yan @heng_yan

@alighodsi I thought it is 392, 😉

Di Zhang @di_zhang_fdu

@jietang As I Know, Dflash + TileRT is a possible way

Dmytro Dzhulgakov @dzhulgakov

@jietang Thanks and thank you for cooking an incredible model, GLM-5.2 rocks

Ventus @BrainleapLab

@jietang And Databricks reaching 392 just now… the inference community is cooking 🔥 Power of OPEN SOURCE！

ueaj @_ueaj

@jietang TPSmogged

zeni @jaga_prasanna

@jietang Get ur kernels and PD disaggregation optimized bro

Eric @Ex0byt

@jietang @jietang you didn't know about this??

Alex @Alex_m

@jietang They aren't specifying if it's FP8 or something. I hope they aren't just hyping up @UnslothAI's 1-bit version, considering it only hits 76% accuracy.
