Congrats! Open source GLM model is really a game changer! Extremely fast, cheap, and high quality!
You may have heard that GLM-5.2 at 280 tokens/s is cool. How about 318?
And we still have room to go.
Databricks CEO Ali Ghodsi highlighted the breakthrough for agentic workloads.
Many users praised GLM-5.2's open-source 318 tokens-per-second speed and usefulness for building apps, while some criticized its quantization, token efficiency, and deployment uptime.

@jietang Databricks 🩷 GLM

@jietang This one is a must read for you
How we built the world’s fastest API for GLM-5.2 https://share.google/VTQ84a7BtWc8eN6om

@jietang B300s bro
318 tps... crazy... how did you make this happen? We have to work even harder...

@heng_yan Oh you're right! Key point being that open source GLM is over 300 tokens per second. This matters for agents (we all hate waiting 10 minutes for responses). Proprietary frontier models are at best at 100 tokens per second. So 3x speedup really matters for agentic workloads.

@alighodsi I thought it is 392, 😉

@jietang As far as I know, Dflash + TileRT is one possible way

@jietang Thanks and thank you for cooking an incredible model, GLM-5.2 rocks

@jietang And Databricks reaching 392 just now… the inference community is cooking 🔥 Power of OPEN SOURCE!

@jietang TPSmogged

@jietang Get ur kernels and PD disaggregation optimized bro

@jietang you didn't know about this??

@jietang They aren't specifying if it's FP8 or something. I hope they aren't just hyping up @UnslothAI's 1-bit version, considering it only hits 76% accuracy.

@jietang Thanks for your work on the GLM models. I have your coding plan and am amazed at what I've built!

@jietang bro is running 40tps as 1st party provider😂

@Yuchenj_UW @jietang All these token/s numbers aren’t on OpenRouter though…. Are you just doing super low batch size for benches 🫠

@jietang hello, would you like some help with that?

@jietang americah