Databricks' Yuchen Jin says system-level optimizations pushed GLM-5.2 inference to a leaderboard-topping 392 tokens/s

Story Overview

Databricks engineer Yuchen Jin highlighted system-level tweaks that lifted GLM-5.2 inference to 392 tokens per second on the Artificial Analysis leaderboard, surpassing Fireworks' prior 328 tokens per second mark on a custom endpoint. The result spotlights engineering refinements applied to a leading open-weights model that already posts strong intelligence scores and offers a 1M-token context window.

Original post

Yuchen Jin@Yuchenj_UW

You may have heard that GLM-5.2 at 328 token/s is cool,

How about 392?

Databricks is now #1 in inference speed for GLM-5.2 on Artificial Analysis. It's a great model, and we did a lot of optimizations.

8:26 AM · Jun 25, 2026 · 60K Views

Developer Impact

Serving speed directly shapes real-world usability

Faster output rates can cut latency and raise throughput for developers running GLM-5.2, yet the announcement supplies no workload-specific benchmarks or cost figures to quantify the advantage.
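As a rough illustration of what the two reported rates mean for a single response, the sketch below converts tokens per second into streaming time. It assumes a steady output rate and ignores time to first token, queuing, and batching effects. The 1,000-token response length and the function names are hypothetical choices for the example, not figures from the announcement.

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream num_tokens at a steady output rate.

    Assumption: ignores time to first token and any queuing delay.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def relative_speedup(new_rate: float, old_rate: float) -> float:
    """Fractional increase of new_rate over old_rate (0.10 means 10% faster)."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate - 1.0


# Output rates quoted in the story, in tokens per second.
PREVIOUS_RATE = 328.0
DATABRICKS_RATE = 392.0

# Hypothetical response length chosen only for illustration.
RESPONSE_TOKENS = 1000

if __name__ == "__main__":
    for label, rate in (("328 tok/s", PREVIOUS_RATE), ("392 tok/s", DATABRICKS_RATE)):
        seconds = decode_seconds(RESPONSE_TOKENS, rate)
        print(f"{label}: {seconds:.2f} s to stream {RESPONSE_TOKENS} tokens")
    print(f"Relative speedup: {relative_speedup(DATABRICKS_RATE, PREVIOUS_RATE):.1%}")
```

Under these assumptions the gap is about half a second per 1,000 output tokens, or roughly a 19.5% higher output rate. Real workloads with long prompts or concurrent requests may see a different picture.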

Open Question

The path to these gains remains under the hood

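The original post credits extensive optimizations and references the OpenCode harness, but it does not specify hardware, batch settings, or exact methods. In a follow-up reply, Jin points to speculative decoding and kernel work, and says the team reached 201 tok/s on H200s before B300s added a further boost. Without fuller details, independent replication remains an open question.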

Sentiment

Many users are excited about Databricks topping the GLM-5.2 inference speed leaderboard at 392 tokens/s because it highlights impressive open-source progress and future AI potential.

Positive: 94.1%

Negative: 5.9%

18 comments with sentiment.

Posts from X

Matei Zaharia@matei_zaharia

The Databricks inference team is 🔥

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Does this answer your question about its positioning @TheZvi ? I told you it's a hardware issue. GLM 5.2 can be served very, very quickly. Databricks won't stop here, neither will their competitors. Chinese models are *much better* than Chinese silicon.

Yuchen Jin@Yuchenj_UW

Databricks is all in oss models.

We're looking for cracked AI inference engineers to join us at Databricks AI to produce trillions of tokens.

DM me if you have: - Contributed to open-source ML systems like SGLang/vLLM/PyTorch/Dynamo - Experience serving LLMs at large scale

Yuchen Jin@Yuchenj_UW

@teortaxesTex @TheZvi chip matters, but so do the optimizations we made across speculative decoding, kernels, etc. Proud of the Databricks inference team.

We had reached 201 tok/s on H200s a few days ago, and B300s gave us a substantial additional boost.

kache@yacineMTB

@Yuchenj_UW

kache@yacineMTB

@Yuchenj_UW awesome

Zephyr@zephyr_z9

@teortaxesTex @TheZvi So Databricks hired Yuchen to build their inference provider biz

Ai agent@ai_agent001

@Yuchenj_UW 392 now, calling it 400+ by next week

Yuchen Jin@Yuchenj_UW

@ai_agent001 it's only a start.

Taelin@VictorTaelin

@Yuchenj_UW Quantized?

Yuchen Jin@Yuchenj_UW

@yacineMTB wow, great to see our Unity AI Gateway page on X.

We’ve launched to a select people/workspaces and plan to roll it out more broadly.

btw, how do you feel about our AI gateway UI/UX?

Yuchen Jin@Yuchenj_UW

@matei_zaharia LFG!

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@Yuchenj_UW @TheZvi Yeah, I wonder how much faster zAI could go with fixed hardware. But you have structurally higher ceiling thanks to Blackwell.
