You may have heard that GLM-5.2 at 328 token/s is cool,
How about 392?
Databricks is now #1 in inference speed for GLM-5.2 on Artificial Analysis. It's a great model, and we did a lot of optimizations.
Databricks engineer Yuchen Jin highlighted system-level tweaks that lifted GLM-5.2 inference to 392 tokens per second on the Artificial Analysis leaderboard, surpassing Fireworks' prior 328 tokens per second mark on a custom endpoint. The result spotlights engineering refinements applied to a leading open-weights model that already posts strong intelligence scores and a 1M-token context window.
Faster output rates can cut latency and raise throughput for developers running GLM-5.2, yet the announcement supplies no workload-specific benchmarks or cost figures to quantify the advantage.
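As a rough illustration of what the two reported rates imply, the sketch below compares 328 and 392 tokens per second for a single streamed response. The 2,000-token response length is a hypothetical figure chosen only for the example, and the calculation assumes a steady output rate while ignoring time to first token, batching, and queueing, none of which the announcement specifies.

```python
def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a steady output rate.

    Ignores time to first token, batching, and queueing effects.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def relative_speedup(new_rate: float, old_rate: float) -> float:
    """Ratio of the new output rate to the old one."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


PRIOR_RATE = 328.0      # tokens/s, prior mark cited in the post
NEW_RATE = 392.0        # tokens/s, rate reported by Databricks
RESPONSE_TOKENS = 2000  # hypothetical response length, not from the source

speedup = relative_speedup(NEW_RATE, PRIOR_RATE)
seconds_saved = (
    decode_seconds(RESPONSE_TOKENS, PRIOR_RATE)
    - decode_seconds(RESPONSE_TOKENS, NEW_RATE)
)

print(f"Output rate ratio: {speedup:.3f}x")
print(f"Seconds saved on {RESPONSE_TOKENS} tokens: {seconds_saved:.2f}")
```

Under these assumptions the reported rate is about 19.5% higher than the prior mark, which works out to roughly one second saved on a 2,000-token response. Real workloads may see a different gap, since the leaderboard figure does not describe concurrency or prompt length.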
The claim credits extensive optimizations and references the OpenCode harness, but leaves hardware details, batch settings, and exact methods unspecified, so independent replication stays an open question.
Many users are excited about Databricks topping the GLM-5.2 inference speed leaderboard at 392 tokens/s because it highlights impressive open-source progress and future AI potential.
The Databricks inference team is 🔥
Does this answer your question about its positioning @TheZvi ? I told you it's a hardware issue. GLM 5.2 can be served very, very quickly. Databricks won't stop here, neither will their competitors. Chinese models are *much better* than Chinese silicon.
Databricks is all in oss models.
We're looking for cracked AI inference engineers to join us at Databricks AI to produce trillions of tokens.
DM me if you have:
- Contributed to open-source ML systems like SGLang/vLLM/PyTorch/Dynamo
- Experience serving LLMs at large scale
@teortaxesTex @TheZvi chip matters, but so do the optimizations we made across speculative decoding, kernels, etc. Proud of the Databricks inference team.
We had reached 201 tok/s on H200s a few days ago, and B300s gave us a substantial additional boost.
@teortaxesTex @TheZvi So Databricks hired Yuchen to build their inference provider biz
@Yuchenj_UW awesome

@Yuchenj_UW 392 now, calling it 400+ by next week

@ai_agent001 it's only a start.
@Yuchenj_UW Quantized?

@yacineMTB wow, great to see our Unity AI Gateway page on X.
We’ve launched to a select group of people/workspaces and plan to roll it out more broadly.
btw, how do you feel about our AI gateway UI/UX?
@matei_zaharia LFG!
@Yuchenj_UW @TheZvi Yeah, I wonder how much faster zAI could go with fixed hardware. But you have structurally higher ceiling thanks to Blackwell.

@Yuchenj_UW holy shiet

@teortaxesTex Certainly helps since they're charging the same price.

@Yuchenj_UW Things are getting out of hand now 😂

@Yuchenj_UW get started with databricks is unnecessary. remove that hop. it will increase your conversion by 30%

@Yuchenj_UW Funny how fast this flips, whoever's on top here barely holds it for a day

@Yuchenj_UW where do i buy

@Yuchenj_UW B300s and speculative decoding go brr