
SGLang-JAX releases TPU v7x serving optimizations for the 1-trillion-parameter Ling-2.6-1T model, cutting MoE prefill time by 53%

A custom Pallas kernel overlaps data movement with compute


Original post

LMSYS Org@lmsysorg

🚀 Our new blog: Optimizing Ling-2.6-1T on TPU with SGLang-JAX: Hiding MoE Data Movement Behind Compute with One Pallas Kernel

Ling-2.6-1T, a 1T hybrid MoE model, now serves on TPU v7x with SGLang-JAX. The SGLang-JAX team worked together with @inclusionAI on two fronts: upgrading the fused MoE kernel for deeper compute/comms overlap, and bringing up the full hybrid backbone.

1️⃣ Fused MoE V2: keeps tokens + accumulators VMEM-resident and double-buffers expert weights, hiding routing & prefetch behind compute → MoE prefill −53%
2️⃣ Hybrid memory pools: per-token MLA KV for 10 full-attn layers + per-request recurrent state for 70 GLA layers
3️⃣ GLA linear attention via chunk-wise parallel prefill
4️⃣ Single-controller DP keeps grouped RMSNorm chip-local, no per-layer cross-chip reduce

10:07 AM · Jun 17, 2026 · 6.1K Views
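
For readers unfamiliar with the "chunk-wise parallel prefill" mentioned in item 3, here is a minimal JAX sketch of the general technique. It is not the SGLang-JAX kernel: the function names `gla_recurrent` and `gla_chunked` are hypothetical, and the gate is assumed to be one scalar decay per token, which may differ from the gating Ling-2.6-1T actually uses. The recurrence is S_t = g_t · S_{t-1} + k_tᵀ v_t with output o_t = q_t S_t; the chunked version handles each chunk with dense matmuls and carries only the state S across chunks.

```python
import jax
import jax.numpy as jnp


def gla_recurrent(q, k, v, log_g):
    """Reference: token-by-token recurrence S_t = g_t * S_{t-1} + k_t^T v_t."""
    d_k, d_v = q.shape[-1], v.shape[-1]

    def step(S, x):
        q_t, k_t, v_t, lg_t = x
        S = jnp.exp(lg_t) * S + jnp.outer(k_t, v_t)
        return S, q_t @ S

    S0 = jnp.zeros((d_k, d_v), dtype=q.dtype)
    S, out = jax.lax.scan(step, S0, (q, k, v, log_g))
    return out, S


def gla_chunked(q, k, v, log_g, chunk=4):
    """Chunk-wise form: matmuls inside a chunk, recurrent state across chunks."""
    T, d_k = q.shape
    d_v = v.shape[-1]
    assert T % chunk == 0, "sketch assumes T is a multiple of the chunk size"
    n = T // chunk
    xs = (
        q.reshape(n, chunk, d_k),
        k.reshape(n, chunk, d_k),
        v.reshape(n, chunk, d_v),
        log_g.reshape(n, chunk),
    )
    causal = jnp.tril(jnp.ones((chunk, chunk), dtype=bool))

    def step(S, x):
        q_c, k_c, v_c, lg_c = x
        b = jnp.cumsum(lg_c)  # cumulative log-decay inside the chunk
        # Contribution of everything before this chunk, via the carried state.
        inter = (jnp.exp(b)[:, None] * q_c) @ S
        # Contribution of tokens inside the chunk, with causal decay weights.
        diff = jnp.where(causal, b[:, None] - b[None, :], -jnp.inf)
        intra = ((q_c @ k_c.T) * jnp.exp(diff)) @ v_c
        # Advance the state to the end of the chunk.
        w = jnp.exp(b[-1] - b)
        S = jnp.exp(b[-1]) * S + (w[:, None] * k_c).T @ v_c
        return S, inter + intra

    S0 = jnp.zeros((d_k, d_v), dtype=q.dtype)
    S, out = jax.lax.scan(step, S0, xs)
    return out.reshape(T, d_v), S


# Small illustrative inputs (shapes chosen arbitrarily for the sketch).
kq, kk, kv, kg = jax.random.split(jax.random.PRNGKey(0), 4)
q = jax.random.normal(kq, (16, 8))
k = jax.random.normal(kk, (16, 8))
v = jax.random.normal(kv, (16, 8))
log_g = jax.nn.log_sigmoid(jax.random.normal(kg, (16,)))  # log of a gate in (0, 1)

out_ref, S_ref = gla_recurrent(q, k, v, log_g)
out_chk, S_chk = gla_chunked(q, k, v, log_g, chunk=4)
```

Both functions should agree up to floating-point error. The final state `S` has a fixed size regardless of sequence length, which is consistent with the post's description of a per-request recurrent state for the GLA layers, as opposed to a per-token KV cache.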


Posts from X


Ying Sheng@ying11231

Impressive performance on TPU.


Banghua Zhu@BanghuaZ

People are underestimating how fast TPU inference is moving. In just a few months, the SGLang + TPU ecosystem has matured significantly, and this Ling-2.6-1T work shows very promising results: serious MoE overlap, hybrid memory management, and production-scale serving on TPU.

Thanks for the collaborations with @TheInclusionAI and @Google!


LMSYS Org@lmsysorg

Full writeup: https://www.lmsys.org/blog/2026-06-17-ling-2-6-tpu/
