
SGLang, led by Banghua Zhu, releases version 0.5.13 with Breakable CUDA Graphs to optimize prefill operations

New supported models include Nemotron 3 Ultra and Cosmos3.

Original post

SGLang (@sgl_project)

🎉 SGLang v0.5.13 is out!

First, new model support!

Nemotron 3 Ultra, Step-3.7-Flash, Command A+, plus new diffusion models: Cosmos3, FLUX.2-Klein, Ideogram 4, LingBot-World, SANA-WM, and Ernie-Image.

Here are the highlights for this release:
- Speculative Decoding V2 is now the default! Tree drafting (topk>1) for faster generation
- Breakable CUDA Graphs now make prefill faster
- Qwen 3.5 runs faster on NVIDIA Blackwell with new GDN kernels
- HiCache with UnifiedTree on by default for hybrid SWA/Mamba models
- SGLang-Diffusion now supports realtime generation! Plus progressive resolution
- Multiple performance and feature updates for DeepSeek V4

Thanks to our amazing partners and model makers: @NVIDIAAI @AMD @intel @awscloud @boson_ai @cohere @bfl_ml @ideogram_ai @deepseek_ai @Kimi_Moonshot @Alibaba_Qwen @StepFun_ai @Baidu_Inc @robbyant_brain

7:03 PM · Jun 12, 2026 · 7.3K Views

Sentiment

Users are praising contributor @ChengWan17 and expressing excitement about the Breakable CUDA Graph feature added in SGLang v0.5.13, which enables more flexible prefill.

Positive: 100.0% · Negative: 0.0% (3 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 2.5K · Bookmarks: 3 · Likes: 20 · Replies: 1

Ying Sheng (@ying11231)

Optimizing prefill with flexibility. Cool work @csy_789 @Oasis_a19, and nice name @ChengWan17 haha!

Cheng Wan (@ChengWan17)

The standout feature in SGLang v0.5.13 is BCG (Breakable CUDA Graph). It delivers prefill efficiency comparable to PCG (Piecewise CUDA Graph), while being significantly more flexible and compatible with advanced optimizations.

BCG is also a powerful debugging tool — it enables eager execution inside CUDA graph replay, so you can easily print debug info or inspect intermediate states.

Prototype by @csy_789. @Oasis_a19 turned it into a production-ready prefill optimization. I was fortunate to pick the name for this awesome technique. 😊
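The post describes BCG as allowing eager execution inside CUDA graph replay. One way to picture that idea is a recorded sequence of operations split into replayable segments, with an ordinary Python callback running at each split point. The sketch below is a pure-Python toy model with no CUDA involved; `BreakableGraph`, `record`, `add_break`, and `replay` are hypothetical names chosen for illustration and are not SGLang's API or implementation.

```python
from typing import Callable, List

Op = Callable[[float], float]
Hook = Callable[[float], None]


class BreakableGraph:
    """Toy model (hypothetical, not SGLang code): recorded ops are grouped
    into segments, and an eager hook runs between consecutive segments."""

    def __init__(self) -> None:
        self.segments: List[List[Op]] = [[]]  # each segment stands in for one captured graph piece
        self.breaks: List[Hook] = []          # breaks[i] runs eagerly after segments[i]

    def record(self, op: Op) -> None:
        # Append an op to the segment currently being "captured".
        self.segments[-1].append(op)

    def add_break(self, hook: Hook) -> None:
        # Close the current segment and register an eager hook at the split point.
        self.breaks.append(hook)
        self.segments.append([])

    def replay(self, x: float) -> float:
        # Replay every segment in order; between segments, run the eager hook
        # so intermediate state can be printed or inspected.
        for i, segment in enumerate(self.segments):
            for op in segment:
                x = op(x)
            if i < len(self.breaks):
                self.breaks[i](x)
        return x


seen: List[float] = []
g = BreakableGraph()
g.record(lambda x: x + 1.0)
g.record(lambda x: x * 2.0)
g.add_break(seen.append)      # eager inspection point between the two segments
g.record(lambda x: x - 3.0)
result = g.replay(4.0)
```

Here the hook only observes the intermediate value, mirroring the debugging use case the post mentions (printing or inspecting intermediate states) rather than any performance behavior.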
