🎉 SGLang v0.5.13 is out!
First, new model support!
Nemotron 3 Ultra, Step-3.7-Flash, Command A+, plus new diffusion models: Cosmos3, FLUX.2-Klein, Ideogram 4, LingBot-World, SANA-WM, and Ernie-Image.
Here are the highlights for this release: - Speculative Decoding V2 is now the default! Tree drafting (topk>1) for faster generation - Breakable CUDA Graphs now make prefill faster - Qwen 3.5 runs faster on NVIDIA Blackwell with new GDN kernels - HiCache with UnifiedTree on by default for hybrid SWA/Mamba models - SGLang-Diffusion now supports realtime generation! Plus progressive resolution - Multiple performance and feature updates for DeepSeek V4
Thanks to our amazing partners and model makers: @NVIDIAAI @AMD @intel @awscloud @boson_ai @cohere @bfl_ml @ideogram_ai @deepseek_ai @Kimi_Moonshot @Alibaba_Qwen @StepFun_ai @Baidu_Inc @robbyant_brain



