1d ago

SGLang releases version 0.5.12 with a TokenSpeed MLA attention backend optimized for NVIDIA Blackwell GPUs and FP8 KV cache support.

The performance gains target DeepSeek V3.2 and GLM-5 and come from GEMM optimizations.


Original post

#1158@BANGHUAZOP

Xiaoyong@XYLIU

sglang had the right performance foundations if you’re not seeing sota performance for your deployment with oss models try again and ask for help since you should get it

9:35 PM · May 17, 2026

Reposted by

#603@YING11231