
SGLang v0.5.12.post1 Releases Stability Fixes for DeepSeek V4 and Blackwell

Original post

🚀 SGLang v0.5.12.post1 is live

This is a stability patch on top of v0.5.12, with 12 cherry-picks focused on DeepSeek V4, NIXL PD disaggregation, and Blackwell.

DeepSeek V4
🔸 Fixed V4-Pro garbled text on single-token decode (B200/B300)
🔸 Fixed EAGLE/MTP disagg decode crash at ~2000 reqs (SWA allocator)
🔸 Fixed NSA prefill context-parallel scheduler crash at startup
🔸 Fixed HiSparse + Compressor V2 accuracy (GSM8K 0.825 → 0.960)
🔸 Enabled PD disaggregation under pp_size > 1
🔸 Fixed V4-Flash dummy-load + FlashInfer mxfp4 illegal memory access in CUDA-graph capture
🔸 Fixed HiCache + SWA stale translation indices after cache rebuild

NIXL PD
🔸 Fixed aux-state transfer (send on is_last, expect state only when truthy)

Blackwell / B300
🔸 Defaulted nvidia-cutlass-dsl to cu13, fixing Qwen 3.5 crash on sm_103 in FlashAttention-4

Performance
🔸 Eliminated 20–40s cold-bucket stalls on DSV4 via MHC token-count prewarm
🔸 Cut V4-Pro JIT cost by precompiling DeepGEMM bf16/fp32 dispatch branch

Other
🔸 Fixed missing group arg in get_dp_buffer
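
Since this release is a post-release on top of v0.5.12, a plain string comparison will not reliably tell whether an installed build already contains these fixes. The sketch below shows one way to compare version strings, assuming only the `X.Y.Z` and `X.Y.Z.postN` shapes seen in this announcement. The helper names `parse_version` and `has_patch` are hypothetical and not part of SGLang, and the parser covers only a small subset of PEP 440.

```python
import re


def parse_version(text):
    """Parse 'X.Y.Z' or 'X.Y.Z.postN' into a sortable tuple.

    Simplified subset of PEP 440 (an assumption for this sketch):
    no pre-releases, dev releases, epochs, or local version labels.
    """
    match = re.fullmatch(r"v?(\d+(?:\.\d+)*)(?:\.post(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    # A version without a post segment sorts before any postN of the same release.
    post = int(match.group(2)) if match.group(2) is not None else -1
    return release, post


def has_patch(installed, required="0.5.12.post1"):
    """Return True if `installed` is at or beyond the `required` version."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    for candidate in ("0.5.12", "0.5.12.post1", "0.5.13"):
        print(candidate, has_patch(candidate))
```

In a real environment the installed version string could be read with `importlib.metadata.version("sglang")`, assuming the package is installed under that distribution name.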

9:42 PM · May 26, 2026