SGLang v0.5.12.post1 is live
This is a stability patch on top of v0.5.12, with 12 cherry-picks focused on DeepSeek V4, NIXL PD disaggregation, and Blackwell.
DeepSeek V4
🔸 Fixed V4-Pro garbled text on single-token decode (B200/B300)
🔸 Fixed EAGLE/MTP disagg decode crash at ~2000 reqs (SWA allocator)
🔸 Fixed NSA prefill context-parallel scheduler crash at startup
🔸 Fixed HiSparse + Compressor V2 accuracy (GSM8K 0.825 → 0.960)
🔸 Enabled PD disaggregation under pp_size > 1
🔸 Fixed V4-Flash dummy-load + FlashInfer mxfp4 illegal memory access in CUDA-graph capture
🔸 Fixed HiCache + SWA stale translation indices after cache rebuild
NIXL PD
🔸 Fixed aux-state transfer (send on is_last, expect state only when truthy)
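
One possible reading of the aux-state rule above, as a minimal Python sketch. This is not SGLang or NIXL code: `Chunk`, `build_chunks`, and `expected_transfers` are hypothetical names made up for illustration, and the chunk layout is an assumption. The idea shown is only that the sender attaches aux state to the final chunk, and the receiver waits for a state transfer only when there is state to send.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    """Hypothetical transfer unit: one slice of KV pages plus optional aux state."""
    kv_indices: List[int]
    is_last: bool
    aux_state: Optional[list] = None


def build_chunks(kv_pages: List[List[int]], state: Optional[list]) -> List[Chunk]:
    """Sender side: attach aux state only to the chunk flagged is_last."""
    chunks = []
    for i, page in enumerate(kv_pages):
        last = i == len(kv_pages) - 1
        # Empty or missing state is never sent, even on the last chunk.
        payload = state if (last and state) else None
        chunks.append(Chunk(kv_indices=page, is_last=last, aux_state=payload))
    return chunks


def expected_transfers(num_chunks: int, state: Optional[list]) -> int:
    """Receiver side: wait for a state transfer only when state is truthy."""
    return num_chunks + (1 if state else 0)
```

With this shape, a request that carries no aux state completes after its KV chunks alone, so the receiver does not wait for a state message that the sender never emits.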
Blackwell / B300
🔸 Defaulted nvidia-cutlass-dsl to cu13, fixing Qwen 3.5 crash on sm_103 in FlashAttention-4
Performance
🔸 Eliminated 20–40s cold-bucket stalls on DSV4 via MHC token-count prewarm
🔸 Cut V4-Pro JIT cost by precompiling DeepGEMM bf16/fp32 dispatch branch
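
For readers unfamiliar with the cold-bucket pattern behind the prewarm item above, here is a generic Python sketch. It does not reproduce the actual SGLang change: the power-of-two bucketing, `bucket_for`, `get_kernel`, and `prewarm` are all assumptions made for illustration. The general idea is that kernels specialized per token-count bucket are built on first use, so touching every bucket at startup moves that cost out of the request path.

```python
from typing import Callable, Dict, List

# Cache of "compiled" kernels keyed by token-count bucket (hypothetical).
_compiled: Dict[int, Callable[[list], list]] = {}
compile_log: List[int] = []  # records which buckets triggered a compile


def bucket_for(num_tokens: int) -> int:
    """Round a token count up to the next power of two (assumed bucketing)."""
    return 1 << max(num_tokens - 1, 0).bit_length()


def get_kernel(bucket: int) -> Callable[[list], list]:
    """Return the kernel for a bucket, building it on first use."""
    if bucket not in _compiled:
        compile_log.append(bucket)        # stand-in for an expensive JIT compile
        _compiled[bucket] = lambda xs: xs
    return _compiled[bucket]


def prewarm(max_tokens: int) -> None:
    """Touch every bucket up to the one covering max_tokens at startup."""
    bucket = 1
    while bucket <= bucket_for(max_tokens):
        get_kernel(bucket)
        bucket *= 2


prewarm(100)                   # startup: build buckets 1, 2, 4, ..., 128
warm_count = len(compile_log)
get_kernel(bucket_for(77))     # request path: bucket 128 is already warm
```

After `prewarm`, the request-path lookup finds its bucket already built, so no compile happens while a request is waiting.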
Other
🔸 Fixed missing group arg in get_dp_buffer
