
SGLang v0.5.12 Merges DeepSeek V4 With Optimized Kernels And Hardware Support

Original post

๐Ÿ‹ DeepSeek V4 is now merged into SGLang main with v0.5.12. What we shipped at launch: ๐Ÿ”น ShadowRadix: native prefix caching for V4's hybrid attention ๐Ÿ”น HiSparse: CPU-extended KV for sparse attention (up to 3ร— long-context throughput) ๐Ÿ”น MTP speculative decoding with in-graph metadata preparation ๐Ÿ”น W4A8 MegaMoE kernel ๐Ÿ”น Flash Compressor + Lightning TopK kernels ๐Ÿ”น Multiple parallelism methods: Tensor Parallelism/Expert Parallelism/Context Parallelism/Data Parallelism Attention ๐Ÿ”น Prefill Decode Disaggregation ๐Ÿ”น Hardware: H100, H200, B200, B300, GB200, GB300, MI35X And what we added since: ๐Ÿ”น HiCache for V4 under UnifiedRadixTree ๐Ÿ”น W4A4 MegaMoE kernels for faster MegaMoE ๐Ÿ”น Marlin/FlashInfer MXFP4 (W4A16) MoE on Hopper ๐Ÿ”น Hierarchical multi-stream overlap for small-batch decode ๐Ÿ”น Optimized mHC pipeline: DeepGemm + fused norm + fused hc_head ๐Ÿ”น Faster KV Compression V2 kernel ๐Ÿ”น Fused SiLU+clamp+FP8 quantization kernel ๐Ÿ”น Support TP16 on H100/H20 ๐Ÿ”น Support Multiple Detokenizers ๐Ÿ”นPipeline Parallelism ๐Ÿ”นOne docker image for all supported Nvidia hardware Thanks to @NVIDIAAI, @AMD, @ant_oss, @alibaba_cloud, ByteDance, @iFLYTEKLab, @radixark, and @pranjalssh for the work we shipped together on V4 ๐Ÿ™Œ More in 0.5.12 ๐Ÿ‘‡

12:10 PM · May 16, 2026