🚀 New blog: Agent-Assisted SGLang Development, the story of how we turn benchmarking, profiling, and kernel optimization know-how into executable agent skills.
Agent-assisted workflows are saving our team massive engineering hours while delivering major gains across the stack:
⚡️ +71.4% throughput & TTFT 456→168 ms for Qwen3-Next via allreduce fusion
⚡️ 29–49% TTFT reduction on long-context prompts via router tokenization deduplication
⚡️ Up to 2.32x diffusion denoising speedup via Spectral Progressive Diffusion
⚡️ 10 B200 kernel tasks at 1.13x–2.75x speedups via KDA-Pilot; 3 PRs merged upstream
⚡️ 1.41x faster LTX-2 VAE decode, saving 9.7 GiB peak memory
And rigor is built into every step: benchmarks are fixed before any patching, baseline and candidate share the same ABI, and every change must be backed by profile evidence. Together, these rules eliminate benchmark reward hacking. Each iteration also passes a Humanize/RLCR review loop before proceeding.
Read the full blog to see how we're rethinking development workflow 👇
