🐤 Inference-time sparse attention for RL is finally working.
Turns out the real challenge wasn’t sparsity—it was getting asymmetric off-policy rollouts to stay stable.
Sparrow discovers a robust stability threshold that lets us push sparsity to the limit: ~2× faster rollouts, minimal tuning, and stable long-CoT RL.
RL is painfully slow 😭, bottlenecked by super-long CoT rollouts.
🔭 Sparse attention should help, but naive sparse rollout hits a brutal efficiency–stability tradeoff: each dense policy needs a tedious trial-and-error sparsity sweep before the actual RL run.
🐤Sparrow chirps: no more pain! Introducing Sparrow: Sparse Rollout for stable and efficient long-context RL.
Sparrow finds that:
💡As long as we keep the tail distribution mismatch throughout the sparse rollout above a critical threshold, RL training stays stable.
💡Even cooler! In comprehensive controlled studies of RL on Qwen3-1.7B, 4B, and 8B thinking models with a 40K max rollout length, the critical threshold stays constant across model sizes.
💡Sparrow then finds the optimal dynamic sparse schedule to reach the threshold at minimal cost.
💡Sparrow's findings are empirically validated to generalize to Qwen3-14B, and hold for both math and coding RL.
🐤Sparrow achieves 2.2× / 2.4× / 2.0× rollout speedups on Qwen3 1.7B / 4B / 8B thinking models, while keeping training stable over extended RL steps. We release the 🐤bird in the following formats. [1/n]
Paper: https://arxiv.org/abs/2606.08446
Code: https://github.com/Infini-AI-Lab/Sparrow
Blog: https://infini-ai-lab.github.io/sparrow_project_release/