Sparrow Delivers Stable Sparse Rollouts for 2x Faster Long-Context RL

Beidi Chen (@BeidiChen)

🐤 Inference-time sparse attention for RL is finally working.

Turns out the real challenge wasn’t sparsity—it was getting asymmetric off-policy rollouts to stay stable.

Sparrow discovers a robust stability threshold that lets us push sparsity to the limit: ~2× faster rollouts, minimal tuning, and stable long-CoT RL.

Infini-AI-Lab (@InfiniAILab)

RL is painfully slow 😭 — bottlenecked by super-long CoT rollout.

🔭 Sparse attention should help, but naive sparse rollout hits a brutal efficiency–stability tradeoff: A tedious trial-and-error sparsity sweep for each dense policy is required before an actual RL run.

🐤 Sparrow chirps: no more pain! Introducing Sparrow: Sparse Rollout for stable and efficient long-context RL.

Sparrow finds that:
💡 As long as we keep the tail distribution mismatch throughout the sparse rollout above a critical threshold, RL training stays stable.
💡 Even cooler! In comprehensive controlled studies of RL on Qwen3-1.7B, 4B, and 8B thinking models with a 40K max rollout length, the critical threshold stays constant across model sizes.
💡 Sparrow then finds the optimal dynamic sparse schedule to reach the threshold at minimal cost.
💡 Sparrow's findings are empirically validated to generalize to Qwen3-14B, and hold on both Math and Coding RL.
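The selection idea in the findings above can be sketched as follows. This is only an illustration, not Sparrow's actual algorithm: the `Candidate` fields, the `kv_budget` knob, the per-phase structure, and all numbers are hypothetical, and the "metric at or above threshold" direction simply follows the post's wording. For each phase of a rollout, the sketch picks the cheapest sparsity setting whose measured stability metric still clears the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    kv_budget: int  # hypothetical sparsity knob: tokens attended per decode step
    metric: float   # hypothetical measured stability metric for this setting
    cost: float     # relative rollout cost (dense attention = 1.0)


def cheapest_stable(candidates: List[Candidate], threshold: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose metric clears the threshold."""
    stable = [c for c in candidates if c.metric >= threshold]
    if not stable:
        return None  # no sparse setting qualifies; caller should fall back to dense
    return min(stable, key=lambda c: c.cost)


def build_schedule(phases: List[List[Candidate]], threshold: float) -> List[Optional[Candidate]]:
    """Pick one setting per rollout phase, giving a schedule that varies over time."""
    return [cheapest_stable(cands, threshold) for cands in phases]


# Illustrative, made-up measurements for two rollout phases.
phases = [
    [Candidate(1024, 0.70, 0.30), Candidate(2048, 0.92, 0.45), Candidate(4096, 0.99, 0.70)],
    [Candidate(1024, 0.95, 0.30), Candidate(2048, 0.98, 0.45)],
]
schedule = build_schedule(phases, threshold=0.90)
budgets = [c.kv_budget if c else None for c in schedule]
```

Because the post reports that the threshold stays constant across model sizes, a single `threshold` value is passed in rather than tuned per model.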

🐤 Empirically, Sparrow helps achieve 2.2× / 2.4× / 2.0× rollout speedups on Qwen3 1.7B / 4B / 8B thinking models while keeping training stable over extended RL steps. We release the 🐤 bird in the following formats. [1/n]
Paper: https://arxiv.org/abs/2606.08446
Code: https://github.com/Infini-AI-Lab/Sparrow
Blog: https://infini-ai-lab.github.io/sparrow_project_release/
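To put the rollout speedups in context, here is a back-of-the-envelope sketch using Amdahl's law. The rollout speedups are the ones quoted above; the fraction of each RL step spent in rollout (`rollout_fraction`) is not reported in the post, so the 0.7 used here is purely an assumed value for illustration.

```python
def end_to_end_speedup(rollout_speedup: float, rollout_fraction: float) -> float:
    """Amdahl's law: overall step speedup when only the rollout part gets faster."""
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be in [0, 1]")
    if rollout_speedup <= 0.0:
        raise ValueError("rollout_speedup must be positive")
    return 1.0 / ((1.0 - rollout_fraction) + rollout_fraction / rollout_speedup)


# Rollout speedups quoted in the post, keyed by model size.
reported = {"Qwen3-1.7B": 2.2, "Qwen3-4B": 2.4, "Qwen3-8B": 2.0}

ASSUMED_ROLLOUT_FRACTION = 0.7  # assumption, not from the post

estimates = {
    model: end_to_end_speedup(s, ASSUMED_ROLLOUT_FRACTION)
    for model, s in reported.items()
}
```

The larger the share of step time that rollout takes, the closer the end-to-end gain gets to the quoted rollout speedup.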

2:36 PM · Jun 10, 2026 · 4.2K Views
Beidi Chen (@BeidiChen)

Everyone says rollout generation is becoming a central bottleneck in long-context RL, and recent efforts from Cursor @cursor_ai and Fireworks @FireworksAI_HQ show just how important disaggregating rollout and training is today.

Excited to share Sparrow 🐤, which makes sparse rollout practical and accelerates stable long-context RL.

Check it out!
