6h ago

RL Optimization Cuts Step Time 3.5x for Long Prompts or High Rollouts

Original post

@PandaAshwinee It shouldn't matter what type of RL you use. As long as either the prompt is long or your rollout n is quite high, you can easily compute the potential savings by feeding 3 numbers into this: n_rollout x (prompt_len + response_len) / (prompt_len + n_rollout x response_len)
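
For anyone who wants to plug numbers into that formula, here is a minimal Python sketch. The function name `estimated_savings` and the sample values are illustrative assumptions, not from the post. Reading the numerator as "tokens processed when every rollout repeats the prompt" and the denominator as "tokens processed when the prompt is handled once" is also an interpretation of the formula, not something the post states.

```python
def estimated_savings(n_rollout: int, prompt_len: int, response_len: int) -> float:
    """Evaluate the ratio from the post:
    n_rollout * (prompt_len + response_len) / (prompt_len + n_rollout * response_len)

    The name and the token-count interpretation are assumptions for illustration.
    """
    # Numerator: every rollout carries its own copy of the prompt.
    repeated = n_rollout * (prompt_len + response_len)
    # Denominator: the prompt is counted once, responses once per rollout.
    shared = prompt_len + n_rollout * response_len
    if shared <= 0:
        raise ValueError("prompt_len + n_rollout * response_len must be positive")
    return repeated / shared


# Hypothetical numbers: a long prompt with short responses.
print(estimated_savings(n_rollout=8, prompt_len=4096, response_len=512))  # 4.5
# With a single rollout the ratio is 1.0, i.e. nothing to save.
print(estimated_savings(n_rollout=1, prompt_len=4096, response_len=512))  # 1.0
```

The ratio grows as the prompt gets longer relative to the response or as n_rollout increases, which matches the two conditions named in the post. It is an upper-bound style estimate from token counts only and says nothing about measured step time.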

12:35 PM · May 27, 2026

@_lewtun Thank you for the kind words and the correction, Lewis!

Lewis Tunstall @_lewtun

@StasBekman Very cool work Stas! Turbo nit: it's arguably special relativity (not general) which states massive particles can't travel faster than the speed of light :)

9:10 PM · May 27, 2026 · 467 Views
10:53 PM · May 27, 2026 · 342 Views