You're wasting FLOPs when scaling inference compute: by independently sampling parallel attempts, you burn compute rediscovering the same solutions.
Introducing QuasiMoTTo: we scale parallel sampling with correlated samples instead! Together, these samples cover more distinct solutions, yet each one is still marginally an exact draw from the LLM, and they can all be generated in parallel.
Result: the same performance with 25–47% fewer samples in test-time scaling, plus 50% fewer training steps in RL!
In our new paper, we explore the design space of correlated samplers. Joint work with co-authors @probablynotaz9 (co-lead), @gandhikanishk, @noahdgoodman, and Emily Fox!
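To make "correlated but marginally exact" concrete, here is a toy sketch for a single categorical draw (think of one next-token distribution). This is not the paper's sampler. It swaps in a standard quasi-Monte Carlo trick, a randomly shifted lattice (Cranley-Patterson rotation), and all function names are hypothetical. The n uniforms share one random shift, so each one is still Uniform[0, 1) on its own (each draw is exact), but together they spread evenly over [0, 1), so fewer draws land on the same category.

```python
import numpy as np


def inverse_cdf(probs: np.ndarray, u: float) -> int:
    """Map a uniform u in [0, 1) to a category index via the inverse CDF."""
    cdf = np.cumsum(probs)
    idx = int(np.searchsorted(cdf, u, side="right"))
    return min(idx, len(probs) - 1)  # guard against cdf[-1] rounding below 1.0


def independent_samples(probs: np.ndarray, n: int, rng: np.random.Generator) -> list[int]:
    # Baseline: n i.i.d. uniforms, so several draws can repeat the same category.
    return [inverse_cdf(probs, rng.random()) for _ in range(n)]


def correlated_samples(probs: np.ndarray, n: int, rng: np.random.Generator) -> list[int]:
    # Randomly shifted lattice: one shared shift U for all n draws.
    # Each u_i = (i/n + U) mod 1 is marginally Uniform[0, 1), so each draw is
    # marginally exact, but exactly one u_i falls in each interval [k/n, (k+1)/n).
    shift = rng.random()
    return [inverse_cdf(probs, (i / n + shift) % 1.0) for i in range(n)]


if __name__ == "__main__":
    probs = np.array([0.5, 0.25, 0.25])
    rng = np.random.default_rng(0)
    trials = 2000
    ind = np.mean([len(set(independent_samples(probs, 4, rng))) for _ in range(trials)])
    cor = np.mean([len(set(correlated_samples(probs, 4, rng))) for _ in range(trials)])
    print(f"mean distinct categories, independent: {ind:.2f}")
    print(f"mean distinct categories, correlated:  {cor:.2f}")
```

With probs = [0.5, 0.25, 0.25] and n = 4, each quarter of [0, 1) receives exactly one point, so the correlated sampler always returns categories {0, 0, 1, 2}: every category with probability at least 1/n is covered, while i.i.d. draws sometimes miss one. Real LLM sampling correlates whole multi-token sequences, which this one-step toy does not attempt.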