Omar Khattab signals new variance reduction method surpassing state of the art
Omar Khattab, an assistant professor in the MIT CSAIL NLP group, signaled that a new method would overtake the current state-of-the-art approach to variance reduction by "~tomorrow". In a follow-up accompanying the Pedagogical RL announcement, he joked that "~tomorrow" had turned out to be roughly 10 days. Research engineer Will Brown replied that the estimate was off by only a "constant factor", called the approach very nice, and said he was excited to dig into it further. The exchange took place in a reply thread on variance reduction and sampling in reinforcement learning.
@willccbb until ~tomorrow
SOTA method for variance reduction:
@willccbb "~Tomorrow" ~= 10 days, give or take - right?
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL
@lateinteraction constant factor, close enough :) veeeery nice approach, really excited to dig into it further!