Being this excited about five rather unexpected research projects simultaneously is almost too painful.
Assuming that we figure out how to sequence these releases, y’all are going to thoroughly love each of these.
alright, two of the five have been out since April 24 :D
funnily enough, i'm somehow excited about five (3+2) again right now. y'all will find them really, really nice.
from the ones released recently, read #2 Pedagogical RL at:
🚨 Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL
from the ones released recently, read #1 OBLIQ-Bench at:
We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.