We're gearing up to release two research efforts I've been extremely excited about for quite some time. Y'all will really love these.
In my biased view, they are the kinds of things that help start mini-fields around new problems and new algorithmic paradigms. Stay tuned.
The first one is out. I can't quite describe how deeply I've hoped for something like this since 2021. A retrieval benchmark that is actually, genuinely hard for first-stage retrieval reasons -- not because of multi-hop tricks or ill-posed questions.
We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we measured how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study far harder search queries than existing benchmarks offer.
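A minimal sketch of one way to read that headroom measurement (my hypothetical framing, not the paper's actual pipeline): compare first-stage recall@k against an oracle that perfectly reranks the retrieved pool. In the thread the oracle is a frontier LLM judging relevance; here gold relevance labels stand in for it, and all function names are made up for illustration.

```python
def recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of relevant docs that appear in the top-k."""
    hits = len(set(ranked_doc_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

def oracle_rerank(ranked_doc_ids, relevant_ids):
    """Best possible reordering of the retrieved pool: relevant docs first."""
    return sorted(ranked_doc_ids, key=lambda d: d not in relevant_ids)

def headroom(first_stage_runs, k=3):
    """Average oracle-minus-baseline recall@k over queries.

    first_stage_runs: list of (ranked_doc_ids, relevant_ids) pairs.
    The oracle can only reorder what the first stage retrieved, so a
    small gap means reranking -- not retrieval -- is the bottleneck.
    """
    gaps = []
    for ranked, relevant in first_stage_runs:
        base = recall_at_k(ranked, relevant, k)
        best = recall_at_k(oracle_rerank(ranked, relevant), relevant, k)
        gaps.append(best - base)
    return sum(gaps) / len(gaps)

# Toy query: relevant docs d7 and d9 were retrieved but ranked low,
# so a perfect reranker recovers both within the top 3.
runs = [(["d1", "d2", "d7", "d9", "d5"], {"d7", "d9"})]
gap = headroom(runs, k=3)
```

On a benchmark that is hard for first-stage reasons, the relevant documents often never make it into the retrieved pool at all, so even the oracle reranker stays far from the ceiling.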
The second effort promised above is now public. I think this paradigm might be a key step toward transforming how we teach new skills to LLMs.
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL
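To make the "blind sampler" contrast concrete, here is a toy sketch of my own (not the method from the release): reward is 1 only on one hidden target trajectory, which plays the role of privileged info. A blind sampler uses that info only to *score* rollouts; a guided sampler also uses it to *bias* each action, so it finds rewarded rollouts with far less compute. All names and the 0.9 bias are illustrative assumptions.

```python
import random

LENGTH = 20  # binary actions per rollout; blind success rate is 2**-20

def find_rewarded(sampler, target, budget):
    """Number of samples until a rollout earns reward, or None on failure.

    Scoring uses the privileged target in both cases -- the only
    difference is whether sampling gets to see it too.
    """
    for i in range(1, budget + 1):
        if sampler() == target:
            return i
    return None

random.seed(0)
target = [random.randint(0, 1) for _ in range(LENGTH)]  # hidden "good" trajectory

def blind():
    """Standard on-policy rollout: uniform over actions, no privileged info."""
    return [random.randint(0, 1) for _ in range(LENGTH)]

def guided():
    """Peeks at the target to bias each step toward the right action
    (prob 0.9) -- privileged info used to *find* rollouts, not just score them."""
    return [a if random.random() < 0.9 else 1 - a for a in target]

blind_tries = find_rewarded(blind, target, budget=500)    # almost surely None
guided_tries = find_rewarded(guided, target, budget=500)  # succeeds quickly
```

The asymmetry is the point: the reward function already contains the answer, and the blind sampler burns its whole budget ignoring that at sampling time.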