Researcher Gears Up to Release Two Exciting Research Projects

Original post

We're gearing up to release two research efforts I've been extremely excited about for quite some time. Y'all will really love these.

2:14 PM · May 4, 2026

9:14 PM · May 4, 2026 · 38.3K Views

In my biased view, they are the kinds of things that help start mini-fields around new problems and new algorithmic paradigms. Stay tuned.

Omar Khattab @lateinteraction

9:18 PM · May 4, 2026 · 8.7K Views

The first one is out. I can't quite describe how deeply I've hoped for anything like this since 2021. A retrieval benchmark that is actually, genuinely, hard for first-stage retrieval reasons -- not due to multi-hop tricks, or due to being ill-posed, etc.

Diane @dianetc_

We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study much harder search queries than before.

3:52 PM · May 6, 2026 · 75.2K Views
4:00 PM · May 6, 2026 · 1.3K Views
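
The headroom check described in the quoted post can be illustrated with a small sketch. This is not the OBLIQ-Bench authors' code: the function names, the choice of recall@k as the metric, and the `score_fn` stand-in for a frontier-LLM relevance judge are all assumptions made for illustration. The idea is to rerank a first-stage candidate pool with a very strong scorer and see how much the metric improves; whatever is still missing afterward was never retrieved at all, which is a first-stage failure.

```python
from typing import Callable, Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k of a ranking."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def rerank(candidates: List[str], score_fn: Callable[[str], float]) -> List[str]:
    """Reorder candidates by descending score.

    score_fn is a hypothetical stand-in for an LLM relevance judgment.
    sorted() is stable, so ties keep their first-stage order.
    """
    return sorted(candidates, key=score_fn, reverse=True)


def headroom(
    first_stage: List[str],
    relevant: Set[str],
    score_fn: Callable[[str], float],
    k: int = 10,
    pool: int = 100,
) -> Dict[str, float]:
    """Compare recall@k before and after reranking the top-`pool` candidates."""
    pool_docs = first_stage[:pool]
    base = recall_at_k(pool_docs, relevant, k)
    reranked = recall_at_k(rerank(pool_docs, score_fn), relevant, k)
    return {"first_stage": base, "reranked": reranked, "headroom": reranked - base}


# Toy query: three relevant documents, one of which ("d4") was never retrieved.
first_stage = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d1", "d2", "d4"}

# A perfect scorer built from the gold labels, standing in for the LLM judge.
result = headroom(
    first_stage,
    relevant,
    score_fn=lambda doc: 1.0 if doc in relevant else 0.0,
    k=2,
    pool=5,
)
```

In this toy case reranking lifts recall@2 from 0 to 2/3, and the remaining third is out of reach for any reranker because "d4" is absent from the candidate pool. Under these assumptions, a benchmark where that unreachable share stays large would be hard for first-stage retrieval reasons.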

The second effort promised above is now public. I think this paradigm might be a key step toward transforming how we teach new skills to LLMs.

Souradip Chakraborty @SOURADIPCHAKR18

🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it can stumble upon with compute? ⤵️ Pedagogical RL

10:46 PM · May 14, 2026 · 81.9K Views
11:02 PM · May 14, 2026 · 8.3K Views