We're gearing up to release two research efforts I've been extremely excited about for quite some time. Y'all will really love these.
In my biased view, they are the kinds of things that help start mini-fields around new problems and new algorithmic paradigms. Stay tuned.
The first one is out. I can't quite describe how deeply I've hoped for something like this since 2021. A retrieval benchmark that is actually, genuinely hard for first-stage retrieval reasons -- not because of multi-hop tricks or ill-posed questions.
We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we measured how much headroom remained by running oracle reranking with a frontier LLM. Most had little room left! So we built OBLIQ-Bench to study far harder search queries than existing benchmarks offer.
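A minimal sketch of one way to read that headroom measurement (my hypothetical framing, not the paper's actual pipeline): compare first-stage recall@k against an oracle that perfectly reranks the retrieved pool. In the thread the oracle is a frontier LLM judging relevance; here gold relevance labels stand in for it, and all function names are made up for illustration.

```python
def recall_at_k(ranked_doc_ids, relevant_ids, k):
    """Fraction of relevant docs that appear in the top-k."""
    hits = len(set(ranked_doc_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)

def oracle_rerank(ranked_doc_ids, relevant_ids):
    """Best possible reordering of the retrieved pool: relevant docs first."""
    return sorted(ranked_doc_ids, key=lambda d: d not in relevant_ids)

def headroom(first_stage_runs, k=3):
    """Average oracle-minus-baseline recall@k over queries.

    first_stage_runs: list of (ranked_doc_ids, relevant_ids) pairs.
    The oracle can only reorder what the first stage retrieved, so a
    small gap means reranking -- not retrieval -- is the bottleneck.
    """
    gaps = []
    for ranked, relevant in first_stage_runs:
        base = recall_at_k(ranked, relevant, k)
        best = recall_at_k(oracle_rerank(ranked, relevant), relevant, k)
        gaps.append(best - base)
    return sum(gaps) / len(gaps)

# Toy query: relevant docs d7 and d9 were retrieved but ranked low,
# so a perfect reranker recovers both within the top 3.
runs = [(["d1", "d2", "d7", "d9", "d5"], {"d7", "d9"})]
gap = headroom(runs, k=3)
```

On a benchmark that is hard for first-stage reasons, the relevant documents often never make it into the retrieved pool at all, so even the oracle reranker stays far from the ceiling.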
The second effort promised above is now public. I think this paradigm might be a key step toward transforming how we teach new skills to LLMs.
🚨Typical RL algorithms and on-policy distillation methods are blind samplers: they use privileged info to score rollouts, but not to *find* them. We ask: can we use privileged info to *actively sample* the rollouts RL wishes it could stumble upon with compute? ⤵️ Pedagogical RL
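To make the "blind sampler" contrast concrete, here is a toy sketch of my own (not the method from the release): reward is 1 only on one hidden target trajectory, which plays the role of privileged info. A blind sampler uses that info only to *score* rollouts; a guided sampler also uses it to *bias* each action, so it finds rewarded rollouts with far less compute. All names and the 0.9 bias are illustrative assumptions.

```python
import random

LENGTH = 20  # binary actions per rollout; blind success rate is 2**-20

def find_rewarded(sampler, target, budget):
    """Number of samples until a rollout earns reward, or None on failure.

    Scoring uses the privileged target in both cases -- the only
    difference is whether sampling gets to see it too.
    """
    for i in range(1, budget + 1):
        if sampler() == target:
            return i
    return None

random.seed(0)
target = [random.randint(0, 1) for _ in range(LENGTH)]  # hidden "good" trajectory

def blind():
    """Standard on-policy rollout: uniform over actions, no privileged info."""
    return [random.randint(0, 1) for _ in range(LENGTH)]

def guided():
    """Peeks at the target to bias each step toward the right action
    (prob 0.9) -- privileged info used to *find* rollouts, not just score them."""
    return [a if random.random() < 0.9 else 1 - a for a in target]

blind_tries = find_rewarded(blind, target, budget=500)    # almost surely None
guided_tries = find_rewarded(guided, target, budget=500)  # succeeds quickly
```

The asymmetry is the point: the reward function already contains the answer, and the blind sampler burns its whole budget ignoring that at sampling time.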