19h ago

Dwarkesh Podcast analysis identifies selection of research questions and detection of unproductive paths as primary bottlenecks to automating AI research

Models already implement and run experiments well but lack reliable research judgment.

Original post

What's the current bottleneck to automating AI research? @ericjang11's report: today's models are already good at implementing and running experiments, but still can't reliably pick the right question to investigate next or tell when they're stuck down a dead end.

3:01 PM · May 19, 2026

interesting results on this new benchmark

hyperparam search > sonnet 4.6 > glm-5 > gpt-5.5 > vLLM default > Opus 4.7

lol

Jehyeok Yeon @ ICML 2026 🇰🇷 @jehyeoky248

AI R&D agents look great in demos. They write code, fix bugs, and propose research-shaped ideas. But what do researchers actually spend their time doing? Fighting dependency conflicts, noisy metrics, and configs that I’m pretty sure worked 20 minutes ago. Can agents do that? 🧵

2:19 PM · May 20, 2026 · 5.3K Views
3:33 PM · May 20, 2026 · 1.4K Views