AI Research Agents Must Master Exploit-Explore Dynamics to Advance

Original post

(2/N) Current AI research agents bundle many things together: search strategy, code editor, experiment runner, memory system, validation loop. So when one agent wins, we often don’t know what actually caused the win. Was it better research strategy or better infrastructure?

3:05 PM · May 26, 2026

#901Anirudh Goyal@ANIRUDHG9119

(1/N) How should we evaluate AI agents that conduct ML research?

Final performance tells us whether an agent succeeded, but not how it searched.

In this work, we analyze AI research agents through the lens of search dynamics.

https://arxiv.org/abs/2605.17373

10:05 PM · May 26, 2026 · 1.1K Views

#901Anirudh Goyal@ANIRUDHG9119

(3/N) In this work, we isolate the thing we care about:

agent strategy.

We keep the execution infrastructure shared, while letting agents differ in how they generate ideas, choose experiments, remember results, and decide what to try next.

10:05 PM · May 26, 2026 · 69 Views

#901Anirudh Goyal@ANIRUDHG9119

(4/N) We analyze:

18 ML research tasks, 10 domains, real public research codebases, held-out test evaluation, and 12 process-level metrics.

So we can study not only how much an agent improves performance, but how it moves through the research landscape.

10:05 PM · May 26, 2026 · 39 Views

#901Anirudh Goyal@ANIRUDHG9119

(5/N) The first interesting behaviour:

A simple greedy hill-climber nearly matches the best tree-search agent.

This is interesting, as more complicated scaffolding does not automatically produce better research behavior.
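For readers unfamiliar with the term, a greedy hill-climber can be sketched in a few lines. This is only an illustrative toy, not the agent from the paper: the names `greedy_hill_climb`, `toy_propose`, and `toy_score` are hypothetical, and a real research agent would propose code changes and score them by running experiments rather than perturbing a number.

```python
import random
from typing import Callable, Tuple


def greedy_hill_climb(
    initial: float,
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    budget: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Keep one incumbent; accept a proposal only if it scores strictly higher."""
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    for _ in range(budget):
        candidate = propose(best, rng)        # always branch from the current best
        candidate_score = score(candidate)
        if candidate_score > best_score:      # greedy acceptance rule
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical toy problem: a single smooth peak at x = 3.
def toy_score(x: float) -> float:
    return -(x - 3.0) ** 2


def toy_propose(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-0.5, 0.5)


best, best_score = greedy_hill_climb(0.0, toy_propose, toy_score, budget=200)
print(f"best x = {best:.3f}, score = {best_score:.4f}")
```

The loop has no tree, no memory beyond the incumbent, and no backtracking, which is what makes it a useful baseline against more elaborate scaffolding.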

10:05 PM · May 26, 2026 · 32 Views

#901Anirudh Goyal@ANIRUDHG9119

(7/N) This suggests a simple principle:

AI research agents should not use a fixed search strategy. They should adapt online.

Exploit when progress is easy. Explore when progress stalls.

Our AdaptiveSearch agent implements this idea and outperforms the fixed strategies.
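One way to picture "exploit when progress is easy, explore when progress stalls" is a stall counter that switches modes. The sketch below is an assumption-laden toy and should not be read as the paper's AdaptiveSearch implementation: `adaptive_search`, the `patience` threshold, and the two step functions are all hypothetical names chosen for illustration.

```python
import random
from typing import Callable, List, Tuple

Step = Callable[[float, random.Random], float]


def adaptive_search(
    initial: float,
    score: Callable[[float], float],
    exploit_step: Step,
    explore_step: Step,
    budget: int,
    patience: int = 5,
    seed: int = 0,
) -> Tuple[float, float, List[str]]:
    """Exploit by default; switch to exploring after `patience` stalled steps."""
    rng = random.Random(seed)
    best, best_score = initial, score(initial)
    stalled = 0                     # consecutive steps without improvement
    modes: List[str] = []           # record of which mode each step used
    for _ in range(budget):
        mode = "explore" if stalled >= patience else "exploit"
        modes.append(mode)
        step = explore_step if mode == "explore" else exploit_step
        candidate = step(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            stalled = 0             # progress resumed: go back to exploiting
        else:
            stalled += 1
    return best, best_score, modes


# Hypothetical toy landscape: local peak at x = 0 (height 1), global peak at x = 5 (height 2).
def two_peak_score(x: float) -> float:
    return max(1.0 - x ** 2, 2.0 - (x - 5.0) ** 2)


def small_step(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-0.2, 0.2)   # local refinement


def large_step(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-6.0, 6.0)   # broad jump


best, best_score, modes = adaptive_search(
    0.0, two_peak_score, small_step, large_step, budget=200
)
print(f"best x = {best:.3f}, score = {best_score:.3f}, explore steps = {modes.count('explore')}")
```

Starting on the local peak, small steps cannot improve, so the controller stalls, switches to broad jumps, and returns to local refinement once a jump lands on the higher peak. A purely greedy searcher with only `small_step` would stay at the local peak.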

10:05 PM · May 26, 2026 · 22 Views

#901Anirudh Goyal@ANIRUDHG9119

(6/N) When useful improvements are dense, greedy search works well. When useful improvements are sparse, broader exploration helps.

So the question is not:

“Should agents use greedy search or tree search?”

It is:

What regime is this research problem in?

10:05 PM · May 26, 2026 · 28 Views

#901Anirudh Goyal@ANIRUDHG9119

(9/N) To me, the main message is:

The next generation of AI research agents will be systems that understand the dynamics of research search: when to exploit, when to explore, when to switch, and when to stop.

10:05 PM · May 26, 2026 · 80 Views

#901Anirudh Goyal@ANIRUDHG9119

(8/N) Another takeaway:

More diversity/tokens/wall-clock time are not automatically better.

What matters more is early progress and focused exploration. Research agents need better search control.

10:05 PM · May 26, 2026 · 24 Views

#901Anirudh Goyal@ANIRUDHG9119

(10/N) This work is a step toward making AI research agents scientifically measurable, i.e., toward controlled evaluation of the mechanisms that make agents better researchers.

Paper: https://arxiv.org/abs/2605.17373 Code: https://github.com/qrzou/FML-bench

10:05 PM · May 26, 2026 · 66 Views