
AI Research Agents Must Master Exploit-Explore Dynamics to Advance


(2/N) Current AI research agents bundle many things together: search strategy, code editor, experiment runner, memory system, validation loop. So when one agent wins, we often don’t know what actually caused the win. Was it better research strategy or better infrastructure?

3:05 PM · May 26, 2026

(1/N) How should we evaluate AI agents that conduct ML research?

Final performance tells us whether an agent succeeded, but not how it searched.

In this work, we analyze AI research agents through the lens of search dynamics.

https://arxiv.org/abs/2605.17373

10:05 PM · May 26, 2026 · 1.1K Views

(3/N) In this work, we isolate the thing we care about:

agent strategy.

We keep the execution infrastructure shared, while letting agents differ in how they generate ideas, choose experiments, remember results, and decide what to try next.

Anirudh Goyal @anirudhg9119


(4/N) We analyze:

18 ML research tasks, 10 domains, real public research codebases, held-out test evaluation, and 12 process-level metrics.

So we can study not only how much an agent improves performance, but how it moves through the research landscape.
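As a rough illustration of what a process-level metric can look like, the sketch below computes three quantities from one run's sequence of validation scores. The metric names (`improvement_rate`, `steps_to_first_gain`, `early_progress`) are hypothetical and are not taken from the paper's list of 12; higher scores are assumed to be better.

```python
from typing import Dict, List, Optional


def process_metrics(
    scores: List[float], early_frac: float = 0.25
) -> Dict[str, Optional[float]]:
    """Summarize a run, where scores[0] is the baseline and each
    later entry is the score of one experiment, in order."""
    if len(scores) < 2:
        raise ValueError("need a baseline score plus at least one experiment")

    best = scores[0]
    best_so_far = [best]
    gains = 0
    first_gain: Optional[int] = None
    for step, score in enumerate(scores[1:], start=1):
        if score > best:
            gains += 1
            best = score
            if first_gain is None:
                first_gain = step
        best_so_far.append(best)

    n_experiments = len(scores) - 1
    total_gain = best_so_far[-1] - scores[0]
    # Index of the last experiment that counts as "early" (at least one).
    early_idx = max(1, int(n_experiments * early_frac))
    early_gain = best_so_far[early_idx] - scores[0]

    return {
        # Fraction of experiments that set a new best score.
        "improvement_rate": gains / n_experiments,
        # 1-based index of the first improving experiment, or None.
        "steps_to_first_gain": first_gain,
        # Share of the final gain already reached in the early phase.
        "early_progress": early_gain / total_gain if total_gain > 0 else None,
    }
```

Two runs with the same final score can differ sharply on numbers like these, which is the kind of distinction final performance alone cannot show.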


(5/N) The first interesting behavior:

A simple greedy hill-climber nearly matches the best tree-search agent.

This is interesting, as more complicated scaffolding does not automatically produce better research behavior.
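For readers unfamiliar with the term, a greedy hill-climber keeps a single best candidate and accepts a new proposal only if it scores higher. The sketch below is a generic version on a toy problem, not the paper's agent; `propose` and `evaluate` are hypothetical stand-ins for "generate an idea" and "run the experiment".

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def greedy_hill_climb(
    start: T,
    propose: Callable[[T, random.Random], T],
    evaluate: Callable[[T], float],
    budget: int,
    seed: int = 0,
) -> Tuple[T, float]:
    """Run `budget` experiments, always proposing from the current best."""
    rng = random.Random(seed)
    best, best_score = start, evaluate(start)
    for _ in range(budget):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        # Greedy acceptance: keep the candidate only if it strictly improves.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Toy usage: maximize -(x - 3)^2 by randomly nudging x.
best_x, best_val = greedy_hill_climb(
    start=0.0,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    evaluate=lambda x: -(x - 3.0) ** 2,
    budget=200,
)
```

There is no tree, no memory beyond the current best, and no backtracking, which is what makes its near-parity with tree search notable.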


(7/N) This suggests a simple principle:

AI research agents should not use a fixed search strategy. They should adapt online.

Exploit when progress is easy. Explore when progress stalls.

Our AdaptiveSearch agent implements this idea and outperforms the fixed strategies.
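A minimal sketch of what adapting online could look like, assuming the agent only tracks whether each recent experiment improved on the best score. This is not the AdaptiveSearch implementation; the class name, `window`, and `stall_threshold` are illustrative assumptions.

```python
from collections import deque
from typing import Deque


class ModeSwitcher:
    """Choose 'exploit' or 'explore' from the recent rate of improvements."""

    def __init__(self, window: int = 5, stall_threshold: float = 0.2) -> None:
        self.window = window
        self.stall_threshold = stall_threshold
        # Only the last `window` outcomes are kept.
        self.recent: Deque[bool] = deque(maxlen=window)

    def record(self, improved: bool) -> None:
        self.recent.append(improved)

    def mode(self) -> str:
        # Until the window is full, default to exploiting the current best.
        if len(self.recent) < self.window:
            return "exploit"
        hit_rate = sum(self.recent) / len(self.recent)
        return "exploit" if hit_rate > self.stall_threshold else "explore"


switcher = ModeSwitcher(window=3, stall_threshold=0.2)
for improved in [True, True, False]:
    switcher.record(improved)
mode_while_progressing = switcher.mode()  # recent hit rate is 2/3

for improved in [False, False, False]:
    switcher.record(improved)
mode_after_stall = switcher.mode()  # recent hit rate is 0
```

In this toy run the switcher stays in "exploit" while two of the last three experiments improved, and moves to "explore" after three misses in a row.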


(6/N) When useful improvements are dense, greedy search works well. When useful improvements are sparse, broader exploration helps.

So the question is not:

“Should agents use greedy search or tree search?”

It is:

What regime is this research problem in?
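One way to make "dense" and "sparse" concrete, under a simplifying assumption that is ours rather than the paper's: suppose each local proposal independently improves on the current best with probability p. The number of experiments until the first improvement is then geometric, with expectation:

```latex
\mathbb{E}[\text{experiments until first improvement}]
  = \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p
  = \frac{1}{p}
```

At p = 0.5 a greedy step is expected to pay off every 2 experiments; at p = 0.02 it takes 50 on average, which is the kind of regime where spending part of the budget on broader exploration may be worth it.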


(9/N) To me, the main message is:

The next generation of AI research agents will be systems that understand the dynamics of research search: when to exploit, when to explore, when to switch, and when to stop.


(8/N) Another takeaway:

More diversity, more tokens, or more wall-clock time is not automatically better.

What matters more is early progress and focused exploration. Research agents need better search control.


(10/N) This work is a step toward making AI research agents scientifically measurable, i.e., toward controlled evaluation of the mechanisms that make agents better researchers.

Paper: https://arxiv.org/abs/2605.17373
Code: https://github.com/qrzou/FML-bench
