AI Research Agents Must Master Exploit-Explore Dynamics to Advance
(1/N) How should we evaluate AI agents that conduct ML research?
Final performance tells us whether an agent succeeded, but not how it searched.
In this work, we analyze AI research agents through the lens of search dynamics.
https://arxiv.org/abs/2605.17373

(3/N) In this work, we isolate the thing we care about:
agent strategy.
We keep the execution infrastructure shared, while letting agents differ in how they generate ideas, choose experiments, remember results, and decide what to try next.
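To make the separation concrete, here is a minimal sketch of a shared run loop with pluggable strategies. It is not the paper's harness: `Strategy`, `run_agent`, `RepeatBest`, `RoundRobin`, and the toy scores are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, List, Protocol, Tuple

# Each history entry is (idea, measured score).
History = List[Tuple[str, float]]


class Strategy(Protocol):
    """Hypothetical interface: the only part that differs between agents."""

    def next_idea(self, history: History) -> str: ...


def run_agent(strategy: Strategy, evaluate: Callable[[str], float], budget: int) -> History:
    """Shared infrastructure: every strategy gets the same loop and evaluator."""
    history: History = []
    for _ in range(budget):
        idea = strategy.next_idea(history)
        history.append((idea, evaluate(idea)))
    return history


class RepeatBest:
    """Try each idea once, then keep re-running the best one seen so far."""

    def __init__(self, ideas: List[str]) -> None:
        self.ideas = ideas

    def next_idea(self, history: History) -> str:
        if len(history) < len(self.ideas):
            return self.ideas[len(history)]
        return max(history, key=lambda entry: entry[1])[0]


class RoundRobin:
    """Cycle through the ideas in a fixed order, ignoring results."""

    def __init__(self, ideas: List[str]) -> None:
        self.ideas = ideas

    def next_idea(self, history: History) -> str:
        return self.ideas[len(history) % len(self.ideas)]


# Toy stand-in for running an experiment (made-up scores, not real results).
toy_scores = {"bigger-lr": 0.2, "augment": 0.7, "dropout": 0.4}
evaluate = toy_scores.__getitem__

run_a = run_agent(RepeatBest(list(toy_scores)), evaluate, budget=5)
run_b = run_agent(RoundRobin(list(toy_scores)), evaluate, budget=5)
```

Because both runs go through the same `run_agent` and `evaluate`, any difference between `run_a` and `run_b` comes from the strategy alone.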

(2/N) Current AI research agents bundle many things together: search strategy, code editor, experiment runner, memory system, validation loop. So when one agent wins, we often don’t know what actually caused the win. Was it better research strategy or better infrastructure?
(4/N) We analyze:
18 ML research tasks, 10 domains, real public research codebases, held-out test evaluation, and 12 process-level metrics.
So we can study not only how much an agent improves performance, but how it moves through the research landscape.
(5/N) The first interesting behavior:
A simple greedy hill-climber nearly matches the best tree-search agent.
More complicated scaffolding does not automatically produce better research behavior.
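For readers unfamiliar with the term, a greedy hill-climber can be sketched in a few lines. This is a generic toy on a one-dimensional objective, not the agent from the paper; `greedy_hill_climb`, `toy_score`, and `toy_propose` are hypothetical names.

```python
import random
from typing import Callable, List, Tuple


def greedy_hill_climb(
    start: float,
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    steps: int,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Keep a single incumbent and accept a candidate only if it scores higher."""
    rng = random.Random(seed)
    best = start
    best_score = score(best)
    history = [best_score]  # best score after each step
    for _ in range(steps):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


def toy_score(x: float) -> float:
    """Toy objective with a single peak at x = 3 (an assumption for illustration)."""
    return -((x - 3.0) ** 2)


def toy_propose(x: float, rng: random.Random) -> float:
    """Small random edit to the current best solution."""
    return x + rng.uniform(-0.5, 0.5)


best, history = greedy_hill_climb(0.0, toy_propose, toy_score, steps=200)
```

The incumbent's score never decreases, which is the whole strategy: no tree, no backtracking, only "keep it if it is better".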

(7/N) This suggests a simple principle:
AI research agents should not use a fixed search strategy. They should adapt online.
Exploit when progress is easy. Explore when progress stalls.
Our AdaptiveSearch agent implements this idea and outperforms the fixed strategies.
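One way to read "exploit when progress is easy, explore when progress stalls" is as a stall counter. The sketch below is an illustrative assumption, not the AdaptiveSearch implementation; `StallSwitch`, `patience`, and `min_gain` are hypothetical names and thresholds.

```python
from dataclasses import dataclass


@dataclass
class StallSwitch:
    """Hypothetical controller: switch to exploration after repeated non-improvement."""

    patience: int = 3        # assumed: steps without progress before exploring
    min_gain: float = 1e-3   # assumed: smallest gain that counts as progress
    best: float = float("-inf")
    stalled: int = 0

    def update(self, score: float) -> str:
        """Record the latest score and return the mode for the next step."""
        if score > self.best + self.min_gain:
            self.best = score
            self.stalled = 0
        else:
            self.stalled += 1
        return "explore" if self.stalled >= self.patience else "exploit"


switch = StallSwitch(patience=2)
modes = [switch.update(s) for s in [0.50, 0.60, 0.60, 0.60, 0.75, 0.75]]
```

With these made-up scores, the controller stays in exploit mode while scores rise, flips to explore after two flat steps, and returns to exploit as soon as a new best appears.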
(6/N) When useful improvements are dense, greedy search works well. When useful improvements are sparse, broader exploration helps. So the question is not: “Should agents use greedy search or tree search?” It is: “What regime is this research problem in?”

(9/N) To me, the main message is:
The next generation of AI research agents will be systems that understand the dynamics of research search: when to exploit, when to explore, when to switch, and when to stop.
(8/N) Another takeaway: More diversity, more tokens, and more wall-clock time are not automatically better. What matters more is early progress and focused exploration. Research agents need better search control.
(10/N) This work is a step toward making AI research agents scientifically measurable, i.e., toward controlled evaluation of the mechanisms that make agents better researchers.
Paper: https://arxiv.org/abs/2605.17373 Code: https://github.com/qrzou/FML-bench