
Research Finds Mid-Tier Models Best at Evolving AI Agent Skills

elvis (@omarsar0) · #475 in AI

This is something I have been thinking about since @karpathy's post on LLM Knowledge Bases. Fine-tuning models to better maintain agent skills, memory, context engineering, routing efficiency, and knowledge bases is going to be huge.

You might also find this read interesting:

elvis (@omarsar0)

Very good advice on self-improving agents.

(bookmark it)

This is something I am seeing in my own experiments with coding agents and harnesses for long-horizon tasks.

What I have found is that stronger models do not always evolve better agents.

The current belief in self-evolving agents is that a bigger model writes better prompt and skill edits, so devs put their best model in the evolver seat.

New research shows that intuition is mostly wrong.

The work separates two abilities that usually get conflated. The ability to produce harness updates stays roughly flat across model capability: Qwen3.5-9B writes edits about as good as those from Claude Opus 4.6. The ability to benefit from those updates follows an inverted U that peaks at mid-tier models: weak models fail to even activate the edits, and strong models have little headroom left.

This matters because it tells you where to spend. Put a cheap model in the evolver seat and your expensive model in the solver seat, because the gains land solver-side, not evolver-side.
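To make the evolver/solver split concrete, here is a minimal Python sketch of a self-evolving harness loop with the two roles as separate, swappable slots. It assumes a "model" is any function from a prompt string to a completion string; the names `Harness`, `EvolutionConfig`, and `evolve` are hypothetical and do not come from the paper. The loop also keeps every proposed edit without an acceptance test, a simplification a real harness would likely not make.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: a "model" is any function from a prompt string to a completion string.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Hypothetical editable agent state: a system prompt plus learned skill notes."""
    system_prompt: str
    skills: List[str] = field(default_factory=list)

    def render(self) -> str:
        # Concatenate the system prompt and one bullet line per skill note.
        lines = [self.system_prompt] + [f"- {s}" for s in self.skills]
        return "\n".join(lines)


@dataclass
class EvolutionConfig:
    evolver: Model  # proposes harness edits; the post argues a cheap model suffices here
    solver: Model   # runs tasks with the evolved harness; the post says gains land here
    rounds: int = 3


def evolve(config: EvolutionConfig, harness: Harness, tasks: List[str],
           score: Callable[[str, str], float]) -> Harness:
    """Run a simple evolve loop and return a new Harness (the input is not mutated)."""
    current = Harness(harness.system_prompt, list(harness.skills))
    for _ in range(config.rounds):
        # 1. The solver attempts every task with the current harness.
        outputs = [config.solver(f"{current.render()}\n\nTask: {t}") for t in tasks]
        # 2. Score each attempt and summarize the results as feedback text.
        feedback = "\n".join(f"{t}: score={score(t, o):.2f}" for t, o in zip(tasks, outputs))
        # 3. The evolver reads the harness plus feedback and proposes one skill note.
        proposal = config.evolver(
            f"Current harness:\n{current.render()}\n\nResults:\n{feedback}\n\n"
            "Propose one new skill note."
        )
        # Simplification: every proposal is kept; no acceptance test is applied.
        current.skills.append(proposal.strip())
    return current


if __name__ == "__main__":
    # Stand-in models for illustration only; real use would wrap actual model clients.
    cheap_evolver = lambda prompt: "Run the test suite before submitting."
    strong_solver = lambda prompt: "patch"
    cfg = EvolutionConfig(evolver=cheap_evolver, solver=strong_solver, rounds=2)
    evolved = evolve(cfg, Harness("You are a coding agent."), ["fix bug #1"], lambda t, o: 1.0)
    print(evolved.render())
```

Because the evolver and solver are separate fields, trying the post's recommendation amounts to changing which model fills each slot. Whether the inverted-U pattern holds for a particular task set is something to measure rather than assume.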

Paper: https://arxiv.org/abs/2605.30621

Learn to build effective AI agents in our academy: https://academy.dair.ai/

8:28 AM · Jun 1, 2026 · 3K Views