Harness-1 makes search agents better by moving memory work out of the model and into a helper system.
It shows that intelligence performs better when the environment stops forcing it to spend cognition on bookkeeping.
The core claim is that search agents should stop using the LLM as the notebook and let a separate harness track the search state.
The paper reports that a 20B model got better at search by doing less inside its own head.
The problem is that normal search agents must both think about the next search and remember every document, clue, failed path, and remaining check inside the same limited context.
That setup puts too much routine state management inside the policy.
Harness-1 separates those jobs.
The model keeps the hard semantic choices: what to search, what to inspect, what to verify, and when the evidence is good enough.
The harness keeps the recoverable state: candidate pools, curated documents, importance tags, evidence links, verification records, deduplicated observations, and budget-aware memory rendering.
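The split above can be pictured with a small sketch. This is a toy illustration, not the paper's implementation: the class name `SearchWorkspace`, its fields, the importance labels, and the character budget are all assumptions made up for this example. It only shows the idea of the harness holding candidates, curated documents, and verification marks, and rendering them within a budget.

```python
from dataclasses import dataclass, field


@dataclass
class SearchWorkspace:
    """Hypothetical harness-side state; names are illustrative, not the paper's API."""
    candidates: dict = field(default_factory=dict)  # doc_id -> text seen in results
    curated: dict = field(default_factory=dict)     # doc_id -> importance tag
    verified: set = field(default_factory=set)      # doc_ids the model has checked

    def observe(self, doc_id: str, text: str) -> bool:
        """Record a search hit; return False if it was already seen (deduplication)."""
        if doc_id in self.candidates:
            return False
        self.candidates[doc_id] = text
        return True

    def curate(self, doc_id: str, importance: str) -> None:
        """Promote a known candidate into the curated set with an importance tag."""
        if doc_id not in self.candidates:
            raise KeyError(f"unknown document: {doc_id}")
        self.curated[doc_id] = importance

    def verify(self, doc_id: str) -> None:
        """Mark a curated document as verified."""
        if doc_id not in self.curated:
            raise KeyError(f"document not curated: {doc_id}")
        self.verified.add(doc_id)

    def render(self, budget_chars: int) -> str:
        """Render curated documents, most important first, within a character budget."""
        rank = {"high": 0, "medium": 1, "low": 2}  # assumed tag vocabulary
        ordered = sorted(self.curated.items(), key=lambda kv: rank.get(kv[1], 3))
        lines, used = [], 0
        for doc_id, tag in ordered:
            mark = "verified" if doc_id in self.verified else "unverified"
            line = f"[{tag}] {doc_id} ({mark}): {self.candidates[doc_id]}"
            if used + len(line) > budget_chars:
                break  # stop once the next line would exceed the budget
            lines.append(line)
            used += len(line)
        return "\n".join(lines)
```

In this sketch the model would only decide which of `observe`, `curate`, and `verify` to call and when to stop. The harness decides what the model sees next through `render`.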
That sounds minor until you look at reinforcement learning.
RL works poorly when every failure looks the same, because an empty or wrong final set does not reveal whether the agent searched badly, forgot evidence, skipped verification, or curated carelessly.
By externalizing state, Harness-1 gives the policy a cleaner learning problem: improve decisions over a visible search workspace.
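One way to see why a visible workspace helps: once the state lives in the harness, a failed episode can be traced to a stage instead of being a single opaque miss. The function below is a hypothetical heuristic written for this post, not the paper's reward design. The argument names and the stage labels are assumptions, and it presumes training-time labels that say which documents were relevant.

```python
def diagnose_failure(num_searches: int, relevant_seen: int,
                     relevant_curated: int, curated_verified: int) -> str:
    """Name the first stage where an episode went wrong (illustrative only).

    All arguments are counts assumed to be read from harness logs.
    """
    if num_searches == 0 or relevant_seen == 0:
        return "search"        # relevant documents never surfaced
    if relevant_curated == 0:
        return "curation"      # evidence was seen but not kept
    if curated_verified == 0:
        return "verification"  # evidence was kept but never checked
    return "none"              # no stage-level failure detected


# An episode that found relevant documents but kept none of them:
print(diagnose_failure(num_searches=4, relevant_seen=2,
                       relevant_curated=0, curated_verified=0))
```

Without the harness logs, all three failure stages would collapse into the same signal: a wrong final set.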
Harness-1's gains were larger on held-out benchmarks than on source-family tasks, which suggests the model learned reusable search moves rather than memorized domain habits.
----
Link – arxiv.org/abs/2606.02373
Title: "Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses"




