// State-Externalizing Harnesses //
A new paradigm is emerging on how to effectively build agents and harnesses.
If there is a state that the environment can maintain reliably, it probably doesn't belong inside the policy. Move it into the harness, and a 20B model trains better and generalizes further.
Search agents are usually trained on one policy over a growing transcript, so RL has to learn semantic search and routine bookkeeping at the same time. This model, Harness-1, splits those apart.
The harness keeps the working memory (candidate pool, evidence links, verification records, deduplicated observations, budget-aware context) outside the policy, and the 20B model only decides what to search, what to keep, what to verify, and when to stop.
Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, it reaches 0.730 average curated recall, beating the next-best open search agent by 11.4 points and staying competitive with much larger frontier searchers. The gains are largest on the held-out transfer.
Paper: https://arxiv.org/abs/2606.02373
Learn to build effective AI agents in our academy: https://academy.dair.ai/