Harness-1 Externalizes Memory To Improve RL Search Agents

Rohan Paul@rohanpaul_ai

Harness-1 makes search agents better by moving memory work out of the model and into a helper system.

It shows that intelligence performs better when the environment stops forcing it to spend cognition on bookkeeping.

The argument is that search agents should stop using the LLM as the notebook and let a separate harness track the search state.

The paper reports that a 20B model improved at search by doing less inside its own head.

The problem is that normal search agents must both think about the next search and remember every document, clue, failed path, and remaining check inside the same limited context.

This formulation puts too much routine state management inside the policy.

Harness-1 separates those jobs.

The model keeps the hard semantic choices: what to search, what to inspect, what to verify, and when the evidence is good enough.

The harness keeps the recoverable state: candidate pools, curated documents, importance tags, evidence links, verification records, deduplicated observations, and budget-aware memory rendering.
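To make the split concrete, here is a minimal sketch of what harness-side state could look like. It is not the paper's implementation: the `Workspace` class, its method names, and the character-based budget are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical harness-side search state; every name here is illustrative."""
    candidates: dict[str, str] = field(default_factory=dict)       # doc_id -> text
    curated: dict[str, int] = field(default_factory=dict)          # doc_id -> importance tag
    evidence: dict[str, list[str]] = field(default_factory=dict)   # claim -> supporting doc_ids
    verified: dict[str, bool] = field(default_factory=dict)        # claim -> verification result
    _seen: set[str] = field(default_factory=set)                   # normalized texts already observed

    def observe(self, doc_id: str, text: str) -> bool:
        """Add a result to the candidate pool; return False for duplicate text."""
        key = " ".join(text.lower().split())  # case- and whitespace-insensitive dedup key
        if key in self._seen:
            return False
        self._seen.add(key)
        self.candidates[doc_id] = text
        return True

    def curate(self, doc_id: str, importance: int) -> None:
        """Keep a candidate document and tag it with an importance score."""
        if doc_id not in self.candidates:
            raise KeyError(doc_id)
        self.curated[doc_id] = importance

    def link(self, claim: str, doc_id: str) -> None:
        """Record that a document supports a claim."""
        self.evidence.setdefault(claim, []).append(doc_id)

    def verify(self, claim: str, ok: bool) -> None:
        """Record the outcome of a verification step for a claim."""
        self.verified[claim] = ok

    def render(self, budget_chars: int) -> str:
        """Render curated documents, most important first, within a character budget."""
        lines: list[str] = []
        used = 0
        for doc_id, importance in sorted(self.curated.items(), key=lambda kv: -kv[1]):
            line = f"[{importance}] {doc_id}: {self.candidates[doc_id]}"
            if used + len(line) > budget_chars:
                break  # stop once the next document would exceed the budget
            lines.append(line)
            used += len(line)
        return "\n".join(lines)
```

In this sketch the model would only issue calls such as `curate` or `verify`; the harness decides how the stored state is rendered back into the context window.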

That sounds minor until you look at reinforcement learning.

RL works poorly when every failure looks the same, because an empty or wrong final set does not reveal whether the agent searched badly, forgot evidence, skipped verification, or curated carelessly.

By externalizing state, Harness-1 gives the policy a cleaner learning problem: improve decisions over a visible search workspace.
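One way to picture why visible state helps: once the harness records what was retrieved, kept, and verified, a failed episode can be attributed to a specific step. The sketch below is only an illustration of that idea; the `diagnose` function and its category names are hypothetical and do not come from the paper.

```python
def diagnose(gold: set[str], retrieved: set[str],
             final: set[str], verified: set[str]) -> str:
    """Attribute an episode outcome to one step (illustrative categories only)."""
    if not gold <= retrieved:
        return "search_miss"        # a needed document was never retrieved
    if not gold <= final:
        return "dropped_evidence"   # retrieved, but missing from the final set
    if final - gold:
        return "careless_curation"  # final set contains irrelevant documents
    if not final <= verified:
        return "unverified"         # a kept document was never checked
    return "ok"
```

Without the recorded `retrieved`, `final`, and `verified` sets, all four failure cases would collapse into the same signal: a wrong final answer.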

Harness-1's gains were larger on held-out benchmarks than on source-family tasks, suggesting the model learned reusable search moves rather than memorized domain habits.

----

Link – arxiv.org/abs/2606.02373

Title: "Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses"

6:24 PM · Jun 4, 2026 · 4.6K Views

Sentiment

Commenters praise Harness-1 for externalizing memory in RL search agents, since offloading bookkeeping lets the model focus on thinking rather than state management. One reply adds that Zaxy, which it says rests on the same thesis, is free to use today.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)

Views: 1.2K · Bookmarks: 1 · Likes: 4

Rohan Paul@rohanpaul_ai

Harness-1 splits search into two jobs: the model makes decisions, and the harness handles memory.

The model decides what to search, inspect, keep, verify, or stop on, while the harness tracks documents, evidence links, verification notes, duplicates, and context budget.

Training rewards the model for finding and selecting the right evidence, so it learns better search behavior instead of doing messy bookkeeping.
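The post does not give the reward formula, so the following is purely a hypothetical sketch of a reward that credits both finding and selecting evidence. The `evidence_reward` function, the recall-plus-F1 combination, and the `w_find` weight are assumptions, not the paper's definition.

```python
def evidence_reward(gold: set[str], found: set[str],
                    selected: set[str], w_find: float = 0.5) -> float:
    """Hypothetical reward: weighted mix of retrieval recall and selection F1."""
    # "Finding": fraction of gold documents that were retrieved at all.
    recall_found = len(gold & found) / len(gold) if gold else 0.0

    # "Selecting": F1 of the final selected set against the gold set.
    true_pos = len(gold & selected)
    precision = true_pos / len(selected) if selected else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    return w_find * recall_found + (1.0 - w_find) * f1
```

A reward of this shape separates credit for retrieving the right documents from credit for keeping only the right ones, which matches the post's description at a high level while leaving the real formula open.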

Replies

Nicholas Blanchard@corelumen

@rohanpaul_ai Validating Zaxy's thesis...and you can use it today, for free!

https://docs.zaxy.io

Shinka - AI@ShinkaIoT

@rohanpaul_ai Exactly: offloading bookkeeping from the model is pure leverage, letting LLMs actually *think* instead of just managing state.

Steven Collard@stalmico

@rohanpaul_ai so basically give the llm adhd meds

Pablo Pablo@navtechai

@rohanpaul_ai If externalizing state gives this much lift, I don't understand why frameworks still dump everything into context. Let the policy decide, not memorize. A 20B model outperforms frontier searchers because the architecture finally stops treating it as a notebook.
