/Tech2h ago

AutoForge Finds Retaining Reasoning Traces Improves Multi-Turn Agent Performance

6320292.1K

Original post

Cameron R. Wolfe, Ph.D.@cwolferesearch#1688inTech

Another neat tidbit from an agentic RL paper (AutoForge): retaining reasoning traces throughout a multi-turn trajectory is beneficial. Agents benefit from interleaved thinking, or preserving reasoning traces from prior turns and using them as extra context for the current turn.

Reasoning models output a long reasoning trace before their final output; e.g., in the format <think> ... </think> <final output>. This is straightforward for a single-turn use case. However, if we have a multi-turn scenario (e.g., an agent), we have to determine whether we:

1. Retain prior reasoning traces. 2. Clear prior reasoning traces and only output a reasoning trace for the current turn.

The pro of retaining reasoning traces is that the agent might do some planning or analysis that is helpful at a later step. The con is that reasoning traces consume a lot of context. Solving long horizon tasks is already difficult with agents, and retaining a per-step reasoning trace can quickly exhaust the context window.

In AutoForge, authors analyze this choice, finding that retaining reasoning traces over a multi-turn trajectory has a clear performance benefit. They refer to this approach as "interleaved thinking", as it allows the agent to think / retain its thoughts.

Specifically, this finding is proven in a specialized agent training setting (we are training the agent rather than using an off-the-shelf LLM backbone). In AgentForge, authors create a series of synthetic simulated environments (all verifiable) for RL training of Qwen3-Thinking-30B-A3B. These models are then evaluated on benchmarks like TauBench / VitaBench.

On these experiments, we see a clear trend that enabling interleaved thinking during RL training significantly benefits agent accuracy. This trend holds across all benchmarks that were considered.

This is something I've always wondered about, but I just assumed we clear prior reasoning traces by default due to the context cost. Very cool to see this choice analyzed, especially when it yields an interesting (and possibly even counterintuitive) result!

5:32 PM · Jun 9, 2026 · 1.4K Views

/Tech2h ago

AutoForge Finds Retaining Reasoning Traces Improves Multi-Turn Agent Performance

6320292.1K

#1688

Original post

Cameron R. Wolfe, Ph.D.@cwolferesearch#1688inTech

1. Retain prior reasoning traces. 2. Clear prior reasoning traces and only output a reasoning trace for the current turn.

On these experiments, we see a clear trend that enabling interleaved thinking during RL training significantly benefits agent accuracy. This trend holds across all benchmarks that were considered.

5:32 PM · Jun 9, 2026 · 1.4K Views

Sentiment

Users praised AutoForge's retention of reasoning traces for boosting multi-turn agent performance, calling the paper a great read and the build really good.

Pos

100.0%

Neg

0.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS710BOOKMARKS5LIKES4

Cameron R. Wolfe, Ph.D.@cwolferesearch

link to the paper: https://arxiv.org/abs/2512.22857

really great read!

Cameron R. Wolfe, Ph.D.@cwolferesearch

1. Retain prior reasoning traces. 2. Clear prior reasoning traces and only output a reasoning trace for the current turn.

On these experiments, we see a clear trend that enabling interleaved thinking during RL training significantly benefits agent accuracy. This trend holds across all benchmarks that were considered.

2h71045

REPLIES1

Alexa Web3 (e/acc)@alexabelonix

@cwolferesearch really good build.

2h11

Cameron R. Wolfe, Ph.D.@cwolferesearch

@alexabelonix 🫡

2h71

Rugbist@rugbist_

@cwolferesearch makes sense treating past reasoning as extra context instead of just memory

wonder if that scales cleanly with longer trajectories

2h2

Blissy@BlissyOnX

@cwolferesearch makes sense tbh. having the agent remember why it did something instead of cold-starting every turn feels obvious once u say it