
AutoForge Finds Retaining Reasoning Traces Improves Multi-Turn Agent Performance

Cameron R. Wolfe, Ph.D. @cwolferesearch

Another neat tidbit from an agentic RL paper (AutoForge): retaining reasoning traces throughout a multi-turn trajectory is beneficial. Agents benefit from "interleaved thinking": preserving the reasoning traces from prior turns and using them as extra context for the current turn.

Reasoning models output a long reasoning trace before their final output; e.g., in the format <think> ... </think> <final output>. This is straightforward for a single-turn use case. However, if we have a multi-turn scenario (e.g., an agent), we have to determine whether we:

1. Retain prior reasoning traces.
2. Clear prior reasoning traces and only output a reasoning trace for the current turn.
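To make the two options concrete, here is a minimal Python sketch of how the context for the next turn could be assembled under each policy. It assumes a plain-text prompt built by concatenating turns and model outputs in the `<think> ... </think> <final output>` format described above; the `Turn` class, `build_context`, and the `[task]`/`[assistant]`/`[env]` tags are hypothetical and not taken from the AutoForge paper. Real agents usually pass a list of chat messages through a model-specific chat template instead.

```python
import re
from dataclasses import dataclass

# Matches one <think>...</think> block (non-greedy, spanning newlines) plus trailing whitespace.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


@dataclass
class Turn:
    """One agent step (hypothetical structure): raw model output plus the environment's reply."""
    model_output: str  # e.g. "<think> ... </think> <final output>"
    observation: str   # e.g. a tool result or user message


def strip_reasoning(text: str) -> str:
    """Drop every <think>...</think> block, keeping only the final output."""
    return THINK_RE.sub("", text).strip()


def build_context(task: str, history: list[Turn], retain_reasoning: bool) -> str:
    """Assemble the prompt for the next turn.

    retain_reasoning=True  -> option 1: prior reasoning traces stay in context.
    retain_reasoning=False -> option 2: prior traces are cleared; only final outputs remain.
    Either way, the model still generates a fresh reasoning trace for the current turn.
    """
    parts = [f"[task] {task}"]
    for turn in history:
        output = turn.model_output if retain_reasoning else strip_reasoning(turn.model_output)
        parts.append(f"[assistant] {output}")
        parts.append(f"[env] {turn.observation}")
    return "\n".join(parts)


if __name__ == "__main__":
    history = [Turn("<think>Need the order ID first.</think> call get_order()", "order_id=42")]
    print(build_context("Refund the order", history, retain_reasoning=False))
    # [task] Refund the order
    # [assistant] call get_order()
    # [env] order_id=42
```

With `retain_reasoning=True`, the same history keeps `<think>Need the order ID first.</think>` in the prompt, so a later turn can reuse that plan instead of re-deriving it.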

The pro of retaining reasoning traces is that the agent might do some planning or analysis that is helpful at a later step. The con is that reasoning traces consume a lot of context. Solving long-horizon tasks is already difficult for agents, and retaining every step's reasoning trace can quickly exhaust the context window.
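The context-cost side of this trade-off is simple arithmetic. The sketch below assumes every turn uses a fixed number of tokens for its reasoning trace, final output, and environment observation (real traces vary a lot from turn to turn); `max_turns` and all token counts are illustrative assumptions, not numbers from the paper.

```python
def max_turns(window: int, prompt: int, reasoning: int, output: int,
              observation: int, retain_reasoning: bool) -> int:
    """How many turns fit in the context window (all sizes in tokens, constant per turn).

    Turn k needs room for the prompt, the k-1 earlier turns, and the fresh
    reasoning + output generated in turn k. Prior reasoning only counts toward
    the history when retain_reasoning is True.
    """
    per_past_turn = output + observation + (reasoning if retain_reasoning else 0)
    assert per_past_turn > 0, "each past turn must occupy some context"
    free = window - prompt - reasoning - output  # history budget while generating turn 1
    if free < 0:
        return 0  # not even the first turn fits
    # Largest k with (k - 1) * per_past_turn <= free.
    return free // per_past_turn + 1


if __name__ == "__main__":
    sizes = dict(window=32_768, prompt=2_000, reasoning=1_500, output=200, observation=300)
    print(f"clear={max_turns(**sizes, retain_reasoning=False)}")  # clear=59
    print(f"retain={max_turns(**sizes, retain_reasoning=True)}")  # retain=15
```

With these made-up sizes, retaining every trace shrinks the affordable horizon roughly fourfold (15 vs. 59 turns), which is the context cost that makes clearing prior traces look like the natural default.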

In AutoForge, the authors analyze this choice and find that retaining reasoning traces over a multi-turn trajectory has a clear performance benefit. They refer to this approach as "interleaved thinking", as it lets the agent keep its prior thoughts while it continues to think and act.

Specifically, this finding is demonstrated in a specialized agent training setting (the agent itself is trained, rather than using an off-the-shelf LLM backbone). In AutoForge, the authors create a series of synthetic simulated environments (all verifiable) for RL training of Qwen3-Thinking-30B-A3B. The resulting models are then evaluated on benchmarks like TauBench / VitaBench.

In these experiments, we see a clear trend: enabling interleaved thinking during RL training significantly improves agent accuracy. This trend holds across all benchmarks that were considered.

This is something I've always wondered about, but I just assumed we clear prior reasoning traces by default due to the context cost. Very cool to see this choice analyzed, especially when it yields an interesting (and possibly even counterintuitive) result!

5:32 PM · Jun 9, 2026 · 1.4K Views

link to the paper: https://arxiv.org/abs/2512.22857

really great read!

Rugbist @rugbist_

@cwolferesearch makes sense treating past reasoning as extra context instead of just memory

wonder if that scales cleanly with longer trajectories

Blissy @BlissyOnX

@cwolferesearch makes sense tbh. having the agent remember why it did something instead of cold-starting every turn feels obvious once u say it
