Another neat tidbit from an agentic RL paper (AutoForge): retaining reasoning traces throughout a multi-turn trajectory is beneficial. Agents benefit from interleaved thinking, or preserving reasoning traces from prior turns and using them as extra context for the current turn.
Reasoning models output a long reasoning trace before their final output; e.g., in the format <think> ... </think> <final output>. This is straightforward for a single-turn use case. However, if we have a multi-turn scenario (e.g., an agent), we have to determine whether we:
1. Retain prior reasoning traces. 2. Clear prior reasoning traces and only output a reasoning trace for the current turn.
The pro of retaining reasoning traces is that the agent might do some planning or analysis that is helpful at a later step. The con is that reasoning traces consume a lot of context. Solving long horizon tasks is already difficult with agents, and retaining a per-step reasoning trace can quickly exhaust the context window.
In AutoForge, authors analyze this choice, finding that retaining reasoning traces over a multi-turn trajectory has a clear performance benefit. They refer to this approach as "interleaved thinking", as it allows the agent to think / retain its thoughts.
Specifically, this finding is proven in a specialized agent training setting (we are training the agent rather than using an off-the-shelf LLM backbone). In AgentForge, authors create a series of synthetic simulated environments (all verifiable) for RL training of Qwen3-Thinking-30B-A3B. These models are then evaluated on benchmarks like TauBench / VitaBench.
On these experiments, we see a clear trend that enabling interleaved thinking during RL training significantly benefits agent accuracy. This trend holds across all benchmarks that were considered.
This is something I've always wondered about, but I just assumed we clear prior reasoning traces by default due to the context cost. Very cool to see this choice analyzed, especially when it yields an interesting (and possibly even counterintuitive) result!



