PRInTS Presents Generative Reward Model For Long-Horizon Agent Tasks


Mohit Bansal @mohitban47

(presentation time/location & links/websites + summary 🧵's for these papers attached below 👇)

-- PRInTS: Reward Modeling for Long-Horizon Information Seeking

Jaewoo Lee, @ArchikiPrasad, @cyjustinchen, @codezakh, @EliasEskin https://arxiv.org/abs/2511.19314

Grand Hall Session 4; Sun. July 5, 16:00-17:30 PT

Archiki Prasad @ArchikiPrasad

🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents’ long-horizon info-seeking via info-gain scoring + trajectory summarization.

PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES & WebWalkerQA!

Existing PRMs have a number of shortcomings that make them incompatible w/ info-seeking tasks:
❌ Designed mostly for math and logical reasoning
❌ Fall short in evaluating tool calls / outputs (along multiple dimensions)
❌ Don’t handle accumulating context from agents (noise + distractions in judgments)

PRInTS addresses these by jointly training two core abilities:
☑️ Information gain scoring: evaluates a trajectory step based on PRInTS’s reasoning across multiple dimensions & computes information gain.
☑️ Trajectory summarization: continuously compresses historical context for step evaluation, keeping input length bounded.
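
To make the two abilities concrete, here is a minimal Python sketch of how a step scorer and a rolling summary could fit together in an agent loop. It is not the PRInTS implementation: `Step`, `toy_info_gain`, `toy_summarize`, and `select_step` are hypothetical names, the keyword-overlap scorer and word-truncating summarizer are placeholders for a trained generative PRM, and picking the highest-scoring candidate at each step (best-of-n) is assumed here as one plausible way to use the scores.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str       # candidate tool call proposed by the agent
    observation: str  # tool output returned for that action


def toy_info_gain(query: str, summary: str, step: Step) -> float:
    """Placeholder scorer: count query words the step surfaces that the summary lacks.

    A generative PRM would reason over the step before scoring it; this
    keyword overlap only exists so the sketch runs end to end.
    """
    known = set(summary.lower().split())
    wanted = set(query.lower().split())
    seen = set(step.observation.lower().split())
    return float(len((wanted & seen) - known))


def toy_summarize(summary: str, step: Step, max_words: int = 30) -> str:
    """Placeholder summarizer: append the observation, keep the last max_words words.

    This keeps the evaluator's input bounded no matter how long the trajectory gets.
    """
    words = (summary + " " + step.observation).split()
    return " ".join(words[-max_words:])


def select_step(
    query: str,
    summary: str,
    candidates: List[Step],
    score: Callable[[str, str, Step], float] = toy_info_gain,
) -> Step:
    """Best-of-n selection: return the candidate with the highest score."""
    return max(candidates, key=lambda s: score(query, summary, s))


if __name__ == "__main__":
    query = "capital of france population"
    summary = ""  # compressed history, updated after every chosen step
    rounds = [
        [Step("search('weather')", "sunny today"),
         Step("search('capital of France')", "paris is the capital of france")],
        [Step("search('Paris population')", "paris population is about two million"),
         Step("search('capital of France')", "paris is the capital of france")],
    ]
    for candidates in rounds:
        best = select_step(query, summary, candidates)
        summary = toy_summarize(summary, best)
        print(best.action)
```

In the second round the repeated search scores zero because the summary already covers what it returns, so the step that adds new information wins. The scorer only ever sees the query, the bounded summary, and the current candidate, never the full history.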

Across three long-horizon information-seeking tasks (FRAMES, GAIA, WebWalkerQA) on distinct agents (Qwen3-32B, Tongyi DeepResearch-30B-A3B, Gemini-2.5-Flash), PRInTS improves average accuracy by 9.3%, 3.9%, and 4.0%, respectively, showing its versatility and effectiveness! 💪

🧵⬇️ (1/6)

10:52 AM · Jun 30, 2026