(presentation time/location & links/websites + summary 🧵's for these papers attached below 👇)
-- PRInTS: Reward Modeling for Long-Horizon Information Seeking
Jaewoo Lee, @ArchikiPrasad, @cyjustinchen, @codezakh, @EliasEskin https://arxiv.org/abs/2511.19314
Grand Hall Session 4; Sun. July 5, 16:00-17:30 PT
🚨 Excited to announce ✨PRInTS✨, a generative Process Reward Model (PRM) that improves agents’ long-horizon info-seeking via info-gain scoring + summarization.
PRInTS guides open + specialized agents with major boosts 👉+9.3% avg. w/ Qwen3-32B across GAIA, FRAMES & WebWalkerQA!
Existing PRMs have a number of shortcomings that make them incompatible w/ info-seeking tasks:
❌ Designed mostly for math and logical reasoning
❌ Fall short in evaluating tool calls / outputs (along multiple dimensions)
❌ Don’t handle accumulating context from agents (noise + distractions in judgments)
PRInTS addresses these by jointly training two core abilities:
☑️ Information gain scoring: evaluates each trajectory step by reasoning across multiple dimensions & computing its information gain.
☑️ Trajectory summarization: continuously compresses historical context for step evaluation, keeping input length bounded.
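A rough sketch of how these two abilities could fit together at inference time: score candidate steps against a compact summary, keep the best one, then refresh the summary so the scorer's input stays bounded. This is not the PRInTS implementation. `Step`, `select_step`, `run_guided`, `toy_score`, and `toy_summarize` are hypothetical names, and the toy scorer (counting unseen words) and toy summarizer (keeping the most recent characters) are simple stand-ins for the trained model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    tool_call: str  # what the agent proposes to do
    output: str     # what the tool returned


def select_step(candidates: List[Step], summary: str,
                score_fn: Callable[[str, Step], float]) -> Step:
    """Pick the candidate the scorer rates highest given the summary."""
    return max(candidates, key=lambda step: score_fn(summary, step))


def run_guided(propose: Callable[[str], List[Step]],
               score_fn: Callable[[str, Step], float],
               summarize_fn: Callable[[str, Step, int], str],
               n_steps: int, max_chars: int) -> Tuple[List[Step], str]:
    """Alternate step selection and summary refresh for n_steps rounds."""
    summary = ""
    chosen: List[Step] = []
    for _ in range(n_steps):
        best = select_step(propose(summary), summary, score_fn)
        chosen.append(best)
        # The scorer only ever sees the compressed summary, not the full history.
        summary = summarize_fn(summary, best, max_chars)
    return chosen, summary


def toy_score(summary: str, step: Step) -> float:
    """Stand-in for information gain: number of words not yet in the summary."""
    seen = set(summary.split())
    return float(len(set(step.output.split()) - seen))


def toy_summarize(summary: str, step: Step, max_chars: int) -> str:
    """Stand-in for summarization: append, then keep the most recent characters."""
    merged = (summary + " " + step.output).strip()
    return merged[-max_chars:]
```

In a real setup, `score_fn` and `summarize_fn` would both be calls to the reward model, and `propose` would sample several candidate next steps from the agent.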
Across three long-horizon information-seeking tasks (FRAMES, GAIA, WebWalkerQA) with three distinct agents (Qwen3-32B, Tongyi DeepResearch-30B-A3B, Gemini-2.5-Flash), PRInTS improves average accuracy by 9.3%, 3.9%, and 4.0%, respectively, showing its versatility and effectiveness! 💪
🧵⬇️ (1/6)