6d ago

Researchers introduce Agent-BRACE for LLM agents in POMDPs

Researchers introduce Agent-BRACE, a framework that decouples LLM agents into separate belief-state and policy models for long-horizon POMDPs. The belief-state model maintains a set of atomic natural-language claims, each paired with an ordinal verbalized certainty label ranging from certain to unknown. A distinct policy model then selects actions from this compact belief rather than from the full interaction history. The two components are trained jointly via reinforcement learning. In the tested environments, the agents outperform strong RL baselines while keeping context size nearly constant, and their measured epistemic uncertainty falls over the course of an episode.
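
To make the belief representation concrete, here is a minimal Python sketch of a belief state as a set of claims with ordinal certainty labels. It is an illustration under stated assumptions, not the authors' implementation: the posts only say the labels range from certain to unknown, so the intermediate levels, the names `Certainty`, `Claim`, `render_belief`, and `strongest_fraction`, and the example claims are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Certainty(IntEnum):
    """Hypothetical ordinal scale. Only the endpoints (certain, unknown)
    are named in the posts; the middle levels are assumptions."""
    UNKNOWN = 0
    UNLIKELY = 1
    LIKELY = 2
    CERTAIN = 3


@dataclass(frozen=True)
class Claim:
    text: str
    certainty: Certainty


def render_belief(claims: list[Claim]) -> str:
    """Serialize the belief state into compact text a policy could read."""
    return "\n".join(f"[{c.certainty.name.lower()}] {c.text}" for c in claims)


def strongest_fraction(claims: list[Claim]) -> float:
    """Share of claims held at the strongest certainty level."""
    if not claims:
        return 0.0
    strongest = sum(c.certainty is Certainty.CERTAIN for c in claims)
    return strongest / len(claims)


# Made-up claims for illustration only.
belief = [
    Claim("The key is in the kitchen drawer", Certainty.LIKELY),
    Claim("The front door is locked", Certainty.CERTAIN),
    Claim("Where the lamp is located", Certainty.UNKNOWN),
    Claim("The garage is empty", Certainty.UNLIKELY),
]

print(render_belief(belief))
print(strongest_fraction(belief))  # 0.25
```

A statistic like `strongest_fraction` is one way to read the reported rise from 21% to 52% of claims held at the strongest belief level; how the authors actually compute it is not stated in these posts.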

Original post

🚨 Excited to announce Agent-BRACE! LLM agents in long-horizon POMDPs either blow up their context with raw history or summarize it, discarding uncertainty by collapsing belief into a point estimate. Agent-BRACE decouples the agent into belief state + policy models, jointly trained via RL. Key takeaways: 1️⃣ 🎯 The belief state model produces a structured approximation of the belief distribution as a set of atomic natural-language claims with ordinal verbalized certainty labels ranging from certain to unknown. The policy conditions on this compact belief rather than the full history. 2️⃣ 📈 Outperforms strong RL baselines on long-horizon partially observable embodied language environments while maintaining a near-constant context window independent of episode length. 3️⃣ 🔄 The learned belief becomes increasingly calibrated as evidence accumulates, and epistemic uncertainty decreases over time: the proportion of claims that the agent has the strongest level of belief in grows from 21% → 52% over an episode. 👇🧵

9:35 AM · May 13, 2026
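
The contrast drawn in the post, a raw history that keeps growing versus a compact belief the policy conditions on, can be sketched as a simple agent loop. This is a toy under stated assumptions: `toy_belief_model` and `toy_policy` are hypothetical rule-based stand-ins for the two learned LLM components, the `"<object> is <status>"` observation format is invented for the example, and the joint RL training is not shown.

```python
def toy_belief_model(prev_belief: str, last_action: str, observation: str) -> str:
    """Stand-in for the learned belief-state model (hypothetical).
    Keeps one claim line per object and overwrites stale claims."""
    claims = dict(line.split(": ", 1) for line in prev_belief.splitlines())
    obj, status = observation.split(" is ", 1)
    claims[obj] = f"{status} [certain]"
    return "\n".join(f"{k}: {v}" for k, v in claims.items())


def toy_policy(belief: str) -> str:
    """Stand-in for the learned policy: it sees only the belief text."""
    return "finish" if "drawer: closed" in belief else "explore"


def run_episode(observations: list[str]):
    belief, action, history = "", "start", ""
    belief_sizes, history_sizes = [], []
    for obs in observations:
        history += obs + "\n"                          # raw history grows every step
        belief = toy_belief_model(belief, action, obs)  # belief is rewritten in place
        action = toy_policy(belief)                     # policy never reads the history
        belief_sizes.append(len(belief))
        history_sizes.append(len(history))
    return belief, belief_sizes, history_sizes


# Invented observations: two objects whose status keeps changing.
observations = [
    "door is locked",
    "drawer is open",
    "door is open",
    "drawer is closed",
] * 5

final_belief, belief_sizes, history_sizes = run_episode(observations)
print(final_belief)
print(max(belief_sizes), history_sizes[-1])
```

In this toy the belief length is bounded by the number of tracked objects, while the raw history grows linearly with episode length. In Agent-BRACE that compactness is described as something the belief model learns to maintain, not a hand-written rule like the one above.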
Reposted by

🚨 Check out Agent-BRACE, our new work on belief state modeling for LLM agents in long-horizon tasks!

In long-horizon partially-observable tasks, interaction history exceeds LLM context windows, but summarizing it can discard useful uncertainty about the environment.

Agent-BRACE represents belief states as a set of natural language claims with verbalized confidence, and jointly trains a belief model to produce these states and a policy that conditions on them when taking actions.

✅ Improved task performance over strong RL baselines ✅ Compact, near-constant context ✅ Better belief calibration 🔎 We can see epistemic uncertainty reducing as the agent explores!

👇

Joykirat @joykiratsingh

4:35 PM · May 13, 2026 · 8.8K Views
4:44 PM · May 18, 2026 · 1.9K Views

Super cool work! Belief state modeling is a promising direction for long-horizon tasks: it is interpretable, can keep track of any necessary metadata, and is also more cognitively aligned with how humans reason.

On a related note, we show in HorizonBench that models without stateful memory often fail at long-horizon personalization tasks with evolving constraints.

Archiki Prasad @ArchikiPrasad

🚨 Excited to share ✨Agent-BRACE✨, our new work on belief state modeling for LLM agents in long-horizon tasks! 🔸 Most agents either suffer from a growing context window or compress history into summaries; in both cases, they do not explicitly track what the agent does not know. 🔸 Agent-BRACE splits the agent into two: a belief-state model that tracks uncertainty as natural-language claims with calibrated certainty labels, and a policy that acts on beliefs rather than raw history. 🔸 We find Agent-BRACE beats strong RL baselines with near-constant context usage, and the agent's beliefs get sharper as evidence accumulates during training. 🧵⬇️

8:49 PM · May 13, 2026 · 4.3K Views
1:49 PM · May 14, 2026 · 3.5K Views

🚨 Excited to share new work on belief state modeling in LLM agents!

Agent-BRACE asks: how can we get text-based agents to represent uncertainty? We use verbalized uncertainty and jointly teach agents to produce belief states and to condition on those beliefs when taking actions ➡️ task gains over strong RL baselines, compact history, improved belief calibration, and reduced epistemic uncertainty!

🧵👇

Joykirat @joykiratsingh

4:35 PM · May 13, 2026 · 8.8K Views
5:44 PM · May 13, 2026 · 1.9K Views