Northwestern's Zihan Wang introduces BAGEN, finding frontier LLM agents consistently fail to predict and manage their token budgets

Early stopping cuts agent operational costs by up to 64%

Original post

🧵 Claude-Opus-4.8 takes you too much tokens - but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them. 👇

9:27 AM · May 29, 2026

@wzenus cool study. would be very helpful if they were great at estimating required token budget

Zihan "Zenus" Wang @wzenus

4:27 PM · May 29, 2026 · 5.5K Views
5:14 PM · May 29, 2026 · 952 Views

models underestimate how much work it takes (token usage) to accomplish a task, just like us

5:13 PM · May 29, 2026 · 1.8K Views

@yoheinakajima Yes! Budget-awareness would be a missing ability that people should hillclimb :)

Yohei @yoheinakajima

5:14 PM · May 29, 2026 · 952 Views
5:15 PM · May 29, 2026 · 76 Views