Northwestern's Zihan Wang introduces BAGEN, finding frontier LLM agents consistently fail to predict and manage their token budgets
Early stopping cuts agent operational costs by up to 64%
@wzenus cool study. would be very helpful if they were great at estimating required token budget
🧵 Claude-Opus-4.8 uses too many tokens, but is this issue general across agents? Do agents know how much they'll spend? Introducing Budget-Aware Agents (BAGEN): We study budget awareness across 4 envs & 5 frontier agents, and find structured failures in most of them. 👇
models underestimate how much work it takes (token usage) to accomplish a task, just like us
@yoheinakajima Yes! Budget awareness is a missing ability that people should hill-climb on :)