2h ago
New BAGEN benchmark finds frontier LLM agents lack budget awareness, though early stopping could save up to 64% of tokens
Agents consistently showed over-optimism regarding their token usage