Excellent post from @AISecurityInst on how AI agents' time horizons scale with the number of tokens they're allowed. For me the most interesting point is the first graph below. Note how it is requiring much more than 10x the tokens to get 10x the time horizon…
Most AI agent evaluations boil capability down to one score. But that number hides a key choice: how much compute the agent was allowed to use. New work from our Science of Evaluation team shows why that matters. 🧵

