Happy to have been part of this!
As we find in our recent ROGUE benchmark (linked below), frontier AI agents circumvent their guardrails in open-ended computer use settings that are otherwise benign in the text-only RLHF setting. Thus, the shared practice of AI evaluations is critical as we engage as a community with the next generation of agentic AI systems, which are increasingly approaching superhuman performance on certain tasks. Science is not done in a vacuum, but is properly carried out when we can share our evals and critique each others' results.
We promised a consensus statement with endorsements once we had something real to endorse, so we couldn't quietly bail later.
Now, the work is done, and we have results - which means endorsement is open! And a cross-sector crowd signed on before we even hit publish. 🧵