METR publishes its first Frontier Risk Report evaluating whether leading AI developers could lose control of advanced agents from Anthropic, Google, Meta, and OpenAI
Discussions highlight incentive conflicts for evaluators reliant on developer access.
This is a really important structural incentive issue that affects every 3rd party evaluation/auditing/governance group in the space to a lesser or greater extent - becoming more important the more critical meaningful company access is. This even includes government-linked orgs like UK AISI, who are dependent on voluntary pre-deployment access. I'm really glad Beth raised it directly.
It's an incentives issue that badly needs to be addressed somehow. Right now I can't see a good way without some degree of mandated testing/oversight that empowers a 3rd party ecosystem. In some domains liability will help, but not in some of the forward-looking and important domains that orgs like METR work in.
Limitations of report: This report isn’t robust oversight of frontier AI developers by itself. METR has some levers to incentivise companies’ participation, including some relevant legislation, but ultimately participants could have pulled out at any time if the result would be contrary to their interests. You can view it partly as a pilot exercise of what regulation (or formalized industry standards) could/should require, or what partners/suppliers/customers/employees should demand from frontier developers. Quoting from the report: “METR’s work relies on developing and maintaining strong working relationships with companies, and this impacted both how we designed the process for this pilot (e.g. offering the silent exit option) and lower-level judgment calls as the process unfolded (e.g. having a relatively high bar for what redactions we pushed back on). In some cases we refrained from making an unflattering claim because the claim was neither solidly defensible nor particularly relevant to our core assessment. We also made efforts not to invite salient comparisons between companies on capabilities or safety.” It doesn’t feel to me like this distorted our overall conclusions too much in this case. But that was partly because the conclusions weren’t that spicy. If our conclusions reflected very negatively on AI developers or would directly lead to e.g. govt intervention or public outcry, we’d be in a difficult position. We’d be trying to balance keeping the companies happy enough that they didn’t pull out of the program (using the “no-fault exit” mechanism) vs being transparent about our conclusions. We clearly need more robust mechanisms than this for providing accountability for AI developers.