METR President Chris Painter argues pre-deployment testing cannot detect AI loss-of-control and internal scheming risks

Internal laboratory systems are the primary targets for AI sabotage.

Original post

Just to reiterate the concern about focusing too much on pre-deployment testing for AI alignment/scheming testing: In the immediately-pre-deployment AI testing paradigm, the model development team, to some approximation, cooks up the best model it can and then passes it to a safety testing team just before deployment. The safety testing team then runs some tests and decides whether the model is safe to deploy publicly or not. For loss-of-control testing, this doesn’t really make sense, since the target you’re worried about is the AI lab itself! If anything, sharing the model with the world at least has a chance of transmitting information about the tendency of your models to scheme or sabotage, which could be useful for coordinating a response. If you were going to sit on a model, you'd want to sit on it before it was internally deployed at an AI company, not sit on it at the point of public deployment.

12:52 PM · May 25, 2026

Reposted by

#580@DKOKOTAJLO