
White-Box Access Needed For Robust AI Safety And Security Evaluations

Original post
Apollo Research|@APOLLOAIEVALS

Black-box access may soon no longer be enough to robustly make or verify safety and security claims. Deeper, white-box access is a necessary update to counter 'evaluation awareness' and keep loss-of-control evaluations state of the art. A new policy blog explains why. 🧵

10:15 AM · May 27, 2026
Reposted by
Miles Brundage|@MILES_BRUNDAGE
