AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator.
In our new paper, we argue that AI betrayal could actually make the AI race more stable. 🧵