Let’s face it: after-the-fact API guardrails are not the right safety tool for frontier models.
They don’t make dangerous capabilities disappear. They just hide them behind a brittle interface that can be easily jailbroken.
A better safety agenda:
- don’t train models for very high-risk capabilities without strong evals, justification, and containment
- use staged release, as pioneered by @IreneSolaiman, from trusted testers to broader access, and open release for transparency and accountability
- massively support open-source AI so the gap between players does not become so large that a few closed labs and governments end up with overwhelming capabilities and power over everyone else
- enable independent evaluation instead of asking everyone to trust a black-box API
- give law enforcement, courts, regulators, auditors, journalists, and civil society strong AI tools to detect, investigate, and hold accountable unlawful uses of AI
Safety means transparency, staged deployment, distributed power, and making sure democratic institutions can actually enforce the law.











