The only true AI safety will come from securing the ecosystem. Use AI to harden digital security and speed detection and response. Use AI and biotech to improve pandemic monitoring, pre-design vaccines for all virus families, and prep vaccine manufacturing capacity. Limits in the models are inherently fragile and temporary. Open weight models may always be just months behind the frontier, and they can be jailbroken quite comprehensively. Bad actors will always have access to powerful AI. The thing that stops a bad guy with AI is good guys intelligently using AI and other tools.
I support reasonable staged rollout as Anthropic did with Mythos / Glasswing to give responsible actors more time to use the latest models to find vulnerabilities and fix them. At the same time, that will always be a temporary state of things at best.
Long term (months, most likely), all of the current frontier capabilities will be available to anyone who really wants them. Plan and build for that world.
Lots of people have known for a while that guardrails for frontier model APIs are very easily jailbroken, quite shallow and impossible to fix. They’re mostly a smokescreen and distraction, in my opinion. We need a different paradigm for AI safety!