Anthropic has published a new policy framework for frontier AI. I'm happy to see it! Importantly, it takes seriously the need to sometimes stop AI companies from taking actions that pose a substantial risk of catastrophic harm. There are also some areas where it could be improved.
Anthropic proposes policy framework giving governments authority to block unsafe frontier AI model deployments
Ryan Greenblatt warns the plan neglects risky internal deployments.
Positive users praise Anthropic's policy framework for frontier AI as a solid contribution with useful safety ideas, while negative users criticize it for enabling unchecked internal deployments that bypass oversight.
Policies like this would be good and could be a decent first step. That said, these policies don't seem sufficient.
One concern I have is that it's unclear whether blocking usage/deployment applies to internal deployment, and I think most of the risk is from internal deployment.
Our Advanced AI Framework sets out how governments should prepare for and prevent catastrophic risks from frontier AI systems.
The government should have the authority to block or revoke the release of unsafe models, and invest in societal resilience. http://www.anthropic.com/policy-on-the-ai-exponential/aaif

Finally, there are some notable areas where I think this framework could be improved (also not exhaustive):
- It's not clear from reading the framework whether the government authority to "block or deter unsafe models" applies to models that are used only internally at the company. While the scope of the risk reports explicitly covers such deployments (good!), this section does not specify. But internal use might be one of the most important risk areas given risks from loss of control resulting from automated AI R&D.
- It's unclear what information-collecting authority third party evaluators have. For example, there is no general authority to examine necessary documents (for example, company evaluation logs) or collect information outside of the risk report process. I think we should be moving closer to an embedded auditor model in this respect.
- The regulator doesn't appear to have the flexibility to promulgate minimum standards for frontier AI frameworks or update any of the framework's key definitions over time. Given that legislatures often take time to act, I think it's important to include something like this in a framework.

It also includes some good ideas that haven't been enacted in state law that I think would be a positive step forward (not exhaustive):
#1 Most significantly, the framework grapples directly with how the government could intervene to stop companies from taking dangerous actions. This includes if the evaluator finds "a significant risk of catastrophic harm." It is framed as just a "possible approach" and is fairly high level, but I think it's very positive to see Anthropic directly support a mechanism of this kind.
#2 Mandatory risk reports. Several state laws have a very barebones version of this for risks resulting from internal deployment, but the version in Anthropic's framework is significantly more detailed, including a residual risk assessment after safeguards are applied.
#3 Stronger audits ("independent evaluations") than are found in Illinois SB 315. The evaluator is supposed to assess overall levels of risk, not just whether the developer followed the processes that it said it would, as in SB 315. It also talks about exploring an accreditation system for evaluators and says evaluators could be randomly assigned to AI developers (instead of developers choosing whoever they want).
#4 Explicitly including automated AI R&D as a risk factor. This is present in all of the frontier AI companies' frontier AI frameworks, but none of the state laws; it should be included in policy frameworks going forward.

The framework includes many well-established governance mechanisms present in state law, including:
- Mandatory frontier AI frameworks.
- Incident reporting.
- Whistleblower protections.
- Penalties for violations and false statements.
- Third party audits.

This is a solid contribution to the policy landscape, and I'm glad Anthropic released it! I hope more companies put out proposals like this so their views can be publicly scrutinized. Recently, OpenAI also put out a policy framework which has some good points but is much more high level than Anthropic's.
https://www-cdn.anthropic.com/files/4zrzovbb/website/0a58d567024a8b448ff15158ebc3625328dfcc1f.pdf
https://openai.com/index/public-policy-agenda/

@RyanPGreenblatt this is the real gap nobody talks about. internal deployment with no oversight is basically a backdoor for all the risk theyre trying to regulate