Some misc. comments on the auditing-related provisions in Anthropic's proposed "Advanced AI Framework" 🧵
Miles Brundage of AVERI critiques the auditing and independent evaluation provisions in Anthropic's proposed Advanced AI Framework
He calls for stronger third-party oversight of frontier models
Target of audit - risk reports.
Not surprisingly, Anthropic emphasizes auditing the thing it is currently most excited about self-assessing and voluntarily having others assess: regularly scheduled, holistic, cross-model risk assessments at the company level.
I'd be all for that as a target if the alternative were no target, but I don't think it's self-evident that this is a better target than, e.g., "is the company following its policies + are those policies reasonable" (~Obernolte-Trahan)...
Access - this is an area where Anthropic's (and OpenAI's) endorsement of SB 315 is especially interesting. SB 315 (pic 1) and Obernolte-Trahan (pic 2) go a bit beyond explicit industry proposals (while still having guardrails around security/confidentiality, etc.)...
Won't go into detail here, since that would require a more careful review than I can do this second, but suffice it to say that I *think* SB 315 and O-T / GAAIA would at least somewhat more strongly empower auditors access-wise. Anthropic's proposal is also perhaps weaker than some aspects of the GPAI COP (see pic).
"Evaluator shopping" - will have more to say on this before long. Very related to the risk of "checkbox compliance."
I agree there should be mitigations, though I'm not sure I'm sold on the one floated here (~ratings + random assignment). Granted, it's just floated.
Cadence of audits - this is a bit vague. SB 315 is annual; Obernolte-Trahan is semi-annual (with the possibility of more frequent or continuous audits at the CAISI Director's discretion)... Anthropic just says "regularly." Presumably this means at least at the cadence of risk reports, but 🤷♂️