Miles Brundage of AVERI critiques the auditing and independent evaluation provisions in Anthropic's proposed Advanced AI Framework

Won't go into the detail which would require more careful review than I can do this sec but suffice it to say that I *think* SB 315 and O-T / GAAIA would at least somewhat more strongly empower auditors access-wise. Also perhaps weaker than some aspects of the GPAI COP (see pic)

Miles Brundage@Miles_Brundage

Access - this is an area where it seems like Ant (+OAI's) endorsement of SB 315 is especially interesting. SB 315 (pic 1) and Obernolte-Trahan (pic 2) go a little bit beyond explicit industry proposals (while still having guardrails around security/confidentiality etc.)...

Miles Brundage@Miles_Brundage

I'd be all for that as a target, if the alternative was no target, but I don't think it's self-evident that this is a better target than, e.g., "is the company following their policies + are those policies reasonable" (~Obernolte-Trahan)...

Miles Brundage@Miles_Brundage

Target of audit - risk reports.

Not surprisingly, Anthropic emphasizes auditing the thing that they are currently most excited about self-assessing + voluntarily having assessed by others - regularly scheduled holistic, cross-model risk assessments at a company level.

Miles Brundage@Miles_Brundage

"Evaluator shopping" - will have more to say before long on this. V. related to the risk of "checkbox compliance."

I agree there should be mitigations, though not sure I am sold on the one floated here (~ratings + random assignment), though granted, it's just floated

Miles Brundage@Miles_Brundage

Cadence of audits - this is a bit vague. SB 315 is annual, Obernolte-Trahan is semi-annual (with the possibility of more frequent or continuous at CAISI Director's discretion) ... Anthropic just says "regularly." Presumably this means at least at the cadence of risk reports but 🤷‍♂️
