In case you’re interested, the Frontier Model Forum (FMF) has the closest thing to public guidance documents on frontier safety evaluation standards: https://www.frontiermodelforum.org/publications/#technical-reports They exist! They’re not that detailed, because there’s been no forcing function to get stakeholders to compromise on exact operationalization, and because evaluation best practices shift fast enough that some flexibility is needed.
Josh, I think you might be operating under some bad information.
This seems to be a misunderstanding of why FLOP thresholds have been included in every attempt at safety legislation. You inherently have to ask “to what types of models do the standards apply?”, lest we run cyber evals on every academic lab’s 50M-param pretrain.
Also, it’s not that people didn’t propose standards. See, e.g., Transluce’s draft work. But to get broad acceptance of specific standards, you would need the labs to agree to them, which they strongly preferred to avoid so as not to tie their hands on regulatory constraints under conditions they couldn’t foresee. That’s why every lab safety framework is vague on exact threshold operationalization.
