The NSA is a *spy agency*, not an eval shop.
Running a model past a spy agency before wide release could easily undermine trust in and demand for US AI models in Europe and elsewhere.
We need a more durable approach to differential access that's civilian-led.
I'm glad this EO exists and am less concerned than Dean is about predeployment review mutating into a licensing regime.
However, I share his concern about transparency and confidentiality. I'd much rather lean into existing eval expertise at CAISI, which (as a standards org within NIST) is both transparent by design and a guard against the potential for mission creep within our opaque security apparatus.
The NSA et al. should still be involved (CAISI already has ways to interface with the IC, and could produce reports with a confidential annex), but it'd ease my mind if the core capacity were anchored in a civilian agency.
Confidential benchmarks are also a bad precedent for the reasons Dean gives. They are also not super necessary. Labs routinely publish uplift results on bio and cyber risk without disclosing what's in the benchmark itself. The NSA should just develop its own confidential benchmark, NSAbench, and create a portal for anyone to submit a model and run it against the agency's private test set.