This paragraph reads a bit defeatist. And "training runs are far easier to conceal than missile silos" seems overstated.
How tractable a technical problem is it to verify that training runs aren't happening? My view is that it could be done with a few hundred million dollars of R&D, by piloting and red-teaming an approach like this:
- Unplug (most) backend networking to isolate some group of GPUs/servers into 'inference units'
- Rewrite an inference stack to be more reproducible
- Use network taps on the frontend network to log input/output pairs, and recompute 0.1% of them to check correctness
- Mitigate side channels and do periodic memory wipes for completeness
- Add other miscellaneous physical security to make sure the whole thing isn't being tampered with
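To make the "recompute 0.1% of them" step concrete, here is a minimal sketch of what the audit loop might look like. Everything in it is an assumption for illustration: the names `is_sampled`, `audit`, and `detection_probability` are hypothetical, the keyed-hash sampling is just one way to keep the operator from predicting which pairs get checked, and the exact-match comparison only works if the inference stack really is reproducible, as the list above requires.

```python
import hashlib
import hmac

SAMPLE_RATE = 0.001  # the 0.1% recompute rate suggested above


def is_sampled(request_id: str, secret: bytes, rate: float = SAMPLE_RATE) -> bool:
    """Decide whether a logged pair is audited, using a keyed hash.

    Assumption: the verifier holds `secret`, so the operator cannot
    predict in advance which requests will be recomputed.
    """
    digest = hmac.new(secret, request_id.encode(), hashlib.sha256).digest()
    # Map the first 8 bytes of the digest to a number in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < rate


def audit(log, recompute, secret: bytes, rate: float = SAMPLE_RATE):
    """Recompute the sampled pairs and return the IDs that do not match.

    `log` is an iterable of (request_id, prompt, output) tuples from the
    network tap; `recompute` is the verifier's own reproducible inference
    function (hypothetical here).
    """
    mismatches = []
    for request_id, prompt, output in log:
        if is_sampled(request_id, secret, rate) and recompute(prompt) != output:
            mismatches.append(request_id)
    return mismatches


def detection_probability(n_bad: int, rate: float = SAMPLE_RATE) -> float:
    """Chance that at least one of `n_bad` non-matching pairs is sampled,
    assuming each pair is sampled independently with probability `rate`."""
    return 1.0 - (1.0 - rate) ** n_bad
```

Under these assumptions, a 0.1% sample catches at least one of 1,000 non-matching outputs about 63% of the time, and at least one of 10,000 more than 99.9% of the time. So a low sampling rate can still be a strong check against any workload that needs many illegitimate outputs.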
One other load-bearing part to mention, though, is knowing where the vast majority of the compute is (e.g., 99% of world AI compute) so you can apply this inference-only solution to all of it. This seems like a tractable intelligence problem, but it will depend on how concentrated compute is in large (i.e., >$1B) clusters.
Plausibly there are also less costly software-only or cryptographic methods that could do inference-only verification, and as a fallback, verifying that data centers are turned off (much more costly but very easy to verify) is also an option.
Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://www.anthropic.com/institute/recursive-self-improvement