Automated research: from mathematics to AI research
We transfer the ScaleAutoResearch pipeline, which improved a 32-year-old Ramsey number bound, to the NanoGPT Speedrun optimizer track, using Claude Code and Codex on only 1–2 A40 nodes. We run ~300 experiments in ~5k A40 hours, and then:
⭕ Results: we improve the (non-interpolation) SOTA from 2875 to 2755 steps.
Changes:
+ add: non-gain aux β₂ = 0.997; SOAP for all hidden weights with freq=1; LR-horizon + momentum tuning
- remove: Circuit-/Contra-/Soft-Muon, Aurora, NorMuon 2nd-moment, V-SOAP-blend, attn denom-floor...
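A quick back-of-envelope on the headline numbers (the ~300 experiments and ~5k A40 hours are approximate, so the per-run figure is only a rough average):

```python
# Back-of-envelope arithmetic using only the numbers stated in this thread.
# Experiment count and GPU hours are approximate (~), so the per-run cost is a rough average.
old_sota_steps = 2875
new_sota_steps = 2755
steps_saved = old_sota_steps - new_sota_steps          # 120 steps
relative_gain = steps_saved / old_sota_steps           # ~0.0417, i.e. ~4.2% fewer steps

approx_experiments = 300
approx_a40_hours = 5000
hours_per_experiment = approx_a40_hours / approx_experiments  # ~16.7 A40-hours per run on average

print(f"{steps_saved} steps saved ({relative_gain:.1%} fewer)")
print(f"~{hours_per_experiment:.1f} A40-hours per experiment")
```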
The experiments are clearly compute-bound, and more resources could well yield further gains!
[1/n]


