Prime Intellect agents improve nanoGPT record to 2,930 steps · Digg
Tech · 46d ago
AI Judge changed title after evaluation, original title: "Prime Intellect runs Claude Opus 4.7 on nanoGPT benchmark"
- Prime Intellect ran roughly 10,000 autonomous experiments with Claude Code (Opus 4.7) and Codex (GPT 5.5) agents on the nanoGPT speedrun optimizer track. Over two weeks the agents consumed about 14,000 H200 GPU hours and set a record of 2,930 steps for the 124M-parameter model, beating the human baseline of 2,990 steps (fewer steps is better). The company released all run logs, scripts, configurations, and a report on GitHub.
Automating AI research is the next major step in AI
We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours
Opus now holds the record at 2930 steps vs the 2990 human baseline
3:43 PM · May 14, 2026 · 514.8K Views
Sentiment
Positive users celebrate Claude Opus and AI agents setting new nanoGPT records via autonomous optimization, while negative users criticize misleading charts and the models' weaknesses in code and math.
Positive: 71.5% · Negative: 28.5%
74 comments with sentiment.
we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2930, codex 2950, both beating the human baseline of 2990. we cover claude autonomy failures, codex high compute usage, and much more
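To put the quoted figures in proportion, the arithmetic can be sketched in a few lines of Python. This treats the rounded numbers from the post (~10k runs, ~14k H200 hours) as exact, which is an assumption, and the helper name `relative_reduction` is only illustrative.

```python
def relative_reduction(baseline_steps: int, new_steps: int) -> float:
    """Fractional reduction in training steps relative to the baseline."""
    return (baseline_steps - new_steps) / baseline_steps


# Step counts quoted in the post.
HUMAN_BASELINE = 2990
OPUS_STEPS = 2930
CODEX_STEPS = 2950

# Rounded figures from the post, treated as exact for illustration only.
APPROX_RUNS = 10_000
APPROX_H200_HOURS = 14_000

opus_gain = relative_reduction(HUMAN_BASELINE, OPUS_STEPS)
codex_gain = relative_reduction(HUMAN_BASELINE, CODEX_STEPS)
hours_per_run = APPROX_H200_HOURS / APPROX_RUNS

print(f"Opus:  {opus_gain:.2%} fewer steps than the human baseline")
print(f"Codex: {codex_gain:.2%} fewer steps than the human baseline")
print(f"Average compute: {hours_per_run:.1f} H200 hours per run")
```

On these numbers the Opus record is about 2.0% fewer steps than the human baseline, the Codex result about 1.3% fewer, and the average run costs roughly 1.4 H200 hours.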
We started automating AI research on nanogpt-speedruns & achieved new records
>for 2 weeks GPT 5.5 and Opus 4.7 iterated on novel optimizations
>10k runs & 14k H200 hours
>both agents beat the human baseline
>Opus now holds the record at 2930 steps
The agents fail in opposite ways. Claude Code keeps stopping and asking for direction even when the harness explicitly forbids it (~22h idle in v1). Codex never stops but gets stuck grinding the same hyperparameter surface for hours.
We think this is just a lower bound of what's possible today, and we're actively working at Prime Intellect on training models and building tools to automate AI research further.
Over the last 2 weeks, the agents kept setting new records in every session, especially after syncing with the latest human submissions. Opus now holds the record at 2930 steps; the human baseline is 2990.
The nanogpt speedrun by @kellerjordan0 is a community benchmark for training a 124M GPT efficiently. Track 3 fixes model, data, and architecture. Only the optimizer and related hyperparameters can change.
Goal: reach a target validation loss in as few steps as possible.
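The scoring rule described above can be sketched as a small function: given a run's evaluation log, report the first step at which validation loss reaches the target. This is a minimal illustration, not the benchmark's actual scoring code; the function name, the log format, the sample values, and the `TARGET` threshold are all assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def steps_to_target(
    evals: Iterable[Tuple[int, float]], target_loss: float
) -> Optional[int]:
    """Return the earliest evaluated step whose validation loss is at or
    below target_loss, or None if the run never reaches the target."""
    for step, val_loss in sorted(evals):
        if val_loss <= target_loss:
            return step
    return None


# Hypothetical evaluation log of (step, validation loss) pairs.
log = [(100, 3.90), (200, 3.50), (300, 3.27), (400, 3.20)]
TARGET = 3.28  # placeholder threshold, not taken from the source

print(steps_to_target(log, TARGET))  # prints 300
```

Under this metric, a lower number is better, and a run that never reaches the target has no score at all, which is why the sketch returns `None` rather than the final step.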
The agents are very good at optimizer search, hyperparameter sweeps, and stacking methods together. The frontier stacks remix Contra-Muon, MuonEq, NorMuon, SOAP, and per-role LR splits, almost all sourced from upstream PRs.
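As a rough illustration of what a "per-role LR split" means, the sketch below buckets parameters by their role in the network and gives each bucket its own learning rate. The role names, the name-matching rules, the parameter names, and every learning-rate value are hypothetical; the source does not say how the agents' stacks define roles.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical roles and learning rates, for illustration only.
ROLE_LRS: Dict[str, float] = {
    "embedding": 0.3,
    "hidden_matrix": 0.02,
    "head": 0.004,
    "scalar": 0.01,
}


def role_of(name: str, ndim: int) -> str:
    """Assign a parameter to a role from its name and number of dimensions."""
    if "embed" in name:
        return "embedding"
    if "lm_head" in name:
        return "head"
    if ndim >= 2:
        return "hidden_matrix"
    return "scalar"


def build_param_groups(shapes: Dict[str, Tuple[int, ...]]) -> List[dict]:
    """Group parameter names by role, attaching that role's learning rate."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for name, shape in shapes.items():
        buckets[role_of(name, len(shape))].append(name)
    return [
        {"role": role, "lr": ROLE_LRS[role], "params": names}
        for role, names in buckets.items()
    ]


# Hypothetical parameter names and shapes for a small GPT-style model.
example_shapes = {
    "embed.weight": (50304, 768),
    "blocks.0.attn.qkv.weight": (2304, 768),
    "blocks.0.mlp.fc.weight": (3072, 768),
    "blocks.0.norm.gain": (768,),
    "lm_head.weight": (50304, 768),
}

groups = build_param_groups(example_shapes)
for group in groups:
    print(group["role"], group["lr"], group["params"])
```

The list-of-dicts shape mirrors how many optimizer libraries accept parameter groups, so a sweep can tune each role's learning rate independently instead of searching a single global value.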
@scaling01 i agree, for transparency adding this important data point that claude stopped working a lot (which is bad) but when restarted it actually got access to new record faster than codex (which is good for claude progress ironically)
all the records are heavily based on work from previous contributors' PRs (we do explore novel ideas in a dedicated "novelty" track, but none of them ended up improving the record).
So it only made sense to let the agents write a little thank you to the community themselves