Prime Intellect agents improve nanoGPT record to 2,930 steps · Digg

Posts from X

Lisan al Gaib@scaling01

brutal Claude mog

Prime Intellect@PrimeIntellect

Automating AI research is the next major step in AI

We let Claude Code (Opus 4.7) and Codex (GPT 5.5) run autonomously on the nanoGPT speedrun optimizer track using our idle compute. ~10k runs, ~14k H200 hours

Opus now holds the record at 2930 steps vs the 2990 human baseline

elie@eliebakouch

we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2930, codex 2950, both beating the human baseline of 2990. we cover claude autonomy failures, codex high compute usage, and much more

https://www.primeintellect.ai/auto-nanogpt

Andrey Kurenkov@andrey_kurenkov

Can we all agree that LLM-powered hyper param search to optimize nanoGPT better is not really AI research?

Lisan al Gaib@scaling01

now imagine how brutal the mog is with Mythos

this is a slight update against OpenAI pulling ahead this year through faster model cycle times

will brown@willccbb

if you weren’t aware, it’s prime intellect season

Vincent Weisser@vincentweisser

We started automating AI research on nanogpt-speedruns & achieved new records

>for 2 weeks, GPT 5.5 and Opus 4.7 iterated on novel optimizations
>10k runs & 14k H200 hours
>both agents beat the human baseline
>Opus now holds the record at 2930 steps

Awesome work @eliebakouch!

Prime Intellect@PrimeIntellect

We release everything: scratchpads, run logs, scripts, configs.

http://github.com/PrimeIntellect-ai/experiments-autonomous-speedrunning

Prime Intellect@PrimeIntellect

The agents fail in opposite ways. Claude Code keeps stopping and asking for direction even when the harness explicitly forbids it (~22h idle in v1). Codex never stops but gets stuck grinding the same hyperparameter surface for hours.

Prime Intellect@PrimeIntellect

They struggle with novelty, though. We ran a phase where every idea had to pass a novelty check; neither agent managed to improve the baseline.

We release all the generated ideas here:

Codex: https://github.com/PrimeIntellect-ai/experiments-autonomous-speedrunning/tree/main/novelty/codex/scratchpad/ideas

Claude: https://github.com/PrimeIntellect-ai/experiments-autonomous-speedrunning/tree/main/novelty/claude-code/scratchpad/ideas

Prime Intellect@PrimeIntellect

We think this is just a lower bound of what's possible today, and we're actively working at Prime Intellect on training models and building tools to automate AI research further.

https://www.primeintellect.ai/auto-nanogpt

Prime Intellect@PrimeIntellect

Over the last 2 weeks, the agents kept setting new records in every session, especially after syncing with the latest human submissions. Opus now holds the record at 2930 steps, human baseline is 2990.

The nanogpt speedrun by @kellerjordan0 is a community benchmark for training a 124M GPT efficiently. Track 3 fixes model, data, and architecture. Only the optimizer and related hyperparameters can change.

Goal: reach a target validation loss in as few steps as possible.
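
As a rough illustration of the Track 3 scoring rule described above (reach a target validation loss in as few steps as possible), here is a minimal Python sketch. It assumes validation loss is logged as (step, loss) pairs and reports the first logged step at or below the target. The helper names, the loss curve, and the target value are hypothetical, and the speedrun's actual verification procedure may differ.

```python
from typing import Optional, Sequence, Tuple


def steps_to_target(
    val_curve: Sequence[Tuple[int, float]], target_loss: float
) -> Optional[int]:
    """Return the first logged step whose validation loss is at or below
    target_loss, or None if the run never reaches it.

    Hypothetical helper: assumes val_curve holds (step, val_loss) pairs.
    """
    for step, loss in sorted(val_curve):  # check in increasing step order
        if loss <= target_loss:
            return step
    return None


def relative_step_reduction(baseline_steps: int, record_steps: int) -> float:
    """Fraction of steps saved by a record relative to a baseline."""
    return (baseline_steps - record_steps) / baseline_steps


# Hypothetical curve and target, for illustration only.
curve = [(500, 3.50), (1000, 3.10), (1500, 2.95), (2000, 2.90)]
print(steps_to_target(curve, target_loss=3.0))  # 1500

# Figures from the posts: human baseline 2990 steps, Opus record 2930 steps.
print(f"{relative_step_reduction(2990, 2930):.1%}")  # 2.0%
```

With the posted figures, 2930 versus 2990 steps is 60 fewer steps, about a 2.0% reduction.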

Jiaxin Wen@jiaxinwen22

The hill-climbing efficiency gap between Opus and Codex is much larger than I was expecting!

Prime Intellect@PrimeIntellect

The agents are very good at optimizer search, hyperparameter sweeps, and stacking methods together. The frontier stacks remix Contra-Muon, MuonEq, NorMuon, SOAP, and per-role LR splits, almost all sourced from upstream PRs.

elie@eliebakouch

@scaling01 i agree. for transparency, adding this important data point: claude stopped working a lot (which is bad), but when restarted it actually reached new records faster than codex (which, ironically, is good for claude's progress)

Alexander Doria@Dorialexander

not surprised. tapping the sign again.

Johannes Hagemann@johannes_hage

.@eliebakouch let the agents go wild on our idle compute to compete in the nanoGPT speedrun optimizer track!

Dom@dominik_scherm

team cooked

elie@eliebakouch

all the records are heavily based on work from previous contributors' PRs (we do explore novel ideas in a dedicated "novelty" track, but none of them ended up improving the record).

So it only made sense to let the agents write a little thank you to the community themselves

https://github.com/KellerJordan/modded-nanogpt/pull/300
