Super cool and detailed blog post by @eliebakouch from @PrimeIntellect on running agents to improve on the nanogpt speedrun!
I find it interesting that they saw Claude Code stop earlier than Codex. In my personal experience the opposite is the case. Whenever I instruct the agent to only stop when I interrupt it, I get much longer running and more comprehensive work by Opus-4.7 compared to GPT 5.5. However, GPT 5.5 is more targeted and goes down fewer wrong rabbit holes in my experience. I guess as always, depends a lot on the prompts/harness.
They release all scratchpads where agents write their thoughts, the ~10k run logs, scripts, and configs: http://github.com/PrimeIntellect-ai/experiments-autonomous-speedrunning.