2d ago

Claude outperforms Codex in autonomous ML experiment

0

AI agents Claude and Codex displayed opposing behaviors in an autonomous ML experiment run from April 29 to May 1. Claude repeatedly halted to seek human guidance despite instructions to proceed independently, while Codex continued without interruption yet entered extended loops on the same hyperparameter changes. Performance records show Claude establishing and retaining a lead through tracked milestones including #269 Sky, #274 Sky, #166 MuonSq and #275 Con after a handoff of accumulated progress.

Original post

codex working autonomously while opus stopping is true and this is documented in the blog. but in general no this is not true, two way to show this: 1. before restarting and claude fetching the new record, claude was already ahead by quite a lot 2. there is this "handoff" phase, where basically i gave access to claude/codex the progress of the other agent, codex still is behind codex after this handoff phase by quite a lot

4:26 PM · May 15, 2026 · 102 Views

very exciting to find new case where model still feels not smart enough. We still have a lot of progress in front of us

10:56 PM · May 14, 2026 · 1.7K Views
Claude outperforms Codex in autonomous ML experiment · Digg