OpenClaw scores 5.2 percent on ARC-AGI-3 benchmark
OpenClaw, an AI agent powered by Anthropic's Opus 4.7 model, scored 5.2 percent on the ARC-AGI-3 public demo set. The result, listed at $2.9K, is posted on the community leaderboard. OpenClaw used long-term memory and code execution for the tasks. On ka59 it completed the first two levels, then fell into a loop on the third and was halted after taking five times as many actions as a human.
New harness results for ARC v3 on the community leaderboard. We want to do more of these! There is a lot of harness innovation happening and the most general ideas will migrate to the model layer.
ARC-AGI-3 Community Leaderboard: OpenClaw, using Anthropic Opus 4.7, scores 5.2% ($2.9K) on the ARC-AGI-3 Public Demo Set. OpenClaw used long-term memory and code execution. Here OpenClaw is playing ka59: it solves the first 2 levels and then breaks down into a loop.
OpenClaw (via Opus 4.7) on ARC-AGI-3
It solves 2 levels of ka59 with no problem and clearly gets how to play the game. Then it gets caught in a loop and fails level 3.
We cut it off after 5x human actions.
Thank you to Bailey and @rob0the0nerd of Klaus for helping us get this set up.