Greg Kamradt, President of the ARC Prize Foundation, adds a Codex harness to the ARC-AGI-3 agents repository to enable ablation tests across four tool configurations
Tests compare four setups: no tools, scratchpad only, code execution only, and scratchpad plus code.
@GregKamradt how is the scratchpad defined here? Something like mentioning in the prompt that it can write/read/delete a scratchpad.md file?
We're adding a Codex harness to our ARC-AGI-3 agents repo.

We're doing ablations to find which tool helps the most:
* Codex - No tools
* Codex - Scratchpad only (read/write)
* Codex - Code only (read/write/execute .py)
* Codex - Scratchpad + Code

My guess is the scratchpad will account for the majority of the perf increase. Code will help, but only on certain games that are logic-heavy (rather than exploration-heavy).

(Example scratchpad screenshot)
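The four configurations form a 2x2 grid over two toggles (scratchpad on/off, code on/off). That grid is what makes the guess above testable: it lets you estimate each tool's contribution separately, plus whether they interact. Below is a minimal sketch of that bookkeeping. The `ToolConfig` class, the config names, and the `effects` helper are hypothetical illustrations, not the ARC-AGI-3 agents repo's actual API, and no real scores are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolConfig:
    name: str
    scratchpad: bool  # read/write markdown notes
    code: bool        # read/write/execute .py files


# Hypothetical names for the four ablation arms (a full 2x2 factorial).
ABLATIONS = (
    ToolConfig("no_tools", scratchpad=False, code=False),
    ToolConfig("scratchpad_only", scratchpad=True, code=False),
    ToolConfig("code_only", scratchpad=False, code=True),
    ToolConfig("scratchpad_plus_code", scratchpad=True, code=True),
)


def effects(scores: dict[str, float]) -> dict[str, float]:
    """Main effects and interaction for the 2x2 grid, given one score per config name."""
    s = {(c.scratchpad, c.code): scores[c.name] for c in ABLATIONS}
    # Average gain from adding the scratchpad, with and without code.
    scratchpad = ((s[True, False] - s[False, False]) + (s[True, True] - s[False, True])) / 2
    # Average gain from adding code, with and without the scratchpad.
    code = ((s[False, True] - s[False, False]) + (s[True, True] - s[True, False])) / 2
    # Does code help more (or less) when the scratchpad is already present?
    interaction = (s[True, True] - s[True, False]) - (s[False, True] - s[False, False])
    return {"scratchpad": scratchpad, "code": code, "interaction": interaction}
```

If the scratchpad main effect dominates the code main effect, that supports the guess; computing the same effects per game (logic-heavy vs exploration-heavy) would test the second half of it.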
@GregKamradt ok perfect, I was going to say a scratchpad dir would probably be even better than just a .md file
btw we found in our automated speedrun experiment that how the scratchpad is used depends quite a lot on the model; would be curious if you find similar results on ARC-AGI 👀

@eliebakouch Ya, basically access to a working directory just for that game run, with the ability to read/write .md files. External memory, w/o the ability to write code.
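A minimal sketch of a scratchpad as described here: one working directory per game run, read/write access to markdown files only, and no way to run code. The `Scratchpad` class and its method names are hypothetical, not the harness's actual implementation; restricting to `.md` and blocking paths that escape the run directory are assumptions about how "scratchpad only" might be enforced.

```python
from pathlib import Path


class Scratchpad:
    """Hypothetical per-game-run scratchpad: read/write .md notes only, no execution."""

    def __init__(self, root: Path):
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Only markdown files are allowed, so the model can't write runnable code.
        if not name.endswith(".md"):
            raise ValueError(f"only .md files are allowed: {name!r}")
        path = (self.root / name).resolve()
        # Reject names that escape this run's directory, e.g. "../other.md".
        try:
            path.relative_to(self.root)
        except ValueError:
            raise ValueError(f"path escapes scratchpad: {name!r}") from None
        return path

    def write(self, name: str, text: str) -> None:
        path = self._path(name)
        path.parent.mkdir(parents=True, exist_ok=True)  # allows subfolders within the run dir
        path.write_text(text, encoding="utf-8")

    def read(self, name: str) -> str:
        return self._path(name).read_text(encoding="utf-8")

    def list_notes(self) -> list[str]:
        # Relative, POSIX-style names of every note written during this run.
        return sorted(p.relative_to(self.root).as_posix() for p in self.root.rglob("*.md"))
```

Because the root is a directory rather than a single `scratchpad.md`, the model can split notes across files (one per level or hypothesis, say), which is the "scratchpad dir" variant suggested in the thread.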
@eliebakouch Fascinating - will do