This Meta + Stanford + Illinois survey paper argues that AI agents work better when code becomes their main working layer.
The problem is that an LLM by itself is mostly a text predictor, so long tasks can lose state, hide mistakes, and turn plans into actions in fragile ways.
The real advance is not “AI writes code,” but “AI uses code as the environment it thinks inside.”
The authors call the surrounding system an agent harness, meaning the tools, memory, sandboxes, checks, and feedback loops that turn a model into an agent.
Their core idea is that code should sit at the center of that harness, because code can be run, inspected, checked, saved, edited, and shared.
Tests become sensors.
Repositories become memory.
Logs become history.
Sandboxes become boundaries.
A generated script is no longer merely an answer; it is a handle the system can run, check, revise, share, and roll back.
The main finding is a pattern across many fields: code helps agents reason through executable steps, act through tool calls or control programs, and model environments through tests, traces, logs, repositories, and simulators.
----
Paper Link – arxiv. org/abs/2605.18747
Paper Title: "Code as Agent Harness"
