code mode!
Code is the right action interface for spatial reasoning agents.
New from NVIDIA Research: SpatialClaw, a training-free agent that uses code as its action interface for complex visual tasks.
Instead of calling a fixed set of pre-defined tools, the agent writes Python inside a persistent kernel, so it can compose perception modules, inspect intermediate results, and revise its strategy across steps. Perception outputs become ordinary variables it can reuse and combine with libraries like NumPy and SciPy.
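To make the persistent-kernel idea concrete, here is a minimal sketch, not SpatialClaw's actual implementation. `PersistentKernel` and `detect_objects` are hypothetical names invented for illustration, and the "perception module" is a stub that returns made-up 3D centroids. It only shows how a variable produced in one step can be reused in a later one.

```python
import numpy as np


def detect_objects(image):
    """Hypothetical perception module (stub): returns made-up 3D centroids in meters."""
    return {
        "chair": np.array([1.0, 0.0, 2.0]),
        "table": np.array([4.0, 0.0, 6.0]),
    }


class PersistentKernel:
    """Toy stand-in for a persistent kernel: one namespace shared across all steps."""

    def __init__(self, **tools):
        # Everything passed in here is visible to the code the agent writes.
        self.ns = dict(tools)

    def run(self, code):
        # Execute agent-written code; any variables it defines stay in self.ns.
        exec(code, self.ns)


kernel = PersistentKernel(np=np, detect_objects=detect_objects, image=None)

# Step 1: call a perception module and keep its output as an ordinary variable.
kernel.run("objects = detect_objects(image)")

# Step 2: a later step reuses that variable and combines it with NumPy.
kernel.run("distance = float(np.linalg.norm(objects['chair'] - objects['table']))")

print(kernel.ns["distance"])  # 5.0
```

Because the namespace outlives each step, the agent can inspect `objects` before deciding what to compute next, rather than committing to a fixed tool pipeline up front.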
With no benchmark-specific or model-specific tuning, SpatialClaw beats a recent prior agent by 11.2 points across 20 benchmarks and holds up consistently across six different model backbones.
You can check out SpatialClaw here: https://nvda.ws/4esHxr9




