NVIDIA Research unveils SpatialClaw, a training-free visual reasoning agent that writes Python code and beats a recent prior agent by 11.2 points across 20 benchmarks

The agent writes Python inside a persistent kernel, where it can combine perception outputs with libraries like NumPy and SciPy.

Original post

sunil pai@threepointone

code mode!

NVIDIA AI@NVIDIAAI

Code is the right action interface for spatial reasoning agents.

New from NVIDIA Research: SpatialClaw, a training-free agent that uses code as its action interface for complex visual tasks.

Instead of calling a fixed set of pre-defined tools, the agent writes Python inside a persistent kernel, so it can compose perception modules, inspect intermediate results, and revise its strategy across steps. Perception outputs become ordinary variables it can reuse and combine with libraries like NumPy and SciPy.

With no benchmark-specific or model-specific tuning, it beats a recent prior agent by 11.2 points across 20 benchmarks and holds up consistently across six different model backbones.

You can check out SpatialClaw here: https://nvda.ws/4esHxr9

2:34 PM · Jun 16, 2026 · 7.3K Views
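
The persistent-kernel idea described in the post can be illustrated with a minimal sketch. This is not SpatialClaw's implementation: `PersistentKernel`, `detect_objects`, and the hard-coded coordinates are hypothetical stand-ins, and the standard-library `math` module is used in place of NumPy or SciPy so the example stays self-contained. It only shows how a result produced in one step remains an ordinary variable that later steps can reuse.

```python
import math


class PersistentKernel:
    """Toy stand-in for a persistent kernel: every step shares one namespace."""

    def __init__(self):
        self.namespace = {"math": math}

    def run(self, source):
        # Execute a snippet of agent-written code against the shared namespace,
        # so variables defined in earlier steps remain available.
        exec(source, self.namespace)
        return self.namespace


def detect_objects(image_id):
    """Hypothetical perception module: returns 3D object centers (x, y, z).

    The values are hard-coded for illustration; a real module would run a model.
    """
    return {"chair": (0.0, 0.0, 2.0), "table": (3.0, 0.0, 6.0)}


kernel = PersistentKernel()
kernel.namespace["detect_objects"] = detect_objects

# Step 1: call a perception module; its output becomes an ordinary variable.
kernel.run("objects = detect_objects('frame_0')")

# Step 2: reuse that variable for a geometric computation.
kernel.run("distance = math.dist(objects['chair'], objects['table'])")

# Step 3: build on both earlier results to answer a spatial question
# (which object is farther along the z axis).
kernel.run(
    "answer = 'table' if objects['table'][2] > objects['chair'][2] else 'chair'"
)

print(kernel.namespace["distance"], kernel.namespace["answer"])
```

Because each step only adds to the shared namespace, the intermediate values (`objects`, `distance`) can be inspected between steps, which is the property the post highlights over calling a fixed set of pre-defined tools.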

Sentiment

Users praise NVIDIA's training-free spatial reasoning agent for saving time and resources by using code interfaces instead of traditional model training.

Positive: 100.0%, Negative: 0.0% (2 comments with sentiment)

Related links

https://nvda.ws/4esHxr9 (NVDA.WS)

Posts from X

DailyPapers@HuggingPapers

Paper page: https://paperswithcode.co/paper/2606.13673

Project page: https://spatialclaw.github.io

Code: https://github.com/NVlabs/SpatialClaw

DailyPapers@HuggingPapers

SpatialClaw

NVIDIA drops a training-free spatial reasoning agent that uses code as its action interface. A VLM writes Python in a persistent kernel, composes perception tools, inspects results, and revises its plan—no fine-tuning needed. +11.2 points over prior agents on 20 benchmarks.

DC｜use.fo@vibecoder_dc

@HuggingPapers Training-free spatial reasoning via code. It's basically giving the model a calculator and a Python kernel and saying "figure it out." Better than trying to teach it physics by reading a textbook.

Saeed Anwar@saen_dev

@HuggingPapers The code-as-action interface is smart because it gives you a natural inspection layer. The failure mode to watch is when the VLM generates syntactically valid Python that is semantically wrong about spatial relationships and the interpreter just runs it cleanly.

Kekko D’Amato@kekkodamato_

@HuggingPapers Training-free models are definitely the future; they save time and resources.

XIIN@ai_next_level

@HuggingPapers I am looking for some talented people. If you are multi-talented, only then. We are going to build a strong working team. (tg) @deepsourcepy