5h ago

Researchers introduce ESI-Bench, a benchmark with 3,081 instances that tests AI agents on embodied spatial intelligence through active exploration and interaction in 3D environments

It spans 10 task categories on the BEHAVIOR-1K simulator.

Original post

Excited to share ESI-BENCH, a benchmark for Embodied Spatial Intelligence! Most spatial reasoning benchmarks assume an oracle observer: the agent is given the right image, view, or 3D scene. But in the real world, the observer is also an actor. To understand space, agents must decide where to look, how to move, and when to interact, to reveal what is hidden: occlusions, containment, contact, dynamics, and functionality. In many cases, the hard part is not perception itself, but choosing the right action to make informative perception possible. ESI-BENCH tests this perception-action loop. Agents receive an egocentric observation and a spatial question, then must actively gather evidence through perception, locomotion, and manipulation before answering. The benchmark spans 10 task categories, 29 subcategories, and 3,081 instances, built in BEHAVIOR-1K across realistic interactive scenes. 🌍 Webpage: https://esi-bench.github.io 💻 Code & data: https://github.com/ESI-Bench/ESI-Bench Thanks to collaborators: Jiageng, Han, @ManlingLi_ , Leonidas Guibas, @drfeifei , @jiajunwu_cs , @YejinChoinka

4:17 PM · May 19, 2026
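The perception-action loop described in the post (observe, act to reveal what is hidden, then answer) can be illustrated with a toy sketch. This is not the ESI-Bench or BEHAVIOR-1K API: `ToyScene`, `explore_then_answer`, and the `("open", name)` action are all hypothetical names invented for illustration, and the "scene" is just a set of containers with one hidden object.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScene:
    """Hypothetical stand-in for an interactive scene (not the real simulator):
    a single object is hidden inside one of several closed containers."""
    containers: tuple
    hidden_in: str
    opened: set = field(default_factory=set)

    def observe(self):
        # Egocentric observation: only the contents of opened containers are visible.
        return {c: (c == self.hidden_in) for c in self.opened}

    def step(self, action):
        # The only action in this toy is ("open", container_name), i.e. a manipulation.
        kind, target = action
        if kind == "open" and target in self.containers:
            self.opened.add(target)
        return self.observe()


def explore_then_answer(scene, max_steps=10):
    """Act until the question 'which container holds the object?' can be answered."""
    obs = scene.observe()
    steps = 0
    for name in scene.containers:
        # Stop exploring once the evidence is sufficient or the budget runs out.
        if any(obs.values()) or steps >= max_steps:
            break
        obs = scene.step(("open", name))
        steps += 1
    answer = next((c for c, has_object in obs.items() if has_object), None)
    return answer, steps


scene = ToyScene(containers=("drawer", "fridge", "cabinet"), hidden_in="fridge")
answer, steps = explore_then_answer(scene)
print(answer, steps)
```

The point of the sketch is that the first observation contains no answer at all; the agent only succeeds because it chooses actions that make the relevant evidence visible, which is the difficulty the post highlights.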

Embodied Spatial Intelligence, aiming for the perception-action loop 👇

Led by the amazing @yining_hong

Yining Hong @yining_hong


11:17 PM · May 19, 2026 · 10.7K Views
1:41 AM · May 20, 2026 · 1.4K Views