Chan Hee Song finds that vision-language models (VLMs) rely on 2D shortcuts, such as vertical position, instead of understanding true 3D spatial relationships.
The team introduced SpatialTunnel to test true spatial reasoning.
Users praised the study, which asks whether VLMs understand 3D space or exploit shortcuts. They showed support by planning to test related models and by commending the researchers and institutions involved.
3 comments with sentiment.