SpatialUncertain Framework Evaluates VLMs On Spatial Uncertainty Detection

Original post · Mohit Bansal #245
Yue Zhang (@zhan1624)

🚨 Excited to share SpatialUncertain — a controlled framework for evaluating whether VLMs know when not to answer spatial questions (and why).

➡️ Spatial reasoning is not just about finding the right answer—it is about knowing whether the available evidence supports an answer at all.

Visual observations can be incomplete or even misleading.
📦 Objects may be hidden by occlusion.
📐 Perspective may create misleading visual cues.

Yet today's VLMs are usually evaluated as if every question has a reliable answer. We introduce SpatialUncertain, a controlled framework for evaluating:
🔍 Can VLMs recognize when visual evidence is insufficient or unreliable?
🧭 Can they identify what additional viewpoints are needed before answering?

Thread🧵👇

8:53 AM · Jun 1, 2026 · 3.1K Views
Posts from X
Views 729 · Bookmarks 2 · Likes 8 · Retweets 6
Zun Wang (@ZunWang919)

Seeing ≠ knowing. 👀

Super fun project led by Yue — we built SpatialUncertain to test whether VLMs realize when a viewpoint is occluded or misleading, making a question unanswerable. Spoiler: they mostly don’t, and they don’t know why (i.e., they can’t pick a better view) either.

Details below 👇

Quoting Yue Zhang (@zhan1624): the SpatialUncertain announcement in the original post above.
