1d ago

Chan Hee Song and Yu Su find weaker vision-language models rely on visual shortcuts instead of true 3D depth representations

The researchers developed SpatialTunnel to evaluate spatial VQA.

Sentiment: 100% positive, 0% negative (4 comments)

Users appreciate the study probing whether VLMs truly grasp 3D space or exploit image shortcuts, citing its importance for real-world physical deployments and showing interest in trying the benchmarks.