🚀 Our paper "Learning Multi-View Spatial Reasoning from Cross-View Relations (XVR)" has been accepted to #CVPR2026!
Current VLMs can reason from a single view surprisingly well, but they still struggle to connect information across multiple viewpoints.
To address this, we introduce XVR:
• A 100K-sample VQA dataset
• 3 categories, 8 tasks
• Designed specifically for cross-view spatial reasoning
Most excitingly, cross-view reasoning transfers to robot manipulation. Using an XVR-trained VLM as a VLA backbone improves RoboCasa manipulation success rates by 13 percentage points on average.
Project page: https://cross-view-relations.github.io/
Paper: https://arxiv.org/abs/2603.27967
🍿 More details below