📢 OneCanvas: 3D Scene Understanding via Panoramic Reprojection
We extract features from video frames and reproject them into a single occlusion-free view of the whole scene, which a 2D VLM reads just like a normal image. We can center this view on any viewpoint, including an agent's own pose for situated reasoning.
The same projection lets us generate spatial training tasks with no human annotation. Each task is solvable only by reasoning over the 3D positions of real object features placed on an otherwise empty canvas.
The result is a stock 2D VLM that reasons in 3D, setting a new state of the art across spatial benchmarks with far less compute.
🌐 https://baranowskibrt.github.io/onecanvas/ ▶️ https://youtu.be/NIaHLB9gA7s
Great work by @baranowskibrt & @davech2y