VGGT-Omega had a really nice presentation today. tl;dr: scaling up leads to sweet benefits. This is known, but the VGGT architecture needs to be prepared for actual scale-up. The authors propose 2 axes of improvement: architecture and data #cvpr2026
On the architecture side:
- introduce camera registers and do x-attention on them between images
- reduce num. of heads in multi-task training
- replace high-res conv layer w/ MLP + PixelShuffle
Outcome: 70% training memory reduction -> “gpus don’t go boom” #cvpr2026
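For anyone wondering what "MLP + PixelShuffle" looks like in practice, here is a rough NumPy sketch. This is not the authors' code: the names `pixel_shuffle` and `mlp_pixelshuffle_head`, the two-layer ReLU MLP, and all shapes are assumptions for illustration. The idea, as I understand it, is that the MLP runs per token on the low-res grid and PixelShuffle only rearranges channels into pixels, so no conv has to run at full resolution.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r).

    Uses the common sub-pixel layout: out[c, h*r+i, w*r+j] = x[c*r*r + i*r + j, h, w].
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


def mlp_pixelshuffle_head(tokens, w1, w2, r):
    """Hypothetical dense head: per-token MLP on the low-res grid, then upsample.

    tokens: (H, W, D) feature grid
    w1:     (D, hidden) first MLP weight (assumed, biases omitted)
    w2:     (hidden, C*r*r) second MLP weight (assumed)
    Returns a (C, H*r, W*r) map.
    """
    hidden = np.maximum(tokens @ w1, 0.0)   # ReLU, applied per token
    out = hidden @ w2                       # (H, W, C*r*r)
    return pixel_shuffle(out.transpose(2, 0, 1), r)


rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 3, 8))     # 2x3 token grid, 8-dim features
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 3 * 2 * 2))   # 3 output channels, upscale factor 2
dense = mlp_pixelshuffle_head(tokens, w1, w2, r=2)
print(dense.shape)
```

All the learned compute here happens on the 2x3 grid; the upsampling step is a pure reshape, which is presumably where part of the memory saving comes from.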
On the data side, they scale to 15x more data than VGGT while paying extra attention to data quality #cvpr2026
This improves performance across the board. The authors also compile a list of the many diverse applications of VGGT so far #cvpr2026