Can a video model learn correspondence from raw video, without track labels?
Our CVPR Highlight introduces Video-GMAE, which represents a video as 3D Gaussian splats moving over time and enables zero-shot point tracking. Visit our poster from 3:30 to 5:30 on Sunday!
More in thread 🧵
