/Tech · 21h ago

Video-GMAE Enables Zero-Shot Point Tracking From Raw Video

219695.3K

Original post unavailable.


Cluster Engagement

Posts from X

Most Activity

Tanish Baranwal @ CVPR (@TanishBaranwal)

(7/7)

Models and code are open-source!

Project: https://videogmae.org/ Code: https://github.com/tekotan/video-gmae Paper: https://arxiv.org/abs/2512.22489

Big thanks to my co-authors @Cinnabar233, @jathushan, @JitendraMalikCV, @berkeley_ai #CVPR2026

21h · 77

Tanish Baranwal @ CVPR (@TanishBaranwal)

(6/7)

There are still clear limitations: static-camera pretraining, a 256-Gaussian budget, and difficulty with fine details under large motion. Fixing these might lead to video-SSL objectives with other emergent capabilities.

21h · 47

Tanish Baranwal @ CVPR (@TanishBaranwal)

(3/7)

Video-GMAE makes the decoder represent a clip as Gaussians that persist over time.

Frame 1: predict 256 3D Gaussian primitives. Later frames: predict residual motion and color deltas for the same primitives. Render everything differentiably and train from raw video.

21h · 48
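Here is a minimal NumPy sketch of the persistent-Gaussian parameterization in (3/7). It assumes that each later frame's position and color deltas are residuals relative to the *previous* frame, accumulated with a cumulative sum. The thread does not say whether residuals are taken against the previous frame or against frame 1. The function name `unroll_gaussians` is hypothetical, and the differentiable renderer is left out. The key property is that row `g` of the output is the same primitive in every frame, which is what gives each Gaussian a persistent identity. The thread's setup uses 256 rows.

```python
import numpy as np


def unroll_gaussians(means0, colors0, delta_means, delta_colors):
    """Expand frame-1 Gaussians plus per-frame residuals into per-frame states.

    means0, colors0:           (G, 3) frame-1 centers and RGB colors.
    delta_means, delta_colors: (T-1, G, 3) residuals for frames 2..T,
                               ASSUMED here to be relative to the previous frame.
    Returns (T, G, 3) means and colors; index g refers to the same
    primitive in every frame.
    """
    # Cumulative sums turn frame-to-frame residuals into offsets from frame 1.
    offsets_m = np.cumsum(delta_means, axis=0)
    offsets_c = np.cumsum(delta_colors, axis=0)
    means = np.concatenate([means0[None], means0[None] + offsets_m], axis=0)
    colors = np.concatenate([colors0[None], colors0[None] + offsets_c], axis=0)
    return means, colors
```

Because identity is fixed by the index, any downstream readout (such as the tracking in (4/7)) can compare `means[t, g]` with `means[t + 1, g]` directly, without a matching step.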

Tanish Baranwal @ CVPR (@TanishBaranwal)

(2/7)

Most video MAEs predict masked patch tokens. They can reconstruct pixels without really preserving object identity across frames.

Video-GMAE makes correspondence part of the pretraining problem itself.

21h · 48

Tanish Baranwal @ CVPR (@TanishBaranwal)

(4/7)

Since each Gaussian keeps its identity, we can project its 3D motion into the image plane, splat displacements into a flow field, and follow the flow to track any query point.

No tracks, flow, masks, or boxes in pretraining.

21h · 39
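The tracking readout in (4/7) can be sketched as below. Every detail here is an illustrative assumption, not the released code. The sketch uses a static pinhole camera with hypothetical intrinsics `f`, `cx`, `cy`. Its splat weights come from a fixed isotropic 2D Gaussian of width `sigma` pixels, standing in for each primitive's actual rendered footprint and opacity. The query point is advanced one frame at a time by bilinearly sampling the splatted flow. The input `means` is a `(T, G, 3)` array of camera-space Gaussian centers with a consistent index per primitive.

```python
import numpy as np


def project(points, f, cx, cy):
    """Pinhole projection of (G, 3) camera-space points (z > 0) to (G, 2) pixels (u, v)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=-1)


def splat_flow(uv_t, uv_next, height, width, sigma=2.0, eps=1e-8):
    """Splat per-Gaussian 2D displacements into a dense (H, W, 2) flow field.

    Each pixel receives a weighted average of displacements, weighted by an
    isotropic 2D Gaussian (std `sigma` px) centered on each primitive's
    projected position at time t. Pixels far from every primitive get ~zero flow.
    """
    disp = uv_next - uv_t                                     # (G, 2)
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys], axis=-1).astype(float)           # (H, W, 2) as (u, v)
    d2 = ((pix[:, :, None, :] - uv_t[None, None]) ** 2).sum(-1)  # (H, W, G)
    w = np.exp(-0.5 * d2 / sigma**2)
    return (w[..., None] * disp[None, None]).sum(2) / (w.sum(2)[..., None] + eps)


def sample_bilinear(flow, u, v):
    """Bilinearly sample an (H, W, 2) flow field at continuous pixel (u, v)."""
    h, w, _ = flow.shape
    u = float(np.clip(u, 0, w - 1))
    v = float(np.clip(v, 0, h - 1))
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, w - 1), min(v0 + 1, h - 1)
    a, b = u - u0, v - v0
    return ((1 - a) * (1 - b) * flow[v0, u0] + a * (1 - b) * flow[v0, u1]
            + (1 - a) * b * flow[v1, u0] + a * b * flow[v1, u1])


def track_point(query_uv, means, f, cx, cy, height, width, sigma=2.0):
    """Follow splatted flow from frame 0 through all T frames; returns (T, 2) positions."""
    track = [np.asarray(query_uv, dtype=float)]
    for t in range(means.shape[0] - 1):
        # Same index g in both frames, so displacement needs no matching step.
        uv_t = project(means[t], f, cx, cy)
        uv_next = project(means[t + 1], f, cx, cy)
        flow = splat_flow(uv_t, uv_next, height, width, sigma)
        track.append(track[-1] + sample_bilinear(flow, track[-1][0], track[-1][1]))
    return np.stack(track)
```

As a sanity check, if all primitives translate rigidly, every splatted displacement is identical. The flow is then uniform wherever primitives cover the image, and the query point moves by exactly that displacement each frame.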

Tanish Baranwal @ CVPR (@TanishBaranwal)

(5/7)

Zero-shot, Video-GMAE is competitive with, and in most cases outperforms, the best self-supervised trackers:

21h · 25