
New Transformer Uses Frame-Wise Self-Attention to Replace Pairwise SfM

Original post

Is cross-attention the bottleneck? Self-attention FTW. Frame-wise attention as a replacement for the index embedding is a fairly new perspective to me.

Pairwise is the bottleneck for deep learning.
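As a rough illustration of the frame-wise attention idea mentioned above, here is a minimal NumPy sketch. It is not the model's actual implementation: the single head, the identity query/key/value projections, and the names `self_attention`, `frame_wise_attention`, and `global_attention` are all assumptions made for brevity. It only shows that restricting attention to tokens of the same frame is equivalent to global self-attention with a block-diagonal mask, so frame membership is encoded by the attention pattern rather than by an explicit index embedding.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, mask=None):
    """Single-head self-attention with identity projections (an assumption).

    x: (T, D) tokens. mask: optional (T, T) boolean array, True = may attend.
    """
    scores = x @ x.T / np.sqrt(x.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ x


def frame_wise_attention(tokens):
    """tokens: (F, N, D). Each frame attends only to its own N tokens."""
    return np.stack([self_attention(frame) for frame in tokens])


def global_attention(tokens):
    """tokens: (F, N, D). All F*N tokens attend to each other."""
    f, n, d = tokens.shape
    return self_attention(tokens.reshape(f * n, d)).reshape(f, n, d)


# Hypothetical sizes: 3 frames, 4 tokens per frame, 8 channels.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((3, 4, 8))

# Block-diagonal mask: token i may attend to token j only within one frame.
frame_ids = np.repeat(np.arange(3), 4)
same_frame = frame_ids[:, None] == frame_ids[None, :]

per_frame = frame_wise_attention(tokens)
masked_global = self_attention(tokens.reshape(12, 8), same_frame).reshape(3, 4, 8)
```

In this sketch `per_frame` and `masked_global` agree up to floating-point error, while `global_attention(tokens)` mixes information across frames.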

2:15 PM · Jun 4, 2026 · 417 Views
Posts from X

VGGT + COLMAP + SP + LG -> reconstruction from internet videos, then teacher-student training (18M videos).

"How we did scaling":
1. Register (aka scene) tokens.
2. Remove all dense heads except one.
3. Replace the last DPT layer with MLP + PixelShuffle.

-> 70% less training memory -> now we can scale the model.
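To make the MLP + PixelShuffle step concrete, here is a minimal NumPy sketch of such an upsampling head. It is an illustration under stated assumptions, not the model's code: the two-layer MLP, the ReLU activation, the weight shapes, and the names `pixel_shuffle` and `mlp_pixelshuffle_head` are hypothetical. The idea is that each patch token is projected to `C * r * r` values, and the pixel shuffle rearranges those values into an `r x r` block of output pixels, so no convolutional upsampling stage is needed at the end.

```python
import numpy as np


def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r).

    Output pixel [c, h*r + i, w*r + j] is taken from input channel
    c*r*r + i*r + j at location (h, w), the usual PixelShuffle layout.
    """
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)    # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)  # reorder to (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


def mlp_pixelshuffle_head(tokens, grid_hw, w1, w2, r):
    """Map patch tokens (h*w, D) to a dense map (C, h*r, w*r).

    w1: (D, hidden) and w2: (hidden, C*r*r) are hypothetical MLP weights.
    """
    h, w = grid_hw
    hidden = np.maximum(tokens @ w1, 0.0)  # ReLU is an assumption
    out = hidden @ w2                      # (h*w, C*r*r)
    out = out.T.reshape(-1, h, w)          # (C*r*r, h, w)
    return pixel_shuffle(out, r)


# Hypothetical sizes: a 2x3 patch grid, 8-dim tokens, 3 output channels, r = 2.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((6, 8))
w1 = rng.standard_normal((8, 16))
w2 = rng.standard_normal((16, 3 * 2 * 2))
dense = mlp_pixelshuffle_head(tokens, (2, 3), w1, w2, r=2)
```

Here `dense` has shape `(3, 4, 6)`: every patch token expands into a 2x2 block of pixels per output channel.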

7h · Views 227 · Likes 0 · Bookmarks 2
Replies: 1

VGGT was useful for many areas

Is cross-attention the bottleneck? Self-attention FTW. Frame-wise attention as a replacement for the index embedding is a fairly new perspective to me.

7h · Views 136 · Likes 0 · Bookmarks 0