New Transformer Uses Frame-Wise Self-Attention to Replace Pairwise SfM

Original post

Dmytro Mishkin 🇺🇦@ducha_aiki

Cross attention is bottleneck? Self-attention FTW. Frame-wise Attention as replacement for the index embedding -- kind of new perspective to me.

Dmytro Mishkin 🇺🇦@ducha_aiki

Pairwise is the bottleneck for deep learning.

2:15 PM · Jun 4, 2026 · 417 Views
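
To make the frame-wise vs. all-tokens distinction in the post concrete, here is a minimal NumPy sketch. It is not the VGGT or VGGT-Omega implementation: the function names are hypothetical, and the attention omits learned query/key/value projections and multiple heads (an assumption made to keep it short). It only shows that frame-wise self-attention mixes tokens inside one frame, while attention over the flattened sequence lets every frame see every other frame.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head self-attention WITHOUT learned projections (illustration only).
    # tokens: (..., n, c); attention runs over the second-to-last axis.
    c = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(c)
    return softmax(scores, axis=-1) @ tokens

def frame_wise_attention(tokens):
    # tokens: (frames, patches, channels).
    # The frame axis acts as a batch axis, so tokens never leave their frame.
    return self_attention(tokens)

def global_attention(tokens):
    # Flatten all frames into one sequence so every token attends to every other.
    s, n, c = tokens.shape
    out = self_attention(tokens.reshape(s * n, c))
    return out.reshape(s, n, c)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 4, 8))   # 3 frames, 4 patch tokens each, 8 channels
    print(frame_wise_attention(x).shape, global_attention(x).shape)
```

Changing one frame's tokens leaves the frame-wise output of the other frames untouched, but changes the global output for all frames.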

Posts from X

Dmytro Mishkin 🇺🇦@ducha_aiki

VGGT + Colmap + SP + LG -> reconstruction from internet videos. Then go to the teacher-student training (18M videos)

Dmytro Mishkin 🇺🇦@ducha_aiki

"How we did scaling" 1. Register aka scene tokens 2. Remove all dense heads except one 3. Last DPT layer -> replace with MLP+PixelShuffle. -> 70% less training memory -> now we can scale the model
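
Step 3 above (an MLP followed by PixelShuffle in place of the last DPT layer) can be sketched roughly as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the ReLU activation, the layer sizes, and the names `pixel_shuffle` and `mlp_pixel_shuffle_head` are all hypothetical. `pixel_shuffle` follows the usual convention of rearranging a `(c*r*r, h, w)` tensor into `(c, h*r, w*r)`. The sketch shows only the shape transformation and does not reproduce the 70% memory saving quoted in the post.

```python
import numpy as np

def pixel_shuffle(x, r):
    # Rearrange (c*r*r, h, w) -> (c, h*r, w*r):
    # out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)        # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)      # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

def mlp_pixel_shuffle_head(tokens, w1, b1, w2, b2, r):
    # tokens: (h, w, d) grid of patch features.
    # A per-token two-layer MLP predicts c*r*r values for each patch,
    # which pixel_shuffle then unfolds into an r-times larger dense map.
    hidden = np.maximum(tokens @ w1 + b1, 0.0)   # ReLU (assumed activation)
    out = hidden @ w2 + b2                       # (h, w, c*r*r)
    return pixel_shuffle(out.transpose(2, 0, 1), r)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w, d, hidden_dim, c, r = 2, 3, 16, 32, 1, 4   # toy sizes, assumed
    tokens = rng.normal(size=(h, w, d))
    w1 = rng.normal(size=(d, hidden_dim))
    b1 = np.zeros(hidden_dim)
    w2 = rng.normal(size=(hidden_dim, c * r * r))
    b2 = np.zeros(c * r * r)
    dense = mlp_pixel_shuffle_head(tokens, w1, b1, w2, b2, r)
    print(dense.shape)
```

The upsampling here is a pure reshape of per-patch predictions, so no convolution runs at the full output resolution.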

Dmytro Mishkin 🇺🇦@ducha_aiki

2M sequences to train VGGT-Omega! "Scaling is not easy"

Dmytro Mishkin 🇺🇦@ducha_aiki

VGGT was useful for many areas
