
Déjà View 3D reconstruction model matches models 8–10× larger by looping a single transformer block with shared weights

Andrew Davison says the result appears to clarify the interpretation that attention layers act like unrolled iterations of a traditional optimizer.
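Davison's remark draws on the view that a stack of layers can be read as the unrolled iterations of an iterative solver. The NumPy snippet below is a toy illustration of that reading only. It uses plain gradient descent on a least-squares problem (an assumed stand-in, not the Déjà View model and not an attention layer), and the names `gd_layer` and `unrolled_gd` are hypothetical. Because every step reuses the same parameters, running more iterations amounts to looping the same "layer" more times.

```python
import numpy as np

# Toy illustration only: gradient descent on least squares, not the
# Déjà View model or an attention layer.

def gd_layer(x, A, b, eta):
    """One gradient-descent step on f(x) = 0.5 * ||A x - b||^2.

    Read as a network "layer", its parameters (A, b, eta) are identical
    at every step, so K steps form a weight-tied, looped network.
    """
    return x - eta * A.T @ (A @ x - b)

def unrolled_gd(A, b, num_steps, eta):
    """Apply the same layer num_steps times, starting from x = 0."""
    x = np.zeros(A.shape[1])
    for _ in range(num_steps):
        x = gd_layer(x, A, b, eta)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

# eta = 1 / L, where L = largest eigenvalue of A^T A (the squared spectral
# norm of A); any 0 < eta < 2 / L converges for this quadratic objective.
eta = 1.0 / np.linalg.norm(A, 2) ** 2
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)

errors = []
for k in (1, 10, 100, 1000):
    err = np.linalg.norm(unrolled_gd(A, b, k, eta) - x_exact)
    errors.append(err)
    print(f"K={k:4d}  error={err:.2e}")
```

In this sketch, additional passes through the same shared-parameter step shrink the error without adding any parameters, which is the intuition behind reading network depth as optimizer iterations.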

Andrew Davison @AjdDavison

Seemingly clarifying the interpretation that attention layers are like unwrapped iterations of a traditional optimiser.

Tobias Fischer @TobiasFischer11

Do 3D reconstruction transformers really need a billion parameters, or are most of those layers just doing the same thing over and over?

Introducing Déjà View: a single transformer block, looped K times, that matches or beats models 8–10× its size with lower compute. 🧵

11:19 PM · May 31, 2026 · 6.7K Views
Tobias Fischer's original post · 2d ago · 85 Retweets · 82.2K Views · 664 Likes · 498 Bookmarks
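The looping idea in Fischer's post ("a single transformer block, looped K times") can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the Déjà View implementation: the block layout (pre-norm, single-head attention, ReLU MLP, no biases), the sizes, and the helpers `init_block`, `block`, `looped`, and `stacked` are all hypothetical. It contrasts one block reused K times with a conventional stack of K separately parameterized blocks.

```python
import numpy as np

# Hypothetical toy block; not the Déjà View architecture.

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_block(d, d_ff, rng):
    """Parameters for one single-head block (biases omitted for brevity)."""
    s = 1.0 / np.sqrt(d)
    return {
        "Wq": rng.normal(scale=s, size=(d, d)),
        "Wk": rng.normal(scale=s, size=(d, d)),
        "Wv": rng.normal(scale=s, size=(d, d)),
        "Wo": rng.normal(scale=s, size=(d, d)),
        "W1": rng.normal(scale=s, size=(d, d_ff)),
        "W2": rng.normal(scale=1.0 / np.sqrt(d_ff), size=(d_ff, d)),
    }

def block(x, p):
    """Pre-norm block: x + Attn(LN(x)), then x + MLP(LN(x))."""
    h = layer_norm(x)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (tokens, tokens)
    x = x + attn @ v @ p["Wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["W1"], 0.0) @ p["W2"]  # ReLU MLP

def looped(x, p, K):
    """Weight-tied: the same parameters p are reused on all K passes."""
    for _ in range(K):
        x = block(x, p)
    return x

def stacked(x, blocks):
    """Conventional untied stack: every layer has its own parameters."""
    for p in blocks:
        x = block(x, p)
    return x

def n_params(p):
    return sum(w.size for w in p.values())

rng = np.random.default_rng(0)
d, d_ff, K, tokens = 64, 256, 8, 10
x = rng.normal(size=(tokens, d))  # stand-in for image-patch tokens

shared = init_block(d, d_ff, rng)
untied = [init_block(d, d_ff, rng) for _ in range(K)]

y_loop = looped(x, shared, K)
y_stack = stacked(x, untied)
print("looped params: ", n_params(shared))                   # 49152
print("stacked params:", sum(n_params(p) for p in untied))  # 393216
```

In this sketch the looped model's parameter count stays fixed as K grows, while each forward pass still runs K blocks, so its per-pass compute matches an untied K-layer stack of the same width. The parameter savings come from weight sharing alone.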