/AI · 18h ago

Déjà View 3D reconstruction model matches or beats models 8–10× its size by looping a single transformer block with shared weights

Andrew Davison says the result seems to support the interpretation that attention layers act like unwrapped iterations of a traditional optimizer.


Andrew Davison (@AjdDavison), #731 in AI

Seemingly clarifying the interpretation that attention layers are like unwrapped iterations of a traditional optimiser.

Tobias Fischer (@TobiasFischer11)

Do 3D reconstruction transformers really need a billion parameters, or are most of those layers just doing the same thing over and over?

Introducing Déjà View: a single transformer block, looped K times, that matches or beats models 8–10× its size with lower compute. 🧵

11:19 PM · May 31, 2026 · 6.7K Views
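
A rough sketch of the weight-sharing idea described in the post above, not the Déjà View implementation. It counts the parameters of one standard transformer block and compares a stack of distinct blocks with a single block applied K times. The width `d_model`, the `mlp_ratio` of 4, the depth of 12, and all function names are assumptions chosen for illustration; none of them come from the thread.

```python
def block_param_count(d_model: int, mlp_ratio: int = 4) -> int:
    """Parameters in one standard transformer block, biases included.

    Assumed layout (hypothetical, not taken from the post): multi-head
    self-attention, a two-layer MLP, and two LayerNorms.
    """
    # Query, key, value and output projections: four d_model x d_model
    # matrices, each with a bias vector.
    attn = 4 * (d_model * d_model + d_model)
    hidden = mlp_ratio * d_model
    # MLP: up-projection and down-projection, each with a bias.
    mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)
    # Two LayerNorms, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    return attn + mlp + norms


def stacked_param_count(d_model: int, depth: int) -> int:
    """A conventional stack: every layer owns its own weights."""
    return depth * block_param_count(d_model)


def looped_param_count(d_model: int, k: int) -> int:
    """One block reused k times: the count does not depend on k."""
    return block_param_count(d_model)


def run_looped(block, x, k: int):
    """Apply the same block (same weights) to x, k times in a row."""
    for _ in range(k):
        x = block(x)
    return x


if __name__ == "__main__":
    d_model, depth = 768, 12  # illustrative sizes only
    print("stacked:", stacked_param_count(d_model, depth))
    print("looped: ", looped_param_count(d_model, depth))
```

`run_looped` has the same shape as a fixed-step iterative solver that applies one update rule repeatedly, which is the reading Davison's comment points to. This sketch only accounts for parameter count; it says nothing about the accuracy or compute figures quoted in the thread.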


