Christian Szegedy says transformers were intuitive and simple from the start as the architecture replaced multiple complicated attention mechanisms then in use

Pedro Domingos, Professor Emeritus at the University of Washington, linked to a thread on the paper.

Original post

Deep learning papers are confusing because deep learning researchers are confused.

10:43 PM · May 18, 2026

@pmddomingos Maybe, but transformers were pretty intuitive and simple from day one.

At that time, there were a lot of complicated models with various attention-based mechanisms.

As the title also suggested, the transformer was a significant simplification over those overcomplicated models.

Pedro Domingos @pmddomingos

Deep learning papers are confusing because deep learning researchers are confused.

5:43 AM · May 19, 2026 · 7.2K Views
5:58 AM · May 19, 2026 · 2.5K Views
Pedro Domingos @pmddomingos

If the transformers paper was written by one of my students, I wouldn’t let him graduate until he did a better job.

6:16 AM · May 19, 2026 · 62.5K Views
8:32 AM · May 19, 2026 · 2.3K Views

@ChrSzegedy Simplification, yes. Intuitive, no. More than one of the authors have told me they didn’t know what they were doing, and it shows.

6:15 AM · May 19, 2026 · 322 Views