19h ago

Christian Szegedy says transformers were intuitive and simple from the start, since the architecture replaced multiple complicated attention mechanisms then in use

Pedro Domingos, Professor Emeritus at the University of Washington, linked to a thread on the transformer paper.


Original post

#654 Pedro Domingos @PMDDOMINGOS

Deep learning papers are confusing because deep learning researchers are confused.

10:43 PM · May 18, 2026

#43 Christian Szegedy @CHRSZEGEDY

@pmddomingos Maybe, but transformers were pretty intuitive and simple from day one.

At that time, there were a lot of complicated models with various attention-based mechanisms.

As the title also suggested, the transformer was a significant simplification over those overcomplicated models.

Pedro Domingos@pmddomingos

Deep learning papers are confusing because deep learning researchers are confused.

5:43 AM · May 19, 2026 · 7.2K Views

5:58 AM · May 19, 2026 · 2.5K Views

QUOTE POST

#87 Edward Grefenstette @EGREFEN

Pedro Domingos@pmddomingos

If the transformers paper was written by one of my students, I wouldn’t let him graduate until he did a better job.

6:16 AM · May 19, 2026 · 62.5K Views

8:32 AM · May 19, 2026 · 2.3K Views

QUOTE POST

#654 Pedro Domingos @PMDDOMINGOS

If the transformers paper was written by one of my students, I wouldn’t let him graduate until he did a better job.

6:16 AM · May 19, 2026 · 62.5K Views

#654 Pedro Domingos @PMDDOMINGOS

@ChrSzegedy Simplification, yes. Intuitive, no. More than one of the authors have told me they didn’t know what they were doing, and it shows.

Christian Szegedy@ChrSzegedy

@pmddomingos Maybe, but transformers were pretty intuitive and simple from day one. At that time, there were a lot of complicated models with various attention-based mechanisms. As the title also suggested, the transformer was a significant simplification over those overcomplicated models.

5:58 AM · May 19, 2026 · 2.5K Views

6:15 AM · May 19, 2026 · 322 Views
