Christian Szegedy says transformers were intuitive and simple from the start, as the architecture replaced multiple complicated attention mechanisms then in use
Pedro Domingos, Professor Emeritus at the University of Washington, linked to a thread on the paper.
@pmddomingos Maybe, but transformers were pretty intuitive and simple from day one.
At that time, there were a lot of complicated models with various attention-based mechanisms.
As the title also suggested, the transformer was a significant simplification over those overcomplicated models.
Deep learning papers are confusing because deep learning researchers are confused.
If the transformers paper was written by one of my students, I wouldn’t let him graduate until he did a better job.
@ChrSzegedy Simplification, yes. Intuitive, no. More than one of the authors have told me they didn’t know what they were doing, and it shows.