What has shazeer even contributed at deepmind?
Simo Ryu, Stable Diffusion LoRA creator, sparks debate over whether the original Transformer's encoder stack is inelegant
Story Overview
Simo Ryu shared the canonical encoder-decoder diagram from the 2017 Attention Is All You Need paper while crediting co-author Noam Shazeer, prompting replies that call the encoder stack inelegant and note its absence from most current large models.
Decoder-only designs now dominate
Replies point out that decoder-only architectures have become the default for frontier generative models because they train more simply on unlabeled text and generalize better in zero-shot settings.
Encoder necessity stays open
No quantitative evidence or new benchmarks appear in the thread, so the precise drawbacks of the original dual-stack design remain a matter of ongoing architectural taste rather than settled fact.
Many users dismissed Simo Ryu's Transformer diagram as inelegant and insulting while positive replies called the result interesting and progress fast-moving.
No Digg Deeper questions have been answered for this story yet.
Most Activity
@giffmana 🤣😂 honest outsiders view of three org
@cloneofsimo They said at deepmind.

@aaronbatilo @cloneofsimo Take that back punk. I said deepmind, not google or google brain. Talking post character AI return.

@cloneofsimo pfft how inelegant, what's that bullshit on the left

@cloneofsimo Thanks smart ass

@cloneofsimo what's this?

@cloneofsimo robotics labs claim dexterous-hand breakthroughs constantly. none shipped. the problem isn't the algorithm, it's dust, oil, vibration, and the actual factory floor.

@cloneofsimo What a tourist

@cloneofsimo interesting result
the direction keeps moving fast