Tim Lau and Weijie Su introduce a symmetry-compatible principle for designing large language model optimizers; as a byproduct, it yields an end-to-end layerwise optimizer stack with principled updates for embeddings, LM heads, SwiGLU MLPs, and MoE routers.
The paper is posted on arXiv, with accompanying code on GitHub.
——0——
QUOTE POST
#1502 Weijie Su @WEIJIE444
An *end-to-end layerwise optimizer stack where every major matrix-valued parameter gets updated in a principled way*! Tim is simply amazing!
1/4 New paper with @weijie444! We introduce a symmetry-compatible principle for LLM optimizer design and, as a byproduct, get an end-to-end layerwise optimizer stack where every major matrix-valued parameter (embeddings, LM heads, SwiGLU MLPs, MoE routers) has its own principled update! 📝 http://arxiv.org/abs/2605.18106 💻 http://github.com/timlautk/equivariant_optimizers
5:07 PM · May 19, 2026 · 12K Views
6:14 PM · May 19, 2026 · 2.9K Views