
NanoGPT Speedrun Sets New Record at 81.8 Seconds with MUDD Skip Connections


Original post


Larry Dial (@classiclarryd)

New NanoGPT Speedrun WR at 81.8s (-2.6s) from @.Lisennlp on GitHub with MUDD skip connections, an expressive and efficient mechanism for data-dependent skips! Instead of a learned scalar or a sigmoid(linear) gate, MUDD uses a 64-neuron 'MLP' to generate the coefficients. The key efficiency comes from reusing the same input projection for up to 14 coefficients at once. https://github.com/KellerJordan/modded-nanogpt/pull/259

2:53 AM · May 24, 2026 · 11.4K Views
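
A rough sketch of the idea in the post, not the code from the PR: one shared input projection feeds a 64-unit hidden layer, and a single output projection then yields all 14 coefficients at once. The hidden width (64) and coefficient count (14) come from the post. Everything else is an assumption made for illustration: the class name CoeffMLP, the helper mix, the model width D_MODEL, the tanh activation, and the random initialization.

```python
import math
import random

HIDDEN = 64     # width of the small coefficient 'MLP' (from the post)
N_COEFFS = 14   # coefficients generated from one shared projection (from the post)
D_MODEL = 32    # hypothetical model width, kept small for the demo


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def init(rows, cols, rng, scale):
    """Random Gaussian matrix; the init scheme is an assumption."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class CoeffMLP:
    """Hypothetical coefficient generator: d_model -> hidden -> n_coeffs."""

    def __init__(self, d_model, hidden, n_coeffs, seed=0):
        rng = random.Random(seed)
        self.w_in = init(hidden, d_model, rng, d_model ** -0.5)
        self.w_out = init(n_coeffs, hidden, rng, hidden ** -0.5)

    def __call__(self, x):
        # The input projection is computed once and shared by every coefficient.
        h = [math.tanh(v) for v in matvec(self.w_in, x)]  # tanh is a stand-in activation
        return matvec(self.w_out, h)


def mix(sources, weights):
    """Weighted sum of equally sized vectors (one weight per source)."""
    return [sum(w * s[i] for w, s in zip(weights, sources))
            for i in range(len(sources[0]))]


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]          # current hidden state
earlier = [[rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]   # outputs of 3 earlier layers
           for _ in range(3)]

mlp = CoeffMLP(D_MODEL, HIDDEN, N_COEFFS)
coeffs = mlp(x)                   # 14 data-dependent coefficients from one projection
skip = mix(earlier, coeffs[:3])   # use three of them to weight the earlier layers
```

Compared with a separate sigmoid(linear) gate per skip, each of which needs its own pass over the hidden state, this layout pays for the d_model-wide projection once and adds only a 64-wide output row per extra coefficient.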


Larry Dial (@classiclarryd)

Paper here: https://arxiv.org/pdf/2502.12170 The MUDD coefficients are used for many purposes, such as routing multiple layers into future attention values, modulating the value embedding, and modulating the bigram embedding. (The delay in merging this PR was due to an H100 shortage.)
