NanoGPT Speedrun Sets New Record at 81.8 Seconds with MUDD Skip Connections

New NanoGPT Speedrun WR at 81.8 s (-2.6 s) from @.Lisennlp on GitHub with MUDD skip connections, an expressive and efficient mechanism for data-dependent skips! Instead of a learned scalar or a sigmoid(linear) gate, MUDD uses a 64-neuron 'MLP' to generate the skip coefficients. The key efficiency gain comes from reusing the same input projection for up to 14 coefficients at once. https://github.com/KellerJordan/modded-nanogpt/pull/259
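
To make the idea concrete, here is a minimal sketch in plain Python of a gate where one shared input projection feeds several skip coefficients. It is not the code from the PR: the tanh nonlinearity, the weight initialization, the names `SkipCoeffMLP`, `matvec`, `init_matrix` and `mix`, and the toy `d_model = 8` are all assumptions for illustration. Only the 64-unit width and the count of 14 coefficients come from the post.

```python
import math
import random

HIDDEN = 64    # width of the small coefficient 'MLP' (from the post)
N_COEFFS = 14  # coefficients per shared projection (upper bound from the post)


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng, scale):
    """Gaussian-initialized matrix; the scaling is an illustrative assumption."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SkipCoeffMLP:
    """Hypothetical gate: one shared input projection, many coefficients."""

    def __init__(self, d_model, rng):
        self.w_in = init_matrix(HIDDEN, d_model, rng, d_model ** -0.5)
        self.w_out = init_matrix(N_COEFFS, HIDDEN, rng, HIDDEN ** -0.5)

    def __call__(self, x):
        # The input projection is computed once per token...
        h = [math.tanh(v) for v in matvec(self.w_in, x)]  # tanh is assumed
        # ...and every coefficient is read from that same hidden vector.
        return matvec(self.w_out, h)


def mix(coeffs, earlier):
    """Data-dependent skip: weighted sum of earlier layer outputs for one token."""
    d = len(earlier[0])
    return [sum(c * e[i] for c, e in zip(coeffs, earlier)) for i in range(d)]


rng = random.Random(0)
d_model = 8  # toy size, far smaller than a real model
gate = SkipCoeffMLP(d_model, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]  # current token state
earlier = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(N_COEFFS)]

coeffs = gate(x)              # 14 coefficients that depend on x
mixed = mix(coeffs, earlier)  # one d_model-sized vector
```

In this sketch the expensive `d_model`-wide product happens once per token, while each extra coefficient only adds one 64-wide dot product. A learned scalar, by contrast, would ignore `x` entirely, and a separate sigmoid(linear) gate per coefficient would repeat the `d_model`-wide product each time.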

2:53 AM · May 24, 2026