New NanoGPT Speedrun Record Set at 81.2 Seconds with Learnable XSA

Larry Dial | @CLASSICLARRYD

New NanoGPT Speedrun WR at 81.2 (-0.6s) from @_djdumpling, with learnable XSA. Per-head learnable scalar to subtract out the portion of attn that is orthogonal to a token's own value vector. Applied to the 6 non-paired-head layers.

https://github.com/KellerJordan/modded-nanogpt/pull/264

11:16 AM · May 25, 2026
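A rough sketch of what a per-head learnable scalar of this kind might look like, in NumPy. This is not taken from the linked PR. The function name `learnable_xsa`, the tensor layout, and the `eps` term are all hypothetical. It also assumes one particular reading of the post: that a scaled projection of each token's attention output onto that token's own value vector is subtracted, so that with a scalar of 1 the result is orthogonal to the value vector. The post's wording could be read the other way, so treat the direction as an assumption too.

```python
import numpy as np

def learnable_xsa(y, v, alpha, eps=1e-8):
    """Hypothetical sketch, not the PR's code.

    y:     attention output, shape (heads, tokens, head_dim)
    v:     value vectors,    shape (heads, tokens, head_dim)
    alpha: one scalar per head, shape (heads,); a plain array here,
           standing in for a trainable parameter.
    """
    # Projection coefficient (y . v) / (v . v) for every head and token.
    coef = (y * v).sum(-1, keepdims=True) / ((v * v).sum(-1, keepdims=True) + eps)
    # Component of y that lies along the token's own value vector.
    parallel = coef * v
    # ASSUMPTION: the per-head scalar scales how much of that component is removed.
    return y - alpha[:, None, None] * parallel

rng = np.random.default_rng(0)
heads, tokens, head_dim = 2, 4, 8
y = rng.standard_normal((heads, tokens, head_dim))
v = rng.standard_normal((heads, tokens, head_dim))

# Head 0 removes the full projection; head 1 is left untouched.
alpha = np.array([1.0, 0.0])
out = learnable_xsa(y, v, alpha)
```

With `alpha` at 0 a head behaves like ordinary attention, and at 1 its output has no component along the token's own value vector. Making the scalar learnable would let each head settle somewhere in between.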