AI · 15h ago

Researcher Sets New Modded-NanoGPT Record at 2990 Training Steps

Original post
Keller Jordan @kellerjordan0 · #424 in AI

Modded-NanoGPT optimization result #29 (2026/05/11): @nilinabra has achieved a new step-count record of 2990 (40-step improvement) by halving the growth rate of the L2-norm of the hidden matrix parameters.

This result is better than the previous record with a p-value of 4e-5.

10:37 AM · Jun 2, 2026 · 6.9K Views
Posts from X
Nilin @nilinabra

My thinking was to control weight norm without needing to tune weight decay. WD takes effect as the norms get near an equilibrium. Radial brake compresses the outward gradient component and takes effect immediately. It also affects the condition number differently than WD. https://nilin.github.io/radial-brake/

7h · 4.3K Views · 40 Likes · 40 Bookmarks
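
To make the phrase "compresses the outward gradient component" concrete, here is a minimal NumPy sketch of one possible reading. It is not taken from the linked write-up or from the Modded-NanoGPT code. The function name `radial_brake`, the `brake` parameter, and the choice to treat the whole matrix as one flattened vector are all assumptions for illustration. The idea: split an update into the part parallel to the weights (radial) and the rest (tangential), and shrink the radial part only when it points outward.

```python
import numpy as np

def radial_brake(W, update, brake=0.5):
    """Hypothetical sketch: scale the outward radial part of `update` by `brake`.

    `update` is the step that would be added to W (W_new = W + update).
    The radial part is the projection of `update` onto W under the
    Frobenius inner product; it is "outward" when that projection is positive.
    """
    w_norm_sq = float(np.sum(W * W))
    if w_norm_sq == 0.0:
        return update  # no radial direction is defined for zero weights
    coef = float(np.sum(W * update)) / w_norm_sq
    if coef <= 0.0:
        return update  # inward or purely tangential step: leave untouched
    radial = coef * W
    tangential = update - radial
    return tangential + brake * radial

if __name__ == "__main__":
    W = np.array([[3.0, 0.0], [0.0, 4.0]])
    T = np.array([[4.0, 0.0], [0.0, -3.0]])  # orthogonal to W: <W, T> = 0
    step = 0.1 * W + 0.2 * T                 # outward radial part plus tangential part
    braked = radial_brake(W, step)
    print("radial inner product before:", np.sum(W * step))
    print("radial inner product after: ", np.sum(W * braked))
```

With `brake=0.5`, the first-order term in the change of the squared norm, `2 * <W, update>`, is halved for outward steps. The second-order term `||update||^2` is untouched in this sketch, so it halves norm growth only to first order. Unlike weight decay, which pulls weights inward in proportion to their size, this leaves inward steps unchanged and acts from the first step.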