Researchers Introduce Muown Row-Norm Technique For Modded-NanoGPT Training

Keller Jordan (@kellerjordan0)

Here's a thread of six academic (non-world-record) Modded-NanoGPT optimization results from the past few weeks.

Result #23: Kai Lion and Florian Hübler have contributed a 3075-step run using a row-norm control technique called Muown. This result is notable for its simplicity.

6:36 PM · May 31, 2026 · 7.1K Views

Posts from X

Keller Jordan (@kellerjordan0)

Result #24: @zhiweixux showed that the 2026/05/01 record could be improved by 50 steps by separately tuning the WSD cooldown fractions for the hidden matrix vs. other parameters. Note: This can't be applied to the current record because it uses a power lr schedule.
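
For readers unfamiliar with the idea in Result #24, here is a minimal sketch of a warmup-stable-decay (WSD) style learning-rate multiplier with a separate cooldown fraction per parameter group. It is an illustration only, not the code from the run: the function names, the group names, and the cooldown values 0.5 and 0.3 are hypothetical, the warmup phase is omitted for brevity, and the decay is assumed to be linear.

```python
def wsd_multiplier(step: int, total_steps: int, cooldown_frac: float,
                   final_mult: float = 0.0) -> float:
    """Hypothetical LR multiplier: 1.0 during the stable phase, then a
    linear decay to final_mult over the last cooldown_frac of training."""
    if not 0.0 <= cooldown_frac <= 1.0:
        raise ValueError("cooldown_frac must be in [0, 1]")
    progress = step / total_steps
    cooldown_start = 1.0 - cooldown_frac
    # Stable phase (or no cooldown at all): keep the base learning rate.
    if cooldown_frac == 0.0 or progress < cooldown_start:
        return 1.0
    # t runs from 0 at the start of the cooldown to 1 at the final step.
    t = (progress - cooldown_start) / cooldown_frac
    return 1.0 + t * (final_mult - 1.0)


# Assumed values for illustration only; the post does not state the
# fractions that were actually tuned.
COOLDOWN_FRACS = {"hidden_matrix": 0.5, "other": 0.3}


def group_multipliers(step: int, total_steps: int) -> dict:
    """Return one LR multiplier per parameter group at the given step."""
    return {name: wsd_multiplier(step, total_steps, frac)
            for name, frac in COOLDOWN_FRACS.items()}
```

With these assumed fractions, the hidden-matrix group would begin its cooldown at the halfway point while the other parameters stay at the base rate until 70% of training, so the two groups decay on different timelines.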
