
Paper Reveals Scaling LayerNorm Fixes Underused Deep Layers in LLMs


Original post

No idea if this has anything to do with Mythos (whose secret sauce might instead be about architecture, optimizer, training objective or data), or whether all the labs are already doing something like this. But the paper is interesting and deserves to be better known!

Jeremy Cohen (@deepcohen)

Did Anthropic get more gains out of model scaling than other labs thought was possible? It reminds me of an interesting recent paper, which showed that deep layers in open LLMs are not doing much, and that this can be fixed by scaling the LayerNorm output. https://arxiv.org/abs/2502.05795

12:29 PM · Jun 12, 2026 · 653 Views
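The post only says the fix is "scaling the LayerNorm output." The following is a minimal pure-Python sketch of that idea. It assumes the output of each layer's LayerNorm is multiplied by 1/√ℓ, where ℓ is the 1-based layer index. That is our reading of the linked paper's recipe, so check the paper for the exact form and placement. The names `layer_norm`, `scaled_layer_norm`, `pre_ln_block`, and `rms` are hypothetical. The learnable gain and bias of a real LayerNorm are omitted.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Plain LayerNorm over one vector (no learnable gain or bias)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def scaled_layer_norm(x: Vector, layer_index: int, eps: float = 1e-5) -> Vector:
    """LayerNorm output multiplied by 1/sqrt(layer_index).

    ASSUMPTION: layer_index is 1-based, so the first layer is unscaled.
    The 1/sqrt(l) factor is our reading of the paper, not its code.
    """
    if layer_index < 1:
        raise ValueError("layer_index must be >= 1")
    scale = 1.0 / math.sqrt(layer_index)
    return [scale * v for v in layer_norm(x, eps)]


def pre_ln_block(h: Vector, layer_index: int,
                 sublayer: Callable[[Vector], Vector]) -> Vector:
    """Hypothetical Pre-LN residual step: h + sublayer(ScaledLN(h))."""
    update = sublayer(scaled_layer_norm(h, layer_index))
    return [a + b for a, b in zip(h, update)]


def rms(x: Vector) -> float:
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(v * v for v in x) / len(x))


if __name__ == "__main__":
    h = [1.0, 2.0, 3.0, 4.0]
    for layer in (1, 4, 16):
        # The normalized signal entering the sublayer shrinks as 1/sqrt(layer).
        print(layer, round(rms(scaled_layer_norm(h, layer)), 3))
    # With an identity sublayer, the residual update at layer 4 is h + 0.5 * LN(h).
    print(pre_ln_block(h, 4, lambda v: v))
```

Under this assumption, the RMS of the normalized input to each sublayer falls from about 1.0 at layer 1 to about 0.5 at layer 4 and 0.25 at layer 16. As we understand the paper's argument, this damping is meant to counter the growth of activation variance with depth in Pre-LN models. That variance growth is what leaves deep layers contributing little.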

Sentiment

Users find the paper on scaling LayerNorm output to revive underperforming deep layers in LLMs interesting and think it deserves wider attention.

Positive: 100.0% · Negative: 0.0% (based on 1 comment with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 3.8K · Bookmarks: 63 · Likes: 50 · Retweets: 1 · Replies: 5

Jeremy Cohen (@deepcohen)


Xidulu (@xidulu)

@deepcohen The LayerNorm scaling recipe seems identical to the Depth-μP (or, more recently, CompleteP) recipe.
