1d ago

Researchers Boris Hanin, William Barr Held, and Percy Liang validate a scaling law predicting pre-training loss for a 129-billion-parameter MoE model

The final loss of 2.234 closely matched their prediction of 2.252, a gap of 0.018 (about 0.8%).


Original post

#22 @PERCYLIANG OP

Boris Hanin@BORISHANIN

Incredible predictability for pre-training loss across a more than 100x scaling up of compute. Big congrats to @WilliamBarrHeld and @percyliang. HP transfer / parameterization based in part on our work with @CPehlevan, @blake__bordelon, and Tianze Jiang. Part of @DARPA AIQ, run by @patrickshafto.

11:28 AM · May 28, 2026

QUOTE POST

#505 Séb Krier @SEBKRIER

Saw this cool post by Percy: https://x.com/percyliang/status/2058621601542009341 and it reminded me of the QT'd paper.

Question: what have we learnt about how to interpret pretraining loss over the past two years? Any good papers I should add to the neverending list?

8:29 PM · May 28, 2026 · 2.6K Views

QUOTE POST

#1426 Patrick Shafto @PATRICKSHAFTO

Remarkable results! So exciting.

Congrats @BorisHanin and @WilliamBarrHeld, @percyliang

@DARPA AIQ program!

Boris Hanin@BorisHanin

6:28 PM · May 28, 2026 · 8.4K Views

6:45 PM · May 28, 2026 · 757 Views

#1480 gavin leech (Non-Reasoning) @GLEECH

@sebkrier I loved Allen-Zhu's blitz around the same time https://arxiv.org/pdf/2404.05405

Séb Krier@sebkrier

Saw this cool post by Percy: https://x.com/percyliang/status/2058621601542009341 and it reminded me of the QT'd paper. Question: what have we learnt about how to interpret pretraining loss over the past two years? Any good papers I should add to the neverending list?

8:29 PM · May 28, 2026 · 2.6K Views

8:43 PM · May 28, 2026 · 299 Views

