
Larry Dial of Open Athena releases a Marin pretraining recipe delivering a 6x cumulative learning speedup using 129B MoEs

Training over 1.0 trillion tokens exhibited frequent loss spikes.

Original post · CLS#442
Larry Dial@classiclarryd

Building momentum at Marin! Upgrading from Dense -> 129B parameter MoEs -> architecture improvements -> optimizer improvements gives our pretraining recipe an estimated 6x cumulative learning speedup, accounting for MFU. Includes community contributions. https://openathena.ai/blog/pretraining-speedup/

9:26 AM · Jun 3, 2026 · 21.5K Views
Views 8.7K · Bookmarks 45 · Likes 94 · Retweets 9 · Replies 4
Kaiyue Wen@wen_kaiyue

Quoting @dlwh: we are at risk of losing the reputation of spiky loss runs!

This run incorporates some stability techniques from my past projects: Hyperball, Gated Norm, and Gated Attention. Excited to see the next run from Marin!

