/AI · 9h ago

Larry Dial of Open Athena releases a Marin pretraining recipe delivering a 6x cumulative learning speedup using 129B MoEs

Training over 1.0 trillion tokens exhibited frequent loss spikes.

#651

Original post

CLS#442

Larry Dial@classiclarryd

Building momentum at Marin! Upgrading from Dense -> 129B parameter MoEs -> architecture improvements -> optimizer improvements gives our pretraining recipe an estimated 6x cumulative learning speedup, accounting for MFU. Includes community contributions. https://openathena.ai/blog/pretraining-speedup/

9:26 AM · Jun 3, 2026 · 21.5K Views

Sentiment

Commenters praise Marin's reported 6x pretraining speedup with 129B MoE models, citing stable GLU-based training that outperforms alternatives and strong performance curves.

Pos: 100.0%

Neg: 0.0%

4 comments with sentiment.

Posts from X

Most Activity

Views: 8.7K · Bookmarks: 45 · Likes: 94 · Retweets: 9 · Replies: 4

Kaiyue Wen@wen_kaiyue

Quoting @dlwh: we are at risk of losing the reputation of spiky loss runs!

This run incorporates some stability techniques from my past projects: Hyperball, Gated Norm, and Gated Attention. Excited to see the next run from Marin!

Larry Dial@classiclarryd

8h ago · 8.7K views · 94 likes · 45 bookmarks