Tech · 1d ago

Marin creator Percy Liang says stacking incremental technical improvements yielded a 6x pretraining speedup for its 129B MoE model

The estimated speedup accounts for Model FLOPs Utilization (MFU).


Original post

Percy Liang @percyliang · #313 in Tech

There are two types of advances: (i) a singular change that provides 3x and (ii) a series of micro changes that each provide 20%. It is easy to celebrate (i), but (ii) is just as important, and the hard part is making sure the improvements stack. We care about both in Marin.

Larry Dial @classiclarryd

Building momentum at Marin! Upgrading from Dense -> 129B parameter MoEs -> architecture improvements -> optimizer improvements gives our pretraining recipe an estimated 6x cumulative learning speedup, accounting for MFU. Includes community contributions. https://openathena.ai/blog/pretraining-speedup/

8:28 AM · Jun 5, 2026 · 16K Views
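The arithmetic behind the post can be made concrete with a minimal sketch. It assumes the gains stack perfectly, meaning each change multiplies the speedup already achieved, which the post itself flags as the hard part in practice. The function names are illustrative and do not come from Marin.

```python
def compounded_speedup(gains):
    """Multiply per-change speedups; a gain of 0.20 means a 20% improvement.

    Assumption: gains are independent and stack multiplicatively.
    """
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total


def changes_needed(target, gain):
    """Smallest number of equal-sized gains whose product reaches `target`."""
    count = 0
    total = 1.0
    while total < target:
        total *= 1.0 + gain
        count += 1
    return count


if __name__ == "__main__":
    print(compounded_speedup([0.20] * 6))  # just under 3x
    print(changes_needed(3.0, 0.20))       # 7
    print(changes_needed(6.0, 0.20))       # 10
```

Under this idealized assumption, six 20% gains fall just short of 3x, so it takes seven to match one "singular" 3x change and ten to pass 6x. Any gain that fails to carry over once the others are in place pushes those counts higher.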

Sentiment

Supportive replies praise Marin's 6x pretraining speedup from stacked MoE and optimizer gains as "nice results," while critical replies dismiss compounding micro-gains as tedious work rather than a strategy.

Positive: 50.0%

Negative: 50.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 3K · Likes: 2

Tim Dettmers @Tim_Dettmers

@percyliang Super nice results -- keep it up!


Strata @ChainZenit

@percyliang Compounding the micro-gains is just tedious work, not a strategy.


Rugbist @rugbist_

@percyliang the compounding bit is where most people tap out. curious how you weigh the 20% gains against each other before stacking
