10h ago

Larry Dial of Open Athena announces a 6x pretraining speedup for Marin after transitioning to a 129B MoE

Hyperball and Gated Norm stabilized training over a trillion tokens.

Sentiment: 100% positive, 0% negative

Users praised Marin's reported 6x pretraining speedup with the 129B MoE, saying the upgrades enable stable training that outperforms alternatives such as SwiGLU and deliver an impressive performance curve.

4 comments with sentiment.
