10h ago
Larry Dial of Open Athena announces a 6x pretraining speedup for Marin after transitioning to a 129B MoE. Hyperball and Gated Norm stabilized training over a trillion tokens.
Sentiment: 100% positive, 0% negative (4 comments).
Users praised Marin's reported 6x pretraining speedup with 129B MoE models, saying the upgrades enable stable training that outperforms alternatives like SwiGLU while delivering an impressive performance curve.