There are two types of advances: (i) a singular change that provides 3x and (ii) a series of micro changes that each provide 20%. It is easy to celebrate (i), but (ii) is just as important, and the hard part is making sure the improvements stack. We care about both in Marin.
Building momentum at Marin! Upgrading from Dense -> 129B parameter MoEs -> architecture improvements -> optimizer improvements gives our pretraining recipe an estimated 6x cumulative learning speedup, accounting for MFU. Includes community contributions. https://openathena.ai/blog/pretraining-speedup/

