Nathan Chen of Moonshot AI argues pretraining innovations make model architectures up to 10 times more compute efficient · Digg