Study finds pruning larger LLMs outperforms training smaller models from scratch, even with extra training tokens · Digg