You want a strong small LLM. Would you start small — or inherit from something bigger?
📄 New paper: "Small LLMs: Pruning vs. Training from Scratch"
We find that pruning is more than a better initialization: simply giving randomly initialized LLMs more training tokens is often not enough to catch up.