Tapered MLP Widths Improve Language Model Efficiency and Accuracy · Digg