Zero Init On GEGLU Out Weights Speeds Transformer Training By 10% · Digg
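The headline refers to initializing the output projection (the "out" weights) of a GEGLU feed-forward block to zero. Below is a minimal pure-Python sketch of that idea. It assumes a standard GEGLU layer, GELU(x W_gate) * (x W_up), followed by an output projection W_out and a plain residual connection. The function names, the Gaussian init scale of 1/sqrt(d_model), and the layer sizes are hypothetical choices made for illustration. They are not the setup behind the reported 10% figure. The sketch shows that with W_out = 0 the feed-forward branch contributes nothing at initialization, so each residual block starts out as the identity.

```python
import math
import random


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(w, x):
    # w is a list of rows with shape (out_dim, in_dim); returns w @ x.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def init_geglu(d_model, d_ff, seed=0):
    # Hypothetical init: Gaussian gate/up weights with std 1/sqrt(d_model).
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(d_model)
    w_gate = [[rng.gauss(0.0, std) for _ in range(d_model)] for _ in range(d_ff)]
    w_up = [[rng.gauss(0.0, std) for _ in range(d_model)] for _ in range(d_ff)]
    # The trick named in the headline: the output projection starts at zero.
    w_out = [[0.0] * d_ff for _ in range(d_model)]
    return w_gate, w_up, w_out


def geglu_ffn(params, x):
    w_gate, w_up, w_out = params
    # GEGLU hidden state: GELU(gate) multiplied elementwise by the up projection.
    hidden = [gelu(g) * u for g, u in zip(matvec(w_gate, x), matvec(w_up, x))]
    return matvec(w_out, hidden)


def residual_block(params, x):
    # Residual connection x + FFN(x); the layer norm a real Transformer
    # block would apply is omitted for brevity.
    return [xi + fi for xi, fi in zip(x, geglu_ffn(params, x))]


params = init_geglu(d_model=4, d_ff=8)
x = [0.5, -1.0, 2.0, 0.25]
print(geglu_ffn(params, x))       # [0.0, 0.0, 0.0, 0.0]
print(residual_block(params, x))  # [0.5, -1.0, 2.0, 0.25]
```

Although the branch outputs zeros at initialization, the gradient with respect to W_out is the outer product of the upstream gradient and the hidden activations, which are nonzero, so W_out starts moving on the first update. The gate and up projections receive zero gradient on that first step (the gradient reaching them passes through W_out) and begin learning once W_out is nonzero.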