Zero Init on GeGLU Out Weights Speeds Transformer Training by 10% · Digg