Oliver Sieberling's new paper shows dynamic short convolutions yield a 1.33x to 1.5x compute advantage over standard Transformers