Integrating dynamic short convolutions improves Transformer performance across scales using custom Triton GPU kernels · Digg