- An efficient Triton implementation supports real-time autoregressive inference.
https://arxiv.org/abs/2606.10944