Prefill versus decode, arithmetic intensity, roofline plots and transformer basics.
Slides can be found at https://alex.smola.org/posts/45-mlss-efficiency/
Alex Smola (@smolix)
Here's part 1 (of 5) of my short course on efficient LLM inference that I taught at Columbia University. Slides are heavily updated from two weeks ago.
https://www.youtube.com/watch?v=3ggYI8Osgss
10:20 AM · Jul 2, 2026