much gratitude to the many colleagues at @modal whose insights & work i shared here -- @_gongy, @_dcw02, @saatwiknagpal, @DevenNavani, @be4ncurd, @emilyhanyf, @hsubbaraj, @racerfunction, @luiscape, @jonobelotti_IO, @mma12261, @teenychairs & probably others <3
Tried to squeeze the most important bits of the entire stack for cloud deployment of transformer inference, from application-layer concerns to hardware, debugging, and o11y, into one talk. Had to operate at a very high tok/s!
https://www.youtube.com/watch?v=ZUdIsRZhWXI

