gpu scarcity is an engineering problem
at @raais this month, @elevenlabs' @angelos_peri showed how to serve 70x more users on the same gpus using batching, fp8, speculative decoding, and kv-cache compression.
new on @airstreetpress and on our raais youtube channel