On Friday, we released six new state-of-the-art drafters for accelerated inference.
We also published a blog post on why speculative decoding (spec dec) works so well, backed by a roofline model of the speedup from speculation.
Play with it in our LLM Engineer's Almanac:
https://modal.com/llm-almanac/spec-dec-roofline