Added a fun lil widget to the LLM Engineer's Almanac -- a "Token Timing Simulator" so you can get a visceral feel for what a benchmark perf number means.
Here's @_dcw02's latest work with @zhijianliu_'s DFlash technique in @sgl_project -- ~1k TPS!
@_dcw02 @zhijianliu_ @sgl_project @jianchen1799 big fan of the architecture and how y'all have demonstrated its benefits. major inspiration for this <3
https://modal.com/llm-almanac/token-timing-simulator