Btw, I think V4-Pro has modestly accelerated. V4-Flash is at 100 t/s and does seem more token-efficient. This is really nice after what feels like years of the DS API at 22 t/s.
DeepSeek V4-Flash API reportedly reaches speeds of 100 tokens per second, up from a 22 t/s baseline
Engineers are debating the underlying optimizations driving the speed gains
Why would DeepSeek get 22-40% faster? I saw up to 110 t/s on Flash and 90+ on Pro. Inference optimizations, like at other labs? I would think they'd already optimized the hell out of it for RL alone; they built this architecture for speed. Smaller batch size? Or new hardware at last?
@teortaxesTex better speculation?
…on second thought, I guess it might be optimization, actually

@teortaxesTex will it be today?

@teortaxesTex the DS API was 30-40 t/s for me