Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model.
That is 6.7× faster than the next-fastest GPU cloud, a result validated by Artificial Analysis.
The hard part is moving model weights and activations fast enough. Conventional GPU clusters split a model this large across many chips and spend much of their time passing data between them.
Cerebras instead uses wafer-scale chips, where a single processor spans an entire silicon wafer. More of that traffic is routed on-chip, with much higher bandwidth and lower latency.
The real business claim is not just speed, but speed on a model big enough for enterprise coding agents, where every extra second slows testing, debugging, and iteration.
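Here is a quick back-of-envelope sketch of what those two numbers mean for a single agent task. It makes three assumptions. First, the 6.7× figure is a ratio of output tokens/sec. Second, generation runs at a steady rate, ignoring time-to-first-token, tool calls, and network overhead. Third, the 50-step, 2,000-tokens-per-step agent workload is purely hypothetical and was chosen only for illustration.

```python
# Back-of-envelope arithmetic from the two figures in the post.
# Assumption: "6.7x faster" is a ratio of output tokens/sec, so the
# next-fastest GPU cloud runs at roughly 981 / 6.7 tokens/sec.
CEREBRAS_TPS = 981.0  # reported tokens/sec on Kimi K2.6
SPEEDUP = 6.7         # reported ratio vs. the next-fastest GPU cloud


def baseline_tps(tps: float = CEREBRAS_TPS, speedup: float = SPEEDUP) -> float:
    """Throughput implied for the comparison GPU cloud."""
    return tps / speedup


def generation_seconds(tokens: int, tps: float) -> float:
    """Seconds to emit `tokens` at a steady `tps` (ignores time-to-first-token)."""
    return tokens / tps


if __name__ == "__main__":
    # Hypothetical agent workload: 50 steps, each emitting 2,000 tokens.
    steps, tokens_per_step = 50, 2_000
    gpu = baseline_tps()
    fast = steps * generation_seconds(tokens_per_step, CEREBRAS_TPS)
    slow = steps * generation_seconds(tokens_per_step, gpu)
    print(f"implied GPU-cloud rate: {gpu:.0f} tokens/sec")
    print(f"Cerebras: {fast:.0f} s  GPU cloud: {slow:.0f} s")
```

Under these assumptions the implied GPU-cloud rate is about 146 tokens/sec. The hypothetical 100,000-token task takes about 102 s versus about 683 s, so the gap grows with every step the agent takes.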
---
cerebras.ai/blog/cerebras-kimi-k2-Enterprise