1d ago

Cerebras places Kimi K2.6, a trillion-parameter model, into enterprise trials running at roughly 1,000 output tokens per second, the highest speed Artificial Analysis has recorded for any frontier model

Benchmarks show 981 output tokens per second with a 10,000-token input.

Original post

Cerebras @CEREBRAS

Cerebras is now running Kimi K2.6 – a trillion parameter model – in enterprise trials. At ~1,000 tokens/s, this is the fastest frontier model performance ever measured by Artificial Analysis @ArtificialAnlys.

9:44 AM · May 19, 2026

QUOTE POST

#68 clem 🤗 @CLEMENTDELANGUE

I remember when people were saying "It's useless to open-source big models because nobody will be able to run them fast"....

Cerebras @cerebras

4:44 PM · May 19, 2026 · 685.1K Views

11:10 PM · May 19, 2026 · 269.8K Views

QUOTE POST

#687 Bojan Tunguz @TUNGUZ

Holy guacamole!

5:12 PM · May 19, 2026 · 17.4K Views

QUOTE POST

#839 Beff (e/acc) @BEFFJEZOS

The speed of intelligence is accelerating.

2:54 AM · May 20, 2026 · 10.3K Views

QUOTE POST

#980 Lisan al Gaib @SCALING01

TPUs are insane

Gemini 3.5 Flash is running at ~867 tokens/s almost as fast as Kimi-K2.6 on Cerebras custom chips

5:38 PM · May 19, 2026 · 12.5K Views

Sentiment

Positive: 73.9%

Negative: 26.1%

Many users praised Cerebras for running the trillion-parameter Kimi K2.6 at a record of roughly 1,000 tokens per second, citing its speed and parallelism, while others questioned the claims, pointing to high costs and limited real-world usability.

67 comments with sentiment.

Digg Depth

Sheing Ng asked

Gemini 3.5 Flash vs Cerebras Kimi k2.6, which one is faster?

Kimi K2.6 on Cerebras is faster. It reaches 981 output tokens per second with a 10,000-token input, ahead of the roughly 867 tokens per second reported for Gemini 3.5 Flash on TPUs.
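To put the gap in perspective, here is a minimal sketch of the arithmetic in Python. It uses only the two throughput figures quoted in this story (981 and about 867 output tokens per second). The function names are illustrative, and the comparison assumes both figures were measured under comparable conditions, which the posts do not confirm.

```python
# Throughput figures as quoted in this story (output tokens per second).
KIMI_K26_CEREBRAS_TPS = 981.0   # measured on a 10,000-token input
GEMINI_35_FLASH_TPS = 867.0     # approximate figure from a quoted post


def speedup(faster_tps: float, slower_tps: float) -> float:
    """Return how many times faster the first throughput is than the second."""
    return faster_tps / slower_tps


def generation_seconds(num_tokens: int, tps: float) -> float:
    """Return the time needed to emit num_tokens at a steady tps rate.

    Ignores time to first token, which these figures do not cover.
    """
    return num_tokens / tps


ratio = speedup(KIMI_K26_CEREBRAS_TPS, GEMINI_35_FLASH_TPS)
print(f"Speed ratio: {ratio:.2f}x")

# Hypothetical 2,000-token response, chosen only for illustration.
for name, tps in [("Kimi K2.6 on Cerebras", KIMI_K26_CEREBRAS_TPS),
                  ("Gemini 3.5 Flash", GEMINI_35_FLASH_TPS)]:
    print(f"{name}: {generation_seconds(2000, tps):.2f} s for 2,000 tokens")
```

On these numbers the Cerebras deployment is about 13% faster, so the lead is real but modest rather than an order of magnitude.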

Kevin Rose asked

How can someone run Cerebras Kimi k2.6? Can consumers run this?

No, consumers cannot run Cerebras' Kimi K2.6 deployment. It is available only through enterprise trials on Cerebras Inference.¹

Enterprises must contact Cerebras directly for access to the ~1,000 tokens/s version. The open-weight Kimi K2.6 model itself is available through the Moonshot API, through Hugging Face inference providers such as DeepInfra or Cloudflare, or locally on multi-H100 setups.
