Tech · 4h ago

Inception co-founder Aditya Grover launches Mercury 2 diffusion LLM on Baseten with inference speeds over 1,000 tokens per second

Early customer Augment Code cut operational costs by 90%.

Original post

Aditya Grover @adityagrover_ · #452 in Tech

Today we're bringing Mercury 2 to @Baseten.

Mercury 2 delivers over 1,000 tokens per second for customers on @NVIDIA GPUs with the reliability and scale enterprise teams need.

Read more to see how @augmentcode is using Mercury 2 in production reducing costs by 90% and latency by 82%. More customer stories across coding agents, real-time voice, and enterprise search dropping soon.

Baseten @baseten

http://x.com/i/article/2065085903345754113

9:16 AM · Jun 11, 2026 · 1.2K Views

Sentiment

Users praised Mercury 2 on Baseten for its speed of over 1,000 tokens per second on NVIDIA GPUs and for the 90% cost reduction reported by Augment Code, while some argued that the efficiency gains, not the headline speed claims, were the real story.

Positive: 83.3%

Negative: 16.7%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views 1.9K · Bookmarks 3 · Likes 31 · Retweets 5 · Replies 1

Stefano Ermon @StefanoErmon

Excited to see Mercury 2 live on @baseten

Mercury 2 delivers Groq/Cerebras-like speeds (>1000 tokens/sec) with quality comparable to speed-optimized models like Claude Haiku

If you have latency-sensitive workloads we’d love to hear from you.

Baseten @baseten

We are excited to announce that we have partnered with @_inception_ai to make Mercury 2 available on Baseten. This makes us the first inference platform to bring Inception’s diffusion LLM to production.

Inception’s dLLM architecture fixes the bottlenecks of sequential token generation and can deliver 1,000+ tokens/sec on standard NVIDIA GPUs. Early users like @augmentcode have seen impressive results, such as an 82% reduction in latency and 90% cost savings, while maintaining high quality.

Volodymyr Kuleshov 🇺🇦 @volokuleshov

Today Mercury 2, the first reasoning diffusion LLM, is live on Baseten. The result: over 1,000 tokens per second on standard NVIDIA GPUs, at comparable quality to speed-optimized models. @AugmentCode is already using it in production, cutting cost 90% and latency 82%.

Baseten @baseten