
MiniMax Contributes Parallel Models To Gradbot For Instant Voice Responses

Gradium (@GradiumAI)

Reasoning LLMs typically take 2-3 seconds to start emitting tokens. In a voice agent, that's 2-3 seconds of silence after the user finishes speaking.

The @MiniMax_AI team just shipped a community contribution to Gradbot that runs two models in parallel. MiniMax-M2-her produces a short acknowledgement that starts streaming to TTS immediately, while MiniMax-M2.7 handles reasoning and tool calls in the background.

Thanks to @davidtaoweiji for this contribution. Check out our README for more details. https://github.com/gradium-ai/gradbot

6:31 AM · Jun 5, 2026 · 4.1K Views
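
The pattern described in the post (a fast acknowledgement model and a slower reasoning model started at the same time) can be sketched with asyncio. This is a minimal illustration, not Gradbot's actual implementation: `fast_ack`, `slow_reasoner`, `respond`, and `run_demo` are hypothetical names, the two model calls are simulated with sleeps, and the latencies are made up.

```python
import asyncio
import time


async def fast_ack(user_text: str) -> str:
    """Hypothetical stand-in for a low-latency acknowledgement model."""
    await asyncio.sleep(0.05)  # simulated short time-to-first-token
    return "Sure, let me check that."


async def slow_reasoner(user_text: str) -> str:
    """Hypothetical stand-in for a reasoning model with tool calls."""
    await asyncio.sleep(0.3)  # simulated reasoning delay
    return f"Here is the full answer to: {user_text}"


async def respond(user_text: str, speak) -> None:
    # Schedule the slow model first so it runs while the fast model answers.
    reasoning_task = asyncio.create_task(slow_reasoner(user_text))

    # The acknowledgement is spoken as soon as the fast model returns.
    speak(await fast_ack(user_text))

    # The full answer follows once the background task completes.
    speak(await reasoning_task)


def run_demo(user_text: str):
    """Run one turn and record (seconds_since_start, text) for each utterance."""
    spoken = []
    start = time.monotonic()

    def speak(text: str) -> None:
        # In a real agent this would stream text to TTS; here we just log it.
        spoken.append((time.monotonic() - start, text))

    asyncio.run(respond(user_text, speak))
    return spoken


if __name__ == "__main__":
    for elapsed, text in run_demo("What's the weather in Paris?"):
        print(f"{elapsed:.2f}s  {text}")
```

Because both calls overlap, the total turn time in this sketch is governed by the slower model alone, while the user hears the acknowledgement after only the fast model's latency.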
Compile And Push (@compileandpush)

@GradiumAI @MiniMax_AI Caching is the answer until it isn't and then the cache invalidation problem is the answer. Where did you end up on that tradeoff?
