
Xiaomi and TileRT_AI launch MiMo-V2.5-Pro-UltraSpeed, hitting 1,000 tokens per second on a 1-trillion-parameter MoE using eight standard GPUs

The system uses FP4 quantization and DFlash speculative decoding.
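The post does not say which FP4 variant is used. As a rough illustration, here is a minimal Python sketch of block-scaled FP4 (E2M1) quantization. It assumes one shared scale per block of 16 weights and round-to-nearest onto the E2M1 grid. The block size, rounding rule, and helper names are assumptions, not Xiaomi's or TileRT's implementation. It also does the back-of-envelope weight-memory math for 1 trillion parameters at 4 bits each, split across eight GPUs.

```python
# Minimal sketch of block-scaled FP4 (E2M1) weight quantization.
# Assumptions (not from the post): one float scale per block, block size 16,
# round-to-nearest onto the E2M1 grid. Helper names are hypothetical.
from typing import List, Sequence, Tuple

# Non-negative magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_LEVELS = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0


def quantize_block(block: Sequence[float]) -> Tuple[float, List[float]]:
    """Map one block onto the signed E2M1 grid with a shared scale."""
    amax = max((abs(x) for x in block), default=0.0)
    # Scale so the largest magnitude in the block lands on 6.0.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        nearest = min(E2M1_LEVELS, key=lambda level: abs(level - mag))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def dequantize_block(scale: float, codes: Sequence[float]) -> List[float]:
    """Recover approximate weights from grid values and the block scale."""
    return [c * scale for c in codes]


def quantize(weights: Sequence[float], block_size: int = 16) -> List[Tuple[float, List[float]]]:
    """Split weights into blocks and quantize each block independently."""
    return [quantize_block(weights[i:i + block_size]) for i in range(0, len(weights), block_size)]


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weights only: ignores per-block scales, activations, and KV cache."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    scale, codes = quantize_block([6.0, -3.0, 0.4, 1.2])
    print(scale, codes)      # 1.0 [6.0, -3.0, 0.5, 1.0]
    total = weight_memory_gb(1e12, 4)
    print(total, total / 8)  # 500.0 62.5
```

At 4 bits per weight, 1T parameters come to roughly 500 GB, or about 62.5 GB per GPU across eight, before counting scales and KV cache. Block-scaled FP4 formats such as MXFP4 also store a small shared scale per block, which adds some overhead on top of this estimate.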

Original post
Zephyr@zephyr_z9

This is super big. I think this is the first useful speculative decoding method deployed on a big, quasi-frontier model. Massive unlock @fi56622380

9:12 AM · Jun 8, 2026 · 35.5K Views
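The post gives no details of how DFlash drafts tokens, so the following is a generic greedy draft-and-verify loop in Python, not DFlash itself. A cheap draft proposes `k` tokens, the target model checks them, keeps the longest prefix that matches its own greedy choices, and always adds one token of its own. The toy `Model` type (a function from a token list to the next greedy token) and all function names are hypothetical. In an MTP setup, as later replies mention, the draft role is played by extra prediction heads on the target model rather than a separate model.

```python
# Generic greedy speculative decoding (draft-and-verify), NOT DFlash specifically.
# Assumption: a "model" is a hypothetical function mapping a token list to its
# greedy next token. Real systems verify all k drafts in one batched target pass;
# here the target is called once per position for clarity.
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]


def speculative_step(target: Model, draft: Model, prefix: List[int], k: int) -> List[int]:
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed token against its own greedy choice.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft was accepted, so the target contributes one bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(target: Model, draft: Model, prefix: List[int], n_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Return (tokens including prefix, number of target verification steps)."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
        steps += 1
    return out[: len(prefix) + n_tokens], steps


if __name__ == "__main__":
    def toy_target(ctx: List[int]) -> int:
        return (ctx[-1] + 1) % 10

    def sloppy_draft(ctx: List[int]) -> int:
        # Wrong whenever the context length is a multiple of 3.
        return 0 if len(ctx) % 3 == 0 else (ctx[-1] + 1) % 10

    print(generate(toy_target, toy_target, [0], 10))    # ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0], 2)
    print(generate(toy_target, sloppy_draft, [0], 10))  # ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0], 4)
```

With greedy verification the output is token-for-token what the target alone would produce; a worse draft only costs more verification steps. That is why the acceptance rate of the draft (or of the MTP heads) decides the real speedup.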
Sentiment

Users view Xiaomi hitting 1000 tokens per second on a 1T model with standard GPUs as nice to see because it shows useful performance progress.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)
Cluster Engagement
Posts from X
Gavin Baker@GavinSBaker

@zephyr_z9 @fi56622380 Not the first useful one but certainly nice to see!

2h · 3.1K Views · 17 Likes · 6 Bookmarks
RETWEETS 14
Zephyr@zephyr_z9 (original post)

4h · 35.5K Views · 236 Likes · 114 Bookmarks
REPLIES 1
Hanchi Sun@sun_hanchi

@zephyr_z9 @fi56622380 What do u mean? MTP has been in use for like 2 years

2h · 42 Views
让长风使尽@Rangfeng1117

@zephyr_z9 @fi56622380 You don't need to own chip stocks to profit from AI. Chips are a 'picks and shovels' play — high certainty, but declining elasticity as competition intensifies. If AI truly transforms every industry, the biggest returns could come from healthcare AI, legal AI, education AI —.

3h · 204 Views · 1 Bookmark
Mayz@lunan_ai

@zephyr_z9 @fi56622380 1000 tok/s on a 1T model is actually insane if real

wonder what the quality tradeoff looks like though

3h · 216 Views
Dima Liashko⚡@Flyingfishtrump

@zephyr_z9 @fi56622380 Maybe this is why Jensen called Groq's market a niche

3h · 268 Views · 2 Likes
Epicarism@epicarism

@zephyr_z9 @fi56622380 wtf

3h · 192 Views
Guilherme O'Tina@guilhermeotina

the integrated MTP draft is the interesting bet here. separate draft models usually cap out around 80-90% acceptance; sharing weights with the target should push that higher, but the MTP heads need to actually learn the distribution, not just mimic it. would love to see the acceptance rate breakdown per head

3h · 112 Views
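Guilherme's per-head point can be made concrete with a little arithmetic. A minimal sketch, assuming each MTP head i accepts its token with a conditional rate a_i, given that all earlier heads were accepted: the expected number of tokens per target pass is 1 + a1 + a1*a2 + ... + a1*...*ak, where the leading 1 is the target's own correction or bonus token. When every head has the same rate α this reduces to (1 - α^(k+1)) / (1 - α). The helper names and the log format (one accepted-draft count per verification step) are hypothetical; the thread reports no actual acceptance numbers for this model.

```python
# Sketch: per-head acceptance rates for k draft/MTP heads and the resulting
# expected tokens per target verification pass.
# Assumption: head i is only reached if heads 1..i-1 were all accepted.
from typing import List, Sequence


def expected_tokens_per_step(head_rates: Sequence[float]) -> float:
    """1 + a1 + a1*a2 + ...; the 1 is the token the target always emits."""
    expected = 1.0
    survive = 1.0  # probability that every head so far was accepted
    for rate in head_rates:
        survive *= rate
        expected += survive
    return expected


def per_head_acceptance(accepted_counts: Sequence[int], k: int) -> List[float]:
    """Conditional acceptance rate of each head from per-step accepted counts.

    accepted_counts[s] is how many of the k drafts were accepted at step s
    (hypothetical log format). Head i is reached when the count is >= i - 1
    and accepted when the count is >= i.
    """
    rates = []
    for i in range(1, k + 1):
        reached = sum(1 for n in accepted_counts if n >= i - 1)
        accepted = sum(1 for n in accepted_counts if n >= i)
        rates.append(accepted / reached if reached else 0.0)
    return rates


if __name__ == "__main__":
    print(per_head_acceptance([0, 1, 2, 2], k=2))  # [0.75, 0.6666666666666666]
    print(expected_tokens_per_step([0.8, 0.8, 0.8]))
```

For example, three heads at 0.8 each give about 2.95 tokens per target pass. Because the rates multiply, a weak later head caps the gain far less than a weak first head, which is why a per-head breakdown is more informative than a single averaged acceptance rate.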
Zephyr@zephyr_z9

@sun_hanchi @fi56622380 On this size?? I think the biggest model with MTP was StepFun Flash

2h · 29 Views · 1 Like
Levi@lev_ey

@lunan_ai @zephyr_z9 @fi56622380 1T model is not a useful metric. What matters is quality across diverse benchmarks. That is, unless this method can be utilized across any 1T model.

2h · 5 Views