Xiaomi and TileRT_AI launch MiMo-V2.5-Pro-UltraSpeed, hitting 1,000 tokens per second on a 1-trillion-parameter MoE using eight standard GPUs
The system uses FP4 quantization and DFlash speculative decoding.
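Commenters mostly welcome Xiaomi reaching 1,000 tokens per second on a 1T-parameter model with standard GPUs as real inference-speed progress, though some dispute that it is the first useful deployment of speculative decoding and others ask what the quality tradeoff looks like.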
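As a rough back-of-the-envelope check on why FP4 matters for fitting a 1-trillion-parameter model on eight GPUs, the sketch below estimates weight memory per GPU. It assumes weights are split evenly across devices and ignores quantization scale factors, KV cache, and activations; the function name `weight_memory_gb` is hypothetical and not taken from the announcement.

```python
def weight_memory_gb(num_params: float, bits_per_param: float, num_gpus: int) -> float:
    """Approximate weight memory per GPU in gigabytes (1 GB = 1e9 bytes).

    Assumption: weights are sharded evenly and nothing else is counted
    (no scale factors, KV cache, or activations).
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    total_bytes = num_params * bits_per_param / 8  # bits -> bytes
    return total_bytes / num_gpus / 1e9


# 1T parameters on 8 GPUs at 4-bit versus 16-bit weights.
fp4_gb = weight_memory_gb(1e12, 4, 8)
fp16_gb = weight_memory_gb(1e12, 16, 8)
print(f"FP4:  {fp4_gb:.1f} GB per GPU")
print(f"FP16: {fp16_gb:.1f} GB per GPU")
```

Under these assumptions, 4-bit weights come to about 62.5 GB per GPU versus about 250 GB at 16 bits, which is one reason low-bit quantization makes this kind of deployment plausible at all.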
Most Activity
@zephyr_z9 @fi56622380 Not the first useful one but certainly nice to see!
This is super big. I think this is the first useful speculative decoding method deployed on a big quasi-frontier model. Massive unlock @fi56622380

@zephyr_z9 @fi56622380 What do u mean? MTP has been in use for like 2 years

@zephyr_z9 @fi56622380 You don't need to own chip stocks to profit from AI. Chips are a 'picks and shovels' play — high certainty, but declining elasticity as competition intensifies. If AI truly transforms every industry, the biggest returns could come from healthcare AI, legal AI, education AI —.

@zephyr_z9 @fi56622380 1000 tok/s on a 1T model is actually insane if real
wonder what the quality tradeoff looks like though

@zephyr_z9 @fi56622380 Maybe this is why Jensen called Groq's market a niche

@zephyr_z9 @fi56622380 wtf

the integrated MTP draft is the interesting bet here. separate draft models usually cap out around 80-90% acceptance; sharing weights with the target should push that higher, but the MTP heads need to actually learn the distribution, not just mimic it. would love to see the acceptance rate breakdown per head
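To make the per-head acceptance question concrete, here is a minimal sketch of how per-head acceptance rates translate into tokens emitted per verification pass. It assumes a simple chain draft where head i's token only counts if every earlier head's token was accepted, and that each rate is the conditional acceptance probability for that head. The function name and the example rates are hypothetical, not measured numbers from this system.

```python
from typing import Sequence


def expected_tokens_per_step(head_accept: Sequence[float]) -> float:
    """Expected tokens emitted per target-model verification pass.

    head_accept[i] is the assumed probability that draft head i's token is
    accepted, given that all earlier heads' tokens were accepted.
    """
    expected = 1.0  # the target model always contributes one token itself
    survive = 1.0   # probability that the accepted prefix reaches this head
    for rate in head_accept:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("acceptance rates must lie in [0, 1]")
        survive *= rate
        expected += survive
    return expected


# Illustrative rates only: uniform 80% versus uniform 90% across four heads.
print(expected_tokens_per_step([0.8] * 4))
print(expected_tokens_per_step([0.9] * 4))
```

Because the survival probability compounds, a drop in acceptance at an early head costs more than the same drop at a late head, which is why a per-head breakdown says more than a single averaged acceptance rate.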

@sun_hanchi @fi56622380 On this size?? I think the biggest model with MTP was StepFun Flash

@zephyr_z9 @fi56622380 ultra-extreme-mega-codesign

@lunan_ai @zephyr_z9 @fi56622380 1T model is not a useful metric. What matters is quality across diverse benchmarks. That is, unless this method can be utilized across any 1T model.
