
Meituan Trains 1.6T MoE on 25K Ascend 910C Cluster in Weeks


Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

That's not really how you treat your flagship, especially if you've already developed confidence in training stability. I guess they might just not have the data? But this is pretty scary. Handwavy math, of course.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

If you think about it: a 910C is roughly 90% of an H800. So this is a cluster ≈10x larger than DeepSeek's at the time of training V3 (55 days), for maybe a 3x pretrain. OK, mixed FP8 vs FP16, CANN… But I can't see it taking >5-6 weeks. Meituan chose to stop at 35T. They could have gone to 50T+.

12:43 AM · Jul 3, 2026 · 2K Views
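The handwavy math in the post can be reproduced as a rough sketch. The 55-day figure, the 90% per-chip ratio, the 25K chip count, the 35T tokens, and the ~48B active parameters come from this page; V3's cluster size (2,048 accelerators), token count (14.8T), and active parameters (37B) are assumptions taken from DeepSeek's public V3 report, and `slowdown` is a hypothetical fudge factor for FP16-vs-FP8 and software-stack overhead. Compute is approximated as proportional to active parameters times tokens, which ignores attention cost and utilization differences.

```python
# Back-of-envelope training-time estimate. Every input is approximate;
# values marked "assumption" do not appear in the posts themselves.

V3_DAYS = 55.0        # from the post: V3 pretraining duration
V3_CHIPS = 2048       # assumption: accelerators used for V3
V3_TOKENS_T = 14.8    # assumption: V3 pretraining tokens, in trillions
V3_ACTIVE_B = 37.0    # assumption: V3 active parameters, in billions

LC_CHIPS = 25_000     # from the headline: Ascend 910C count
LC_TOKENS_T = 35.0    # from the post: LongCat pretraining tokens, in trillions
LC_ACTIVE_B = 48.0    # from the announcement: ~48B active parameters
CHIP_RATIO = 0.9      # from the post: one 910C is roughly 90% of V3's chip


def compute_ratio() -> float:
    """LongCat pretraining compute relative to V3 (active params x tokens)."""
    return (LC_ACTIVE_B * LC_TOKENS_T) / (V3_ACTIVE_B * V3_TOKENS_T)


def cluster_ratio() -> float:
    """LongCat cluster throughput relative to V3's cluster."""
    return (LC_CHIPS * CHIP_RATIO) / V3_CHIPS


def estimated_days(slowdown: float = 1.0) -> float:
    """Scale V3's wall-clock time; `slowdown` is a hypothetical overhead factor."""
    return V3_DAYS * compute_ratio() / cluster_ratio() * slowdown


if __name__ == "__main__":
    print(f"compute ratio: {compute_ratio():.2f}x")
    print(f"cluster ratio: {cluster_ratio():.1f}x")
    print(f"ideal: {estimated_days():.1f} days")
    print(f"with 2x slowdown: {estimated_days(2.0) / 7:.1f} weeks")
```

Under these assumptions the pretrain is about 3x V3's compute on a cluster about 11x larger, so even a 2x slowdown stays under the 5-6 week ceiling the post mentions.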


Related links

Introducing LongCat-2.0

longcat.chat

Posts from X


Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

Weights of LongCat are out on HF. The largest known pretraining on non-Western chips. If AmeriKKKan hawks are not just cope merchants, they'll see to it that this thing is lovingly evaluated, with a focus on what this says about the Huawei stack (and not Meituan's data expertise).

Meituan LongCat @Meituan_LongCat

Introducing LongCat-2.0 🐱
1.6T parameters · MoE with ~48B active · 1M context
The full model behind Owl Alpha on @OpenRouter, now available.

Built for agentic coding from the ground up:
◆ LongCat Sparse Attention (LSA): scales efficiently for 1M-context tokens
◆ Zero-Compute Experts: dynamic activation 33B–56B per token, zero wasted compute
◆ MOPD: three specialized expert groups (Agent / Reasoning / Interaction), gate-routed per task

How it stacks up:
→ Terminal-Bench 2.1: 70.8
→ SWE-bench Pro: 59.5 (GPT-5.5: 58.6)
→ SWE-bench Multilingual: 77.3
→ FORTE: 73.2 · RWSearch: 78.8 · BrowseComp: 79.9

📖 Tech Blog: https://longcat.chat/blog/longcat-2.0/
Try it across different scenarios 🧵👇


Gary Hadida @GaryHadida

@teortaxesTex Shoutout kunlunxin
