Meituan open-sources LongCat 2.0, a 1.6-trillion-parameter MoE model with 48 billion active parameters

Story Overview

Meituan has released the weights for LongCat 2.0 under an MIT license, giving the community access to a 1.6-trillion-parameter Mixture-of-Experts model that activates roughly 48 billion parameters per token and carries 135 billion n-gram embedding parameters.
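
For a rough sense of what hosting the weights involves, the arithmetic below turns the announced 1.6T total into storage sizes at a few bit widths. It is a back-of-the-envelope sketch, not a deployment guide: the 80 GB device size and the helper names are illustrative assumptions, the precision of the released checkpoint is not stated here, and the numbers count weights only (no KV cache, activations, or runtime overhead).

```python
# Back-of-the-envelope weight storage for a 1.6T-parameter checkpoint.
TOTAL_PARAMS = 1_600_000_000_000   # 1.6T total parameters, from the announcement
DEVICE_BYTES = 80 * 10**9          # ASSUMPTION: a hypothetical 80 GB accelerator


def weight_bytes(params: int, bits_per_param: int) -> int:
    """Bytes needed to store `params` weights at the given bit width."""
    return params * bits_per_param // 8


def devices_needed(total_bytes: int, device_bytes: int = DEVICE_BYTES) -> int:
    """Minimum number of devices whose combined memory holds the weights."""
    return -(-total_bytes // device_bytes)  # ceiling division on integers


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_bytes(TOTAL_PARAMS, bits)
        print(f"{bits:>2}-bit: {size / 1e12:.1f} TB, at least {devices_needed(size)} devices")
```

Even at 4 bits per weight the checkpoint does not fit on a single device of that size, so multi-device serving is the realistic baseline for anyone testing the model.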

Original post

elie@eliebakouch

longcat 2.0 (1.6T, ~48B active) weights are now open under MIT license

interesting design choice compared to previous iteration is that they keep the same attention shape and REDUCE the number of experts from 256 to 128 (i would have expected the opposite?). they also have 135B embedding parameters with n-gram

https://huggingface.co/meituan-longcat/LongCat-2.0

1:29 AM · Jul 5, 2026 · 606 Views

Developer Impact

How the expert count and attention changes affect use

The model cuts the number of zero communication experts from 256 to 128 (the poster later clarified that this, not the overall expert pool, is what changed) while keeping the previous attention shape and adding LongCat Sparse Attention variants. Practitioners can test on their own hardware whether these changes improve long-context throughput.
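
The general sparsity argument made in the thread (more total parameters at the same active count) can be put in numbers. The sketch below holds the announced ~48B active parameters fixed and varies the total. Only the 1.6T row reflects LongCat 2.0; the other totals are hypothetical, and the figure of 2 FLOPs per active parameter is a common rule of thumb that ignores attention cost.

```python
# Sparsity vs. per-token compute at a fixed active parameter count.
ACTIVE_PARAMS = 48_000_000_000  # ~48B active per token, from the announcement


def moe_profile(total_params: int, active_params: int = ACTIVE_PARAMS) -> dict:
    """Return the active fraction and a rough forward-pass cost per token."""
    return {
        "active_fraction": active_params / total_params,
        # Rule of thumb: ~2 FLOPs per active parameter per token (assumption).
        "approx_forward_flops_per_token": 2 * active_params,
    }


if __name__ == "__main__":
    # Only 1.6T matches the release; 0.8T and 3.2T are hypothetical comparisons.
    for total in (800_000_000_000, 1_600_000_000_000, 3_200_000_000_000):
        p = moe_profile(total)
        print(
            f"total={total / 1e12:.1f}T  "
            f"active={p['active_fraction']:.1%}  "
            f"flops/token~{p['approx_forward_flops_per_token']:.2e}"
        )
```

Under this approximation, per-token compute stays constant as the total grows, while weight storage and expert-routing traffic scale with the total. That is the "good infra" caveat raised in the thread.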

Open Question

What remains unknown about real-world performance

The reported benchmarks come mainly from internal runs on agentic and coding tasks, so it remains open how the model performs under independent evaluation, or outside the Chinese training stack, which used only domestic ASICs.

Sentiment

Users thank Meituan for releasing the LongCat 2.0 model weights under an MIT license and celebrate the release.

Positive: 100.0% · Negative: 0.0% (based on 2 comments with sentiment)

Posts from X

Views: 1.5K · Bookmarks: 9 · Likes: 32 · Replies: 5

elie@eliebakouch

longcat 2.0 (1.6T, ~48B active) weights are now open under MIT license

interesting design choice compared to previous iteration is that they keep the same attention shape and REDUCE the number of zero communication experts from 256 to 128 (i would have expected the opposite?). they also have 135B embedding parameters with n-gram

https://huggingface.co/meituan-longcat/LongCat-2.0

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@eliebakouch maybe it's a complex tradeoff with N-grams or maybe zero-comm experts are not working that well

Florian Brand@xeophon

@eliebakouch maybe this time it will stay up

elie@eliebakouch

uh meant the number of zero communication expert (just modified the post thanks), but in general scaling the sparsity is interesting since you get more total param with the same active count, which is better in pre training and inference assuming you have a good infra lol (similar to why people do MoE over dense model)

Noé Flandre@NoeFlandre

@eliebakouch Hey Elie! Could you please explain why scaling the number of experts would have been interesting here? What’s the impact of making ur model sparser?

Noé Flandre@NoeFlandre

@eliebakouch Got it, thanks a lot!

elie@eliebakouch

@xeophon the weights weren't on the previous repo iirc

elie@eliebakouch

@teortaxesTex i don't think it's related to ngram, but i agree with the second point, my guess is that larger scale leads to more instability and this zero expert thing is sensitive to that

filipe@filicroval

@eliebakouch finally!

Treff@0xTreff

@eliebakouch 36B active already seems wild for a draft

curious how reducing the zero comm experts affects the math for inference though

elie@eliebakouch

@NoeFlandre (assuming those total param that you add lead to more intelligence)

Vlad@TheVladSavinov

@eliebakouch "num_layers": 38 👀

Strata@ChainZenit

@eliebakouch wild that they cut the experts, very curious why they did that

Strata@ChainZenit

@eliebakouch that shift in the expert count is honestly such a trip.
