longcat 2.0 (1.6T total, ~48B active) weights are now open under the MIT license
an interesting design choice compared to the previous iteration is that they keep the same attention shape and REDUCE the number of experts from 256 to 128 (i would have expected the opposite?). they also have 135B embedding parameters with n-gram embeddings
https://huggingface.co/meituan-longcat/LongCat-2.0
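
quick back-of-the-envelope on the numbers above, as a rough sketch only. it uses just the three figures quoted in the post and assumes (not confirmed by the model card) that the 135B embedding parameters are counted inside the 1.6T total. variable names are made up for illustration.

```python
# figures quoted in the post; "~48B active" is approximate
TOTAL_PARAMS = 1.6e12      # 1.6T total parameters
ACTIVE_PARAMS = 48e9       # ~48B active per token
EMBEDDING_PARAMS = 135e9   # 135B embedding parameters

def share(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole

active_share = share(ACTIVE_PARAMS, TOTAL_PARAMS)
embedding_share = share(EMBEDDING_PARAMS, TOTAL_PARAMS)

# ASSUMPTION: embeddings are included in the 1.6T total
non_embedding = TOTAL_PARAMS - EMBEDDING_PARAMS

print(f"active per token: {active_share:.1%} of total")
print(f"embedding tables: {embedding_share:.1%} of total")
print(f"non-embedding params: {non_embedding / 1e12:.3f}T")
```

so roughly 3% of the weights are touched per token, and under that assumption the embedding tables are a bit over 8% of the total.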





