DeepSeek is underrated still. There are very few labs that have even tried pretraining anything substantially larger than V4. Their struggles with getting it to work, on top of all the inane architecture tricks, make sense. In their situation, OpenAI would still be doing 671B.
Kimi: 149% of a derisked DeepSeek architecture
Zhipu: 110% of a derisked DeepSeek architecture
Minimax: 63% of a simpler architecture than DS-V3.2
DeepSeek: 238% of an insane alien murder clown architecture
This takes… conviction
I don't know about Qwen-Max. It shows Alibaba moves on a very aggressive schedule now, and they get credit for doing their own thing and succeeding.