5h ago

Leaked architecture details show Microsoft trained multiple 600B-parameter MoE models before finalizing its 1T-parameter MAI-Base-1

MAI-Base-1 uses 35B active parameters across 512 experts.

Sentiment: 0% positive, 100% negative (1 comment with sentiment)

Some users sarcastically dismissed Microsoft's training of three DeepSeek-V3-sized models, saying it had picked the wrong scale to emulate.