Leaked architecture details show Microsoft trained multiple 600B-parameter MoE models before finalizing its 1T-parameter MAI-Base-1 · Digg