Microsoft details MAI-Base-1 pretraining, revealing it excluded all synthetic data and open-source datasets · Digg