9h ago

Microsoft details MAI-Base-1 pretraining, revealing it excluded all synthetic data and open-source datasets

The pipeline filtered 1.5 trillion crawled pages for quality.

Sentiment: 100% positive, 0% negative (2 comments). Commenters approve of Microsoft training MAI-Base-1 without LLM-generated or open-source data, saying the approach makes total sense and offers positive expected value.