5h ago
MAI-Base-1 Training Uses AdamW With Custom Betas, Dropout, And FP8 Compute
Sentiment: 100% positive, 0% negative (1 comment)
Users appreciate the transparency on loss spikes during MAI-Base-1 training, which the report attributes to high expert imbalance in coding datasets.