6h ago
Cartwheel co-founder Andrew Carr says optimizing Microsoft's GB200 clusters for dropless MoE (mixture-of-experts) yielded efficiency gains of up to 1.69x.
Initial Model FLOPs Utilization (MFU) ranged from 16% to 22%.
Sentiment: 0% positive, 100% negative (1 comment)
Users criticized Microsoft for running GB200 GPUs at just 20% MFU, sarcastically pointing to the high water consumption that comes with such low utilization.