Nuno Sempere's analysis projects training a 10-trillion-parameter model requires a mean of 41,300 Nvidia B200 GPUs · Digg