That's not really how you treat your flagship, especially if you've already built confidence in training stability. I guess they might just not have the data? But this is pretty scary.
Ofc this is handwavy math, but if you think about it: a 910C is roughly 90% of an H800. So this is a cluster ≈10x larger than DeepSeek's when it trained V3 (55 days), for maybe a 3x larger pretrain. OK, there's FP8 vs FP16 mixed precision, CANN… But I can't see it taking more than 5-6 weeks. Meituan chose to stop at 35T. They could have gone to 50+.
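The back-of-envelope estimate can be written out as a tiny calculation. This is a rough sketch, not anyone's actual planning model: the inputs (55 days, ~10x cluster, ~90% per-chip throughput, ~3x compute) come from the post, while perfectly linear scaling with chip count and a 2x slowdown for losing FP8 are crude assumptions of mine. The function name `estimated_pretrain_days` is hypothetical.

```python
def estimated_pretrain_days(
    baseline_days: float,
    cluster_scale: float,
    per_chip_ratio: float,
    compute_multiple: float,
    efficiency_penalty: float = 1.0,
) -> float:
    """Scale a reference run's wall-clock time to a new cluster and compute budget.

    Assumes throughput scales linearly with chip count (optimistic).
    baseline_days: wall-clock time of the reference run (DeepSeek V3: ~55 days).
    cluster_scale: how many times more chips the new cluster has.
    per_chip_ratio: per-chip throughput relative to the reference chip.
    compute_multiple: how many times more training compute the new run needs.
    efficiency_penalty: slowdown factor (>= 1), e.g. no FP8, less mature software stack.
    """
    effective_speedup = cluster_scale * per_chip_ratio
    return baseline_days * compute_multiple * efficiency_penalty / effective_speedup


if __name__ == "__main__":
    # Ideal case: 10x chips at 90% each, 3x the compute of the reference run.
    ideal = estimated_pretrain_days(55, 10, 0.9, 3)
    print(f"ideal: {ideal:.1f} days ({ideal / 7:.1f} weeks)")  # ideal: 18.3 days (2.6 weeks)

    # Assumed 2x penalty, roughly FP16 instead of FP8 peak throughput.
    slow = estimated_pretrain_days(55, 10, 0.9, 3, efficiency_penalty=2.0)
    print(f"with 2x penalty: {slow:.1f} days ({slow / 7:.1f} weeks)")  # with 2x penalty: 36.7 days (5.2 weeks)
```

Even with the whole FP8 advantage thrown away, the result lands at about 5 weeks. That is the point of the "can't see it taking >5-6 weeks" claim. Real scaling efficiency below 100% would push it somewhat higher.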
