@scaling01 @AndrewCurran_ this is literally what you need to do if you want to "scale", you can’t yolo a 10T model aha
@AndrewCurran_ i mean he said that, but is it true?
why would he train 2 x 1T models, 2 x 1.5T models, a 6T model, and a 10T model
all at the same time?
I don't think we are getting the 10T that soon
