Chess model training cost scales cubically with model size: "raw cubic pain", without the PR spin.
#3 (the extra flops for generating training data) isn't really true for anything at or below ~10 gflops (lc0 model size), but I think it is true if you want to make a new big model.
the "2x more positions" point is subtle: in chess, like in LLMs, bigger models need less training to reach a given quality level. but a 2x bigger model means you can search 2x fewer nodes at inference, and for bigger models to be worth it you almost certainly need to train them to capacity, which means roughly 2x more positions.
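A minimal sketch of the "2x fewer nodes" part, assuming a fixed compute budget per move and ignoring search overhead. The budget value and the function name `nodes_per_move` are hypothetical, chosen only to illustrate the inverse relationship:

```python
def nodes_per_move(budget_flops: float, flops_per_eval: float) -> float:
    """Nodes a search can evaluate under a fixed compute budget (search overhead ignored)."""
    return budget_flops / flops_per_eval

# Hypothetical numbers: a 1e15-flop budget per move, ~10 gflops per eval (lc0-sized net).
small = nodes_per_move(1e15, 10e9)
big = nodes_per_move(1e15, 20e9)  # a 2x bigger model
print(small, big, small / big)  # -> 100000.0 50000.0 2.0
```

Doubling the per-eval cost halves the nodes searched, so the bigger net only pays off if each of its evals is correspondingly better.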
training cost for chess models is cubic in model flops: a 2x bigger model means (1) 2x more flops per position, (2) 2x more positions necessary, and (3) probably 2x more flops needed to generate the training data, so roughly 2 × 2 × 2 = 8x the cost.
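A back-of-envelope sketch of the accounting above, treating the three factors as multiplicative and each growing linearly with model size. The function and parameter names are hypothetical; `datagen_scales=False` models the small-net regime where the third factor doesn't really apply:

```python
def relative_training_cost(scale: float, datagen_scales: bool = True) -> float:
    """Training cost relative to a baseline, for a model `scale`x bigger in flops per position.

    Assumes each factor grows linearly with model size:
      - flops per training position          ~ scale
      - positions needed to train to capacity ~ scale
      - flops to generate the training data   ~ scale (only if datagen_scales is True)
    """
    flops_per_position = scale
    positions_needed = scale
    datagen_flops = scale if datagen_scales else 1.0
    return flops_per_position * positions_needed * datagen_flops


print(relative_training_cost(2.0))                        # -> 8.0
print(relative_training_cost(2.0, datagen_scales=False))  # -> 4.0
print(relative_training_cost(10.0))                       # -> 1000.0
```

Under these assumptions a 10x bigger model costs about 1000x as much to train, versus about 100x if data generation didn't have to scale too.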
the reasoning behind the last claim: for small models, just getting evals in the ballpark of correct is fine, because you'll be looking at tons of positions and can calibrate that way. But big models search far fewer nodes, so their evals need extremely precise calibration. Again, this is needed because otherwise you can't make real use of the big models.

@cis_female training scaling laws but without the beautiful PR spin
just raw cubic pain