ML Engineer Suggests Training Networks On GPUs And TPUs Simultaneously
it's plausible that this can be solved after pretraining, but exploring the sub-level sets can be pretty hard (they are disconnected, and I'd expect numerics to be one of the biggest differences across basins)
one simple, but insane, version of this: train on both GPUs and TPUs, sum the gradients
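A rough way to picture the idea, without real accelerators: the sketch below is a toy simulation only. It assumes the "GPU" and "TPU" differ solely in rounding (float32 versus bfloat16), which is a simplification introduced here and not something the post states. All names (`to_f32`, `to_bf16`, `grad`, `combined_step`, `train`) are hypothetical, and the model is a one-parameter linear fit.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bf16(x):
    """Round to bfloat16: keep the top 16 bits of the float32 encoding,
    using round-to-nearest-even (finite inputs only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def grad(w, xs, ys, rnd):
    """Gradient of mean squared error for y = w * x, with every
    intermediate result rounded by `rnd` to mimic one backend's numerics."""
    total = 0.0
    for x, y in zip(xs, ys):
        err = rnd(rnd(w * x) - y)
        total = rnd(total + rnd(2.0 * err * x))
    return rnd(total / len(xs))

def combined_step(w, xs, ys, lr=0.05):
    """One SGD step using the sum of both simulated backends' gradients.
    Summing (rather than averaging) doubles the effective step size."""
    g_a = grad(w, xs, ys, to_f32)   # stand-in for one device's numerics
    g_b = grad(w, xs, ys, to_bf16)  # stand-in for the other device's numerics
    return w - lr * (g_a + g_b), g_a, g_b

def train(w, xs, ys, steps=50, lr=0.05):
    """Repeat the combined step a fixed number of times."""
    for _step in range(steps):
        w, _g_a, _g_b = combined_step(w, xs, ys, lr)
    return w

if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # data generated by w = 2
    print(combined_step(0.0, xs, ys))
    print(train(0.0, xs, ys))
```

The two gradients disagree slightly at the same weights, so the summed update is pulled toward a point both numeric regimes accept. Whether that carries over to real networks on real hardware is exactly the open question in the post.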
7:50 PM · May 22, 2026 · 238 Views