6h ago

Emad Mostaque says a single researcher can now manage several thousand GPUs for AI training with modern automation

Historically, one researcher managed 1,000 A100 GPUs.

238101413.6K

——0——

Original post

how many gpus do you think a single researcher can handle at once for a single big training job without any help? (assume it's set up with slurm or something)

12:34 AM · May 24, 2026

#167Emad@EMOSTAQUE

@DavidSHolz A few years ago we had 1000+ GPUs (so 128 nodes) fine on a100s by a single researcher

With h and b chips and new failover you should be able to get to thousands comfortable

On TPUs even more i would think

David@DavidSHolz

how many gpus do you think a single researcher can handle at once for a single big training job without any help? (assume it's set up with slurm or something)

7:34 AM · May 24, 2026 · 13.5K Views

1:21 PM · May 24, 2026 · 785 Views

#1709Lucas Nestler@CLASHLUKE

@EMostaque @DavidSHolz I once used ~90% of the cluster to tokenize videos, before getting kicked for hogging preemptible compute

Emad@EMostaque

@DavidSHolz A few years ago we had 1000+ GPUs (so 128 nodes) fine on a100s by a single researcher With h and b chips and new failover you should be able to get to thousands comfortable On TPUs even more i would think

1:21 PM · May 24, 2026 · 785 Views

2:00 PM · May 24, 2026 · 23 Views