@Placement
How do you usually figure out why a multi-GPU training run is slower than expected?
I've been bitten by this a few times recently and realized everyone seems to have a slightly different workflow. Thinking about the last time a multi-GPU (DDP / FSDP) training run was noticeably slower than you expected: what did you suspect first? How did you narrow it down?
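For context, my own first pass is usually just profiling a handful of steps and checking whether NCCL communication, compute, or the dataloader dominates. Roughly this kind of sketch, assuming a PyTorch DDP setup; `model`, `loader`, and `optimizer` are placeholders for whatever your training script already has, and the loss is a stand-in:

```python
# Minimal sketch: profile a few DDP training steps with torch.profiler
# and see what dominates the CUDA time table.
import torch
from torch.profiler import profile, schedule, ProfilerActivity

def train_step(model, batch, optimizer):
    optimizer.zero_grad(set_to_none=True)
    out = model(batch)
    loss = out.sum()  # placeholder loss, just to drive backward
    loss.backward()   # DDP's gradient allreduce overlaps with backward here
    optimizer.step()

def profile_steps(model, loader, optimizer, steps=8):
    total = 1 + 2 + steps  # wait + warmup + active iterations
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=schedule(wait=1, warmup=2, active=steps, repeat=1),
    ) as prof:
        for i, batch in enumerate(loader):
            train_step(model, batch, optimizer)
            prof.step()  # advances the wait/warmup/active schedule
            if i + 1 == total:
                break
    # Sort by total CUDA time: if NCCL allreduce ops sit near the top,
    # the run is likely communication-bound; large idle gaps or
    # dataloader-related CPU ops point at the input pipeline instead.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```

If the allreduce kernels dominate the table I start looking at the interconnect and bucket sizes; if the GPU is mostly idle between steps I look at the dataloader first. Curious whether others start from the profiler too, or from something coarser like step-time logging per rank.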