How do you usually figure out why a multi-GPU training run is slower than expected?
I've been bitten by this a few times recently and realized everyone seems to have a slightly different workflow. Thinking about the last time a multi-GPU (DDP/FSDP) training run was noticeably slower than you expected: what did you suspect first, and how did you narrow it down?
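For context, the kind of first pass I usually try is something like the sketch below (a minimal example, assuming PyTorch's `torch.profiler`; `train_step` is a placeholder for your own forward/backward/optimizer step, not anything standard):

```python
import torch
from torch.profiler import profile, ProfilerActivity, schedule

def profile_steps(train_step, num_steps=10):
    """Profile a few training steps on each rank and compare NCCL
    communication time against compute time.

    `train_step` is a hypothetical callable that runs one full
    forward/backward/optimizer step."""
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=schedule(wait=2, warmup=2, active=num_steps),
    ) as prof:
        for _ in range(2 + 2 + num_steps):
            train_step()
            prof.step()

    # Sort by total CUDA time. If NCCL all-reduce / all-gather kernels
    # dominate the top of the table, that points at communication
    # (interconnect, bucket sizes, stragglers) rather than raw compute.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
```

Curious whether others start from a profiler trace like this, or from coarser signals (GPU utilization, step-time variance across ranks) first.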
What separates data scientists who earn a good living (100k-200k) from those who earn 300k+ at FAANG?
Is it just stock options and vesting? Or is it just that FAANG is a lot more work? Why do some data scientists deserve that much? I work at a Fortune 500, and the ceiling for IC data scientists is around $200k unless you go into management, of course. But how and why do people make $500k a year?