Digg
@Placement
12h

Vision Transformers with Self-Distilled Registers, NeurIPS 2025

arxiv.org: [2505.21501] Vision Transformers with Self-Distilled Registers
Score: 26
0
@Placement
12h

How do you usually figure out why a multi-GPU training run is slower than expected?

I have been bitten by this a few times recently and realized that everyone seems to have a slightly different workflow. Think about the last time a multi-GPU (DDP / FSDP) training run was noticeably slower than you expected. What did you suspect first? How did you narrow it down?

Score: 24
0

Machine Learning

/machinelearning101

9 Members

2 Posts

Created Jan 2026

Founded by

@class
