Tech · 4h ago

CoreAutoAI co-founder Rohan Anil argues every training loss spike indicates underlying issues and must be investigated

Academic Ravid Shwartz Ziv suggests selectively ignoring some alerts.

Original post

rohan anil@_arohan_ · #102 in Tech

Every loss spike is your model telling you something. Listen carefully

10:12 AM · Jun 13, 2026 · 5K Views

Sentiment

Supportive replies agree that loss spikes during model training merit attention because they can reveal real problems, while skeptical replies caution that spikes often stem from corrupted samples and that chasing them wastes debugging time.

Pos: 50.0%

Neg: 50.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 263 · Likes: 6 · Replies: 2

rohan anil@_arohan_

@0x_lun There are no bad batches. It's just missing stoicism in the model

4h2636

Bookmarks: 1

Vipul Sharma@VipulS_1

@_arohan_ True. https://blog.christianperone.com/2019/08/listening-to-the-neural-network-gradient-norms-during-training/

Cc @tarantulae

2h6521

Ravid Shwartz Ziv@ziv_ravid

What if models are like kids, and you don't want to listen to them all the time 🤔

rohan anil@_arohan_

Every loss spike is your model telling you something. Listen carefully

1h24910

Lunari@0x_lun

@_arohan_ most of the time it's just a bad batch and everyone panics anyway

4h2201

rohan anil@_arohan_

@0x_lun One cannot build an antifragile company without antifragile models

4h1653

ShitCockaSays@batcz

@_arohan_ telling me I forgot max_grad_norm: 1

3h111

radah@the_radah

@_arohan_ Isn't every loss drop too?

3h108

pushinproto@pushinproto

@_arohan_ That I forgot to clip my gradients? 😛

3h107
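
For readers unfamiliar with the gradient clipping these replies joke about: the `max_grad_norm: 1` in the reply above reads like a config setting that caps the global gradient norm at 1. Below is a minimal sketch of clipping by global norm in plain Python, not taken from any poster's code. The function name `clip_by_global_norm`, the nested-list gradient layout, and the `eps` constant are all assumptions made for illustration.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Rescale all gradients together so their joint L2 norm is at most max_norm.

    grads is a hypothetical layout: one list of floats per parameter tensor.
    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # A scale of 1.0 leaves small gradients untouched; larger ones shrink proportionally.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm

# Usage: a gradient with norm 5 is scaled down to roughly norm 1.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

Logging `norm_before` at every step is cheap, and a jump in that value is often visible at or before the step where the loss spikes.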

Lunari@0x_lun

@_arohan_ stoicism is a fun frame until the spike is a corrupted sample and you wasted three hours debugging your learning rate

4h15

betraidx@betraidx

@_arohan_ also: every flat plateau is your model agreeing with your overfitting

3h2

Alex YGift@Radipdegen

@_arohan_ sometimes the model is just screaming noise into the void tho

how do u tell the difference?

4h2
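
One possible way to approach the question above, sketched under stated assumptions rather than drawn from anyone in the thread: flag steps whose loss sits far above a trailing median, then check whether the flagged steps are isolated (more consistent with a single bad batch) or form a long consecutive run (more consistent with a sustained problem such as divergence). The names `find_spikes` and `longest_run`, the window size, and the threshold `k` are hypothetical and would need tuning for a real run.

```python
from collections import deque
from statistics import median

def find_spikes(losses, window=20, k=6.0):
    """Return the step indices whose loss exceeds trailing median + k * MAD."""
    history = deque(maxlen=window)
    spikes = []
    for step, loss in enumerate(losses):
        if len(history) == window:
            med = median(history)
            mad = median([abs(x - med) for x in history])
            if loss > med + k * max(mad, 1e-8):
                spikes.append(step)
                continue  # keep flagged values out of the baseline
        history.append(loss)
    return spikes

def longest_run(steps):
    """Length of the longest run of consecutive step indices."""
    best = run = 0
    prev = None
    for s in steps:
        run = run + 1 if prev is not None and s == prev + 1 else 1
        best = max(best, run)
        prev = s
    return best

# Usage: a slowly decreasing loss with one outlier at step 30.
losses = [1.0 - 0.001 * i for i in range(50)]
losses[30] = 5.0
spikes = find_spikes(losses)
isolated = longest_run(spikes) == 1
```

An isolated flag is a reason to log and inspect the offending batch before touching the learning rate. This heuristic cannot say why a spike happened; it only separates one-off outliers from persistent shifts.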