Every loss spike is your model telling you something. Listen carefully
CoreAutoAI co-founder Rohan Anil argues that every training loss spike indicates an underlying issue and must be investigated.
Academic Ravid Shwartz-Ziv suggests selectively ignoring some alerts.
Supportive replies agree that loss spikes during training merit attention for the insights they may offer, while skeptical replies caution that spikes often stem from corrupted samples and that chasing them wastes debugging time.

@0x_lun There are no bad batches. It's just missing stoicism in the model

@_arohan_ True. https://blog.christianperone.com/2019/08/listening-to-the-neural-network-gradient-norms-during-training/
Cc @tarantulae
What if models are like kids, and you don't want to listen to them all the time 🤔

@_arohan_ most of the time it's just a bad batch and everyone panics anyway

@0x_lun One cannot build an antifragile company without antifragile models

@_arohan_ telling me I forgot max_grad_norm: 1

@_arohan_ Isn't every loss drop too?

@_arohan_ That I forgot to clip my gradients? 😛

@_arohan_ stoicism is a fun frame until the spike is a corrupted sample and you wasted three hours debugging your learning rate

@_arohan_ also: every flat plateau is your model agreeing with your overfitting

@_arohan_ sometimes the model is just screaming noise into the void tho
how do u tell the difference?
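
One rough way to approach that last question, sketched in plain Python: flag steps where the loss jumps well above its recent median, then check whether it falls back within a few steps. A spike that recovers quickly is more consistent with a single bad batch, while one that persists hints at something like the learning rate or missing gradient clipping. This is only a heuristic, not something anyone in the thread proposed; the function names, `window`, `ratio`, and `patience` are all hypothetical choices for illustration.

```python
from collections import deque
from statistics import median


def find_loss_spikes(losses, window=20, ratio=2.0):
    """Return steps whose loss exceeds `ratio` x the median of the previous `window` losses."""
    history = deque(maxlen=window)
    spikes = []
    for step, loss in enumerate(losses):
        # Only judge a step once a full window of history is available.
        if len(history) == window and loss > ratio * median(history):
            spikes.append(step)
        history.append(loss)
    return spikes


def recovers_quickly(losses, step, window=20, ratio=2.0, patience=5):
    """True if the loss drops back under the spike threshold within `patience` steps.

    Assumes `step` came from find_loss_spikes with the same `window`,
    so at least one earlier loss exists to form the baseline.
    """
    baseline = median(losses[max(0, step - window):step])
    after = losses[step + 1:step + 1 + patience]
    return any(loss <= ratio * baseline for loss in after)


# Synthetic curves (made up for illustration): one isolated spike, one sustained jump.
one_bad_batch = [1.0] * 30 + [5.0] + [1.0] * 10
sustained_jump = [1.0] * 30 + [5.0] * 10

bad_batch_spikes = find_loss_spikes(one_bad_batch)
sustained_spikes = find_loss_spikes(sustained_jump)
```

Logging the batch index alongside each flagged step makes it possible to pull up the exact samples afterwards, which is the cheapest way to rule a corrupted sample in or out before touching the learning rate.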