Rohan Anil asks which deep learning architecture modifications count as hacks
Rohan Anil posted an open question asking which deep learning architecture modifications practitioners consider hacks. Jerry Tworek replied, singling out layer normalization. The CoreAutoAI account quoted the post and reframed the query around architecture modifications that are not viewed as hacks.
What are various deep learning architecture modifications you all consider hacks?
@willdepue Dropout can be looked at as an extra gradient step. It's a more interesting way to look at it
@_arohan_ dropout, and honestly most forms of regularization
@_arohan_ Layer norm 100%
@_arohan_ Clipping, any kind of clipping (ok maybe ReLU is fine)
@_arohan_ the part where we use anything other than the hypercomputer that runs every possible program, eliminating the ones that don't match the data, and producing an output distribution by taking a weighted average of the program outputs weighted by 2^-length
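For readers skimming the thread, a minimal NumPy sketch of the three modifications named above: dropout, layer normalization, and norm-based gradient clipping. The function names, shapes, and hyperparameters here are illustrative choices, not anything from the thread itself.

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    # Zero each activation with probability p, then scale survivors by
    # 1/(1-p) so the expected activation is unchanged (inverted dropout).
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance across features.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def clip_by_norm(g, max_norm=1.0):
    # Rescale the gradient if its L2 norm exceeds max_norm.
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(layer_norm(x))                                   # zero-mean, unit-variance row
print(clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0))  # norm 5 rescaled to 1
```

All three are inference-free tricks bolted onto the forward or backward pass rather than derived from a loss, which is roughly the sense in which the thread calls them hacks.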