Rohan Anil asks which deep learning architecture modifications count as hacks
Rohan Anil posted an open question asking which deep learning architecture modifications practitioners consider hacks. Jerry Tworek replied, singling out layer normalization. The CoreAutoAI account quoted the post and reframed the query around architecture modifications that are not viewed as hacks.
What are various deep learning architecture modifications you all consider hacks?
@willdepue Dropout can be looked at as an extra gradient step. It's a more interesting way to look at it
@_arohan_ dropout, and honestly most forms of regularization
@_arohan_ Layer norm 100%
@_arohan_ Clipping, any kind of clipping (ok maybe ReLU is fine)
@_arohan_ the part where we use anything other than the hypercomputer that runs every possible program, eliminating the ones that don't match the data, and producing an output distribution by taking a weighted average of the program outputs weighted by 2^-length
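For readers skimming the thread, a minimal NumPy sketch of the three modifications named above: dropout, layer normalization, and norm-based gradient clipping. The function names, shapes, and hyperparameters here are illustrative choices, not anything from the thread itself.

```python
import numpy as np

def dropout(x, p=0.5, rng=None):
    # Zero each activation with probability p, then scale survivors by
    # 1/(1-p) so the expected activation is unchanged (inverted dropout).
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance across features.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def clip_by_norm(g, max_norm=1.0):
    # Rescale the gradient if its L2 norm exceeds max_norm.
    norm = np.linalg.norm(g)
    return g if norm <= max_norm else g * (max_norm / norm)

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(layer_norm(x))                                   # zero-mean, unit-variance row
print(clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0))  # norm 5 rescaled to 1
```

All three are inference-free tricks bolted onto the forward or backward pass rather than derived from a loss, which is roughly the sense in which the thread calls them hacks.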