
Research Shows 2-Layer Models Memorize Graphs With Minimal Eigendirections


Also added clearer plots showing something unexplained by any existing theory about word2vec/node2vec models: a wide 2-layer model under CE loss somehow uses only a minimal set of eigendirections to memorize a graph even w/o regularization! (e.g., 2 directions for cycle graph)

9:21 AM · May 26, 2026
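To make the cycle-graph example concrete, here is a minimal sketch of why two eigendirections can be enough to store every edge of a cycle. It is not the paper's model or training setup: it assumes the two directions are the lowest-frequency non-constant eigenvectors of the cycle's adjacency matrix (the cosine/sine pair), and all function names are hypothetical. Each node gets a 2-D embedding, and its neighbors are read back as the two other nodes with the largest dot product.

```python
import math


def cycle_adjacency(n):
    """Adjacency matrix of the n-node cycle graph, as nested lists."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][(i + 1) % n] = 1
        A[i][(i - 1) % n] = 1
    return A


def fourier_pair(n):
    """Cosine/sine pair: two eigenvectors of the cycle's adjacency matrix."""
    c = [math.cos(2 * math.pi * i / n) for i in range(n)]
    s = [math.sin(2 * math.pi * i / n) for i in range(n)]
    return c, s


def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def is_eigenvector(A, v, lam, tol=1e-9):
    """Check A v == lam * v entrywise, up to a small tolerance."""
    return all(abs(av - lam * x) <= tol for av, x in zip(matvec(A, v), v))


def neighbors_from_embedding(n):
    """Decode each node's neighbors from the 2-D embedding alone."""
    c, s = fourier_pair(n)
    decoded = []
    for i in range(n):
        # The dot product of nodes i and j equals cos(2*pi*(i - j)/n),
        # which is largest (for j != i) exactly at the two cycle neighbors.
        scores = [(c[i] * c[j] + s[i] * s[j], j) for j in range(n) if j != i]
        scores.sort(reverse=True)
        decoded.append(sorted(j for _, j in scores[:2]))
    return decoded


if __name__ == "__main__":
    n = 12
    A = cycle_adjacency(n)
    c, s = fourier_pair(n)
    lam = 2 * math.cos(2 * math.pi / n)  # shared eigenvalue of the pair
    print(is_eigenvector(A, c, lam), is_eigenvector(A, s, lam))
    print(neighbors_from_embedding(n)[:3])
```

The sketch only shows that two directions suffice to encode a cycle. Why a wide model trained with CE loss and no regularization would settle on such a minimal set is the part the post describes as unexplained.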

We also studied how different hyperparameters (lr, wd, initialization) in training affect whether the model memorizes geometrically or associatively. It seems like there are various knobs that make the model memorize. Hope this helps as a starting point for some theory!

Vaishnavh Nagarajan @_vaishnavh

4:21 PM · May 26, 2026 · 94 Views

We also tried our best to make the paper as short and load-able as we could...

tagging co-authors @ShNoroozi (who led the work) and @ElanRosenfeld

Meet us at ICML if you want to chat!

arxiv.org
Deep sequence models tend to memorize geometrically; it is unclear why
Deep sequence models are said to store atomic facts predominantly in the form of associative memory: a brute-force lookup of co-occurring entities. We identify a dramatically different form of...