Prime Intellect researcher argues memorization preconditions model learning

Kalomaze at Prime Intellect stated that memorization functions as a precondition for how much language models can learn. Without first internalizing the gist of research papers or similar material, he said, models treat shorthand references appearing elsewhere in training data as non sequiturs during next-token prediction. Cody Blakeney replied that many resist this view because it conflicts with a preference for pure-reasoning approaches built on very small models, and added that memorization and reasoning appear to be intertwined in more complex ways than is often accepted.

Original post

i think people underestimate the value of memorization as a precondition for how much you can learn for example: if a model hasn't already memorized the gist of some research paper, any shorthand reference to it somewhere else in the data is nonsequitur-ish when doing NTP over it

5:18 PM · May 17, 2026

Cody Blakeney @code_star

@kalomaze A lot of people don’t want to believe this is true, because they want a world of pure reasoning super small models. It does seem that memorization and reasoning are intertwined in more complex ways than people want to imagine though.

1:56 AM · May 18, 2026 · 281 Views

@code_star @kalomaze if it were true, we would have had neuro-symbolic models already. memorization is necessary

3:50 AM · May 18, 2026 · 47 Views

kalomaze @kalomaze

memorization of things that exist stably in the world of composed abstractions is, in some fundamental way, actually necessary in order to be able to apply reasoning as a device to those things ergo, to reason about our world, you have to be grounded in what exists in it

12:28 AM · May 18, 2026 · 1.4K Views

kalomaze @kalomaze

it's true that some data is only useful for verbatim-recall parlor trick memorization. but i think most of what lms memorize is proportional to its broader utility. i.e a work of fiction gets remembered proportional to how much remembering it helps predict how people reference it

12:37 AM · May 18, 2026 · 991 Views
