Prime Intellect researcher argues memorization is a precondition for model learning
Kalomaze of Prime Intellect argued that memorization is a precondition for how much a language model can learn. If a model has not already internalized the gist of a research paper or similar material, he said, shorthand references to it elsewhere in the training data read as near non sequiturs during next-token prediction (NTP). Cody Blakeney replied that many people resist this view because they want a world of pure-reasoning, very small models, and that memorization and reasoning appear to be intertwined in more complex ways than people want to imagine.
@code_star @kalomaze if it were true, we would have had neuro-symbolic models already. memorization is necessary
@kalomaze A lot of people don’t want to believe this is true, because they want a world of pure reasoning super small models. It does seem that memorization and reasoning are intertwined in more complex ways than people want to imagine though.
i think people underestimate the value of memorization as a precondition for how much you can learn. for example: if a model hasn't already memorized the gist of some research paper, any shorthand reference to it somewhere else in the data is nonsequitur-ish when doing NTP over it
memorization of things that exist stably in the world of composed abstractions is, in some fundamental way, actually necessary in order to be able to apply reasoning as a device to those things. ergo, to reason about our world, you have to be grounded in what exists in it
it's true that some data is only useful for verbatim-recall parlor trick memorization. but i think most of what lms memorize is proportional to its broader utility. i.e. a work of fiction gets remembered proportional to how much remembering it helps predict how people reference it