Researcher Speculates Brain Regularizers Could Yield Major AI Efficiency Gains
i have a hunch that the human brain has a bunch of ridiculous regularizers and priors about priors, which if integrated would be a substantial compute efficiency win and potentially more
e.g. i think circuit-sparse transformers (L0 penalty) just might be more data-efficient
imagine i give you two models with the same held-out test perplexity, but one has 3x as many nonzero weights as the other. which would you choose? the one with the shorter description length (MDL), right?
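a rough way to put numbers on that comparison, as a sketch rather than anything from the original post: treat the description length of a model as the bits needed to say which weights are nonzero plus the bits for their values. the coding scheme, the function names (`log2_binomial`, `model_bits`), the 16 bits per weight, and the parameter counts are all assumptions made up for illustration. since both models reach the same test perplexity, the cost of encoding the data given the model is the same, so only the model term differs.

```python
import math

def log2_binomial(n, k):
    """log2 of C(n, k), computed via lgamma so large n stays cheap."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

def model_bits(n_total, n_nonzero, bits_per_weight=16):
    """Crude description length of a sparse model, in bits.

    Assumed coding scheme (hypothetical, for illustration only):
    first name the set of nonzero positions, then store each nonzero
    value at a fixed precision.
    """
    support_bits = log2_binomial(n_total, n_nonzero)  # which weights are nonzero
    value_bits = n_nonzero * bits_per_weight          # what their values are
    return support_bits + value_bits

# Two hypothetical models with the same total parameter count,
# one with 3x as many nonzero weights as the other.
N_TOTAL = 1_000_000
sparse_bits = model_bits(N_TOTAL, 100_000)
dense_bits = model_bits(N_TOTAL, 300_000)

print(f"sparse model: {sparse_bits:,.0f} bits")
print(f"dense model:  {dense_bits:,.0f} bits")
print(f"ratio:        {dense_bits / sparse_bits:.2f}x")
```

under this particular code the model with fewer nonzero weights is cheaper to describe, which is the MDL intuition in the post. a different coding scheme would change the exact numbers.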
Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite-precision nets can store infinite information with finite (small) weights. https://arxiv.org/abs/2605.10878
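The fixed-precision caveat can be illustrated with a toy example. This is a sketch of the general idea only, not the construction used in the preprint: a single exact rational "weight" in [0, 1) can carry an arbitrarily long bit string in its binary expansion, so its magnitude says nothing about how much information it holds. Rounding to a fixed number of binary digits caps the payload. The helper names (`encode_bits`, `decode_bits`, `quantize`) and the example message are hypothetical.

```python
from fractions import Fraction

def encode_bits(bits):
    """Pack a list of bits into one exact rational in [0, 1):
    w = sum_i bits[i] * 2^-(i+1)."""
    w = Fraction(0)
    for i, b in enumerate(bits):
        w += Fraction(b, 2 ** (i + 1))
    return w

def decode_bits(w, n):
    """Read back the first n binary digits of w."""
    out = []
    for _ in range(n):
        w *= 2
        bit = int(w >= 1)
        out.append(bit)
        w -= bit
    return out

def quantize(w, precision_bits):
    """Round a nonnegative w down to a fixed number of binary digits
    (a stand-in for a fixed-precision weight)."""
    scale = 2 ** precision_bits
    return Fraction(int(w * scale), scale)

message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]

# Unbounded precision: one weight below 1 stores the whole message.
w = encode_bits(message)
recovered = decode_bits(w, len(message))

# Fixed precision: only the first 4 bits survive, the rest read as zeros.
w4 = quantize(w, 4)
recovered_4bit = decode_bits(w4, len(message))

print(float(w), recovered == message)
print(float(w4), recovered_4bit)
```

Making the message longer never pushes `w` past 1, which is why a norm bound alone cannot limit information content unless precision is also bounded.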