ETH Zürich's Tiberiu Mușat proves that fixed-precision neural network weight norm is equivalent to Kolmogorov complexity
The equivalence explains why deep networks generalize rather than memorize.
As prophesied by the venerable I. Sutskever
Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights. https://arxiv.org/abs/2605.10878