5h ago

ETH Zürich's Tiberiu Mușat proves that fixed-precision neural network weight norm is equivalent to Kolmogorov complexity

The equivalence explains why deep networks generalize rather than memorize.

0
Original post

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights. https://arxiv.org/abs/2605.10878

2:07 AM · May 27, 2026 View on X

As prophesied by the venerable I. Sutskever

Tiberiu MușatTiberiu Mușat@Tiberiu_Musat_

Why does deep learning generalize? What does weight decay really do? Can algorithmic information theory address these questions? In my latest preprint, I give a proof that the minimum neural weight norm matches the minimum program length (aka Kolmogorov Complexity), up to a logarithmic factor. In other words, the neural network with the smallest possible weight norm (that fits the data) must encode the shortest program (that fits the data). The result only holds for fixed-precision neural nets: infinite precision nets can store infinite information with finite (small) weights. https://arxiv.org/abs/2605.10878

9:07 AM · May 27, 2026 · 17.5K Views
1:38 PM · May 27, 2026 · 4K Views