Gavin Leech and David Pfau discuss attention mechanisms and kernel smoothing on X
Gavin Leech posted an image on X contrasting descriptions of attention as kernel smoothing with the results achieved by large-scale attention-based systems. David Pfau joined the exchange, which also referenced earlier kernel-based machine learning methods. The two researchers noted that earlier mathematical connections between attention and statistical techniques have not produced systems of comparable scale or performance. They also highlighted the gap between theoretical equivalence and the engineering required to apply attention in modern models.
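For readers unfamiliar with the equivalence under discussion, here is a minimal sketch. It is not taken from the thread. It assumes a single query, scalar values, and standard scaled dot-product attention. The function names `softmax_attention`, `nadaraya_watson`, and `exp_dot_kernel` are illustrative. The sketch shows that softmax attention produces the same output as a Nadaraya-Watson kernel smoother using an exponentiated dot-product kernel.

```python
import math

def softmax_attention(query, keys, values):
    """Scaled dot-product attention for one query and scalar values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

def nadaraya_watson(query, keys, values, kernel):
    """Kernel smoother: a kernel-weighted average of the values."""
    weights = [kernel(query, key) for key in keys]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

def exp_dot_kernel(x, y):
    """Illustrative kernel: exp(<x, y> / sqrt(d))."""
    return math.exp(sum(a * b for a, b in zip(x, y)) / math.sqrt(len(x)))

# Toy inputs, chosen only for illustration.
query = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
values = [1.0, 2.0, 3.0]

attn_out = softmax_attention(query, keys, values)
nw_out = nadaraya_watson(query, keys, values, exp_dot_kernel)
print(abs(attn_out - nw_out) < 1e-9)
```

The two outputs agree up to floating-point rounding. This is the mathematical identity itself, and it says nothing about the learned projections, scale, or engineering that the thread argues make the practical difference.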
David Pfau (@pfau)
@gleech @curiouswavefn If I had a dime for every time a statistician declared deep learning was just kernel learning in a trenchcoat...
9:14 AM · May 18, 2026 · 155 Views
@pfau @curiouswavefn he found it!

9:10 AM · May 18, 2026 · 212 Views