/Tech3h ago

Ailon-Chazelle Method Speeds Random Projections With PHD Factorization

415251.5K
Original post
Alex Smola@smolix#303inTech

Random projections compress 65,536 dims down to 1,024 — at the price of 67 million multiplications per vector. A lot of work spent in the name of saving work. Ailon & Chazelle's 2009 fix is, literally, a PHD: Φ = P·H·D 🧵

7:21 PM · Jun 10, 2026 · 960 Views
Sentiment
Sentiment building, check back later.
Cluster Engagement
Posts from X
Most Activity
Most Activity
VIEWS255LIKES1REPLIES1
Alex Smola@smolix

The obvious fix is to sparsify the projection matrix. But sparse projections fail exactly on sparse vectors: put all the mass on one coordinate and the variance explodes. At 1% fill you need 150x as many projections — the savings are gone.

Ailon & Chazelle call this the Heisenberg principle of the Fourier transform: a signal and its spectrum cannot both be concentrated. Which points straight at the fix — if spiky inputs are the only enemy, rotate the world so nothing is spiky.

Alex Smola@smolix

Random projections compress 65,536 dims down to 1,024 — at the price of 67 million multiplications per vector. A lot of work spent in the name of saving work. Ailon & Chazelle's 2009 fix is, literally, a PHD: Φ = P·H·D 🧵

3hViews 255Likes 1Bookmarks 0
Alex Smola@smolix

Postscript: in 2023 Fandina, Høgsgaard & Larsen proved the FJLT is even faster than its inventors knew — 16 years later, without changing a line of the algorithm. It's also the rotation inside Fastfood and TurboQuant. Stay tuned for one-bit estimators + deep learning.

https://alex.smola.org/posts/43-fjlt/

Alex Smola@smolix

Φ = P·H·D, applied right to left: D flips n random signs, H is a Walsh-Hadamard transform (n log n additions, zero multiplications), P is a sparse Gaussian. HD is orthogonal, so norms and inner products survive untouched — but now nothing is spiky.

One cosh inequality shows every coordinate of HDu behaves like a Gaussian of size 1/√n — the flattest a unit vector can possibly be. Union bound over a billion vectors at once. The bill: ~1M additions + 0.5M multiplications instead of 67M.

3hViews 132Likes 1Bookmarks 0
Alex Smola@smolix

Φ = P·H·D, applied right to left: D flips n random signs, H is a Walsh-Hadamard transform (n log n additions, zero multiplications), P is a sparse Gaussian. HD is orthogonal, so norms and inner products survive untouched — but now nothing is spiky.

One cosh inequality shows every coordinate of HDu behaves like a Gaussian of size 1/√n — the flattest a unit vector can possibly be. Union bound over a billion vectors at once. The bill: ~1M additions + 0.5M multiplications instead of 67M.

Alex Smola@smolix

The obvious fix is to sparsify the projection matrix. But sparse projections fail exactly on sparse vectors: put all the mass on one coordinate and the variance explodes. At 1% fill you need 150x as many projections — the savings are gone.

Ailon & Chazelle call this the Heisenberg principle of the Fourier transform: a signal and its spectrum cannot both be concentrated. Which points straight at the fix — if spiky inputs are the only enemy, rotate the world so nothing is spiky.

3hViews 180Likes 0Bookmarks 0