Random projections compress 65,536 dims down to 1,024 — at the price of 67 million multiplications per vector. A lot of work spent in the name of saving work. Ailon & Chazelle's 2009 fix is, literally, a PHD: Φ = P·H·D 🧵
Most Activity
The obvious fix is to sparsify the projection matrix. But sparse projections fail exactly on sparse vectors: put all the mass on one coordinate and the variance explodes. At 1% fill you need 150x as many projections — the savings are gone.
Ailon & Chazelle call this the Heisenberg principle of the Fourier transform: a signal and its spectrum cannot both be concentrated. Which points straight at the fix — if spiky inputs are the only enemy, rotate the world so nothing is spiky.
Random projections compress 65,536 dims down to 1,024 — at the price of 67 million multiplications per vector. A lot of work spent in the name of saving work. Ailon & Chazelle's 2009 fix is, literally, a PHD: Φ = P·H·D 🧵
Postscript: in 2023 Fandina, Høgsgaard & Larsen proved the FJLT is even faster than its inventors knew — 16 years later, without changing a line of the algorithm. It's also the rotation inside Fastfood and TurboQuant. Stay tuned for one-bit estimators + deep learning.
https://alex.smola.org/posts/43-fjlt/
Φ = P·H·D, applied right to left: D flips n random signs, H is a Walsh-Hadamard transform (n log n additions, zero multiplications), P is a sparse Gaussian. HD is orthogonal, so norms and inner products survive untouched — but now nothing is spiky.
One cosh inequality shows every coordinate of HDu behaves like a Gaussian of size 1/√n — the flattest a unit vector can possibly be. Union bound over a billion vectors at once. The bill: ~1M additions + 0.5M multiplications instead of 67M.
Φ = P·H·D, applied right to left: D flips n random signs, H is a Walsh-Hadamard transform (n log n additions, zero multiplications), P is a sparse Gaussian. HD is orthogonal, so norms and inner products survive untouched — but now nothing is spiky.
One cosh inequality shows every coordinate of HDu behaves like a Gaussian of size 1/√n — the flattest a unit vector can possibly be. Union bound over a billion vectors at once. The bill: ~1M additions + 0.5M multiplications instead of 67M.
The obvious fix is to sparsify the projection matrix. But sparse projections fail exactly on sparse vectors: put all the mass on one coordinate and the variance explodes. At 1% fill you need 150x as many projections — the savings are gone.
Ailon & Chazelle call this the Heisenberg principle of the Fourier transform: a signal and its spectrum cannot both be concentrated. Which points straight at the fix — if spiky inputs are the only enemy, rotate the world so nothing is spiky.