Looped Transformers With Subquadratic Mixers Multiply Linear-Time Expressivity

1/ Looped transformers offer extreme parameter efficiency, but their quadratic self-attention kills long-context scalability. What if you swapped attention for subquadratic mixers? It turns out looping doesn't just save parameters—it actively multiplies linear-time expressivity. 🧵
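
To make the setup concrete, here is a minimal sketch of a weight-tied (looped) block whose token mixer runs in time linear in sequence length. The post does not say which subquadratic mixer is used, so the decayed running sum below is only a stand-in, and every name here (`linear_mixer`, `channel_mlp`, `looped_block`, `decay`, `n_loops`) is hypothetical and chosen for illustration.

```python
import math

def linear_mixer(xs, decay):
    """Causal token mixer (assumed stand-in): h_t = decay * h_{t-1} + x_t per channel."""
    state = [0.0] * len(xs[0])
    out = []
    for x in xs:  # one pass over the sequence, so cost grows linearly with length
        state = [decay * s + xi for s, xi in zip(state, x)]
        out.append(list(state))
    return out

def channel_mlp(x, w):
    """Per-token channel mixing: tanh(W x), with w given as a d x d list of rows."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]

def looped_block(xs, w, decay, n_loops):
    """Apply the same block n_loops times, reusing w and decay on every pass."""
    hs = [list(x) for x in xs]
    for _ in range(n_loops):
        mixed = linear_mixer(hs, decay)
        # Residual update: each token adds the channel-mixed summary of its prefix.
        hs = [[h + u for h, u in zip(row, channel_mlp(m, w))]
              for row, m in zip(hs, mixed)]
    return hs

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights = [[0.5, -0.5], [0.25, 0.75]]
    for row in looped_block(tokens, weights, decay=0.9, n_loops=3):
        print(row)
```

In this sketch, each extra loop iteration costs another linear pass over the sequence, while the parameter count (`w` and `decay`) stays fixed. Whether added iterations buy expressivity in the sense the thread claims is not something this toy example demonstrates.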

8:57 AM · May 23, 2026