Excited to share last summer's work at Google Research!
Most hybrid models today are static: each token sees the same interleaved pattern of your favorite linear model and attention.
Oryx instead varies the model used across the sequence through shared representations.
1/

