Tilde Research releases Parallax, upgrading softmax attention to local-linear regression to fix boundary bias
It matches FlashAttention 3 performance on Hopper hardware
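Supportive users praised Tilde Research's Parallax local-linear attention correction and its planned variable-length (varlen) attention support as "based" or a win ("w"). The one "crazy and malicious" remark is a reply addressed to @josepha_mayo and @DarioAmodei, so it is unclear whether it refers to the work itself.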
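The headline's framing treats softmax attention as a kernel smoother: the output is a kernel-weighted average of values, which statisticians call a Nadaraya-Watson (local-constant) estimator. Local-linear regression instead fits a weighted line centred on the query and returns its intercept. This removes the first-order bias that appears when the query sits at the edge of the data, known as boundary bias. The sketch below is a one-dimensional illustration only, not Tilde Research's implementation. It assumes scalar inputs and outputs, uses a Gaussian kernel as a stand-in for softmax weights, and introduces a hypothetical bandwidth `h`.

```python
import math

def gaussian_weights(x0, xs, h):
    # Kernel weights around the query x0; softmax attention uses exp(q.k) instead,
    # so this Gaussian is only a stand-in for illustration.
    return [math.exp(-((x - x0) ** 2) / (2 * h * h)) for x in xs]

def nadaraya_watson(x0, xs, ys, h):
    """Local-constant fit: the kernel-weighted average that softmax attention resembles."""
    w = gaussian_weights(x0, xs, h)
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def local_linear(x0, xs, ys, h):
    """Weighted least-squares line centred at x0; its intercept is the estimate at x0."""
    w = gaussian_weights(x0, xs, h)
    s0 = sum(w)
    s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
    s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    t0 = sum(wi * y for wi, y in zip(w, ys))
    t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    # Solve the 2x2 normal equations for (intercept, slope); return the intercept.
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)

xs = [i / 10 for i in range(11)]  # inputs on [0, 1]
ys = [2 * x + 1 for x in xs]      # exactly linear targets, true value at x=0 is 1
print(nadaraya_watson(0.0, xs, ys, h=0.2))  # pulled upward: all neighbours lie to the right
print(local_linear(0.0, xs, ys, h=0.2))     # recovers the true boundary value (about 1.0)
```

At an interior point with symmetric neighbours both estimators agree. At the boundary the kernel average is dragged toward the interior, while the local-linear intercept reproduces linear trends exactly.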
Tilde folks are cracked. They won’t stop!
open-source is the resistance force against "wannabe kings" like @DarioAmodei, who are trying to monopolize and control the future of humanity.
build, share, and rebel!
Thanks @tilderesearch for making this blog post! A few future directions for Parallax I find interesting:
- Optimizer: understanding why the optimizer interacts so strongly with the Parallax correction, and what that implies for attention more broadly.
- Architecture: developing the nonparametric counterpart of DeltaNet, a mechanism sitting between Parallax and LLA.
- System: Parallax keeps the structure of standard attention, so it should compose with attention sparsity optimizations.
- Post-training: with W_R = 0, Parallax reduces to standard attention, so it can be initialized from a pretrained checkpoint and adapted. I'm curious whether W_R could serve as a steering parameter for RL.
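The post-training point rests on one property: a zero-initialized W_R leaves attention outputs unchanged, so a pretrained checkpoint behaves identically at the start of adaptation. The thread does not give Parallax's actual formula. The sketch below is therefore a hypothetical parameterization, chosen because local-linear regression can be written as the kernel average plus slope × (query − weighted input mean). Here a learned matrix named `w_r` stands in for that slope. The functions `attention` and `corrected_attention` are illustrative names, not Tilde Research's API.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def attention(q, keys, values):
    """Standard single-query scaled dot-product softmax attention."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
    d_v = len(values[0])
    out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
    return out, weights

def corrected_attention(q, keys, values, w_r):
    """HYPOTHETICAL correction: attention output + W_R (q - attention-weighted key mean).

    w_r has shape d_v x d_k. With w_r all zeros this is exactly standard attention.
    """
    out, weights = attention(q, keys, values)
    d_k = len(q)
    key_mean = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d_k)]
    offset = [qi - mi for qi, mi in zip(q, key_mean)]
    return [o + c for o, c in zip(out, matvec(w_r, offset))]

q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0], [2.0], [3.0]]
base, _ = attention(q, keys, values)
print(base, corrected_attention(q, keys, values, [[0.0, 0.0]]))  # identical when W_R = 0
print(corrected_attention(q, keys, values, [[0.5, 0.5]]))        # nonzero W_R shifts the output
```

Because the correction is additive and vanishes at W_R = 0, training can start from the pretrained model's exact behaviour and only move away from it as W_R is learned.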

@YifeiZuoX @tilderesearch BTW, do you have any plans for future Parallax kernel releases? Specifically better MFU (model FLOPs utilization), varlen, etc.?

@zhaoran_wang @DarioAmodei exactly the kind of stuff I mentioned here

@LLMenjoyer @tilderesearch yeah, varlen is planned @zz30gs

@LLMenjoyer @YifeiZuoX @tilderesearch thanks for your interest. we are planning varlen attn
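"Varlen" attention usually means running attention over several sequences of different lengths packed into one token buffer, with a cumulative-lengths array (often called `cu_seqlens`) marking where each sequence starts and ends. This avoids padding. The sketch below is a plain-Python reference for what such a kernel computes, assuming causal masking. It is not Tilde Research's planned kernel, and `varlen_causal_attention` is a hypothetical name.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def varlen_causal_attention(q, k, v, cu_seqlens):
    """Causal attention over packed sequences.

    q, k, v: lists of per-token vectors for all sequences concatenated.
    cu_seqlens: cumulative boundaries, e.g. [0, 2, 5] packs lengths 2 and 3.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for start, end in zip(cu_seqlens[:-1], cu_seqlens[1:]):
        for i in range(start, end):
            # Token i attends to keys start..i only: never to an earlier
            # sequence in the buffer and never to future tokens.
            visible = range(start, i + 1)
            scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in visible]
            weights = softmax(scores)
            out.append([sum(w * v[j][c] for w, j in zip(weights, visible))
                        for c in range(len(v[0]))])
    return out

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5], [-1.0, 0.2]]
k = [[0.2, 1.0], [1.0, 0.0], [0.3, 0.3], [-0.5, 1.0], [0.0, -1.0]]
v = [[1.0], [2.0], [3.0], [4.0], [5.0]]
packed = varlen_causal_attention(q, k, v, [0, 2, 5])  # two sequences: lengths 2 and 3
print(packed)
```

A real kernel would tile this work across thread blocks, but the masking rule is the same: the packed result matches running each sequence on its own.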

@zz30gs @YifeiZuoX @tilderesearch w

@YifeiZuoX @tilderesearch @zz30gs based

@josepha_mayo @DarioAmodei it is literally crazy and malicious