Thanks @tilderesearch for making this blog post! A few future directions for Parallax I find interesting:
- Optimizer: understanding why the optimizer interacts so strongly with the Parallax correction, and what that implies for attention more broadly.
- Architecture: developing the nonparametric counterpart of DeltaNet, a mechanism sitting between Parallax and LLA.
- System: Parallax keeps the structure of standard attention, so it should compose with attention sparsity optimizations.
- Post-training: with W_R = 0, Parallax is standard attention, so it can be initialized from a pretrained checkpoint and adapted. I'm curious whether W_R could serve as a steering parameter for RL.
Positive users praised Tilde Research's Parallax local-linear attention correction and varlen attn plans as "based" or a "win", while negative users called the work crazy and malicious.
open-source is the resistance force against "king wannabes" like @DarioAmodei, who are trying to monopolize and control the future of humanity.
build, share, and rebel!

@YifeiZuoX @tilderesearch BTW do you have any plans for future parallax kernel releases? specifically better MFU, varlen, etc?

@zhaoran_wang @DarioAmodei exactly kinda stuff i mentioned here

@LLMenjoyer @tilderesearch yeah, varlen is planned @zz30gs

@LLMenjoyer @YifeiZuoX @tilderesearch thanks for your interest. we are planning varlen attn

@zz30gs @YifeiZuoX @tilderesearch w

@YifeiZuoX @tilderesearch @zz30gs based

@josepha_mayo @DarioAmodei it is literally crazy and malicious