AI · 5h ago

Zhaoran Wang releases Parallax, a new model architecture that requires the Muon optimizer to outperform standard softmax attention

The work advises against evaluating architectures solely with AdamW.

Quote posts

#1761

Comments

#865

Original post

Jiaxin Shi (@thjashin) · #1761 in AI

@zhaoran_wang Very cool!

Zhaoran Wang (@zhaoran_wang)

for me, the coolest finding is that you can connect/interpolate all softmax/linear variants and give a promising direction - affine-linear : )

2:24 PM · May 31, 2026 · 267 Views

Sentiment

Users thanked the researchers for their Parallax model extending Softmax Attention via higher-order test-time regression.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)

Posts from X

Most Activity

Views: 4.9K · Bookmarks: 9 · Likes: 20 · Replies: 1

Jiaxin Shi (@thjashin)

Very interesting work from @zhaoran_wang and @YifeiZuoX. Looks like the first working version of a higher-order test-time regression extension of softmax attention (cc @heyyalexwang)

Yifei Zuo (@YifeiZuoX)

For me, the coolest finding is that Muon optimizer is crucial for Parallax to move beyond Softmax Attention.

Lesson — don't evaluate new architectures solely under AdamW, you'll miss the good ones.

paper: https://arxiv.org/abs/2605.29157
code: https://github.com/Yifei-Zuo/Parallax/

For the origin of Parallax, check out the LLA paper at ICLR 2026:
paper: https://arxiv.org/abs/2510.01450
code: https://github.com/Yifei-Zuo/FlashLLA
