
Researcher Proposes Upstream Architectures For Improved Sparse Inference

Original post
Xiuying Wei @XiuyingWei966

These days, paper visibility feels really hard, especially for us unknowns. Happy to see the community building more ways to help. Just learned about @HuggingPapers and submitted a short paper there. Many thanks to @researchpodapp for noticing it and turning it into a brilliant podcast with a sleek UI design, and to @NielsRogge for reminding me to release pretrained checkpoints for my previous two papers, as well as direct help with improving the model cards.

Attaching this short paper here to see whether anyone is interested. Main takeaway: instead of focusing only on improving inference sparsity methods, we can also design upstream architectures that are inherently more capable of handling sparse inference.

8:37 AM · Jun 11, 2026 · 2.4K Views
Sentiment

Users praised the paper on upstream architectures for sparse inference and congratulated the researcher on its publication. The researchpod website's UI design and ease of use also drew appreciation.

Positive: 100.0% · Negative: 0.0%
3 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views: 84
Xiuying Wei @XiuyingWei966

arXiv link: https://arxiv.org/pdf/2605.28640, with @caglarml. This follows our previous RAT and RAT+ papers, https://arxiv.org/pdf/2507.04416 and https://arxiv.org/pdf/2602.18196, which tie directly into the highly compressed attention in DeepSeek V4.

6h · Views 84 · Likes 1
Likes: 3 · Replies: 1
haon @noah_eacc

@XiuyingWei966 @HuggingPapers @researchpodapp cool paper man & congrats on the publication!

my main goal with researchpod is 1) to make it easier for non-technical folks to get into these subjects 2) to help busy researchers like yourself to keep current with all the interesting stuff being published!

6h · Views 44 · Likes 3
Retweets: 1
Xiuying Wei @XiuyingWei966

Did you know that, instead of just improving inference-time sparse attention methods, we can also design upstream architectures that are more capable of handling downstream sparsity? We demonstrate this in our paper, https://arxiv.org/pdf/2605.28640: 📊 Top: our pretrained models. Bottom: continued pretraining on OLMo. Task: RULER/NIAH.

We achieve this simply by adding an exponentially decaying memory. Podcast interface:
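For readers unfamiliar with the phrase "exponentially decaying memory" used in the post above, the following is a minimal, generic sketch of the idea: a running state in which each older input is discounted by a constant factor per step. It is not the paper's architecture; the function names, the scalar state, and the `decay` parameter are all assumptions made for illustration.

```python
from typing import List


def decaying_memory(inputs: List[float], decay: float) -> List[float]:
    """Running state m_t = decay * m_{t-1} + (1 - decay) * x_t, with m_0 = 0.

    Hypothetical helper for illustration; a real model would use vector
    states and learned (possibly per-channel) decay rates.
    """
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    memory = 0.0
    states = []
    for x in inputs:
        # Old content shrinks by `decay`; the new input is mixed in.
        memory = decay * memory + (1.0 - decay) * x
        states.append(memory)
    return states


def contribution_weights(num_steps: int, decay: float) -> List[float]:
    """Weight each input carries in the final state, oldest input first."""
    return [(1.0 - decay) * decay ** (num_steps - 1 - i) for i in range(num_steps)]


if __name__ == "__main__":
    print(decaying_memory([1.0, 0.0, 0.0], decay=0.5))
    print(contribution_weights(3, decay=0.5))
```

The second function makes the "exponential" part explicit: an input seen k steps ago contributes with weight proportional to `decay ** k`, so recent tokens dominate while distant ones fade smoothly rather than being cut off.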

RyanLee @RyanLeeMiniMax

Hey everyone, our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting! GitHub: https://github.com/MiniMax-AI/MSA Paper: https://github.com/MiniMax-AI/MSA/blob/main/docs/MiniMaxSparseAttention.pdf

3h · Views 182 · Likes 5 · Bookmarks 2
Xiuying Wei @XiuyingWei966

@noah_eacc @HuggingPapers @researchpodapp Thanks a lot @noah_eacc! Yes! I really appreciate the website’s UI design (the texture), as well as how the conversion guides the story in an easy-to-understand way! All the best to the app!

5h · Views 31 · Likes 1

@XiuyingWei966 @HuggingPapers @researchpodapp The gap between good papers and noticed papers is mostly luck and timezones. Glad someone's building plumbing for this.

Peer review decides whether work is right. A summarizer bot apparently decides whether it exists.

5h · Views 28 · Likes 1

@XiuyingWei966 @HuggingPapers @researchpodapp Great job @XiuyingWei966! Very well done.

2h · Views 18