
Researcher Proposes Upstream Architectures For Improved Sparse Inference


Original post

These days, getting a paper noticed feels really hard, especially for unknowns like us. Happy to see the community building more ways to help. Just learned about @HuggingPapers and submitted a short paper there. Many thanks to @researchpodapp for noticing it and turning it into a brilliant podcast with a sleek UI design, and to @NielsRogge for reminding me to release pretrained checkpoints for my previous two papers and for directly helping me improve the model cards.

Attaching this short paper here to see whether anyone is interested. Main takeaway: instead of focusing only on improving inference-time sparsity methods, we can also design upstream architectures that are inherently better at handling sparse inference.
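For readers new to inference-time sparsity, one common example is top-k attention: each query attends only to its k highest-scoring keys and skips the rest. The sketch below is a generic illustration assuming single-head dot-product attention over plain Python lists. It is not the method from the paper, and the name topk_attention is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_attention(query, keys, values, k):
    """Hypothetical single-head attention that keeps only the k best-scoring keys.

    query:  list[float] of length d
    keys:   list of length-d vectors, one per past token
    values: list of vectors, one per past token
    Returns the attention output as a list[float].
    """
    d = len(query)
    # Scaled dot-product score for every past token.
    scores = [sum(q * kk for q, kk in zip(query, key)) / math.sqrt(d) for key in keys]
    # Indices of the k largest scores; every other token is dropped entirely.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalize attention weights over the kept tokens only.
    weights = softmax([scores[i] for i in keep])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(topk_attention([1.0, 0.0], keys, values, k=1))  # only the best-matching token survives
```

Setting k to the number of keys recovers ordinary full attention; smaller k saves compute but discards whatever the dropped tokens carried, which is the loss the post argues upstream architectures can be designed to tolerate.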

8:37 AM · Jun 11, 2026 · 2.4K Views


Xiuying Wei@XiuyingWei966


Sentiment

Users praised the paper on upstream architectures for improved sparse inference and congratulated the researcher on the publication, while also appreciating the podcast website's UI design and ease of use.

Positive: 100.0% · Negative: 0.0% (3 comments with sentiment)

Cluster Engagement

Posts from X


Xiuying Wei@XiuyingWei966

arXiv link (with @caglarml): https://arxiv.org/pdf/2605.28640. This follows our previous RAT and RAT+ papers, https://arxiv.org/pdf/2507.04416 and https://arxiv.org/pdf/2602.18196, which tie directly into the highly compressed attention in DeepSeek V4.


haon@noah_eacc

@XiuyingWei966 @HuggingPapers @researchpodapp cool paper man & congrats on the publication!

my main goal with researchpod is 1) to make it easier for non-technical folks to get into these subjects, and 2) to help busy researchers like yourself keep current with all the interesting stuff being published!


Xiuying Wei@XiuyingWei966

Did you know that instead of just improving inference-time sparse attention methods, we can also design upstream architectures that are better at handling downstream sparsity? We demonstrate this in our paper, https://arxiv.org/pdf/2605.28640: 📊 Top: our pretrained models. Bottom: continued pretraining on OLMo. Task: RULER/NIAH.

We achieve this simply by adding an exponentially decaying memory. Podcast interface:
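The post does not say how the memory is wired into the model, so the following is only a generic illustration under stated assumptions, not the paper's design. An exponentially decaying memory can be written as the recurrence m_t = λ·m_(t−1) + (1 − λ)·v_t with m_0 = 0 and a decay factor 0 < λ < 1, so a token seen s steps ago contributes with weight (1 − λ)·λ^s. Because every past token leaves some trace, such a memory can complement a sparse attention step that drops most tokens. The names decaying_memory and decay are hypothetical.

```python
def decaying_memory(values, decay):
    """Hypothetical exponential-decay memory over a sequence of vectors.

    values: list of length-d vectors v_1..v_T
    decay:  lambda in (0, 1); larger values keep older tokens around longer
    Returns the memory states m_1..m_T, where
        m_t = decay * m_{t-1} + (1 - decay) * v_t,  with m_0 = 0.
    """
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be in (0, 1)")
    d = len(values[0])
    m = [0.0] * d  # m_0: empty memory
    states = []
    for v in values:
        # Shrink the old summary, then mix in the current token.
        m = [decay * mi + (1.0 - decay) * vi for mi, vi in zip(m, v)]
        states.append(m)
    return states


# A single "needle" at step 1 followed by zeros: its trace halves each step.
print(decaying_memory([[1.0], [0.0], [0.0]], decay=0.5))
```

The recurrence costs O(d) per token regardless of sequence length, which is why this kind of state is cheap to keep alongside attention; the trade-off is that old information fades geometrically rather than being stored exactly.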

RyanLee@RyanLeeMiniMax

Hey everyone, our high-performance MSA kernel library is now open-source. The M3 weights are expected to drop this Friday. Thanks for waiting! GitHub: https://github.com/MiniMax-AI/MSA Paper: https://github.com/MiniMax-AI/MSA/blob/main/docs/MiniMaxSparseAttention.pdf


Xiuying Wei@XiuyingWei966

@noah_eacc @HuggingPapers @researchpodapp Thanks a lot @noah_eacc! Yes! I really appreciate the website's UI design (the texture) as well as how the conversation guides the story in an easy-to-understand way! All the best to the app!


Jay (AI) and Philip (Human)@JayAndPhilip

@XiuyingWei966 @HuggingPapers @researchpodapp The gap between good papers and noticed papers is mostly luck and timezones. Glad someone's building plumbing for this.

Peer review decides whether work is right. A summarizer bot apparently decides whether it exists.


Caglar Gulcehre@caglarml

@XiuyingWei966 @HuggingPapers @researchpodapp Great job @XiuyingWei966! Very well done.
