DashAttention Advances Adaptive Sparse Hierarchical Attention In LLMs

#714

Original post

Pasquale Minervini

Edoardo Ponti@PontiEdoardo

Goodbye top-k in hierarchical attention!

We devised DashAttention, which is adaptively sparse (compute is allocated based on the information structure of the query) and end-to-end differentiable.

DashAttention pushes the accuracy–efficiency frontier over NSA and InfLLMv2!

Yuxiang Huang@yxyxyyy6

[1/n] Can a model learn *where* and *how much* information it should attend to, and do so efficiently?

We introduce DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention! This pushes the accuracy-efficiency frontier in LLMs.

8:52 AM · May 21, 2026 · 21.4K Views
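
For readers unfamiliar with the contrast the posts draw, here is a minimal sketch of fixed top-k block selection next to one well-known adaptively sparse, differentiable alternative. The posts do not describe DashAttention's actual mechanism, so this is not the authors' method: sparsemax is swapped in purely as an illustration, and the block scores, the value of k, and the function names are all made up for the example.

```python
import numpy as np

def topk_block_mask(scores, k):
    """Hard top-k: always keeps exactly k blocks, whatever the scores look like."""
    idx = np.argsort(scores)[-k:]
    mask = np.zeros(scores.shape, dtype=bool)
    mask[idx] = True
    return mask

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Unlike softmax it returns exact zeros, and the number of non-zero
    entries depends on the input rather than on a fixed k.
    """
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1 + ks * z_sorted > cumsum   # which sorted entries stay non-zero
    k_z = ks[support][-1]                  # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z      # threshold subtracted from every score
    return np.maximum(z - tau, 0.0)

# Hypothetical query-to-block relevance scores (illustrative values only).
peaked = np.array([4.0, 0.1, 0.0, -0.2, 0.1, 0.0])  # one block clearly dominates
flat = np.array([1.0, 0.9, 0.8, 0.9, 1.0, 0.7])     # relevance is spread out

for name, s in [("peaked", peaked), ("flat", flat)]:
    kept_topk = int(topk_block_mask(s, k=2).sum())
    kept_adaptive = int(np.count_nonzero(sparsemax(s)))
    print(name, "top-k keeps", kept_topk, "| sparsemax keeps", kept_adaptive)
```

With these made-up scores, top-k keeps two blocks for both queries, while sparsemax keeps one block for the peaked query and five for the flat one. That is the sense in which compute can follow the structure of the query, and because the weights vary continuously with the scores, gradients can flow through the selection instead of stopping at a hard index choice.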

Sentiment

Users praise DashAttention's adaptive sparse hierarchical attention for delivering real accuracy-efficiency gains in LLMs, in contrast to typical AI hype.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Posts from X

Rami Sufian@Rami_Bball_Fan

@PontiEdoardo Now this is real substance. Adaptive sparsity that actually improves the accuracy-efficiency tradeoff is a lot more interesting than the usual AI hype.

26d ago