DashAttention Advances Adaptive Sparse Hierarchical Attention In LLMs

Edoardo Ponti (@PONTIEDOARDO) · 8:52 AM · May 21, 2026

Goodbye top-k in hierarchical attention! We devised DashAttention, which is adaptively sparse (compute is allocated based on the information structure of the query) and end-to-end differentiable. DashAttention pushes the accuracy–efficiency frontier over NSA and InfLLMv2!