I read this report hoping to see them explain their architecture. They do go into details of MLA and DeepSeek-V4, though. The "donor model" has a 262K context and they use YaRN, so I guess the donor is Kimi. And no, you can't see the model.
Here is the technical report on SubQ 1.1 Small. https://subq.ai/subq-1-1-small-technical-report
This is the second iteration of our Subquadratic Sparse Attention (SSA) model, and the first that will be deployed with design partners, starting in the coming weeks.
The results are compelling and verified by @AppenResearch.
- Near-perfect long-context retrieval up to 12M tokens on the needle-in-a-haystack test, with a reduction in attention compute of nearly 1,000x.
- A balance of long-context optimization and general reasoning ability, with strong performance retained across knowledge, coding, and non-coding enterprise agent benchmarks.
- At 1M tokens, SubQ 1.1 Small requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.
These results highlight a significant scaling advantage thanks to the efficiency gains from the SSA architecture.
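For readers wondering why the advantage grows with context length, here is a rough back-of-the-envelope sketch. It is not the SSA method from the report: it assumes a generic sparse scheme in which each query attends to a fixed budget of keys, and the function names, `keys_per_query`, and `head_dim` are all hypothetical values chosen for illustration. It also ignores causal masking and any cost of selecting which keys to attend to.

```python
# Illustrative only: a generic fixed-budget sparse attention cost model,
# NOT the SSA architecture described in the report.

def dense_attention_cost(n_tokens: int, head_dim: int) -> int:
    """Approximate multiply-adds for one head: QK^T plus the weighted sum over V.

    Every one of the n queries touches all n keys, so cost grows as n^2.
    """
    return 2 * n_tokens * n_tokens * head_dim


def sparse_attention_cost(n_tokens: int, keys_per_query: int, head_dim: int) -> int:
    """Same count when each query attends to at most `keys_per_query` keys (assumed budget)."""
    k = min(keys_per_query, n_tokens)  # a query cannot attend to more keys than exist
    return 2 * n_tokens * k * head_dim


def compute_reduction(n_tokens: int, keys_per_query: int, head_dim: int = 128) -> float:
    """Ratio of dense to sparse cost; under these assumptions it equals n / k."""
    dense = dense_attention_cost(n_tokens, head_dim)
    sparse = sparse_attention_cost(n_tokens, keys_per_query, head_dim)
    return dense / sparse


if __name__ == "__main__":
    # Hypothetical budget of 10,000 keys per query, not a figure from the report.
    for n in (100_000, 1_000_000, 10_000_000):
        print(f"{n:>10,} tokens: {compute_reduction(n, 10_000):,.0f}x less attention compute")
```

Under this simplified model the reduction factor is just the context length divided by the per-query budget, so it keeps rising as the context gets longer. The real numbers in the report depend on details this sketch does not capture.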
We included some details and learnings from the development process that may be helpful to the community.
Comment with questions, I’ll try to respond!






