Introducing Vortex: sparse attention designed by AI agents, efficient at scale.
Same accuracy, much higher throughput, across every model we tried:
🔹 GLM-4.7-Flash (MLA): 4.7× faster
🔹 MiniMax-M2.7 (229B): 1.37× faster
🔹 Qwen3-1.7B (agent-discovered!): 3.46× faster
How? An agent writes a flow in a few lines of Python; Vortex compiles it into fused kernels in a real serving stack (SGLang) and benchmarks it end-to-end.
The design: a Python frontend (vFlow) over a page-centric tensor abstraction (vTensor), plus a serving-integrated backend.
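For readers new to page-level sparse attention, here is a rough pure-Python sketch of the general idea, not Vortex's API: score each KV page cheaply, keep the top few, and run ordinary softmax attention over only those pages. The function names (`attend`, `page_sparse_attention`), the mean-key page score, and the `page_size` / `top_k` parameters are all assumptions made for illustration; the actual vFlow and vTensor interfaces are in the docs linked below.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Dense single-query softmax attention over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def page_sparse_attention(q, keys, values, page_size, top_k):
    """Hypothetical sketch: attend only to the top_k highest-scoring pages.

    Assumes len(keys) is a multiple of page_size. The page score used here
    (query dotted with the page's mean key) is one simple choice among many.
    """
    if len(keys) % page_size != 0:
        raise ValueError("number of keys must be a multiple of page_size")
    num_pages = len(keys) // page_size

    # 1. Cheap per-page score.
    scores = []
    for p in range(num_pages):
        page = keys[p * page_size:(p + 1) * page_size]
        centroid = [sum(col) / page_size for col in zip(*page)]
        scores.append(dot(q, centroid))

    # 2. Keep the top_k pages, restored to their original order.
    ranked = sorted(range(num_pages), key=lambda p: scores[p], reverse=True)
    selected = sorted(ranked[:top_k])

    # 3. Dense attention restricted to tokens in the selected pages.
    idx = [i for p in selected for i in range(p * page_size, (p + 1) * page_size)]
    out = attend(q, [keys[i] for i in idx], [values[i] for i in idx])
    return out, selected
```

With `top_k` equal to the number of pages this reduces to dense attention; smaller `top_k` trades exactness for less work per query, which is the kind of design space an agent-written flow would explore.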
Paper: https://arxiv.org/abs/2606.06453
Code: https://github.com/Infini-AI-Lab/vortex_torch
Project page: https://infini-ai-lab.github.io/vortex_torch/
Docs: https://infini-ai-lab.github.io/vortex_torch/docs/
