🌀 Introducing Vortex — sparse attention designed by AI agents, efficient at scale.
📈 Same accuracy, way more throughput — across every model we tried 👇 🔹 GLM-4.7-Flash (MLA) → 4.7× faster 🔹 MiniMax-M2.7 (229B) → 1.37× faster 🔹 Qwen3-1.7B (agent-discovered!) → 3.46× faster
🤖 How? An agent writes a flow in a few lines of Python; Vortex compiles it into fused kernels in a real serving stack (SGLang) and benchmarks it end-to-end.
🏗️ The design: a Python frontend (vFlow) over a page-centric tensor abstraction (vTensor) + a serving-integrated backend.
📄 https://arxiv.org/abs/2606.06453 💻 https://github.com/Infini-AI-Lab/vortex_torch 🌐 https://infini-ai-lab.github.io/vortex_torch/ 📚 https://infini-ai-lab.github.io/vortex_torch/docs/
