very nice tech report from @AntLingAGI: they changed the architecture of their previous 1T model to make it more efficient (full GQA -> a 7:1 ratio of lightning attention to MLA) and better at agentic tasks, with 10T tokens of continual pre-training. also a big focus on reasoning efficiency!
10:27 PM · Jun 15, 2026 · 3.8K Views