8h agoTogether AI optimizes MiniMax-M3 serving, boosting agentic workload throughput by up to 125% using custom sparse attention kernelsA dedicated Rust-based gateway offloads preprocessing for 1M-token inputs.SentimentSentimentPos100%Neg0%Users are excited about Together AI serving MiniMax-M3 with sparse attention and paged decode because the optimizations boost inference throughput by up to 125%.1 comment with sentiment. View comments.