LMSYS details TITO, a token-matching optimization for its Miles system that cuts agentic RL training compute by 10x

The system prevents silent off-policy drift during multi-turn rollouts.

Original post

LMSYS Org (@lmsysorg)

📝 New blog: No Token Left Behind: Demystifying Token-In-Token-Out in Miles

In agentic RL, a rollout is a chain of model calls, tool outputs & resumed turns. Token-In-Token-Out (TITO) ensures the trainer evaluates the exact tokens the inference engine produced — break it, and training silently drifts off-policy.

Why it matters:
📦 One sample per task, not per turn: ~10× less compute on 30–50 turn trajectories
🎯 Keeps every token on-policy

How Miles enforces it:
1️⃣ Inference session server: one append-only token buffer per trajectory
2️⃣ Append-only at 3 levels: messages, template rendering, tokens
3️⃣ Pluggable TITO tokenizer: incremental tokenize + per-model splice patches
4️⃣ TokenSeqComparator: verifies every rollout stays bit-perfect
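The append-only buffer and the bit-perfect check described above can be illustrated with a minimal sketch. This is not the Miles implementation: `TrajectoryBuffer`, `first_mismatch`, and the `loss_mask` field are hypothetical names invented for this example, and the token IDs are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrajectoryBuffer:
    """Hypothetical append-only token buffer for one trajectory."""
    token_ids: List[int] = field(default_factory=list)
    # Assumption: 1 marks model-generated tokens, 0 marks prompt/tool tokens.
    loss_mask: List[int] = field(default_factory=list)

    def append(self, new_ids: List[int], generated: bool) -> None:
        # The buffer is only ever extended; earlier tokens are never rewritten.
        self.token_ids.extend(new_ids)
        self.loss_mask.extend([1 if generated else 0] * len(new_ids))

    def snapshot(self) -> List[int]:
        # Return a copy so callers cannot mutate the stored history.
        return list(self.token_ids)


def first_mismatch(expected: List[int], actual: List[int]) -> int:
    """Return the index of the first differing position, or -1 if identical."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return -1


buf = TrajectoryBuffer()
buf.append([101, 7, 8], generated=False)  # initial prompt
buf.append([42, 43], generated=True)      # model turn 1
buf.append([9, 9], generated=False)       # tool output
buf.append([44], generated=True)          # model turn 2

# Tokens the inference engine actually saw and produced (made-up values).
engine_tokens = [101, 7, 8, 42, 43, 9, 9, 44]
assert first_mismatch(engine_tokens, buf.snapshot()) == -1
```

Because the whole trajectory lives in one buffer, it can be handed to the trainer as a single sample, with the mask selecting which positions contribute to the loss.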

Supports Qwen3, GLM, Kimi-K2, Nemotron, Minimax & DeepSeek families.

9:03 AM · Jun 9, 2026 · 20.4K Views

Posts from X

Ying Sheng (@ying11231)

I remember when Jiajun told me he wanted to push for TITO because he thought it was important, though he did not understand why people were not doing it. It’s great they were able to stick to their own judgment! A lot of work went into this before the blog.

Banghua Zhu (@BanghuaZ)

Keeping the chat template consistent across multiple turns in agentic training can be much trickier than people think. Headaches include reasoning traces being pruned by chat templates, detokenize-retokenize mismatches, and so on.

Token-In-Token-Out (TITO) ensures that the token prefix stays consistent across turns, removing the silent off-policy drift introduced by multi-turn agentic training. Miles now fully supports TITO for popular open-source models. Check out the blog!
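To see why a detokenize-retokenize round trip can break prefix consistency, here is a toy sketch. The three-entry vocabulary and the greedy longest-match `encode` are assumptions made for illustration only; real tokenizers are more complex, but they share the property that one string can correspond to several token sequences.

```python
# Toy vocabulary (hypothetical): "ab" can be spelled as one token or as two.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_PIECE = {i: p for p, i in VOCAB.items()}


def decode(ids):
    """Concatenate the text pieces for a list of token IDs."""
    return "".join(ID_TO_PIECE[i] for i in ids)


def encode(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    pos = 0
    max_len = max(len(p) for p in VOCAB)
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += size
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


sampled = [0, 1]                       # the engine sampled "a", then "b"
retokenized = encode(decode(sampled))  # text round trip merges them into "ab"

print(sampled, retokenized)  # [0, 1] [2]
```

Both sequences decode to the same text, yet a trainer that scores `retokenized` is evaluating tokens the policy never sampled. Passing the sampled IDs through unchanged avoids the mismatch.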

LMSYS Org (@lmsysorg)

Read full blog: https://www.lmsys.org/blog/2026-05-13-no-token-left-behind/
