
LMSYS details TITO, a token-matching optimization for its Miles system that cuts agentic RL training compute by 10x

The system prevents silent off-policy drift during multi-turn rollouts.

LMSYS Org (@lmsysorg)

📝 New blog: No Token Left Behind: Demystifying Token-In-Token-Out in Miles

In agentic RL, a rollout is a chain of model calls, tool outputs & resumed turns. Token-In-Token-Out (TITO) ensures the trainer evaluates the exact tokens the inference engine produced — break it, and training silently drifts off-policy.

Why it matters:
📦 One sample per task, not per turn: ~10× less compute on 30–50 turn trajectories
🎯 Keeps every token on-policy

How Miles enforces it:
1️⃣ Inference session server: one append-only token buffer per trajectory
2️⃣ Append-only at 3 levels: messages, template rendering, tokens
3️⃣ Pluggable TITO tokenizer: incremental tokenize + per-model splice patches
4️⃣ TokenSeqComparator: verifies every rollout stays bit-perfect

Supports Qwen3, GLM, Kimi-K2, Nemotron, Minimax & DeepSeek families.

9:03 AM · Jun 9, 2026 · 20.4K Views
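The append-only buffer and the sequence check described in the post can be illustrated with a minimal sketch. The names `TokenBuffer` and `first_mismatch` and the loss mask are assumptions made for illustration. They are not the Miles API, and the real session server and TokenSeqComparator may work differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TokenBuffer:
    """Hypothetical append-only token buffer for a single trajectory."""
    tokens: List[int] = field(default_factory=list)
    # Assumption: 1 marks model-generated tokens, 0 marks prompt/tool tokens.
    loss_mask: List[int] = field(default_factory=list)

    def append(self, new_tokens: List[int], from_model: bool) -> None:
        # Only ever extend the buffer; earlier tokens are never rewritten.
        self.tokens.extend(new_tokens)
        self.loss_mask.extend([1 if from_model else 0] * len(new_tokens))


def first_mismatch(expected: List[int], actual: List[int]) -> Optional[int]:
    """Return the index of the first differing token, or None if identical."""
    for i, (a, b) in enumerate(zip(expected, actual)):
        if a != b:
            return i
    if len(expected) != len(actual):
        # One sequence is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


buf = TokenBuffer()
buf.append([1, 10, 11], from_model=False)  # prompt tokens
buf.append([20, 21], from_model=True)      # model output, turn 1
buf.append([30], from_model=False)         # tool output
buf.append([22, 23], from_model=True)      # model output, turn 2

# The trainer-side sequence should match the buffer token for token.
trainer_view = list(buf.tokens)
print(first_mismatch(buf.tokens, trainer_view))
```

In this sketch the whole multi-turn trajectory ends up as one token sequence with a mask, which is one way to read "one sample per task, not per turn".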
Ying Sheng (@ying11231)

I remember when Jiajun told me he wanted to push for TITO because he thought it was important, even though he did not understand why people were not doing it. It's great that they were able to insist on their own judgment! A lot of work came before this blog.

Banghua Zhu (@BanghuaZ)

Keeping the chat template consistent across multiple turns in agentic training can be much trickier than people think. There have been headaches such as reasoning trajectories pruned by chat templates, detokenize-retokenize mismatches, etc.

Token-In-Token-Out (TITO) ensures that the token prefix stays consistent across turns, removing the silent off-policyness introduced by multi-turn agentic training. Miles now fully supports TITO for popular open-source models. Check the blog here!
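The detokenize-retokenize mismatch mentioned above can be reproduced with a toy example. The three-entry vocabulary and the greedy longest-match `encode` function are assumptions made for illustration. They do not correspond to any real model's tokenizer, but they show how a text round trip can change the token sequence.

```python
from typing import List

# Toy vocabulary (hypothetical): "ab" exists as a single merged token.
VOCAB = {"a": 0, "b": 1, "ab": 2}
ID_TO_TEXT = {i: s for s, i in VOCAB.items()}


def decode(ids: List[int]) -> str:
    """Concatenate the text of each token id."""
    return "".join(ID_TO_TEXT[i] for i in ids)


def encode(text: str) -> List[int]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids: List[int] = []
    pos = 0
    while pos < len(text):
        for length in range(len(text) - pos, 0, -1):
            piece = text[pos:pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
    return ids


# The inference engine happened to sample "a" then "b" as separate tokens.
sampled = [0, 1]
text = decode(sampled)       # same text as the merged token "ab"
retokenized = encode(text)   # the round trip merges it into one token

print(sampled, retokenized, sampled == retokenized)
```

Both sequences decode to the same string, yet the trainer would score a token the model never actually sampled. Passing token ids through unchanged avoids the round trip entirely.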

LMSYS Org (@lmsysorg)

Read full blog: https://www.lmsys.org/blog/2026-05-13-no-token-left-behind/
