📝 New blog: "No Token Left Behind: Demystifying Token-In-Token-Out in Miles"
In agentic RL, a rollout is a chain of model calls, tool outputs & resumed turns. Token-In-Token-Out (TITO) ensures the trainer evaluates the exact tokens the inference engine produced — break it, and training silently drifts off-policy.
Why it matters:
📦 One sample per task, not per turn: ~10× less compute on 30–50 turn trajectories
🎯 Keeps every token on-policy

How Miles enforces it:
1️⃣ Inference session server: one append-only token buffer per trajectory
2️⃣ Append-only at 3 levels: messages, template rendering, tokens
3️⃣ Pluggable TITO tokenizer: incremental tokenize + per-model splice patches
4️⃣ TokenSeqComparator: verifies every rollout stays bit-perfect
Supports Qwen3, GLM, Kimi-K2, Nemotron, MiniMax & DeepSeek families.
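
To make the idea concrete, here is a minimal Python sketch of why decoding and re-tokenizing a rollout can drift from the sampled tokens, and how an append-only buffer plus a sequence comparison catches it. Everything in it is an assumption for illustration: the toy vocabulary, `encode`, `decode`, `TrajectoryBuffer`, and `first_mismatch` are hypothetical names, not Miles' actual API or its TokenSeqComparator.

```python
from dataclasses import dataclass, field

# Toy vocabulary (hypothetical): the text "ab" can be one token or two.
VOCAB = {"a": 0, "b": 1, "ab": 2, " ": 3}
ID_TO_TEXT = {i: s for s, i in VOCAB.items()}


def decode(ids):
    """Concatenate the text of each token id."""
    return "".join(ID_TO_TEXT[i] for i in ids)


def encode(text):
    """Greedy longest-match tokenizer over the toy vocabulary."""
    ids, pos = [], 0
    pieces = sorted(VOCAB, key=len, reverse=True)
    while pos < len(text):
        piece = next(p for p in pieces if text.startswith(p, pos))
        ids.append(VOCAB[piece])
        pos += len(piece)
    return ids


@dataclass
class TrajectoryBuffer:
    """Append-only token buffer for one trajectory (illustrative only)."""
    tokens: list = field(default_factory=list)
    loss_mask: list = field(default_factory=list)

    def append(self, ids, from_model):
        # Tokens are only ever appended, never re-rendered or rewritten.
        self.tokens.extend(ids)
        self.loss_mask.extend([1 if from_model else 0] * len(ids))


def first_mismatch(expected, actual):
    """Return the first index where two sequences differ, or None if equal."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None


buf = TrajectoryBuffer()
buf.append(encode("a "), from_model=False)  # prompt or tool output
buf.append([0, 1], from_model=True)         # model sampled "a" then "b"

tito_tokens = list(buf.tokens)              # what the engine produced
retokenized = encode(decode(buf.tokens))    # text round trip merges "ab"
drift_at = first_mismatch(tito_tokens, retokenized)
```

In this toy case the round trip through text merges the sampled `a`, `b` pair into the single `ab` token, so a trainer that re-tokenized would score a sequence the policy never sampled. Training on `buf.tokens` directly avoids that, and the comparison flags the divergence at the first differing position.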
