
XGBoost creator Tianqi Chen releases PithTrain, a compact MoE training framework designed to fit inside AI agent context windows

The ~11K-line Python codebase is small enough for AI agents to read in full and customize autonomously.

Ruihang Lai @ruihanglai

Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup, not to training. A new architecture lands, and trying it means hacking through a gigantic codebase to stay compatible with the pipeline. What you want to change is small. The code you have to wade through to change it isn't.

We are likely not alone in this: many researchers we've talked to run into similar issues. A year of this on CMU's FLAME cluster left us with one question: what if a framework were built for an agent to adapt and evolve, not just for humans to maintain?

So we introduce PithTrain: a compact, agent-native MoE training system, now ~11K lines of Python, built on four principles:

- Compact: fits in one context window
- Python-native: readable tracebacks, no compiled-extension rebuilds
- No implicit indirection: direct calls, each model in its own file
- Agent skills: in-repo playbooks for recurring tasks
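The "no implicit indirection" principle is easiest to see side by side. The sketch below is a generic illustration, not code from PithTrain: `MODEL_REGISTRY`, `register`, `build_from_config` and `build_toy_moe` are hypothetical names, and the "model" is just a string so the example runs anywhere.

```python
from typing import Callable, Dict

# Style A (hypothetical): implicit indirection through a string-keyed registry.
# The call site names the model only as a string, so a reader (or an agent)
# must first find the decorator that registered it to reach the code that runs.
MODEL_REGISTRY: Dict[str, Callable[[int], str]] = {}


def register(name: str) -> Callable[[Callable[[int], str]], Callable[[int], str]]:
    """Decorator that stores a builder function under `name`."""
    def wrap(fn: Callable[[int], str]) -> Callable[[int], str]:
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap


@register("toy_moe")
def build_toy_moe_registered(num_experts: int) -> str:
    return f"toy_moe(experts={num_experts})"


def build_from_config(config: dict) -> str:
    # The function that actually runs is resolved at runtime from config["model"].
    return MODEL_REGISTRY[config["model"]](config["num_experts"])


# Style B (hypothetical): a direct call. In a one-model-per-file layout this
# function would live in its own module and be imported by name, so
# "go to definition" and tracebacks point straight at it.
def build_toy_moe(num_experts: int) -> str:
    return f"toy_moe(experts={num_experts})"


if __name__ == "__main__":
    print(build_from_config({"model": "toy_moe", "num_experts": 8}))
    print(build_toy_moe(num_experts=8))
```

Both calls produce the same result; the difference is how many hops an agent must follow, and how much code it must load into context, before it reaches the function it wants to edit.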

Then we measured the thing nobody measures. Same agent, same tasks, only the framework underneath changes: on PithTrain, the agent finishes with up to 62% fewer turns and 64% less GPU time than on production frameworks, while training just as fast.

We call this second axis agent-task efficiency, and we believe it deserves to sit alongside training throughput as a metric worth optimizing. Excited to see what people build with it.
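Agent-task efficiency compares what the same agent spends on the same task when only the framework changes. A minimal sketch of that comparison, assuming each agent session is summarized by turns, GPU hours and tokens; the `AgentRun` fields and the `percent_reduction`, `compare` and `best_case` helpers are hypothetical and not taken from the PithTrain paper. An "up to X%" figure corresponds to the maximum per-task reduction for one metric.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AgentRun:
    """Cost of one agent session on one task (hypothetical fields)."""
    turns: int
    gpu_hours: float
    tokens: int


def percent_reduction(baseline: float, candidate: float) -> float:
    """Percent by which `candidate` is below `baseline`; positive means cheaper."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - candidate) / baseline


def compare(baseline: AgentRun, candidate: AgentRun) -> Dict[str, float]:
    """Per-metric reduction of `candidate` relative to `baseline` on the same task."""
    return {
        "turns": percent_reduction(baseline.turns, candidate.turns),
        "gpu_hours": percent_reduction(baseline.gpu_hours, candidate.gpu_hours),
        "tokens": percent_reduction(baseline.tokens, candidate.tokens),
    }


def best_case(reductions: List[Dict[str, float]], metric: str) -> float:
    """An "up to X%" claim is the maximum reduction for `metric` across tasks."""
    return max(r[metric] for r in reductions)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the PithTrain paper.
    production = AgentRun(turns=40, gpu_hours=8.0, tokens=1000)
    pithtrain = AgentRun(turns=30, gpu_hours=6.0, tokens=500)
    print(compare(production, pithtrain))
```

Reporting reductions per task, rather than pooling all tasks, keeps one long task from dominating the comparison; training throughput would be measured separately, as the post treats it as a second axis.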

Built with amazing collaborators @haok1402, Haozhan Tang, Akaash Parthasarathy, @Zichun_Yu.

Blog: https://blog.mlc.ai/2026/06/01/pithtrain-compact-agent-native-moe-training-system
Code: https://github.com/mlc-ai/pith-train
Paper: https://arxiv.org/abs/2605.31463

11:01 AM · Jun 1, 2026 · 2.3K Views
Tianqi Chen @tqchenml

When we work with our colleagues in various places, one critical need is to optimize and customize the MoE training framework for our cluster environment. Agents can help, but existing large codebases easily outgrow an agent's context window. What if we rebuilt something from the ground up that is compact and easy for agents to operate on? PithTrain is the result of that exercise. It runs efficiently and at scale for modern MoE training, and it lets agents build out new features with fewer turns, less cluster-access time, and fewer tokens. We believe that agent-native machine learning systems will favor agent-task efficiency; this is one of the first steps in that direction.
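Whether a codebase fits in an agent's context window can be sanity-checked with a rough size estimate. This is an approximation under stated assumptions, not a PithTrain tool: it counts lines in `.py` files and converts characters to tokens at about 4 characters per token, a common rule of thumb that real tokenizers only loosely follow. `estimate_repo_tokens`, `fits_in_context` and the 50% `reserve` default are hypothetical.

```python
import os
from typing import Tuple

# Rough rule of thumb (assumption): about 4 characters per token for English
# text and code. Real tokenizers differ, so treat results as order-of-magnitude.
CHARS_PER_TOKEN = 4.0


def estimate_repo_tokens(root: str, suffix: str = ".py") -> Tuple[int, int]:
    """Return (line_count, approx_token_count) over files under `root` ending in `suffix`."""
    lines = 0
    chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            lines += len(text.splitlines())
            chars += len(text)
    return lines, int(chars / CHARS_PER_TOKEN)


def fits_in_context(root: str, context_tokens: int, reserve: float = 0.5) -> bool:
    """True if the estimated code size leaves a `reserve` fraction of the window free."""
    _, tokens = estimate_repo_tokens(root)
    return tokens <= context_tokens * (1.0 - reserve)


if __name__ == "__main__":
    n_lines, n_tokens = estimate_repo_tokens(".")
    print(f"{n_lines} lines of Python, roughly {n_tokens} tokens")
```

Keeping part of the window free (`reserve`) matters because the agent also needs room for its instructions, tool output, tracebacks and its own edits, not just the source it reads.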
