
Ruihang Lai launches PithTrain, an 11,000-line agent-native MoE training framework designed to be modified by AI agents

According to its authors, agents complete tasks on it with up to 62% fewer turns than on production frameworks.

Original post · Tianqi Chen #426
Hao Kang @haok1402

PithTrain is out. The bigger bet behind it: ML systems built to be evolved by agents, not just maintained by humans. Grateful to my collaborators for everything we've built and learned! Excited for what's next :-)

11:05 AM · Jun 1, 2026 · 914 Views
Posts from X
Views 5.1K · Bookmarks 58 · Likes 74 · Retweets 22
Ruihang Lai @ruihanglai

Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup, not training. A new architecture lands, and trying it means hacking through a gigantic codebase to stay compatible with the pipeline. What you want to change is small. The code you wade through to change it isn't.

We are likely not alone in this: many researchers we've talked to run into similar issues. A year of this on CMU's FLAME cluster left us with one question: what if a framework were built for an agent to adapt and evolve, not just for humans to maintain?

So we introduce PithTrain: a compact, agent-native MoE training system, now ~11K lines of Python, built on four principles:

- Compact: fits in one context window
- Python-native: readable tracebacks, no compiled-extension rebuilds
- No implicit indirection: direct calls, each model in its own file
- Agent skills: in-repo playbooks for recurring tasks
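
To make the "no implicit indirection" principle concrete, here is a minimal sketch contrasting a string-keyed registry with a direct call. It is not taken from the PithTrain repository: `MODEL_REGISTRY`, `register`, `TinyMoE`, `TinyMoEConfig`, `build_model`, and `build_tiny_moe` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Style A: implicit indirection through a string-keyed registry (hypothetical).
MODEL_REGISTRY = {}


def register(name):
    """Decorator that records a class under a string key."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


@dataclass
class TinyMoEConfig:
    hidden_size: int = 64
    num_experts: int = 4


@register("tiny_moe")
class TinyMoE:
    def __init__(self, config):
        self.config = config

    def describe(self):
        c = self.config
        return f"TinyMoE(hidden={c.hidden_size}, experts={c.num_experts})"


def build_model(name, **overrides):
    # The class that runs is chosen by a dictionary lookup at runtime,
    # so a reader must first find where "tiny_moe" was registered.
    return MODEL_REGISTRY[name](TinyMoEConfig(**overrides))


# Style B: direct call (hypothetical).
def build_tiny_moe(num_experts=4):
    # The constructor is named at the call site, so a plain text search
    # or "go to definition" lands on the class immediately.
    return TinyMoE(TinyMoEConfig(num_experts=num_experts))


indirect = build_model("tiny_moe", num_experts=8)
direct = build_tiny_moe(num_experts=8)
print(indirect.describe())
print(direct.describe())
```

Both paths build the same object. The difference is how many hops a reader, human or agent, needs before reaching the code that actually runs.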

Then we measured the thing nobody measures. Same agent, same tasks, only the framework underneath changes: on PithTrain it finishes with up to 62% fewer turns and 64% less GPU time than production frameworks, while training just as fast.

We call this second axis agent-task efficiency, and we believe it deserves to sit alongside training throughput as a metric worth optimizing. Excited to see what people build with it.

Built with amazing collaborators @haok1402, Haozhan Tang, Akaash Parthasarathy, @Zichun_Yu, @junrushao, Todd Mowry, @XiongChenyan and @tqchenml.

Blog: https://blog.mlc.ai/2026/06/01/pithtrain-compact-agent-native-moe-training-system
Code: https://github.com/mlc-ai/pith-train
Paper: https://arxiv.org/abs/2605.31463

6h · Views 5.1K · Likes 74 · Bookmarks 58
Replies 3
Cody Blakeney @code_star

Admittedly I really got this way when I became Hydra-config-pilled at Meta.

If you designed your code so a function or a class was configured, it was directly composable, and you could see the exact function and args!

(I do realize that’s just indirection and a builder with extra steps)
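
For readers unfamiliar with the pattern in this reply: Hydra configs can name the callable to build under a `_target_` key, which `hydra.utils.instantiate` resolves. The sketch below re-creates that idea with only the standard library so the mechanism is visible. The `instantiate` helper here is a simplified stand-in written for illustration, not Hydra's implementation, and the `datetime` targets are arbitrary examples.

```python
import importlib


def instantiate(config):
    """Build the object named by config["_target_"], passing other keys as kwargs."""
    params = dict(config)
    module_path, _, attr = params.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    # Recurse into nested configs so that configured objects compose.
    kwargs = {
        key: instantiate(value)
        if isinstance(value, dict) and "_target_" in value
        else value
        for key, value in params.items()
    }
    return target(**kwargs)


# The exact callable and its arguments are readable straight from the config.
config = {
    "_target_": "datetime.timezone",
    "offset": {"_target_": "datetime.timedelta", "hours": 2},
}

tz = instantiate(config)
print(tz.utcoffset(None))
```

The config shows the exact function and arguments, as the reply describes, but the call still passes through a string lookup, which is the indirection the parenthetical concedes.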

Cody Blakeney @code_star

Also tbh, now that I’m reading the paper, most of what they are saying is good for agents just seems … good? I always hated this shit with hidden builder classes.

4h · Views 1.8K · Likes 14 · Bookmarks 16