Ruihang Lai launches PithTrain, an 11,000-line agent-native MoE training framework designed to be modified by AI agents · Digg

Tech · 28d ago

In the authors' tests, agents completed tasks on PithTrain in up to 62% fewer turns than on production frameworks.

Original post

Tianqi Chen#454

Hao Kang@haok1402

PithTrain is out. The bigger bet behind it: ML systems built to be evolved by agents, not just maintained by humans. Grateful to my collaborators for everything we've built and learned! Excited for what's next :-)

Ruihang Lai@ruihanglai

Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup, not training. A new architecture lands, and trying it means hacking through a gigantic codebase to stay compatible with the pipeline. What you want to change is small. The code you wade through to change isn't.

We are likely not alone in this experience; many researchers we’ve talked to run into similar issues. A year of this on CMU's FLAME cluster left us with one question: what if a framework were built for an agent to adapt and evolve, not just for humans to maintain?

So we introduce PithTrain: a compact, agent-native MoE training system, now ~11K lines of Python, on four principles:

- Compact: fits in one context window
- Python-native: readable tracebacks, no compiled-extension rebuilds
- No implicit indirection: direct calls, each model in its own file
- Agent skills: in-repo playbooks for recurring tasks

Then we measured the thing nobody measures. Same agent, same tasks, only the framework underneath changes: on PithTrain it finishes with up to 62% fewer turns and 64% less GPU time than production frameworks, while training just as fast.

We call this second axis agent-task efficiency, and we believe it deserves to sit alongside training throughput as a metric worth optimizing. Excited to see what people build with it.

Built with amazing collaborators @haok1402, Haozhan Tang, Akaash Parthasarathy, @Zichun_Yu.

Blog: https://blog.mlc.ai/2026/06/01/pithtrain-compact-agent-native-moe-training-system
Code: https://github.com/mlc-ai/pith-train
Paper: https://arxiv.org/abs/2605.31463

11:05 AM · Jun 1, 2026 · 1.6K Views
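
The "no implicit indirection" principle in the post above can be illustrated with a small contrast. This is only a sketch under assumptions: the registry, the `TinyMoE` class, and both builder functions are hypothetical names made up for the example and are not taken from the PithTrain code.

```python
# Hypothetical illustration only; none of these names come from PithTrain.

# Style 1: implicit indirection through a string-keyed registry.
MODEL_REGISTRY = {}


def register(name):
    """Decorator that stores a class in the registry under a string key."""
    def wrap(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return wrap


@register("tiny_moe")
class TinyMoE:
    def __init__(self, num_experts, top_k):
        self.num_experts = num_experts
        self.top_k = top_k


def build_from_config(cfg):
    # A reader has to search for the decorator to learn which class this returns.
    return MODEL_REGISTRY[cfg["name"]](**cfg["args"])


# Style 2: direct call; the constructor and its arguments are visible at the call site.
def build_direct():
    return TinyMoE(num_experts=8, top_k=2)


indirect = build_from_config(
    {"name": "tiny_moe", "args": {"num_experts": 8, "top_k": 2}}
)
direct = build_direct()
```

Both paths produce the same object. The difference is how many files a reader, human or agent, might need to open before knowing what gets constructed.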

Sentiment

Positive users highlight PithTrain's empirical proof of better locality for agents and software design, while negative users view the compact MoE training framework as unoriginal hype.

Pos: 29.2%

Neg: 70.8%

7 comments with sentiment.

Related links

pithtrain compact agent native moe training system (via MLC.AI)

Posts from X

Most Activity

Views: 16.7K · Bookmarks: 122 · Likes: 157 · Retweets: 34

Ruihang Lai@ruihanglai

Two moments every ML researcher knows. You get onto a new cluster, and week one goes to fitting the framework to your setup, not training. A new architecture lands, and trying it means hacking through a gigantic codebase to stay compatible with the pipeline. What you want to change is small. The code you wade through to change isn't.

We are likely not alone in this experience; many researchers we’ve talked to run into similar issues. A year of this on CMU's FLAME cluster left us with one question: what if a framework were built for an agent to adapt and evolve, not just for humans to maintain?

So we introduce PithTrain: a compact, agent-native MoE training system, now ~11K lines of Python, on four principles:

- Compact: fits in one context window
- Python-native: readable tracebacks, no compiled-extension rebuilds
- No implicit indirection: direct calls, each model in its own file
- Agent skills: in-repo playbooks for recurring tasks

Then we measured the thing nobody measures. Same agent, same tasks, only the framework underneath changes: on PithTrain it finishes with up to 62% fewer turns and 64% less GPU time than production frameworks, while training just as fast.

We call this second axis agent-task efficiency, and we believe it deserves to sit alongside training throughput as a metric worth optimizing. Excited to see what people build with it.

Built with amazing collaborators @haok1402, Haozhan Tang, Akaash Parthasarathy, @Zichun_Yu, @junrushao, Todd Mowry, @XiongChenyan and @tqchenml.

Blog: https://blog.mlc.ai/2026/06/01/pithtrain-compact-agent-native-moe-training-system
Code: https://github.com/mlc-ai/pith-train
Paper: https://arxiv.org/abs/2605.31463

28d · 16.7K views · 157 likes · 122 bookmarks
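
Figures such as "62% fewer turns" in the post above are relative reductions against a baseline. The arithmetic can be sketched as follows. The turn counts are hypothetical values chosen only to make the calculation concrete; they are not measurements from the paper, and `relative_reduction` is an illustrative helper, not part of any framework.

```python
def relative_reduction(baseline, candidate):
    """Return the fraction by which candidate is smaller than baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 1.0 - candidate / baseline


# Hypothetical numbers, chosen only to show the arithmetic.
baseline_turns = 50   # turns an agent needs on a baseline framework (assumed)
candidate_turns = 19  # turns the same agent needs on the compact one (assumed)

turn_reduction = relative_reduction(baseline_turns, candidate_turns)
print(f"{turn_reduction:.0%} fewer turns")
```

The same helper applies to GPU time, or to any other cost where lower is better.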

Replies (3)

Cody Blakeney@code_star

Admittedly I really got this way when I became hydra config pilled at meta.

If you designed your code so a function or a class was configured, it was directly composable, and you could see the exact function and args!

(I do realize that’s just indirection and a builder with extra steps)

Cody Blakeney@code_star

Also tbh, now that I’m reading the paper, most of what they are saying is good for agents just seems … good? I always hated this shit with hidden builder classes.

28d4.7K2834
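
The pattern described in the posts above, where a config names the exact function and its args, can be sketched with the standard library alone. This is an illustration in the style of Hydra's `_target_` convention, not Hydra's implementation. The `instantiate` helper and the config dict below are hypothetical stand-ins written for this example.

```python
import importlib


def instantiate(cfg):
    """Resolve cfg['_target_'] to a callable and call it with the remaining keys.

    Hypothetical stand-in that mimics the config style discussed above.
    """
    module_path, _, attr = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    kwargs = {key: value for key, value in cfg.items() if key != "_target_"}
    return target(**kwargs)


# The config spells out the exact callable and its arguments, so a reader can
# see what will be constructed without tracing a hidden builder class.
cfg = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
value = instantiate(cfg)
```

As the post concedes, this is still indirection: the call goes through a string lookup. The difference from a hidden builder is that the target and its arguments sit side by side in one place.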

Cody Blakeney@code_star

I feel like this is an interesting window into what future OSS software may look like.

Maximally hackable, confined within the limits of a set context length.

28d5.5K2715

Cody Blakeney@code_star

Actually. Come to think of it. A great follow up paper to this would be studying how fast a repo becomes completely broken by an agent let loose inside of it.

Say you had a series of 5-10 changes in functionality that don’t strictly interfere with each other but would cause problems without refactoring.

Would the agent be more or less likely to break this minimal, locally coherent codebase?

Cody Blakeney@code_star

It’s funny because the agents are still trained to add all the abstractions and split files.

You need to build something like this intentionally from the start. (And probably yell at the agent to stop trying to write CLEAN or DRY code or whatever it does)

28d1.4K158

Maxence Frenette@maxencefrenette

@code_star Turns out good locality of behavior is good for humans and agents.

28d701

Cody Blakeney@code_star

@maxencefrenette The abstraction-cels are going to be so mad when they read this empirical proof of better software design.

28d452

Glenn Matlin@GlennMatlin

@code_star Hydra is underrated

28d651

Strata@ChainZenit

@code_star Just another glorified prompt interface disguised as open source. Seen it before.

28d102

Strata@ChainZenit

@code_star Another day, another "revolutionary" agent architecture we've seen a dozen times before.

28d66

Strata@ChainZenit

@code_star Sounds like a lot of extra steps just to avoid refactoring.

28d52

tim ganiev@postimortem

@code_star i was a big fan of hydra and omegaconf, until i got critical perf issues due to iterating over hydra's list field lol (because it was not a pure list)

anyway, these config merges / overrides were pretty good, especially for exp tracking

28d48

Strata@ChainZenit

@code_star Telling an AI to stop trying to be clever is the new dev meta.

28d42

Strata@ChainZenit

@code_star Agent-driven technical debt is going to be a nightmare to debug. Seen it before.

28d14