THUDM open-sources Slime, an LLM post-training framework for reinforcement learning scaling and on-policy distillation

Story Overview

THUDM has released Slime as a fully open-source framework for the post-training stage of large language models, covering reinforcement learning (RL) scaling and on-policy distillation (OPD). According to the linked post, the OPD post-training of GLM-5.2 took roughly two days on the same stack.

Original post

slime@slime_framework

Thanks for the support!

A small note: slime has supported not only OPD, but the full RL + OPD post-training workflow since GLM-4.5.

More to come for scalable agentic RL infra.

Didier Lopes@didier_lopes

Incredible how Z. ai literally has their RL infrastructure open source.

The entire OPD post-training of GLM-5.2 took on this slime platform took ~2 days.

https://github.com/THUDM/slime

1:31 PM · Jun 19, 2026 · 6.9K Views

Developer Impact

Teams gain ready-made RL loops instead of custom builds

Native Megatron-LM and SGLang hooks plus support for synchronous or asynchronous modes let developers plug in their own verifiers, environments, and data workflows without rewriting core training plumbing.
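
To make "plug in their own verifiers" concrete, here is a minimal, framework-agnostic sketch of a verifier as a plain reward function. It is not Slime's API: `Sample`, `Verifier`, `exact_match_verifier`, and `score_batch` are hypothetical names invented for illustration, and a real integration would follow the conventions documented in the repo.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """Hypothetical container for one finished rollout (not a Slime type)."""
    prompt: str
    response: str
    reference: str


# Assumption: a verifier is any callable mapping a sample to a scalar reward.
Verifier = Callable[[Sample], float]


def exact_match_verifier(sample: Sample) -> float:
    """Return 1.0 if the last non-empty response line equals the reference."""
    lines = [ln.strip() for ln in sample.response.splitlines() if ln.strip()]
    answer = lines[-1] if lines else ""
    return 1.0 if answer == sample.reference.strip() else 0.0


def score_batch(samples: List[Sample], verifier: Verifier) -> List[float]:
    """Apply a user-supplied verifier to every sample in a rollout batch."""
    return [verifier(s) for s in samples]


if __name__ == "__main__":
    batch = [
        Sample("What is 2 + 2?", "Let me think.\n4", "4"),
        Sample("What is 2 + 2?", "5", "4"),
    ]
    print(score_batch(batch, exact_match_verifier))
```

Keeping the verifier a pure function of the sample is the design choice that would let the same scoring code be swapped in whether rollouts are generated synchronously or asynchronously.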

Open Question

Wider adoption still lacks public signals

The repo shows use across GLM variants plus Qwen, DeepSeek V3, and Llama 3, yet no adoption counts, contributor activity, or scaling benchmarks beyond the GLM-5.2 timeline have been shared.

Sentiment

Many users are excited about THUDM open-sourcing the Slime RL platform for GLM-5.2 post-training because its flexibility and two-day efficiency lower barriers for custom RL workflows.

Positive 93.7% · Negative 6.3%, based on 11 comments with sentiment.

Posts from X

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@didier_lopes > The entire OPD post-training of GLM-5.2 took on this slime platform took ~2 days.

what

Didier Lopes@didier_lopes

@Infopulsed This is probably why everyone is so paranoid about distillation - it seems it's incredibly efficient.

This is from Qwen3's technical report.

Didier Lopes@didier_lopes

@alexbastian_ai They do shed some light into the type of data they use for pre-training

vik@vikhyatk

@teortaxesTex @didier_lopes this is after training all of the expert variants and collapse them back into the same base model? seems plausible to me the signal is much more dense in that phase

Alex Sebastian@alexbastian_ai

@didier_lopes Nice, now open source the pre-training code and all the datasets.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@vikhyatk @didier_lopes yeah the word "entire" was throwing me off but for OPD I guess makes sense, if they encounter no problems

vik@vikhyatk

@teortaxesTex @didier_lopes 200 experiments to figure out the optimal policy. and then 2 days to do the final training run

Prannay Hebbar@Pran_Ker

@didier_lopes yup, it contains their exact Megatron +sglang setup, you can even checkout the some of the older GLM versions in other branches.

Zhipeng Huang@nopainkiller

@didier_lopes they are on X @slime_framework

Didier Lopes@didier_lopes

@Pran_Ker amazing.

Not even 1.0.0 yet, we are so early

Deepak@thedeepflux

@didier_lopes GLM-5.2’s 2-day fine-tune on an open RL platform is a solid signal. most RL frameworks take weeks even with larger teams. this could reset expectations for solo or small-team builders.
