Teortaxes and Super Dario say GLM 5.2 used distillation from Claude and GPT 5.5 to seed agentic coding RL trajectories

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

> GLM 5.2 is already producing positive trajectories, so they have plenty to RL on; they’ll keep climbing to Mythos quality without distilling any further. They no longer need American models.

Mostly correct. The main issue is that eventually models will reliably produce superhuman trajectories, and the US will get there sooner. Anthropic will be distilling themselves, and China won't have equivalent data generators. The current trajectory is temporary.

Patrick C Toulme@PatrickToulme

There’s a big misconception about how GLM 5.2 was trained. Yes, they distilled Claude and GPT 5.5 — but distillation is not how they matched Opus quality. Distillation only fixed the cold start problem in RL.

RLing an agentic coding model isn’t rocket science. In simplified terms:

1. RL needs trajectories — rollouts where the model actually completed a task in some env

2. No successful trajectory on a task = zero gradient = you can’t RL it. This is the cold start problem

3. Distillation solves it. You seed your model with knowledge from a smarter one (Claude, GPT) on tasks it can’t do yet

4. Now it produces positive trajectories on those tasks

5. RL on those trajectories and hill climb agentic coding

6. At that point you no longer need to distill and can solely hill climb RL to better models
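
To make step 2 concrete, here is a minimal sketch of why a task with no successful rollout gives nothing to learn from. The post does not say which RL algorithm was used, so this assumes a binary task reward (1.0 for success, 0.0 for failure) and a policy-gradient update with a group-mean baseline; the function names are hypothetical and for illustration only.

```python
from statistics import mean


def group_advantages(rewards):
    """Advantage of each rollout: its reward minus the group's mean reward.

    Assumption: a mean-baseline policy gradient, where each rollout's
    gradient contribution is scaled by this advantage.
    """
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


def has_learning_signal(rewards):
    """True if at least one rollout would contribute a nonzero gradient."""
    return any(a != 0.0 for a in group_advantages(rewards))


# Cold start: the model fails the task on every rollout.
cold_start = [0.0, 0.0, 0.0, 0.0]

# After seeding (e.g. by distillation): one rollout in the group succeeds.
after_seeding = [0.0, 1.0, 0.0, 0.0]

print(has_learning_signal(cold_start))
print(has_learning_signal(after_seeding))
```

With all-zero rewards every advantage is zero, so the update is zero no matter how many rollouts are sampled. A single success is enough to give the successful rollout a positive advantage and the failed ones a negative advantage, which is the point at which hill climbing can start.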

This is an interesting curve. I’d argue it’s harder to get to Opus 4.8 from scratch than to go from Opus 4.8 → Fable/Mythos tier.

GLM 5.2 is already producing positive trajectories, so they have plenty to RL on — they’ll keep climbing to Mythos quality without distilling any further. They no longer need American models.

11:25 PM · Jun 22, 2026 · 2.2K Views