
DeepSeek's Deli Chen open-sources SKILL, an autonomous ML research framework that executed a 285-billion-parameter GRPO experiment

Story Overview

Deli Chen from DeepSeek just dropped SKILL, a minimal protocol that turns a language model into an autonomous research agent. The system ran a full 285-billion-parameter GRPO training campaign, iterated through sixteen versions of its own survey paper on self-play methods, and landed an internal review score of 8.6 without directional human input after the initial prompt.


Original post

Deli Chen (@victor207755822), #1803 in Tech

🧵 Deli AutoResearch SKILL is now officially open source! 🎉 https://victorchen96.github.io/auto_research/framework.html

Alongside it, we’re dropping our 4th survey paper — this time on Self-play. https://victorchen96.github.io/auto_research/paper.html

Inspired by AlphaZero, we got a powerful insight: prior knowledge doesn’t always lift the ceiling. Models can discover more globally optimal solutions just by playing against themselves.

The biggest change in this paper? For the first time, the AutoResearch Agent autonomously planned GPU experiments — and submitted actual RL runs on the DeepSeek 285B model.

The entire RL pipeline — experiment design, code writing, running, debugging, and conclusion summarization — was 100% automated, with zero human intervention from me. This was incredibly difficult, but an incredibly important step. https://victorchen96.github.io/blog_self_play_story.html

GRPO is the tool being called by the AutoResearch Agent here. We see this as the beginning of our Continual Learning research journey. 🚀

As always, this is my personal research project, unaffiliated with any organization. All views are my own.

#AI #ReinforcementLearning #SelfPlay #OpenSource #AutoML #ContinualLearning #DeepSeek

7:52 AM · Jun 17, 2026 · 40.9K Views

Developer Impact

One file carries the entire agent spec

Everything needed lives in a single SKILL.md: motivation rules, stall-detection logic, heartbeat watchdog, and strict session constraints. No executable code ships, so anyone can copy the protocol and wire their own models around it.
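As a rough illustration of what wiring code around such a protocol might involve, here is a minimal Python sketch of a heartbeat watchdog with simple stall detection. Everything in it is an assumption made for illustration: the `Watchdog` class, its parameters, and the "repeated progress marker" stall rule are hypothetical and are not taken from SKILL.md, which ships no executable code.

```python
from dataclasses import dataclass, field
from typing import Callable
import time


@dataclass
class Watchdog:
    """Hypothetical heartbeat watchdog; not the logic defined in SKILL.md."""
    heartbeat_timeout_s: float               # max allowed silence between heartbeats
    stall_window: int                        # repeated identical reports that count as a stall
    clock: Callable[[], float] = time.monotonic  # injectable clock, useful for testing
    _last_beat: float = field(init=False, default=0.0)
    _last_progress: str = field(init=False, default="")
    _repeats: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        # Start the timeout window at construction time.
        self._last_beat = self.clock()

    def beat(self, progress: str) -> None:
        """Record a heartbeat together with a short progress marker."""
        self._last_beat = self.clock()
        if progress == self._last_progress:
            self._repeats += 1               # same marker again: no visible progress
        else:
            self._last_progress = progress
            self._repeats = 0

    def status(self) -> str:
        """Return 'dead' (no heartbeat), 'stalled' (no progress), or 'ok'."""
        if self.clock() - self._last_beat > self.heartbeat_timeout_s:
            return "dead"
        if self._repeats >= self.stall_window:
            return "stalled"
        return "ok"
```

In this sketch the agent loop would call `beat()` after each step, and a supervisor would poll `status()` to decide whether to restart or redirect the session. Real stall detection would likely need a richer progress signal than a string comparison.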

Open Question

Limits stay visible even at 8.6

The 72-hour runs still relied on internal simulated reviewers rather than external peer review, and the framework openly flags ongoing hallucination risks. How the approach behaves once real labs start forking and extending it remains to be seen.

Sentiment

Users are discussing Deli's open-sourced AutoResearch framework with its self-play survey and RL agent, with some praising GRPO for its simplicity and performance on reasoning tasks while others remain unimpressed by the supporting paper.

Positive: 66.6%

Negative: 33.4%

4 comments with sentiment.
