The next step toward automating AI is automating RL environments
Introducing General-Agent: A fully synthetic environment whose task corpus self-evolves and grows harder over time
4,504 tool-use tasks · 1,040 domains · 8,159 unique tools
AI Judge changed title after evaluation, original title: "Prime Intellect releases General-Agent, a synthetic reinforcement learning environment with self-evolving tasks"
Tasks advance automatically via synthesizer, solver, and gate components.
Positive users hail Prime Intellect's General-Agent as substantial infrastructure for multi-agent self-evolving RL training, while some worry the self-evolving task corpus breaks auditability and reliable verification.
This is probably why gpt 5.5 is so ridiculously good at decompiling and reverse engineering steam games huh
awesome work by the PI team 👏
I think we’re still in the very early days of exciting research around generating high quality Agent Training Envs/Evals as multi-player games
Tweeted AlphaEval which we were riffing on last month, super excited to dig into this, some great ideas here + more to swarm on
- Curriculum Learning and Tiers of Difficulty in Envs/Evals
- 2 Player setup vs >=3 player setup with a judge
- Gating + validation mechanisms
- Grounding in production traces vs not?
this type of data generation at scale shows sparks of being able to adapt agents to any domain without tons of data curation (besides good rubrics and game design)
great drop 🔥
absolutely loving this. for the past few months i’ve also been exploring synthetic and procedural environment generation for agents (though at a much smaller and earlier-stage scale). the projects i’ve been working on (synthetic workspace gym and rlvr gym) include a small set of environment families such as python script repair, pipeline repair, tabular tasks, scheduling, graph planning and retrieval-style workspace tasks, with workspace artifact logging, hidden evaluators, trajectory traces and final diffs. the motivation was mostly curiosity...prompts alone feel too weak as a substrate for studying long-horizon agents. what we really need are executable worlds where agents can inspect state, take actions, receive grounded feedback and fail in ways that are actually analyzable.
that is why i’m especially excited by what the prime intellect team has done here. we need thousands of diverse, verifiable, stateful tasks across domains not just static prompt datasets or a handful of handcrafted benchmarks. imo scaling this lets us study curriculum, tool use, verifier design, task difficulty, trajectory quality, reward hacking and long-horizon generalization in a much more systematic way.
my own versions are still early and small but it’s really exciting to see PI push this direction seriously: evolving environments, calibrating difficulty and generating broad task distributions across thousands of domains. also i really loved the clean and neat blog post!
Blog:
https://www.primeintellect.ai/blog/general-agent
For more details, check out the environment and the blog post
https://app.primeintellect.ai/dashboard/environments/primeintellect/general-agent
Automating RL environments is the next step toward automating everything else.
Introducing general-agent by @mikasenghaas
> open agentic environments with 1000s of tools are scarce, so we're building one that builds itself
> A synthesizer evolves tasks across difficulty tiers, empirically gated by a solver. Hard tiers seed the next wave, hillclimbing toward frontier-level difficulty.
> 4,504 tasks / 1,040 domains / 8,159 unique tools
making the tech that closed labs have open and giving it to everyone, one release after another :)
this was a fun side project to increase our set of usable agent tasks with a focus on tool diversity. the thing that stood out to me the most was the process: going from an idea, to environment, to running 1000s of parallel, multi-hour agent episodes was mind-bogglingly easy thanks to our stack. i view the v0 taskset as a preview of what’s to come: byo harness with broad task composability + truly multi-agent training to create, evolve and verify tasks on-the-fly
go from idea, to environment, to running 1000s of parallel, multi-hour agent episodes within a day.
open superintelligence stack
This work was built entirely on top of our stack
- verifiers: to build the solver and synthesizer
- hosted evals: to synthesize tasks at massive scale
- hosted training: to validate training behavior
allowing us to go from prototype to thousands of agents running in parallel within a day.
at this point i can wake up every morning and expect to see @PrimeIntellect cook another very cool thing
This is the right attitude towards agents. Continually learning the harness+model
This work is a step towards self-improving agents. We believe the environment has many of the right ingredients to evolve our tooling and platform towards:
- training agents, not models (train any task in any harness)
- compose multiple agents (multi-agent episodes like synthesizer-solver, solver-grader, etc.)
so cool -- the space of possible informative environments is vast, and manually enumerating them is intractable. automating diverse construction is critical for making Bouba (and not Kiki) post-trained model behavior & capabilities.
@mikasenghaas https://www.primeintellect.ai/blog/general-agent
highly recommend checking out @mikasenghaas full blog post on the general agent environment release with all the details and experiments:
https://www.primeintellect.ai/blog/general-agent

@PrimeIntellect @mikasenghaas cooked 🐐🐐

Most environments today are static snapshots. The general-agent environment ships two agents capable of synthesizing tasks on-the-fly in a 2-player loop:
A synthesizer agent evolves a task in difficulty tiers. Each tier's difficulty is measured by running a solver agent against it. Only tiers that land in the target pass-rate band are kept; the hardest tiers are used to seed the next wave.
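To make the loop above concrete, here is a rough Python sketch of one wave: measure each tier with a solver, keep tiers inside a pass-rate band, and pick the hardest kept tiers as seeds. Everything in it is an assumption for illustration and not Prime Intellect's code: the names `Tier`, `measure_pass_rate`, `gate`, `pick_seeds`, and `run_wave` are hypothetical, the band limits are arbitrary, and the "solver" is a coin flip whose success probability shrinks with tier level.

```python
import random
from dataclasses import dataclass


@dataclass
class Tier:
    level: int
    pass_rate: float  # fraction of solver attempts that succeeded


def measure_pass_rate(attempt, n_rollouts):
    """Call the solver stand-in n_rollouts times; return the fraction that pass."""
    return sum(1 for _ in range(n_rollouts) if attempt()) / n_rollouts


def gate(tiers, low, high):
    """Keep only tiers whose pass rate falls inside the target band."""
    return [t for t in tiers if low <= t.pass_rate <= high]


def pick_seeds(kept, k):
    """The hardest kept tiers (lowest pass rate) seed the next wave."""
    return sorted(kept, key=lambda t: t.pass_rate)[:k]


def run_wave(n_tiers, n_rollouts, low, high, k, rng):
    """One wave: propose tiers, measure them, gate them, select seeds."""
    tiers = []
    for level in range(1, n_tiers + 1):
        # Hypothetical solver: success probability is 1 / level.
        p = 1.0 / level
        rate = measure_pass_rate(lambda: rng.random() < p, n_rollouts)
        tiers.append(Tier(level, rate))
    kept = gate(tiers, low, high)
    return kept, pick_seeds(kept, k)
```

Calling `run_wave(4, 20, 0.1, 0.6, 2, random.Random(0))` returns the tiers that survived the gate and up to two seeds. In a real setup the coin flip would be replaced by actual solver rollouts against synthesized tasks.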

We let GLM-5.1 and GPT-5-Mini play this game for 2 days to create the initial task corpus of this environment. We then analyzed the task corpus by generating over 200K solver traces from GLM-5.1. We find that solve rate decreases predictably with increasing difficulty tiers as a result of more complex queries that require reasoning over more tools and larger databases.
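The per-tier analysis described above can be sketched in a few lines of Python. This is only an illustration of the kind of aggregation involved, not the actual analysis code: the trace format (a `(tier, solved)` pair per trace) and the function names `solve_rate_by_tier` and `is_non_increasing` are assumptions.

```python
from collections import defaultdict


def solve_rate_by_tier(traces):
    """traces: iterable of (tier, solved) pairs; returns {tier: solve rate}."""
    counts = defaultdict(lambda: [0, 0])  # tier -> [solved, total]
    for tier, solved in traces:
        counts[tier][0] += int(solved)
        counts[tier][1] += 1
    return {tier: s / n for tier, (s, n) in sorted(counts.items())}


def is_non_increasing(rates):
    """True if the solve rate never rises as the tier index grows."""
    values = [rates[t] for t in sorted(rates)]
    return all(a >= b for a, b in zip(values, values[1:]))
```

Feeding it solver traces grouped by difficulty tier gives one solve rate per tier, and `is_non_increasing` checks whether difficulty actually goes up with the tier index.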


@vincentweisser @PrimeIntellect @mikasenghaas insane