Fascinating paper on self-improving agents.
(bookmark it)
If you are working on agentic loops, you will quickly realize that they are only as good as their evaluator.
Self-improvement loops tend to stall the moment the judge stops getting harder. The agent learns to satisfy a fixed evaluator rather than getting genuinely better. The Red Queen Gödel Machine, from Cambridge, co-evolves the agent and its evaluator together, so the bar keeps rising as the agent climbs.
The name borrows from the Red Queen hypothesis, the evolutionary arms race in which both sides have to keep running just to stay in place.
A frozen evaluator is where reward hacking creeps into self-improvement. Co-evolving the judge is a structural answer to that, and it keeps the loop honest over many rounds.
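To make the stall concrete, here is a toy sketch in Python. It is not the paper's algorithm: the `Agent`, `Evaluator`, and `run_loop` names, the scalar `skill` and `bar` values, and the update rules are all hypothetical stand-ins chosen for illustration. It only contrasts a frozen judge with one that gets harder each time the agent clears it.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    skill: int = 0  # hypothetical scalar stand-in for agent capability


@dataclass
class Evaluator:
    bar: int = 3  # hypothetical difficulty threshold the agent must clear

    def passes(self, agent: Agent) -> bool:
        return agent.skill >= self.bar


def run_loop(rounds: int, co_evolve: bool) -> tuple[int, int]:
    """Return (final agent skill, final evaluator bar) after `rounds` rounds."""
    agent, evaluator = Agent(), Evaluator()
    for _ in range(rounds):
        if not evaluator.passes(agent):
            # The agent only has a reason to improve while it is failing.
            agent.skill += 1
        elif co_evolve:
            # Assumed rule: the judge gets harder once the agent clears it.
            evaluator.bar += 2
        # Frozen evaluator plus a passing agent: nothing changes, the loop stalls.
    return agent.skill, evaluator.bar


frozen = run_loop(rounds=20, co_evolve=False)
co_evolved = run_loop(rounds=20, co_evolve=True)
print("frozen evaluator:    ", frozen)
print("co-evolved evaluator:", co_evolved)
```

With the frozen judge, the agent stops at the bar and stays there for the remaining rounds. With the co-evolving judge, skill keeps climbing because the bar keeps moving. A real system would replace both scalars with an actual agent and a learned or generated evaluator.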
Paper: https://arxiv.org/abs/2606.26294
Learn to build effective AI agents in our academy: https://academy.dair.ai/