This was one of the standout AI papers of the week.
(bookmark it)
It tackles a question most self-improving AI agents ignore: is the agent actually discovering anything, or just remixing what it already knows?
How can you tell whether the agent is doing real discovery or just confident retrieval?
The authors give three clean buckets:
- Retrieval is looking something up in a notebook you already have.
- Search is combining tools you already own in new ways.
- Discovery is inventing a new concept that wasn't in your toolkit before.
The issue is that most agents stop at the first two.
The math behind their definition (category theory plus a left Kan extension, if you care) is basically a bookkeeping trick to ask: could the old version of me have produced this result? If yes, it's not discovery. If no, something genuinely new showed up.
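Here is a toy way to see the three buckets and the "could the old version of me have produced this?" test, sketched in Python. It is not the paper's category-theory construction. It assumes a toolkit is just a set of named tools (the names `fit`, `normalize`, `unfold` are hypothetical). It treats "search" as composing existing tools up to a fixed depth, and counts anything outside that set as discovery.

```python
from itertools import product


def closure(toolkit, max_depth):
    """Every pipeline the old toolkit can express: chains of 1..max_depth tools.

    The depth bound is a simplifying assumption; the real question
    ("could the old me ever produce this?") has no such cap.
    """
    pipelines = set()
    for n in range(1, max_depth + 1):
        for combo in product(sorted(toolkit), repeat=n):
            pipelines.add(".".join(combo))
    return pipelines


def classify(result, notebook, toolkit, max_depth=3):
    """Toy three-bucket label for a result (hypothetical, not the paper's API)."""
    if result in notebook:
        return "retrieval"  # already written down: just look it up
    if result in closure(toolkit, max_depth):
        return "search"  # reachable by recombining tools we already own
    return "discovery"  # the old toolkit could not have produced it


notebook = {"fit"}
toolkit = {"fit", "normalize"}
for r in ["fit", "normalize.fit", "unfold.fit"]:
    print(r, "->", classify(r, notebook, toolkit))
```

`unfold.fit` lands in the discovery bucket only because `unfold` is not in the old toolkit, which is the intuition behind the paper's test in miniature.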
They build a Builder/Breaker agent that studies protein mechanics. Over four rounds, the model's fit doesn't improve: R² goes from 0.48 up to 0.68, then falls to 0.54 and 0.41. At first glance, that looks like a failing agent.
It isn't.
The agent kept taking on harder proteins and rewriting its theory to cover them. The data grew almost 10x while the model code grew only 1.3x. A theory that barely grows while covering a much bigger world is exactly what good science looks like.
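The arithmetic behind "a small theory covering a bigger world" fits in a few lines. This sketch assumes the signal is simply data growth divided by code growth. That is a simplification, and the paper's actual metric may be defined differently. The inputs are the post's rounded figures (about 10x data, 1.3x code).

```python
def compression_gain(data_growth: float, code_growth: float) -> float:
    """How much more data each unit of model code covers, relative to the start.

    A value above 1 means the theory explains more of the world per line of code.
    (Assumed definition for illustration, not taken from the paper.)
    """
    if code_growth <= 0:
        raise ValueError("code_growth must be positive")
    return data_growth / code_growth


# Rounded figures from the post: data grew "almost 10x", code grew 1.3x.
gain = compression_gain(10.0, 1.3)
print(f"{gain:.1f}x more data per unit of code")  # prints "7.7x more data per unit of code"
```

Because the data grew "almost" 10x, the true gain is slightly under 7.7x. It still means each unit of model code covers several times more data than at the start.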
Why does it matter?
If you optimize for accuracy alone, your self-improving agent will just settle into easy benchmarks and stop. This paper offers a cleaner success signal: is the agent compressing more of the world into relatively less code over time?
Paper: https://arxiv.org/abs/2606.01444
Learn to build effective AI agents in our academy: https://academy.dair.ai/