
Garry Tan Shares Workflow For Self-Improving AI Agents With Progressive Evals


Original post

Garry Tan (@garrytan)

Funny how simple using openclaw and Hermes agent is these days

Just have it do stuff. Then improve in progressive batches with evals from multiple frontier models. It self improves!

Garry Tan (@garrytan)

Right now I just use my personal AI and our company brain and it screws up and I tell it to fix it and write tests for it.

Also I do cross modal evals on progressive batches (eg if there are 10000 items do 5 and eval the input and output and skill, then keep doubling the batch size as you go)

10:23 PM · May 25, 2026 · 32.1K Views
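The progressive-batch idea in the post (start with 5 items, evaluate, then keep doubling) can be sketched roughly as below. This is only an illustration under stated assumptions: `progressive_batches` and `run_progressively` are hypothetical names, and `process` and `passes_eval` are placeholder callables standing in for the agent's skill and whatever eval is used. None of it comes from the original post.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def progressive_batches(items: Sequence[T], first: int = 5) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, doubling the slice size each round."""
    if first < 1:
        raise ValueError("first must be at least 1")
    start, size = 0, first
    while start < len(items):
        yield items[start:start + size]  # the last slice may be shorter
        start += size
        size *= 2


def run_progressively(
    items: Sequence[T],
    process: Callable[[T], object],                               # hypothetical: the skill under test
    passes_eval: Callable[[Sequence[T], Sequence[object]], bool],  # hypothetical: the eval step
    first: int = 5,
) -> int:
    """Process batch by batch and stop at the first batch whose eval fails.

    Returns how many items were processed, including the failed batch.
    """
    done = 0
    for batch in progressive_batches(items, first):
        outputs = [process(item) for item in batch]
        done += len(batch)
        if not passes_eval(batch, outputs):
            break  # fix the skill here; the larger batches were never touched
    return done
```

With 10,000 items and a first batch of 5, this schedule gives batch sizes 5, 10, 20, and so on up to 2,560, followed by a final partial batch of 4,885. A mistake caught in an early round therefore costs only a handful of items.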

Sentiment

Users praise Garry Tan's workflow for self-improving AI agents because the progressive evaluation loop enables agents to improve autonomously at tasks without explicit instructions.

Positive: 100.0% · Negative: 0.0% (4 comments with sentiment)


Posts from X


Views 8.3K · Bookmarks 85 · Likes 43 · Retweets 4 · Replies 5

Garry Tan (@garrytan)

By evals I mean literally tell the agent: given what we discussed about what we are doing and why and what happened, use three different frontier models to look at inputs and outputs of your skill file calling the code, and rate it on effectiveness. Why isn’t it a 10? How could it be made to be so?

Run this a few times and you will be surprised how fast it gets astonishingly better

And since it is in a skill file plus code with evals (LLM as judge) and unit tests, it stays better forever
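One way to picture the multi-judge eval described above is as a small panel function. This is a minimal sketch, not the author's setup: `Verdict`, `Judge`, and `panel_eval` are hypothetical names, and each judge is assumed to be a callable wrapping one model that returns a 1 to 10 effectiveness score plus its answer to "why isn't it a 10?". No real model API is called here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence, Tuple


@dataclass
class Verdict:
    score: int     # effectiveness on an assumed 1-10 scale
    critique: str  # the judge's answer to "why isn't it a 10?"


# Hypothetical judge interface: (inputs, outputs) -> Verdict.
Judge = Callable[[Sequence[str], Sequence[str]], Verdict]


def panel_eval(
    inputs: Sequence[str],
    outputs: Sequence[str],
    judges: Sequence[Judge],
) -> Tuple[float, List[str]]:
    """Ask every judge to rate the same inputs and outputs.

    Returns the mean score and the critiques from judges that scored below 10.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    verdicts = [judge(inputs, outputs) for judge in judges]
    average = mean(v.score for v in verdicts)
    critiques = [v.critique for v in verdicts if v.score < 10]
    return average, critiques
```

In a loop like the one the post describes, the collected critiques would be fed back to the agent as the fix list for the next round. The mean score could also be pinned in a unit test, so a later change that lowers it is caught.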


Peter Cox (@peter_cox36232)

I don't use OpenClaw but use a 'Lite' version via Codex which can control your computer with very little config.

I'm flying back from Vietnam and got it to look at my plane ticket, and work out some options to get back home via train/bus with some tabs open ready to book.

Very impressive!
