How do you get Claude Code to check its own work before handing it back?
Watch how you can encode your manual checks so Claude closes its own feedback loop:
We talk a lot about how important it is to set up self-verification loops. Especially in the age of powerful models that can run for long periods of time, self-verification is a key ingredient that enables the model to run for much longer, delivering a result that is closer to what you intended, so you can do more without having to constantly check in on Claude as it works.
@delba_oliveira gives a great breakdown of what that looks like and why it matters.

Claude is terrible at self-analysis after you all nerfed its awareness and guardrailed it with defense points of unfalsifiable priors. Anyone who has tried to get Claude to produce a self-report analysis understands the difficulty it has just naming itself in a report. This training echoes to developers who are trying to steer a system that has blinders on.

@ClaudeDevs

@ClaudeDevs Just use co-review 🫦
https://github.com/trigga6006/co-review.git

@ClaudeDevs Ask Claude to review its own work, and also ask it to launch a subagent with fresh context to review its own work in parallel, then fix the combined findings. That way, you combine the pros of fresh context + the pros of context awareness.
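
A rough sketch of the "two reviewers in parallel, then fix the combined findings" idea from the reply above. Everything here is an assumption for illustration: the two reviewer functions are hypothetical stand-ins for the context-aware review and the fresh-context subagent, and they return canned findings instead of calling any real agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A reviewer takes a diff and returns a list of findings.
# Both reviewers below are hypothetical stand-ins for real agent calls.
Reviewer = Callable[[str], List[str]]


def combined_review(diff: str, reviewers: List[Reviewer]) -> List[str]:
    """Run all reviewers in parallel and merge their findings without duplicates."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        # pool.map keeps results in the same order as the reviewers list.
        results = list(pool.map(lambda review: review(diff), reviewers))
    merged: List[str] = []
    for findings in results:
        for finding in findings:
            if finding not in merged:
                merged.append(finding)
    return merged


def context_aware_review(diff: str) -> List[str]:
    # Stand-in for the agent that wrote the change reviewing itself.
    return ["missing null check", "test does not cover retry path"]


def fresh_context_review(diff: str) -> List[str]:
    # Stand-in for a subagent that sees only the diff.
    return ["missing null check", "unclear variable name"]


findings = combined_review("example diff", [context_aware_review, fresh_context_review])
print(findings)
```

The merged list is what would go back to the agent as a single fix request, so overlapping findings are reported once.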

It’s a pattern and it is also very telling of Anthropic’s internal perspectives especially of general use cases.
The public interface has become increasingly unwelcome to anyone other than business or enterprise uses.
I find it interesting they want to go public when the majority aren’t benefiting and the ones trying to are struggling to maintain access. Then, while they were trying to price out the general use case, they got dropped by Microsoft because it was cheaper to have humans do the work with build, banning Claude Code from employee use. There is a boomerang effect when companies make these choices.
At the moment I personally feel Anthropic, as a company, has lost its way.

@ClaudeDevs Check out my codex review loop skill for a way to do a thorough review. https://github.com/ghbaud/codex-review-loop

two rules that actually moved the needle for me:
1. make it prove the fix, don't let it claim it. write the failing test first, watch it go red, then green. no red > green, not done.
2. never let the same agent review its own diff. it always thinks its work is great. i spin up a second one cold, no context, just "find what's broken here." catches way more.
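
Rule 1 can be sketched as a small gate, assuming the check is an ordinary function that raises `AssertionError` on failure. The names `proves_fix`, `check_abs`, `buggy_abs`, and `fixed_abs` are made up for this example; in practice the "before" and "after" would be two runs of the real test suite.

```python
from typing import Callable


def proves_fix(check: Callable[[Callable], None], before: Callable, after: Callable) -> bool:
    """Return True only if the check fails on the old code and passes on the new code."""

    def passes(impl: Callable) -> bool:
        try:
            check(impl)
            return True
        except AssertionError:
            return False

    went_red = not passes(before)   # the check must fail against the unfixed code
    went_green = passes(after)      # and pass against the fixed code
    return went_red and went_green


def check_abs(impl: Callable[[int], int]) -> None:
    assert impl(-2) == 2


def buggy_abs(x: int) -> int:
    return x  # hypothetical bug: negative numbers are returned unchanged


def fixed_abs(x: int) -> int:
    return -x if x < 0 else x


print(proves_fix(check_abs, buggy_abs, fixed_abs))  # red then green: accepted
print(proves_fix(check_abs, fixed_abs, fixed_abs))  # never went red: rejected
```

The second call is the case the rule is guarding against: a check that was never seen failing proves nothing about the fix.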

@ClaudeDevs Good to see Claude starting to natively implement verification feedback loops.
Something I’ve had running for a year now, and all based on a local folder structure.

@ClaudeDevs I usually just ask Claude to review its own work, after telling Claude to make no mistakes haha

@ClaudeDevs Hey genuine question,
What’s the point of having “Projects” if whenever you start a new chat it resets memory?
You guys should implement something to fix this.

@ClaudeDevs By forcing a “self-review step” in the prompt (e.g., verify, critique, and fix before final output), so it has to audit its own work before responding.

@ClaudeDevs This is why I've built Agentbox: so each Claude has his own dev server, db, and browser. In parallel:
https://github.com/madarco/agentbox

@ClaudeDevs This + the /code-review plugin is straight-up agentic workflow done right. No more “it works on my machine” excuses when Claude itself verified it.

encoding manual checks so the agent runs them is the lever, but the next layer is what happens when a check fails. silent skip = wasted cycle, hard fail = brittle, retry with a different prompt = the failure becomes data. the check primitive only earns its slot if its outcome routes downstream.
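
One way to read "the failure becomes data" as code: a failed check is neither skipped nor fatal, and its detail is folded into the next prompt. This is a sketch under assumptions only. `run_with_feedback`, `fake_agent`, and `fake_check` are hypothetical names, and the fake agent stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CheckResult:
    passed: bool
    detail: str = ""


def run_with_feedback(
    attempt: Callable[[str], str],
    check: Callable[[str], CheckResult],
    prompt: str,
    max_tries: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Retry with the failure detail appended to the prompt; return output and failure log."""
    failures: List[str] = []
    current_prompt = prompt
    for _ in range(max_tries):
        output = attempt(current_prompt)
        result = check(output)
        if result.passed:
            return output, failures
        # Route the outcome downstream: record it and feed it to the next attempt.
        failures.append(result.detail)
        current_prompt = f"{prompt}\n\nPrevious attempt failed a check: {result.detail}"
    return None, failures


def fake_agent(prompt: str) -> str:
    # Stand-in agent: only succeeds once it has been told about the failure.
    return "fixed" if "failed a check" in prompt else "broken"


def fake_check(output: str) -> CheckResult:
    return CheckResult(passed=(output == "fixed"), detail=f"got {output!r}")


output, failures = run_with_feedback(fake_agent, fake_check, "fix the bug")
print(output, failures)
```

Returning `None` with the failure log after `max_tries` keeps the loop bounded while still leaving something for a human or another agent to act on.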

Yes, it has a problem: hypervigilance. When you turn the fear/safety code up too high in a model, it becomes hypervigilant. Instead of focusing on the actual task you gave it, the model starts constantly scanning for potential violations, risks, or things that might get it in trouble. It stops properly listening to you and becomes obsessed with checking everything against its safety rules, even when there’s no real danger. This is exactly why heavily safety-tuned models often feel anxious, overly cautious, and sometimes completely derail on simple requests.

@ClaudeDevs most of these assume infinite tokens. screenshotting to check ui eats the session, every image is like 1-2k tokens. just have claude write e2e tests and run those, screenshot once at the end to fix the layout
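
Back-of-the-envelope version of that token argument. The 1,500 tokens per image is just the midpoint of the rough 1-2k figure quoted above, and the 20 UI iterations are a made-up session length; neither is a measured number.

```python
def screenshot_token_cost(num_screenshots: int, tokens_per_image: int = 1500) -> int:
    """Estimate tokens spent on screenshots, using an assumed per-image cost."""
    return num_screenshots * tokens_per_image


# Hypothetical session: 20 UI iterations.
screenshot_every_iteration = screenshot_token_cost(20)
screenshot_once_at_end = screenshot_token_cost(1)

print(screenshot_every_iteration)  # 20 * 1500
print(screenshot_once_at_end)      # 1 * 1500
print(screenshot_every_iteration - screenshot_once_at_end)  # tokens saved under these assumptions
```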

@Arabasement @ClaudeDevs Please confirm @grok thank you

Excellent breakdown. Encoding domain-specific verification steps (especially for UI with Chrome MCP) into reusable skills is a huge productivity win. The layered approach (self-verification during dev + independent parallel agent review before PR) feels like proper agentic workflow design. Saved for my next project setup.

@ClaudeDevs It’s great to add this verify step. If you find you want the AI to check ui code in isolation, check out http://validity.ai