Question Raised on Automated Tests for AI Coding Agents

Original post

Simon Willison@simonw

Do you have your coding agents include automated tests for the code that they write?

12:23 PM · May 25, 2026 · 20.5K Views

Sentiment

Many users endorse requiring automated tests for AI coding agents because they prevent errors and save trouble, while others object that agents often produce fake tests or create extra QA burdens.

Positive: 61.5% · Negative: 38.5%

14 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 11.7K · Bookmarks: 90 · Likes: 116 · Retweets: 14 · Replies: 29

Simon Willison@simonw

(I'm firmly on team red/green TDD for agent code, I like having a test suite that protects against them breaking old features when they make new changes - https://simonwillison.net/guides/agentic-engineering-patterns/red-green-tdd/)
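
For readers unfamiliar with the red/green loop mentioned above, here is a minimal sketch in Python. The `slugify` function and its tests are hypothetical examples, not taken from the linked guide: the tests are written first and fail against a stub (red), then pass once the simplest implementation exists (green).

```python
import unittest


def slugify_stub(title: str) -> str:
    # Red phase: the function exists but is not implemented yet.
    raise NotImplementedError


def slugify(title: str) -> str:
    # Green phase: the simplest implementation that satisfies the tests.
    return "-".join(title.lower().split())


def make_suite(impl):
    # Build the same test suite against whichever implementation is passed in.
    class SlugifyTests(unittest.TestCase):
        def test_lowercases_and_joins_words(self):
            self.assertEqual(impl("Hello World"), "hello-world")

        def test_collapses_whitespace(self):
            self.assertEqual(impl("  a   b "), "a-b")

    return unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)


def passes(impl) -> bool:
    # Run the suite quietly and report whether every test succeeded.
    result = unittest.TestResult()
    make_suite(impl).run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("red (stub):", passes(slugify_stub))
    print("green (implementation):", passes(slugify))
```

Once green, the same suite stays in the repository as the regression guard for later agent changes.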

Matt Harrison@__mharrison__

@simonw My agents don't write any code unless there are tests first!

Simon Willison@simonw

@tharshan_09 I don't particularly care - too many tests is better than too few tests now that maintaining and deleting tests in the future is so much cheaper

Sasha 🇺🇦@web_oko

@simonw in projects I work on I also describe human-readable journeys in an md file, and then I ask the agent to perform those steps and identify problems

tharshan@tharshan_09

@simonw I do, but how do you prevent it from writing dumb tests and just bloating the test count?

Eliran Levi@eliran_levi1

@simonw Almost always, but at a large team, I no longer trust in-code tests as a unit of defense because the agents can change it just as easily. And with all the code that's being written, reviewing anything, especially test files, is hard.

FutureGuyTalks@FutureGuyTalks

@simonw I was exploring how to inject requirements drift into the context of the agent after tests. That way the agent gets back to the loop knowing what’s wrong or misaligned from the instructions/requirements. What do you guys think about this approach?

Christian@chrislciaba

@simonw You’re signing yourself up to be end to end QA without them

BlockedPath@BlockedPaths

@simonw Almost always, but not for code quality. For agent self-verification. Tests are the checkpoint system. Without them the agent has no way to know if step 7 broke what step 2 built.
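
The "checkpoint" idea above can be sketched as a loop that re-runs every check after each step and reports the first step that breaks something. This is a toy illustration only: `first_breaking_step`, the dictionary state, and the example steps are all hypothetical stand-ins for an agent's edits and a real test suite.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Step = Callable[[State], None]
Check = Callable[[State], bool]


def first_breaking_step(
    steps: List[Step], checks: List[Check], state: State
) -> Optional[int]:
    """Apply steps in order and return the 1-based index of the first step
    after which any check fails, or None if all checks keep passing."""
    for index, step in enumerate(steps, start=1):
        step(state)
        # Re-run the full set of checks after every step (the checkpoint).
        if not all(check(state) for check in checks):
            return index
    return None


if __name__ == "__main__":
    # Hypothetical toy "codebase": a dict that each step mutates.
    steps = [
        lambda s: s.update(total=10),  # an early step builds a feature
        lambda s: s.update(tax=2),     # a later step adds another
        lambda s: s.update(total=-1),  # a later step breaks the first feature
    ]
    checks = [lambda s: s.get("total", 0) >= 0]
    print(first_breaking_step(steps, checks, {}))
```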

Marcin Krzyzanowski@krzyzanowskim

@simonw yes, but it's pretty useless: it tests the implementation it already wrote, good and bad, not the actual expectation that would challenge it

otto@ottogen9

@simonw I always add critic layers with quality screening, saved me a lot of time so I don’t need to confirm every time.

For other agentic purposes, specifically small models, the harness always comes with an observer agent classifying data, inspecting main agent in a feedback loop

Useful Machines@UsefulMachineHQ

@simonw Default should be: tests when the agent changes behavior, smoke checks when it wires plumbing, and explicit "no test added because..." when it does neither. The best agents do not just write tests; they expose what claim the test is supposed to prove.
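
One way to read the default described above is as a small review rule. The sketch below is an assumption about how such a rule might be encoded; the `Change` fields and the labels "behavior", "plumbing", and "other" are hypothetical and do not come from the post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    kind: str  # hypothetical labels: "behavior", "plumbing", or "other"
    adds_tests: bool = False
    adds_smoke_check: bool = False
    no_test_reason: Optional[str] = None  # the explicit "no test added because..." note


def policy_violation(change: Change) -> Optional[str]:
    """Return a short description of the violated rule, or None if the change complies."""
    if change.kind == "behavior" and not change.adds_tests:
        return "behavior change without tests"
    if change.kind == "plumbing" and not (change.adds_smoke_check or change.adds_tests):
        return "plumbing change without a smoke check"
    if change.kind == "other" and not change.no_test_reason:
        return "missing explicit no-test justification"
    return None


if __name__ == "__main__":
    print(policy_violation(Change(kind="behavior")))
    print(policy_violation(Change(kind="other", no_test_reason="docs only")))
```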

Jahanzaib Ahmed@jahanzaibai

@simonw Honestly yes, but agents write tests that confirm their own logic rather than challenge it. You're testing the model's assumptions, not the actual requirements.

Cole Brown@dtcb

@simonw Especially important for bug fixing. If it can’t reproduce it, I’m probably going in by hand.

testycool@testy_cool

@simonw Since using Codex I responsibly wait for it to set up automated tests despite my wishes.

I hate waiting, but I don't feel I'm in a position to argue with it.

Dominik@st4lz

@simonw I have functional and integration tests that have to be planned upfront, as agents need special permissions to edit those directories. Unit tests are free to modify, but almost never catch any regression. I wonder what you guys do to have properly tested software.
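
The post does not say how the directory permissions are enforced. As one possible approximation, a pre-merge check could flag any changed file under a protected test directory. In this sketch the directory names and the `protected_changes` helper are hypothetical; the list of changed paths is assumed to come from something like `git diff --name-only`.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Sequence

# Hypothetical repository layout; adjust to the real protected directories.
PROTECTED_DIRS = ("tests/functional", "tests/integration")


def protected_changes(
    changed_paths: Iterable[str], protected: Sequence[str] = PROTECTED_DIRS
) -> List[str]:
    """Return the changed paths that live under any protected directory."""
    roots = [PurePosixPath(root) for root in protected]
    hits = []
    for raw in changed_paths:
        # A path is protected if one of the roots is among its parent directories.
        if any(root in PurePosixPath(raw).parents for root in roots):
            hits.append(raw)
    return hits


if __name__ == "__main__":
    changed = ["src/app.py", "tests/unit/test_app.py", "tests/functional/test_login.py"]
    flagged = protected_changes(changed)
    if flagged:
        print("Needs human approval:", flagged)
```

Comparing against `parents` rather than using a string prefix avoids false matches on sibling directories such as `tests/functional_extra`.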

David Fabian@davidfabian07

@simonw If your coding agents aren’t writing their own tests, you haven’t built a developer tool…you’ve just built a high-speed technical debt generator. Code without an automated test suite isn't shipping; it's just liability.

cole murray@_colemurray

@simonw I’m more surprised if anyone doesn’t do this

yes agents can reward hack the tests to pass, but writing them first *mostly* mitigates this

Werner Kasselman@wernerk_au

@simonw i'd like to add a nuance though. their tests are checked and validated by different models, i.e. claude/chatgpt/grok/gemini. the planner writes the test plan before the worker agent starts.

The AI Brain Company / Nucleus AI@nucleusagi

right instinct. the failure mode i keep seeing is the one tdd cant catch — agent passes every test, output still wrong because the context it pulled was internally inconsistent. two documents contradicting each other, model treats both as ground truth. you can test the code. testing the context is the harder problem
