Tech · 6h ago

Engineer Has AI Agents Quiz Him On Code Changes And Builds Micro-Worlds

Original post

I’m giving a talk this week about why I have my coding agents quiz me on code changes, and how I build micro-worlds to understand what’s going on.

What questions / topics should I cover?

12:45 PM · Jun 28, 2026 · 15.5K Views

Sentiment

Most commenters praised the idea of having AI agents quiz engineers on code changes as innovative and effective, while a few dismissed it over concerns that the related app does not work reliably.

Positive: 83.3%

Negative: 16.7%

9 comments with sentiment.

Cluster Engagement

Via AI.ENGINEER

#1735

Posts from X

Most Activity

Views 2.4K · Bookmarks 9 · Likes 33 · Retweets 1 · Replies 3

Geoffrey Litt@geoffreylitt

life hack: before starting a work session, tweet about whatever you'll be working on and stir up some conversation

then, whenever you reflexively check twitter, you'll end up still working on the thing instead of wasting time 😂

2h2.4K339

Geoffrey Litt@geoffreylitt

btw talk is at AI Engineer World's Fair on Wed, 10:45am. come by if you'll be there!

https://www.ai.engineer/worldsfair/schedule?day=3&session=asn_slot_2026_07_01_breakout_track_06_1545_2026_06_03t11_33_40_915z

4h92975

Geoffrey Litt@geoffreylitt

Also I should clarify: I don't mean "quiz me on the requirements" although that is a great practice.

I mean "quiz me on whether I have correctly understood how it works, after it's done"!

Which is weirder but insanely valuable.

4h73961
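A minimal sketch of what "quiz me after it's done" could look like as a prompt template. The function name, the wording, and the default question count are assumptions for illustration; this is not the quiz prompt mentioned later in the thread.

```python
def build_quiz_prompt(diff: str, num_questions: int = 3) -> str:
    """Build a prompt asking an agent to quiz the reader on a finished change.

    Hypothetical helper: the instructions below are illustrative wording only.
    """
    if num_questions < 1:
        raise ValueError("num_questions must be at least 1")

    lines = [
        "You just finished the code change below.",
        f"Ask me {num_questions} questions that check whether I have correctly",
        "understood how the change works (not what the requirements were).",
        "Ask one question at a time and wait for my answer before grading it.",
        "",
        "--- DIFF ---",
        diff.strip(),  # the completed change the quiz should be about
    ]
    return "\n".join(lines)
```

The returned string would be sent to whichever coding agent made the change, after the work is finished, so the questions test the reader's mental model of the result and not the original requirements.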

Geoffrey Litt@geoffreylitt

@ario yeah great topic! i think often an objection to understanding is something like "many programmers don't need to understand memory layout anymore and it's fine"...

and the reality is, everyone has finite capacity for understanding so we have to make tradeoffs

6h21731

Geoffrey Litt@geoffreylitt

mm yeah. one way i like to think about it is that you're constraining a specification space which is usually a bit underspec'd / vague in some way. and then the agent should help explain where it landed in that space and why: what are the dimensions, what choices did it make.

it can do this either before the task (asking you questions) or afterwards (telling you what important choices it filled in)

personally i find that asking questions before works very well w/ modern models, but they're not always very good at surfacing what important choices they made after the fact. curious if you agree.

6h19321
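One way to picture the "where it landed in that space and why" report is as a small record per underspecified dimension. This is only a sketch: the `Decision` fields, the self-reported `confidence` score, and the 0.7 threshold are all assumptions, not something described in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    dimension: str     # the underspecified aspect of the request
    choice: str        # where the agent landed on that dimension
    rationale: str     # why it landed there
    confidence: float  # assumed self-reported score from 0.0 to 1.0


def decisions_to_review(decisions: list[Decision], threshold: float = 0.7) -> list[Decision]:
    """Return decisions below the threshold, least confident first."""
    flagged = [d for d in decisions if d.confidence < threshold]
    return sorted(flagged, key=lambda d: d.confidence)


def format_report(decisions: list[Decision]) -> str:
    """Render one line per decision: dimension, choice, and rationale."""
    return "\n".join(
        f"- {d.dimension}: {d.choice} (because {d.rationale})" for d in decisions
    )
```

The same structure could be filled in before the task (as questions to the user) or afterwards (as choices the agent made on its own); sorting by confidence simply puts the shakiest choices in front of the reader first.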

Clément Miao@clementmiao

@geoffreylitt I do both of those too, but my goal is very much to do these both when planning code changes and after the changes are made, so as to stress-test where the agent made wrong assumptions or went ahead with the work despite low confidence. Curious how you solve these problems

6h2041

Geoffrey Litt@geoffreylitt

@stevekrouse I’m just spreading the gospel of Steve here

4h11621

Bushra Farooqui 📖 🕯️@startuployalist

@geoffreylitt Would love to be a fly on the wall and attend if there’s a spot. Noticed this with the initial Claude Artifacts

5h1361

Ario Jafarzadeh@ario

@geoffreylitt Calibrating the ROI on understanding. Some kind of rubric that serves as a compass for when a deeper understanding pays off vs. when it's a waste of time

6h2353

Clément Miao@clementmiao

I found the title of your talk very fascinating because it wasn't clear to me if you are talking about "understanding" as the user, or "understanding" as the agent, but maybe it's both.

I find that asking questions before works well if and only if the person asking the questions has a good understanding of the domain space / generally very thoughtful.

Similarly, I find the questions from the LLM in plan mode are often insufficient, and they give the user a level of unjustified confidence that you can sometimes only recognize as unjustified after the fact.

I do agree that the model is often unable to be critical of its own decision-making post facto. I'm going to play around with critic models more, and see how that could potentially improve the harness.

In general, my impression is that if these systems were better able to "understand", we would need "verifiable rewards" less often, which could unlock a lot.

5h1311

Geoffrey Litt@geoffreylitt

@oznova_ This is my full code explainer prompt

https://gist.github.com/geoffreylitt/a29df1b5f9865506e8952488eac3d524

3h61

Steve Krouse@stevekrouse

@geoffreylitt Great title

4h2161

Alexander Benz@alexanderbenz

@geoffreylitt curious what you do when the agent passes the quiz but still confidently gets it wrong. that's the harder case than obvious errors.

4h86

Geoffrey Litt@geoffreylitt

@oznova_ This is my quiz prompt -- works surprisingly well in my experience as long as you use Claude models!

Only took a few iterations to get this to "good enough", I'm sure it could be made much better.

My evals are my coworkers saying "this skill rocks" :P

3h141

John Lam@john_lam

i see this as two understandings - understanding what it does (including all the corner cases) and understanding how it does it. i see the former as being really valuable in the feedback loop during design - i worry less about the latter because that is easier to fix if the "what it does" is defined correctly and there aren't any latent conflicts between human and model.

4h31