How much better do the models have to get before you'll stop reading the code?
T3 Stack creator Theo Browne asks how capable AI must get before developers stop reviewing its code
Story Overview
T3 Stack creator Theo Browne is probing the future point where AI code generation might earn enough developer confidence that human review becomes optional, a question that has fueled fresh discussion on whether today's models are anywhere close to that bar.
Trust numbers show persistent skepticism
Recent surveys put developer trust in the accuracy of AI output at just 29 percent, with 46 percent actively distrusting the results, while AI-assisted pull requests merge at roughly half the rate of human-authored ones.
Benchmarks leave real-world gaps unclosed
Top models hit around 67 percent pass@1 on HumanEval and score higher on some verified suites. Yet logic errors, security vulnerabilities in nearly half of generated code, and lower scores on harder tests leave the capability threshold for skipping review undefined.
Supportive commenters are excited that AI models could make code review obsolete, much as compilers made reading assembly unnecessary, while critics insist that reviewing code is essential to owning and understanding what ships.
Most Activity
I'll be honest, I barely even read the code back when I wrote it by hand...
At this point I’m genuinely convinced most of you would have kept reading the assembly code after C got popular
@theo You read the code?

@zeeg Bold coming from someone whose code is gpt-3.5 level

@theo two orders of magnitude with actual real verification capabilities

@theo the problem to solve here is the verification not the code

@theo @WallisDev you have little to lose
i - along with every other major business in the world - have a lot to lose
all it takes is a shitty data migration, a simple bypass to slip through and people face immense liability

If only there was a product to make it easier to identify bugs and fix them...
Jokes aside, there's obviously differences at different types and scales of software. I just know there's a lot of devs still reading code on sideprojects as if it matters. I'd go as far as saying that the majority of code at most companies is not as important as the company pretends it is (i.e. company blog, documentation sites, sdks that are just api wrappers, throwaway internal tools, api scaffolding, etc)

@theo got rejected in a recent interview for telling them it's pointless to read code at this point

@theo about tree fiddy

@zeeg @WallisDev I spend a lot of time conversing with the model and getting a spec that we’re both aligned on. Once I’m confident in the surface and the model’s understanding, it’s genuinely hard to care about the details for me

let alone that a few sentences will never appropriately describe the thing you're trying to build - nor will generating a spec from those same few sentences. you need a massive speed increase on top of a massive precision/capability increase
(+a ton of supporting software that is scalable and cheap that doesn't exist today to verify)

@theo You still read code?

@glcst I wrote this before seeing your reply lol

@theo My current personal project is my benchmark, and I still feel the need to review the code. C is a tricky bugger for code that works and feels good to use as a library

I agree w him
I’d need some kind of test suite to give me confidence in putting it in actual production software
The question is too broad. Different kinds of projects require different levels of scrutiny (e.g. file system, database or core data structure? I hope you know how it fails)

@theo I catch issues in ai generated code all the time. But maybe I’m not holding it right…

i mean that makes sense why they’d reject you. the better thing to have said was that you don’t read every line of code, you have systems in place to catch e2e verification and other ways in your own workflow how you minimize the results of bad implementation or code. saying it’s pointless rn, is just not true. we’re not there yet it feels like it. but it def still makes mistakes. most ppl aren’t reading every line, that would throttle the velocity of using these tools.

@theo code was invented to be read. it is logic in english. more similar than different to prose. i’ll stop reading it when it reaches a point where code is no longer made to be read.

@theo You guys still read code?