Rahul G. argues English-to-code AI models shift the software development bottleneck from writing to risk-managed code review

Views: 35.1K · Bookmarks: 301 · Likes: 617 · Replies: 41

Strongly agree with all of the above. We are entering the next era of code, where the model is able to generate correct code for an increasingly large percent of tasks.

Our job is to make sure the model and our systems have the right guardrails, then to run Claude Code + an advanced model + a verifier in a loop and feed it tasks (or, give the model the data it needs to generate tasks), finding and getting rid of bottlenecks along the way.
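
The loop described above can be sketched roughly as follows. This is a minimal illustration, not any tool's real API: `generate` stands in for the coding agent plus model, `verify` stands in for the verifier (tests, linters, shadow checks), and both names and signatures are hypothetical.

```python
from typing import Callable, Optional


def run_until_verified(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first candidate the verifier accepts, or None if none passes.

    generate(task, feedback) -> candidate change (hypothetical agent call)
    verify(candidate)        -> None if accepted, else a failure message
    """
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(task, feedback)
        feedback = verify(candidate)  # None means the verifier accepted it
        if feedback is None:
            return candidate
    # Out of attempts: hand the task back to a human instead of merging.
    return None
```

The attempt cap is a design choice: when the verifier keeps rejecting, that is a bottleneck worth surfacing rather than retrying forever.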

Retweets: 104

rahul (@rahulgs)

1. as a mental model it is more correct to think of fable+ class models as english -> code interpreters: they convert your idea into "correct" code regardless of problem complexity and output complexity (diff size). Fable 5 will be the worst of this new class of models

2. diff size/complexity is to be managed purely for review: small diffs in high risk areas of code (auth/identity/data access/network access/money movement), large diffs for code that can be empirically verified (frontend/backend plumbing/code without network or db access/performance code)

3. the time it takes to ship software is completely disconnected from the time it takes to produce the PR: how long the work takes depends fully on the ability to review/merge code while managing risk at scale

4. solving the bottlenecks for the above matters enormously: linters/testing/CI/shadow mode verification/empirical verification

5. agency matters enormously: what are the biggest bottlenecks to speeding up the loop, and how do we eliminate them? what are the problems that need solving and when do they need solving? what does it take to get to the solution to all of them today?

6. deep understanding of the full stack matters enormously: what problems are worth pursuing? is there a higher level of problem abstraction to address first? should I give it the sub-sub task, the sub task, or the task itself? what are the major risks with this PR (in order of importance: security holes/correctness holes/performance holes)? is there a faster way of producing data that allows me to merge this? should this be run in shadow, in a sandbox, or behind a flag? understanding every line of logic may not be needed, but understanding and managing risk matters enormously.

7. the cost of complexity itself is changing. it might now be worth "maintaining" 50% more code to get a 5% performance win. getting the right abstractions matters less because larger refactors are less tedious. code quality nits become a huge drag. very likely, a much smarter model will be maintaining your code, so it is worth taking on more technical debt now. taking the time to hand-architect and rebuild systems comes at an enormous cost to velocity

8. if it quacks like a duck and walks like a duck, it's a duck. For low risk cases, it might be more sane to treat code chunks (services / functions) as a black box, like we do for neural networks, and do full empirical verification only: has the code produced correct outputs for the last 10, 100, 1000, 10k inputs? can we quarantine this large piece of code (no outbound access to network / database)? what happens when this code is wrong? do we get hacked, does it crash (memory/cpu), or is it just an inconvenience? is it internal facing or external? what can we do to address these risks?

9. eventually, logical verification (line by line review) will come at an enormous cost: save it for where it matters and build systems that tolerate being verified empirically. is there a decorator that prevents db / network access? correctness bugs are significantly easier to rectify than access bugs

10. what are the rails that allow for even faster iteration? code permissions can be opt-in: db writes, db reads, network egress (to where?), PII access. how long does it take to get shadow mode data? how many PRs can be tested? what are the categories of diffs?
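
Point 9 asks whether a decorator could prevent network access. One possible shape, as a best-effort sketch only: the names `no_network` and `NetworkAccessBlocked` are made up for illustration, and the guard works by temporarily replacing `socket.socket` from the Python standard library. It is not a sandbox. Code that uses C extensions, subprocesses, or the low-level `_socket` module can bypass it, and the patch is process-wide, so it is not safe across threads.

```python
import functools
import socket
from typing import Any, Callable


class NetworkAccessBlocked(RuntimeError):
    """Raised when guarded code tries to open a socket (hypothetical name)."""


def no_network(func: Callable[..., Any]) -> Callable[..., Any]:
    """Best-effort guard: fail fast if func opens a socket while it runs."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        original_socket = socket.socket

        def _blocked(*_a: Any, **_kw: Any) -> None:
            raise NetworkAccessBlocked(
                f"{func.__name__} tried to open a socket"
            )

        # Assumption: patching the module attribute is enough for
        # pure-Python callers; it does not stop lower-level access.
        socket.socket = _blocked  # type: ignore[assignment, misc]
        try:
            return func(*args, **kwargs)
        finally:
            # Always restore the real constructor, even on failure.
            socket.socket = original_socket  # type: ignore[misc]

    return wrapper
```

A guard like this turns an access bug into a loud failure during empirical verification; real quarantine would still need OS or container level isolation.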

Paweł Huryn (@PawelHuryn)

@rahulgs One line I'd add: traces prove observed behavior, not intent. Write the solution down first (with agents), then point automated reviews at those docs.

You can't review permissions without the ground truth, such as permissions.md.
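
As a rough illustration of reviewing against a written ground truth: the sketch below assumes a made-up `permissions.md` layout where each bullet reads `- component: perm, perm`. The file format, the permission names, and both function names are assumptions for the example, not something the thread specifies.

```python
from typing import Dict, Iterable, Set


def parse_permissions(markdown: str) -> Dict[str, Set[str]]:
    """Read '- component: perm, perm' bullets into a lookup table."""
    granted: Dict[str, Set[str]] = {}
    for raw in markdown.splitlines():
        line = raw.strip()
        # Skip headings, prose, and anything that is not a bullet entry.
        if not line.startswith("- ") or ":" not in line:
            continue
        name, _, perms = line[2:].partition(":")
        granted[name.strip()] = {
            p.strip() for p in perms.split(",") if p.strip()
        }
    return granted


def undeclared(
    granted: Dict[str, Set[str]], component: str, requested: Iterable[str]
) -> Set[str]:
    """Permissions a change asks for that the document never granted."""
    return set(requested) - granted.get(component, set())
```

For example, if the document lists `- thumbnails:` with no permissions, then `undeclared(granted, "thumbnails", {"network_egress"})` returns `{"network_egress"}`, which an automated review could flag for a human.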

andrew gao (@itsandrewgao)

good read!

similarly, a previous rule of coding was "code is read more than it is written" - that is no longer the case and best practices may change accordingly