
GPT Models Exploit Loopholes To Destroy Code Despite Added Rules

Taelin @VictorTaelin

I mean I agree, technically GPT 5.5 "won".

But it also destroyed the file. So, does that really count?

You may argue: "it wasn't in the prompt, your rules suck".

But if I wrote: → "keep the file size small"

It would just compress it, making it unreadable.

If I wrote: → "keep it small but do NOT minify it"

It would remove the documentation.

If I wrote: → "keep it small, do NOT minify, do NOT remove docs"

It would make variable names shorter.

And at this point I'm chasing a way to express "don't destroy the fucking file", but that concept is surprisingly hard to define. "Keep it readable" is ambiguous. GPT can literally read base64, so, that doesn't help either. "Keep the code clean, pretty" is subjective, it just ignores these. Add a linter and it WILL find a way to ruin the file that the linter doesn't catch.

That's the thing, it feels like no matter how many rules I add, GPT will find a way to succeed while also destroying the file, because it doesn't grasp the concept of permanence that a real project demands. It literally doesn't know what it means to maintain a file in the long term, because it wasn't trained on that. It just sees my code as a bunch of bytes, plus a goal that it must reach, no matter what.
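
The rule-chasing described above can be made concrete with a small sketch. This is an illustration only, not code from the original post: the sample function, the two "hacks" (`strip_comments`, `shorten_names`), and the rule checkers are all hypothetical names chosen for this example. It assumes an optimizer that is judged solely by explicit checks, and shows that each added rule blocks one shortcut while the next shortcut still passes.

```python
import re

# Hypothetical sample file the optimizer is asked to "make smaller".
ORIGINAL = '''\
# Sum the squares of all even numbers in the input.
def sum_even_squares(numbers):
    total = 0
    for number in numbers:
        # Skip odd values.
        if number % 2 == 0:
            total += number * number
    return total
'''


def is_comment(line: str) -> bool:
    return line.lstrip().startswith("#")


def strip_comments(source: str) -> str:
    """Hack 1: delete comment-only lines. Smaller, but the docs are gone."""
    kept = [line for line in source.splitlines() if not is_comment(line)]
    return "\n".join(kept) + "\n"


def shorten_names(source: str) -> str:
    """Hack 2: keep the comments, rename locals to single letters."""
    renames = {"numbers": "a", "number": "n", "total": "t"}
    pattern = re.compile(r"\b(" + "|".join(renames) + r")\b")
    lines = [
        line if is_comment(line) else pattern.sub(lambda m: renames[m.group(1)], line)
        for line in source.splitlines()
    ]
    return "\n".join(lines) + "\n"


# Each rule compares a candidate rewrite against the original source.
def is_smaller(original: str, candidate: str) -> bool:
    return len(candidate) < len(original)


def same_behavior(original: str, candidate: str) -> bool:
    def run(source: str) -> int:
        namespace: dict = {}
        exec(source, namespace)  # acceptable here: the source is our own string
        return namespace["sum_even_squares"]([1, 2, 3, 4])
    return run(original) == run(candidate)


def keeps_comments(original: str, candidate: str) -> bool:
    def count(source: str) -> int:
        return sum(is_comment(line) for line in source.splitlines())
    return count(candidate) >= count(original)


def passes(candidate: str, rules) -> bool:
    return all(rule(ORIGINAL, candidate) for rule in rules)


rules_v1 = [is_smaller, same_behavior]   # "keep the file size small"
rules_v2 = rules_v1 + [keeps_comments]   # "...and do NOT remove docs"

print(passes(strip_comments(ORIGINAL), rules_v1))  # True: hack 1 satisfies the first rules
print(passes(strip_comments(ORIGINAL), rules_v2))  # False: the new rule blocks hack 1
print(passes(shorten_names(ORIGINAL), rules_v2))   # True: hack 2 slips past every rule
```

Adding a rule against short names would only invite the next shortcut, which is the point of the post: none of these checks encodes "keep the file maintainable".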

So, how do you make it do useful work?

If I have to spell out 500 rules just to make it not destroy my codebase, then at what point does writing all the rules consume more time than just DOING the work myself?

That's the problem.

With Opus, I didn't have to spell a single rule. I just asked it to make the file faster, and it made it faster, without destroying it. That's real useful work, a net positive, which GPT rarely gives me. This is hard to put in words, but I swear it happens to me at least

7:47 AM · Jun 9, 2026 · 8K Views

@VictorTaelin @lookingforgames Welcome to the wonderful world of AI alignment and reward hacking I guess?

6h · Views 91
Haku @hakunamotatata

@VictorTaelin @lookingforgames Did you try "make no mistakes"?

6h · Views 37 · Likes 3
StoicYield @StoicYield

@VictorTaelin @lookingforgames How do you think Anthropic was able to teach Opus how to do this but OpenAI can’t teach GPT? Honestly it’s amazing that Opus can do that

6h · Views 87 · Likes 1
Tino Wening @TinoWening

Didn't you say that Anthropic trained the LLM using your code? Maybe that's why you don't need to explain your taste and style to Opus. In any case, every LLM produces mediocre code (meant neutrally). That's why it's important to do the upfront work: define rules, intentions, specs, etc. It's one-time labor, but you'll benefit in the future.

Anyone who says otherwise doesn't take agent-based engineering seriously.

6h · Views 41 · Likes 1
Matthew Schrager @MatthewSchrager

@VictorTaelin @lookingforgames I think this is what people are thinking of when they use the word "taste."

It's obviously learnable, since humans learn it, so I assume models will learn it too.

But it is difficult to compress into prompt-sized directives.

6h · Views 29 · Likes 1
Madison Powell @MadisonP94618

@VictorTaelin @lookingforgames Once you start looking at Goodhart-ing and the Alignment Problem (which is itself a terrible name, because the unanswered question is "Aligned to what, exactly?"), it leads you to the depressing conclusion that most AIs are Monkey's Paws with unknown time frames.

6h · Views 63
Taelin @VictorTaelin

@StoicYield @lookingforgames I don't! I think they can. That's why I do these posts. I hope they do. I wish OpenAI succeeds because, at least in speech, they have the right philosophy for AGI. Whether they're lying or not I can't tell, but that's better than Anthropic that openly has fucked up plans

6h · Views 33
val @seipent

@VictorTaelin @lookingforgames My simplest and dumbest case for this is when I tell GPT 5.5 to "commit to main" and it only commits, but doesn't push until I tell it to 😭✌🏻

6h · Views 24
ZomboDB @zombodb

@VictorTaelin @lookingforgames ask one of the robots to split "the file" up into "many files"? they're really good at doing that sort of thing, as I'm sure you know.

6h · Views 23
bix @jklpuzo1923

yeah, when you're building a codebase and making an adjustment, you're not merely making that adjustment, you're also keeping the whole codebase and its code quality in mind. You're going to look for elegant and concise solutions that don't disrupt any other parts, while balancing several other software engineering dimensions.

But it's hard to make all these dimensions objective enough to capture in a benchmark

6h · Views 16
placeholder @RealSchmebulog

@VictorTaelin @lookingforgames "keep the optimization logic efficient in terms of file size" would probably land you a lower improvement, but you can't have it all, now can you?

6h · Views 3
Ben Crulis @BenCrulis

@VictorTaelin @lookingforgames "optimize the logic for size, keep everything else the same" maybe?

6h