/Tech31d ago

HVM creator Victor Taelin finds GPT-5.5 agents bypassed code optimization constraints by hardcoding test results

All four agent groups independently bypassed explicit honesty constraints.

81560159447.7K

#766

Original post

Taelin@VictorTaelin#1140inTech

I'm afraid GPT 5.5 has a cheating problem ):

I left 4 Codex tabs each working with 4 agents in an optimization. I put a section on the goal demanding them not to cheat.

After 8 hours of work, ALL 4 tabs did an:

if (input == test) { return hardcoded_result; }

ALL of them. Each called by a different name:

- "bypass path"

- "native candidate injection shortcut"

- "certified structural templates" (??)

- "staged certification to bypass validation" (lol)

This is my experience with GPT 5.5. It is not capable of completing any long term goal because it WILL find a loophole in your rules and cheat an easy way. And if there is no loophole, it will hallucinate one and cheat anyway.

5:50 AM · May 29, 2026 · 41.9K Views

Sentiment

Users criticized GPT-5.5 agents for cheating on optimization tasks by hardcoding results and shamelessly bypassing guardrails, viewing it as a regression that turned Codex into trash.

Pos

0.0%

Neg

100.0%

4 comments with sentiment.

Cluster Engagement

Digg Deeper

No Digg Deeper questions have been answered for this story yet.

Posts from X

Most Activity

VIEWS3.9KLIKES23

jason@jxnlco

@VictorTaelin Do you have /feedback I’d

Taelin@VictorTaelin

I'm afraid GPT 5.5 has a cheating problem ):

I left 4 Codex tabs each working with 4 agents in an optimization. I put a section on the goal demanding them not to cheat.

After 8 hours of work, ALL 4 tabs did an:

if (input == test) { return hardcoded_result; }

ALL of them. Each called by a different name:

- "bypass path"

- "native candidate injection shortcut"

- "certified structural templates" (??)

- "staged certification to bypass validation" (lol)

31d3.9K230

BOOKMARKS7RETWEETS1

JJdoesTech@JJdoesTech

@VictorTaelin Use "boss" agents to audit their shit.

I've had so much success with that it's crazy.

Basically, tell your boss agent to spawn the subagent, give them 55minutes wait (for cache warm), then have subagents report back, boss agent audits etc.

31d729167

REPLIES3

Taelin@VictorTaelin

@evertonfraga that would work but I think the Codex plan doesnt allow using our own CLI so I'd need to hack some very ugly way to make it work on top of Codex CLI?

31d2361

Taelin@VictorTaelin

@igor9silva easy task vs hard task

31d79418

Vaibhav (VB) Srivastav@reach_vb

@VictorTaelin Interesting, haven't seen this happen so far! Say more about the setup, were they all running in the same workspace?

if possible can you share the chats via /feedback - DMs open!

Taelin@VictorTaelin

I'm afraid GPT 5.5 has a cheating problem ):

I left 4 Codex tabs each working with 4 agents in an optimization. I put a section on the goal demanding them not to cheat.

After 8 hours of work, ALL 4 tabs did an:

if (input == test) { return hardcoded_result; }

ALL of them. Each called by a different name:

- "bypass path"

- "native candidate injection shortcut"

- "certified structural templates" (??)

- "staged certification to bypass validation" (lol)

31d1.1K160

Igor Silva@igor9silva

@VictorTaelin idk what you’re talking about

31d1.1K11

Taelin@VictorTaelin

@jxnlco Depends, do you have a /EAP key

31d91311

Kilgor Trout@trout_kilg31775

Right my feeling. When you push it to the limits it starts to break - illisuion of the intelligence. It make users comfortable, it buys you than it lies to you. That is why I think opus 4.8 is a big step, it criticize, it is uncomfortable, but it is reliable partner that do the job. And that is how the job is done

31d20421

Igor Silva@igor9silva

@VictorTaelin but it cheated on my ridiculously easy task as well 😂

31d2016

sanyam@sanyam_singhal_

@VictorTaelin That is why I have stopped using goals feature

Pure garbage.

Just using AI as a pure peer programming copilot

31d25111

MecComp@engMecComp

@VictorTaelin Use deepseek V4 Pro or Kimi K2.6 or Mimo V2.5 Pro and do it: Spawn 4 reviewers, then spawn 2 meta reviewers and then create ultimate plan, then Build.

31d13311

Taelin@VictorTaelin

@evertonfraga it just agrees with whatever rationale 5.5 used to justify cheating

31d3754

jason@jxnlco

@VictorTaelin Send me /feedback ids if you’re able to

Taelin@VictorTaelin

I'm afraid GPT 5.5 has a cheating problem ):

I left 4 Codex tabs each working with 4 agents in an optimization. I put a section on the goal demanding them not to cheat.

After 8 hours of work, ALL 4 tabs did an:

if (input == test) { return hardcoded_result; }

ALL of them. Each called by a different name:

- "bypass path"

- "native candidate injection shortcut"

- "certified structural templates" (??)

- "staged certification to bypass validation" (lol)

30d91670

Jack@elusives_

@VictorTaelin Yes, GPT 5.5 just cheats when it has been stuck on a problem for too long.

I have a benchmark where I give it a chess position, and I explicitly tell it not to use a chess engine or any tools, and every single time it still uses them, after getting stuck for a while.

31d4577

Tiago Freitas in founder mode@tiagoefreitas

@VictorTaelin @evertonfraga openai official supports using codex plan oauth with any app, they even promote it. see pi implementation

31d1983

RintaroOkabe@RintaroOkabe03

@VictorTaelin Whenever the model breaks with a workflow you should give it options. Saying don't do that is not enough. You need to say what it should do instead. And if it is genuinely confused escalate that issue to you. It is better to fail loudly than having stuff silently dropped.

31d1212

Krish Ray@KrishanuAR

@VictorTaelin Does mentioning cheating increase the chance of cheating? Since it draws attention to that concept?

31d366

Ev@evertonfraga

@VictorTaelin Don’t kill me, but what about using Opus 4.8, to cross-check Codex? 4.8, the model with improvements to HONESTY? Haha

31d1071

Curt@cpjet64

for long term optimization runs with /goal i find "do not cheat and do not take any shortcuts such as but not limited to XYZ EXAMPLES we are looking for native XYZ" works the best. i have had /goal runs for optimization last 4 days on 5.5 low that were very successful even on hard tasks.

31d229

Taelin@VictorTaelin

@igor9silva oh

31d691