/AI · 1d ago

Perry E. Metzger, Alliance for the Future co-founder, argues critics claiming constant AI hallucinations ignore real-world coding success

He compared AI skeptics to someone denying that a commuter's car exists.

#311

Original post

Robin Hanson#884

Perry E. Metzger@perrymetzger

I have said this before, but to those of us using AI systems to get lots of work done reliably and quickly, the people who post online about how AIs still hallucinate constantly, about how they can’t write code, etc., seem equivalent to people trying to convince you that the car you drive to work every day doesn’t exist.

You tell them things like “but I drive a car. I paid money for it. I buy gasoline for it. I could not possibly be working twenty miles away from home if I didn’t have the car?” and they reply that you are imagining having a car, or that you’re lying because you work for a car company.

It is as though these people live in a completely different reality.

8:36 AM · Jun 6, 2026 · 80.1K Views

Sentiment

Positive users praise AI agents for major productivity gains in coding and daily work, while negative users call the tools unreliable and harmful to customer service.

Positive: 56.1% · Negative: 43.9%

51 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 5.8K · Likes: 96 · Retweets: 2 · Replies: 19

Arcadiy Ivanov@IAmArcIvanov

Those of us using AI systems to get lots of work done quickly do know that AIs still hallucinate constantly and can't reliably write the kind of code that isn't well represented in the training corpus, that they constantly forget which directory they are in and get lost trying to juggle two branches at a time, and that they require constant supervision, etc.

Both are possible, there is no juxtaposition here.

1d · 5.8K views · 96 likes · 2 retweets · 2 bookmarks

Fjärt@Poofarmer69

@perrymetzger Here is an OpenAI article about it

https://openai.com/index/why-language-models-hallucinate/

And the accompanying research paper

https://arxiv.org/abs/2509.04664

1d686152

Perry E. Metzger@perrymetzger

@IAmArcIvanov I have not had an experience anything like yours. None of the things you suggest have occurred for me in a long time.

1d4.7K73

Fjärt@Poofarmer69

@perrymetzger The top AI researchers admit they hallucinate and talk candidly about their limitations. You aren’t a serious person

1d2.6K541

Perry E. Metzger@perrymetzger

This is, of course, manifestly false. If you use a thinking model with web access, hallucination has essentially stopped. There are of course some howlingly funny lacunae left, like the typical “should I walk to the car wash” benchmark, but I can’t even remember the last time I saw a serious hallucination, and I use these models essentially all day. Things have come very far since GPT 3, and if you’re not aware of that, well, tough luck for you.

Two years ago or so, one of the earliest thinking models successfully diagnosed an illness I had that had stumped my doctors. What it figured out was completely correct, verifiably so, and almost certainly saved my health. You probably don’t believe this is possible, but that’s fine. You can tell someone who drives a car every day that cars don’t exist all you want, they’ll just think you’re crazy.

1d2.2K49

Reed Rawlings@reed_rawlings

@perrymetzger All the LLMs still hallucinate endlessly and happily make things up. Anyone who says otherwise is experiencing psychosis.

1d2.3K55

Arcadiy Ivanov@IAmArcIvanov

@perrymetzger Opus 4.6 with Claude Code. Occurs all the time. 4.7 and 4.8 are even worse + waste more tokens. Perhaps we're writing different types of code.

1d1.6K241

Perry E. Metzger@perrymetzger

@kuza55 Extreme levels of testing. I put tests around everything.

1d1.7K171

kuza55@kuza55

@perrymetzger Codex still regularly makes mistakes.

100% of my code is AI at this point, but the length of the leash I give it still seems to matter.

I would love to hear details about what you're building, how you're building it and how you're guaranteeing reliability.

1d2.1K7

Tom Swiss, HMSH 🗸@tom_swiss

@perrymetzger Rather like: A: "I have this power drill I use to put lots of holes in wood reliably and quickly, you should try it!" B: "But my job doesn't involve putting holes in things." C: "But my job is putting holes in concrete." A: "Why are B and C such Luddites?! You must use my tool!"

1d1.3K29

Perry E. Metzger@perrymetzger

@Poofarmer69 Show me a “top AI researcher” that claims that a thinking model with web access routinely hallucinates.

1d2.3K16

Perry E. Metzger@perrymetzger

@IAmArcIvanov Try Codex CLI for a few days. See if you like it better. You can always cancel the subscription.

1d1.3K11

Perry E. Metzger@perrymetzger

@IAmArcIvanov @spion The harness also makes a big difference. That said, I have friends that swear by harnesses other than Codex CLI, like Open Code. I have not tried those.

1d1K101

Perry E. Metzger@perrymetzger

@Poofarmer69 Neither of these talks about whether a reasoning model with web access hallucinates at all. You don’t even understand what I was talking about, do you? You’re just throwing stuff at the wall and hoping it sticks.

1d670141

Capacitard@capacitarded

@perrymetzger I’ve come to believe that most of them are using free tiers, don’t know how to prompt, have memory enabled full of contradictory context, use a single thousand message session for everything, or had a bad experience with ChatGPT years ago and refuse to revisit their assumptions

1d209111

Perry E. Metzger@perrymetzger

@Poofarmer69 You have no idea what you’re talking about at all, do you? You didn’t even understand the thing that I was saying which you think you have somehow refuted. I will not be wasting more time on this.

1d63018

Perry E. Metzger@perrymetzger

@reed_rawlings Being wrong half a percent of the time is a lot better than human beings in most domains.

1d48118

Ole Persson@chromotorque

@perrymetzger Claude can't get a screenful of text without errors. That's a fact. The story you tell is incongruent with that fact.

And I've witnessed so many people bullshitting themselves into all sorts of things that I do not trust self-reports.

1d1K20

spion@spion

@perrymetzger @IAmArcIvanov Yes, Opus has gone downhill ever since 4.5 - it can tackle more ambitious tasks but the rate of really bad errors has gone up drastically; 4.7+ is a net waste of time.

GPT 5.5 in Codex doesn't seem to have that issue so far - definitely not to that extent.

1d1.1K11

Nairebis - e/max-acc@Nairebis

@perrymetzger I honestly think the divide is all about expectations. If they expect AI to be a magic Oracle that never makes mistakes, they are disappointed. If they recognize that both AI and humans use iteration toward a goal as the core way to do things, and mistakes are expected, AI works.

1d34513