
Perry E. Metzger, Alliance for the Future co-founder, argues critics claiming constant AI hallucinations ignore real-world coding success

He compared AI skeptics to someone insisting that the car a commuter drives to work every day does not exist.

Original post · Robin Hanson · #884
Perry E. Metzger @perrymetzger

I have said this before, but to those of us using AI systems to get lots of work done reliably and quickly, the people who post online about how AIs still hallucinate constantly, about how they can’t write code, etc., seem equivalent to people trying to convince you that the car you drive to work every day doesn’t exist.

You tell them things like “but I drive a car. I paid money for it. I buy gasoline for it. I could not possibly be working twenty miles away from home if I didn’t have the car?” and they reply that you are imagining having a car, or that you’re lying because you work for a car company.

It is as though these people live in a completely different reality.

8:36 AM · Jun 6, 2026 · 80.1K Views
Sentiment

Positive users praise AI agents for major productivity gains in coding and daily work, while negative users call the tools unreliable and harmful to customer service.

Positive: 56.1% · Negative: 43.9% (51 comments with sentiment)
Posts from X
Most activity: 5.8K Views · 96 Likes · 2 Retweets · 19 Replies
Arcadiy Ivanov @IAmArcIvanov

Those of us using AI systems to get lots of work done quickly do know that AIs still hallucinate constantly and can't reliably write code of the type there isn't a large sample of in the training corpus, that they constantly forget which directory they are in and get lost trying to juggle two branches at a time and that they require constant supervision etc.

Both are possible, there is no juxtaposition here.

1d · 5.8K Views · 96 Likes · 2 Bookmarks
Fjärt @Poofarmer69

@perrymetzger Here is an OpenAI article about it

https://openai.com/index/why-language-models-hallucinate/

And the accompanying research paper

https://arxiv.org/abs/2509.04664

1d · 686 Views · 15 Likes · 2 Bookmarks
Perry E. Metzger @perrymetzger

@IAmArcIvanov I have not had an experience anything like yours. None of the things you suggest have occurred for me in a long time.

1d · 4.7K Views · 73 Likes
Fjärt @Poofarmer69

@perrymetzger The top AI researchers admit they hallucinate and talk candidly about their limitations. You aren’t a serious person

1d · 2.6K Views · 54 Likes · 1 Bookmark
Perry E. Metzger @perrymetzger

This is, of course, manifestly false. If you use a thinking model with web access, hallucination has essentially stopped. There are of course some howlingly funny lacunae left, like the typical “should I walk to the car wash” benchmark, but I can’t even remember the last time I saw a serious hallucination, and I use these models essentially all day. Things have come very far since GPT 3, and if you’re not aware of that, well, tough luck for you.

Two years ago or so, one of the earliest thinking models successfully diagnosed an illness I had that stumped my doctors. What it figured out was completely correct, verifiably so, and almost certainly saved my health. You probably don’t believe this is possible, but that’s fine. You can tell someone who drives a car every day that cars don’t exist all you want, they’ll just think you’re crazy.

1d · 2.2K Views · 49 Likes
Reed Rawlings @reed_rawlings

@perrymetzger All the LLMs still hallucinate endlessly and happily make things up. Anyone who says otherwise is experiencing psychosis.

1d · 2.3K Views · 55 Likes
Arcadiy Ivanov @IAmArcIvanov

@perrymetzger Opus 4.6 with Claude Code. Occurs all the time. 4.7 and 4.8 are even worse + waste more tokens. Perhaps we're writing different types of code.

1d · 1.6K Views · 24 Likes · 1 Bookmark
Perry E. Metzger @perrymetzger

@kuza55 Extreme levels of testing. I put tests around everything.

1d · 1.7K Views · 17 Likes · 1 Bookmark
kuza55 @kuza55

@perrymetzger Codex still regularly makes mistakes.

100% of my code is AI at this point, but the length of the leash I give it still seems to matter.

I would love to hear details about what you're building, how you're building it and how you're guaranteeing reliability.

1d · 2.1K Views · 7 Likes

@perrymetzger Rather like: A: "I have this power drill I use to put lots of holes in wood reliably and quickly, you should try it!" B: "But my job doesn't involve putting holes in things." C: "But my job is putting holes in concrete." A: "Why are B and C such Luddites?! You must use my tool!"

1d · 1.3K Views · 29 Likes
Perry E. Metzger @perrymetzger

@Poofarmer69 Show me a “top AI researcher” that claims that a thinking model with web access routinely hallucinates.

1d · 2.3K Views · 16 Likes
Perry E. Metzger @perrymetzger

@IAmArcIvanov Try Codex CLI for a few days. See if you like it better. You can always cancel the subscription.

1d · 1.3K Views · 11 Likes
Perry E. Metzger @perrymetzger

@IAmArcIvanov @spion The harness also makes a big difference. That said, I have friends that swear by harnesses other than Codex CLI, like Open Code. I have not tried those.

1d · 1K Views · 10 Likes · 1 Bookmark
Perry E. Metzger @perrymetzger

@Poofarmer69 Neither of these talk about whether a reasoning model with web access hallucinates at all. You don’t even understand what I was talking about, do you? You’re just throwing stuff at the wall and hoping it sticks.

1d · 670 Views · 14 Likes · 1 Bookmark
Capacitard @capacitarded

@perrymetzger I’ve come to believe that most of them are using free tiers, don’t know how to prompt, have memory enabled full of contradictory context, use a single thousand message session for everything, or had a bad experience with ChatGPT years ago and refuse to revisit their assumptions

1d · 209 Views · 11 Likes · 1 Bookmark
Perry E. Metzger @perrymetzger

@Poofarmer69 You have no idea what you’re talking about at all, do you? You didn’t even understand the thing that I was saying which you think you have somehow refuted. I will not be wasting more time on this.

1d · 630 Views · 18 Likes
Perry E. Metzger @perrymetzger

@reed_rawlings Being wrong half a percent of the time is a lot better than human beings in most domains.

1d · 481 Views · 18 Likes
Ole Persson @chromotorque

@perrymetzger Claude can't get a screenful of text without errors. That's a fact. The story you tell is incongruent with that fact.

And I've witnessed so many people bullshitting themselves into all sorts of things that I do not trust self-reports.

1d · 1K Views · 20 Likes
spion @spion

@perrymetzger @IAmArcIvanov Yes, Opus has gone downhill ever since 4.5 - it can tackle more ambitious tasks but the rate of really bad errors has gone up drastically; 4.7+ is a net waste of time.

GPT 5.5 in Codex doesn't seem to have that issue so far - definitely not to that extent.

1d · 1.1K Views · 11 Likes

@perrymetzger I honestly think the divide is all about expectations. If they expect AI to be a magic Oracle that never makes mistakes, they are disappointed. If they recognize that both AI and humans use iteration toward a goal as the core way to do things, and mistakes are expected, AI works.

1d · 345 Views · 13 Likes