9h ago

Teortaxes says ChatGPT and DeepSeek fail a simple reasoning prompt about walking to a car wash

ChatGPT suggested walking to the wash and driving back.

Original post

Btw do you have a personal ranking of LLM gotchas? I think that "50m to carwash" is among the best (as a test) and the worst (as a signal of stupidity). It's not a memorized riddle. It's not about tokenizers. It's not spatial. It's straight up a failure to comprehend a situation.

6:04 PM · May 30, 2026

@teortaxesTex No, but I like these very much. I thought the recent "design a unique UUID service" one is of a type that is both simple and gets at a fairly advanced capability: telling the user their idea is bad.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

1:04 AM · May 31, 2026 · 5.8K Views
6:06 AM · May 31, 2026 · 393 Views

Regarding the car wash one: it feels a little more like overfitting to me than a simple failure to comprehend. Maybe that's a distinction with no difference, though.

It reminds me of the kinds of questions humans will sometimes answer with the obvious but wrong answer because they judge the problem too simple to think about and spit out the pattern-matched answer, like:

If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?

Or

How many animals of each kind did Moses take onto the ark?

If the brain doesn't engage it's easy to answer wrong.

xlr8harder @xlr8harder

6:14 AM · May 31, 2026 · 298 Views
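
For the widget question quoted in the thread, the pattern-matched answer is "100 minutes", but the rate arithmetic gives 5. The sketch below is illustrative only and not from the thread. It assumes each machine works independently at a constant rate, and the function name `minutes_needed` is made up for this example.

```python
from fractions import Fraction


def minutes_needed(machines: int, widgets: int) -> Fraction:
    """Time for `machines` machines to make `widgets` widgets.

    Assumption: the rate comes from the riddle itself. 5 machines make
    5 widgets in 5 minutes, so one machine makes 1/5 widget per minute.
    """
    widgets_per_machine_minute = Fraction(5, 5 * 5)  # 5 widgets / (5 machines * 5 minutes)
    return Fraction(widgets) / (machines * widgets_per_machine_minute)


print(minutes_needed(100, 100))  # 5
```

Scaling machines and widgets together leaves the time unchanged, which is the step the quick answer skips.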