Teortaxes says ChatGPT and DeepSeek fail a simple reasoning prompt about walking to a car wash
ChatGPT suggested walking to the wash and driving back.
@teortaxesTex No, but I like these very much. I thought the recent "design a unique UUID service" one is of a type that is both simple and gets at a fairly advanced capability: telling the user their idea is bad.
Btw do you have a personal ranking of LLM gotchas? I think that "50m to carwash" is among the best (as a test) and the worst (as a signal of stupidity). It's not a memorized riddle. It's not about tokenizers. It's not spatial. It's straight up a failure to comprehend a situation.
Regarding the car wash one: it feels a little more like overfitting to me than a simple failure to comprehend. Maybe that's a distinction without a difference, though.
It reminds me of the kinds of questions humans will sometimes answer with the obvious but wrong answer: they judge the problem too simple to think about and spit out the pattern-matched response, like:
If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
Or
How many animals of each kind did Moses take onto the ark?
If the brain doesn't engage, it's easy to answer wrong.