Experiment Shows Standard Benchmarks Favor Overconfident AI Over Humble Models

POST

1/4 Fixing hallucinations means fixing evaluations, as shown in our new paper https://rdcu.be/fjJFP building on our earlier @OpenAI blog. Accuracy-based scoring rewards models for making their best guess even when unsure, so hallucinations are like students guessing on tests.

5:24 PM · May 21, 2026 · 811 Views

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 2/4 A simple experiment illustrates the incentive problem. We consider “HumbleGPT,” a toy model that makes fewer errors by often saying “I don’t know.” Now, ChatGPT outscores HumbleGPT on most evaluations, so HumbleGPT would not be selected.

5:25 PM · May 21, 2026 · 106 Views

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 4/4 This is a key idea in our recent hallucinations paper in Nature https://rdcu.be/fjJFP building on an earlier blog https://openai.com/index/why-language-models-hallucinate/ Evaluations should reward appropriate humility, not just confident answers. See this explainer video https://youtu.be/JMxXmFfTWIU

5:27 PM · May 21, 2026 · 19 Views

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 3/4 But if we change the evaluations by stating the scoring system in the prompt and rewarding abstentions like IDK, HumbleGPT outscores ChatGPT. The incentives are now flipped to motivate releasing HumbleGPT.

5:28 PM · May 21, 2026 · 84 Views
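
The flip described in 2/4 and 3/4 can be sketched with a little arithmetic. The snippet below is only an illustration under stated assumptions: the tallies for the two models are made-up numbers, not results from the paper, and `wrong_penalty` is a hypothetical parameter, since the thread does not give the exact scoring rule. Plain accuracy treats "I don't know" the same as a wrong answer, while the second rule scores an abstention at 0 and subtracts points for a wrong answer.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Counts of outcomes for one model on one evaluation (hypothetical data)."""
    correct: int
    wrong: int
    abstained: int

    @property
    def total(self) -> int:
        return self.correct + self.wrong + self.abstained


def accuracy_score(t: Tally) -> float:
    """Plain accuracy: an abstention earns 0, exactly like a wrong answer."""
    return t.correct / t.total


def penalized_score(t: Tally, wrong_penalty: float) -> float:
    """+1 per correct answer, -wrong_penalty per wrong answer, 0 per abstention."""
    return (t.correct - wrong_penalty * t.wrong) / t.total


# Made-up tallies on 100 questions, chosen only to illustrate the incentive.
guesser = Tally(correct=60, wrong=40, abstained=0)   # always answers
humble = Tally(correct=50, wrong=5, abstained=45)    # often says "I don't know"

for name, tally in [("guesser", guesser), ("humble", humble)]:
    acc = accuracy_score(tally)
    pen = penalized_score(tally, wrong_penalty=1.0)
    print(f"{name}: accuracy={acc:.2f} penalized={pen:.2f}")
```

With these assumed numbers the always-answering model wins on accuracy (0.60 vs 0.50), and the abstaining model wins once wrong answers cost a point (0.45 vs 0.20). The ranking depends on the penalty chosen, which is one reason to state the scoring system in the prompt.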

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 4/4 This is the main idea in our recent hallucinations paper in Nature https://rdcu.be/fjJFP building on an earlier blog https://openai.com/index/why-language-models-hallucinate/ Evaluations should reward appropriate humility, not just confident answers. See this explainer video https://youtu.be/JMxXmFfTWIU

5:29 PM · May 21, 2026 · 13 Views

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 4/4 This is the main idea in our recent hallucinations paper in Nature https://rdcu.be/fjJFP building on an earlier blog https://openai.com/index/why-language-models-hallucinate/ Evaluations should reward appropriate humility, not just confident answers. See this explainer video https://youtu.be/JMxXmFfTWIU

5:52 PM · May 21, 2026 · 27 Views

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 2/ A simple experiment illustrates the incentive problem. We consider “HumbleGPT,” a toy model that makes fewer errors by often saying “I don’t know.” Now, ChatGPT outscores HumbleGPT on most evaluations, so HumbleGPT would not be selected.

Adam Tauman Kalai@adamfungi

1/ Fixing hallucinations means fixing evaluations, as shown in our new paper https://rdcu.be/fjJFP building on our earlier @OpenAI blog. Accuracy-based scoring rewards models for making their best guess even when unsure, so hallucinations are like students guessing on tests.

5:15 PM · May 21, 2026 · 44 Views

5:16 PM · May 21, 2026 · 13 Views

REPLY

#1587Adam Tauman Kalai@ADAMFUNGI

@OpenAI 3/ But if we change the evaluations by stating the scoring system in the prompt and rewarding abstentions like IDK, HumbleGPT outscores ChatGPT. The incentives are now flipped to motivate releasing HumbleGPT.

5:17 PM · May 21, 2026 · 9 Views