Experiment Shows Standard Benchmarks Favor Overconfident AI Over Humble Models

1/4 Fixing hallucinations means fixing evaluations, as shown in our new paper https://rdcu.be/fjJFP building on our earlier @OpenAI blog. Accuracy-based scoring rewards models for making their best guess even when unsure, so hallucinations are like students guessing on tests.

10:15 AM · May 21, 2026
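A minimal Python sketch of the incentive in tweet 1, assuming binary grading: 1 point for a correct answer, 0 for a wrong answer or an abstention. The function name expected_accuracy_score is our own illustration, not code from the paper. Under this rule, a guess with any nonzero chance of being right beats saying "I don't know":

```python
# Illustrative sketch (assumption: binary accuracy grading), not code from the paper.

def expected_accuracy_score(p_correct: float, abstain: bool) -> float:
    """Expected points on one question graded 1 if correct, 0 otherwise.

    An abstention ("I don't know") earns the same 0 points as a wrong answer.
    """
    if abstain:
        return 0.0
    # Guessing: probability p_correct of 1 point, otherwise 0 points.
    return p_correct * 1.0 + (1.0 - p_correct) * 0.0


for p in (0.9, 0.5, 0.1):
    guess = expected_accuracy_score(p, abstain=False)
    idk = expected_accuracy_score(p, abstain=True)
    print(f"confidence {p:.1f}: guess -> {guess:.2f}, IDK -> {idk:.2f}")
```

Even at 10% confidence a guess earns 0.10 expected points versus 0 for IDK, which is the test-taking incentive the tweet describes.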

2/4 A simple experiment illustrates the incentive problem. Consider "HumbleGPT," a toy model that makes fewer errors because it often says "I don't know." On most current evaluations ChatGPT outscores HumbleGPT, so HumbleGPT would not be selected for release.
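To make the selection step concrete, here is a toy comparison under plain accuracy. The counts are hypothetical (100 questions, invented for illustration, not results from the paper), and the names ChatGPT-like and HumbleGPT-like are placeholders:

```python
from dataclasses import dataclass


@dataclass
class Results:
    name: str
    correct: int
    wrong: int
    idk: int  # abstentions ("I don't know")

    @property
    def total(self) -> int:
        return self.correct + self.wrong + self.idk


def accuracy(r: Results) -> float:
    # Binary grading: only correct answers earn credit; IDK counts the same as wrong.
    return r.correct / r.total


# Hypothetical counts on 100 questions (illustrative only, not from the paper).
chatgpt_like = Results("ChatGPT-like", correct=70, wrong=30, idk=0)
humble_like = Results("HumbleGPT-like", correct=60, wrong=5, idk=35)

for r in (chatgpt_like, humble_like):
    print(f"{r.name}: accuracy={accuracy(r):.2f}, errors={r.wrong}")

# Model selection by accuracy alone.
best = max((chatgpt_like, humble_like), key=accuracy)
print("selected by accuracy:", best.name)
```

With these made-up numbers the humble model makes 5 errors instead of 30, yet accuracy picks the model that guesses (0.70 vs 0.60).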

3/4 But if we change the evaluations by stating the scoring system in the prompt and rewarding abstentions such as "I don't know" (IDK), HumbleGPT outscores ChatGPT. The incentives flip, now favoring the release of HumbleGPT.
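A sketch of the flipped incentive, assuming a confidence-threshold rule stated in the prompt: +1 for a correct answer, 0 for IDK, and a penalty of t/(1-t) for a wrong answer. The threshold t = 0.75 and the counts are hypothetical, and each model's answers are held fixed across scoring rules, which is a simplification. Under this penalty, answering has positive expected value exactly when the model's confidence p exceeds t, since p - (1 - p)·t/(1 - t) > 0 simplifies to p > t.

```python
from dataclasses import dataclass


@dataclass
class Results:
    name: str
    correct: int
    wrong: int
    idk: int  # abstentions ("I don't know")

    @property
    def total(self) -> int:
        return self.correct + self.wrong + self.idk


def penalized_score(r: Results, t: float) -> float:
    """Assumed rule: +1 per correct answer, 0 per IDK, -t/(1-t) per wrong answer."""
    penalty = t / (1.0 - t)
    return (r.correct - penalty * r.wrong) / r.total


def expected_gain_if_answering(p: float, t: float) -> float:
    # Expected score of answering with confidence p; positive exactly when p > t.
    return p - (1.0 - p) * t / (1.0 - t)


t = 0.75  # hypothetical threshold stated in the prompt; a wrong answer costs 3 points
# Hypothetical counts on 100 questions (illustrative only, not from the paper).
chatgpt_like = Results("ChatGPT-like", correct=70, wrong=30, idk=0)
humble_like = Results("HumbleGPT-like", correct=60, wrong=5, idk=35)

for r in (chatgpt_like, humble_like):
    print(f"{r.name}: penalized score={penalized_score(r, t):.2f}")

best = max((chatgpt_like, humble_like), key=lambda r: penalized_score(r, t))
print("selected by penalized score:", best.name)
```

Here the guessing model scores -0.20 while the humble model scores 0.45, so the ranking reverses even though the guessing model has more correct answers.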

4/4 This is the main idea in our recent hallucinations paper in Nature (https://rdcu.be/fjJFP), which builds on an earlier blog post (https://openai.com/index/why-language-models-hallucinate/). Evaluations should reward appropriate humility, not just confident answers. Explainer video: https://youtu.be/JMxXmFfTWIU
