Study Shows AI Chatbots Hit 90% Accuracy on Fresh News but Falter in Open Answers

Rohan Paul (@rohanpaul_ai)

AI chatbots can answer questions about fresh news well, but their worst failures hide inside their confidence.

The best systems are surprisingly good at recent news when the question is clean and multiple choice.

But the study also shows that this success is fragile: the same systems get worse when they must answer freely, when the news is in Hindi, or when the user’s question contains a false assumption.

The best systems crossed 90% accuracy on multiple-choice questions about events reported only hours earlier, which means retrieval-augmented AI has moved from stale encyclopedia mode toward live information work.

That accuracy is not the same thing as reliability, because the systems were far worse when answers had to be produced freely.

These models usually do not fail because they cannot “think,” but because they land on the wrong evidence.

More than 70% of errors came from retrieval failures or source divergence, where the system found something nearby but not exact, then answered faithfully from the wrong article, wrong language, wrong scope, or wrong timestamp.

----

Paper Link: arxiv.org/abs/2605.22785

Paper Title: "Evaluating Commercial AI Chatbots as News Intermediaries"

6:47 PM · May 31, 2026 · 3K Views