/AI · 13h ago

Study Shows AI Chatbots Hit 90% Accuracy on Fresh News but Falter in Open Answers



Rohan Paul (@rohanpaul_ai) · #1032 in AI

AI chatbots can answer fresh news well, but their weakest failures hide inside their confidence.

The best systems are surprisingly good at recent news when the question is clean and multiple choice.

But the study also shows that this success is fragile, because the same systems get worse when they must answer freely, when the news is in Hindi, or when the user’s question contains a false assumption.

The best systems crossed 90% accuracy on multiple-choice questions about events reported only hours earlier, which means retrieval-augmented AI has moved from stale encyclopedia mode toward live information work.

That accuracy is not the same thing as reliability, because the systems were far worse when answers had to be produced freely.

These models usually do not fail because they cannot “think,” but because they land on the wrong evidence.

More than 70% of errors came from retrieval failures or source divergence, where the system found something nearby but not exact, then answered faithfully from the wrong article, wrong language, wrong scope, or wrong timestamp.

----

Paper Link – arxiv.org/abs/2605.22785

Paper Title: "Evaluating Commercial AI Chatbots as News Intermediaries"

6:47 PM · May 31, 2026 · 3K Views


Sentiment

Users voiced frustration that AI chatbots keep selecting the wrong articles when tested for accuracy on fresh news stories.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

