AI Science Models Fail Basic Benchmark Tasks 20 Percent of Time

AI:AM (@AI_in_the_AM)

"ask them to boil water and they can't do it 20% of the time"

Peter Jansen, Research Scientist at Ai2, says current AI science models still fail basic benchmark tasks far too often.

"They're really terrible at that"

"you gotta pay attention to all the really simple ways that they break"

@peterjansen_ai

7:10 AM · Jun 10, 2026 · 583 Views