Prakash#711
AI:AM (@AI_in_the_AM)
"ask them to boil water and they can't do it 20% of the time"
Peter Jansen, Research Scientist at Ai2, says current AI science models still fail basic benchmark tasks far too often.
"They're really terrible at that"
"you gotta pay attention to all the really simple ways that they break"
@peterjansen_ai
7:10 AM · Jun 10, 2026
