Opus 4.7 model in Claude Code v2.1.143 fetches and filters 1350 Pokemon entries in 11 seconds
Claude Code v2.1.143, running the Opus 4.7 model with a 1M-token context window at xhigh effort, ran curl via bash against the PokeAPI. It then filtered the returned JSON for Pokemon names ending in "aw" and returned two matches in 11 seconds. Given the identical prompt without tool access, the same model produced incorrect examples and hedged its answer. Posts circulating the screenshot contrasted the result with typical ChatGPT 3.5 Instant interactions.
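The screenshot does not show the exact command the model ran. The following is only a minimal sketch of the same kind of fetch-and-filter step. It assumes PokeAPI's `/api/v2/pokemon` list endpoint with a `limit` query parameter, and assumes that endpoint returns a `results` array of `{"name", "url"}` objects. The helper names `fetch_pokemon_list` and `names_ending_with` are hypothetical, and the default `limit=1350` simply mirrors the entry count in the headline.

```python
import json
import urllib.request

# Assumption: PokeAPI's list endpoint accepts ?limit=N and returns
# {"count": ..., "results": [{"name": ..., "url": ...}, ...]}.
POKEAPI_LIST_URL = "https://pokeapi.co/api/v2/pokemon?limit={limit}"


def fetch_pokemon_list(limit: int = 1350) -> dict:
    """Hypothetical helper: download the Pokemon list as parsed JSON (needs network access)."""
    request = urllib.request.Request(
        POKEAPI_LIST_URL.format(limit=limit),
        headers={"User-Agent": "pokemon-suffix-filter-sketch"},  # explicit UA string, an arbitrary choice
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def names_ending_with(payload: dict, suffix: str) -> list[str]:
    """Hypothetical helper: return names in payload["results"] ending with suffix (case-insensitive)."""
    suffix = suffix.lower()
    return [
        entry["name"]
        for entry in payload.get("results", [])
        if entry["name"].lower().endswith(suffix)
    ]
```

Usage would look like `names_ending_with(fetch_pokemon_list(), "aw")`. Keeping the network call separate from the filter lets the filtering logic be checked against a local JSON sample without hitting the API.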
omg, Opus 4.7 without tool use indeed fails on this one. the failure mode closely resembles the one with the seahorse emoji. remarkable!
(of course, this failure mode means that LLMs are stochastic parrots and AGI is postponed indefinitely :D)
