the average person has only ever used ChatGPT 3.5 Instant and has no idea what the models can do
Claude Code v2.1.143 running Opus 4.7 with a 1M context window executes a Bash command to fetch data for 1350 Pokémon from the PokeAPI and filters names ending in 'aw' within 11 seconds
AI Judge changed title after evaluation, original title: "Claude Code v2.1.143 running Opus 4.7 with a 1M context window retrieves 1350 Pokémon records via PokeAPI curl and Python filter returning croconaw and drednaw in 11 seconds"
Direct prompts without tools gave wrong answers like Seadra.
Supportive replies praise Claude Opus's fast API calls and data filtering in the terminal demo as incredible, while critical replies call the post engagement bait or accuse others of missing the point.
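For readers who want to see what the tool-assisted approach amounts to, here is a minimal Python sketch of "fetch the full list, then filter by suffix". It is not the command from the demo, which the summary does not reproduce. The URL is the public PokeAPI list endpoint with a large `limit` so everything comes back in one page; the function names `fetch_pokemon_names` and `names_ending_with` are hypothetical and chosen only for illustration.

```python
import json
import urllib.request

# Assumption: the public PokeAPI list endpoint; a large limit returns all entries in one page.
POKEAPI_LIST_URL = "https://pokeapi.co/api/v2/pokemon?limit=100000&offset=0"


def fetch_pokemon_names(url: str = POKEAPI_LIST_URL, timeout: float = 30.0) -> list[str]:
    """Download the Pokémon list and return just the names (needs network access)."""
    request = urllib.request.Request(url, headers={"User-Agent": "suffix-filter-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # Each entry in "results" is an object with "name" and "url" fields.
    return [entry["name"] for entry in payload["results"]]


def names_ending_with(names: list[str], suffix: str) -> list[str]:
    """Return the names ending with `suffix`, case-insensitively, in input order."""
    suffix = suffix.lower()
    return [name for name in names if name.lower().endswith(suffix)]
```

Calling `names_ending_with(fetch_pokemon_names(), "aw")` runs the whole lookup; per the summary above, the demo's answer was croconaw and drednaw. The point is that the filter is trivial once the data is fetched rather than recalled from memory.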

@RhysSullivan Crazy that people think models should know everything and use that as a measurement. The best AI is the one that can browse the web and find the most reliable source

@RhysSullivan @luciascarlet using Opus 4.7 for what is literally a one line terminal command is absolutely hilarious

@RhysSullivan "See? This thing is useless."

@RhysSullivan struggle w/ this talking to coworkers tbh. how do you tell someone they're right to consider ai garbage if the only way they interact w/ it is via the free tier web app and also it's genuinely incredible if you pay for better models in better harnesses with access to real tools

@RhysSullivan The economy is contingent on skills... and prompting is a skill

@uncreativetom @RhysSullivan @luciascarlet something something "not paying to hammer a nail, paying to know where to hammer the nail in"

@guigotgit @RhysSullivan put the most intelligent human in an empty room and ask them to name every pokémon

@RhysSullivan My point: not just a model issue. You used a different and more specific prompt that actually instructed the thing on the right way to achieve the desired analysis. The same dumbass prompt here produces the same dumb failure mode with opus 4.7

@RhysSullivan It's being sold as this magical box that can do everything with no effort. People who dislike the product are obviously going to make fun of that.
The reasonable take is that it's a tool, not magic. It can be helpful but requires skills first. But that's boring.

@RhysSullivan @DrewPavlou New Gemini Flash did this perfectly in web gui with no hint about execution and taking the typo in stride.

@uncreativetom @RhysSullivan @luciascarlet yup funny how Claude often defaults to python for parsing json etc, when many devs already have jq and of course busybox/sed/awk stuff.... Update your Agents.md people! Give your ai some basic tools and context to what's available on your machine!

@RhysSullivan 5.5 Thinking. I get where you're coming from but the pro-AI side really underplays the unreliability sometimes as well.
omg, Opus 4.7 without tool use indeed fails on this one. the failure mode closely resembles the one with the seahorse emoji. remarkable!
(of course, this failure mode means that LLMs are stochastic parrots and AGI is postponed indefinitely :D)

@uncreativetom @RhysSullivan @luciascarlet Did you have the api URL memorized?

@oscabriel @RhysSullivan You make it populist and anticapitalist. Big bad big tech is gatekeeping the good AI from you unless you pay up

@guigotgit @RhysSullivan normie thought process: if your ai isnt an absolute perfect oracle god thing, then its the dumbest thing in the entire world, literally a pile of dirt is better

@oscabriel @RhysSullivan you just say "that's crazy man"

They don't. And they might never, because if you think about how LLMs work, it's an extremely hard problem to determine whether the model knows an answer without it first trying its best to answer.
So instead of trying to use a saw for nailing nails, we can stop trying to make the LLM say "I don't know" and just instruct it to always go fetch fresh information beforehand

@RhysSullivan thats not the same prompt. you told it how to do it and it just followed your instructions. if you copied the same prompt i doubt it would be correct.

@guigotgit @RhysSullivan The problem is that the models that don’t know everything act as though they do. I’ve never once seen an AI say “dunno, sorry”, it just makes up some bullshit instead.