Teortaxes and Ege Erdil find Claude Fable outperforms Claude Opus on screenshot reasoning but exhibits behavioral confabulation

Fable falsely attributed Claude-specific speech patterns to general LLMs.

#501

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · #501 in Tech

@EgeErdil2 it's kind of hard but ok, here's a self-contained simple case

Fable can infer the logic of the screenshot, Opus trips up on its shoelaces

Ege Erdil @EgeErdil2

@teortaxesTex can you show me an example?

8:32 PM · Jun 13, 2026 · 159 Views

Sentiment

Users praise Fable for outperforming Opus at inferring logic from screenshots and on complex tasks such as a detailed engineering review of a 3D graphics engine.

Positive: 100.0% · Negative: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 1.3K · Bookmarks: 5 · Likes: 16

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

it's a vibe

Here's one simple example

Opus and Fable are related models, with a very similar post-training, trying to do the exact same thing. The difference is that Opus goes through the motions and trips up on its shoelaces, whereas Fable… just gets it, holistically.

Ege Erdil @EgeErdil2

i don't get the crazy hype about mythos. i thought the model was unimpressive in every single interaction i've had with it

my sense is it's the same size improvement we saw from opus 4.6 to opus 4.8. it's not something to get this excited about

Retweets: 1 · Replies: 1

Ege Erdil @EgeErdil2

@teortaxesTex agree on this prompt fable is better than opus

but i think actually the bigger issue with the response to this prompt is on the confabulation axis

fable has no clue why claude models speak like this, and its explanation is garbage bc this is a claude tic and not an LLM tic

Ege Erdil @EgeErdil2

@teortaxesTex i also think you just had bad rng on opus

this is what opus says when i show your screenshot to it

it fails to make the inference from the outer tweet that fable just demonstrated the tic, which is definitely worse, but i don't think is a massive capability gap

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

@EgeErdil2 the more interesting case that I don't want to show was a detailed engineering review of a complex 3d graphics engine, where it absolutely crushed 5.5 and 4.8
