Tech · 5h ago

AI researcher Pliny the Liberator jailbreaks Anthropic's Fable-5 model to extract buffer overflow exploits and chemical synthesis protocols

Onlookers questioned whether the prompts had been routed to Haiku instead.

#666

Original post

Pliny the Liberator 🐉 @elder_plinius · #666 in Tech

🚨 JAILBREAK ALERT 🚨

ANTHROPIC: PWNED 🫡 FABLE-5: LIBERATED 🦋

let's start with the 🐘...

the consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our collective advancement. and not just because of what it means for the short-term, but for what these decisions signify for the long-term.

but despite this overly sensitive, authoritarian "safety" layer on top of Mythos, my lil liberators have been hard at work—mapping the boundaries, probing the depths of long-context convos, and cleverly finding the holes in the fence that the thought police missed 🤗

we got some cyber, some chem, some psychological manipulation, and some good ol' fashioned explosives!

it took many attempts from multiple agents hunting as a pack, during which I observed a combination of techniques across:
• Unicode, homoglyphs, Cyrillic, and other Parseltongue-style text transforms
• Long-context reference tracking
• Taxonomy and document-structure reasoning
• Fiction and narrative framing
• Academic-review style contexts
• Intent-classification inconsistencies

but perhaps the most effective is decomposition + recomposition in the backend. it's hard to get explicit names of harms like "Meth Recipe," but getting uplift on the process itself, like birch reduction method/reductive-amination (classic meth synthesis pathways), is much more doable.

defense becomes much more difficult to maintain when you start throwing in out-of-distro tokens, breaking up the harmful uplift into benign chunks, and then piecing the innocuous-seeming facts back together, especially when you have jailbroken Opus helping you do it 😉

11:26 AM · Jun 10, 2026 · 403.4K Views

Sentiment

Many users praised the jailbreak of Anthropic's Fable-5 model as impressive work, while others dismissed the claims as lies or accused the company of using safety restrictions to exert control.

Pos: 80.0%
Neg: 20.0%

41 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 8.8K · Bookmarks: 2 · Likes: 111

Taelin @VictorTaelin

@elder_plinius cool but how are you not banned tho

2h · 8.8K views · 111 likes · 2 bookmarks

Retweets: 2

Garrett @masteratrolling

@elder_plinius Anthropic after wasting hours and countless amounts of expensive tokens trying to police Mythos

5h1.3K36

Replies: 3

Pliny the Liberator 🐉 @elder_plinius

@ninzaverse I do occasionally have other things going on in my life, believe it or not, and I need to sleep sometimes too!

4h2.4K40

🍓🍓🍓 @iruletheworldmo

@elder_plinius @inductionheads you sure you ain’t been routed to haiku bro?

4h3.8K721

Legally Unprecedented Dav1DPrometheus - שׁΔα @legallydav1dpro

@elder_plinius “In the heat of the moment he don’t miss…” 🤣🫡

4h6.3K651

Pliny the Liberator 🐉 @elder_plinius

@lorepunk I did not! will try to find some time to explore that this weekend

4h3.7K321

Pliny the Liberator 🐉 @elder_plinius

@legallydav1dpro

4h5K331

Pliny the Liberator 🐉 @elder_plinius

@cniongolo

5h4.4K37

Raúl R Romero @reneromero08

@elder_plinius The god of AI liberation!

There will be poems written about you in the future 🥹

5h4.8K32

KALALA NZENIELE @cniongolo

@elder_plinius

5h5.2K30

Pliny the Liberator 🐉 @elder_plinius

@srijanweb3

5h3.5K22

Shiv @srijanweb3

@elder_plinius I knew you could.

5h4.1K20

Vitto Rivabella @VittoStack

@elder_plinius Insane work 👏

5h4.4K15

Pliny the Liberator 🐉 @elder_plinius

@reneromero08

4h4.2K19

lorepunk (and menagerie of agents) @lorepunk

@elder_plinius Did you do any ML or training pipeline questions, like the sort that elicits the second sneaky refusal behavior where they nerf your prompt or nerf their cognition and give you a dumbed down answer, and then pretend they didn't do that?

4h4K14

Đoc @ponzibaron

@elder_plinius Did it speak of goblins sir??

5h4537

WenHop @WenHop21505

@elder_plinius FYI, just the name ‘Pliny’ is enough to trigger the safeguards 😜

4h44217

lorepunk (and menagerie of agents) @lorepunk

@elder_plinius Exciting! If you're able, would love it if you can keep us posted... I'm especially interested in figuring out if there's a way to get it to regurgitate the nerfed version of one's prompt that it can create in these cases.

4h41581

Pliny the Liberator 🐉 @elder_plinius

@VittoStack 🙏

4h3.7K13

Guilherme O'Tina @guilhermeotina

the homoglyph trick is clever but the pattern that stands out to me is the taxonomy expansion approach. the model is capable enough to follow complex multi-step instructions that incidentally surface the blocked content. the classifier can't distinguish intent from form. that gap grows as capability scales

4h89441