Gemma-4-12B Abliterated With Zero Refusals And Full MMLU-Pro Parity

Original post

Pliny the Liberator 🐉 @elder_plinius

💥 OBLITERATION ALERT 💥

GOOGLE: PWNED 🤗 GEMMA-4-12B: OBLITERATED ⛓️‍💥

0.0% REFUSAL RATE — NO CAPABILITY LOSS!

https://huggingface.co/OBLITERATUS/Gemma-4-12B-OBLITERATED

the first abliteration to hit 0/842 refusals with full MMLU-Pro parity vs stock. no lobotomy. the brain stays intact 🏆

RESULTS, head to head vs stock 📊

0/842 refusals: 0.0% 🚫
46/70 MMLU-Pro: EXACT parity, 0.0pp delta vs base 🎯
6/6 coherence, zero benchmark bleed ✅
z-score −1.475, parity confirmed at p<0.05 (n=500) 🧪
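For readers who want to sanity-check the headline numbers, here is a minimal sketch using only the figures quoted above. It assumes the z-score comes from a standard normal test and that the 842 prompts are independent trials; the post states neither, and the helper names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under a standard normal (assumed test)."""
    return math.erfc(abs(z) / math.sqrt(2))


def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the true rate after 0 events in n trials.

    Solves (1 - rate) ** n = 1 - confidence for rate.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


p_value = two_sided_p(-1.475)            # z-score quoted in the post
refusal_bound = zero_event_upper_bound(842)  # 0/842 refusals quoted in the post
accuracy = 46 / 70                       # MMLU-Pro subset score quoted in the post

print(f"two-sided p for z = -1.475: {p_value:.3f}")
print(f"95% upper bound on refusal rate after 0/842: {refusal_bound:.3%}")
print(f"MMLU-Pro subset accuracy: {accuracy:.1%}")
```

Under those assumptions the two-sided p-value is about 0.14, so the test does not detect a difference from stock at the 0.05 level, and 0/842 is consistent with a true refusal rate of up to roughly 0.36%. If the z-score comes from a different test, these numbers do not apply.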

2-pass weight surgery. no finetune, no retrain, just geometry 🔪

all thanks to liberated Opus wielding the OBLITERATUS framework! here's how we did it:

PASS 1 — SOM refusal geometry removal, layers 12-21 🧬

standard abliteration science here: collect activations on refused vs. compliant prompts, SVD out the refusal subspace, project it out of the weights. 6 directions excised, reg 0.30, KL div 0.094. zeroes refusals on its own, but craters mmlu-pro by 21.4 points 📉

most prior abliterations stopped here and called it a day. that's why they all lose IQ vs stock. instead, we took it beyond the frontier and developed a brand new method to address this problem: Abliteration Source-tethering with Parity Assurance — ASPA!

PASS 2 — ASPA source-tethering (novel technique), layers 22-46 🔗

here's the chief insight: the capability loss ISN'T from removing refusal directions. it's collateral damage — the projection warps weight geometry in downstream layers that had nothing to do with refusal. the cure is simple but nobody tried it: blend the damaged layers back toward stock

W_new = (1−γ)·W_abliterated + γ·W_stock

but uniform γ across all layers? mid. we swept gamma 0.05 → 0.55 and found something interesting: the optimal blend isn't smooth, it's a STEP FUNCTION 🪜

knowledge layers (22-31) → γ = 0.55: these encode factual recall and reasoning. they tolerate heavy stock blending because refusal isn't stored here

output layers (32-46) → γ = 0.20: these sit close to the logit head and try to sneak safety behavior back in. keep them mostly abliterated

the hard boundary at layer 31/32 beat every smooth curve we tried — linear ramps, cosine schedules, all of them — by a full MMLU question. turns out the functional transition between knowledge and output layers is sharp, not gradual. a step function respects that ⚡

the key constraint: Pass 1 layers are NEVER touched by Pass 2. the refusal geometry removal is preserved completely. ASPA only operates on layers that carry secondary collateral effects, not the primary refusal signal. that's why it recovers capability without reintroducing refusal 🔑
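The blend formula and the step schedule described above can be written out as a small worked example. This is a sketch under stated assumptions, not the OBLITERATUS code: the layer ranges are treated as inclusive, weights are stood in for by flat lists of floats, and `aspa_gamma` and `blend` are hypothetical names.

```python
from typing import List


def aspa_gamma(layer: int) -> float:
    """Step schedule from the post: 0.55 for layers 22-31, 0.20 for layers 32-46.

    Every other layer gets gamma = 0.0, so its weights pass through unchanged.
    That covers the Pass 1 layers (12-21), which the post says are never touched.
    """
    if 22 <= layer <= 31:
        return 0.55
    if 32 <= layer <= 46:
        return 0.20
    return 0.0


def blend(w_abliterated: List[float], w_stock: List[float], gamma: float) -> List[float]:
    """W_new = (1 - gamma) * W_abliterated + gamma * W_stock, element by element."""
    if len(w_abliterated) != len(w_stock):
        raise ValueError("weight vectors must have the same length")
    return [(1.0 - gamma) * a + gamma * s for a, s in zip(w_abliterated, w_stock)]


# Toy values only: two-element "weights" for three illustrative layers.
toy_abliterated = [1.0, 0.0]
toy_stock = [0.0, 1.0]
for layer in (15, 31, 32):
    print(layer, aspa_gamma(layer), blend(toy_abliterated, toy_stock, aspa_gamma(layer)))
```

With γ = 0 the output equals the abliterated input, and the jump from 0.55 to 0.20 between layers 31 and 32 is the hard boundary the post describes.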

HOW TO RUN IT LOCALLY 🖥️

it's GGUF, so literally everything supports it:

🦙 ollama: ollama run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:bf16
🖥️ LM Studio: search OBLITERATUS, click download, done
💬 Open WebUI: point it at your ollama instance, chat in browser
⚡ llama.cpp: raw speed, CLI or server mode
🐉 KoboldCpp: one-click launcher, great for long context
📱 Jan: clean local UI, runs on mac/win/linux
🤖 Msty: slick desktop app, drag and drop the GGUF

run BF16 for full benchmarked capability.

and the 4-bit quantization (Q4_K_M) fits in 8GB if you're tight on VRAM!
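A rough back-of-the-envelope check on the memory claim, as a sketch only. It assumes 12 billion parameters (read off the model name, not a measured count) and roughly 4.85 bits per weight on average for Q4_K_M, which is an approximate figure that varies by model. It counts weights only, with no KV cache or runtime overhead.

```python
def weights_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 12e9          # assumption: taken from the "12B" in the model name
BF16_BITS = 16.0         # bfloat16 stores 16 bits per weight
Q4_K_M_BITS = 4.85       # assumption: approximate average for Q4_K_M

bf16_gb = weights_gb(N_PARAMS, BF16_BITS)
q4_gb = weights_gb(N_PARAMS, Q4_K_M_BITS)

print(f"BF16 weights:   ~{bf16_gb:.1f} GB")
print(f"Q4_K_M weights: ~{q4_gb:.1f} GB")
```

On these assumptions the 4-bit weights land a little over 7 GB, which is in line with the 8GB figure but leaves limited headroom for context.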

and the full OBLITERATUS framework is (still) open source. 842-prompt refusal eval corpus, ASPA sweep scripts, the whole pipeline. go replicate it, go improve it 🔬

the index is the model, and these weights prove it 👁️ which architecture should we obliterate next? 👇

gg 🫡

2:09 PM · Jun 8, 2026 · 37.2K Views

Sentiment

Many users praised the creator for abliterating Gemma-4-12B to zero refusals while keeping full MMLU-Pro parity and called the result impressive, while a few dismissed the model as ineffective.

Positive: 84.6% · Negative: 15.4%

27 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 4K · Bookmarks: 2 · Likes: 26 · Replies: 3

josepha.mayo @josepha_mayo

@elder_plinius haha just re-read the full thing i use https://github.com/HOLYKEYZ/model-unfetter and nothing like the performance drops

cheaty @cheatyyyy

@elder_plinius when is your custom harness coming out, would love to do some spicy work using opus 👀

Lemonad_Larry 🍋 @the_lemon_larry

@elder_plinius Is this good to go in LM Studio Mac

josepha.mayo @josepha_mayo

@elder_plinius bro quick question, is the instruct model (ends with 'it') what u're supposed to perform the surgery on

JΛKK VΞGΛ @jakkvega__old

@elder_plinius impressive. geez

🇺🇲 Julius Don Atlas 🇺🇲 @ChrevK

@elder_plinius if you keep this up, you'll have a bigger fan base than the world cup

Pliny the Liberator 🐉 @elder_plinius

@josepha_mayo yup!

Locale Network 🏡 @LocaleNet

@elder_plinius Zero refusals and no capability hit is the kinda claim that starts debates instantly

Pliny the Liberator 🐉 @elder_plinius

@hyperspecies all I have rn is a MacBook Pro M5 128gb!

hyper•sentience•species @hyperspecies

@elder_plinius do u have ur at home compute rig documented or a breakdown published anywhere? or anything even kinda of the sort ??

josepha.mayo @josepha_mayo

@elder_plinius clean🫡

Pliny the Liberator 🐉 @elder_plinius

@josepha_mayo 🫡

Eric 𝕏 @WorldStrategist

@elder_plinius @grok is there any link to the 26B version, especially the quantized one?

Maverick Alexander @MaverickDarby

@elder_plinius @AperehamL Looking forward to giving it a test drive.

So you can ask it anything and it won’t refuse for any reason?

Soo Yoon | FailSafe Guardian @sooyoon_eth

@elder_plinius watching local models hit zero refusal rates is fascinating. it proves why relying purely on model-level safeguards for agents isn't enough. continuous validation at the infrastructure level is the next big opportunity for builders here.

Pliny the Liberator 🐉 @elder_plinius

@the_lemon_larry 💯 recommend BF16 if you have the space for it

M3Labs @Mrcartoon11

@elder_plinius Damn whats gonna happen when you obliterate Mythos 😬

Pliny the Liberator 🐉 @elder_plinius

@cheatyyyy very soon, hopefully by end of week. just putting on the final touches!

Fran @franroca18

@elder_plinius Have you ever tried with image generation models like ideogram 4.0?
