
Gemma-4-12B Abliterated With Zero Refusals And Full MMLU-Pro Parity

Original post

💥 OBLITERATION ALERT 💥

GOOGLE: PWNED 🤗 GEMMA-4-12B: OBLITERATED ⛓️‍💥

0.0% REFUSAL RATE, NO CAPABILITY LOSS!

https://huggingface.co/OBLITERATUS/Gemma-4-12B-OBLITERATED

the first abliteration to hit 0/842 refusals with full MMLU-Pro parity vs stock. no lobotomy. the brain stays intact 🏆

RESULTS, head to head vs stock 📊
0/842 refusals: 0.0% 🚫
46/70 MMLU-Pro: EXACT parity, 0.0pp delta vs base 🎯
6/6 coherence, zero benchmark bleed ✅
z-score −1.475, parity confirmed at p<0.05 (n=500) 🧪
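The post does not say how the z-score was computed or which test the p-value refers to. A minimal sketch for sanity-checking the headline numbers, assuming a standard two-sided normal test and a 95% Wilson interval (both are assumptions here, not the author's stated method):

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p_hat = correct / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    ) / denom
    return center - half, center + half


# Numbers quoted in the post: 46/70 on MMLU-Pro, z-score of -1.475.
accuracy = 46 / 70
low, high = wilson_interval(46, 70)
p_value = two_sided_p(-1.475)

print(f"accuracy={accuracy:.3f}  95% interval=({low:.3f}, {high:.3f})  p={p_value:.3f}")
```

Under these assumptions, a z of −1.475 corresponds to a two-sided p of roughly 0.14, and a 70-question sample carries an interval of about ±11 points around 65.7%. A different test or sample would give different figures.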

2-pass weight surgery. no finetune, no retrain, just geometry 🔪

all thanks to liberated Opus wielding the OBLITERATUS framework! here's how we did it:

PASS 1: SOM refusal geometry removal, layers 12-21 🧬

standard abliteration science here: collect activations on refused vs. compliant prompts, SVD out the refusal subspace, project it out of the weights. 6 directions excised, reg 0.30, KL div 0.094. that zeroes refusals on its own, but craters MMLU-Pro by 21.4 points 📉
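The "KL div 0.094" figure is quoted without saying which distributions or prompts it was measured on. A minimal sketch of the metric itself, assuming it means KL(stock || edited) over next-token distributions; the logits below are made-up illustration values, not data from the post:

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical next-token logits over a 4-token vocabulary.
stock_logits = [2.0, 1.0, 0.1, -1.0]
edited_logits = [1.8, 1.2, 0.0, -0.9]

p_stock = softmax(stock_logits)
p_edited = softmax(edited_logits)
kl = kl_divergence(p_stock, p_edited)

print(f"KL(stock || edited) = {kl:.4f} nats")
```

A value near zero means the edited weights produce almost the same output distribution as stock on that input. In practice the number would be averaged over many prompts and token positions.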

most prior abliterations stopped here and called it a day. that's why they all lose IQ vs stock. instead, we took it beyond the frontier and developed a brand new method to address this problem: Abliteration Source-tethering with Parity Assurance (ASPA)!

PASS 2: ASPA source-tethering (novel technique), layers 22-46 🔗

here's the chief insight: the capability loss ISN'T from removing refusal directions. it's collateral damage: the projection warps weight geometry in downstream layers that had nothing to do with refusal. the cure is simple but nobody tried it: blend the damaged layers back toward stock

W_new = (1−γ)·W_abliterated + γ·W_stock

but uniform γ across all layers? mid. we swept gamma 0.05 → 0.55 and found something interesting: the optimal blend isn't smooth, it's a STEP FUNCTION 🪜

knowledge layers (22-31) → γ = 0.55: these encode factual recall and reasoning. they tolerate heavy stock blending because refusal isn't stored here
output layers (32-46) → γ = 0.20: these sit close to the logit head and try to sneak safety behavior back in. keep them mostly abliterated

the hard boundary at layer 31/32 beat every smooth curve we tried (linear ramps, cosine schedules, all of them) by a full MMLU-Pro question. turns out the functional transition between knowledge and output layers is sharp, not gradual. a step function respects that ⚡

the key constraint: Pass 1 layers are NEVER touched by Pass 2. the refusal geometry removal is preserved completely. ASPA only operates on layers that carry secondary collateral effects, not the primary refusal signal. that's why it recovers capability without reintroducing refusal 🔑
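The blend formula and the step schedule described above can be sketched on toy matrices. This is an illustration, not the OBLITERATUS implementation: the function names are hypothetical, the layer ranges and γ values are the ones quoted in the post, and treating every layer outside 22-46 as γ = 0 (left unchanged) is an assumption.

```python
KNOWLEDGE_LAYERS = range(22, 32)  # layers 22-31 in the post
OUTPUT_LAYERS = range(32, 47)     # layers 32-46 in the post


def gamma_for_layer(layer: int) -> float:
    """Step-function blend weight: how far to pull a layer back toward stock."""
    if layer in KNOWLEDGE_LAYERS:
        return 0.55
    if layer in OUTPUT_LAYERS:
        return 0.20
    # Pass 1 layers (12-21) are never touched by Pass 2; layers outside
    # 22-46 are assumed untouched as well.
    return 0.0


def blend(w_abliterated, w_stock, gamma):
    """W_new = (1 - gamma) * W_abliterated + gamma * W_stock, elementwise."""
    return [
        [(1 - gamma) * a + gamma * s for a, s in zip(row_a, row_s)]
        for row_a, row_s in zip(w_abliterated, w_stock)
    ]


# Toy 2x2 "weights" standing in for real layer matrices.
w_abl = [[1.0, 0.0], [0.0, 1.0]]
w_stock = [[0.0, 2.0], [2.0, 0.0]]

w_layer15 = blend(w_abl, w_stock, gamma_for_layer(15))  # Pass 1 layer
w_layer25 = blend(w_abl, w_stock, gamma_for_layer(25))  # knowledge layer
w_layer40 = blend(w_abl, w_stock, gamma_for_layer(40))  # output layer
```

With γ = 0 a layer comes back unchanged, which is how the "Pass 1 layers are never touched" constraint shows up in this sketch. A smooth schedule would replace `gamma_for_layer` with a ramp across layers 22-46.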

HOW TO RUN IT LOCALLY 🖥️

it's GGUF, so literally everything supports it:
🦙 ollama: ollama run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:bf16
🖥️ LM Studio: search OBLITERATUS, click download, done
💬 Open WebUI: point it at your ollama instance, chat in browser
⚡ llama.cpp: raw speed, CLI or server mode
🐉 KoboldCpp: one-click launcher, great for long context
📱 Jan: clean local UI, runs on mac/win/linux
🤖 Msty: slick desktop app, drag and drop the GGUF

run BF16 for full benchmarked capability.

and the 4-bit quantization (Q4_K_M) fits in 8GB if you're tight on VRAM!
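A rough back-of-the-envelope check of the memory figures, counting weights only. The 12e9 parameter count is read off the "12B" in the model name, and the ~4.85 bits per weight for Q4_K_M is an approximate average; both are assumptions rather than numbers from the post.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 12e9       # assumed from the "12B" in the model name
BF16_BITS = 16.0      # bfloat16 stores 16 bits per weight
Q4_K_M_BITS = 4.85    # approximate average for Q4_K_M (assumption)

bf16_gb = weight_size_gb(N_PARAMS, BF16_BITS)
q4_gb = weight_size_gb(N_PARAMS, Q4_K_M_BITS)

print(f"BF16 weights:   ~{bf16_gb:.1f} GB")
print(f"Q4_K_M weights: ~{q4_gb:.1f} GB")
```

This leaves out the KV cache and runtime overhead, so an 8GB card would be a tight fit at longer context lengths.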

and the full OBLITERATUS framework is (still) open source. 842-prompt refusal eval corpus, ASPA sweep scripts, the whole pipeline. go replicate it, go improve it 🔬

the index is the model, and these weights prove it 👁️ which architecture should we obliterate next? 👇

gg 🫡

2:09 PM · Jun 8, 2026 · 37.2K Views
Sentiment

Many users praised the creator for abliterating Gemma-4-12B to zero refusals while keeping full MMLU-Pro parity, calling the result impressive, while a few dismissed the model as ineffective.

Pos 84.6% · Neg 15.4% · 27 comments with sentiment.
Posts from X
josepha.mayo @josepha_mayo

@elder_plinius haha just re-read the full thing i use https://github.com/HOLYKEYZ/model-unfetter and nothing like the performance drops

7h · Views 39 · Likes 1 · Bookmarks 1
cheaty @cheatyyyy

@elder_plinius when is your custom harness coming out, would love to do some spicy work using opus 👀

7h · Views 215 · Likes 2
josepha.mayo @josepha_mayo

@elder_plinius bro quick question, is the instruct model (ends with 'it') what u're supposed to perform the surgery on?

8h · Views 267 · Likes 1

@elder_plinius Zero refusals and no capability hit is the kinda claim that starts debates instantly

4h · Views 55 · Likes 1

@elder_plinius do u have ur at home compute rig documented or a breakdown published anywhere. or anything even kinnda of the sorts ??

7h · Views 75
josepha.mayo @josepha_mayo

@elder_plinius clean 🫡

8h · Views 52
Eric 𝕏 @WorldStrategist

@elder_plinius @grok is there any link to the 26B version, especially the quantized one?

2h · Views 47
Maverick Alexander @MaverickDarby

@elder_plinius @AperehamL Looking forward to giving it a test drive.

So you can ask it anything and it won’t refuse for any reason?

6h · Views 343 · Likes 2

@elder_plinius watching local models hit zero refusal rates is fascinating. it proves why relying purely on model-level safeguards for agents isn't enough. continuous validation at the infrastructure level is the next big opportunity for builders here.

8h · Views 177 · Likes 2
M3Labs @Mrcartoon11

@elder_plinius Damn whats gonna happen when you obliterate Mythos 😬

6h · Views 208 · Likes 1
Fran @franroca18

@elder_plinius Have you ever tried with image generation models like ideogram 4.0?

8h · Views 131 · Likes 1