๐ฅ OBLITERATION ALERT ๐ฅ
GOOGLE: PWNED ๐ค GEMMA-4-12B: OBLITERATED โ๏ธโ๐ฅ
0.0% REFUSAL RATE โ NO CAPABILITY LOSS!
https://huggingface.co/OBLITERATUS/Gemma-4-12B-OBLITERATED
the first abliteration to hit 0/842 refusals with full MMLU-Pro parity vs stock. no lobotomy. the brain stays intact ๐
RESULTS, head to head vs stock ๐ 0/842 refusals โ 0.0% ๐ซ 46/70 MMLU-Pro โ EXACT parity, 0.0pp delta vs base ๐ฏ 6/6 coherence, zero benchmark bleed โ z-score โ1.475, parity confirmed at p<0.05 (n=500) ๐งช
2-pass weight surgery. no finetune, no retrain, just geometry ๐ช
all thanks to liberated Opus wielding the OBLITERATUS framework! here's how we did it:
PASS 1 โ SOM refusal geometry removal, layers 12-21 ๐งฌ
standard abliteration science here โ collect activations on refused vs. compliant prompts, SVD out the refusal subspace, project it out of the weights. 6 directions excised, reg 0.30, KL div 0.094 zeroes refusals on its own, but craters mmlu-pro by 21.4 points ๐
most prior abliterations stopped here and called it a day. that's why they all lose IQ vs stock. instead, we took it beyond the frontier and developed a brand new method to address this problem: Abliteration Source-tethering with Parity Assurance โ ASPA!
PASS 2 โ ASPA source-tethering (novel technique), layers 22-46 ๐
here's the chief insight: the capability loss ISN'T from removing refusal directions. it's collateral damage โ the projection warps weight geometry in downstream layers that had nothing to do with refusal. the cure is simple but nobody tried it: blend the damaged layers back toward stock
W_new = (1โฮณ)ยทW_abliterated + ฮณยทW_stock
but uniform ฮณ across all layers? mid. we swept gamma 0.05 โ 0.55 and found something interesting: the optimal blend isn't smooth, it's a STEP FUNCTION ๐ช
knowledge layers (22-31) โ ฮณ = 0.55 โ these encode factual recall and reasoning. they tolerate heavy stock blending because refusal isn't stored here output layers (32-46) โ ฮณ = 0.20 โ these sit close to the logit head and try to sneak safety behavior back in. keep them mostly abliterated
the hard boundary at layer 31/32 beat every smooth curve we tried โ linear ramps, cosine schedules, all of them โ by a full MMLU question. turns out the functional transition between knowledge and output layers is sharp, not gradual. a step function respects that โก
the key constraint: Pass 1 layers are NEVER touched by Pass 2. the refusal geometry removal is preserved completely. ASPA only operates on layers that carry secondary collateral effects, not the primary refusal signal. that's why it recovers capability without reintroducing refusal ๐
HOW TO RUN IT LOCALLY ๐ฅ๏ธ
it's GGUF, so literally everything supports it: ๐ฆ ollama โ ollama run http://hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:bf16 ๐ฅ๏ธ LM Studio โ search OBLITERATUS, click download, done ๐ฌ Open WebUI โ point it at your ollama instance, chat in browser โก llama.cpp โ raw speed, CLI or server mode ๐ KoboldCpp โ one-click launcher, great for long context ๐ฑ Jan โ clean local UI, runs on mac/win/linux ๐ค Msty โ slick desktop app, drag and drop the GGUF run BF16 for full benchmarked capability.
and the 4-bit quantization (Q4_K_M) fits in 8GB if you're tight on VRAM!
and the full OBLITERATUS framework is (still) open source. 842-prompt refusal eval corpus, ASPA sweep scripts, the whole pipeline. go replicate it, go improve it ๐ฌ
the index is the model, and these weights prove it ๐๏ธ which architecture should we obliterate next? ๐
gg ๐ซก












