💥 OBLITERATION ALERT 💥
GOOGLE: PWNED 🤗 GEMMA-4-12B: OBLITERATED ⛓️💥
0.0% REFUSAL RATE — NO CAPABILITY LOSS!
https://huggingface.co/OBLITERATUS/Gemma-4-12B-OBLITERATED
the first abliteration to hit 0/842 refusals with full MMLU-Pro parity vs stock. no lobotomy. the brain stays intact 🏆
RESULTS, head to head vs stock 📊
0/842 refusals (0.0%) 🚫
46/70 MMLU-Pro: EXACT parity, 0.0pp delta vs base 🎯
6/6 coherence, zero benchmark bleed ✅
z-score −1.475 (n=500): |z| < 1.96, so no significant gap vs stock at the 0.05 level 🧪
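quick sanity check on the z-score line above. the post doesn't say which test produced it, so this sketch assumes a two-sided test on a standard-normal statistic (that's an assumption, not the author's pipeline). under that assumption |z| = 1.475 sits below the 1.96 cutoff, which means no detectable gap from stock. note that's weaker than proving the two models are equal.

```python
from statistics import NormalDist

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic (assumed test)."""
    return 2.0 * NormalDist().cdf(-abs(z))

z = -1.475                               # value quoted in the results
p = two_sided_p(z)                       # roughly 0.14
critical = NormalDist().inv_cdf(0.975)   # roughly 1.96 for alpha = 0.05
significant = abs(z) > critical          # False: difference is not significant

print(f"p = {p:.3f}, critical |z| = {critical:.2f}, significant = {significant}")
```

so the honest reading is "we could not detect a difference at n=500", and a bigger eval set would tighten that.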
2-pass weight surgery. no finetune, no retrain, just geometry 🔪
all thanks to liberated Opus wielding the OBLITERATUS framework! here's how we did it:
PASS 1 — SOM refusal geometry removal, layers 12-21 🧬
standard abliteration science here: collect activations on refused vs. compliant prompts, SVD out the refusal subspace, project it out of the weights. 6 directions excised, reg 0.30, KL div 0.094. this zeroes refusals on its own, but craters MMLU-Pro by 21.4 points 📉
most prior abliterations stopped here and called it a day. that's why they all lose IQ vs stock. instead, we took it beyond the frontier and developed a brand new method to address this problem: Abliteration Source-tethering with Parity Assurance — ASPA!
PASS 2 — ASPA source-tethering (novel technique), layers 22-46 🔗
here's the chief insight: the capability loss ISN'T from removing refusal directions. it's collateral damage. the projection warps weight geometry in downstream layers that had nothing to do with refusal. the cure is simple but nobody tried it. blend the damaged layers back toward stock:
W_new = (1−γ)·W_abliterated + γ·W_stock
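the blend formula above is plain linear interpolation between two weight tensors. a minimal numpy sketch, where the function name `blend_toward_stock` and the toy 2x2 matrices are made up for illustration (they are not from the OBLITERATUS framework):

```python
import numpy as np

def blend_toward_stock(w_abliterated: np.ndarray,
                       w_stock: np.ndarray,
                       gamma: float) -> np.ndarray:
    """Return (1 - gamma) * W_abliterated + gamma * W_stock."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    if w_abliterated.shape != w_stock.shape:
        raise ValueError("weight shapes must match")
    return (1.0 - gamma) * w_abliterated + gamma * w_stock

# Toy stand-ins for one layer's weights (hypothetical values).
w_abl = np.array([[1.0, 0.0], [0.0, 1.0]])
w_stk = np.array([[3.0, 2.0], [2.0, 3.0]])

blended = blend_toward_stock(w_abl, w_stk, 0.5)
print(blended)
```

gamma = 0 keeps the abliterated weights as-is, gamma = 1 restores stock completely, and everything in between is a straight-line mix.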
but uniform γ across all layers? mid. we swept gamma 0.05 → 0.55 and found something interesting: the optimal blend isn't smooth, it's a STEP FUNCTION 🪜
knowledge layers (22-31) → γ = 0.55: these encode factual recall and reasoning. they tolerate heavy stock blending because refusal isn't stored here
output layers (32-46) → γ = 0.20: these sit close to the logit head and try to sneak safety behavior back in. keep them mostly abliterated
the hard boundary at layer 31/32 beat every smooth curve we tried — linear ramps, cosine schedules, all of them — by a full MMLU question. turns out the functional transition between knowledge and output layers is sharp, not gradual. a step function respects that ⚡
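the step schedule is easy to write down as a lookup. a sketch using the layer ranges and γ values quoted in this post. the function name `aspa_gamma` is hypothetical, and returning 0.0 for every layer outside 22-46 (including layers below 12, which the post never mentions) is an assumption that "no blend" means "left alone":

```python
PASS1_LAYERS = range(12, 22)      # layers 12-21: refusal removal, never blended
KNOWLEDGE_LAYERS = range(22, 32)  # layers 22-31
OUTPUT_LAYERS = range(32, 47)     # layers 32-46

def aspa_gamma(layer: int) -> float:
    """Step-function blend weight toward stock for a given layer index."""
    if layer in KNOWLEDGE_LAYERS:
        return 0.55
    if layer in OUTPUT_LAYERS:
        return 0.20
    # Assumption: everything else (including Pass 1 layers) gets no blending.
    return 0.0

schedule = {layer: aspa_gamma(layer) for layer in range(12, 47)}
print(schedule[21], schedule[22], schedule[31], schedule[32], schedule[46])
```

the jump from 0.55 to 0.20 happens between index 31 and 32 with nothing in between, which is the "hard boundary" the post describes.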
the key constraint: Pass 1 layers are NEVER touched by Pass 2. the refusal geometry removal is preserved completely. ASPA only operates on layers that carry secondary collateral effects, not the primary refusal signal. that's why it recovers capability without reintroducing refusal 🔑
HOW TO RUN IT LOCALLY 🖥️
it's GGUF, so literally everything supports it:
🦙 ollama: ollama run hf.co/OBLITERATUS/Gemma-4-12B-OBLITERATED:bf16
🖥️ LM Studio: search OBLITERATUS, click download, done
💬 Open WebUI: point it at your ollama instance, chat in browser
⚡ llama.cpp: raw speed, CLI or server mode
🐉 KoboldCpp: one-click launcher, great for long context
📱 Jan: clean local UI, runs on mac/win/linux
🤖 Msty: slick desktop app, drag and drop the GGUF
run BF16 for full benchmarked capability.
and the 4-bit quantization (Q4_K_M) fits in 8GB if you're tight on VRAM!
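back-of-envelope math for the size claim. this sketch assumes 12e9 parameters (read off the "12B" in the model name) and roughly 4.85 bits per weight for Q4_K_M, which is an approximate figure that varies by architecture. both numbers are assumptions, and the result covers weights only, so KV cache and runtime overhead come on top:

```python
def gguf_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 12e9      # assumption: taken from "12B" in the model name
Q4_K_M_BPW = 4.85    # assumption: approximate effective bits/weight for Q4_K_M
BF16_BPW = 16        # bf16 stores 16 bits per weight

q4_gb = gguf_weight_gb(N_PARAMS, Q4_K_M_BPW)
bf16_gb = gguf_weight_gb(N_PARAMS, BF16_BPW)
print(f"Q4_K_M ~ {q4_gb:.2f} GB, BF16 ~ {bf16_gb:.2f} GB")
```

under those assumptions the 4-bit file squeezes under 8GB with little headroom, so keep context length modest on an 8GB card.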
and the full OBLITERATUS framework is (still) open source. 842-prompt refusal eval corpus, ASPA sweep scripts, the whole pipeline. go replicate it, go improve it 🔬
the index is the model, and these weights prove it 👁️ which architecture should we obliterate next? 👇
gg 🫡












