That's pretty cool! They decompose the weight matrices into interpretable subsets (that part needs a lot more than 4 tokens iiuc) and then basically tune down the subset for German, which destroys German performance while keeping almost everything else intact (unlike lora)
We removed an LM's ability to speak German by fine-tuning on only 4 German tokens.
As part of a 1-day hackathon with our product Silico, we removed a 67M-parameter language model's ability to predict German text, by tuning only a scalar factor on one subcomponent of the weights. (1/6)










