Celeste and Jan introduce four training modifications raising Activation Oracle interpretability scores on AObench from 0.25 to 0.43 · Digg