Original post
Geoffrey Irving #347
David (@DavidDAfrica)
Many methods use consistency to make language models more capable or aligned, for example through self-distillation or regularisation.
In new work accepted to ICML 2026, @ArathiMani and I show that optimising for self-consistency can entrench pre-existing misalignment.
9:12 AM · Jun 3, 2026 · 1.8K Views