New ICML Paper Shows Self-Consistency Training Entrenches LLM Misalignment

Original post · Geoffrey Irving · #347
David (@DavidDAfrica)

Many methods use consistency as a way to make language models more capable or aligned, such as through self-distillation or regularisation.

In new work accepted to ICML 2026, @ArathiMani and I show that optimising for self-consistency can entrench pre-existing misalignment.

9:12 AM · Jun 3, 2026 · 1.8K Views