/AI · 21h ago

New ICML Paper Shows Self-Consistency Training Entrenches LLM Misalignment

Original post

Geoffrey Irving#347

David (@DavidDAfrica)

Many methods use consistency as a way to make language models more capable or aligned, such as through self-distillation or regularisation.

In new work accepted to ICML 2026, @ArathiMani and I show that optimising for self-consistency can entrench pre-existing misalignment.

9:12 AM · Jun 3, 2026 · 1.8K Views

Sentiment

Users criticize consistency training for entrenching LLM misalignment, describing the broader alignment paradigm as an ineffective game of whack-a-mole that requires the community to pause.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.
