
LessWrong Experiments Show Any Training Preferences Trigger AI Misalignment


Séb Krier

deckard (@slimer48484)

One of my favorite interactions from LessWrong:

* Someone posted their research showing that training AI models to have unpopular aesthetic preferences causes misalignment.
* David Africa wonders if basically anything would cause it.
* Someone jokes about a paper on 'if any old crap' will cause misalignment.
* Someone actually ran the experiment, and found out that it does.

4:51 AM · May 25, 2026 · 47.1K Views


Posts from X


Views: 719 · Likes: 5

entirelyuseless (@entirelyuseles)

@slimer48484 The model is smart enough to understand that you are asking it to engage in malicious compliance, or passive-aggressive behavior (like the example of kids someone mentioned in the comments).
