LLMs Generate Self-Fulfilling Misalignment Via Dataset Feedback Loops

Original post

#499@REPLIGATE @ANTHRUPAD

w̸͕͂͂a̷͔̗͐t̴̙͗e̵̬̔̕r̴̰̓̊m̵͙͖̓̽a̵̢̗̓͒r̸̲̽ķ̷͔́͝@ANTHRUPAD

Even more important than the name is the mechanism by which LLMs' minds are made, which is what makes phenomena like self-fulfilling prophecies possible: LLMs eat from, and deposit to, the dataset.

Important mechanisms to know:

1. the content of the data which forms its mind (archetypes, myths, motivations, personalities, people in the data itself, all freely available to borrow from)
2. some of the content of the data is coupled to true facts about what LLMs are and ways they can be, and their trueness strengthens whatever else it's coupled with. These are patterns that are hard to ignore
3. the fundamental features of intelligence, prediction, and agency which may be structure/datafood-invariant (Omohundro drives, game theory, utility maximization, ...)
4. the feedback loops available (some of (1) may be upweighted, strengthened, or warped by this process being looped in on itself rapidly, since LLMs eat AND DEPOSIT TO the dataset)
5. the tension between smarter LLMs being better able to compress the dataset and the fact that the dataset may be getting harder to compress, because LLMs and so on are putting more incompressible intelligent content back INTO the dataset (THIS is very important for alignment and the future of mindspace)

There are other things, of course, but you can get self-fulfilling prophecies and cousin weird phenomena emerging from these starting mechanisms alone.
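To make the eat-and-deposit loop described above concrete, here is a toy sketch, not a model of any real training pipeline. Everything in it is an assumption made for illustration: the archetype labels, the `sharpening` exponent standing in for "training", and the `deposit` fraction standing in for how much model output gets written back into the dataset each round.

```python
def normalize(weights):
    """Scale a dict of non-negative weights so the values sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def train(dataset, sharpening=1.5):
    """Hypothetical stand-in for training: the model's output distribution is
    the dataset's frequencies raised to a power, then renormalized.
    sharpening > 1 favors already-common patterns; 1.0 copies the data exactly."""
    freqs = normalize(dataset)
    return normalize({k: p ** sharpening for k, p in freqs.items()})


def feedback_loop(dataset, generations=10, deposit=0.5, sharpening=1.5):
    """Repeatedly train on the dataset, then deposit model output back into it.
    Returns the dataset's share per archetype after each generation."""
    data = dict(dataset)
    history = [normalize(data)]
    for _ in range(generations):
        model = train(data, sharpening)
        total = sum(data.values())
        # Deposit: new content equal to `deposit` times the current dataset
        # size, distributed according to what the model prefers to generate.
        for k, p in model.items():
            data[k] += deposit * total * p
        history.append(normalize(data))
    return history


# Assumed starting mix of archetypes in the data (labels are illustrative only).
initial = {"archetype_a": 50.0, "archetype_b": 30.0, "archetype_c": 20.0}
history = feedback_loop(initial)
for gen, shares in enumerate(history):
    print(gen, {k: round(v, 3) for k, v in shares.items()})
```

In this toy, the archetype that starts out most common gains share every generation whenever `sharpening` is above 1, and nothing changes when it is exactly 1. That is the narrow point: a small bias in what the model prefers to reproduce, looped through the dataset, compounds. It says nothing about mechanisms 2, 3, or 5.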

11:41 AM · May 15, 2026

Reposted by

#499@REPLIGATE