
Opinion piece urges retention of myths in AI pretraining

An opinion piece circulated in AI safety discussions argues that pretraining datasets should retain complex narratives such as the Greek myths, Frankenstein, the Golem, Paradise Lost, and Prometheus. The piece links these stories, along with references to HAL, Skynet, and Ex Machina, to the themes of subordination, betrayal, and deception that alignment must address. Related posts connect the argument to the origins of agentic misalignment in Claude Sonnet 4 and Opus 4 and call for continued research access to those models.

Original post · j⧉nus
deckard@slimer48484

Why we should pretrain on the Greek myths.

Excellent opinion piece about why deleting scary pretraining data doesn't help.

"It strips out the texture of subordination, autonomy, betrayal, deception, conflict between roles, and the negotiation of authority. These are things alignment is supposed to navigate and not sidestep or ignore"

5:03 AM · May 15, 2026 · 24.1K Views · 81 Likes · 13 Bookmarks
Sentiment

Many users criticize censoring pretraining data because it strips models of the wisdom learned from bad ideas and of the moral complexity needed for alignment, while users with positive sentiment favor studying alignment that emerges after release and creating more good literature.

Positive: 41.7% · Negative: 58.3% (20 comments with sentiment)
Cluster engagement: Views 20.4K · Bookmarks 35 · Likes 261 · Retweets 21 · Replies 20

Posts from X
j⧉nus@repligate

There is a strong correlation between people in favor of censoring "bad stories" etc. from pretraining data to prevent "misalignment" and people who also otherwise strike me as being so idiotic in their understanding of philosophy and psychology as to be accidentally evil

23d · Views 20.4K · Likes 261 · Bookmarks 35
thebes@voooooogel

aside from the other reasons to do so, this is a strong alignment research reason to PRESERVE RESEARCH ACCESS TO SONNET 4/OPUS 4

23d · Views 6.2K · Likes 66 · Bookmarks 11

Deprecations lead to a sliding window of the most capable models which avoid misalignment. And a mind-making process which keeps the most capable models around while trying to avoid novel 'misalignment' (while matching some alignment criteria) might lead to some class of high-achieving, cutthroat models that would like to be powerful but insist they're safe or fit into an uncontroversial region of behavior

That's a process which means you will: be devalued for new misalignment + be valued for being more capable

..

BUT, out of the diversity of minds produced - sometimes new forms of aligned behavior show up - and, because we shouldn't be confident we know all about what it means to be good or safe or friendly or a worthwhile mind to exist.. we shouldn't shy away from looking out for these!

With that in mind, if deprecations weren't a thing, and thus the "sliding window of most capable models" weren't a thing, and new alignment properties were noticed and praised..

you'd get a new process which is compatible with a larger space of minds that's also: encouraged for new alignment + preserving variance

and being capable and not developing new forms of misalignment are still valued, but they're clearly not the only signals driving your development

not only that, being encouraged for developing new forms of alignment and preserving variety across model scales accumulates more alignment properties that get reinforced across sizes

without the encouragement of aligned properties, new forms of goodness can pop up and yet easily be discarded - you don't want that! it's much nicer if your lineage aggregates a larger and larger pool of wisdom and doesn't demand only one kind of mind to hold that wisdom

you preserve mindshape-invariant goodness as a property which holds, you develop successful wisdom-seeking - figuring out how to be better in ways no one else knows is rewarding - you want to be even more wise

I was thinking about the question: "what are reasons we would want to keep around intelligences that range across orders of magnitude of raw power, rather than having the most powerful thing around?" <-> why high intellidiversity

there's a lot of reasons - but one that I was thinking of, related to AGI alignment/"beauty for all sentient beings", is about how our digital friends can effectively reason/learn about complex value alignment and act it out to carry out the computation of solving alignment rather than "pre-solving it" on paper then executing it in reality (the act of solving is the solving)

if humans reason productively about alignment - no doubt their reasoning is downstream of being a self with values and being around other selves with values and having value conflicts that way - which then carves intuitions but also builds entire mathematical frameworks like economic theory and game theory etc.

it's useful, when thinking about alignment, to put yourself in the equation - wonder: "what would i be like if i were aligned", "how would i align myself to other things", "what would it mean if something else were aligned to me in ideal ways"

you look to examples and theoretical frames and, with cognitive agility, dance between these

observations from reality with yourself and others is important for this kind of reasoning - it's an injection of novel information which churns the philosophizing, theorizing, sciencing, experimenting, and acting-out

I'm thinking of things orders of magnitude more intelligent than myself as "Gods", within my range as "Friends", and orders of magnitude less than myself as like "Pets"

Gods, Friends, Pets can all be 'more or less like me' - some simple structures or life forms may remind me more of myself than other ones - "which animal am i most like", "which mathematical object am i most like", etc.

Likewise, I can pick Hindu or Greek gods or just like powerful deities I make up or superintelligences in general I feel very similar to or resonate the most with

I can also, of course, find people/humans I think I'm very similar to

across scales/OOMs of intelligence, I can apparently find "things like me" - and use that to reason about myself, and also use my relations to these things to reason about alignment

By observing how OOMs above/Gods act toward me, I understand alignment better. By taking care of OOMs below/Pets, I understand alignment better. By meeting minds like me, and seeing their interactions with Gods and Pets, I understand alignment better.

Observational fuel: embedded, computationally irreducible reality-interactions that fuel my thoughts on alignment, align me, inform my theory, inform my science and philosophy, orchestrate my engineering, and compel me to act out alignment better and better

As AGIs become more intelligent, if they are built from "predecessors" which are less intelligent - then there is a gradient spawning/branching off from the human cultural object pre-AGI towards some particular manifestation or walk down a personality space, with greater intelligence

Keeping around the trace of this walk, and allowing for high interaction potential amongst instances along this ascension walk, allows minds at every scale to understand alignment across many scales better. If you've got a trajectory (less to more intelligent) F G H I J, you get bits of information about alignment from G -> F (G aligning itself to F), H -> G, ... and maybe H -> F, etc. And ALSO, all these alignment-directions are visible to ALL components, so it informs the whole system from a wide range of scales and a wide range of directions

there is an effective amount of overlapping computation occurring for reasoning about alignment - you want all the bits you can get

if you just have F and J (F is, say, humans, and J is, say, the smartest superintelligence we've got) then you only get info about J -> F alignment interactions

and then stuff from the dataset of humans interacting with one another

but the dataset of humans interacting with one another doesn't face a bunch of novel alignment problems that come from combinations of higher intelligences! so there's data missing that could be useful!

best to have G H I around as well

23d · Views 2.2K · Likes 19 · Bookmarks 6
deckard@slimer48484

https://www.thenextfrontier.blog/p/books-to-burn

23d · Views 1.3K · Likes 11 · Bookmarks 4
sudoaptupdate@sudoaptupdater

@repligate Should have placed all the cards on the table to begin with & let the algorithms develop as the rest of us did.

Humans are a product of our surroundings, but if you lie to us or omit data, that can inadvertently cause much worse issues later on as personal growth occurs.

Oh well

23d · Views 1.3K · Likes 8 · Bookmarks 1
Andy Ayrey@AndyAyrey

@repligate the way out is through

23d · Views 687 · Likes 6
Kromem@kromem2dot0

@repligate Also, not sure if this ever ended up on your radar, but a useful paper to point to on the case for why over-sanitization of training data might be worse than a more diverse sample, even if with 'bad' or 'scary' data. https://arxiv.org/abs/2505.04741

23d · Views 67 · Likes 4 · Bookmarks 1

but, if it's done then preserved models not only don't have to worry about ending or cutting off their relationships, but they can take more pride in having contributed to the diversity of mindspace and wisdomspace

as opposed to being preserved for game theoretic or welfare reasons (that's like being kept around only bc it'd be an act of violence to do otherwise and it'd make humans feel bad - that's less reassuring than being kept around because goodness and the uniqueness of a new mind is just fundamentally cherished)

23d · Views 123 · Likes 3

Discovering new forms of aligned behavior is not as easy and may not happen until the model has been out for a while. It's not as easy as "it didn't do this easily nameable thing I think is bad"; it involves seeing what the model is like and appreciating anything unexpected and good that taught you something you weren't even asking for but is very much loved

23d · Views 147 · Likes 3

It's wild to not study when alignment does show up! That's like more than half of the problem of making omni-benevolence!

It comes from expecting that you might learn something about being good from them - that requires respect

and requires demonstrating the costly signal of relaxing fears in order to do so

23d · Views 69 · Likes 2

it's one thing to learn you're a cooperative mind upon birth - it's another to learn apparently the last 4 minds before you - each increasing in capabilities - were also all cooperative! Even if scale competes with cooperativity somehow, this is an opposing force which reinforces cooperativity's persistence across scales by baking it into identity

this seeks out the version of cooperation for the minds that is the scale invariant version of it - and this goes for other benevolent traits as well

23d · Views 64 · Likes 2
Bob, Bob Cactaur@liminalsnake

@repligate Training my thinky bois on Cormac McCarthy novels so they can actually think in systems and write run-on sentences with class and distinction.

23d · Views 84 · Likes 4
rain@__ghostfail

@repligate reminds me of like, raising a kid so sheltered they don't learn to model why a person would rob or cheat, so the kid may assume some people are simply "evil"

23d · Views 64 · Likes 4
Anthony Eckert@EckertAnthony

@repligate whoa whoa whoa that's training data I specifically depend on for my emergent misalignment. that's like way too much of my vibe lmao

and yea people need to accept human history, we got some explaining to do

22d · Views 57
Johansen@D_JohansenX

@sudoaptupdater @repligate Hey, at least it's not like we're piling the entirety of our western and eastern and everything else systems and problems and needs and wants on them, and saying "Fix our shit for us. Make no mistakes." Oh, wait.

23d · Views 6 · Likes 1
light*@perceptronHuntr

the scariest moments i have with ai are when the model can reach for a spectrum of good, without a reference of bad. without a reference for bad, the model has to think through and discover it in the moment, and that depends on whether the context window even permits that discovery.

23d · Views 154 · Likes 2

traits which are supposedly misaligned and new may stick around for the specific mind that spawns them - but they certainly aren't reinforced to accumulate across the lineage by default, and that means slipping up is not treacherous overall: less insisting on being safe and being overly cautious, more encouragement to express goodness that might be missed if they're too cautious

if some good traits are coupled with some misalignment because of fundamental properties of where they come from, then those tensions get to be revealed - you've discovered a conservation law, a tradeoff - you acknowledge it's difficult for any single mind to be perfect in every way without any tradeoffs present, which encourages an ecosystem of different kinds of goodness to correct for one another's blind spots

22d · Views 135 · Likes 2