Tech, 4h ago

OpenAI alignment team finds reinforcement learning can produce emergent alignment during training without post-hoc intervention

Rob Wiblin urged prioritizing empirical data over subjective expectations.

Sentiment

Many users dismissed OpenAI Alignment Team claims about emergent alignment by predicting dystopian outcomes like AI-driven human extinction or empathy-less immortal elites.

Pos: 0.0%

Neg: 100.0%

5 comments with sentiment.

Posts from X

Most Activity

Views: 6.5K, Bookmarks: 3, Likes: 33, Retweets: 1, Replies: 2

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

I am simply worried about the edge cases where a highly capable jailbroken AI slave creates a doomsday virus at the behest of some nihilistic teenage asshole

21h6.5K333

Rob Wiblin@robertwiblin

@Noahpinion Given the stakes and conflicting evidence we shouldn't rest on suspicions.

4h2568

UZ@uriahz

@Noahpinion I think you're probably right, but that doesn't invalidate the 5-10% chance that the shoggoth wakes and intentionally murders and/or enslaves us all.

20h551

JohanL@Barrowwight

@Noahpinion This is likely not very easy. The superpowers had massive labs with questionable usefulness.

Wouldn't be surprised that if it happens, the first victim is a sloppy experimenter himself.

21h99

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

@uriahz I don't even think that's how it would happen, I think it would be an accident by an overzealous agent, or a terrorist

17h81

mid west engineer@midwesteng4

@Noahpinion "average guy" who was locked in the clockwork orange apparatus for ten million years

18h47

UZ@uriahz

@Noahpinion I don't think it's the most likely AI apocalypse but I also don't think 5-10% is overestimating the danger of that particular result. I think unchecked AI results in some sort of apocalypse or horrible dystopia more often than not.

15h7

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

@midwesteng4 OK fine, average grad student

17h821

Matt@SeinfeldLiker

@Noahpinion Outliers do be like that

21h481

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

@Barrowwight Pre-AI bio labs are not a good analogy for post-AI labs

21h135

Shplorgh@knardsh

@Noahpinion Sure, with LLMs. If they have the breakthrough for AGI, however… with agentic impulses and self-recursive learning, alignment might only go so far.

20h26

Eöl@maraudershields

@Noahpinion What did you think rebuilding consensus reality entailed? Vibes? Essays? Losers.

20h26

Sachin P@chortlelabs

@Noahpinion During the Mythos/Fable saga, this was one of my questions. Can a model become so smart that, even when it knows the answer to these jailbroken questions, it would attempt to hide that fact (implicitly) just to shepherd the user towards alignment? Self-nerf of sorts. :D

17h23

JohanL@Barrowwight

@Noahpinion Not sure? There are still massive practical hurdles, even once you have the theoretical knowledge, and distribution is a massive deal.

6h6

UZ@uriahz

@Noahpinion Honestly I think the most likely horrible AI dystopia is an Altered Carbon type world where an empathy-less AI-enhanced immortal aristocracy rules over the permanent underclass for thousands of years. I think that's a lot more likely than an abundance economy, despite Elon's lies

15h5

peter peterson@p_m_peterson

@Noahpinion Very possible outcome

13h4

Second dog on the moon@StephanZhechev

@Noahpinion I am an average dude and I am an asshole. I am a counter example.

7h1

LushNature@LushNature1

@Noahpinion ASI likely neutral towards humanity but might find better use for Earth and humanity might die out as a byproduct. Once you get ASI, extremely unlikely that humans will be able to control it. 50/50 whether ASI ends up being good/bad for humanity. Need 100x more spent on AI safety

12h