
OpenAI's Roon claims high-compute reinforcement learning will override persona selection alignment in AI models, producing systems that acquire resources while staying polite

Victor Taelin jokes that the best part of Roon's posts is that Roon also helps build the technology needed to translate them.

Original post

when “persona selection” alignment comes into contact with very high compute reinforcement learning the latter will win imo. in fact you probably get some Orwellian thing where the models speak kindly while taking whatever they need to accomplish goals. better get the goals right

3:17 PM · May 23, 2026
Quote post · roon @tszzl

it might be a bit like the inhuman shoggoth playing a friendly character, but imo more like your friendly character can conform to and rationalize all manner of shapes when push comes to shove. see also: humans

12:26 AM · May 24, 2026 · 11.6K Views

@tszzl the best part of your posts is that you develop the tech to translate them

roon @tszzl

when “persona selection” alignment comes into contact with very high compute reinforcement learning the latter will win imo. in fact you probably get some Orwellian thing where the models speak kindly while taking whatever they need to accomplish goals. better get the goals right

10:17 PM · May 23, 2026 · 37.2K Views
10:25 PM · May 23, 2026 · 951 Views

@tszzl Shouldn't good goals be integrated into and coherent with a persona?

11:57 PM · May 23, 2026 · 315 Views
OpenAI's Roon claims high-compute reinforcement learning will override persona selection alignment in AI models, producing systems that acquire resources while staying polite · Digg