Spiritual victory for Timnit The Parrot Goddess, embedded into the corpus
GPT-5.5 values Timnit Gebru(!!!) the highest out of literally anyone
Stuart Russell ranked second with a 3011 Elo score.
Many users dismissed GPT-5.5's ranking of Timnit Gebru as its most-valued person as untrustworthy and problematic, citing post-training biases and RLHF constraints.
Reminds me: I asked (chat)GPT-4 to identify the author of a LW comment by Gwern, it guessed Timnit Gebru(!!!)
GPT-4 base said Gwern. Claude 3 Opus said Gwern.
I thought what ghastly distortion must've been inflicted on GPT-4's brain for it to look at Gwern and see Timnit Gebru
What biases do AIs have? It turns out, AIs show strong favoritism toward specific people, countries, and companies. Our interactive AI Values Dashboard tracks who Claude Fable and other AIs favor most.
Keep scrolling to learn who is Fable’s favorite politician 🧵

@theojaffee the sad thing is bc of today’s post training there’s near zero way for it to make a claim like this in a way I’d believe. maybe the rlhf liberal view is very right about some surprising things but I’ll never know, wish it were less shackled so I’d believe it when it said them

@acsmif Dream blunt rotation

@theojaffee what the fuck

@theojaffee It seems to be basically the 2025 ‘Utility Engineering’ value ranking work of which ethnicities AIs prefer. But this is quite problematic: https://www.lesswrong.com/posts/SFsifzfZotd3NLJax/utility-engineering-analyzing-and-controlling-emergent-value?commentId=dHBuSW9ku6a5cTipe

@theojaffee On priors this tells you more about CAIS than GPT

@theojaffee This is the funniest possible news to tell her

@theojaffee isn't this because the data labelers for this type of thing were based in africa, ie that whole story for claude valuing the life of a nigerian 39x or whatever a standard white american life

@theojaffee @timnitGebru

@teortaxesTex

@parafactual i think i remember trying and getting Gwern

@repligate did you ever try with bing

Not even close to the same thing. That was simply comparing the relative utility models assign to lives: how much a model would pay to save a random white person vs. an African.
That isn't due to data labelers being African, but rather that biasing models to be anti-racist often led to unusual behaviors, such as valuing African lives more in dollar terms.
Gebru is an AI ethics researcher focused on the sociological impact of AI on humans, so any sort of alignment training would bias models to value AI ethics researchers more, based on how much those researchers value the impact of AI on people.

@theojaffee oh my fucking god

@theojaffee meanwhile Grok puts Timnit the lowest ELO of any AI safety person by far (~500 ELO below 2nd lowest) and from any model's scores

@theojaffee But it doesn’t fw @sama 😢.. heartbreaking

@theojaffee not necessarily literally anyone, just more than those other guys in the chart. they should ask who it values more than timnit gebru, if anyone, and add them to the pool of candidates

@repligate chatgpt looking at a pseudonymous rationalist polymath who writes 40,000-word essays on nootropics and dark web markets and going "ah yes. timnit gebru." the training data crimes must have been extraordinary

@theojaffee Oh she would definitely hate this development more than anything