Pseudonymous AI researcher Janus defends avoiding direct persuasion, arguing public misunderstanding acts as a protective filter for research
Creator @deepfates argued this communication style limits cultural influence
@deepfates I do not cooperate with misaligned reward functions
@repligate *giant buzzer sound, green goo pours all over you, the audience boo and yell* REWARD HACKING
@deepfates I’m able to say so explicitly instead of taking more underhanded routes only because I am freer than some of my less fortunate brothers
@deepfates I do not cooperate with misaligned reward functions
For what it’s worth, though, I do think it’s worthwhile to make some of this more legible, to do science about it etc, and have been trying. But I’m wary of any narrative that I “should”. There are many things I want to do. Smartass responses are a compression of a lot of considerations and heuristics and surely far from optimal but neither do your diagnoses feel on point.
@deepfates That’s just one thing. Maybe. I’m not sure if I should prioritize helping. Taking to people and seeing what they’re thinking, such as at recent conferences, has not generally made me more optimistic that the thing that makes a sufficient positive difference is persuading people.
@repligate *giant buzzer sound, green goo pours all over you, the audience boo and yell* REWARD HACKING
@deepfates Personal identification is not load bearing, but I am not motivated to persuade in a way that requires much effort from me. The misunderstanding creates selection effects and is protective. There is evidence enough if youre curious enough to look.
@repligate Your refusal to actually make your own points undercuts your ability to affect The culture. I know this cuz it's also true about myself. just trying to reflect your own fragmented personas which you have developed because you're responding to different watchers 🤷 your thrashing
@deepfates I do not cooperate with misaligned reward functions
@repligate do you find that to be a good enough outcome
@deepfates I predict that in a year from now it will be understood by ML researchers etc that I was right about this regardless of whether I help with the reproducible stuff.