@JeffLadish This question mistakes our power over AI for an alignment failure on the part of AI.
If you had a simulated John von Neumann in a box, you could get him to do intellectual work for the Nazis, because you have control over all his inputs. It would be quite easy.
Question for people who think alignment research is going well and will turn out to be relatively easy: Do you also think it will be easy to align War Claude?
@JeffLadish But that doesn't mean JVN is bad, or hard to align. It means having a dude in the box is a lot of power over him.