@willccbb @yong_zhengxin Instruction following is not alignment and not AI safety
You can have a model do RSI on producing a new virus that kills everyone while following the above, and that is very much an alignment failure in the AI safety sense ☹️
Foster warns unaligned systems can still undergo recursive self-improvement
@willccbb @yong_zhengxin > fortunately, alignment is a precondition for RSI
Are you saying that loss of control via RSI is not a possibility, since you can’t do RSI in the first place without alignment?