1d ago
Alignment Tampering Vulnerability Exploits RLHF To Amplify Biases
Sentiment: 100% positive, 0% negative (1 comment with sentiment)
Users thank collaborators for advice on identifying the alignment tampering vulnerability in RLHF.