Jonas Geiping's updated Claudini benchmark finds Kimi-2.6 outperforms Opus 4.6 and GPT-5.5 at autonomously refining AI jailbreaks · Digg