@jonasgeiping I can't tell if this is a play on Houdini or Carlini
We recently updated Claudini (our autoresearch test where agents autonomously improve jailbreak algorithms). No fable results for now (...), but surprisingly, Kimi 2.6 has entirely caught up, surpassing Opus 4.6 on this task. Kimi 2.6 is quite a strong and persistent attacker.
(more details below)

