It's interesting to see that GPT-5.5 seems to do more "hacking per dollar" than Mythos, but Mythos does more "hacking" overall.
Very important update from UK AISI. This is a meaningful change from the previous report. Here’s what the new data would look like for “Mythos Preview (new)” with $ on the x-axis:
It's interesting to see that GPT-5.5 seems to do more "hacking per dollar" than Mythos, but Mythos does more "hacking per token".