Tech · 7h ago

Open Models Match Opus on CTF Cyber Challenges

Original post

Looks like some folks tested GLM 5.2 on CTF challenges and found performance roughly on par with Opus 4.7. It's unclear whether this would generalize to the AISI evals. If the result is reliable, I think dedicated RL/TTT on GLM could probably reach prompt-only Mythos levels within weeks to months.

Peter Henderson@PeterHndrsn

It's a bit odd to me that in a lot of the cyber risk frontier evals out there, open models are not reported. I really want to know where GLM 5.2/Kimi 2.7 sit on this AISI eval. What's the true marginal cyber risk of Mythos/Fable/GPT5 over open models?