1d ago

Philo Groves highlights CyberGym benchmark data showing OpenAI's GPT-5.5 scored 81% while default safety filters limited Claude to 0.9%

Strict safety alignment renders Claude largely ineffective for cyber tasks


Original post

#495@STEIPETEOP

Philo Groves@PHILOGROVES

hahaha 0.9% on cybergym with safeguards enabled (default), if you are working in cyber and using claude, anthropic just gave you the finger. GPT 5.5 scores 81% and growing.

10:13 AM · May 28, 2026

QUOTE POST

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

I respect that they are so committed to the bit. Mythos is the God of Cyberwar, but you, chud, can't be trusted with it (not yet, at least. Maybe after the Colossus deployment). And Opus will be the babby of cyberwar. They are willing to lose some customers to OAI here.

Philo Groves@PhiloGroves

hahaha 0.9% on cybergym with safeguards enabled (default), if you are working in cyber and using claude, anthropic just gave you the finger. GPT 5.5 scores 81% and growing.

5:13 PM · May 28, 2026 · 45.3K Views

9:41 PM · May 28, 2026 · 3.4K Views

