Claude 4.8 Opus smashes GPT-5.5 and is new SOTA on GBA Eval
On GBA Eval, models act as coding agents and must build a working Game Boy Advance emulator from scratch within 24 hours.
Opus 4.8 also progresses much faster than GPT-5.5 on this eval.
Users react to Claude 4.8 Opus leading GPT-5.5 on a coding benchmark, with fans celebrating its performance and ongoing rivalry while critics call the test meaningless for real applications.