GPT-5.5 leads AI models in Mechanize emulator test
Mechanize tasked frontier AI coding agents with building a complete Game Boy Advance emulator from scratch within a 24-hour window. The company published side-by-side results showing gameplay footage from each generated emulator next to a reference implementation. GPT-5.5 produced the strongest emulator, running multiple games successfully. Claude Sonnet 4.6 and Opus 4.7 performed nearly as well, while Gemini 3.1 Pro failed to deliver a functional version.
These guys will crack your ProgramBench
We gave frontier AI coding agents 24 hours to write a complete Game Boy Advance emulator from scratch. GPT-5.5's emulator runs games best, with Claude Sonnet 4.6 and Opus 4.7 close behind. Gemini 3.1 Pro failed to produce a working emulator.
@teortaxesTex there's this project called ScratchAnywhere that's basically a C implementation of the Scratch runtime. I wonder how far proxy rewards or rejection sampling can go for stuff like game engines in an autoresearch-esque context
@teortaxesTex I'm thinking like, lowest MSE mismatch drift on video frames, on mel-spectrogram audio frames, etc., as a proxy for game-logic accuracy might be strangely robust for the general case
@teortaxesTex (assuming frame state atomicity / accuracy is a variable you can control for independently of runtime speed)
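The frame-MSE proxy floated in the replies can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. It assumes both emulators are driven by an identical input replay and emit frames in lockstep, which is the "frame state atomicity" caveat. It also assumes each frame is a 2-D grid of numbers: grayscale pixels for video, or mel bins by time steps for audio. The names `frame_mse`, `mismatch_drift`, `proxy_reward`, and `accept_candidates` are hypothetical. They are not taken from ScratchAnywhere, Mechanize's harness, or any other project mentioned here. The accept/reject filter is only a simple threshold in the spirit of rejection sampling, not a full sampler.

```python
from typing import Sequence

# Hypothetical representation: one frame is a 2-D grid of numbers
# (grayscale pixels for video, or mel bins x time steps for audio).
Frame = Sequence[Sequence[float]]
Run = Sequence[Frame]


def frame_mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two frames of identical shape."""
    if len(a) != len(b):
        raise ValueError("frames differ in height")
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            raise ValueError("frames differ in width")
        for pa, pb in zip(row_a, row_b):
            total += (pa - pb) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    return total / count


def mismatch_drift(candidate: Run, reference: Run) -> list[float]:
    """Per-frame MSE over a replay.

    Assumes both runs were fed the same inputs and produce frames in
    lockstep, so frame timing is independent of emulation speed.
    """
    if len(candidate) != len(reference):
        raise ValueError("runs must contain the same number of frames")
    return [frame_mse(c, r) for c, r in zip(candidate, reference)]


def proxy_reward(candidate: Run, reference: Run) -> float:
    """Higher is better: the negative mean per-frame MSE."""
    drift = mismatch_drift(candidate, reference)
    return -sum(drift) / len(drift)


def accept_candidates(candidates: Sequence[Run], reference: Run,
                      max_frame_mse: float) -> list[int]:
    """Simple accept/reject filter: keep the indices of candidates whose
    worst single frame stays within the MSE threshold."""
    return [i for i, run in enumerate(candidates)
            if max(mismatch_drift(run, reference)) <= max_frame_mse]
```

Here is a small example. Take a two-frame 2x2 reference. A candidate that is off by one unit in a single pixel of the second frame gets a drift of `[0.0, 0.25]` and a reward of `-0.125`. Using the worst frame rather than the mean for acceptance is a deliberate choice. A single badly desynced frame can signal broken game logic even when the average looks fine.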
Now emulate Switch 2