I was skeptical about the multi-model routing. Seems my hunch was right.
I tried this so you don't have to.
I know this is going to absolutely shock you, but no, this does not match the performance of Mythos.
A few early thoughts:
1. The limits are pretty bad. I used 100% of my 5-hour usage in less than 1 prompt.
2. I specifically gave it a three.js task because it's an area where SOTA models have made big strides and other models just aren't great.
I asked it to build a replica of Rocket League. I'll put the prompt in the comments.
The game was pretty bad and notably worse than GPT 5.5.
Even after multiple fixes, it took 7-8 rounds of back and forth with Codex just to get it to an almost playable condition. Prior to these fixes, the game was not playable.
Maybe it's really strong in other disciplines. I'd love to test that but I hit my limit in 1 prompt lol.
GPT 5.5 by contrast did a pretty good job and required no follow ups. Fable would have absolutely nailed this as well.
But yeah, early impressions...not great. But I hope I'm wrong. More testing tomorrow.















