Composite table for the four benchmarks where Qwen has shown both 3.6-Max (Preview) and 3.7-Max. The progress is not exactly dramatic, but it is significant for 1 month. …Except NL2Repo. Is this real? They claim to have matched Opus in the one thing Opus is hyped for.
@teortaxesTex @Elaina43114880 Am I dreaming? 60+ on SWE-Bench Pro?