T3 Stack creator Theo Browne challenges Code Arena Frontend benchmark validity after Alibaba's Qwen3.7-Max ranked fourth

The model reportedly matches Claude Opus 4.6 on agentic tasks.

Original post

Theo - t3.gg (@theo): Since we're talking about good code benches today, here's a shitty one for reference

@theo are you trying to say muse spark isn’t better than 5.5?

are you feeling ok?

11:02 PM · May 26, 2026 · 40.3K Views

11:08 PM · May 26, 2026 · 1.6K Views
