This is the first "AI company" product I've seen that doesn't feel like pure cosplay.
Two interesting points:
Matrix takes the company idea seriously. You are not just spinning up agents and hoping they coordinate.
Matrix also beat both Codex and Claude Code on GDPval-Bench, scoring 95.45% against their 84.9% and 80.3% respectively.
That gap seems to matter most on longer tasks, where planning and coordination, rather than raw model capability, actually decide the outcome.
Which is maybe the point. A lot of "AI companies" are really just prompt orchestrators with a nice UI. Matrix looks like it's building something closer to an actual operating layer. Whether that holds up beyond benchmarks, I don't know yet. But it really makes me want to find out.