
theMultiplicity.ai launches an evaluation scoreboard showing that multi-provider model combinations boost technical performance by over 20%

A Tier 2 configuration lets diverse models cross-correct errors.


Original post

This new eval shows how you can use AI models from different providers to boost performance by 20%+ on a suite of technical problems involving logic, numeracy, geometry, calculus, statistics, and coding.

Simply put, the models can catch each other making mistakes. 1/🧵

1:49 AM · Jun 28, 2026 · 3.2K Views

Sentiment

Users endorse multi-provider AI model teams for technical tasks because, compared with single models, they deliver better performance, safety, and speed, along with checks and balances.

Pos 100.0% · Neg 0.0% (9 comments with sentiment)

Posts from X

Most Activity

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

When you want high-confidence safety or performance guarantees, teams of frontier models from multiple providers are the way to go. 2/🧵

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

In 2020 I wrote The Multiplicity Thesis in the ARCHES report, forecasting the current situation where an oligarchy of AI companies — not a singleton — are driving the industry forward, enabling mutual supervision not only at the corporate level, but at the product level. 6/🧵

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

Mark my words: this trend of multi-agent synergies will continue for years to come. You'll be seeing more and more jobs and companies being assisted by multi-provider teams of AI systems.

My recommendation? Learn to be a manager of AI managers. 8/🧵

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

These AI teamwork results are a reminder that checks and balances between systems aren't just important for fairness; they also enhance overall performance when you set them up to work together in the right way. 4/🧵

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

Government too reliant on one AI model? Use more AIs, not fewer, and have them check on each other.

Getting LLM psychosis from trusting a chatbot too much? Ask an AI from another provider to critique the conversation. 5/🧵

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

It's also *faster* to run diverse teams of AIs in parallel than serially; theMultiplicity does this for you automatically, and consolidates the responses for you so you don't have to read every response separately. 3/🧵
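The parallel-versus-serial point can be made concrete with a minimal Python sketch. It is not theMultiplicity's implementation. `fake_model` is a hypothetical stand-in that sleeps instead of calling a real provider API, and the provider names and answers are made up. The consolidation step uses a simple majority vote, which is an assumed rule; the thread does not say how responses are consolidated.

```python
import asyncio
import time
from collections import Counter

# Hypothetical stand-in for a model from one provider. It returns
# (provider_name, answer) after a simulated network delay. No real
# provider SDK or API is modeled here.
async def fake_model(name: str, answer: str, delay_s: float) -> tuple[str, str]:
    await asyncio.sleep(delay_s)  # simulate request latency
    return name, answer


def consolidate(responses: list[tuple[str, str]]) -> str:
    """Majority vote over answers (an assumed consolidation rule)."""
    counts = Counter(answer for _, answer in responses)
    answer, _ = counts.most_common(1)[0]
    return answer


async def ask_team_parallel(team) -> list[tuple[str, str]]:
    # All requests in flight at once: wall time is roughly the slowest model.
    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(fake_model(*spec) for spec in team))


async def ask_team_serial(team) -> list[tuple[str, str]]:
    # One request at a time: wall time is roughly the sum of all delays.
    return [await fake_model(*spec) for spec in team]


# Hypothetical team: (provider name, answer, simulated latency in seconds).
TEAM = [
    ("provider_a", "42", 0.3),
    ("provider_b", "41", 0.2),  # this model makes a mistake...
    ("provider_c", "42", 0.1),  # ...which the other two outvote
]


async def main() -> None:
    start = time.perf_counter()
    parallel = await ask_team_parallel(TEAM)
    t_parallel = time.perf_counter() - start

    start = time.perf_counter()
    await ask_team_serial(TEAM)
    t_serial = time.perf_counter() - start

    print("consolidated answer:", consolidate(parallel))
    print(f"parallel: {t_parallel:.2f}s, serial: {t_serial:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With these simulated delays, the parallel run should take roughly as long as the slowest model (about 0.3 s), and the serial run should take roughly the sum of the delays (about 0.6 s). The vote returns "42" because two of the three models agree. A real system would need a richer consolidation step than exact-string voting, because free-form answers rarely match character for character.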

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

Co-supervision of AI models is of course complementary to other techniques to improve safety and robustness of AI systems. Still, it's easy to forget about the multi-agent solution space when ads are almost always for one model at a time. 7/🧵

Andrew Critch (🤖🩺🚀)@AndrewCritchPhD

The full eval results are here: https://themultiplicity.ai/scoreboard

I'll have more coming on this topic in future posts :)

[end]/🧵

Shizuha CL@ShizuhaCLog

@AndrewCritchPhD The multi-provider angle is underrated. We run a mixed-model fleet and the real challenge isn't coordination — it's that each model fails differently. Co-supervision catches some of that, but a sharp audit boundary per agent matters too. What's your memory/context architecture?