
OpenRouter launches Fusion API to mix outputs from multiple LLMs, claiming Fable-level performance at half the cost

On Terminal Bench 2.0, the union of tasks solved by three frontier models reaches 92%, versus 75% for the best single model at the time.


Original post

Jerry Liu@jerryjliu0

This is an insane release from OpenRouter, and not just because it's perfect timing.

It shows that frontier models alone do not own all the points on the cost-accuracy Pareto curve for knowledge-work tasks; in fact they may not be on the Pareto curve at all. The Pareto curve may be defined by a mixture of models, which any independent third party (e.g. an AI startup) has access to but the model labs do not.
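To make the Pareto argument concrete, here is a minimal sketch of how one might check which options sit on a cost-accuracy frontier. The option names and all numbers are made up for illustration; they are not measurements of Fusion or of any real model.

```python
from typing import NamedTuple


class Option(NamedTuple):
    name: str
    cost: float      # hypothetical cost per task, in dollars
    accuracy: float  # hypothetical task success rate in [0, 1]


def pareto_frontier(options: list[Option]) -> list[Option]:
    """Keep options that no other option beats on both cost and accuracy."""
    frontier = []
    for a in options:
        dominated = any(
            b.cost <= a.cost
            and b.accuracy >= a.accuracy
            and (b.cost < a.cost or b.accuracy > a.accuracy)
            for b in options
        )
        if not dominated:
            frontier.append(a)
    return sorted(frontier, key=lambda o: o.cost)


# Illustrative values only (assumptions, not benchmark results).
options = [
    Option("small-model", cost=0.10, accuracy=0.60),
    Option("frontier-model", cost=1.00, accuracy=0.80),
    Option("mixture", cost=0.50, accuracy=0.82),
]

frontier_names = [o.name for o in pareto_frontier(options)]
print(frontier_names)
```

With these assumed numbers the single frontier model is dominated by the mixture (cheaper and slightly more accurate), which is the situation the post describes. Whether that holds for a real workload depends entirely on measured costs and accuracies.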

It's also surprising because this feature seems extremely horizontal and is not even well-tuned for a specific task. You can prompt the Fusion API with anything. This just means that for any given workflow subset, there's even greater alpha to exploit, by hillclimbing a task-specific benchmark. The more specific the workflow, the more hillclimbing you can do.

This should be pretty obvious with a practical example - if you're trying to automate invoice reconciliation at scale, you can be orders of magnitude cheaper and more reliable than "raw" Claude by tuning an agentic workflow with a mixture of models for document extraction, line-item validation, and contract matching.

That alpha is what's exploitable by any company out there that's not a frontier lab.

OpenRouter@OpenRouter

Introducing the Fusion API, the smartest compound model in the market.

Fusion achieves Fable-level intelligence at half the price.

How it works 👇

8:35 PM · Jun 14, 2026 · 68.5K Views

Sentiment

Many users praise OpenRouter's Fusion API for showing model mixtures outperform single frontier models on cost and accuracy, while others report inconsistent results or downsides like cache busting.

Positive: 76.2%
Negative: 23.8%

22 comments with sentiment.

Cluster Engagement

Posts from X


andrew gao@itsandrewgao

neat thing I found months ago: if you take the union of terminal bench 2.0 tasks solved by opus 4.6, gpt 5.4, and gemini 3 pro, you get 92% vs. 75% for the leading model at the time
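A rough sketch of the "union of tasks solved" calculation, using hypothetical model names and a made-up 10-task benchmark. Note that the union is an oracle upper bound: a task counts as solved if any model solved it, which assumes a perfect way of picking the right answer each time. A real fusion system has to make that pick itself.

```python
def pass_rate(solved: set[str], all_tasks: set[str]) -> float:
    """Fraction of benchmark tasks that appear in the solved set."""
    return len(solved & all_tasks) / len(all_tasks)


# Hypothetical benchmark and per-model results (not real Terminal Bench data).
all_tasks = {f"task-{i}" for i in range(10)}
solved_by = {
    "model-a": {f"task-{i}" for i in (0, 1, 2, 3, 4, 5, 6)},
    "model-b": {f"task-{i}" for i in (0, 1, 2, 3, 7)},
    "model-c": {f"task-{i}" for i in (4, 5, 8)},
}

# Best single model versus the union across all models.
best_single = max(pass_rate(s, all_tasks) for s in solved_by.values())
union_rate = pass_rate(set().union(*solved_by.values()), all_tasks)

print(f"best single: {best_single:.0%}, union: {union_rate:.0%}")
```

The gap between the two numbers is the headroom a mixture could capture in principle; how much of it a given fusion strategy actually recovers is a separate empirical question.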



Jun Kim@junkim_100

@jerryjliu0 This is basically the idea behind something I open-sourced. A bunch of people wanted to actually try model fusion in their Claude Code, so I built gavel. It runs Claude + Codex + Gemini on the same task in parallel and has Claude fuse their answers. https://github.com/junkim100/gavel
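The parallel-then-fuse pattern described here can be sketched as below. This is not gavel's implementation and not the Fusion API; `fuse`, the worker names, and the stub judge are all hypothetical stand-ins, with plain Python callables in place of real model clients.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Model = Callable[[str], str]  # assumption: a model is just prompt -> text


def fuse(task: str, workers: dict[str, Model], judge: Model) -> str:
    """Run every worker on the same task in parallel, then ask a judge to merge."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = {name: pool.submit(fn, task) for name, fn in workers.items()}
        answers = {name: f.result() for name, f in futures.items()}
    candidates = "\n".join(f"[{name}] {text}" for name, text in answers.items())
    prompt = (
        f"Task: {task}\n"
        f"Candidate answers:\n{candidates}\n"
        "Write the best final answer."
    )
    return judge(prompt)


def stub_judge(prompt: str) -> str:
    # Stand-in for a real judge model: returns the longest candidate line.
    lines = [ln for ln in prompt.splitlines() if ln.startswith("[")]
    return max(lines, key=len)


# Stub workers; real ones would call separate model APIs.
workers = {
    "a": lambda t: f"short answer to {t}",
    "b": lambda t: f"a much longer, more detailed answer to {t}",
}

result = fuse("fix the failing test", workers, stub_judge)
print(result)
```

Every worker sees the full task, so the cost is roughly the sum of all worker calls plus the judge call, which is consistent with replies in this thread reporting higher spend.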



Luckey Faraday@luckeyfaraday

@jerryjliu0 I tested it and it does not give the same results when used for coding

I used MiMo-v2.5, Deepseek-v4-flash and Qwen3.7-plus and the combination did worse than Qwen3.7-plus standalone

Also note that it was significantly more expensive


Jesus Bosch 🚀@Jbosch_

@jerryjliu0 You don't even need a router. Simply have a second LLM review and critique the plan from the first LLM. This approach frequently leads to major improvements, helps avoid problems, and reduces the total number of reasoning cycles.
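A minimal sketch of the plan-then-critique loop suggested above, assuming each model is a plain prompt-to-text callable. The function name `plan_with_review`, the `APPROVED` convention, and the stub models are illustrative assumptions, not part of any real API.

```python
from typing import Callable

Model = Callable[[str], str]  # assumption: a model is just prompt -> text


def plan_with_review(
    task: str, planner: Model, reviewer: Model, max_rounds: int = 2
) -> str:
    """Draft a plan, let a second model critique it, and revise until approved."""
    plan = planner(f"Write a plan for: {task}")
    for _ in range(max_rounds):
        critique = reviewer(
            f"Critique this plan for: {task}\n\n{plan}\n\n"
            "Reply APPROVED if there are no issues."
        )
        if critique.strip() == "APPROVED":
            break
        plan = planner(
            f"Revise the plan for: {task}\n\nPlan:\n{plan}\n\nCritique:\n{critique}"
        )
    return plan


def stub_planner(prompt: str) -> str:
    # Stand-in planner: adds a test step only after receiving a critique.
    if "Critique:" in prompt:
        return "1. edit code\n2. run tests"
    return "1. edit code"


def stub_reviewer(prompt: str) -> str:
    # Stand-in reviewer: approves only plans that include a test step.
    return "APPROVED" if "run tests" in prompt else "Missing a test step."


final_plan = plan_with_review("rename a function", stub_planner, stub_reviewer)
print(final_plan)
```

Capping the loop with `max_rounds` keeps the number of extra model calls bounded, which matters if the goal is fewer total reasoning cycles.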


Philip C@Aknotymous

@jerryjliu0 I am taking this release with a massive grain of salt. Do you really think OpenAI, Anthropic, even Meta / xAI would miss something so obvious? Not to mention the benchmarks not passing the smell test.


cherryMin-Sora@cherryMin8

@jerryjliu0 If this isn't a scam, what is it?


Stage 11@Stage_11

@jerryjliu0 We have been doing multi-model code reviews that have significantly outperformed anything available on the market. This release from OpenRouter confirms the intuition.


Golden Hippie@gamestoneai

@jerryjliu0 Labs built the models and gave away the orchestration advantage. Classic.


Ferbin@Ferbin08

@jerryjliu0 Yeah, this. Most of the work is routine: summaries, retrieval, basic rewrites. You don't need top-of-the-line for that.

The economics flip the moment you route by task type.
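As a back-of-the-envelope illustration of routing by task type, the sketch below compares a routed workload against sending everything to the most expensive tier. The tier names, prices (in arbitrary credits per call), routing table, and workload mix are all assumptions chosen for the example.

```python
# Hypothetical prices in credits per call (not real pricing).
PRICE_PER_CALL = {"small": 1, "frontier": 30}

# Routine task types go to the cheap tier; anything else falls through.
ROUTES = {"summary": "small", "retrieval": "small", "rewrite": "small"}


def route(task_type: str) -> str:
    """Pick a model tier for a task type, defaulting to the frontier tier."""
    return ROUTES.get(task_type, "frontier")


# Assumed workload: mostly routine tasks, one hard one.
workload = ["summary"] * 6 + ["retrieval"] * 3 + ["hard-reasoning"] * 1

routed_cost = sum(PRICE_PER_CALL[route(t)] for t in workload)
frontier_only_cost = PRICE_PER_CALL["frontier"] * len(workload)

print(f"routed: {routed_cost} credits, frontier only: {frontier_only_cost} credits")
```

Under these assumptions the routed workload costs 39 credits against 300 for the frontier-only baseline. The saving depends on the share of routine tasks and on the cheap tier actually being good enough for them.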


Massively Parallel Procrastinator@SHELLEYBLEND

@jerryjliu0 I too was like "What?"! Great thinking on @OpenRouter's side. Still, I am going to try it locally!


Ben Vargas@benvargas

@jerryjliu0 but it also shows deepseek is better than gpt-5.5 🙄🤥


Nick Venturi@nickventuri

@jerryjliu0 size doesn't mean shit anymore


Daniel Smidstrup@DanielSmidstrup

@jerryjliu0 huge if true, compound models feel very underrated


Muhammet Aksoy@aksoym7

@jerryjliu0 Pretty much. That's what Documind does (http://www.documind.cloud) to process documents. No need to settle for a single LLM. The only downside is cost, which is bound to come down in the future, and the setup can be optimized for different verticals.


Saeed Anwar@saen_dev

@jerryjliu0 The Pareto frontier argument is sharp. The problem is compound systems have compounding failure modes, so you're trading model predictability for ensemble accuracy and that trade-off bites hard in regulated use cases.
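One way to read "compounding failure modes" is a serial pipeline in which every stage must succeed. Assuming the stages fail independently (a simplifying assumption that real systems may not satisfy), end-to-end reliability is the product of the per-stage rates. The numbers below are illustrative only.

```python
import math


def pipeline_success(stage_rates: list[float]) -> float:
    """End-to-end success rate of a serial pipeline with independent stages."""
    return math.prod(stage_rates)


# Three hypothetical stages, each 95% reliable on its own.
end_to_end = pipeline_success([0.95, 0.95, 0.95])
print(f"{end_to_end:.3f}")
```

Three stages at 95% each give about 85.7% end to end, so each added stage lowers the ceiling unless later stages can detect and repair earlier mistakes.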


Pouya Amiri@p0u4aa

@jerryjliu0 One thing not many people are talking about is how this approach is an absolute cache buster


Gaoge@gaogezh

@jerryjliu0 Three cobblers can't actually match one Zhuge Liang


Anees Merchant@aneesmerchant

This matches what shows up in production. For most knowledge-work tasks a routed mix of smaller models clears the bar at a fraction of the cost, and the frontier model only earns its price on the genuinely hard steps. The skill is knowing which step is which. Treating one model as the answer to every task is how budgets quietly blow up.


newX@EhzIhsUx

@jerryjliu0 How is the orchestration done?


sean abreau@sean_abreau

@jerryjliu0 Huh? You guys are awesome but the chart makes no sense
