to be clear, this is a closed source orchestrator on top of closed source models. if before you didn't control the models, now you don't even control which ones are used or how much. this is not "AI sovereignty"
i've also read the tech report to get an opinion on the technical stuff:
fugu (not the ultra version) is basically a classifier that selects, at each turn, which model is most likely to answer correctly (in other words, a router). this leads to a 10-point drop on SWE-Bench Pro compared to opus, and gets only very slight gains on other benchmarks. the argument could be that it reduces cost, but there is no information about this, so it's likely the opposite. they also have an autoresearch benchmark where they compare against frontier models "Model A, B and C", and it's really crazy not to be transparent about which models you compare against. it also probably doesn't support adding a new llm out of the box, since you'd need to retrain the classifier
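to make the "it's a router" point concrete, here is a minimal sketch of per-turn routing. everything in it is an assumption for illustration: the names `route` and `scorers`, the model names, and the keyword-based scores are made up, and a real router would use a trained classifier. this is not the implementation from the tech report.

```python
from typing import Callable, Dict

# hypothetical: each scorer estimates P(model answers correctly | prompt)
Scorer = Callable[[str], float]


def route(prompt: str, scorers: Dict[str, Scorer]) -> str:
    """Return the name of the model with the highest predicted success score."""
    return max(scorers, key=lambda name: scorers[name](prompt))


# toy stand-ins for a learned classifier (assumed, not from the report)
scorers: Dict[str, Scorer] = {
    "model_a": lambda p: 0.9 if "code" in p else 0.3,
    "model_b": lambda p: 0.8 if "math" in p else 0.4,
}

choice_code = route("fix this code", scorers)
choice_math = route("solve this math problem", scorers)
```

in this toy version adding a model is just one more dictionary entry, but with a single trained classifier that has one output per model, a new model means a new output and new training data, which is the retraining concern above.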
about fugu ultra: this is basically an advanced plan mode and orchestrator. it's a model that, for a query, outputs a plan with multiple "workflows". my understanding of workflows is that they say: "spawn model A subagents to achieve this, then use model B to judge it, then summarize this with model C", which is just a test-time compute scaling strategy. i think this is an okish way to do it, but it's limited by the fact that they need to predict everything before the agents start working, which is why they limit this to 5 steps. imo you need to predict what to spawn at t+1 with the information you get at t, not with the info you get at t=0. there are also other issues, such as the fable 5 score on terminal bench being wrong, and them being super vague and unclear about which models are in the LLM pool (they only mention closed source api ones)
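the t=0 versus t+1 distinction can be sketched like this. this is a toy under stated assumptions: `run_static`, `run_adaptive`, `plan`, `next_step`, and `act` are hypothetical names, the "agents" are plain string functions, and only the 5-step cap comes from the description above.

```python
from typing import Callable, List, Optional

MAX_STEPS = 5  # the 5-step cap mentioned above


def run_static(query: str,
               plan: Callable[[str], List[str]],
               act: Callable[[str, str], str]) -> List[str]:
    """Plan once at t=0, then execute every step without revising the plan."""
    steps = plan(query)[:MAX_STEPS]
    state, trace = query, []
    for step in steps:
        state = act(step, state)
        trace.append(step)
    return trace


def run_adaptive(query: str,
                 next_step: Callable[[str], Optional[str]],
                 act: Callable[[str, str], str]) -> List[str]:
    """Pick the step for t+1 from the state observed at t; stop when done."""
    state, trace = query, []
    for _ in range(MAX_STEPS):
        step = next_step(state)
        if step is None:
            break
        state = act(step, state)
        trace.append(step)
    return trace


# toy agent: running a step just records it in the state (assumed behaviour)
def act(step: str, state: str) -> str:
    return f"{state}|{step}d"


# static planner: always emits the same three steps, whatever happens later
def plan(query: str) -> List[str]:
    return ["solve", "judge", "summarize"]


# adaptive policy: decides from the current state and stops after judging
def next_step(state: str) -> Optional[str]:
    if "solved" not in state:
        return "solve"
    if "judged" not in state:
        return "judge"
    return None


static_trace = run_static("q", plan, act)
adaptive_trace = run_adaptive("q", next_step, act)
```

the static run always pays for all planned steps, while the adaptive run can stop or change course once it sees intermediate results.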
the biggest and most obvious issue is that they are introducing a "test time scaling" method with "best of N" over models, and they literally NEVER REPORT the number of output tokens or cost to achieve a benchmark/task
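for reference, the accounting that is missing is not hard to do. here is a rough sketch, assuming made-up per-million-token prices and token counts (none of these numbers come from the report, and `Call`, `PRICES`, `cost_usd` are hypothetical names), of how a best-of-N run plus a judge call compares to a single call:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Call:
    model: str
    input_tokens: int
    output_tokens: int


# hypothetical prices in USD per million tokens: (input, output)
PRICES: Dict[str, Tuple[float, float]] = {
    "model_a": (3.0, 15.0),
    "model_b": (1.0, 5.0),
}


def cost_usd(calls: List[Call]) -> float:
    """Sum input and output token costs over every call made for one task."""
    total = 0.0
    for c in calls:
        price_in, price_out = PRICES[c.model]
        total += (c.input_tokens * price_in + c.output_tokens * price_out) / 1_000_000
    return total


def output_tokens(calls: List[Call]) -> int:
    """Total output tokens generated for one task."""
    return sum(c.output_tokens for c in calls)


# one direct call vs. three candidates from model_a judged by model_b
single = [Call("model_a", 10_000, 2_000)]
best_of_3 = [Call("model_a", 10_000, 2_000) for _ in range(3)]
best_of_3.append(Call("model_b", 16_000, 500))

single_cost = cost_usd(single)
best_of_3_cost = cost_usd(best_of_3)
```

reporting these two numbers next to each benchmark score is what would make a test-time scaling result comparable to a single-model baseline.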
the right comparison here is not with opus, but with opus with ultracode/workflows enabled; not with kimi, but with kimi swarm, etc. very very confusing release
Introducing Sakana Fugu: A full multi-agent orchestration system accessible via a single model API.
Our ‘Fugu Ultra’ model matches the performance of Fable and Mythos, delivering frontier capability without the risk of export controls.
Try it: https://sakana.ai/fugu 🐡















