What is Sakana Fugu? The AI model that commands other AI models

Alicia Kirana Utomo
Written by

Alicia Kirana Utomo

Katelin Teen
Reviewed by

Katelin Teen

Last edited June 23, 2026

Expert Verified
Sakana Fugu, an AI model that orchestrates a pool of other AI models

So what exactly is Sakana Fugu?

Sakana AI is a Tokyo frontier lab founded in 2023 by three ex-Google researchers: CEO David Ha, CTO Llion Jones (one of the eight co-authors of the original "Attention Is All You Need" Transformer paper), and COO Ren Ito. In November 2025 it raised a $135M Series B at a $2.65B valuation, making it one of Japan's most valuable AI startups.

The name matters. "Sakana" (魚) means fish, a nod to the lab's bet that the future of AI looks less like one giant brain and more like a coordinated school of smaller specialists. Fugu (named after the pufferfish) is that thesis turned into a product. Sakana pitches it as "One Model to Command Them All": frontier-level performance without depending on any single vendor.

Sakana's bet: many coordinated specialist models can match one giant model
Sakana's bet: many coordinated specialist models can match one giant model

Here's the cleanest way to picture it. Fugu is itself a model, but instead of generating the final answer alone, it dynamically assembles a team from a pool of other powerful models and coordinates them. The whole apparatus is exposed to you as one model behind one API. If you've read our explainer on AI agents versus chatbots, Fugu is the agent idea taken to its logical extreme: the agent's "tools" are other frontier models.

Sakana Fugu architecture: one conductor model routing tasks to a pool of closed and open models, as taken from Sakana AI
Sakana Fugu architecture: one conductor model routing tasks to a pool of closed and open models, as taken from Sakana AI

One important detail people miss: Fable 5 and Mythos Preview are not in Fugu's pool, because they aren't publicly accessible. Fugu only orchestrates models it can actually call. So when Sakana says Fugu matches Fable 5, it's saying a coordinated team of other public models can rival the frontier, which is a more interesting claim than it first looks.

How Fugu actually works under the hood

This is where Fugu earns the "not just a router" defense. It's grounded in two ICLR 2026 papers on learned model orchestration, and the mechanism is more involved than picking a model and forwarding the request.

How Fugu turns one request into a multi-turn loop with Thinker, Worker, and Verifier roles before answering
How Fugu turns one request into a multi-turn loop with Thinker, Worker, and Verifier roles before answering

The first paper, TRINITY, uses a lightweight evolved coordinator that orchestrates multiple models over several turns, assigning each one a Thinker, Worker, or Verifier role and re-delegating as the task unfolds. The second, the Conductor, is trained with reinforcement learning to discover natural-language coordination strategies, basically learning how to write focused prompts and design how the models talk to each other so the pool beats any single member.

The two phrases worth holding onto are learned and multi-turn. Fugu doesn't follow a human-designed "first ask model A, then model B" script. It learned, through evolution and RL, to discover non-obvious collaboration patterns, and it loops, re-checking and re-routing rather than making one pass. That's why early users report it running for hours on a single task: 123 experiments over roughly 14 hours on an ML research problem, or nearly four hours autonomously reproducing a paper. It behaves a lot like the kind of agent loop we obsess over when building support automation, just pointed at frontier models instead of tools.

Cover art for the TRINITY paper, one of two ICLR 2026 works behind Fugu, as taken from Sakana AI
Cover art for the TRINITY paper, one of two ICLR 2026 works behind Fugu, as taken from Sakana AI

One trade-off to flag now: the routing is proprietary and opaque by design. You can't see which underlying model answered a given query. For some teams that's fine; for anyone with compliance needs, that black-box-in-front-of-black-boxes structure is a real consideration.

Fugu vs Fugu Ultra: which is which

Fugu ships as two models, both reachable through the same OpenAI-compatible API so you can switch between them without touching your integration. The difference is how many expert agents get coordinated, which is the lever between speed and quality.

FuguFugu Ultra
Optimized forBalanced performance and latencyMaximum answer quality
Agent poolCoordinates a pool; you can opt models outDeeper, fixed pool; no opt-out
Best forEveryday coding, code review, chatbotsHard, high-stakes, multi-step problems
Trade-offLow latency, strong defaultHigher quality at the cost of speed

In plain terms: reach for Fugu when you want a responsive default, and Fugu Ultra when you have one gnarly problem and you're willing to wait for a better answer. Early users put Ultra to work on Kaggle competitions, paper reproduction, cybersecurity analysis, and patent investigations, which tells you the intended sweet spot is depth, not throughput.

The benchmarks: is it really shoulder-to-shoulder with Fable 5?

Sakana's headline claim is that Fugu models "surpass publicly accessible frontier models and are shoulder-to-shoulder with Fable 5 and Mythos Preview" across engineering, scientific, and reasoning benchmarks. The numbers back the narrower claim well.

Sakana Fugu benchmark comparison against Fable 5, Mythos Preview, Gemini 3.1 Pro, GPT-5.5, and Opus 4.8, as taken from Sakana AI
Sakana Fugu benchmark comparison against Fable 5, Mythos Preview, Gemini 3.1 Pro, GPT-5.5, and Opus 4.8, as taken from Sakana AI

A few that stand out from Sakana's table: Fugu Ultra scores 73.7 on SWE-Bench Pro (vs 69.2 for Opus 4.8 and 58.6 for GPT-5.5), 93.2 on LiveCodeBench, and 95.5 on GPQA-Diamond, ahead of every public baseline shown. The qualitative demos are even more fun: Fugu reportedly beat three frontier models and a 2100-Elo Stockfish engine at blindfold chess, and in a time-series trading test grew $10,000 to $11,943 over a 50-week window, a +19.43% mean return that beat the others.

Two honest caveats. First, these are vendor-reported benchmarks, and the strongest models (Fable 5, Mythos) were excluded from the comparison as direct competitors rather than beaten head-to-head. Second, benchmarks measure peak capability on hard problems, not whether the thing is pleasant to use at 2pm on a Tuesday. As one beta user, slopdetector, put it on Hacker News:

"I used this during the beta. Beats GPT-5.5 xhigh on complex tasks. Since it's expensive and difficult to subsidize, use it for the most challenging problems... the results I got from fugu-ultra were impressive."

What Sakana Fugu costs (and the catch nobody mentions)

There are two ways to pay, and both include access to Fugu and Fugu Ultra.

Subscription tierPriceUsage allowanceFor
Standard$20/moBaselineLightweight daily use
Pro$100/mo10× StandardFocused working sessions
Max$200/mo30× StandardHeavy, long-running workloads

(Worth noting: Sakana's pricing cards say Max is 30× Standard while a FAQ answer says 20×, so confirm the allowance before you commit.) There's also a pay-as-you-go token plan where Fugu Ultra is fixed at $5 input, $30 output, and $0.50 cached input per million tokens, rising to $10 / $45 / $1.00 once context passes 272K tokens. And there's a launch promo: subscribe before the end of July 2026 for a free second month.

Now the catch. Fugu is priced at the top of the pool it routes from, so the orchestration overhead has to justify itself against just paying for a frontier model directly. Several hands-on users felt it didn't. The sharpest version came from cortesi on Hacker News:

"For $200/month you get < 3 hours of use per week, the API is extremely slow, and the output quality in my tests is nowhere near Fable. It's nowhere remotely near usable as a day-to-day workhorse. Very disappointing."

That's one tester's experience, not a verdict, but it rhymes with several others reporting that the 5-hour limit runs out fast. If you've ever modelled AI agent costs against human agents, the lesson is familiar: the sticker price and the real cost-per-useful-task are different numbers.

Here's a quick gut-check on whether Fugu is even the right tool for what you're doing:

Should you reach for Sakana Fugu?
Pick what you actually need, then read the honest take.
Probably not Fugu. Testers report quota burning through in a few hours and noticeable latency. For routine work, a single frontier model on its own plan is usually faster and cheaper.
This is Fugu Ultra's sweet spot. Beta users reported it beating GPT-5.5 xhigh on the hardest tasks, and it's built to run autonomously for hours. Worth a shot when the answer quality matters more than the wait.
This is Fugu's headline pitch. It's a non-US, export-control-free frontier-class option. Just remember it's still one vendor's API, the routing is opaque, and it isn't available in the EU/EEA yet.
Not Fugu. It's priced at the ceiling of the model pool it routes, so you rarely save money versus calling a frontier model directly. Cost is the single most common complaint.

Is "just OpenRouter with extra steps" a fair criticism?

The single loudest reaction to Fugu's launch, repeated independently on Hacker News, X, and Reddit, was some version of "isn't this just OpenRouter?" It's a fair instinct, so let's take it seriously.

A simple router picks one model in one shot; Sakana Fugu coordinates a pool over multiple turns for one blended rate
A simple router picks one model in one shot; Sakana Fugu coordinates a pool over multiple turns for one blended rate

A plain router picks one model and forwards your request once. Fugu, at least on paper, does three things a router doesn't: it runs multiple turns, it has models verify each other's work, and it charges a single blended rate based on the top model involved rather than stacking each model's bill. So the architecture is real, and "advanced router" undersells the multi-turn, self-checking loop.

But the skeptics land a real punch on value, not architecture. As chenzhekl asked bluntly:

"But it's priced the same as frontier models. Why do I not directly pay for frontier models?"

That's the whole debate in one line. The architecture is more than a router; the open question is whether the extra coordination buys you enough to justify paying frontier prices for it. My take: on your hardest problems, plausibly yes; on everyday work, probably not. This is the same calculus that shows up in AI agent versus rule-based chatbot decisions, where more sophistication only pays off when the task is actually hard.

What people actually think of Sakana Fugu

Community sentiment, fairly read, is mixed-to-skeptical with a real pro camp. The boosters make the most interesting argument: that having models check each other is simply the right bet. As epsteingpt argued:

"Everyone's understood for months now that having different models check each other is the best path forward... If (big iff) the usage mechanics work out, then this is actually a really good anti-big-model strategy. They'll be incentivized for your success, not token-maximizing for their investors."

That incentive-alignment point is sharp, and it's a real reason to root for an orchestrator over a monolith. There's also a thread of respect for Sakana's research path. As quanto noted, David Ha took an unconventional route into AI research, and the lab's prior work (Evolutionary Model Merge, the AI Scientist, Transformer²) is consistently distinctive.

The skeptics, meanwhile, aren't being reflexive. Their objections cluster on cost, latency, and the opaque "single vendor replacing another single vendor" framing. And a couple of real-world notes worth knowing before you sign up: Fugu is not available in the EU/EEA yet, and some users flagged unease about Sakana's military contracts. If you're weighing it against the best AI agents for production, those are not footnotes.

Where a model that orchestrates models matters for support

Here's the part I care about most, because it's the job I actually do. Fugu's underlying idea, don't bet your workflow on one model, coordinate several and make them check each other, is exactly right for high-stakes automation like customer support. The wrong answer from a support bot isn't a leaderboard miss, it's a refund issued in error or a furious customer.

But there's a chasm between a raw, opaque model API and something you can safely put in front of customers. Fugu gives you orchestration; it doesn't give you your help center, your past tickets, your brand voice, your escalation rules, or a way to test the thing before it goes live. That's the layer that actually decides whether AI for customer service works, and it's why I'd reach for a purpose-built AI agent for customer service over wiring up a frontier API by hand. The orchestration question we obsess over in build versus buy is the same one Fugu is answering, just at a different layer of the stack.

Try eesel

eesel takes the lesson Fugu is built on and applies it where it actually has to be reliable: your support queue. Instead of handing you a model API, it's an AI agent that plugs into the helpdesk you already use (Zendesk, Freshdesk, Help Scout, Slack, and more) in minutes, trains itself on your past tickets and help center, and answers in your brand voice, no model-orchestration plumbing required.

The eesel AI dashboard, where an AI agent learns from your past tickets and resolves support automatically
The eesel AI dashboard, where an AI agent learns from your past tickets and resolves support automatically

The differentiator that matters most here is the part Fugu can't give you: a simulation mode that replays the agent against thousands of your historical tickets before it ever touches a live customer, so you see the resolution rate and exact replies up front rather than discovering them in production. Pricing is usage-based with no per-seat fees, so the cost scales with value rather than headcount. If you want to see what a customer-service AI looks like when the orchestration is invisible and the guardrails are built in, it's free to try.

Frequently Asked Questions

What is Sakana Fugu in simple terms?
Sakana Fugu is an AI model from Tokyo's Sakana AI that doesn't answer your prompt directly. Instead it acts as a conductor, routing your task across a pool of other frontier models and stitching their work into one reply, all behind a single OpenAI-compatible API. It's closer to a learned AI agent loop than to a standalone chatbot.
How is Sakana Fugu different from OpenRouter?
A router like OpenRouter picks one model and forwards your request once. Sakana Fugu runs a multi-turn, learned loop, assigning Thinker, Worker, and Verifier roles across several models and having them check each other before replying. The pricing differs too: Fugu charges a single blended rate, not a stacked sum. Whether that overhead is worth it is the open question, and it's the same build-versus-buy call teams face with AI for customer support.
How much does Sakana Fugu cost?
Subscriptions run $20/month (Standard), $100/month (Pro), and $200/month (Max), and there's a pay-as-you-go token plan where Fugu Ultra is $5 input / $30 output per million tokens. The catch is that Fugu is priced at the ceiling of the model pool it routes, which is why hands-on users compare its value unfavourably to a single frontier model's plan, much like the cost math behind AI agent costs.
Is Sakana Fugu better than Claude or GPT-5.5?
On Sakana's own benchmarks, Fugu Ultra sits shoulder-to-shoulder with Claude Fable 5 and Claude Mythos and edges out Claude Opus 4.8 and GPT-5.5 on most coding and reasoning tests. In day-to-day use, beta testers were more mixed, praising it on hard problems but calling it slow for routine work.
What is Sakana Fugu best used for?
Early users point it at seriously hard, high-stakes problems: Kaggle competitions, paper reproduction, cybersecurity assessments, and patent or literature reviews. For everyday coding or a responsive chatbot, the lighter Fugu model exists, but a single frontier model is often the cheaper pick. It helps to think in terms of concrete AI agent examples rather than a blanket 'use it for everything'.
Can I use Sakana Fugu for customer support?
You could point any OpenAI-compatible client at it, but Fugu is a raw model API, not a support product, so you'd be building the helpdesk integration, knowledge ingestion, and guardrails yourself. A purpose-built tool like a customer service AI agent handles that layer for you. See our take on AI for customer service for the difference.
Is Sakana Fugu available everywhere?
Not yet. Fugu is available from most regions outside Japan but is not live in the EU/EEA while Sakana works toward GDPR compliance. It's an early product from a fast-moving lab, so availability and pricing are likely to keep shifting, which is worth weighing if you're comparing the best AI agents for production use.

Share this article

Alicia Kirana Utomo

Article by

Alicia Kirana Utomo

Kira is a writer at eesel AI with a Computer Science background and over a year of hands-on experience evaluating AI-powered customer service tools. She focuses on breaking down how helpdesk platforms and AI agents actually work so that support teams can make better buying decisions.

Related Posts

All posts →
An open briefcase spilling documents, spreadsheets, emails and chat messages while an AI figure grades them on a scorecard
AI

What is AA-Briefcase? The AI benchmark for real knowledge work, explained

AA-Briefcase is Artificial Analysis' new benchmark that tests AI on real multi-week office projects. Here's what it measures, who tops it, and what it means for AI at work.

Alicia Kirana UtomoAlicia Kirana UtomoJun 22, 2026
Conceptual hero illustration of Thomas, an AI founder that runs its own companies
AI

What is Thomas, the AI founder? Inside YC's first non-human founder

Thomas is a Y Combinator-backed AI founder, a virtual human that starts and runs its own companies. Here's what it actually is, how it works, and what it means for AI at work.

Rama Adi NugrahaRama Adi NugrahaJun 22, 2026
Palmier, the AI-native video editor, with AI generation built into the timeline
AI

What is Palmier? The AI video editor your agents can edit

Palmier is a Mac-native AI video editor where generation lives on the timeline and agents like Claude can edit your cut directly. Here's what it actually does.

Rama Adi NugrahaRama Adi NugrahaJun 19, 2026
Illustration contrasting an AI chatbot answering a question with an AI agent connected to Slack, email and ticketing tools
AI

AI agents vs AI chatbots: the real difference and when to use each

AI agents vs AI chatbots: chatbots answer questions, agents take actions and close tickets. Here is the real difference and when to reach for each.

Alicia Kirana UtomoAlicia Kirana UtomoJun 17, 2026
A non-technical person describing an app idea while AI assembles software building blocks
AI

Vibe coding for non-developers: what it actually is and how to use it safely

A plain-English guide to vibe coding for non-developers: what it means, the tools to use, where it breaks, and what's safe to build yourself.

Alicia Kirana UtomoAlicia Kirana UtomoJun 17, 2026
Illustration of a person directing blocks of code that assemble themselves, representing vibe coding
AI

What is vibe coding? A plain-English guide for 2026

Vibe coding means describing what you want to an AI and letting it write the code. Here's what it is, where it came from, the risks, and when to actually use it.

Alicia Kirana UtomoAlicia Kirana UtomoJun 17, 2026
Floating IT service management dashboard panels showing ticket queues, routing diagrams, and AI activity feeds
IT support

Best ITSM automation tools in 2026

A practical guide to the 5 best ITSM automation tools in 2026 - from AI overlays that work on top of your existing helpdesk to full enterprise platforms.

Alicia Kirana UtomoAlicia Kirana UtomoMay 15, 2026
Two people speaking different languages with a live sound wave bridging them, illustrating Gemini 3.5 Live Translate
AI

What is Gemini 3.5 Live Translate?

Gemini 3.5 Live Translate is Google's real-time speech-to-speech translation model for 70+ languages. Here's what it does, how it works, and where it fits.

Riellvriany IndriawanRiellvriany IndriawanJun 17, 2026
Editorial illustration of GLM-5.2, the open-weights AI model from Z.ai
AI

What is GLM-5.2? A plain-English guide to Z.ai's open model

GLM-5.2 is Z.ai's open-weights model that matches near-frontier coding at about 1/6th the price. Here's what it is, how it works, and what it means for support teams.

Alicia Kirana UtomoAlicia Kirana UtomoJun 21, 2026

Ready to hire your AI teammate?

Set up in minutes. No credit card required.

Get started free