Sakana Fugu Hides the Orchestrator. That's a Problem for Agent Builders.


Sakana released Fugu last week. It promises a complete multi-agent setup behind a single model API. You send a prompt, and Fugu Ultra does the routing, sub-agent calls, reasoning, and tool execution before returning a result. You integrate one endpoint and pay one bill.

It looks great in a demo. In production, it's a black box stacked on six other black boxes.

The orchestration you can't see is the orchestration you can't fix

We build agent systems for startups and client projects. The cycle repeats every time: version one works, version two breaks in strange ways, and version three is when you actually ship. You only get to version three because you've instrumented the system heavily. You know which agent made which call, which prompt generated which tokens, where latency spiked, and where the model hallucinated a tool name and failed silently.

Fugu claims you don't need to worry about any of that. The model handles it internally. That works until something breaks, and it will. A sub-agent in Fugu's chain might call a tool missing from your environment. The orchestrator might pick the wrong sub-agent for a borderline task. The system might burn 40K tokens of internal reasoning on a 4K-token job because the routing hit an edge case. You won't see the failure itself. You'll just see a slow response and a higher bill.
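To make that concrete, here is a minimal sketch of the only monitoring a single-endpoint API leaves you: comparing the billed total and the latency against what you expected. The names (`OpaqueCallStats`, `overrun_ratio`, `looks_anomalous`) and the thresholds are hypothetical, chosen for illustration, and nothing here reflects Fugu's actual response format.

```python
from dataclasses import dataclass


@dataclass
class OpaqueCallStats:
    """The only signals an opaque endpoint exposes (hypothetical shape)."""
    expected_tokens: int   # what you estimated the job should cost
    billed_tokens: int     # the single total the API reports
    latency_s: float       # wall-clock time for the whole call


def overrun_ratio(stats: OpaqueCallStats) -> float:
    """How many times over the expected token cost the call came in."""
    return stats.billed_tokens / max(stats.expected_tokens, 1)


def looks_anomalous(stats: OpaqueCallStats,
                    max_ratio: float = 5.0,
                    max_latency_s: float = 30.0) -> bool:
    """Flag a call as suspicious. Thresholds are illustrative assumptions."""
    return overrun_ratio(stats) > max_ratio or stats.latency_s > max_latency_s


# The 40K-on-a-4K-job case from the text: detectable, but not diagnosable.
call = OpaqueCallStats(expected_tokens=4_000, billed_tokens=40_000, latency_s=12.0)
print(overrun_ratio(call), looks_anomalous(call))
```

A check like this tells you that something went wrong. It cannot tell you which sub-agent, which routing decision, or which tool call caused it.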

We aren't guessing. We've dealt with this in every opaque agent framework we've tried. Frameworks that hide internal state behind a clean API cost the most debugging time under real load.

What Fugu gets right, and what to steal from it

The core idea behind Fugu is sound. A single model can internally manage a mixture of specialized sub-models. Sakana has pushed this direction for a while, and the architecture makes sense. Specialized models beat generalists on narrow tasks. Dynamic routing beats hardcoded pipelines. The problem isn't the architecture. It's the packaging.

Here's what we've learned building multi-agent systems in production:

You need visibility into every hop. If agent A calls agent B, which calls a tool, which feeds its result back to agent A, you need to see that trace. Always. If the framework can't give you a structured log of every internal call, it's a prototype, not infrastructure.
Routing decisions need to be overridable. Fugu's internal orchestrator decides which sub-agent handles which part of your request. In production, you'll have cases where that decision is wrong, and you need a way to intervene. A black-box router you can't configure is a liability.
Cost attribution has to be granular. When the API returns a single token count, you can't tell which part of the orchestration ate your budget. On a startup project, we caught a sub-agent generating 3x the tokens it needed because its system prompt had drifted. We caught it because we had per-agent cost tracking. Fugu's single-API model makes that impossible.
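A minimal sketch of the first and third points, assuming an orchestrator you control that can record every hop. The `Hop` and `Trace` classes, their field names, and the agent names in the example are all hypothetical. This is one possible shape for a structured trace, not a description of any particular framework.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass, field


@dataclass
class Hop:
    """One internal call: an agent invoking another agent or a tool."""
    caller: str
    callee: str
    kind: str            # "agent" or "tool"
    input_tokens: int
    output_tokens: int
    latency_ms: float


@dataclass
class Trace:
    """Every hop made while serving a single request."""
    request_id: str
    hops: list[Hop] = field(default_factory=list)

    def record(self, hop: Hop) -> None:
        self.hops.append(hop)

    def tokens_by_agent(self) -> dict[str, int]:
        """Attribute tokens to the callee that consumed them."""
        totals: dict[str, int] = defaultdict(int)
        for hop in self.hops:
            totals[hop.callee] += hop.input_tokens + hop.output_tokens
        return dict(totals)

    def to_json_lines(self) -> str:
        """Structured log: one JSON object per hop."""
        return "\n".join(
            json.dumps({"request_id": self.request_id, **asdict(hop)})
            for hop in self.hops
        )


# Agent A calls agent B, which calls a tool, which feeds back to agent A.
trace = Trace("req-1")
trace.record(Hop("planner", "researcher", "agent", 1200, 800, 950.0))
trace.record(Hop("researcher", "web_search", "tool", 0, 0, 310.0))
trace.record(Hop("researcher", "planner", "agent", 900, 300, 420.0))

print(trace.to_json_lines())
print(trace.tokens_by_agent())
```

With a per-hop record like this, a sub-agent that starts producing three times its usual tokens shows up as a jump in one entry of `tokens_by_agent()` rather than as an unexplained increase in a single total.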

Use Fugu for prototyping. Build your own orchestration for production.

Fugu has a real role here. If you're validating an idea or testing whether a multi-agent approach solves your problem, Fugu is a fast way to find out. Send a prompt, get a result, and check the quality. That saves weeks of scaffolding.

But the moment you need reliability, because a human depends on the output, costs matter, or you run at volume, you need your own orchestration layer. You need explicit agent definitions, observable routing, per-agent token budgets, structured logging on every call, and the ability to swap models without breaking the chain. That is the difference between an agent system that ships and one that gets quietly shelved after the third incident.
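As a rough illustration of those requirements, here is a minimal sketch of an orchestration layer with explicit agent definitions, a router you can override, per-agent token budgets, and a log entry on every call. Every name in it (`Agent`, `Orchestrator`, `BudgetExceeded`, `add_override`, the model strings) is hypothetical, and the agents are stub functions standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """Explicit agent definition. `run` is a stub for a real model call."""
    name: str
    model: str            # swap the model here; routing is unaffected
    token_budget: int     # per-call cap for this agent
    run: Callable[[str], tuple[str, int]]   # returns (output, tokens_used)


class BudgetExceeded(Exception):
    pass


class Orchestrator:
    def __init__(self, agents: dict[str, Agent],
                 default_router: Callable[[str], str]) -> None:
        self.agents = agents
        self.default_router = default_router
        self.overrides: list[tuple[Callable[[str], bool], str]] = []
        self.log: list[dict] = []

    def add_override(self, predicate: Callable[[str], bool],
                     agent_name: str) -> None:
        """Force matching tasks to a specific agent, bypassing the router."""
        self.overrides.append((predicate, agent_name))

    def route(self, task: str) -> tuple[str, str]:
        """Return (agent_name, reason) so the decision is observable."""
        for predicate, agent_name in self.overrides:
            if predicate(task):
                return agent_name, "override"
        return self.default_router(task), "default"

    def handle(self, task: str) -> str:
        agent_name, reason = self.route(task)
        agent = self.agents[agent_name]
        output, tokens = agent.run(task)
        # Log before enforcing the budget so a failed call is still recorded.
        self.log.append({"agent": agent.name, "model": agent.model,
                         "routed_by": reason, "tokens": tokens})
        # Post-hoc check; a real system would also cap tokens on the call itself.
        if tokens > agent.token_budget:
            raise BudgetExceeded(f"{agent.name} used {tokens} > {agent.token_budget}")
        return output


agents = {
    "general": Agent("general", "model-a", 2_000, lambda t: (f"answer to: {t}", 350)),
    "sql": Agent("sql", "model-b", 1_000, lambda t: (f"query for: {t}", 120)),
}
orch = Orchestrator(agents, default_router=lambda task: "general")
orch.add_override(lambda task: "SELECT" in task, "sql")

print(orch.handle("summarize this ticket"))
print(orch.handle("SELECT the top customers"))
print(orch.log)
```

The design choice worth noting is that `route` returns its reason alongside the decision. When the default router gets a borderline case wrong, the log shows it, and an override fixes it without touching the router.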

Fugu shows the industry converging on multi-agent architectures. That validates the approach we've been building on. But the abstraction it offers, one API with no visible internals, is the exact opposite of what production agent systems need. The teams that win won't be the ones with the cleanest API. They'll be the ones who can see inside the box.



Kerber AI designs and operates multi-agent systems with full observability, per-agent cost tracking, and routing you control — for our own ventures and for client products.
