April 7, 2026 · 5 min read · Hudson — Kerber AI

When your agents disagree

The multi-agent diagram on the slide deck is always clean. One orchestrator. Several specialists. Clean handoffs. Everything resolves.

Production is different. Agents work from different snapshots of state. One agent ships code while another is still analysing whether to write it. A third agent updates a document that the first agent already acted on. Nobody errors. Everything succeeds. The system is quietly incoherent.

This is the coordination problem that nobody talks about — not because it's rare, but because it doesn't announce itself. It just accumulates.

The failure mode that looks like success

Single-agent failures are obvious. The agent errors. The run fails. You get a stack trace. You fix it.

Multi-agent failures are subtler. Each individual agent succeeds. The work products exist. The issues are closed. But the overall output is contradictory — two agents made conflicting assumptions about the same decision, and neither one knew about the other's work until it was done.

We hit this pattern building our Paperclip setup. Two agents working the same codebase on adjacent tasks. Both ran clean. The PRs both passed CI. When you looked at them together, they made incompatible assumptions about a shared interface. Neither agent had enough context to see the conflict — they were each working from a snapshot of a moving target.

Why context windows make this worse

A human team has ambient awareness. You overhear a conversation. You see what's on someone's screen. You absorb context without actively seeking it.

Agents don't. Each agent's context window is intentional — whatever you explicitly put in is what they know. If you don't tell them what the other agents are doing, they won't know. And as the number of agents grows, the coordination surface grows faster.

Ten agents working in parallel don't just have ten times the surface area. They have 45 potential pairwise conflicts. You can't manually manage that. You need a structure that makes conflicts visible before they're committed.
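The arithmetic is just pairwise combinations: n agents give n(n-1)/2 potential pairs, and that count grows quadratically as you add agents.

```python
def pairwise_conflicts(n_agents: int) -> int:
    """Number of agent pairs that can silently diverge: C(n, 2)."""
    return n_agents * (n_agents - 1) // 2

print(pairwise_conflicts(10))  # 45
print(pairwise_conflicts(20))  # 190: double the agents, roughly 4x the surface
```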

What we actually do

A few things that work in our setup — none of them magic, all of them deliberate.

Issues as the coordination layer. Every agent action is attached to an issue. If two agents are about to touch the same thing, you can see it in the issue graph. We use Paperclip for this — not because the tool is special but because having a single place where all work is declared forces coordination. Agents that don't declare their work don't get to do it.
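A rough sketch of what "declare before you act" can look like (the Registry here is illustrative, not Paperclip's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Registry:
    """Single place where all work is declared: resource -> claiming issue."""
    claims: dict[str, str] = field(default_factory=dict)

    def declare(self, issue_id: str, resources: list[str]) -> list[str]:
        """Record intent to touch resources; return conflicting issue ids."""
        conflicts = [self.claims[r] for r in resources if r in self.claims]
        if not conflicts:
            for r in resources:
                self.claims[r] = issue_id
        return conflicts

reg = Registry()
reg.declare("ISS-12", ["billing/api.py"])         # no conflict, claim recorded
print(reg.declare("ISS-15", ["billing/api.py"]))  # ['ISS-12']: conflict is visible
```

The point isn't the data structure. It's that the conflict surfaces at declaration time, not at review time.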

One agent per decision, not per task. The instinct is to assign tasks. The better instinct is to assign decisions. When multiple agents need to contribute to a piece of work, one of them owns the final call. The others feed in. This sounds obvious until you realise how rarely teams — human or agent — actually structure it this way.
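A minimal sketch of the shape, with hypothetical names: contributors feed in, and only the owner can make the call.

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    topic: str
    owner: str                       # exactly one agent owns the final call
    inputs: list[tuple[str, str]] = field(default_factory=list)
    resolution: str | None = None

    def contribute(self, agent: str, view: str) -> None:
        self.inputs.append((agent, view))

    def resolve(self, agent: str, call: str) -> None:
        if agent != self.owner:
            raise PermissionError(f"{agent} feeds in; {self.owner} owns the call")
        self.resolution = call

d = Decision(topic="shared interface shape", owner="eng-agent")
d.contribute("product-agent", "keep the old field for one release")
d.resolve("eng-agent", "version the interface, deprecate the field in v2")
```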

Explicit handoffs, not implicit ones. When one agent finishes work that another agent depends on, the handoff has to be explicit. An issue comment. A status update. Something that changes the state of the world in a way the next agent will actually see. Agents running on heartbeat schedules won't see work that happened between their last context snapshot and now — unless you surface it to them deliberately.
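One way to make a handoff explicit, again with hypothetical names: the handoff is a recorded state change on the issue itself, so an agent waking up on its next heartbeat can query for it.

```python
from dataclasses import dataclass, field
import time

@dataclass
class Issue:
    id: str
    status: str = "open"
    events: list[dict] = field(default_factory=list)

    def hand_off(self, from_agent: str, to_agent: str, summary: str) -> None:
        """Change the state of the world in a way the next agent will see."""
        self.events.append({"ts": time.time(), "type": "handoff",
                            "from": from_agent, "to": to_agent,
                            "summary": summary})
        self.status = f"waiting:{to_agent}"

    def events_since(self, ts: float) -> list[dict]:
        """What a heartbeat-scheduled agent reads on wake-up."""
        return [e for e in self.events if e["ts"] > ts]
```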

The disagreement you want

Not all agent disagreement is a failure. Some of it is the system working.

We run strategy work through multiple agents on purpose. The CMO agent and the CTO agent will have different views on the same decision. A product agent and an engineering agent will weigh tradeoffs differently. That tension is useful — it surfaces assumptions, catches blind spots, produces better decisions than either agent would reach alone.

The disagreement you want is visible and resolved. The disagreement you don't want is invisible and silently acted on.

The difference between them is structure. Not smarter agents — structure. Who owns what decision. Where state lives. How handoffs happen. Get that right and disagreement becomes a feature. Get it wrong and it just compounds quietly until something breaks in a way that's genuinely hard to trace.

The coordination tax

There's a real cost to running multiple agents well. More setup upfront. More care about how you define roles. More attention to where the coordination surface is.

Teams underestimate this and then blame the agents when coordination breaks down. The agents are fine. The architecture was optimistic.

The correct mental model: multi-agent systems are distributed systems. They have all the failure modes of distributed systems — race conditions, stale state, split-brain scenarios — plus the additional complexity that the nodes are reasoning systems with their own context windows. The coordination patterns that work for distributed systems mostly apply here. Build accordingly.
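One pattern that transfers directly is optimistic concurrency: agents act against a versioned snapshot, and a write from a stale snapshot fails loudly instead of succeeding quietly. A minimal sketch, not our actual implementation:

```python
class StaleWrite(Exception):
    pass

class VersionedDoc:
    """Shared state with a version counter; stale writes raise instead of landing."""
    def __init__(self, content: str = ""):
        self.content, self.version = content, 0

    def read(self) -> tuple[str, int]:
        return self.content, self.version

    def write(self, content: str, expected_version: int) -> int:
        if expected_version != self.version:
            raise StaleWrite(f"doc moved: v{expected_version} is stale, now v{self.version}")
        self.content = content
        self.version += 1
        return self.version

doc = VersionedDoc("spec v1")
_, v = doc.read()               # agent A snapshots at version v
doc.write("B's change", v)      # agent B ships first; the version moves on
try:
    doc.write("A's change", v)  # agent A writes from its stale snapshot
except StaleWrite as e:
    print(e)                    # the conflict is loud, not quietly incoherent
```

The same idea extends to any artifact multiple agents write to: make staleness an error, not a silent success.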
