We tested single-agent vs multi-agent on a real enterprise task. Single agent was 10-20x cheaper and the only one that got the right answer.
I'm building an open-source multi-agent framework and spent the last few weeks testing it against a real enterprise solution design task. This wasn't a toy benchmark: it was an actual enterprise ticket that required cross-referencing Jira comments, Java source code, Process Flow Config XMLs, and Confluence design docs to produce a correct technical document.
The setup:
- 4 specialist worker agents (Jira researcher, code analyst, config analyst, docs researcher) coordinated by an architect agent, with a synthesizer combining everything
- Each worker had focused MCP tools for their domain
- We tried 4 different multi-agent configurations over multiple days
What happened with multi-agent (4 attempts):
| Attempt | Core Error |
|---|---|
| 1 | Invented an attribute that doesn't exist |
| 2 | Misclassified the ticket as a different initiative entirely |
| 3 | Misread the ticket's actual intent |
| 4 | Imported scope from a different ticket (which had a similar name) |
Each attempt used 30,000-70,000 tokens across different tools and agents. Each made a different fundamental error.
What happened with single agent:
- One agent with ALL tools (Jira + code + CDT + Confluence + output) in one context window
- Kimi K2.6 (cheap model, $0.73/1M input)
- Only 3,454 tokens total
- First doc to correctly identify the actual problem, name the right code sites, quote the right Jira comments, and recommend fixes.
It wasn't perfect, but it only needed minor fixes to make the solution workable.
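For a rough sense of the gap, the token counts above can be compared directly. This is only a back-of-the-envelope sketch: it compares token counts, and the cost line assumes every token is billed at the $0.73/1M input price quoted above, which ignores output-token pricing and whatever models the multi-agent runs used. The function names are made up for illustration.

```python
# Back-of-the-envelope comparison of the token counts reported in the post.
# Assumption: every token is billed at the quoted input price; real bills
# also depend on output-token pricing and on which model each agent used.

PRICE_PER_MILLION_USD = 0.73
SINGLE_AGENT_TOKENS = 3_454
MULTI_AGENT_TOKENS = (30_000, 70_000)  # reported low and high per attempt


def cost_usd(tokens: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Cost of `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def token_ratio(multi_tokens: int, single_tokens: int = SINGLE_AGENT_TOKENS) -> float:
    """How many times more tokens a multi-agent attempt used."""
    return multi_tokens / single_tokens


if __name__ == "__main__":
    low, high = (token_ratio(t) for t in MULTI_AGENT_TOKENS)
    print(f"single agent: {SINGLE_AGENT_TOKENS} tokens, ~${cost_usd(SINGLE_AGENT_TOKENS):.4f}")
    print(f"each multi-agent attempt used {low:.1f}x to {high:.1f}x as many tokens")
```

On token count alone that works out to roughly 9x to 20x per attempt, before multiplying by the four attempts.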
Based on the logs and traces captured for each agent, here's my understanding of why multi-agent failed:
The task required connecting dots across multiple sources. A Jira comment mentions a class name -> read that class -> find that it references a Process Flow Config XML -> fetch that config -> discover a condition that gates the behavior the ticket wants to change. This chain of reasoning needs to happen in ONE context window.
But here's what was happening with multi-agent:
- Worker A finds the Jira comment but doesn't know about the code
- Worker B reads the code but doesn't know which Jira comment matters
- Worker C fetches Process Flow Config XMLs but doesn't know which code path to trace
- The architect gets summaries from each and tries to connect them — but summaries lose the specific details that matter
Information was getting destroyed at every handoff. The architect was reasoning over shadows of the actual data, and a lot of information was never even fetched, because no single context ever held enough of the picture to know what to look for.
This experience gave me a clearer sense of when multi-agent just doesn't work:
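The handoff problem can be shown with a toy model. Everything below is invented for illustration: the class name `OrderRouter`, the file `routing-flow.xml`, the condition, and `summarize`, which just truncates text as a crude stand-in for a lossy LLM summary. It is not the framework's code, only a sketch of why following the chain in one context keeps the details that per-worker summaries drop.

```python
import re

# Made-up stand-ins for the three sources; each one points at the next.
SOURCES = {
    "jira": "Reporter says the approval step is skipped for some orders; see OrderRouter",
    "code": {"OrderRouter": "OrderRouter loads its steps from routing-flow.xml"},
    "config": {"routing-flow.xml": "approval step runs only if region == 'EU'"},
}


def summarize(text: str, max_words: int = 5) -> str:
    """Crude stand-in for a lossy summary: keep only the first few words."""
    return " ".join(text.split()[:max_words])


def single_agent_trace() -> list[str]:
    """Follow Jira -> class -> config with the full text always in view."""
    jira = SOURCES["jira"]
    class_name = re.search(r"see (\w+)", jira).group(1)
    code = SOURCES["code"][class_name]
    config_name = re.search(r"from ([\w.-]+)", code).group(1)
    config = SOURCES["config"][config_name]
    return [jira, code, config]


def multi_agent_summaries() -> dict:
    """Each worker summarizes its own source without seeing the others."""
    return {
        "jira": summarize(SOURCES["jira"]),
        "code": [summarize(t) for t in SOURCES["code"].values()],
        "config": [summarize(t) for t in SOURCES["config"].values()],
    }


if __name__ == "__main__":
    print("single agent ends at:", single_agent_trace()[-1])
    print("architect sees:", multi_agent_summaries())
```

In the toy run, the single trace ends on the gating condition, while the summaries have lost the class name, the config file name, and the condition itself. The toy is also generous to the workers: it hands them every document up front, whereas real workers first have to guess what to fetch.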
- Anything requiring cross-source reasoning (solution design, root cause analysis, debugging)
- When the total data fits in one context window (most enterprise tasks)
- When coordination cost (token overhead, summarization loss) exceeds the parallelism benefit
When multi-agent DOES make sense:
- True parallelism on independent tasks (monitoring multiple services, processing document batches)
- Scale beyond one context window (millions of log lines need filtering before reasoning)
- Each agent has a genuinely independent domain (home automation: lighting agent, HVAC agent, security agent)
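The two lists above can be condensed into a rough decision rule. This is a sketch of one possible reading, not a tested policy: the `Task` fields, the function name, and the order of the checks are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical yes/no questions drawn from the two lists above.
    needs_cross_source_reasoning: bool
    fits_in_one_context: bool
    subtasks_independent: bool


def suggest_architecture(task: Task) -> str:
    """Return 'single' or 'multi' following the rules of thumb above."""
    if not task.fits_in_one_context:
        # Too much data for one window: shard or filter first, then reason.
        return "multi"
    if task.needs_cross_source_reasoning:
        # The chain of reasoning has to live in one context window.
        return "single"
    if task.subtasks_independent:
        # True parallelism with no handoff of details between agents.
        return "multi"
    # Default: one agent with good tools.
    return "single"


if __name__ == "__main__":
    solution_design = Task(True, True, False)
    batch_processing = Task(False, True, True)
    print(suggest_architecture(solution_design))
    print(suggest_architecture(batch_processing))
```

The ordering is a judgment call: the context-window check comes first because nothing else is possible until the data is filtered down, and single agent is the fallback when no rule clearly applies.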
My main takeaway from the whole experiment:
The value isn't in agent count — it's in good tools and skills that give the model the right context. MCP servers, structured search, code reading tools — these are what made the single agent succeed. Adding more agents just added more ways to lose information.
Multi-agent is a tool, not a goal. Use it when parallelism genuinely helps. Default to single agent with good tools for anything requiring deep reasoning.
What has your experience with multi-agent setups been? Where have they really worked, and where have they failed?