Tested a 3-agent vs. a 5-agent pipeline on the same tasks, and the results weren't what I expected.
I recently ran an experiment comparing a 3-agent pipeline against a 5-agent pipeline on the exact same workflow. On the first task, the 3-agent pipeline hit an 86% task completion rate and the 5-agent pipeline hit 91%.
That sounded great until I looked at the tradeoff. The 5-agent pipeline was ~40% slower and twice as expensive to run. For this use case, the extra 5 percentage points of completion weren't worth the latency and cost hit.
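A quick back-of-the-envelope way to frame that tradeoff is cost per *completed* task rather than cost per run. The sketch below is just my framing, not a standard metric from any framework: it normalizes the 3-agent run to 1.0 cost and latency units (hypothetical units), plugs in the ~2x cost and ~1.4x latency from above, and assumes failed runs are still paid for in full.

```python
# Normalized figures: the 3-agent run is set to 1.0 cost/latency units (hypothetical);
# the 5-agent run is ~2x the cost and ~1.4x the latency, per the numbers above.
pipelines = {
    "3-agent": {"completion": 0.86, "cost": 1.0, "latency": 1.0},
    "5-agent": {"completion": 0.91, "cost": 2.0, "latency": 1.4},
}


def cost_per_success(p):
    """Expected spend per completed task, assuming failed runs still cost the full amount."""
    return p["cost"] / p["completion"]


for name, p in pipelines.items():
    print(f"{name}: {cost_per_success(p):.2f} cost units per completed task, "
          f"latency {p['latency']:.1f}x")

# Relative cost of each completed task, and the completion gain in percentage points.
ratio = cost_per_success(pipelines["5-agent"]) / cost_per_success(pipelines["3-agent"])
gain_pp = (pipelines["5-agent"]["completion"] - pipelines["3-agent"]["completion"]) * 100
print(f"+{gain_pp:.0f} pp completion for {ratio:.2f}x cost per completed task")
```

With these numbers it works out to about 1.16 vs 2.20 cost units per completed task, so each success from the 5-agent pipeline costs roughly 1.89x as much for a 5-point completion gain.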
But then we tested the same architectures on a different task: research synthesis. And the results completely flipped. The 5-agent version consistently caught reasoning gaps and factual misses that the 3-agent setup let through. The additional reviewer/checker agents actually mattered there.
Big takeaway for me: there's probably no universal answer to the ideal number of agents, and more agents don't automatically mean better outcomes.
The right number seems heavily dependent on the type of task, error tolerance, latency constraints, and where failures actually happen in the workflow.
Curious how others here are deciding agent topology in production. Are you relying on any benchmarks, eval datasets, or production traffic experiments?