u/Automatic-Pattern326

I've been building multi-agent systems for a while — running a 40-agent team on a real product at work. The pattern I kept seeing fail was the same one most public setups use: one agent reviews code, decides if it's good, and routes the outcome. All three jobs, same agent.

It rubber-stamps. Same perspective writes the advice and decides the verdict — there's no tension anywhere in the loop.

I started as a developer, moved into PM, then came back to engineering. Being on both sides taught me what real teams actually do — and it's not one person owning every decision. The reviewer doesn't decide what ships. The PM doesn't write the security review. The PO synthesizes — they don't produce the findings themselves. Specialization plus handoffs is what makes sprints actually work.

So I extracted that pattern and open-sourced it.

agile-team-skill — 7 agents inside Claude Code, each with one job:

QA — tests + acceptance criteria. Hard veto. Chain stops if it fails.
PR reviewer — correctness, patterns, dead code.
Security — OWASP, secrets, CVEs, auth, input validation.
Tech lead — architecture, debt, complexity.
PO — synthesizes everything into one verdict: fix now / backlog / won't fix.

The PO never reviews. The reviewers never decide outcomes. QA gates everything before the other three even run.

The thing I didn't expect: persistence mattered as much as separation. Without NEXT.md, STATE.md, BACKLOG.md persisting across sessions, every standup was just chat with no memory. Once state persisted, the team had institutional knowledge. This morning my standup flagged Sprint 3 as "at risk — same gate as Sprints 1 and 2." It noticed the pattern across three sprints. Single-session agents can't do that.

You also get sprint planning with real dev capacity commitment, retros that produce backlog items, tech debt that becomes a story the moment it's introduced. One slash command per ceremony. No dashboards, no setup tax.

Genuinely curious what others are doing for the producer/synthesizer split — and whether anyone's found good patterns for keeping reviews sharp over hundreds of runs.

Most multi-agent setups have one agent do everything — write the suggestion, decide the verdict, route the outcome. Here's what changed when I split them.