Multi-Agent Systems
Executive Summary
Multi-agent systems distribute complex work across specialized agents — each with a focused set of tools, a narrow domain of responsibility, and a well-defined interface with other agents. They are the architectural response to the limitations of single-agent systems: context window saturation, tool count limits, parallel work requirements, and specialization needs. This chapter covers the three primary multi-agent topologies (orchestrator-worker, hierarchical, and peer-to-peer), communication patterns, shared state management, and the trust model that governs inter-agent interactions. AI architects designing enterprise automation platforms and engineering leaders evaluating agentic AI scale-out strategies should read this chapter.
Learning Objectives
- Identify the conditions that justify multi-agent architecture over a single agent
- Describe the three primary multi-agent topologies and their appropriate use cases
- Design an orchestrator-worker system with explicit task delegation and result aggregation
- Define trust boundaries and authorization policies for inter-agent communication
- Evaluate the operational complexity cost of multi-agent systems
Business Problem
Single-agent systems break down at scale in four ways:
- Context saturation: A 20-tool agent processing a complex research task accumulates tool results that overflow even large context windows
- Specialization limits: One agent cannot simultaneously be an expert in clinical criteria, payer policies, prior authorization workflows, and EHR data structures
- Sequential bottleneck: When subtasks are independent, a single agent doing them sequentially wastes time that parallel execution could save
- Error blast radius: A single agent error affects the entire workflow; specialized sub-agents fail locally without corrupting the overall state
Multi-agent systems solve these problems by decomposing work across specialized agents that collaborate — at the cost of coordination overhead that must be explicitly designed.
Why This Technology Exists
Early agent systems (2023) hit a practical ceiling: a single agent with 30 tools, operating over hours, accumulating hundreds of tool results, consistently produced context overflow errors, degraded reasoning quality, and unpredictable behavior. The solution, borrowed from distributed systems architecture, was decomposition: break the problem into sub-problems, assign each to a specialized component, and coordinate via explicit interfaces.
The parallel to microservices is instructive: a monolithic service can do everything, but it becomes unmaintainable at scale. Microservices decompose by bounded context and communicate via defined APIs. Multi-agent systems decompose by reasoning context and communicate via structured messages. The same engineering principles apply: single responsibility, clear interfaces, independent deployability, and observable communication.
Conceptual Explanation
When Multi-Agent Architecture is Warranted
Three conditions justify the coordination overhead of multi-agent systems:
- Work can be parallelized: Independent subtasks that could proceed simultaneously are being bottlenecked in a single-agent sequential loop
- Tool count exceeds ~15 per agent: Tool selection accuracy degrades significantly above this threshold; specialization restores precision
- Domain specialization produces meaningful quality gains: A clinical agent configured with a clinical system prompt and clinical tools outperforms a general agent given all tools
The Three Topologies
Orchestrator-Worker: A central orchestrator agent decomposes the goal, delegates subtasks to specialized worker agents, and aggregates results. Workers report back to the orchestrator; they do not communicate with each other directly. Best for: tasks with clear decomposition, moderate parallelism, and sequential dependency between phases.
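The delegate-then-aggregate flow of the orchestrator-worker topology can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production pattern: `run_worker`, `orchestrate`, and the worker names are hypothetical, the worker body is a stand-in for a real model and tool call, and the plan is hard-coded where a real orchestrator would produce it with an LLM call.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class WorkerResult:
    worker: str
    output: str


async def run_worker(name: str, subtask: str) -> WorkerResult:
    # Hypothetical stand-in for an LLM-backed worker with its own tools.
    await asyncio.sleep(0)
    return WorkerResult(worker=name, output=f"{name} handled: {subtask}")


async def orchestrate(goal: str, plan: dict) -> dict:
    # Delegate all subtasks concurrently. Workers report only to the
    # orchestrator and never call each other.
    results = await asyncio.gather(
        *(run_worker(name, subtask) for name, subtask in plan.items())
    )
    # Aggregate worker outputs into one structure keyed by worker name.
    return {"goal": goal, "results": {r.worker: r.output for r in results}}


def run_example() -> dict:
    # Hard-coded plan (assumption); a real orchestrator would derive it.
    plan = {
        "clinical_worker": "check clinical criteria",
        "policy_worker": "look up payer policy",
    }
    return asyncio.run(orchestrate("prior authorization review", plan))
```

Because `asyncio.gather` runs the worker coroutines concurrently, independent subtasks no longer wait on each other, which is the parallelism benefit this topology is chosen for.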
Hierarchical: Orchestrators can themselves be orchestrated. A top-level coordinator delegates to sub-orchestrators, which delegate to workers. Best for: complex enterprise workflows where a single orchestrator would have too many responsibilities.
Peer-to-Peer (Specialist Handoff): Agents pass tasks to each other without a central coordinator. Agent A determines that a task is outside its domain and routes it to Agent B. Best for: specialist consultation workflows where the routing logic is embedded in each agent's expertise.
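A specialist handoff can be sketched as a routing loop in which each agent either answers or names the next peer. The agent names, the `(answer, next_agent)` return convention, and the `max_hops` limit are all assumptions made for illustration; the hop limit and the visited-path check are one possible way to prevent handoff loops.

```python
from typing import Callable, Dict, List, Optional, Tuple

# An agent returns (answer, next_agent). next_agent is None when the agent
# resolves the task itself; otherwise it names the peer to hand off to.
AgentReply = Tuple[Optional[str], Optional[str]]
Agent = Callable[[str], AgentReply]


def intake_agent(task: str) -> AgentReply:
    # Hypothetical routing rule: policy questions are outside this agent's domain.
    if "policy" in task:
        return None, "policy_agent"
    return f"intake resolved: {task}", None


def policy_agent(task: str) -> AgentReply:
    return f"policy resolved: {task}", None


AGENTS: Dict[str, Agent] = {
    "intake_agent": intake_agent,
    "policy_agent": policy_agent,
}


def route(
    task: str, start: str, agents: Dict[str, Agent], max_hops: int = 5
) -> Tuple[Optional[str], List[str]]:
    path: List[str] = []
    current = start
    for _ in range(max_hops):
        path.append(current)
        answer, next_agent = agents[current](task)
        if next_agent is None:
            return answer, path
        # Refuse to revisit an agent: that would be a handoff loop.
        if next_agent in path:
            raise RuntimeError(f"handoff loop: {path + [next_agent]}")
        current = next_agent
    raise RuntimeError(f"hop limit {max_hops} exceeded: {path}")
```

Returning the path alongside the answer gives the caller a record of which specialists touched the task, which is useful because there is no central coordinator to log it.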
Architecture Diagram
Standalone diagram: architecture/mermaid/02-multi-agent-topology.mmd
Enterprise Considerations
Coordination overhead is real. Every delegation from orchestrator to worker is an LLM call (latency + cost). A 4-worker prior auth workflow with 2 turns per worker uses ~9 LLM calls in total: 8 worker turns plus at least one orchestrator call. At frontier model rates, this can be 10–20x the cost of a single-agent equivalent. Model tier selection is the primary lever: use a small model (Haiku-class) for workers on focused tasks; reserve Opus-class models for the orchestrator's complex planning and aggregation steps.
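The call-count arithmetic, and the effect of model tier selection, can be made explicit with a small estimator. This is a rough sketch: `worker_cost_ratio` (the cost of one worker call relative to one orchestrator call) is a hypothetical parameter, and the model assumes every call processes a similar token volume, which real workflows will not match exactly.

```python
def llm_call_count(workers: int, turns_per_worker: int, orchestrator_calls: int) -> int:
    # Total LLM calls: every worker turn plus every orchestrator call.
    return workers * turns_per_worker + orchestrator_calls


def relative_cost(
    workers: int,
    turns_per_worker: int,
    orchestrator_calls: int,
    worker_cost_ratio: float,
) -> float:
    # Cost in units of "one orchestrator-tier call".
    # worker_cost_ratio = 1.0 means workers run on the same tier as the
    # orchestrator; a smaller value models a cheaper worker tier (assumption).
    worker_calls = workers * turns_per_worker
    return worker_calls * worker_cost_ratio + orchestrator_calls


# 4 workers, 2 turns each, 1 orchestrator call
calls = llm_call_count(4, 2, 1)
same_tier = relative_cost(4, 2, 1, worker_cost_ratio=1.0)
cheap_workers = relative_cost(4, 2, 1, worker_cost_ratio=0.1)
```

With these inputs the worker turns dominate the call count, so lowering the per-call cost of workers moves the total far more than optimizing the orchestrator does.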
Failure propagation. In a single-agent system, tool failures are handled by the agent's error handling. In a multi-agent system, worker agent failures must be propagated to the orchestrator in a structured form that the orchestrator can reason about and handle. Design explicit failure modes: partial results, timeouts, hard errors, and rate limit backoffs.
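One way to give the orchestrator a failure it can reason about is a typed outcome object rather than a raw exception or free-text error. The sketch below is illustrative only: the `WorkerStatus` values mirror the failure modes listed above, while the class names, the action strings, and the retry limit are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkerStatus(Enum):
    OK = "ok"
    PARTIAL = "partial"            # some results, some gaps
    TIMEOUT = "timeout"
    RATE_LIMITED = "rate_limited"
    HARD_ERROR = "hard_error"      # not recoverable by retrying


@dataclass
class WorkerOutcome:
    worker: str
    status: WorkerStatus
    payload: Optional[dict] = None
    detail: str = ""


def next_action(outcome: WorkerOutcome, attempt: int, max_attempts: int = 3) -> str:
    # Map each failure class to an explicit orchestrator decision.
    if outcome.status is WorkerStatus.OK:
        return "accept"
    if outcome.status is WorkerStatus.PARTIAL:
        return "accept_with_gaps"
    if outcome.status in (WorkerStatus.TIMEOUT, WorkerStatus.RATE_LIMITED):
        # Transient failures are retried (with backoff in a real system)
        # until the attempt budget is spent.
        return "retry" if attempt < max_attempts else "escalate"
    return "escalate"
```

Keeping the decision in a pure function like `next_action` makes the failure policy testable without invoking any model.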
Observability is harder. Tracing a workflow that spans 5 agents and 20 tool calls requires distributed tracing infrastructure. Each agent invocation should carry a correlation ID linking it to the parent workflow. See Chapter 8: Agent Observability.
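Correlation ID propagation can be sketched as a small context object that every agent invocation receives and passes down. The `TraceContext` class, its field names, and the `log_event` record shape are hypothetical; a real deployment would more likely rely on an existing distributed tracing library than on hand-rolled IDs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TraceContext:
    correlation_id: str                      # shared by the whole workflow
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    parent_span_id: Optional[str] = None     # links a call to its caller

    def child(self) -> "TraceContext":
        # A delegated call keeps the correlation ID and records its parent.
        return TraceContext(
            correlation_id=self.correlation_id,
            parent_span_id=self.span_id,
        )


def start_workflow() -> TraceContext:
    return TraceContext(correlation_id=uuid.uuid4().hex)


def log_event(ctx: TraceContext, agent: str, event: str) -> dict:
    # Structured record; every entry can be joined on correlation_id.
    return {
        "correlation_id": ctx.correlation_id,
        "span_id": ctx.span_id,
        "parent_span_id": ctx.parent_span_id,
        "agent": agent,
        "event": event,
    }
```

The orchestrator would call `start_workflow()` once and hand `ctx.child()` to each worker, so that all log records for one workflow share a single `correlation_id`.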
Agent versioning. When a worker agent's behavior changes (prompt update, tool update), it can break orchestrators that depend on its output format. Version worker agents and test orchestrator-worker compatibility before deployment.
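A compatibility test between an orchestrator and a worker can be as simple as checking a version field and the required output fields before deployment. The sketch below assumes a `major.minor` version string carried in a `schema_version` field and a hypothetical set of required fields; none of these names come from a specific framework.

```python
from typing import List, Tuple

# Hypothetical output contract the orchestrator depends on.
REQUIRED_FIELDS = {"schema_version", "status", "findings"}


def parse_version(version: str) -> Tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(worker_version: str, orchestrator_requires: str) -> bool:
    # Assumed rule: same major version, and the worker's minor version
    # is at least the one the orchestrator was tested against.
    w_major, w_minor = parse_version(worker_version)
    r_major, r_minor = parse_version(orchestrator_requires)
    return w_major == r_major and w_minor >= r_minor


def check_worker_output(output: dict, orchestrator_requires: str) -> List[str]:
    # Returns a list of problems; an empty list means the output is usable.
    problems: List[str] = []
    missing = REQUIRED_FIELDS - set(output)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    version = output.get("schema_version")
    if version is not None and not is_compatible(version, orchestrator_requires):
        problems.append(
            f"incompatible schema_version {version}, need {orchestrator_requires}"
        )
    return problems
```

Running a check like this against recorded worker outputs in CI is one way to catch a breaking prompt or tool change before the orchestrator sees it in production.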
Healthcare Example
Educational Example — Illustrative Workflow. Not intended for clinical decision making.
A Reference Healthcare Organization's prior authorization multi-agent system uses a hierarchical topology:
Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.
The Payer Submission Agent is a separate agent, not an additional tool on the orchestrator, because: (a) its single responsibility is payer submission, (b) it has only one tool (`submit_to_payer`), and (c) it always requires a HITL gate before its tool can be called. Separating it makes the HITL requirement explicit in the architecture.
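The HITL gate on the submission tool can be illustrated with a guard that refuses to run without a matching human approval. Only the tool name `submit_to_payer` comes from the text; its signature, the `Approval` record, and the `ApprovalRequired` exception are assumptions, and the submission itself is a stand-in rather than a real payer integration.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a submission is attempted without human sign-off."""


@dataclass(frozen=True)
class Approval:
    reviewer: str
    request_id: str


def submit_to_payer(request_id: str, approval: Optional[Approval]) -> dict:
    # Gate: refuse unless a human approved this exact request.
    if approval is None or approval.request_id != request_id:
        raise ApprovalRequired(f"human approval required for {request_id}")
    # Stand-in for the real external submission call.
    return {
        "request_id": request_id,
        "status": "submitted",
        "approved_by": approval.reviewer,
    }
```

Binding the approval to a specific `request_id` means an approval granted for one request cannot be reused to push a different one through the gate.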
Common Mistakes
Creating too many agents too early. Start with a single agent and extract to multi-agent only when a specific, measurable limitation is encountered. Premature decomposition adds coordination overhead without benefit.
Workers that are too general. A "Research Worker" that can do anything defeats the purpose of specialization. Workers should be narrowly focused: one domain, one set of tools, one type of task.
No explicit failure handling at the orchestrator level. When a worker returns an error, the orchestrator must be designed to handle it (retry, fallback, skip, escalate) — not just forward the error to the user.
Circular dependencies. Worker A calls Worker B, which calls Worker C, which calls Worker A. Without careful design, multi-agent systems can deadlock or delegate in an endless loop. Map the dependency graph before implementation.
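Mapping the dependency graph can be automated with a cycle check at design or deployment time. The sketch below uses Python's standard-library `graphlib` (Python 3.9+); the `find_cycle` helper and the worker names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set


def find_cycle(calls: Dict[str, Set[str]]) -> Optional[List[str]]:
    # calls maps each agent to the set of agents it is linked to by a call.
    # Edge direction does not matter for cycle detection.
    try:
        TopologicalSorter(calls).prepare()
    except CycleError as exc:
        # graphlib reports the cycle as the second element of exc.args;
        # the first and last node in that list are the same.
        return exc.args[1]
    return None


acyclic = {"orchestrator": {"worker_a", "worker_b"}, "worker_a": set(), "worker_b": set()}
cyclic = {"worker_a": {"worker_b"}, "worker_b": {"worker_c"}, "worker_c": {"worker_a"}}
```

A check like `find_cycle(cyclic)` returning a non-empty list would be a reason to fail a build or reject a configuration before any agent runs.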
Best Practices
- Start with a single agent; extract to multi-agent only when a specific limitation is measured
- Workers should have a single responsibility: one domain, ≤10 tools, one output type
- Use small models for workers on focused tasks; use frontier models for orchestrator planning
- Carry a correlation ID through all agent invocations for distributed tracing
- Design explicit failure handling at the orchestrator for each class of worker failure
- Version worker agents and test orchestrator-worker interface compatibility before deployment
- Gate all External-class tool calls behind HITL, regardless of which agent makes the call
Alternatives
| Approach | When to Choose | Trade-off |
|---|---|---|
| Single agent | Task fits in one context window; <15 tools needed | Simpler; no coordination overhead |
| Sequential chain | Task has no parallel work; steps are known upfront | Predictable; no dynamic decomposition |
| Orchestrator-worker | Parallelizable subtasks; clear role separation | Coordination overhead; requires failure handling |
| Hierarchical multi-agent | Complex workflows with multiple layers of decomposition | Maximum scalability; highest operational complexity |
| Peer-to-peer handoff | Specialist routing; each agent decides when to escalate | Flexible; requires careful loop prevention |
Trade-offs
| Dimension | Advantage | Cost |
|---|---|---|
| Specialization | Agents excel in their domain | Coordination protocol required |
| Parallelism | Independent tasks proceed simultaneously | Shared state management complexity |
| Scale | Context saturation avoided | Inter-agent latency overhead |
| Resilience | Worker failures are local | Failure propagation design required |
| Observability | Each agent's behavior is auditable | Distributed tracing infrastructure required |
Interview Questions
Q1: When does a single agent become a multi-agent system, and how do you make that decision?
Category: Architecture / System Design | Difficulty: Principal | Role: AI Architect
Answer Framework:
Three specific conditions justify the transition: (1) tool count exceeds ~15, degrading selection accuracy; (2) context saturation occurs frequently in production — the single agent's context window fills before the task completes; (3) there is parallel work that is being serialized unnecessarily.
The decision process is empirical, not intuitive. Measure: what is the agent's tool selection error rate? What is the frequency of context overflow? Is there measurable latency from sequential execution of independent tasks? If no specific, measured problem exists, the agent is not ready for multi-agent decomposition.
The transition adds coordination overhead (latency, cost, failure handling complexity). If you cannot articulate which specific limitation you are solving and how the multi-agent architecture addresses it, you are adding complexity without benefit.
Red Flags: "Multi-agent is just better" — not true. "We're planning for scale we don't have yet" — premature optimization.
Q2: How do you establish trust boundaries between agents in a multi-agent system?
Category: Security / Architecture | Difficulty: Principal | Role: AI Architect
Answer Framework:
Trust between agents is not automatic — it must be designed explicitly. The threat model has two components: (1) a compromised or hallucinating orchestrator could send malicious instructions to workers; (2) a malicious agent in the system could exceed its intended authorization.
The defense is scope validation at each agent boundary: every worker validates that the task it receives is within its defined scope before executing any tool. If the orchestrator tells the EHR Worker to "also submit the prior auth to the payer," the EHR Worker should refuse, because `submit_to_payer` is not in its tool registry.
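Scope validation at the worker boundary can be sketched as an allow-list check that runs before any tool executes. The `Worker` class, the `OutOfScopeError` exception, and the `read_patient_record` tool name are hypothetical; `submit_to_payer` is the tool named in the example above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


class OutOfScopeError(Exception):
    """Raised when a task requests a tool outside the worker's registry."""


@dataclass(frozen=True)
class Worker:
    name: str
    allowed_tools: FrozenSet[str]

    def execute(self, requested_tools: List[str]) -> List[str]:
        # Validate the whole request first; run nothing if any tool is denied.
        denied = [t for t in requested_tools if t not in self.allowed_tools]
        if denied:
            raise OutOfScopeError(f"{self.name} refuses tools: {denied}")
        # Stand-in for real tool execution.
        return [f"{self.name} ran {t}" for t in requested_tools]


ehr_worker = Worker("ehr_worker", frozenset({"read_patient_record"}))
```

Rejecting the entire request when any tool is out of scope, rather than running the permitted subset, avoids partially executing an instruction that the worker has reason to distrust.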
In addition, agents should not trust each other's identity without authentication. In distributed deployments, use mTLS or signed messages between agents. The system that spawns agents (LangGraph, CrewAI, or a custom orchestration layer) should be the trust anchor, not the agents themselves.
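Message signing between agents can be illustrated with an HMAC over a canonical JSON encoding of the payload. This is a simplified sketch that assumes a shared secret distributed by the orchestration layer; the function names and message shape are hypothetical, and a distributed deployment might prefer mTLS or asymmetric signatures instead.

```python
import hashlib
import hmac
import json


def _canonical(payload: dict) -> bytes:
    # Stable encoding so sender and receiver hash identical bytes.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(payload: dict, key: bytes) -> dict:
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_message(message: dict, key: bytes) -> bool:
    expected = hmac.new(key, _canonical(message["payload"]), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(expected, message["signature"])


# Placeholder key for illustration only; a real key would come from a
# secret store managed by the trust anchor, never from source code.
DEMO_KEY = b"demo-key-not-for-production"
signed = sign_message({"task": "fetch_record", "sender": "orchestrator"}, DEMO_KEY)
```

A worker would call `verify_message` before acting on a task and discard any message that fails verification, so that a payload altered in transit or sent by an agent without the key is never executed.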
Key Takeaways
- Multi-agent systems are warranted when single-agent limitations are specifically measured: tool count, context saturation, or parallelism bottlenecks
- Three topologies: orchestrator-worker (central coordinator), hierarchical (nested coordination), peer-to-peer (specialist handoff)
- Workers should have a single responsibility, a focused tool set (≤10 tools), and a well-defined interface
- Coordination overhead is real — use small models for workers, frontier models for orchestrator reasoning
- Trust between agents is not automatic — validate task scope at each worker boundary
- Distributed tracing with correlation IDs is required to debug multi-agent workflows
- External-class tools always require HITL regardless of which agent in the system calls them
Further Reading
In This Repository:
- Agent Architecture Fundamentals — The single-agent loop that multi-agent builds upon
- LangGraph Deep Dive — Production framework for multi-agent state machines
- Human-in-the-Loop — HITL design in multi-agent workflows
- Agentic Security — Trust model and privilege escalation threats
- architecture/mermaid/02-multi-agent-topology.mmd — Topology diagrams
External References:
- LangGraph Multi-Agent documentation — official framework reference
- CrewAI documentation — alternative framework with built-in multi-agent support