Multi-Agent Systems

Executive Summary

Multi-agent systems distribute complex work across specialized agents — each with a focused set of tools, a narrow domain of responsibility, and a well-defined interface with other agents. They are the architectural response to the limitations of single-agent systems: context window saturation, tool count limits, parallel work requirements, and specialization needs. This chapter covers the three primary multi-agent topologies (orchestrator-worker, hierarchical, and peer-to-peer), communication patterns, shared state management, and the trust model that governs inter-agent interactions. AI architects designing enterprise automation platforms and engineering leaders evaluating agentic AI scale-out strategies should read this chapter.

Learning Objectives

Identify the conditions that justify multi-agent architecture over a single agent
Describe the three primary multi-agent topologies and their appropriate use cases
Design an orchestrator-worker system with explicit task delegation and result aggregation
Define trust boundaries and authorization policies for inter-agent communication
Evaluate the operational complexity cost of multi-agent systems

Business Problem

Single-agent systems break down at scale in four ways:

Context saturation: A 20-tool agent processing a complex research task accumulates tool results that overflow even large context windows
Specialization limits: One agent cannot simultaneously be an expert in clinical criteria, payer policies, prior authorization workflows, and EHR data structures
Sequential bottleneck: When subtasks are independent, a single agent doing them sequentially wastes time that parallel execution could save
Error blast radius: A single agent error affects the entire workflow; specialized sub-agents fail locally without corrupting the overall state

Multi-agent systems solve these problems by decomposing work across specialized agents that collaborate — at the cost of coordination overhead that must be explicitly designed.

Why This Technology Exists

Early agent systems (2023) hit a practical ceiling: a single agent with 30 tools, operating over hours, accumulating hundreds of tool results, consistently produced context overflow errors, degraded reasoning quality, and unpredictable behavior. The solution, borrowed from distributed systems architecture, was decomposition: break the problem into sub-problems, assign each to a specialized component, and coordinate via explicit interfaces.

The parallel to microservices is instructive: a monolithic service can do everything, but it becomes unmaintainable at scale. Microservices decompose by bounded context and communicate via defined APIs. Multi-agent systems decompose by reasoning context and communicate via structured messages. The same engineering principles apply: single responsibility, clear interfaces, independent deployability, and observable communication.

Conceptual Explanation

When Multi-Agent Architecture is Warranted

Three conditions justify the coordination overhead of multi-agent systems:

Work can be parallelized: Independent subtasks that could proceed simultaneously are being bottlenecked in a single-agent sequential loop
Tool count exceeds ~15 per agent: Tool selection accuracy degrades significantly above this threshold; specialization restores precision
Domain specialization produces meaningful quality gains: A clinical agent configured with a clinical system prompt and clinical tools outperforms a general agent given all tools

The Three Topologies

Orchestrator-Worker: A central orchestrator agent decomposes the goal, delegates subtasks to specialized worker agents, and aggregates results. Workers report back to the orchestrator; they do not communicate with each other directly. Best for: tasks with clear decomposition, moderate parallelism, and sequential dependency between phases.
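The orchestrator-worker flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the worker functions, their names, and the naive "decomposition" (every worker receives the whole goal) are assumptions made for the example, and a real worker would wrap an LLM call and its tools.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical worker type: takes a subtask description, returns a result string.
Worker = Callable[[str], Awaitable[str]]


async def eligibility_worker(subtask: str) -> str:
    # Stand-in for an LLM-backed worker with its own narrow tool set.
    return f"eligibility checked for: {subtask}"


async def criteria_worker(subtask: str) -> str:
    # Second stand-in worker; it never calls the first one directly.
    return f"criteria matched for: {subtask}"


async def orchestrate(goal: str, workers: Dict[str, Worker]) -> Dict[str, str]:
    names = list(workers)
    # Delegate: independent subtasks run concurrently.
    results = await asyncio.gather(*(workers[name](goal) for name in names))
    # Aggregate: results flow back to the orchestrator, keyed by worker name.
    return dict(zip(names, results))


def run_example() -> Dict[str, str]:
    workers = {"eligibility": eligibility_worker, "criteria": criteria_worker}
    return asyncio.run(orchestrate("case-123", workers))
```

Calling `run_example()` returns one entry per worker. The point of the shape is that all communication passes through `orchestrate`, which is the only place results are combined.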

Hierarchical: Orchestrators can themselves be orchestrated. A top-level coordinator delegates to sub-orchestrators, which delegate to workers. Best for: complex enterprise workflows where a single orchestrator would have too many responsibilities.

Peer-to-Peer (Specialist Handoff): Agents pass tasks to each other without a central coordinator. Agent A determines that a task is outside its domain and routes it to Agent B. Best for: specialist consultation workflows where the routing logic is embedded in each agent's expertise.
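A specialist handoff loop can be sketched as below. The agents, the `Task` type, and the `max_hops` limit are hypothetical names introduced for illustration; the hop limit is one simple way to keep two agents from routing a task back and forth indefinitely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    topic: str
    path: List[str] = field(default_factory=list)  # agents visited so far


# Hypothetical agent: returns the next agent's name, or None if it handled the task.
Agent = Callable[[Task], Optional[str]]


def triage_agent(task: Task) -> Optional[str]:
    return None if task.topic == "general" else "billing"


def billing_agent(task: Task) -> Optional[str]:
    return None if task.topic == "billing" else "triage"


AGENTS: Dict[str, Agent] = {"triage": triage_agent, "billing": billing_agent}


def route(task: Task, start: str, max_hops: int = 4) -> str:
    """Follow handoffs until an agent accepts the task or the hop limit is hit."""
    current = start
    for _ in range(max_hops):
        task.path.append(current)
        next_agent = AGENTS[current](task)
        if next_agent is None:
            return current  # this agent handled the task
        current = next_agent
    raise RuntimeError(f"handoff limit exceeded: {' -> '.join(task.path)}")
```

A task that neither agent recognizes bounces between them until `route` raises, and the recorded `path` shows exactly where the loop occurred.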

Architecture Diagram

Standalone diagram: architecture/mermaid/02-multi-agent-topology.mmd

Enterprise Considerations

Coordination overhead is real. Every delegation from orchestrator to worker is an LLM call (latency + cost). A 4-worker prior auth workflow with 2 turns each uses ~9 LLM calls total (4 × 2 = 8 worker calls, plus at least one orchestrator call). At frontier model rates, this can be 10–20x the cost of a single-agent equivalent. Model tier selection is the primary lever: use a small model (Haiku-class) for workers on focused tasks, and reserve Opus-class models for the orchestrator's complex planning and aggregation steps.
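The call-count arithmetic and the effect of model tiering can be made concrete with a small calculation. The per-call costs below are hypothetical relative units chosen only to show the shape of the trade-off, not real prices, and the single orchestrator call is an assumption consistent with the ~9 total above.

```python
def llm_call_count(workers: int, turns_per_worker: int, orchestrator_calls: int = 1) -> int:
    """Total LLM calls: every worker turn plus the orchestrator's own calls."""
    return workers * turns_per_worker + orchestrator_calls


def workflow_cost(
    workers: int,
    turns_per_worker: int,
    worker_cost: int,
    orchestrator_cost: int,
    orchestrator_calls: int = 1,
) -> int:
    """Cost in hypothetical relative units per call (not real prices)."""
    return workers * turns_per_worker * worker_cost + orchestrator_calls * orchestrator_cost


# Assumed relative units: a frontier-tier call costs 10, a small-tier call costs 1.
FRONTIER, SMALL = 10, 1

calls = llm_call_count(workers=4, turns_per_worker=2)          # 4 * 2 + 1
all_frontier = workflow_cost(4, 2, FRONTIER, FRONTIER)         # every call on the frontier tier
tiered = workflow_cost(4, 2, SMALL, FRONTIER)                  # small workers, frontier orchestrator
```

Under these assumed units the tiered configuration costs a fifth of the all-frontier one, because the worker calls dominate the count.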

Failure propagation. In a single-agent system, tool failures are handled by the agent's error handling. In a multi-agent system, worker agent failures must be propagated to the orchestrator in a structured form that the orchestrator can reason about and handle. Design explicit failure modes: partial results, timeouts, hard errors, and rate limit backoffs.
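One way to give the orchestrator something it can reason about is a structured result envelope with an explicit status. The sketch below is illustrative: `WorkerResult`, `Status`, and the action names returned by `next_action` are hypothetical, and the retry limit is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    PARTIAL = "partial"            # some results, with known gaps
    TIMEOUT = "timeout"
    HARD_ERROR = "hard_error"      # not worth retrying
    RATE_LIMITED = "rate_limited"  # retry after a backoff


@dataclass
class WorkerResult:
    worker: str
    status: Status
    payload: Optional[dict] = None
    detail: str = ""
    retry_after_s: Optional[float] = None  # backoff hint for rate limits


def next_action(result: WorkerResult, attempts: int, max_attempts: int = 3) -> str:
    """Map a worker outcome to an orchestrator action."""
    if result.status is Status.OK:
        return "accept"
    if result.status is Status.PARTIAL:
        return "accept_partial"
    transient = result.status in (Status.TIMEOUT, Status.RATE_LIMITED)
    if transient and attempts < max_attempts:
        return "retry"
    return "escalate"
```

Keeping the failure class in a typed field, rather than in free text, lets the orchestrator branch on it deterministically instead of asking a model to interpret an error string.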

Observability is harder. Tracing a workflow that spans 5 agents and 20 tool calls requires distributed tracing infrastructure. Each agent invocation should carry a correlation ID linking it to the parent workflow. See Chapter 8: Agent Observability.
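Correlation ID propagation can be sketched with a small immutable context object passed along with each delegation. The type and function names here are hypothetical, and a production system would more likely rely on an existing tracing library than on hand-rolled IDs.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TraceContext:
    correlation_id: str                   # shared by every agent in one workflow
    span_id: str                          # unique per agent invocation
    parent_span_id: Optional[str] = None  # links a worker back to its caller


def new_workflow() -> TraceContext:
    """Start a trace at the top-level orchestrator."""
    return TraceContext(correlation_id=uuid.uuid4().hex, span_id=uuid.uuid4().hex[:16])


def child_of(parent: TraceContext) -> TraceContext:
    """Derive the context handed to a delegated agent."""
    return TraceContext(parent.correlation_id, uuid.uuid4().hex[:16], parent.span_id)


def log_event(ctx: TraceContext, agent: str, message: str) -> dict:
    """Build a structured log record that carries the trace fields."""
    return {
        "correlation_id": ctx.correlation_id,
        "span_id": ctx.span_id,
        "parent_span_id": ctx.parent_span_id,
        "agent": agent,
        "message": message,
    }
```

Filtering logs on `correlation_id` then reconstructs the whole workflow, while `parent_span_id` recovers which agent delegated to which.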

Agent versioning. When a worker agent's behavior changes (prompt update, tool update), it can break orchestrators that depend on its output format. Version worker agents and test orchestrator-worker compatibility before deployment.
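A compatibility check of this kind might look like the sketch below. It assumes a hypothetical convention in which each worker publishes a "major.minor" version for its output format, where a major bump signals a breaking change and a minor bump only adds fields; the source does not prescribe a scheme.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Parse a 'major.minor' string into integers."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_compatible(worker_version: str, orchestrator_expects: str) -> bool:
    """Same major version, and the worker is at least as new as expected."""
    worker_major, worker_minor = parse_version(worker_version)
    expected_major, expected_minor = parse_version(orchestrator_expects)
    return worker_major == expected_major and worker_minor >= expected_minor
```

Running a check like this in a deployment pipeline would flag a worker whose output format has moved to a new major version before an orchestrator built against the old one calls it.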

Healthcare Example


Educational Example — Illustrative Workflow. Not intended for clinical decision making.

A Reference Healthcare Organization's prior authorization multi-agent system uses a hierarchical topology:

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.

The Payer Submission Agent is a separate agent, not an additional tool on the orchestrator, because: (a) its single responsibility is payer submission, (b) it has only one tool (submit_to_payer), and (c) it always requires a HITL gate before its tool can be called. Separating it makes the HITL requirement explicit in the architecture.
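The HITL gate can be expressed directly in the tool's signature, as in this minimal sketch. Only the tool name comes from the text; the `Approval` type, the parameters, and the string return value are assumptions for illustration, and no real payer API is called.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when the tool is invoked without a matching human approval."""


@dataclass(frozen=True)
class Approval:
    reviewer: str
    case_id: str


def submit_to_payer(case_id: str, approval: Optional[Approval]) -> str:
    # The gate: refuse unless a human approved this specific case.
    if approval is None or approval.case_id != case_id:
        raise ApprovalRequired(f"human approval required for case {case_id}")
    # Placeholder for the actual submission, which is out of scope here.
    return f"submitted {case_id} (approved by {approval.reviewer})"
```

Because the approval is a required argument tied to a case ID, an agent cannot reach the submission step by reasoning alone; something outside the agent has to produce the `Approval`.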

Common Mistakes

Creating too many agents too early. Start with a single agent and extract to multi-agent only when a specific, measurable limitation is encountered. Premature decomposition adds coordination overhead without benefit.

Workers that are too general. A "Research Worker" that can do anything defeats the purpose of specialization. Workers should be narrowly focused: one domain, one set of tools, one type of task.

No explicit failure handling at the orchestrator level. When a worker returns an error, the orchestrator must be designed to handle it (retry, fallback, skip, escalate) — not just forward the error to the user.

Circular dependencies. Worker A calls Worker B which calls Worker C which calls Worker A. Without careful design, multi-agent systems can introduce infinite delegation loops or deadlocks. Map the dependency graph before implementation.
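Once the dependency graph is written down, checking it for cycles is mechanical. The sketch below uses a standard depth-first search with three node states; the graph format (a dict from agent name to the agents it may call) is an assumption for the example.

```python
from typing import Dict, List, Optional

UNVISITED, IN_PROGRESS, DONE = 0, 1, 2


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of agent names, or None if the graph is acyclic."""
    state = {node: UNVISITED for node in graph}
    stack: List[str] = []  # current DFS path

    def visit(node: str) -> Optional[List[str]]:
        state[node] = IN_PROGRESS
        stack.append(node)
        for callee in graph.get(node, []):
            callee_state = state.get(callee, UNVISITED)
            if callee_state == IN_PROGRESS:
                # Back edge: the callee is already on the current path.
                return stack[stack.index(callee):] + [callee]
            if callee_state == UNVISITED:
                found = visit(callee)
                if found:
                    return found
        stack.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For the example in the text, `find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})` returns `["A", "B", "C", "A"]`, which names the loop to break.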

Best Practices

Start with a single agent; extract to multi-agent only when a specific limitation is measured
Workers should have a single responsibility: one domain, ≤10 tools, one output type
Use small models for workers on focused tasks; use frontier models for orchestrator planning
Carry a correlation ID through all agent invocations for distributed tracing
Design explicit failure handling at the orchestrator for each class of worker failure
Version worker agents and test orchestrator-worker interface compatibility before deployment
Gate all External-class tool calls behind HITL, regardless of which agent makes the call

Alternatives

Approach	When to Choose	Trade-off
Single agent	Task fits in one context window; <15 tools needed	Simpler; no coordination overhead
Sequential chain	Task has no parallel work; steps are known upfront	Predictable; no dynamic decomposition
Orchestrator-worker	Parallelizable subtasks; clear role separation	Coordination overhead; requires failure handling
Hierarchical multi-agent	Complex workflows with multiple layers of decomposition	Maximum scalability; highest operational complexity
Peer-to-peer handoff	Specialist routing; each agent decides when to escalate	Flexible; requires careful loop prevention

Trade-offs

Dimension	Advantage	Cost
Specialization	Agents excel in their domain	Coordination protocol required
Parallelism	Independent tasks proceed simultaneously	Shared state management complexity
Scale	Context saturation avoided	Inter-agent latency overhead
Resilience	Worker failures are local	Failure propagation design required
Observability	Each agent's behavior is auditable	Distributed tracing infrastructure required

Interview Questions

Q1: When does a single agent become a multi-agent system, and how do you make that decision?

Category: Architecture / System Design | Difficulty: Principal | Role: AI Architect

Answer Framework:

Three specific conditions justify the transition: (1) tool count exceeds ~15, degrading selection accuracy; (2) context saturation occurs frequently in production — the single agent's context window fills before the task completes; (3) there is parallel work that is being serialized unnecessarily.

The decision process is empirical, not intuitive. Measure: what is the agent's tool selection error rate? What is the frequency of context overflow? Is there measurable latency from sequential execution of independent tasks? If no specific, measured problem exists, the agent is not ready for multi-agent decomposition.
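The measurement-first decision can be sketched as a function over production metrics. The metric names and every threshold other than the ~15-tool figure are hypothetical placeholders; each team would set its own from observed data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentMetrics:
    tool_count: int
    tool_selection_error_rate: float   # fraction of tool calls that picked the wrong tool
    context_overflow_rate: float       # fraction of runs that exhausted the context window
    serialized_latency_s: float        # time spent running independent subtasks one by one


def decomposition_reasons(
    m: AgentMetrics,
    max_tools: int = 15,               # threshold from the text
    max_error_rate: float = 0.05,      # assumed threshold
    max_overflow_rate: float = 0.01,   # assumed threshold
    max_serialized_s: float = 5.0,     # assumed threshold
) -> List[str]:
    """Return the measured limitations that justify decomposition (empty means stay single-agent)."""
    reasons: List[str] = []
    if m.tool_count > max_tools and m.tool_selection_error_rate > max_error_rate:
        reasons.append("tool selection accuracy degraded")
    if m.context_overflow_rate > max_overflow_rate:
        reasons.append("context saturation")
    if m.serialized_latency_s > max_serialized_s:
        reasons.append("independent work is serialized")
    return reasons
```

An empty list is the signal to keep the single agent. A non-empty list names the specific limitation the multi-agent design is supposed to address.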

The transition adds coordination overhead (latency, cost, failure handling complexity). If you cannot articulate which specific limitation you are solving and how the multi-agent architecture addresses it, you are adding complexity without benefit.

Red Flags: "Multi-agent is just better" — not true. "We're planning for scale we don't have yet" — premature optimization.

Q2: How do you establish trust boundaries between agents in a multi-agent system?

Category: Security / Architecture | Difficulty: Principal | Role: AI Architect

Answer Framework:

Trust between agents is not automatic — it must be designed explicitly. The threat model has two components: (1) a compromised or hallucinating orchestrator could send malicious instructions to workers; (2) a malicious agent in the system could exceed its intended authorization.

The defense is scope validation at each agent boundary: every worker validates that the task it receives is within its defined scope before executing any tool. If the orchestrator tells the EHR Worker to "also submit the prior auth to the payer," the EHR Worker should refuse, because submit_to_payer is not in its tool registry.
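Scope validation at the worker boundary can be sketched as a registry lookup that fails closed. The `Worker` class, the `read_chart` tool, and the exception name are hypothetical; the point is that the check runs in ordinary code, not in the model's judgment.

```python
from typing import Callable, Dict


class OutOfScopeError(Exception):
    """Raised when a worker is asked to use a tool outside its registry."""


class Worker:
    def __init__(self, name: str, tools: Dict[str, Callable[..., str]]):
        self.name = name
        self._tools = dict(tools)  # the worker's entire authorized tool set

    def execute(self, tool_name: str, **kwargs) -> str:
        # Validate scope before anything runs, whoever sent the request.
        if tool_name not in self._tools:
            raise OutOfScopeError(f"{self.name} has no tool {tool_name!r}")
        return self._tools[tool_name](**kwargs)


def read_chart(patient_id: str) -> str:
    # Stand-in for an EHR read; returns a placeholder string.
    return f"chart for {patient_id}"


ehr_worker = Worker("EHR Worker", {"read_chart": read_chart})
```

With this shape, `ehr_worker.execute("submit_to_payer", case_id="c1")` raises `OutOfScopeError` regardless of how the orchestrator phrased the instruction.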

In addition, agents should not trust each other's identity without authentication. In distributed deployments, use mTLS or signed messages between agents. The system that spawns agents (LangGraph, CrewAI, or a custom orchestration layer) should be the trust anchor, not the agents themselves.
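Message signing between agents can be illustrated with an HMAC over a canonical JSON encoding, using only the Python standard library. This is a simplified stand-in: it assumes a shared secret key distributed by the orchestration layer, whereas a real deployment might prefer asymmetric signatures or mTLS as mentioned above.

```python
import hashlib
import hmac
import json


def sign(message: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the message."""
    body = json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify(message: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(message, key), signature)


# Hypothetical key; in practice it would come from the trust anchor, not source code.
example_key = b"demo-key-issued-by-orchestration-layer"
example_message = {"from": "orchestrator", "to": "ehr_worker", "task": "read_chart"}
example_signature = sign(example_message, example_key)
```

A worker that calls `verify` before acting rejects any message whose content was altered in transit or that was produced without the key.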

Key Takeaways

Multi-agent systems are warranted when single-agent limitations are specifically measured: tool count, context saturation, or parallelism bottlenecks
Three topologies: orchestrator-worker (central coordinator), hierarchical (nested coordination), peer-to-peer (specialist handoff)
Workers should have a single responsibility, a focused tool set (≤10 tools), and a well-defined interface
Coordination overhead is real — use small models for workers, frontier models for orchestrator reasoning
Trust between agents is not automatic — validate task scope at each worker boundary
Distributed tracing with correlation IDs is required to debug multi-agent workflows
External-class tools always require HITL regardless of which agent in the system calls them
