Multi-Agent Systems

Executive Summary

Multi-agent systems distribute complex work across specialized agents — each with a focused set of tools, a narrow domain of responsibility, and a well-defined interface with other agents. They are the architectural response to the limitations of single-agent systems: context window saturation, tool count limits, parallel work requirements, and specialization needs. This chapter covers the three primary multi-agent topologies (orchestrator-worker, hierarchical, and peer-to-peer), communication patterns, shared state management, and the trust model that governs inter-agent interactions. AI architects designing enterprise automation platforms and engineering leaders evaluating agentic AI scale-out strategies should read this chapter.

Learning Objectives

  • Identify the conditions that justify multi-agent architecture over a single agent
  • Describe the three primary multi-agent topologies and their appropriate use cases
  • Design an orchestrator-worker system with explicit task delegation and result aggregation
  • Define trust boundaries and authorization policies for inter-agent communication
  • Evaluate the operational complexity cost of multi-agent systems

Business Problem

Single-agent systems break down at scale in four ways:

  1. Context saturation: A 20-tool agent processing a complex research task accumulates tool results that overflow even large context windows
  2. Specialization limits: One agent cannot simultaneously be an expert in clinical criteria, payer policies, prior authorization workflows, and EHR data structures
  3. Sequential bottleneck: When subtasks are independent, a single agent doing them sequentially wastes time that parallel execution could save
  4. Error blast radius: A single agent error affects the entire workflow; a failure in a specialized sub-agent can be contained locally, provided the orchestrator handles it, instead of corrupting the overall state

Multi-agent systems solve these problems by decomposing work across specialized agents that collaborate — at the cost of coordination overhead that must be explicitly designed.

Why This Technology Exists

Early agent systems (2023) hit a practical ceiling: a single agent with 30 tools, operating over hours, accumulating hundreds of tool results, consistently produced context overflow errors, degraded reasoning quality, and unpredictable behavior. The solution, borrowed from distributed systems architecture, was decomposition: break the problem into sub-problems, assign each to a specialized component, and coordinate via explicit interfaces.

The parallel to microservices is instructive: a monolithic service can do everything, but it becomes unmaintainable at scale. Microservices decompose by bounded context and communicate via defined APIs. Multi-agent systems decompose by reasoning context and communicate via structured messages. The same engineering principles apply: single responsibility, clear interfaces, independent deployability, and observable communication.

Conceptual Explanation

When Multi-Agent Architecture is Warranted

Any one of three conditions, once measured, can justify the coordination overhead of multi-agent systems:

  1. Work can be parallelized: Independent subtasks that could run simultaneously are bottlenecked in a single-agent sequential loop
  2. Tool count exceeds ~15 per agent: As a rule of thumb, tool selection accuracy degrades significantly above this threshold; splitting tools across specialists restores precision
  3. Domain specialization produces meaningful quality gains: A clinical agent configured with a clinical system prompt and clinical tools measurably outperforms a general agent given all tools
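These conditions are meant to be measured, not assumed. A minimal sketch of turning them into an explicit check follows; the metric names and the minimum quality gain are hypothetical, and the ~15-tool figure is the chapter's rule of thumb rather than a fixed limit.

```python
from dataclasses import dataclass

TOOL_COUNT_THRESHOLD = 15  # the chapter's rule of thumb, not a hard limit
MIN_QUALITY_GAIN = 0.05    # hypothetical: smallest eval-score gain worth the coordination cost


@dataclass
class SingleAgentMetrics:
    """Measurements from a production single agent (hypothetical field names)."""
    parallelizable_subtasks: int    # independent subtasks currently run one after another
    tool_count: int
    specialist_quality_gain: float  # specialist eval score minus generalist eval score


def decomposition_reasons(m: SingleAgentMetrics) -> list[str]:
    """Return the measured conditions that justify decomposition; empty means stay single-agent."""
    reasons = []
    if m.parallelizable_subtasks > 1:
        reasons.append(f"{m.parallelizable_subtasks} independent subtasks run sequentially")
    if m.tool_count > TOOL_COUNT_THRESHOLD:
        reasons.append(f"tool count {m.tool_count} exceeds ~{TOOL_COUNT_THRESHOLD}")
    if m.specialist_quality_gain >= MIN_QUALITY_GAIN:
        reasons.append(f"specialists score {m.specialist_quality_gain:+.2f} higher on evals")
    return reasons
```

An empty list is the signal to keep the single agent; each returned string names the specific limitation the decomposition is supposed to fix.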

The Three Topologies

Orchestrator-Worker: A central orchestrator agent decomposes the goal, delegates subtasks to specialized worker agents, and aggregates results. Workers report back to the orchestrator; they do not communicate with each other directly. Best for: tasks with clear decomposition, moderate parallelism, and sequential dependency between phases.

Hierarchical: Orchestrators can themselves be orchestrated. A top-level coordinator delegates to sub-orchestrators, which delegate to workers. Best for: complex enterprise workflows where a single orchestrator would have too many responsibilities.

Peer-to-Peer (Specialist Handoff): Agents pass tasks to each other without a central coordinator. Agent A determines that a task is outside its domain and routes it to Agent B. Best for: specialist consultation workflows where the routing logic is embedded in each agent's expertise.
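Because no coordinator sits above a peer-to-peer chain, loop prevention has to live in the routing itself. A minimal sketch, assuming each agent is a plain callable (hypothetical, standing in for an LLM-backed specialist) that either answers or names the next peer:

```python
from typing import Callable, Optional

# An agent returns (result, next_agent): a result to finish, or the name of a peer to hand off to.
Handoff = tuple[Optional[str], Optional[str]]
Agent = Callable[[str], Handoff]


def run_handoff_chain(agents: dict[str, Agent], start: str, task: str, max_hops: int = 5) -> str:
    """Route a task between peers with no central coordinator, refusing loops and long chains."""
    visited: list[str] = []
    current = start
    for _ in range(max_hops):
        if current in visited:
            raise RuntimeError(f"handoff loop detected: {' -> '.join(visited + [current])}")
        if current not in agents:
            raise KeyError(f"unknown agent: {current}")
        visited.append(current)
        result, next_agent = agents[current](task)
        if result is not None:
            return result
        if next_agent is None:
            raise RuntimeError(f"{current} neither answered nor handed off")
        current = next_agent
    raise RuntimeError(f"max hops ({max_hops}) exceeded: {' -> '.join(visited)}")
```

The visited list catches A → B → A ping-pong, and the hop cap bounds chains that never repeat but never finish either.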

Architecture Diagram

Standalone diagram: architecture/mermaid/02-multi-agent-topology.mmd

Components

Orchestrator Agent

The orchestrator is responsible for three functions:

  1. Goal decomposition: Breaking the high-level task into subtasks with clear success criteria
  2. Task delegation: Assigning subtasks to appropriate worker agents with required context
  3. Result aggregation: Combining worker results into a coherent response or decision

The orchestrator must know what workers are available, what each can do, and what context each needs. This knowledge lives in the orchestrator's system prompt and tool registry (where workers are exposed as tools).

Worker Agents

Workers are specialized agents with:

  • A focused system prompt for their domain (clinical, payer policy, documentation)
  • A small, relevant tool set (3–10 tools)
  • A well-defined input/output contract with the orchestrator

Workers should not be aware of each other or the overall workflow context — they receive a task, execute it, and return a result. This constraint preserves independence and simplifies reasoning.

Communication Layer

Agents communicate via structured messages: task delegation (orchestrator → worker), status updates (worker → orchestrator), and results (worker → orchestrator). The communication format should be machine-parseable (JSON or Pydantic models) to enable reliable aggregation.
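A machine-parseable format pays off when the orchestrator rejects malformed worker output before aggregating it. The sketch below uses stdlib dataclasses to stay dependency-free; Pydantic models would express the same validation declaratively. The message and field names are illustrative, not a standard.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Union


@dataclass
class TaskDelegation:
    """Orchestrator -> worker message (illustrative field names)."""
    worker: str
    task: str
    inputs: dict[str, Any]
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class WorkerResult:
    """Worker -> orchestrator message; status must be one of ALLOWED_STATUSES."""
    correlation_id: str
    status: str
    output: dict[str, Any]


ALLOWED_STATUSES = {"ok", "partial", "error"}


def to_wire(msg: Union[TaskDelegation, WorkerResult]) -> str:
    """Serialize a message to JSON for transport."""
    return json.dumps(asdict(msg))


def parse_result(raw: str) -> WorkerResult:
    """Parse and validate a worker result; raise ValueError instead of aggregating bad data."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("result must be a JSON object")
    missing = {"correlation_id", "status", "output"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"bad status: {data['status']!r}")
    if not isinstance(data["output"], dict):
        raise ValueError("output must be an object")
    return WorkerResult(data["correlation_id"], data["status"], data["output"])
```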

Shared State Store

When one worker's output becomes another worker's input (even when it is routed through the orchestrator), a shared state store is needed. LangGraph's StateGraph is a widely used option: it defines a typed state object shared across all nodes (agents), with each node reading its inputs from, and writing its outputs to, named state fields.
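The core idea can be shown without the framework. This is a stdlib sketch of the pattern, not LangGraph's API: a typed state, nodes that return partial updates, and a runner that merges them. The per-node key ownership check is an added assumption that keeps one worker from overwriting another's fields.

```python
from typing import Any, Callable, TypedDict


class PriorAuthState(TypedDict, total=False):
    """Typed shared state; each key is written by exactly one node."""
    patient_summary: dict[str, Any]
    clinical_eval: dict[str, Any]
    draft_letter: str


Node = Callable[[PriorAuthState], dict[str, Any]]


def run_pipeline(nodes: list[tuple[str, Node, set[str]]], state: PriorAuthState) -> PriorAuthState:
    """Run nodes in order; each reads the state and returns updates only for the keys it owns."""
    for name, node, writes in nodes:
        update = node(state)
        illegal = set(update) - writes
        if illegal:
            raise PermissionError(f"{name} wrote keys it does not own: {sorted(illegal)}")
        state = {**state, **update}  # merge the partial update into a new state object
    return state
```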

Implementation Patterns

Pattern 1: Orchestrator with Worker Agents as Tools

The simplest multi-agent pattern: worker agents are invoked as tools by the orchestrator. Each "tool call" spins up a worker agent sub-session.

```python
"""
Orchestrator-worker pattern: workers exposed as tools to the orchestrator.
Educational Example — Reference Implementation.
Not intended for clinical decision making.
"""
import anthropic
import json
from typing import Any

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


# ── Worker agent implementations ────────────────────────────────────────────

def run_ehr_worker(patient_id: str, data_types: list[str]) -> dict[str, Any]:
    """
    Worker agent: retrieves and summarizes patient EHR data.
    In production: calls FHIR R4 API with appropriate auth.
    """
    # Stub: in production, call real EHR integration
    return {
        "patient_id": patient_id,
        "diagnoses": ["Type 2 Diabetes", "Hypertension"],
        "medications": ["Metformin 1000mg BID", "Lisinopril 10mg daily"],
        "recent_hba1c": "7.8%",
        "data_types_retrieved": data_types,
    }


def run_clinical_eval_worker(
    patient_summary: dict,
    procedure_code: str,
    guideline_query: str,
) -> dict[str, Any]:
    """
    Worker agent: evaluates clinical criteria for a prior auth request.
    In production: retrieves guidelines from vector store + evaluates.
    """
    # Stub: fixed result for illustration
    return {
        "procedure_code": procedure_code,
        "criteria_met": True,
        "clinical_rationale": (
            f"Patient meets clinical criteria based on diagnosis history. "
            f"HbA1c of {patient_summary.get('recent_hba1c', 'N/A')} supports medical necessity."
        ),
        "supporting_guidelines": ["ADA Standards of Medical Care 2025, Section 9"],
    }


def run_payer_policy_worker(procedure_code: str, clinical_eval: dict) -> dict[str, Any]:
    """
    Worker agent: checks the request against the payer's coverage policy.
    In production: queries the payer policy database.
    """
    # Illustrative stub: gives delegate_documentation a worker-produced
    # payer_criteria input, so the orchestrator never has to invent one.
    return {
        "procedure_code": procedure_code,
        "policy_criteria_met": bool(clinical_eval.get("criteria_met")),
        "policy_reference": "Reference payer policy (illustrative stub)",
    }


def run_documentation_worker(
    patient_id: str,
    clinical_eval: dict,
    payer_criteria: dict,
) -> dict[str, Any]:
    """Worker agent: drafts the prior authorization determination letter."""
    return {
        "draft_letter": (
            f"Re: Prior Authorization Request — Patient {patient_id}\n\n"
            f"Clinical criteria met: {clinical_eval.get('criteria_met')}\n"
            f"Rationale: {clinical_eval.get('clinical_rationale')}\n\n"
            f"This draft requires physician review before submission."
        ),
        "status": "DRAFT_PENDING_REVIEW",
    }


# ── Orchestrator tool definitions ────────────────────────────────────────────

ORCHESTRATOR_TOOLS = [
    {
        "name": "delegate_ehr_retrieval",
        "description": (
            "Delegate EHR data retrieval to the EHR Worker Agent. "
            "Use this when you need patient clinical data (diagnoses, medications, labs) "
            "to support a prior authorization evaluation. Returns a structured patient summary."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "patient_id": {"type": "string", "description": "Patient identifier"},
                "data_types": {
                    "type": "array",
                    "items": {"type": "string", "enum": ["diagnoses", "medications", "labs", "vitals"]},
                    "description": "Which data types to retrieve"
                }
            },
            "required": ["patient_id", "data_types"]
        }
    },
    {
        "name": "delegate_clinical_evaluation",
        "description": (
            "Delegate clinical criteria evaluation to the Clinical Evaluator Agent. "
            "Use this after retrieving patient data to evaluate whether the patient meets "
            "clinical criteria for the requested procedure. Returns evaluation result and rationale."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "patient_summary": {"type": "object", "description": "Patient summary from EHR retrieval"},
                "procedure_code": {"type": "string", "description": "CPT or HCPCS procedure code"},
                "guideline_query": {"type": "string", "description": "Query for clinical guideline lookup"}
            },
            "required": ["patient_summary", "procedure_code", "guideline_query"]
        }
    },
    {
        "name": "delegate_payer_policy_check",
        "description": (
            "Delegate payer policy matching to the Payer Policy Agent. "
            "Use this after clinical evaluation to check the request against the payer's "
            "coverage policy. Returns the payer policy match result."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "procedure_code": {"type": "string", "description": "CPT or HCPCS procedure code"},
                "clinical_eval": {"type": "object", "description": "Clinical evaluation result"}
            },
            "required": ["procedure_code", "clinical_eval"]
        }
    },
    {
        "name": "delegate_documentation",
        "description": (
            "Delegate letter drafting to the Documentation Agent. "
            "Use this after clinical and payer evaluation is complete to generate "
            "the prior authorization determination letter draft. Returns the draft letter."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "patient_id": {"type": "string"},
                "clinical_eval": {"type": "object", "description": "Clinical evaluation result"},
                "payer_criteria": {
                    "type": "object",
                    "description": "Payer policy match result from delegate_payer_policy_check"
                }
            },
            "required": ["patient_id", "clinical_eval", "payer_criteria"]
        }
    }
]

# Maps each orchestrator tool name to the worker that handles it
WORKERS = {
    "delegate_ehr_retrieval": run_ehr_worker,
    "delegate_clinical_evaluation": run_clinical_eval_worker,
    "delegate_payer_policy_check": run_payer_policy_worker,
    "delegate_documentation": run_documentation_worker,
}


def orchestrator_execute_tool(tool_name: str, tool_input: dict) -> tuple[str, bool]:
    """Route an orchestrator tool call to its worker; return (JSON result, is_error)."""
    worker = WORKERS.get(tool_name)
    if worker is None:
        return json.dumps({"error": f"Unknown delegation target: {tool_name}"}), True
    try:
        return json.dumps(worker(**tool_input)), False
    except Exception as exc:  # bad arguments or worker failure: report it, don't crash the loop
        return json.dumps({"error": f"{tool_name} failed: {type(exc).__name__}: {exc}"}), True


ORCHESTRATOR_SYSTEM = """You are a Prior Authorization Orchestrator for a Reference Healthcare Organization.
Educational Example — Not intended for clinical decision making.

Your role is to coordinate the prior authorization evaluation process by delegating to specialized worker agents.

Workflow:
1. Use delegate_ehr_retrieval to get patient clinical data
2. Use delegate_clinical_evaluation with the retrieved data to assess clinical criteria
3. Use delegate_payer_policy_check with the clinical evaluation to check payer policy
4. Use delegate_documentation to generate the determination letter draft
5. Summarize the complete evaluation for physician review

Always complete all four delegation steps before presenting your summary.
If a delegation returns an error, state it in your summary instead of inventing the missing result."""


def run_orchestrator(prior_auth_request: dict) -> str:
    """Run the orchestrator agent for a prior authorization request."""
    user_message = (
        f"Process this prior authorization request:\n"
        f"Patient ID: {prior_auth_request['patient_id']}\n"
        f"Procedure: {prior_auth_request['procedure_code']} — {prior_auth_request['procedure_name']}\n"
        f"Requesting clinician: {prior_auth_request['clinician_role']}"
    )

    messages = [{"role": "user", "content": user_message}]

    for _ in range(15):  # max iterations circuit breaker
        response = client.messages.create(
            model="claude-opus-4-8",  # Verify current model ID at docs.anthropic.com
            max_tokens=4096,
            system=ORCHESTRATOR_SYSTEM,
            tools=ORCHESTRATOR_TOOLS,
            messages=messages,
        )

        text = "\n".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason == "end_turn":
            return text or "Orchestration complete."
        if response.stop_reason != "tool_use":
            # e.g. max_tokens: stop here rather than send a user turn with no tool results
            return f"Orchestration stopped early (stop_reason={response.stop_reason}). {text}".strip()

        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"[Orchestrator] Delegating to: {block.name}")
                result, is_error = orchestrator_execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                    "is_error": is_error,  # lets the orchestrator see the delegation failed
                })

        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return "Orchestration reached maximum iterations."


if __name__ == "__main__":
    request = {
        "patient_id": "P-12345678",
        "procedure_code": "95810",
        "procedure_name": "Polysomnography",
        "clinician_role": "Pulmonologist"
    }
    result = run_orchestrator(request)
    print("\n=== Orchestrator Result ===")
    print(result)
```
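The worker functions in Pattern 1 are stubs that return fixed data. When a worker is itself an LLM agent, each delegation runs a small, bounded loop with its own context. A minimal sketch, assuming the Anthropic SDK's `client.messages.create` call used above; `run_worker_session` is a hypothetical helper, and the client is passed in so the loop can be exercised with a fake client.

```python
import json
from typing import Any, Callable


def run_worker_session(
    client: Any,                      # e.g. anthropic.Anthropic(); anything with messages.create
    model: str,                       # a small, Haiku-class model ID (verify at docs.anthropic.com)
    system_prompt: str,               # the worker's focused domain prompt
    tools: list[dict],                # the worker's own small tool set, not the orchestrator's
    execute_tool: Callable[[str, dict], dict],
    task: str,
    max_turns: int = 6,
) -> dict[str, Any]:
    """Run one isolated worker sub-session and return a structured result for the orchestrator."""
    messages: list[dict] = [{"role": "user", "content": task}]  # fresh context per delegation
    for _ in range(max_turns):
        response = client.messages.create(
            model=model, max_tokens=2048, system=system_prompt, tools=tools, messages=messages
        )
        if response.stop_reason != "tool_use":
            text = "".join(b.text for b in response.content if b.type == "text")
            status = "ok" if response.stop_reason == "end_turn" else "partial"
            return {"status": status, "stop_reason": response.stop_reason, "output": text}
        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": json.dumps(execute_tool(b.name, b.input))}
            for b in response.content
            if b.type == "tool_use"
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})
    return {"status": "error", "error": f"worker exceeded {max_turns} turns"}
```

Only the returned dictionary travels back to the orchestrator; the worker's intermediate tool results stay in its own discarded context, which is how decomposition keeps the orchestrator's context from saturating.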

Enterprise Considerations

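Coordination overhead is real. Every delegation from orchestrator to worker adds LLM calls (latency + cost): at least one orchestrator call to issue it and at least one worker call to execute it. A 4-worker prior auth workflow with 2 turns per worker makes 8 worker calls, plus 2 orchestrator calls if all delegations are issued at once (one to delegate, one to aggregate) or 5 if they are issued one at a time (one per delegation, plus aggregation): roughly 10 to 13 LLM calls in total. At frontier model rates, this can be 10–20x the cost of a single-agent equivalent. Model tier selection is the primary lever: use a small model (Haiku-class) for workers on focused tasks; reserve Opus-class models for the orchestrator's complex planning and aggregation steps.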
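The call count and the effect of model tiering are simple enough to compute before building anything. A minimal sketch, assuming a flat token count per call; the prices are inputs you supply, not published rates.

```python
def count_llm_calls(workers: int, turns_per_worker: int, parallel_delegation: bool) -> int:
    """LLM calls for one orchestrator-worker run, ignoring retries."""
    worker_calls = workers * turns_per_worker
    # Parallel: one call that issues every delegation, plus one aggregation call.
    # Sequential: one call per delegation, plus one final aggregation call.
    orchestrator_calls = 2 if parallel_delegation else workers + 1
    return worker_calls + orchestrator_calls


def estimate_cost_usd(
    workers: int,
    turns_per_worker: int,
    parallel_delegation: bool,
    tokens_per_call: int,
    worker_price_per_mtok: float,
    orchestrator_price_per_mtok: float,
) -> float:
    """Blended cost when workers and orchestrator run on different model tiers."""
    worker_calls = workers * turns_per_worker
    orchestrator_calls = count_llm_calls(workers, turns_per_worker, parallel_delegation) - worker_calls
    mtok = tokens_per_call / 1_000_000
    return mtok * (worker_calls * worker_price_per_mtok + orchestrator_calls * orchestrator_price_per_mtok)
```

For the 4-worker, 2-turn workflow this gives 10 calls with parallel delegation and 13 with sequential delegation. Re-running the estimate with the worker price set equal to the orchestrator price shows how much of the bill tiering removes.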

Failure propagation. In a single-agent system, tool failures are handled by the agent's error handling. In a multi-agent system, worker agent failures must be propagated to the orchestrator in a structured form that the orchestrator can reason about and handle. Design explicit failure modes: partial results, timeouts, hard errors, and rate limit backoffs.
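Explicit failure modes mean each failure class maps to one deliberate response. A minimal sketch, assuming a hypothetical `WorkerFailure` exception raised by the worker wrapper: rate limits and timeouts are retried with exponential backoff, partial results are kept and labeled, and hard errors or exhausted retries escalate. Fallback and skip would be further branches of the same policy.

```python
import time
from enum import Enum
from typing import Any, Callable, Optional


class FailureKind(Enum):
    PARTIAL = "partial"
    TIMEOUT = "timeout"
    RATE_LIMITED = "rate_limited"
    HARD_ERROR = "hard_error"


class WorkerFailure(Exception):
    """Structured worker failure the orchestrator can reason about (hypothetical type)."""

    def __init__(self, kind: FailureKind, message: str, partial: Optional[dict] = None):
        super().__init__(message)
        self.kind = kind
        self.partial = partial


def call_with_policy(
    worker: Callable[[], dict[str, Any]],
    max_retries: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Apply one explicit handling policy per failure class; other exceptions propagate."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return {"status": "ok", "output": worker()}
        except WorkerFailure as f:
            if f.kind is FailureKind.PARTIAL:
                # keep what the worker produced and label it as incomplete
                return {"status": "partial", "output": f.partial or {}, "note": str(f)}
            if f.kind in (FailureKind.RATE_LIMITED, FailureKind.TIMEOUT) and attempt < max_retries:
                sleep(base_delay_s * 2 ** attempt)  # exponential backoff, then retry
                continue
            # hard error, or retries exhausted: escalate instead of guessing
            return {"status": "escalate", "kind": f.kind.value, "error": str(f)}
```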

Observability is harder. Tracing a workflow that spans 5 agents and 20 tool calls requires distributed tracing infrastructure. Each agent invocation should carry a correlation ID linking it to the parent workflow. See Chapter 8: Agent Observability.
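A correlation ID only helps if it is created once per workflow and attached everywhere automatically. A stdlib-only sketch, assuming one workflow per execution context; a production system would typically use a distributed tracing library instead, and the `delegate` helper here is hypothetical.

```python
import contextvars
import logging
import uuid

# Holds the current workflow's correlation ID; contextvars are also isolated per asyncio task.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamp every log record with the active correlation ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


logger = logging.getLogger("agents")
handler = logging.StreamHandler()
handler.addFilter(CorrelationFilter())
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def start_workflow() -> str:
    """Create the ID once, at the workflow entry point."""
    cid = str(uuid.uuid4())
    correlation_id.set(cid)
    return cid


def delegate(worker_name: str, payload: dict) -> dict:
    """Log the delegation and attach the ID so the worker can log it too."""
    logger.info("delegating to %s", worker_name)
    return {**payload, "correlation_id": correlation_id.get()}
```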

Agent versioning. When a worker agent's behavior changes (prompt update, tool update), it can break orchestrators that depend on its output format. Version worker agents and test orchestrator-worker compatibility before deployment.
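Compatibility testing is easier when each worker declares a contract version for its output. A minimal sketch, assuming a semver-style convention (hypothetical): minor bumps may only add fields, while removed or renamed fields require a major bump. The required field names are taken from the clinical evaluation worker in Pattern 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContractVersion:
    major: int
    minor: int

    @classmethod
    def parse(cls, text: str) -> "ContractVersion":
        major, minor = text.split(".")[:2]
        return cls(int(major), int(minor))


def is_compatible(orchestrator_expects: str, worker_provides: str) -> bool:
    """Same major version, and at least the minor version the orchestrator was tested against."""
    want = ContractVersion.parse(orchestrator_expects)
    have = ContractVersion.parse(worker_provides)
    return have.major == want.major and have.minor >= want.minor


# Fields every output of a given major version must contain (clinical evaluation worker).
REQUIRED_FIELDS = {"1": {"procedure_code", "criteria_met", "clinical_rationale"}}


def missing_fields(output: dict, version: str) -> list[str]:
    """Required fields absent from a worker output; run in CI against recorded outputs."""
    return sorted(REQUIRED_FIELDS[version.split(".")[0]] - output.keys())
```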

Security Considerations

Inter-agent trust. An orchestrator and its worker agents do not automatically trust each other. Messages from the orchestrator are just text from the LLM's perspective; a compromised or hallucinating orchestrator could send malicious task descriptions to workers. Workers must validate that the task they receive is within their defined scope.

Privilege escalation. A worker agent should have only the permissions needed for its specific role. The EHR worker should read patient data; it should not be able to submit prior auth decisions. If the orchestrator instructs a worker to perform an action outside its authorization scope, the worker must refuse and report the attempt.
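Scope checks are most reliable when enforced in code on the tool-execution path rather than in the worker's prompt, since injected text can talk a model out of a prompt instruction. A minimal sketch; `WorkerScope`, `execute_for_worker`, and the FHIR tool names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WorkerScope:
    """Declared authorization scope for one worker (hypothetical structure)."""
    name: str
    allowed_tools: frozenset
    violations: list = field(default_factory=list)

    def authorize(self, requested_tool: str) -> bool:
        """Allow only tools in this worker's scope; record anything else for reporting."""
        if requested_tool in self.allowed_tools:
            return True
        self.violations.append(requested_tool)
        return False


def execute_for_worker(scope: WorkerScope, tool: str, run_tool: Callable[[], Any]) -> dict:
    """Refuse out-of-scope calls and report them instead of executing."""
    if not scope.authorize(tool):
        return {"status": "refused", "error": f"{tool} is outside {scope.name}'s scope"}
    return {"status": "ok", "output": run_tool()}


EHR_SCOPE = WorkerScope("ehr_worker", frozenset({"fhir_read_condition", "fhir_read_medication"}))
```

The `violations` list is what the worker reports back to the orchestrator and the security log, so an attempted escalation is visible rather than silently dropped.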

Communication channel security. In distributed deployments, inter-agent messages travel over a network. Use mTLS or an authenticated message bus (Kafka, SQS) between agents. Do not use plaintext HTTP for inter-agent communication in production.
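mTLS authenticates the connection; signing authenticates each message end to end, which matters when messages pass through a broker. A minimal HMAC-SHA256 sketch, assuming a per-sender shared key distributed out of band (key management is not shown). The freshness window limits, but does not fully prevent, replay.

```python
import hashlib
import hmac
import json
import time


def _canonical(envelope: dict) -> bytes:
    """Deterministic serialization so sender and receiver hash identical bytes."""
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode()


def sign_message(body: dict, sender: str, key: bytes) -> dict:
    """Wrap a message in an envelope with sender, timestamp, and signature."""
    envelope = {"sender": sender, "sent_at": int(time.time()), "body": body}
    envelope["signature"] = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    return envelope


def verify_message(envelope: dict, key: bytes, max_age_s: int = 300) -> dict:
    """Return the body if the signature is valid and the message is fresh; raise otherwise."""
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    expected = hmac.new(key, _canonical(unsigned), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope.get("signature", "")):
        raise PermissionError("bad signature")
    if time.time() - unsigned["sent_at"] > max_age_s:
        raise PermissionError("message too old (possible replay)")
    return unsigned["body"]
```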

Healthcare Example


Educational Example — Illustrative Workflow. Not intended for clinical decision making.

A Reference Healthcare Organization's prior authorization multi-agent system uses a hierarchical topology:

```text
Top-level orchestrator (receives the prior auth request)
    ├── EHR Worker Agent (FHIR R4 queries only)
    ├── Clinical Evaluator Agent (guideline RAG + criteria evaluation)
    ├── Payer Policy Agent (payer database queries)
    └── Documentation Agent (letter generation)
         └── Physician Review (HITL interrupt before submission)
              └── Payer Submission Agent (submit_to_payer tool — External class)
```

The Payer Submission Agent is a separate agent, not an additional tool on the orchestrator, because: (a) its single responsibility is payer submission, (b) it has only one tool (submit_to_payer), and (c) it always requires a HITL gate before its tool can be called. Separating it makes the HITL requirement explicit in the architecture.
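Putting the gate in the shared tool-execution layer, rather than in any one agent, is what makes it hold regardless of which agent issues the call. A minimal sketch; the tool-class table, `Approval`, and `call_tool` are hypothetical, and unknown tools default to the gated class.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Tool classes; "External" tools act outside the organization and are always gated.
TOOL_CLASSES = {"submit_to_payer": "External", "fhir_read": "Read"}


@dataclass
class Approval:
    """A recorded physician decision on one specific draft (hypothetical structure)."""
    reviewer: str
    approved: bool
    draft_id: str


def call_tool(name: str, args: dict, run: Callable[[dict], Any], approval: Optional[Approval] = None) -> dict:
    """Execute a tool, requiring a matching human approval for External-class tools."""
    if TOOL_CLASSES.get(name, "External") == "External":  # unknown tools are treated as External
        if approval is None or not approval.approved:
            return {"status": "awaiting_human_review", "tool": name}
        if approval.draft_id != args.get("draft_id"):
            return {"status": "rejected", "error": "approval does not match this draft"}
    return {"status": "ok", "output": run(args)}
```

Binding the approval to a draft ID keeps a physician's sign-off on one letter from authorizing the submission of a different one.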

Common Mistakes

Creating too many agents too early. Start with a single agent and extract to multi-agent only when a specific, measurable limitation is encountered. Premature decomposition adds coordination overhead without benefit.

Workers that are too general. A "Research Worker" that can do anything defeats the purpose of specialization. Workers should be narrowly focused: one domain, one set of tools, one type of task.

No explicit failure handling at the orchestrator level. When a worker returns an error, the orchestrator must be designed to handle it (retry, fallback, skip, escalate) — not just forward the error to the user.

Circular dependencies. Worker A calls Worker B which calls Worker C which calls Worker A. Without careful design, multi-agent systems can introduce deadlocks. Map the dependency graph before implementation.
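Mapping the dependency graph can be made mechanical: write down which agent may call which, then check for cycles before implementation. A minimal sketch using depth-first search, where the graph format is an assumption (agent name to list of callees):

```python
from typing import Optional


def find_cycle(calls: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one call cycle (e.g. ['A', 'B', 'C', 'A']) in an agent dependency graph, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, fully explored
    color = {node: WHITE for node in calls}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GREY
        path.append(node)
        for callee in calls.get(node, []):
            state = color.get(callee, WHITE)
            if state == GREY:  # back edge: the callee is already on the current path
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(calls):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

Running this in CI against the declared call graph catches a cycle when someone adds a new delegation edge, not when the system deadlocks in production.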

Best Practices

  • Start with a single agent; extract to multi-agent only when a specific limitation is measured
  • Workers should have a single responsibility: one domain, ≤10 tools, one output type
  • Use small models for workers on focused tasks; use frontier models for orchestrator planning
  • Carry a correlation ID through all agent invocations for distributed tracing
  • Design explicit failure handling at the orchestrator for each class of worker failure
  • Version worker agents and test orchestrator-worker interface compatibility before deployment
  • Gate all External-class tool calls behind HITL, regardless of which agent makes the call

Alternatives

| Approach | When to Choose | Trade-off |
|---|---|---|
| Single agent | Task fits in one context window; <15 tools needed | Simpler; no coordination overhead |
| Sequential chain | Task has no parallel work; steps are known upfront | Predictable; no dynamic decomposition |
| Orchestrator-worker | Parallelizable subtasks; clear role separation | Coordination overhead; requires failure handling |
| Hierarchical multi-agent | Complex workflows with multiple layers of decomposition | Maximum scalability; highest operational complexity |
| Peer-to-peer handoff | Specialist routing; each agent decides when to escalate | Flexible; requires careful loop prevention |

Trade-offs

| Dimension | Advantage | Cost |
|---|---|---|
| Specialization | Agents excel in their domain | Coordination protocol required |
| Parallelism | Independent tasks proceed simultaneously | Shared state management complexity |
| Scale | Context saturation avoided | Inter-agent latency overhead |
| Resilience | Worker failures are local | Failure propagation design required |
| Observability | Each agent's behavior is auditable | Distributed tracing infrastructure required |

Interview Questions

Q1: When does a single agent become a multi-agent system, and how do you make that decision?

Category: Architecture / System Design
Difficulty: Principal
Role: AI Architect

Answer Framework:

Three specific conditions justify the transition: (1) tool count exceeds ~15, degrading selection accuracy; (2) context saturation occurs frequently in production — the single agent's context window fills before the task completes; (3) there is parallel work that is being serialized unnecessarily.

The decision process is empirical, not intuitive. Measure: what is the agent's tool selection error rate? What is the frequency of context overflow? Is there measurable latency from sequential execution of independent tasks? If no specific, measured problem exists, the agent is not ready for multi-agent decomposition.

The transition adds coordination overhead (latency, cost, failure handling complexity). If you cannot articulate which specific limitation you are solving and how the multi-agent architecture addresses it, you are adding complexity without benefit.

Red Flags: "Multi-agent is just better" — not true. "We're planning for scale we don't have yet" — premature optimization.


Q2: How do you establish trust boundaries between agents in a multi-agent system?

Category: Security / Architecture
Difficulty: Principal
Role: AI Architect

Answer Framework:

Trust between agents is not automatic — it must be designed explicitly. The threat model has two components: (1) a compromised or hallucinating orchestrator could send malicious instructions to workers; (2) a malicious agent in the system could exceed its intended authorization.

The defense is scope validation at each agent boundary: every worker validates that the task it receives is within its defined scope before executing any tool. If the orchestrator tells the EHR Worker to "also submit the prior auth to the payer," the EHR Worker should refuse: submit_to_payer is not in its tool registry.

In addition, agents should not trust each other's identity without authentication. In distributed deployments, use mTLS or signed messages between agents. The system that spawns agents (LangGraph, CrewAI, or a custom orchestration layer) should be the trust anchor, not the agents themselves.

Key Takeaways

  • Multi-agent systems are warranted when single-agent limitations are specifically measured: tool count, context saturation, or parallelism bottlenecks
  • Three topologies: orchestrator-worker (central coordinator), hierarchical (nested coordination), peer-to-peer (specialist handoff)
  • Workers should have a single responsibility, a focused tool set (≤10 tools), and a well-defined interface
  • Coordination overhead is real — use small models for workers, frontier models for orchestrator reasoning
  • Trust between agents is not automatic — validate task scope at each worker boundary
  • Distributed tracing with correlation IDs is required to debug multi-agent workflows
  • External-class tools always require HITL regardless of which agent in the system calls them

Glossary

| Term | Definition |
|---|---|
| Orchestrator | An agent that decomposes a goal and delegates subtasks to worker agents |
| Worker agent | A specialized agent that executes a specific subtask assigned by an orchestrator |
| Task delegation | The act of an orchestrator assigning a subtask to a worker with required context |
| Correlation ID | A unique identifier carried through all agent invocations in a workflow for distributed tracing |
| Privilege escalation | An attack where an agent exceeds its authorized capabilities, often via orchestrator manipulation |
| Peer-to-peer handoff | An agent topology where agents route tasks to each other based on domain expertise |

Further Reading

In This Repository:

External References:

  • LangGraph Multi-Agent documentation — official framework reference
  • CrewAI documentation — alternative framework with built-in multi-agent support
