Agent Architecture Fundamentals

Executive Summary

Agentic AI systems are LLM-powered architectures in which the model autonomously decides what actions to take, executes those actions through tools, observes the results, and continues reasoning until a goal is achieved. This chapter establishes the architectural foundations — the agent loop, tool calling mechanics, planning patterns, and the ReAct framework — that every subsequent chapter in this section builds upon. It is required reading before any framework-specific chapter (LangGraph, CrewAI, MCP). AI architects, senior engineers, and engineering leaders responsible for production agentic systems should read this chapter in full.

Learning Objectives

  • Explain what distinguishes an agent from a chain or a standard LLM API call
  • Describe the ReAct loop and why it is the dominant agent pattern
  • Identify when agents are and are not the right architectural choice
  • Explain how tool calling works at the API level for major LLM providers
  • Evaluate planning strategies and their trade-offs for enterprise workflows

Business Problem

Enterprises need AI systems that can complete multi-step, goal-directed workflows — not just answer a single question. Consider a prior authorization workflow: the AI must retrieve the patient's clinical history, look up the relevant payer policy, evaluate clinical criteria against that policy, identify missing documentation, draft a determination letter, and route it for physician review. This cannot be expressed as a single prompt. It requires sequential decision-making, tool invocation, and state persistence across multiple steps.

Traditional LLM deployments (RAG pipelines, single-turn Q&A) cannot complete these workflows autonomously. Human operators must bridge the gaps between steps, and that coordination labor is the cost agentic systems aim to reduce.

Why This Technology Exists

The earliest practical pattern for autonomous LLM behavior was chain-of-thought prompting (Wei et al., 2022) — instructing the model to reason step by step before answering. This improved reasoning quality but still produced a single output; the model could not take external actions.

The breakthrough was tool calling (function calling): the ability for an LLM to output a structured request for an external function, receive the result, and continue reasoning. OpenAI introduced function calling in June 2023. Anthropic introduced tool use in their API in 2023. Once tool calling was reliable, the agent loop became architecturally practical.

ReAct (Yao et al., 2022 — "Reasoning and Acting") formalized the pattern: interleave reasoning traces (Thought:) with tool invocations (Action:) and observations (Observation:), allowing the model to plan, act, and adapt in a single coherent loop. ReAct demonstrated that LLMs could solve tasks requiring 5–20 sequential steps with meaningful reliability.

The gap ReAct filled: between a model that reasons well but cannot act, and a model that acts but cannot reason about its actions' consequences.

Conceptual Explanation

An agent is an LLM equipped with:

  1. Tools — functions the model can call (search, database query, API call, file write)
  2. Memory — access to prior context beyond the current prompt
  3. A goal — an objective it pursues across multiple steps
  4. A loop — a mechanism to continue reasoning until the goal is reached or a stop condition is met

The critical distinction from a chain: in a chain, the developer decides the sequence of LLM calls in advance. In an agent, the LLM decides what to do next at each step based on what it observes. This autonomy is both the value proposition and the primary risk.

The Agent vs. Chain Distinction

text
Chain (developer controls flow):
  Step 1: LLM summarizes document (hardcoded)
  Step 2: LLM extracts entities (hardcoded)
  Step 3: LLM formats output (hardcoded)

Agent (LLM controls flow):
  Goal: "Research this patient's medication history and flag interactions"
  Step 1: LLM decides to call get_patient_medications()
  Step 2: LLM decides to call check_drug_interactions(medications)
  Step 3: LLM decides result is sufficient → produces final answer
  [Or: LLM decides to call get_allergy_history() first, then proceed]

The agent dynamically selects tools and decides when it has enough information. This flexibility enables handling cases the developer did not anticipate at design time.
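
The contrast can be sketched in plain Python. This is a minimal illustration, not a real agent: `fake_llm_decide` is a hypothetical scripted stand-in for a model call, and both tool functions are stubs named after the illustration above.

```python
from typing import Callable

# Hypothetical stub tools; real implementations would call external systems.
def get_patient_medications() -> list[str]:
    return ["metformin", "lisinopril"]

def check_drug_interactions(medications: list[str]) -> list[str]:
    return []  # stub: no interactions found

def run_chain() -> list[str]:
    """Chain: the developer fixes the order of steps in code."""
    meds = get_patient_medications()
    return check_drug_interactions(meds)

def fake_llm_decide(observations: dict) -> str:
    """Scripted stand-in for the model: picks the next action from what it has seen."""
    if "medications" not in observations:
        return "get_patient_medications"
    if "interactions" not in observations:
        return "check_drug_interactions"
    return "finish"

def run_agent_sketch(decide: Callable[[dict], str], max_steps: int = 5) -> dict:
    """Agent: the decision function chooses the next step at runtime."""
    observations: dict = {}
    for _ in range(max_steps):
        action = decide(observations)
        if action == "finish":
            break
        if action == "get_patient_medications":
            observations["medications"] = get_patient_medications()
        elif action == "check_drug_interactions":
            observations["interactions"] = check_drug_interactions(
                observations["medications"]
            )
    return observations
```

In `run_chain` the order of calls lives in the source code; in `run_agent_sketch` it lives in whatever `decide` returns, which is where a real model would sit.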

Core Architecture

The agent's execution model follows the Perceive → Reason → Act → Observe cycle:

  1. Perceive: The agent receives input (user query, system trigger, prior step result). This is combined with the system prompt, available tool schemas, and any memory retrieved from external stores into the context window.
  2. Reason: The LLM generates a response. If it determines a tool is needed, it outputs a structured tool call (JSON with tool name and arguments). If it has enough information, it outputs a final response.
  3. Act: The framework executes the tool call against the real system (API, database, file). The LLM does not execute code; it outputs a request, and the framework executes it.
  4. Observe: The tool result is appended to the context as an observation. The agent loops back to Reason with updated context.

This loop continues until the LLM either produces a final answer (no tool call) or a stop condition is met (max iterations, timeout, error threshold).
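
The three stop conditions named above can be checked in one place. The sketch below is an assumption about how such a check might look; `StopPolicy`, `should_stop`, and the default limits are hypothetical names and values, not part of any framework.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class StopPolicy:
    """Illustrative limits; tune them per workflow."""
    max_iterations: int = 10
    timeout_seconds: float = 60.0
    max_tool_errors: int = 3

def should_stop(
    policy: StopPolicy,
    iterations: int,
    started_at: float,
    tool_errors: int,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return the name of the triggered stop condition, or None to continue."""
    now = time.monotonic() if now is None else now
    if iterations >= policy.max_iterations:
        return "max_iterations"
    if now - started_at >= policy.timeout_seconds:
        return "timeout"
    if tool_errors >= policy.max_tool_errors:
        return "error_threshold"
    return None
```

Returning the name of the condition, rather than a boolean, makes it straightforward to log which limit ended a run.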

Architecture Diagram

Standalone diagram: architecture/mermaid/02-agent-loop.mmd

Components

LLM Backbone

The reasoning engine. All planning, tool selection, and response generation happens here. The LLM does not execute tools — it only decides which tool to call and with what arguments.

Critical property: The LLM must support tool calling (structured output with tool name + arguments as a first-class API feature). Do not implement tool calling by prompting a model to output JSON and parsing it manually — this is fragile. Use the official tool calling API.

Tool Registry

The set of tools available to the agent, each described by a JSON schema. The schema is injected into the context at every reasoning step. Tools the agent doesn't need should not be in the registry — every tool increases context size and cognitive load on the model.
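
One way to keep schemas and implementations together is a small registry object. The following is a minimal sketch under stated assumptions: `ToolRegistry` and its methods are hypothetical names, and the schema layout simply mirrors the `name` / `description` / `input_schema` shape used in Pattern 1 of this chapter.

```python
import json
from typing import Any, Callable

class ToolRegistry:
    """Hypothetical registry pairing each tool's JSON schema with its handler."""

    def __init__(self) -> None:
        self._schemas: dict[str, dict[str, Any]] = {}
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(
        self,
        name: str,
        description: str,
        input_schema: dict[str, Any],
        handler: Callable[..., Any],
    ) -> None:
        self._schemas[name] = {
            "name": name,
            "description": description,
            "input_schema": input_schema,
        }
        self._handlers[name] = handler

    def schemas(self) -> list[dict[str, Any]]:
        """The list injected into the model's context at each reasoning step."""
        return list(self._schemas.values())

    def dispatch(self, name: str, tool_input: dict[str, Any]) -> str:
        """Run the named tool and return its result as a JSON string."""
        handler = self._handlers.get(name)
        if handler is None:
            return json.dumps({"error": f"Unknown tool: {name}"})
        return json.dumps(handler(**tool_input))
```

Because only registered tools appear in `schemas()`, trimming the registry is also what trims the context sent to the model.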

Memory

Covered in depth in Chapter 3: Memory Systems. At minimum, agents need working memory (the current context window). Production agents add episodic memory (prior conversation turns, summarized) and semantic memory (retrieved knowledge from a vector store).

Orchestration Framework

The code that runs the loop: sends prompts to the LLM, parses tool call responses, executes tools, appends results, and manages stop conditions. LangGraph and CrewAI are widely used production frameworks; framework-less loops such as Pattern 1 below are best reserved for simple scenarios with only a few tools.

State Store

Persistent storage for agent state, enabling workflow resumption after failure, human-in-the-loop interrupts, and auditability. Without a state store, an agent that fails halfway through a 10-step workflow must restart from the beginning.
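
Resumption after failure can be illustrated with a file-backed checkpoint. This is a sketch only: `FileStateStore` and `run_steps` are hypothetical names, and a production system would more likely use a database or a framework's own checkpointing.

```python
import json
from pathlib import Path
from typing import Any, Callable, Optional

class FileStateStore:
    """Hypothetical store: one JSON file per workflow run."""

    def __init__(self, directory: str) -> None:
        self._dir = Path(directory)
        self._dir.mkdir(parents=True, exist_ok=True)

    def save(self, run_id: str, state: dict[str, Any]) -> None:
        path = self._dir / f"{run_id}.json"
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(path)  # rename over the old file so a crash never leaves a partial write

    def load(self, run_id: str) -> Optional[dict[str, Any]]:
        path = self._dir / f"{run_id}.json"
        if not path.exists():
            return None
        return json.loads(path.read_text())

def run_steps(
    store: FileStateStore,
    run_id: str,
    steps: list[tuple[str, Callable[[], Any]]],
) -> dict[str, Any]:
    """Execute named steps in order, skipping any already checkpointed."""
    state = store.load(run_id) or {"completed": [], "results": {}}
    for name, step in steps:
        if name in state["completed"]:
            continue  # finished in an earlier attempt
        state["results"][name] = step()
        state["completed"].append(name)
        store.save(run_id, state)  # checkpoint after every successful step
    return state
```

If a step raises, the checkpoint still records every step that completed before it, so calling `run_steps` again with the same `run_id` picks up at the failed step.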

Implementation Patterns

Pattern 1: Basic Tool-Calling Agent (Anthropic SDK)

python
"""
Basic tool-calling agent using the Anthropic SDK.
Educational Example — Reference Implementation for learning agent architecture.
Not intended for clinical decision making.

Context: A simple clinical information agent that can look up
patient data and drug information from a Reference Healthcare Organization.
"""
import anthropic
import json
from typing import Any

client = anthropic.Anthropic()

# Tool definitions — the schema is what the LLM sees
TOOLS = [
    {
        "name": "get_patient_summary",
        "description": (
            "Retrieve a clinical summary for a patient by their encounter ID. "
            "Returns active diagnoses, current medications, and recent labs. "
            "Use this when the question requires patient-specific context."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "encounter_id": {
                    "type": "string",
                    "description": "The encounter identifier (format: ENC-XXXXXXXX)"
                }
            },
            "required": ["encounter_id"]
        }
    },
    {
        "name": "check_drug_interaction",
        "description": (
            "Check for clinically significant interactions between a list of drugs. "
            "Returns severity ratings (CONTRAINDICATED, MAJOR, MODERATE, MINOR) "
            "and clinical descriptions for each identified interaction pair."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "drug_names": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of drug names to check (generic names preferred)"
                }
            },
            "required": ["drug_names"]
        }
    }
]

SYSTEM_PROMPT = """You are a clinical information assistant for a Reference Healthcare Organization.
Educational Example — Not intended for clinical decision making.

Use the available tools to retrieve accurate patient and drug information before answering.
Always cite the source of your information (patient record or drug database).
If a question requires information you cannot retrieve with the available tools, say so explicitly."""


def execute_tool(tool_name: str, tool_input: dict[str, Any]) -> str:
    """Execute a tool and return its result as a string."""
    # In production: dispatch to real implementations
    # These are stubs demonstrating the pattern
    if tool_name == "get_patient_summary":
        encounter_id = tool_input["encounter_id"]
        return json.dumps({
            "encounter_id": encounter_id,
            "diagnoses": ["Type 2 Diabetes (E11.9)", "Hypertension (I10)"],
            "medications": ["Metformin 1000mg BID", "Lisinopril 10mg daily"],
            "recent_labs": {"HbA1c": "7.8%", "eGFR": "62 mL/min", "K+": "4.1 mEq/L"}
        })

    elif tool_name == "check_drug_interaction":
        drugs = tool_input["drug_names"]
        return json.dumps({
            "drugs_checked": drugs,
            "interactions": [],
            "message": f"No clinically significant interactions identified among {', '.join(drugs)}."
        })

    return json.dumps({"error": f"Unknown tool: {tool_name}"})


def run_agent(user_query: str, max_iterations: int = 10) -> str:
    """
    Run the agent loop: Perceive → Reason → Act → Observe → repeat.

    Args:
        user_query: The question or task for the agent
        max_iterations: Circuit breaker — prevents infinite loops
    """
    messages = [{"role": "user", "content": user_query}]
    iterations = 0

    while iterations < max_iterations:
        iterations += 1

        # Reason: ask the LLM what to do next
        response = client.messages.create(
            model="claude-opus-4-8",  # Verify current model ID at docs.anthropic.com
            max_tokens=4096,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages,
        )

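        # Check stop condition: anything other than a tool request ends the loop.
        # This covers end_turn as well as max_tokens and stop_sequence, so the
        # loop never sends an empty tool_result message back to the API.
        if response.stop_reason != "tool_use":
            final_text = next(
                (block.text for block in response.content if block.type == "text"),
                "No response generated."
            )
            return final_text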

        # Act: execute all tool calls in this response
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                print(f"[Agent] Tool call: {block.name}({block.input})")
                result = execute_tool(block.name, block.input)
                print(f"[Agent] Tool result: {result[:100]}...")

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })

        # Observe: append LLM response + tool results to message history
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return "Agent reached maximum iterations without producing a final answer."


if __name__ == "__main__":
    query = "What is the current HbA1c for encounter ENC-20240101 and are there any drug interactions?"
    print(f"Query: {query}\n")
    result = run_agent(query)
    print(f"\nFinal Answer:\n{result}")

Pattern 2: ReAct Prompt Pattern (Framework-Agnostic)

When tool calling is not natively supported by the model API, the ReAct pattern can be implemented via structured prompting. This is the fallback pattern for models that produce text only:

text
System: You are an agent. Think step by step. Use this format:
  Thought: [your reasoning about what to do next]
  Action: tool_name({"param": "value"})
  Observation: [tool result, provided by the system]
  ... repeat Thought/Action/Observation as needed ...
  Final Answer: [your answer when you have enough information]

Architectural Note: Native tool calling (via the API's tools parameter) is strongly preferred over prompt-based ReAct. Native tool calling produces structured JSON with type validation, provides cleaner stop conditions, and is more reliable across diverse queries. Use prompt-based ReAct only when the target model does not support native tool calling.
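
For the fallback case, the framework has to parse the model's text itself. The sketch below assumes the exact format shown in the prompt above (a tool name followed by JSON arguments in parentheses on a single line); `parse_react_step` and the returned dictionary shape are hypothetical.

```python
import json
import re
from typing import Any

# Assumes "Action: tool_name({...json...})" on a single line, as in the prompt format.
ACTION_RE = re.compile(r"^\s*Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^\s*Final Answer:\s*(.+)", re.MULTILINE | re.DOTALL)

def parse_react_step(text: str) -> dict[str, Any]:
    """Classify one model output as an action, a final answer, or a parse error."""
    action = ACTION_RE.search(text)
    if action:
        try:
            args = json.loads(action.group(2) or "{}")
        except json.JSONDecodeError as exc:
            return {"type": "error", "message": f"Bad action arguments: {exc}"}
        return {"type": "action", "tool": action.group(1), "args": args}
    final = FINAL_RE.search(text)
    if final:
        return {"type": "final", "answer": final.group(1).strip()}
    return {"type": "error", "message": "No Action or Final Answer found"}
```

Checking for an Action before a Final Answer means that if a text-only model runs ahead and invents its own Observation, the real tool still gets called. The error branches show why this approach is more fragile than native tool calling: every malformed output needs explicit handling.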

Enterprise Considerations

When NOT to use agents. The agent loop introduces latency (multiple LLM calls), cost (tokens for each iteration), and failure modes (loops, wrong tool selection) that simple chains and pipelines do not. Choose agents only when the workflow requires genuine branching, the path cannot be predetermined by a developer, or handling of unforeseen cases provides significant value.

Latency budget. Each iteration of the agent loop adds one LLM call (1–5 seconds for frontier models) plus tool execution time. A 5-iteration agent workflow can take 10–30 seconds end-to-end. Design user experiences accordingly: stream intermediate steps, show progress indicators, and establish explicit SLAs before committing to agentic architecture.
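
The arithmetic behind that estimate can be made explicit. The sketch below uses the 1 to 5 second LLM call range stated above and assumes a fixed 1 second per tool call; that tool time is an illustrative assumption, not a measurement, and `estimate_latency` is a hypothetical helper.

```python
def estimate_latency(
    iterations: int,
    llm_range: tuple[float, float] = (1.0, 5.0),  # seconds per LLM call, from the text
    tool_seconds: float = 1.0,                    # assumed seconds per tool call
) -> tuple[float, float]:
    """Return (best_case, worst_case) end-to-end seconds for a sequential loop."""
    low = iterations * (llm_range[0] + tool_seconds)
    high = iterations * (llm_range[1] + tool_seconds)
    return low, high

print(estimate_latency(5))  # (10.0, 30.0)
```

The estimate is linear in the number of iterations because each step waits for the previous one; slower tools or retries push the upper bound higher.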

Cost at scale. Agentic workflows consume 5–20x more tokens than single-turn calls. Model selection by agent role is essential: use small, fast models for routing and summarization; reserve frontier models for complex reasoning steps. See Chapter 4: Enterprise AI — Cost Management [PLANNED].

Determinism and reproducibility. Agents are non-deterministic — the same input can produce different tool call sequences. This is acceptable for knowledge retrieval tasks but requires careful design for tasks that write to databases, send communications, or trigger external workflows. Implement idempotency at the tool level.
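
Tool-level idempotency is often implemented with an idempotency key. The following is a minimal sketch of that idea, assuming the key is derived from a workflow ID plus the tool input; `IdempotentTool` is a hypothetical wrapper, and the in-memory dictionary stands in for a durable store.

```python
import hashlib
import json
from typing import Any, Callable

class IdempotentTool:
    """Hypothetical wrapper: repeated calls with the same key run the handler once."""

    def __init__(self, handler: Callable[..., Any]) -> None:
        self._handler = handler
        self._results: dict[str, Any] = {}  # in production this must be durable

    @staticmethod
    def key_for(workflow_id: str, tool_input: dict[str, Any]) -> str:
        # sort_keys makes the key independent of argument order
        payload = json.dumps(
            {"workflow": workflow_id, "input": tool_input}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __call__(self, workflow_id: str, tool_input: dict[str, Any]) -> Any:
        key = self.key_for(workflow_id, tool_input)
        if key not in self._results:
            self._results[key] = self._handler(**tool_input)
        return self._results[key]  # replayed on retries, no second side effect
```

With this wrapper, a model that issues the same write twice within one workflow, or a workflow that is re-run after a crash, triggers the underlying side effect only once.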

Circuit breakers. Always set max_iterations. An agent that loops indefinitely costs money and holds resources. Log all iterations; alert if an agent consistently reaches the max iteration limit (it indicates the task is beyond the agent's capability or the tools are insufficient).

Security Considerations

Agents introduce attack surfaces that passive LLMs do not. The primary threats are covered in depth in Chapter 10: Agentic Security. The foundational principle:

An agent can only do what its tools allow. The security perimeter of an agent system is the union of the permissions of all its tools.

Minimize tool permissions to the minimum required for the task. An agent that needs to read patient records does not need write access. An agent that queries a database does not need DDL permissions.
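
The "union of permissions" principle can be expressed directly in code. In this sketch, `ToolSpec`, the scope strings, and both helper functions are hypothetical; real systems would map scopes to database roles or API credentials.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class ToolSpec:
    name: str
    required_scopes: frozenset  # permissions the tool needs to run
    handler: Callable[..., Any]

def allowed_tools(tools: list[ToolSpec], granted_scopes: set[str]) -> list[ToolSpec]:
    """Keep only tools whose required scopes are all covered by the grant."""
    return [t for t in tools if t.required_scopes <= granted_scopes]

def agent_perimeter(tools: list[ToolSpec]) -> frozenset:
    """The agent's effective permissions: the union over all of its tools."""
    scopes: frozenset = frozenset()
    for tool in tools:
        scopes |= tool.required_scopes
    return scopes
```

Filtering with `allowed_tools` before building the registry means the model never sees a tool whose permissions exceed what the task was granted.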

Healthcare Example

Educational Example — Illustrative Workflow. Not intended for clinical decision making.

A Reference Healthcare Organization deploys a Prior Authorization Agent that executes the following workflow autonomously:

  1. Receive prior auth request from Epic via FHIR webhook
  2. Fetch patient clinical history (EHR tool)
  3. Retrieve relevant clinical guidelines (RAG tool)
  4. Look up payer policy criteria (knowledge base tool)
  5. Evaluate clinical criteria against policy
  6. Draft determination letter
  7. Interrupt — route to physician for review (human-in-the-loop)
  8. Submit to payer after physician approval

This workflow requires 6–8 tool calls, conditional branching (different payers have different criteria), and human oversight before the consequential final action. It is a canonical enterprise agentic workflow. The full implementation is in examples/langgraph/01-clinical-workflow-graph.py.

Common Mistakes

Running agents where chains suffice. If the sequence of steps is known at design time and doesn't change based on input, use a chain. The agent overhead (latency, cost, unpredictability) is not justified.

Unbounded loops. Forgetting max_iterations causes runaway cost and resource consumption. Always set a limit; always log when it is reached.

Too many tools. Giving an agent 30 tools degrades tool selection accuracy. Keep the tool registry focused: 5–15 tools per agent. Specialize agents rather than creating a general-purpose agent with every possible tool.

Mutable tool side effects without idempotency. If a tool writes to a database and the agent fails mid-workflow, re-running the workflow executes the write again. All write-side tools must be idempotent or the agent must checkpoint state after each write.

Trusting tool output without validation. Tool results can contain errors, unexpected formats, or injection payloads. Validate tool output structure before appending it to the agent's context.
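
A structural check before the result enters the context might look like the sketch below. `validate_tool_output`, the field map, and the size limit are assumptions for illustration; a schema library could replace the manual checks.

```python
import json
from typing import Any

def validate_tool_output(
    raw: str,
    required_fields: dict[str, type],
    max_chars: int = 10_000,  # illustrative cap on what enters the context
) -> dict[str, Any]:
    """Parse a tool's JSON output and verify its shape; raise ValueError if invalid."""
    if len(raw) > max_chars:
        raise ValueError(f"Tool output too large: {len(raw)} characters")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Tool output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Tool output must be a JSON object")
    for field, expected in required_fields.items():
        if field not in data:
            raise ValueError(f"Missing field: {field}")
        if not isinstance(data[field], expected):
            raise ValueError(f"Field {field!r} is not of type {expected.__name__}")
    return data
```

Structural validation catches malformed or oversized results; it does not by itself detect injection payloads hidden inside well-formed string fields, which need separate handling.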

Best Practices

  • Use agents only when the workflow cannot be expressed as a predetermined sequence
  • Set max_iterations on every agent; alert when it triggers in production
  • Keep tool registries focused: 5–15 tools maximum per agent
  • Use small models for tool routing decisions; reserve frontier models for reasoning
  • Implement all write-side tools as idempotent operations
  • Checkpoint agent state after each successful tool call for resumability
  • Require human-in-the-loop review before irreversible actions (sends, writes, external submissions)
  • Log all tool calls with inputs, outputs, timestamps, and cost for auditability
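
The audit-logging practice above can be sketched as a wrapper around each tool handler. The `audited` function and the record fields are hypothetical; cost is omitted because it depends on provider-specific token accounting, and a list stands in for a real log sink.

```python
import time
from typing import Any, Callable

def audited(
    tool_name: str,
    handler: Callable[..., Any],
    log: list[dict[str, Any]],
) -> Callable[..., Any]:
    """Wrap a tool handler so every call appends one audit record to `log`."""

    def wrapper(**tool_input: Any) -> Any:
        record: dict[str, Any] = {
            "tool": tool_name,
            "input": tool_input,
            "started_at": time.time(),
        }
        try:
            output = handler(**tool_input)
            record["output"] = output
            record["status"] = "ok"
            return output
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise  # the agent loop still sees the failure
        finally:
            record["duration_seconds"] = time.time() - record["started_at"]
            log.append(record)  # written on success and on failure

    return wrapper
```

Writing the record in `finally` ensures failed calls are audited too, which matters when diagnosing why an agent hit its iteration limit.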

Alternatives

Approach | When to Choose | Trade-off
Sequential chain | Steps are known at design time; no branching needed | Less flexible; faster and more predictable
Parallel chain (fan-out) | Multiple independent tasks; aggregate results | No dynamic decision-making; requires all paths to be known
Router + chain | Small number of fixed paths based on input classification | More predictable than an agent; less flexible
Full agent (ReAct) | Branching logic is complex or data-driven; paths cannot be predetermined | Most flexible; highest cost, latency, and unpredictability
Human workflow + AI assist | Task requires human judgment throughout; AI augments but doesn't automate | Lower automation; more reliable for high-stakes decisions

Trade-offs

Dimension | Advantage | Cost
Flexibility | Handles unforeseen cases dynamically | Introduces unpredictability
Autonomy | Reduces human coordination overhead | Requires containment architecture
Capability | Completes multi-step workflows | 5–20x cost vs. single-turn calls
Resumability | Can checkpoint and continue | Requires persistent state infrastructure
Debuggability | Flexible tool sequences | Harder to trace failures than deterministic chains

Interview Questions

Q1: What distinguishes an agent from a chain in LLM-based systems?

Category: Architecture | Difficulty: Senior | Role: AI Architect

Answer Framework:

A chain is a developer-defined sequence of LLM calls and transformations where the flow is predetermined. The developer decides at build time what steps to execute and in what order. A chain is appropriate when the task has a fixed structure.

An agent is an LLM-powered system where the model itself decides what to do next at each step, based on its reasoning about the current state. The developer provides tools and a goal; the agent determines the sequence of tool calls required to achieve that goal. This makes agents appropriate for tasks where the path depends on data encountered at runtime.

The practical consequence: agents can handle cases the developer did not anticipate, but they also introduce unpredictability, higher cost, and new failure modes that chains do not have. The choice between them is an explicit architectural decision, not a preference.

Key Points to Hit: Developer controls flow (chain) vs. LLM controls flow (agent); dynamic tool selection; when each is appropriate; trade-offs are real.

Red Flags: "Agents are just better chains" — they are not. They are a different architectural pattern with different cost/benefit profiles.


Q2: A prior authorization workflow requires 8 sequential steps, each dependent on the previous. Should you use an agent or a chain?

Category: System Design | Difficulty: Principal | Role: AI Architect

Answer Framework:

The key question is whether the steps and their sequence are known at design time. If the prior auth workflow always executes the same 8 steps in the same order regardless of input, a chain is more appropriate — it is faster, cheaper, easier to debug, and more predictable than an agent.

However, prior auth is rarely that simple. Real workflows branch: different payers have different criteria; some requests require clinical literature lookup while others don't; some requests can be auto-approved while others require escalation. If these decision points depend on data retrieved at runtime, an agent or an agent-augmented state machine (LangGraph) is warranted.

The production pattern: a LangGraph state machine where fixed paths are expressed as deterministic edges and dynamic decisions are expressed as conditional edges routing to an LLM-powered decision node. This gives you the predictability of a chain for known paths and the flexibility of an agent for the branching logic.

Red Flags: "Use an agent for everything" — overcomplicated; "Use a chain for everything" — breaks when faced with realistic workflow variation.


Q3: What is the "agent paradox" and how does it affect system design?

Category: Architecture | Difficulty: Principal | Role: AI Architect / Engineering Manager

Answer Framework:

The agent paradox is the observation that the more autonomous and capable you make an agent, the more dangerous and expensive its failure modes become. A highly capable agent that can take consequential actions — send emails, submit to payers, write to patient records — is also an agent that can cause consequential harm if it makes a mistake.

This creates a design tension: the value of an agent scales with how much it can do autonomously, but the risk also scales. The architectural resolution is graduated autonomy: the agent operates autonomously up to a defined risk threshold, above which it requires human approval. Low-risk actions (reading records, querying guidelines) proceed automatically. High-risk actions (submitting prior auth decisions, updating medication records) require physician review.

This is why human-in-the-loop is an architectural requirement for enterprise clinical agents, not an optional feature. It is the mechanism that resolves the agent paradox.
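
Graduated autonomy reduces to a gate evaluated before each tool call. The sketch below is one possible shape, with hypothetical names throughout: the `Risk` levels, the `TOOL_RISK` table, and the `submit_determination` tool are illustrative and not taken from any framework.

```python
from enum import IntEnum
from typing import Optional

class Risk(IntEnum):
    LOW = 1     # read-only lookups
    MEDIUM = 2  # reversible writes
    HIGH = 3    # irreversible or externally visible actions

# Hypothetical risk classification, maintained by the system's owners.
TOOL_RISK = {
    "get_patient_summary": Risk.LOW,
    "submit_determination": Risk.HIGH,
}

def gate(
    tool_name: str,
    approval_threshold: Risk = Risk.HIGH,
    approved_by: Optional[str] = None,
) -> str:
    """Return "execute" or "needs_approval" for a requested tool call."""
    # Tools missing from the table are treated as high risk by default.
    risk = TOOL_RISK.get(tool_name, Risk.HIGH)
    if risk >= approval_threshold and approved_by is None:
        return "needs_approval"
    return "execute"
```

Defaulting unknown tools to `Risk.HIGH` means a newly added tool cannot bypass review simply because nobody classified it.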

Key Takeaways

  • An agent is an LLM equipped with tools, memory, and a goal-directed loop — the LLM decides what to do next at each step
  • The ReAct pattern (Reason → Act → Observe) underlies most practical agent architectures
  • Agents are appropriate when workflow paths cannot be predetermined; chains are appropriate when they can
  • Tool calling via the API's native tools parameter is strongly preferred over prompt-engineered ReAct
  • Every agent must have a max_iterations circuit breaker; unbounded loops are a production risk
  • Agent cost is 5–20x single-turn cost; model tier selection per agent role is the primary cost lever
  • Human-in-the-loop is not optional for enterprise agents that take consequential actions — it resolves the agent paradox
  • The security perimeter of an agent is the union of the permissions of all its tools — minimize both

Glossary

Term | Definition
Agent | An LLM system that autonomously selects and executes tools across multiple steps to achieve a goal
ReAct | Reasoning and Acting: a prompting pattern that interleaves reasoning traces with tool invocations
Tool calling | The LLM API feature that allows a model to output a structured request for an external function
Agent loop | The Perceive → Reason → Act → Observe cycle that repeats until a stop condition is met
Orchestrator | An agent that decomposes a goal and delegates subtasks to worker agents
Worker agent | An agent that executes a focused subtask assigned by an orchestrator
Circuit breaker | A max_iterations or timeout limit that prevents infinite agent loops
Idempotency | The property of an operation that can be safely repeated without changing the result beyond the first execution

Further Reading

External References:

  • Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models" — the foundational paper
  • Anthropic Tool Use documentation — official API reference for tool calling
  • LangGraph documentation — official state machine framework documentation
