Agent Architecture Fundamentals

Executive Summary

Agentic AI systems are LLM-powered architectures in which the model autonomously decides what actions to take, executes those actions through tools, observes the results, and continues reasoning until a goal is achieved. This chapter establishes the architectural foundations — the agent loop, tool calling mechanics, planning patterns, and the ReAct framework — that every subsequent chapter in this section builds upon. It is required reading before any framework-specific chapter (LangGraph, CrewAI, MCP). AI architects, senior engineers, and engineering leaders responsible for production agentic systems should read this chapter in full.

Learning Objectives

  • Explain what distinguishes an agent from a chain or a standard LLM API call
  • Describe the ReAct loop and why it is the dominant agent pattern
  • Identify when agents are and are not the right architectural choice
  • Explain how tool calling works at the API level for major LLM providers
  • Evaluate planning strategies and their trade-offs for enterprise workflows

Business Problem

Enterprises need AI systems that can complete multi-step, goal-directed workflows — not just answer a single question. Consider a prior authorization workflow: the AI must retrieve the patient's clinical history, look up the relevant payer policy, evaluate clinical criteria against that policy, identify missing documentation, draft a determination letter, and route it for physician review. This cannot be expressed as a single prompt. It requires sequential decision-making, tool invocation, and state persistence across multiple steps.

Traditional LLM deployments (RAG pipelines, single-turn Q&A) cannot complete these workflows autonomously. Human operators must bridge the gaps between steps, and that coordination labor is the cost agentic systems aim to reduce.

Why This Technology Exists

An early precursor to autonomous LLM behavior was chain-of-thought prompting (Wei et al., 2022), which instructs the model to reason step by step before answering. This improved reasoning quality but still produced a single output; the model could not take external actions.

The breakthrough was tool calling (function calling): the ability for an LLM to output a structured request for an external function, receive the result, and continue reasoning. OpenAI introduced function calling in June 2023. Anthropic introduced tool use in their API in 2023. Once tool calling was reliable, the agent loop became architecturally practical.
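To make "a structured request for an external function" concrete, the sketch below represents a tool definition and a model-issued tool call as plain Python dictionaries. The tool name `get_payer_policy`, its parameters, and the key names (`input_schema`, `arguments`) are illustrative assumptions; exact field names differ between providers, and no API call is made here.

```python
import json

# Hypothetical tool definition: a name, a description the model reads,
# and a JSON Schema for the arguments. Exact key names vary by provider.
payer_policy_tool = {
    "name": "get_payer_policy",
    "description": "Look up a payer's coverage criteria for a procedure code.",
    "input_schema": {
        "type": "object",
        "properties": {
            "payer_id": {"type": "string"},
            "procedure_code": {"type": "string"},
        },
        "required": ["payer_id", "procedure_code"],
    },
}

# Illustrative structured request as a model might emit it.
# procedure_code is left out on purpose to show the check below.
model_tool_call = {
    "name": "get_payer_policy",
    "arguments": json.dumps({"payer_id": "P-001"}),
}


def missing_arguments(tool: dict, call: dict) -> list:
    """Return required argument names that the model's call left out."""
    args = json.loads(call["arguments"])
    return [name for name in tool["input_schema"]["required"] if name not in args]


missing = missing_arguments(payer_policy_tool, model_tool_call)
```

The framework, not the model, is responsible for a check like `missing_arguments` before it executes anything.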

ReAct (Yao et al., 2022 — "Reasoning and Acting") formalized the pattern: interleave reasoning traces (Thought:) with tool invocations (Action:) and observations (Observation:), allowing the model to plan, act, and adapt in a single coherent loop. ReAct demonstrated that LLMs could solve tasks requiring 5–20 sequential steps with meaningful reliability.

ReAct filled the gap between a model that reasons well but cannot act and a model that acts but cannot reason about the consequences of its actions.
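A prompt-based ReAct loop depends on parsing the model's text for the next action. The following minimal sketch assumes a bracketed `Action: tool[input]` convention, which is one common way to write the trace; the tool name `lookup_policy` and the sample text are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a line such as: Action: lookup_policy[payer=P-001]
ACTION_PATTERN = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(model_output: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input) from the first Action line, or None."""
    match = ACTION_PATTERN.search(model_output)
    return (match.group(1), match.group(2)) if match else None


step = "Thought: I need the payer's criteria first.\nAction: lookup_policy[payer=P-001]"
parsed = parse_action(step)
finished = parse_action("Thought: I have enough information.\nFinal Answer: criteria met")
```

When no `Action:` line is present, the function returns `None`, which the surrounding loop would treat as the signal that the model has produced its final answer.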

Conceptual Explanation

An agent is an LLM equipped with:

  1. Tools — functions the model can call (search, database query, API call, file write)
  2. Memory — access to prior context beyond the current prompt
  3. A goal — an objective it pursues across multiple steps
  4. A loop — a mechanism to continue reasoning until the goal is reached or a stop condition is met

The critical distinction from a chain: in a chain, the developer decides the sequence of LLM calls in advance. In an agent, the LLM decides what to do next at each step based on what it observes. This autonomy is both the value proposition and the primary risk.

The Agent vs. Chain Distinction

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.

The agent dynamically selects tools and decides when it has enough information. This flexibility enables handling cases the developer did not anticipate at design time.
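As a rough illustration of the distinction (not the omitted reference implementation), the sketch below runs the same two stub tools first as a chain and then as an agent. All names are hypothetical, and `scripted_policy` is a stand-in for the LLM's decision at each step.

```python
from typing import Callable, Dict, List, Optional


# Stub tools (hypothetical) standing in for real integrations.
def fetch_history(case: dict) -> str:
    return f"history for {case['patient_id']}"


def fetch_policy(case: dict) -> str:
    return f"policy for {case['payer_id']}"


TOOLS: Dict[str, Callable[[dict], str]] = {
    "fetch_history": fetch_history,
    "fetch_policy": fetch_policy,
}


def run_chain(case: dict) -> List[str]:
    """Chain: the developer fixes the order of steps at design time."""
    return [fetch_history(case), fetch_policy(case)]


def run_agent(case: dict,
              choose_next: Callable[[List[str]], Optional[str]],
              max_iterations: int = 5) -> List[str]:
    """Agent: a decision function (an LLM in practice) picks each next step."""
    observations: List[str] = []
    for _ in range(max_iterations):  # bounded loop
        tool_name = choose_next(observations)
        if tool_name is None:  # the decision function says it has enough
            break
        observations.append(TOOLS[tool_name](case))
    return observations


def scripted_policy(observations: List[str]) -> Optional[str]:
    """Stand-in for the model: fetch the policy once, then stop."""
    return "fetch_policy" if not observations else None


case = {"patient_id": "PT-1", "payer_id": "P-001"}
chain_result = run_chain(case)
agent_result = run_agent(case, scripted_policy)
```

The chain always performs both steps in the same order; the agent performs only the steps its decision function asks for, which is where both the flexibility and the unpredictability come from.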

Core Architecture

The agent's execution model follows the Perceive → Reason → Act → Observe cycle:

  1. Perceive: The agent receives input (user query, system trigger, prior step result). This is combined with the system prompt, available tool schemas, and any memory retrieved from external stores into the context window.
  2. Reason: The LLM generates a response. If it determines a tool is needed, it outputs a structured tool call (JSON with tool name and arguments). If it has enough information, it outputs a final response.
  3. Act: The framework executes the tool call against the real system (API, database, file). The LLM does not execute code: it outputs a request; the framework executes it.
  4. Observe: The tool result is appended to the context as an observation. The agent loops back to Reason with updated context.

This loop continues until the LLM either produces a final answer (no tool call) or a stop condition is met (max iterations, timeout, error threshold).
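The cycle and its stop conditions can be sketched in a few lines. This is a minimal illustration under stated assumptions: `Decision`, `run_agent_loop`, and `stub_reason` are hypothetical names, and `stub_reason` replaces a real LLM call with a fixed rule.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Decision:
    """What the model returned: either a tool call or a final answer."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    final: Optional[str] = None


Context = List[Tuple[str, str]]


def run_agent_loop(query: str,
                   reason: Callable[[Context], Decision],
                   tools: Dict[str, Callable[..., object]],
                   max_iterations: int = 5) -> Tuple[Optional[str], Context]:
    context: Context = [("user", query)]                  # Perceive
    for _ in range(max_iterations):
        decision = reason(context)                        # Reason
        if decision.final is not None:                    # no tool call: done
            return decision.final, context
        result = tools[decision.tool](**decision.args)    # Act (framework runs it)
        context.append(("observation", str(result)))      # Observe
    return None, context                                  # stop condition reached


def stub_reason(context: Context) -> Decision:
    """Stand-in for the LLM: call one tool, then answer."""
    observations = [text for role, text in context if role == "observation"]
    if not observations:
        return Decision(tool="lookup_policy", args={"payer_id": "P-001"})
    return Decision(final=f"Answer based on: {observations[-1]}")


tools = {"lookup_policy": lambda payer_id: f"criteria for {payer_id}"}
answer, trace = run_agent_loop("Is imaging covered?", stub_reason, tools)
```

Returning `None` when the iteration limit is hit lets the caller distinguish "no answer" from a real final answer and handle it explicitly.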

Architecture Diagram

Standalone diagram: architecture/mermaid/02-agent-loop.mmd

Enterprise Considerations

When NOT to use agents. The agent loop introduces latency (multiple LLM calls), cost (tokens for each iteration), and failure modes (loops, wrong tool selection) that simple chains and pipelines do not. Choose agents only when the workflow requires genuine branching, the path cannot be predetermined by a developer, or handling of unforeseen cases provides significant value.

Latency budget. Each iteration of the agent loop adds one LLM call (1–5 seconds for frontier models) plus tool execution time. A 5-iteration agent workflow can take 10–30 seconds end-to-end. Design user experiences accordingly: stream intermediate steps, show progress indicators, and establish explicit SLAs before committing to agentic architecture.
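The arithmetic behind that range can be made explicit. The sketch below assumes a fixed tool execution time of 1 second per iteration, a value chosen only because it reproduces the 10–30 second figure; real tool latencies need to be measured.

```python
from typing import Tuple


def latency_range(iterations: int, llm_min_s: float, llm_max_s: float,
                  tool_s: float) -> Tuple[float, float]:
    """Best-case and worst-case end-to-end latency for a sequential agent run."""
    return (iterations * (llm_min_s + tool_s), iterations * (llm_max_s + tool_s))


# 5 iterations, 1-5 s per LLM call, assumed 1 s per tool call.
low, high = latency_range(iterations=5, llm_min_s=1.0, llm_max_s=5.0, tool_s=1.0)
```

Because the iterations run sequentially, latency grows linearly with the iteration count, so the iteration limit also acts as a latency ceiling.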

Cost at scale. Agentic workflows consume 5–20x more tokens than single-turn calls. Model selection by agent role is essential: use small, fast models for routing and summarization; reserve frontier models for complex reasoning steps. See Chapter 4: Enterprise AI — Cost Management [PLANNED].
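One plausible driver of that multiplier is that each iteration re-sends the accumulated context. The sketch below counts input tokens under simplifying assumptions that are not from the source: a 1,000-token base prompt, 500 tokens added per observation, no prompt caching, and output tokens ignored.

```python
def total_input_tokens(base_prompt: int, tokens_per_observation: int,
                       iterations: int) -> int:
    """Sum input tokens when every iteration re-sends the whole context so far."""
    return sum(base_prompt + i * tokens_per_observation for i in range(iterations))


single_turn = total_input_tokens(1000, 500, iterations=1)
agent_run = total_input_tokens(1000, 500, iterations=5)
multiplier = agent_run / single_turn
```

With these illustrative numbers the five-iteration run consumes ten times the input tokens of the single-turn call, which falls inside the 5–20x range quoted above.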

Determinism and reproducibility. Agents are non-deterministic — the same input can produce different tool call sequences. This is acceptable for knowledge retrieval tasks but requires careful design for tasks that write to databases, send communications, or trigger external workflows. Implement idempotency at the tool level.
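One way to implement idempotency at the tool level is to derive a key from the request and return the stored result when the same request arrives again. The sketch below uses an in-memory class, `LetterStore`, as a hypothetical stand-in for a real write-side system.

```python
import hashlib
import json


class LetterStore:
    """In-memory stand-in for a write-side system (hypothetical)."""

    def __init__(self) -> None:
        self._receipts = {}   # idempotency key -> receipt
        self.writes = 0       # number of writes actually performed

    def submit(self, request_id: str, body: str) -> str:
        payload = json.dumps({"request_id": request_id, "body": body}, sort_keys=True)
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self._receipts:          # repeat of a completed write
            return self._receipts[key]
        self.writes += 1                   # the real side effect happens once
        receipt = f"receipt-{self.writes}"
        self._receipts[key] = receipt
        return receipt


store = LetterStore()
first = store.submit("PA-1001", "Determination: approved")
second = store.submit("PA-1001", "Determination: approved")  # agent retried
```

The retry returns the original receipt without writing again, so a non-deterministic agent that repeats a tool call does not duplicate the side effect.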

Circuit breakers. Always set max_iterations. An agent that loops indefinitely costs money and holds resources. Log all iterations; alert if an agent consistently reaches the max iteration limit (it indicates the task is beyond the agent's capability or the tools are insufficient).
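The alerting half of this advice can be sketched as a rolling hit-rate monitor. The class name, window size, and alert threshold below are illustrative assumptions, not recommended values.

```python
from collections import deque


class IterationLimitMonitor:
    """Tracks how often recent runs ended by hitting max_iterations."""

    def __init__(self, window: int = 20, alert_rate: float = 0.25) -> None:
        self._recent = deque(maxlen=window)  # True = run hit the limit
        self.alert_rate = alert_rate

    def record(self, hit_limit: bool) -> bool:
        """Record one run; return True when the recent hit rate warrants an alert."""
        self._recent.append(hit_limit)
        return sum(self._recent) / len(self._recent) >= self.alert_rate


monitor = IterationLimitMonitor(window=4, alert_rate=0.5)
alerts = [monitor.record(hit) for hit in (False, False, True, True)]
```

A single run that reaches the limit does not raise an alert here; the alert fires only once half of the last four runs have done so, which matches the "consistently reaches the limit" condition.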

Healthcare Example

Educational Example — Illustrative Workflow. Not intended for clinical decision making.

A Reference Healthcare Organization deploys a Prior Authorization Agent that executes the following workflow autonomously:

  1. Receive prior auth request from Epic via FHIR webhook
  2. Fetch patient clinical history (EHR tool)
  3. Retrieve relevant clinical guidelines (RAG tool)
  4. Look up payer policy criteria (knowledge base tool)
  5. Evaluate clinical criteria against policy
  6. Draft determination letter
  7. Interrupt — route to physician for review (human-in-the-loop)
  8. Submit to payer after physician approval

This workflow requires 6–8 tool calls, conditional branching (different payers have different criteria), and human oversight before the consequential final action. It is a canonical enterprise agentic workflow. The full implementation is in examples/langgraph/01-clinical-workflow-graph.py.

Common Mistakes

Running agents where chains suffice. If the sequence of steps is known at design time and doesn't change based on input, use a chain. The agent overhead (latency, cost, unpredictability) is not justified.

Unbounded loops. Forgetting max_iterations causes runaway cost and resource consumption. Always set a limit; always log when it is reached.

Too many tools. Giving an agent 30 tools degrades tool selection accuracy. Keep the tool registry focused: 5–15 tools per agent. Specialize agents rather than creating a general-purpose agent with every possible tool.

Mutable tool side effects without idempotency. If a tool writes to a database and the agent fails mid-workflow, re-running the workflow executes the write again. All write-side tools must be idempotent or the agent must checkpoint state after each write.
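The checkpointing alternative can be sketched as a runner that records each step only after it succeeds and skips recorded steps on re-run. The dictionary below is a stand-in for a durable store, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_with_checkpoints(steps: List[Tuple[str, Callable[[], str]]],
                         checkpoint: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, skipping any step already recorded in the checkpoint."""
    for name, action in steps:
        if name in checkpoint:
            continue                      # completed in an earlier run
        checkpoint[name] = action()       # recorded only after success
    return checkpoint


writes: List[str] = []
attempts = {"notify": 0}


def write_record() -> str:
    writes.append("record")               # the side effect we must not repeat
    return "written"


def notify() -> str:
    attempts["notify"] += 1
    if attempts["notify"] == 1:
        raise RuntimeError("transient failure")   # simulated mid-workflow failure
    return "sent"


steps = [("write_record", write_record), ("notify", notify)]
checkpoint: Dict[str, str] = {}
try:
    run_with_checkpoints(steps, checkpoint)       # fails at notify
except RuntimeError:
    pass
run_with_checkpoints(steps, checkpoint)           # resumes after the write
```

The second run skips `write_record` because the checkpoint already holds its result, so the write happens exactly once despite the failure.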

Trusting tool output without validation. Tool results can contain errors, unexpected formats, or injection payloads. Validate tool output structure before appending it to the agent's context.
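A minimal structural check might look like the following. The expected fields (`policy_id`, `criteria`) and the size caps are hypothetical; the check enforces shape and size only and is not by itself a defense against injected instructions in the text.

```python
from typing import Any, Dict

MAX_TEXT_CHARS = 2000  # hypothetical cap on text passed into the context


def validate_policy_result(raw: Any) -> Dict[str, str]:
    """Return a cleaned copy of a tool result, or raise ValueError if malformed."""
    if not isinstance(raw, dict):
        raise ValueError("tool result must be an object")
    for key in ("policy_id", "criteria"):
        if not isinstance(raw.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    # Keep only the expected fields and bound their size.
    return {
        "policy_id": raw["policy_id"][:64],
        "criteria": raw["criteria"][:MAX_TEXT_CHARS],
    }


clean = validate_policy_result(
    {"policy_id": "POL-7", "criteria": "Prior imaging required.", "debug": "ignored"}
)
```

Dropping unexpected fields keeps unreviewed content out of the agent's context, and a raised `ValueError` can be turned into an error observation rather than passed through as if it were data.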

Best Practices

  • Use agents only when the workflow cannot be expressed as a predetermined sequence
  • Set max_iterations on every agent; alert when it triggers in production
  • Keep tool registries focused: 5–15 tools maximum per agent
  • Use small models for tool routing decisions; reserve frontier models for reasoning
  • Implement all write-side tools as idempotent operations
  • Checkpoint agent state after each successful tool call for resumability
  • Require human-in-the-loop review before irreversible actions (sends, writes, external submissions)
  • Log all tool calls with inputs, outputs, timestamps, and cost for auditability
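The logging practice in the last item can be sketched as a decorator that wraps each tool. `AUDIT_LOG` is an in-memory list standing in for a real log sink, the names are hypothetical, and cost is omitted because it depends on token usage reported by the model provider.

```python
import functools
import json
import time

AUDIT_LOG = []  # stand-in for a durable, append-only log sink


def audited(tool):
    """Wrap a tool so every call is logged with inputs, output, and timing."""
    @functools.wraps(tool)
    def wrapper(**kwargs):
        started = time.time()
        record = {"tool": tool.__name__, "inputs": kwargs, "started_at": started}
        try:
            record["output"] = tool(**kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)   # failures are logged too
            raise
        finally:
            record["duration_s"] = time.time() - started
            AUDIT_LOG.append(json.dumps(record, default=str))
    return wrapper


@audited
def lookup_policy(payer_id: str) -> str:
    return f"criteria for {payer_id}"


result = lookup_policy(payer_id="P-001")
entry = json.loads(AUDIT_LOG[-1])
```

Writing the record in `finally` means a tool call that raises still leaves an audit entry, which matters most for the calls that fail.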

Alternatives

| Approach | When to Choose | Trade-off |
|---|---|---|
| Sequential chain | Steps are known at design time; no branching needed | Less flexible; faster and more predictable |
| Parallel chain (fan-out) | Multiple independent tasks; aggregate results | No dynamic decision-making; requires all paths to be known |
| Router + chain | Small number of fixed paths based on input classification | More predictable than an agent; less flexible |
| Full agent (ReAct) | Branching logic is complex or data-driven; paths cannot be predetermined | Most flexible; highest cost, latency, and unpredictability |
| Human workflow + AI assist | Task requires human judgment throughout; AI augments but doesn't automate | Lower automation; more reliable for high-stakes decisions |

Trade-offs

| Dimension | Advantage | Cost |
|---|---|---|
| Flexibility | Handles unforeseen cases dynamically | Introduces unpredictability |
| Autonomy | Reduces human coordination overhead | Requires containment architecture |
| Capability | Completes multi-step workflows | 5–20x cost vs. single-turn calls |
| Resumability | Can checkpoint and continue | Requires persistent state infrastructure |
| Debuggability | Flexible tool sequences | Harder to trace failures than deterministic chains |

Interview Questions

Q1: What distinguishes an agent from a chain in LLM-based systems?

Category: Architecture | Difficulty: Senior | Role: AI Architect

Answer Framework:

A chain is a developer-defined sequence of LLM calls and transformations where the flow is predetermined. The developer decides at build time what steps to execute and in what order. A chain is appropriate when the task has a fixed structure.

An agent is an LLM-powered system where the model itself decides what to do next at each step, based on its reasoning about the current state. The developer provides tools and a goal; the agent determines the sequence of tool calls required to achieve that goal. This makes agents appropriate for tasks where the path depends on data encountered at runtime.

The practical consequence: agents can handle cases the developer did not anticipate, but they also introduce unpredictability, higher cost, and new failure modes that chains do not have. The choice between them is an explicit architectural decision, not a preference.

Key Points to Hit: Developer controls flow (chain) vs. LLM controls flow (agent); dynamic tool selection; when each is appropriate; trade-offs are real.

Red Flags: "Agents are just better chains" — they are not. They are a different architectural pattern with different cost/benefit profiles.


Q2: A prior authorization workflow requires 8 sequential steps, each dependent on the previous. Should you use an agent or a chain?

Category: System Design | Difficulty: Principal | Role: AI Architect

Answer Framework:

The key question is whether the steps and their sequence are known at design time. If the prior auth workflow always executes the same 8 steps in the same order regardless of input, a chain is more appropriate — it is faster, cheaper, easier to debug, and more predictable than an agent.

However, prior auth is rarely that simple. Real workflows branch: different payers have different criteria; some requests require clinical literature lookup while others don't; some requests can be auto-approved while others require escalation. If these decision points depend on data retrieved at runtime, an agent or an agent-augmented state machine (LangGraph) is warranted.

The production pattern: a LangGraph state machine where fixed paths are expressed as deterministic edges and dynamic decisions are expressed as conditional edges routing to an LLM-powered decision node. This gives you the predictability of a chain for known paths and the flexibility of an agent for the branching logic.
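The shape of that pattern can be shown without any framework. The sketch below is a plain-Python analogue, not LangGraph's API: node names, state fields, and the rule inside `decide_route` are hypothetical, and `decide_route` stands in for the LLM-powered decision node.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


def fetch_history(state: State) -> State:
    state["history"] = "retrieved"
    return state


def evaluate(state: State) -> State:
    state["criteria_met"] = bool(state["documents_complete"])
    return state


def auto_approve(state: State) -> State:
    state["outcome"] = "approved"
    return state


def physician_review(state: State) -> State:
    state["outcome"] = "escalated"
    return state


def decide_route(state: State) -> str:
    """Stand-in for the LLM-powered decision node; here a simple rule."""
    return "auto_approve" if state["criteria_met"] else "physician_review"


NODES: Dict[str, Callable[[State], State]] = {
    "fetch_history": fetch_history, "evaluate": evaluate,
    "auto_approve": auto_approve, "physician_review": physician_review,
}
FIXED_EDGES = {"fetch_history": "evaluate"}       # deterministic edges
CONDITIONAL_EDGES = {"evaluate": decide_route}    # runtime-decided edges


def run_graph(state: State, start: str = "fetch_history") -> State:
    node: Optional[str] = start
    path: List[str] = []
    while node is not None:
        path.append(node)
        state = NODES[node](state)
        if node in CONDITIONAL_EDGES:
            node = CONDITIONAL_EDGES[node](state)
        else:
            node = FIXED_EDGES.get(node)          # None ends the run
    state["path"] = path
    return state


complete = run_graph({"documents_complete": True})
incomplete = run_graph({"documents_complete": False})
```

Only the edge leaving `evaluate` is decided at runtime; every other transition is fixed, which keeps the unpredictable part of the workflow confined to a single, auditable decision point.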

Red Flags: "Use an agent for everything" — overcomplicated; "Use a chain for everything" — breaks when faced with realistic workflow variation.


Q3: What is the "agent paradox" and how does it affect system design?

Category: Architecture | Difficulty: Principal | Role: AI Architect / Engineering Manager

Answer Framework:

The agent paradox is the observation that the more autonomous and capable you make an agent, the more dangerous and expensive its failure modes become. A highly capable agent that can take consequential actions — send emails, submit to payers, write to patient records — is also an agent that can cause consequential harm if it makes a mistake.

This creates a design tension: the value of an agent scales with how much it can do autonomously, but the risk also scales. The architectural resolution is graduated autonomy: the agent operates autonomously up to a defined risk threshold, above which it requires human approval. Low-risk actions (reading records, querying guidelines) proceed automatically. High-risk actions (submitting prior auth decisions, updating medication records) require physician review.

This is why human-in-the-loop is an architectural requirement for enterprise clinical agents, not an optional feature. It is the mechanism that resolves the agent paradox.
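Graduated autonomy can be sketched as a risk tier per tool plus an approval gate. The tool names, tiers, and threshold below are illustrative assumptions; a real system would derive them from its own risk assessment.

```python
from enum import IntEnum
from typing import Optional


class Risk(IntEnum):
    LOW = 1
    HIGH = 2


# Hypothetical risk assignment per tool.
TOOL_RISK = {
    "read_record": Risk.LOW,
    "query_guidelines": Risk.LOW,
    "submit_determination": Risk.HIGH,
}
APPROVAL_THRESHOLD = Risk.HIGH  # at or above this tier, a human must approve


def authorize(tool_name: str, approved_by: Optional[str] = None) -> bool:
    """Return True if the tool call may proceed."""
    risk = TOOL_RISK.get(tool_name, Risk.HIGH)   # unknown tools default to high risk
    if risk < APPROVAL_THRESHOLD:
        return True                              # low risk: proceed automatically
    return approved_by is not None               # high risk: needs a named approver


can_read = authorize("read_record")
can_submit_unreviewed = authorize("submit_determination")
can_submit_reviewed = authorize("submit_determination", approved_by="dr.example")
```

Defaulting unknown tools to the high-risk tier means a newly added tool cannot act autonomously until someone has classified it.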

Key Takeaways

  • An agent is an LLM equipped with tools, memory, and a goal-directed loop — the LLM decides what to do next at each step
  • The ReAct pattern (Reason → Act → Observe) underlies most practical agent architectures
  • Agents are appropriate when workflow paths cannot be predetermined; chains are appropriate when they can
  • Tool calling via the API's native tools parameter is strongly preferred over prompt-engineered ReAct
  • Every agent must have a max_iterations circuit breaker; unbounded loops are a production risk
  • Agent cost is 5–20x single-turn cost; model tier selection per agent role is the primary cost lever
  • Human-in-the-loop is not optional for enterprise agents that take consequential actions — it resolves the agent paradox
  • The security perimeter of an agent is the union of the permissions of all its tools — minimize both

Further Reading

In This Repository:

External References:

  • Yao et al. (2022), "ReAct: Synergizing Reasoning and Acting in Language Models" — the foundational paper
  • Anthropic Tool Use documentation — official API reference for tool calling
  • LangGraph documentation — official state machine framework documentation
