Agent Architecture Fundamentals

Conceptual Explanation

An agent is an LLM equipped with:

Tools — functions the model can call (search, database query, API call, file write)
Memory — access to prior context beyond the current prompt
A goal — an objective it pursues across multiple steps
A loop — a mechanism to continue reasoning until the goal is reached or a stop condition is met

The critical distinction from a chain: in a chain, the developer decides the sequence of LLM calls in advance. In an agent, the LLM decides what to do next at each step based on what it observes. This autonomy is both the value proposition and the primary risk.

The Agent vs. Chain Distinction

text

Chain (developer controls flow):
  Step 1: LLM summarizes document (hardcoded)
  Step 2: LLM extracts entities (hardcoded)
  Step 3: LLM formats output (hardcoded)

Agent (LLM controls flow):
  Goal: "Research this patient's medication history and flag interactions"
  Step 1: LLM decides to call get_patient_medications()
  Step 2: LLM decides to call check_drug_interactions(medications)
  Step 3: LLM decides result is sufficient → produces final answer
  [Or: LLM decides to call get_allergy_history() first, then proceed]

The agent dynamically selects tools and decides when it has enough information. This flexibility enables handling cases the developer did not anticipate at design time.

Core Architecture

The agent's execution model follows the Perceive → Reason → Act → Observe cycle:

Perceive: The agent receives input (user query, system trigger, prior step result). This is combined with the system prompt, available tool schemas, and any memory retrieved from external stores into the context window.

Reason: The LLM generates a response. If it determines a tool is needed, it outputs a structured tool call (JSON with tool name and arguments). If it has enough information, it outputs a final response.

Act: The framework executes the tool call against the real system (API, database, file). The LLM does not execute code — it outputs a request; the framework executes it.

Observe: The tool result is appended to the context as an observation. The agent loops back to Reason with updated context.

This loop continues until the LLM either produces a final answer (no tool call) or a stop condition is met (max iterations, timeout, error threshold).

Architecture Diagram

flowchart TD Start(["User / System Input"]) --> Perceive subgraph "Agent Core" Perceive["Perceive\nFormat input + inject system prompt + memory"] Reason["Reason\nLLM generates thought + decides next action"] Act["Act\nExecute tool call or produce final response"] Observe["Observe\nReceive tool output, append to context"] end subgraph "Memory Layer" WorkingMem["Working Memory\n(Context Window)"] EpisodicMem["Episodic Memory\n(Conversation History)"] SemanticMem["Semantic Memory\n(Vector Store / Knowledge)"] end subgraph "Tool Layer" Tools["Tools\n(APIs, Databases, File Systems, Services)"] end Perceive --> WorkingMem EpisodicMem --> Perceive SemanticMem --> Perceive Perceive --> Reason Reason -->|"Tool call required"| Act Act --> Tools Tools -->|"Tool result"| Observe Observe --> WorkingMem WorkingMem --> Reason Reason -->|"Final answer"| Output(["Response"])

Standalone diagram: architecture/mermaid/02-agent-loop.mmd

Common Mistakes

Running agents where chains suffice. If the sequence of steps is known at design time and doesn't change based on input, use a chain. The agent overhead (latency, cost, unpredictability) is not justified.

Unbounded loops. Forgetting max_iterations causes runaway cost and resource consumption. Always set a limit; always log when it is reached.

Too many tools. Giving an agent 30 tools degrades tool selection accuracy. Keep the tool registry focused: 5–15 tools per agent. Specialize agents rather than creating a general-purpose agent with every possible tool.

Mutable tool side effects without idempotency. If a tool writes to a database and the agent fails mid-workflow, re-running the workflow executes the write again. All write-side tools must be idempotent or the agent must checkpoint state after each write.

Trusting tool output without validation. Tool results can contain errors, unexpected formats, or injection payloads. Validate tool output structure before appending it to the agent's context.

Best Practices

Use agents only when the workflow cannot be expressed as a predetermined sequence
Set max_iterations on every agent; alert when it triggers in production
Keep tool registries focused: 5–15 tools maximum per agent
Use small models for tool routing decisions; reserve frontier models for reasoning
Implement all write-side tools as idempotent operations
Checkpoint agent state after each successful tool call for resumability
Require human-in-the-loop review before irreversible actions (sends, writes, external submissions)
Log all tool calls with inputs, outputs, timestamps, and cost for auditability

Alternatives

Approach	When to Choose	Trade-off
Sequential chain	Steps are known at design time; no branching needed	Less flexible; faster and more predictable
Parallel chain (fan-out)	Multiple independent tasks; aggregate results	No dynamic decision-making; requires all paths to be known
Router + chain	Small number of fixed paths based on input classification	More predictable than an agent; less flexible
Full agent (ReAct)	Branching logic is complex or data-driven; paths cannot be predetermined	Most flexible; highest cost, latency, and unpredictability
Human workflow + AI assist	Task requires human judgment throughout; AI augments but doesn't automate	Lower automation; more reliable for high-stakes decisions

Trade-offs

Dimension	Advantage	Cost
Flexibility	Handles unforeseen cases dynamically	Introduces unpredictability
Autonomy	Reduces human coordination overhead	Requires containment architecture
Capability	Completes multi-step workflows	5–20x cost vs. single-turn calls
Resumability	Can checkpoint and continue	Requires persistent state infrastructure
Debuggability	Flexible tool sequences	Harder to trace failures than deterministic chains

Interview Questions

Q1: What distinguishes an agent from a chain in LLM-based systems?

Category: Architecture Difficulty: Senior Role: AI Architect

Answer Framework:

A chain is a developer-defined sequence of LLM calls and transformations where the flow is predetermined. The developer decides at build time what steps to execute and in what order. A chain is appropriate when the task has a fixed structure.

An agent is an LLM-powered system where the model itself decides what to do next at each step, based on its reasoning about the current state. The developer provides tools and a goal; the agent determines the sequence of tool calls required to achieve that goal. This makes agents appropriate for tasks where the path depends on data encountered at runtime.

The practical consequence: agents can handle cases the developer did not anticipate, but they also introduce unpredictability, higher cost, and new failure modes that chains do not have. The choice between them is an explicit architectural decision, not a preference.

Key Points to Hit: Developer controls flow (chain) vs. LLM controls flow (agent); dynamic tool selection; when each is appropriate; trade-offs are real.

Red Flags: "Agents are just better chains" — they are not. They are a different architectural pattern with different cost/benefit profiles.

Q2: A prior authorization workflow requires 8 sequential steps, each dependent on the previous. Should you use an agent or a chain?

Category: System Design Difficulty: Principal Role: AI Architect

Answer Framework:

The key question is whether the steps and their sequence are known at design time. If the prior auth workflow always executes the same 8 steps in the same order regardless of input, a chain is more appropriate — it is faster, cheaper, easier to debug, and more predictable than an agent.

However, prior auth is rarely that simple. Real workflows branch: different payers have different criteria; some requests require clinical literature lookup while others don't; some requests can be auto-approved while others require escalation. If these decision points depend on data retrieved at runtime, an agent or an agent-augmented state machine (LangGraph) is warranted.

The production pattern: a LangGraph state machine where fixed paths are expressed as deterministic edges and dynamic decisions are expressed as conditional edges routing to an LLM-powered decision node. This gives you the predictability of a chain for known paths and the flexibility of an agent for the branching logic.

Red Flags: "Use an agent for everything" — overcomplicated; "Use a chain for everything" — breaks when faced with realistic workflow variation.

Q3: What is the "agent paradox" and how does it affect system design?

Category: Architecture Difficulty: Principal Role: AI Architect / Engineering Manager

Answer Framework:

The agent paradox is the observation that the more autonomous and capable you make an agent, the more dangerous and expensive its failure modes become. A highly capable agent that can take consequential actions — send emails, submit to payers, write to patient records — is also an agent that can cause consequential harm if it makes a mistake.

This creates a design tension: the value of an agent scales with how much it can do autonomously, but the risk also scales. The architectural resolution is graduated autonomy: the agent operates autonomously up to a defined risk threshold, above which it requires human approval. Low-risk actions (reading records, querying guidelines) proceed automatically. High-risk actions (submitting prior auth decisions, updating medication records) require physician review.

This is why human-in-the-loop is an architectural requirement for enterprise clinical agents, not an optional feature. It is the mechanism that resolves the agent paradox.

Key Takeaways

An agent is an LLM equipped with tools, memory, and a goal-directed loop — the LLM decides what to do next at each step
The ReAct pattern (Reason → Act → Observe) is the foundation of all practical agent architectures
Agents are appropriate when workflow paths cannot be predetermined; chains are appropriate when they can
Tool calling via the API's native tools parameter is strongly preferred over prompt-engineered ReAct
Every agent must have a max_iterations circuit breaker; unbounded loops are a production risk
Agent cost is 5–20x single-turn cost; model tier selection per agent role is the primary cost lever
Human-in-the-loop is not optional for enterprise agents that take consequential actions — it resolves the agent paradox
The security perimeter of an agent is the union of the permissions of all its tools — minimize both

Agent Architecture Fundamentals#

Conceptual Explanation#

The Agent vs. Chain Distinction#

Core Architecture#

Architecture Diagram#

Common Mistakes#

Best Practices#

Alternatives#

Trade-offs#

Interview Questions#

Q1: What distinguishes an agent from a chain in LLM-based systems?#

Q2: A prior authorization workflow requires 8 sequential steps, each dependent on the previous. Should you use an agent or a chain?#

Q3: What is the "agent paradox" and how does it affect system design?#

Key Takeaways#

Agent Architecture Fundamentals

Conceptual Explanation

The Agent vs. Chain Distinction

Core Architecture

Architecture Diagram

Common Mistakes

Best Practices

Alternatives

Trade-offs

Interview Questions

Q1: What distinguishes an agent from a chain in LLM-based systems?

Q2: A prior authorization workflow requires 8 sequential steps, each dependent on the previous. Should you use an agent or a chain?

Q3: What is the "agent paradox" and how does it affect system design?

Key Takeaways