Tool Design Patterns

Executive Summary

Tools are the interface between an LLM's reasoning and the real world. The quality of an agent's tools determines the quality of the agent — a brilliant reasoning model paired with poorly designed tools will fail on tasks that a simpler model with excellent tools can complete reliably. This chapter covers tool schema design, idempotency, error handling, authorization, and versioning at a level of rigor appropriate for production enterprise systems. AI architects, platform engineers, and senior developers building agent tooling should read this chapter before writing any tool implementation.

Learning Objectives

Design tool schemas that guide reliable LLM tool selection
Classify tools by side-effect profile and apply appropriate safeguards
Implement idempotent write-side tools
Design error responses that allow agents to recover gracefully
Apply the principle of least privilege to tool authorization

Business Problem

Tool defects are among the most common causes of production agentic system failures. The fault usually lies not in the LLM's reasoning but in the tools: a tool with an ambiguous description causes wrong selection; a tool that throws an unhandled exception halts the workflow; a non-idempotent write tool double-commits a transaction when the agent retries; a tool with excessive permissions exposes sensitive data.

Poorly designed tools create a class of bugs that are invisible in testing (the LLM selects the wrong tool but produces a plausible-sounding answer) and catastrophic in production (the wrong action is taken on a real patient record). Tool design is therefore a first-class engineering discipline, not an afterthought.

Why This Technology Exists

When Anthropic and OpenAI introduced native tool calling, they created a contract: the developer provides a JSON schema describing each tool; the LLM uses that schema to decide when and how to call the tool. The schema is not documentation for humans — it is the primary communication channel between the developer's intent and the LLM's reasoning. This makes schema design as important as API design was in the service-oriented architecture era.

The challenge: LLMs are stochastic. Unlike a human developer who reads documentation once, the LLM reasons about the schema fresh for every query. Tool schemas must be unambiguous, consistently named, and designed to minimize the chance that the model selects the wrong tool or calls a correct tool with wrong arguments.

Conceptual Explanation

A tool has three components visible to the LLM:

Name — what the LLM calls in its tool invocation
Description — the primary signal the LLM uses for tool selection
Input schema — the parameters the LLM must provide (JSON Schema)

Everything else (the implementation, authorization logic, error handling) is invisible to the LLM. The LLM sees only the schema, so the schema must carry all the information the model needs to use the tool correctly.

Think of a tool schema as a contract: it specifies what the tool does, what inputs it requires, and implicitly, when it should be called. A well-designed schema makes the right tool selection obvious and the wrong tool selection unlikely.

Core Architecture

Tool Schema Structure

Implementation code omitted in the Playbook edition. For complete code examples, production patterns, and advanced implementation details, see the Enterprise AI Technical Reference.
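As an illustration only (this is not code from the Technical Reference), here is a minimal sketch of the three LLM-visible components for a hypothetical read-class tool. The dictionary follows the `name` / `description` / `input_schema` shape that Anthropic's Messages API uses for tool definitions; OpenAI's function-calling format carries the same JSON Schema under a `parameters` key instead. The tool name, identifier format, and parameters are assumptions chosen for the example.

```python
# Hypothetical read-class tool definition. Only these three fields are
# visible to the LLM; implementation, authorization, and error handling
# live elsewhere.
GET_PATIENT_LABS = {
    "name": "get_patient_labs",
    "description": (
        "Returns laboratory results for one patient, newest first. "
        "Use this when the question concerns measured lab values such as "
        "HbA1c, creatinine, or a lipid panel. "
        "Each result contains the test name, value, unit, reference range, "
        "and collection date. "
        "It does NOT return imaging, medications, or clinical notes; use "
        "get_patient_clinical_summary for those."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "patient_id": {
                "type": "string",
                "description": "Internal patient identifier (hypothetical format 'PT-000123').",
            },
            # Two explicit dates instead of one ambiguous "date_range" string.
            "start_date": {
                "type": "string",
                "format": "date",
                "description": "Earliest collection date to include, ISO 8601 (YYYY-MM-DD).",
            },
            "end_date": {
                "type": "string",
                "format": "date",
                "description": "Latest collection date to include, ISO 8601 (YYYY-MM-DD). Defaults to today.",
            },
        },
        "required": ["patient_id"],
        "additionalProperties": False,
    },
}
```

The description covers what the tool does, when to use it, what it returns, and what it does NOT do, and every parameter states its expected format. These are the signals the model relies on for selection and argument formatting.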

Side-Effect Classification

Every tool must be classified by its side-effect profile before deployment:

Class	Description	Examples	Safeguards Required
Read	Returns data; no state change	`get_patient_labs`, `search_guidelines`	Authentication only
Write	Creates or updates state	`create_prior_auth`, `update_care_plan`	Idempotency + audit log
Delete	Removes state	`cancel_order`, `void_prescription`	Confirmation + HITL
External	Triggers external system	`submit_to_payer`, `send_notification`	HITL required; non-retryable

Architectural Rule: Never expose a Delete or External-class tool to an agent without a human-in-the-loop gate. The consequences of an incorrect invocation may be irreversible.
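One way to make the classification enforceable rather than advisory is to encode the table above as data and check it at registration time. The sketch below assumes that every class needs authentication as a baseline (the table lists it explicitly only for Read). The safeguard labels are hypothetical; a real registry would tie each label to an actual control.

```python
from enum import Enum


class SideEffect(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    EXTERNAL = "external"


# Required safeguards per class, mirroring the classification table
# (authentication assumed as a baseline for every class).
REQUIRED_SAFEGUARDS = {
    SideEffect.READ: frozenset({"authentication"}),
    SideEffect.WRITE: frozenset({"authentication", "idempotency", "audit_log"}),
    SideEffect.DELETE: frozenset({"authentication", "confirmation", "hitl"}),
    SideEffect.EXTERNAL: frozenset({"authentication", "hitl", "no_retry"}),
}


def requires_hitl(side_effect: SideEffect) -> bool:
    """True for classes that must pass a human-in-the-loop gate."""
    return "hitl" in REQUIRED_SAFEGUARDS[side_effect]


def missing_safeguards(side_effect: SideEffect, implemented: set) -> frozenset:
    """Safeguards the class requires that the tool does not yet implement."""
    return REQUIRED_SAFEGUARDS[side_effect] - frozenset(implemented)


def check_deployable(tool_name: str, side_effect: SideEffect, implemented: set) -> None:
    """Refuse registration of a tool that lacks a required safeguard."""
    missing = missing_safeguards(side_effect, implemented)
    if missing:
        raise ValueError(f"{tool_name} ({side_effect.value}) is missing: {sorted(missing)}")
```

For example, `check_deployable("cancel_order", SideEffect.DELETE, {"authentication", "confirmation"})` raises a `ValueError` that names the missing `hitl` safeguard.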

Architecture Diagram

graph TD
    subgraph "Tool Design Layers"
        Schema["Tool Schema\n(Name + Description + Input Schema)\nVisible to LLM"]
        Validator["Input Validator\nType checking + constraint validation\nNot visible to LLM"]
        AuthZ["Authorization Layer\nCaller identity + permission check\nNot visible to LLM"]
        Idempotency["Idempotency Guard\nDedup key check + replay detection\nNot visible to LLM"]
        Implementation["Tool Implementation\nActual logic / API call / DB query\nNot visible to LLM"]
        AuditLog["Audit Logger\nInput + output + caller + timestamp\nNot visible to LLM"]
        ErrorHandler["Error Handler\nStructured error response for agent recovery\nNot visible to LLM"]
    end
    Schema --> Validator --> AuthZ --> Idempotency --> Implementation
    Implementation --> AuditLog
    Implementation -->|"Success"| Response["Structured Response\n(returned as tool_result)"]
    Implementation -->|"Error"| ErrorHandler --> RecoverableError["Recoverable Error Response\n(agent can retry or adapt)"]
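The layering in the diagram can be sketched as a single entry point that runs each layer in order and short-circuits with a structured error. This is a simplified, in-memory illustration. The callables passed in (`validate`, `authorize`, `implementation`) and the error codes are hypothetical, and a production version would back the dedup store and audit log with durable storage.

```python
import time
from typing import Any, Callable, Dict, List, Optional

ToolResult = Dict[str, Any]


def error(code: str, message: str, recoverable: bool) -> ToolResult:
    return {"success": False, "error_code": code,
            "error_message": message, "recoverable": recoverable}


def run_tool(
    args: Dict[str, Any],
    caller: str,
    *,
    validate: Callable[[Dict[str, Any]], Optional[str]],
    authorize: Callable[[str], bool],
    implementation: Callable[[Dict[str, Any]], Any],
    dedup_store: Dict[str, ToolResult],
    audit_log: List[Dict[str, Any]],
    idempotency_key: Optional[str] = None,
) -> ToolResult:
    # 1. Input validator: returns a problem description, or None if valid.
    problem = validate(args)
    if problem is not None:
        return error("INVALID_INPUT", problem, recoverable=True)
    # 2. Authorization layer: checks the caller's identity and permissions.
    if not authorize(caller):
        return error("FORBIDDEN", f"{caller} may not call this tool", recoverable=False)
    # 3. Idempotency guard: replay the stored result for a repeated key.
    if idempotency_key is not None and idempotency_key in dedup_store:
        return dedup_store[idempotency_key]
    # 4. Implementation; any exception becomes a structured error
    #    (non-recoverable by default, so the agent does not retry blindly).
    try:
        result = {"success": True, "data": implementation(args)}
    except Exception as exc:
        result = error("TOOL_FAILURE", type(exc).__name__, recoverable=False)
    else:
        # Only successful results are stored, so a failed attempt can be retried.
        if idempotency_key is not None:
            dedup_store[idempotency_key] = result
    # 5. Audit logger: input, output, caller, timestamp.
    audit_log.append({"timestamp": time.time(), "caller": caller,
                      "input": args, "output": result})
    return result
```

Validation runs before authorization here, matching the diagram's order. An implementation that prefers not to reveal input rules to unauthorized callers could swap the first two steps.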

Enterprise Considerations

Tool registry governance. In large organizations, multiple teams build tools for shared agents. Establish a governance process: tool schemas go through review before registration, descriptions are tested against a set of disambiguation queries, and each tool has a documented owner and SLA. A poorly designed tool registered by one team can degrade the performance of every agent in the platform.

Tool versioning. When a tool's schema changes incompatibly (renamed parameter, removed field), existing agents break. Version tools explicitly, for example `get_patient_labs_v2`. Deprecate old versions gracefully: do not delete them until all agents using them are migrated. Use the description field to mark deprecated tools: "DEPRECATED: Use get_patient_labs_v2 instead."
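The sketch below is a minimal in-memory version of that policy. It assumes a hypothetical `ToolRegistry` that tracks which agents are bound to each tool version. Deprecation prepends the marker text to the LLM-visible description, and removal is refused while any agent is still bound.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class ToolVersion:
    name: str
    description: str
    deprecated_by: Optional[str] = None

    def llm_description(self) -> str:
        """Description as shown to the LLM, with a deprecation marker if set."""
        if self.deprecated_by:
            return f"DEPRECATED: Use {self.deprecated_by} instead. {self.description}"
        return self.description


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolVersion] = {}
        self._bindings: Dict[str, Set[str]] = {}  # tool name -> agent ids

    def register(self, tool: ToolVersion) -> None:
        if tool.name in self._tools:
            # Incompatible changes get a new name (e.g. a _v2 suffix), not an overwrite.
            raise ValueError(f"{tool.name} already exists; register a new version")
        self._tools[tool.name] = tool
        self._bindings[tool.name] = set()

    def bind(self, agent_id: str, tool_name: str) -> None:
        self._bindings[tool_name].add(agent_id)

    def unbind(self, agent_id: str, tool_name: str) -> None:
        self._bindings[tool_name].discard(agent_id)

    def deprecate(self, old: str, replacement: str) -> None:
        if replacement not in self._tools:
            raise KeyError(f"replacement {replacement} is not registered")
        self._tools[old].deprecated_by = replacement

    def remove(self, name: str) -> None:
        if self._tools[name].deprecated_by is None:
            raise ValueError(f"{name} must be deprecated before removal")
        if self._bindings[name]:
            raise ValueError(f"{name} is still used by {sorted(self._bindings[name])}")
        del self._tools[name]
        del self._bindings[name]

    def description(self, name: str) -> str:
        return self._tools[name].llm_description()
```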

Latency SLAs. Tools are in the critical path of every agent iteration. A tool that takes 5 seconds to respond adds 5 seconds to every agent step that uses it. Define and enforce latency SLAs for every tool: document expected p50/p95/p99 latency; implement timeouts; return a structured error if the timeout is exceeded.
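This is a minimal sketch of timeout enforcement using only the standard library's `concurrent.futures`. The `call_with_timeout` helper and its error fields are illustrative assumptions. One caveat matters in practice: Python cannot kill a running thread, so a timed-out call keeps running in the background. For work that must actually stop, also set a timeout on the underlying HTTP or database client.

```python
import concurrent.futures
import time
from typing import Any, Callable, Dict

# Shared pool for tool calls; the size is arbitrary for this sketch.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(
    fn: Callable[[Dict[str, Any]], Any], args: Dict[str, Any], timeout_s: float
) -> Dict[str, Any]:
    """Run a tool, returning a structured TIMEOUT error if it exceeds its SLA.

    Exceptions raised by fn itself propagate; pair this with a structured
    error wrapper to convert them.
    """
    start = time.monotonic()
    future = _EXECUTOR.submit(fn, args)
    try:
        data = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # cancel() only helps if the call has not started yet; a running
        # thread continues until fn returns.
        future.cancel()
        return {
            "success": False,
            "error_code": "TIMEOUT",
            "error_message": f"Tool exceeded its {timeout_s}s latency SLA; retry later.",
            "recoverable": True,
        }
    # Elapsed time can feed the p50/p95/p99 latency metrics.
    return {"success": True, "data": data, "latency_s": time.monotonic() - start}
```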

Cost attribution. In multi-tenant platforms, tools that query external APIs incur costs (per-call fees, database read units, etc.). Instrument every tool call with the calling agent's identity for cost attribution. This is also necessary for security auditing.

Healthcare Example


Educational Example — Illustrative Workflow. Not intended for clinical decision making.

A Reference Healthcare Organization's prior authorization agent uses a tool registry with six tools, classified by side effect:

Tool	Class	HITL Required
`get_patient_clinical_summary`	Read	No
`search_clinical_guidelines`	Read	No
`lookup_payer_policy`	Read	No
`create_prior_auth_draft`	Write	No (draft only)
`update_prior_auth_with_physician_notes`	Write	No (physician is already the actor)
`submit_prior_auth_to_payer`	External	Yes

The `submit_prior_auth_to_payer` tool is never called autonomously: the agent always interrupts and routes to a physician before submission. This single architectural decision bounds the agent's maximum autonomous impact to creating a draft rather than submitting a binding request to a payer.
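The sketch below shows one way that decision might be enforced in the dispatcher rather than in the prompt, so the gate does not depend on the model following instructions. The class mapping copies the registry table above. `dispatch`, the approval queue, and the `AWAITING_PHYSICIAN_APPROVAL` code are hypothetical names for illustration.

```python
from typing import Any, Callable, Dict, List

# Side-effect class per tool, copied from the registry table.
TOOL_CLASS = {
    "get_patient_clinical_summary": "read",
    "search_clinical_guidelines": "read",
    "lookup_payer_policy": "read",
    "create_prior_auth_draft": "write",
    "update_prior_auth_with_physician_notes": "write",
    "submit_prior_auth_to_payer": "external",
}
HITL_CLASSES = {"delete", "external"}


def dispatch(
    tool_name: str,
    args: Dict[str, Any],
    handlers: Dict[str, Callable[[Dict[str, Any]], Any]],
    approval_queue: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Execute Read/Write tools; route Delete/External calls to a physician."""
    if TOOL_CLASS.get(tool_name) in HITL_CLASSES:
        # The call is queued for human review and never executed here.
        approval_queue.append({"tool": tool_name, "args": args})
        return {
            "success": False,
            "error_code": "AWAITING_PHYSICIAN_APPROVAL",
            "error_message": "Submission routed to a physician for review. Do not retry.",
            "recoverable": False,
        }
    if tool_name not in handlers:
        return {"success": False, "error_code": "UNKNOWN_TOOL",
                "error_message": f"No tool named {tool_name}.", "recoverable": False}
    return {"success": True, "data": handlers[tool_name](args)}
```

The submission tool deliberately has no entry in `handlers`. Even if the gate were misconfigured, the agent process would have no code path that submits to the payer.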

Common Mistakes

Description is too short. "Gets patient data" tells the LLM nothing about when to use it, what it returns, or how it differs from similar tools. Aim for 3–5 sentences per description.

Parameters have no descriptions. The LLM relies on parameter descriptions to format arguments correctly. A parameter named date_range with no description will be populated inconsistently (days vs. dates vs. timestamps).
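Both mistakes are cheap to catch before registration. The hypothetical linter below flags descriptions shorter than three sentences and parameters without descriptions. It assumes tool definitions in the `name` / `description` / `input_schema` shape. Its sentence count is a crude split on terminal punctuation, so abbreviations such as "e.g." will inflate it.

```python
import re
from typing import Any, Dict, List


def lint_tool(tool: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a tool definition (empty if none)."""
    problems = []
    desc = tool.get("description", "").strip()
    # Crude sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", desc) if s]
    if len(sentences) < 3:
        problems.append(f"description has {len(sentences)} sentence(s); aim for 3-5")
    props = tool.get("input_schema", {}).get("properties", {})
    for name, spec in props.items():
        if not spec.get("description"):
            problems.append(f"parameter '{name}' has no description")
    return problems
```

Run as a gate in the registry's review process, a check like this turns the two common mistakes into build failures instead of production surprises.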

Write tools are not idempotent. When an agent retries after a failure, it calls the same tool again. If the write is not idempotent, the state is written twice. Every write tool must implement deduplication via an idempotency key.

Returning raw exceptions. If a tool raises an unhandled Python exception, the agent framework may abort the whole workflow, depending on how it handles tool errors. Always catch exceptions inside tools and return structured error objects that include a recoverable flag.
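Here is a minimal sketch of that rule as a decorator. `ToolError`, `structured_tool`, and the example `lookup_payer_policy` implementation (with its in-memory policy table) are hypothetical. Results use `success`, `error_code`, `error_message`, and `recoverable` fields. Unanticipated exceptions are reported as non-recoverable, a conservative default so the agent escalates rather than retrying blindly. Their raw messages are withheld because they may contain internal details or PHI.

```python
import functools
from typing import Any, Callable, Dict


class ToolError(Exception):
    """An anticipated failure that the tool can describe to the agent."""

    def __init__(self, code: str, message: str, recoverable: bool) -> None:
        super().__init__(message)
        self.code = code
        self.recoverable = recoverable


def structured_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so it always returns a structured result and never raises."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"success": True, "data": fn(*args, **kwargs)}
        except ToolError as exc:
            return {"success": False, "error_code": exc.code,
                    "error_message": str(exc), "recoverable": exc.recoverable}
        except Exception as exc:
            # Conservative default: unknown failures are not retried blindly,
            # and the raw exception text is withheld from the agent.
            return {"success": False, "error_code": "INTERNAL_ERROR",
                    "error_message": f"Unexpected {type(exc).__name__}; escalate to an operator.",
                    "recoverable": False}

    return wrapper


_POLICIES = {("PAYER-1", "PROC-1"): "Hypothetical policy text."}


@structured_tool
def lookup_payer_policy(payer_id: str, procedure_code: str) -> str:
    if not payer_id.startswith("PAYER-"):
        # Recoverable: the agent can correct the argument and call again.
        raise ToolError("INVALID_PAYER_ID",
                        "payer_id must look like 'PAYER-<number>'; correct it and retry.",
                        recoverable=True)
    # Deliberately unguarded: an unknown pair raises KeyError, which the
    # wrapper converts into an INTERNAL_ERROR result.
    return _POLICIES[(payer_id, procedure_code)]
```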

Too many similar tools. If the registry has `search_guidelines`, `lookup_guidelines`, `find_guidelines`, and `retrieve_guidelines`, the LLM has no signal for choosing between them, and selection becomes effectively arbitrary. Consolidate them, and use names that discriminate by purpose.

Best Practices

Write tool descriptions as 3–5 sentences: what it does, when to use it, what it returns, what it does NOT do
Name tools with verb_noun convention; use consistent verb prefixes across all tools of the same class
Every write-side tool must accept an idempotency key and implement dedup logic
Every tool must return a structured object with a `success` boolean and, on failure, `error_code`, `error_message`, and `recoverable` fields
Classify every tool by side effect; require HITL before executing Delete or External class tools
Set and enforce a latency SLA for every tool; return a structured timeout error if exceeded
Log every tool call for auditability; sanitize PHI/PII before logging
Keep tool registries focused: 5–15 tools per agent; specialize rather than generalize
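Below is a minimal sketch of an audit record that carries caller identity and redacts sensitive values before anything is written. Caller identity also supports the cost attribution described under Enterprise Considerations. The key denylist and the single SSN-pattern regex are illustrative assumptions and fall well short of a complete PHI de-identification policy. The `tool_audit` logger name and record fields are likewise hypothetical.

```python
import json
import logging
import re
import time
from typing import Any, Dict

logger = logging.getLogger("tool_audit")

# Illustrative denylist only; extend it to match your own data.
SENSITIVE_KEYS = {"patient_name", "dob", "ssn", "mrn", "address", "phone"}
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(value: Any) -> Any:
    """Recursively mask sensitive keys and SSN-shaped substrings."""
    if isinstance(value, dict):
        return {k: "[REDACTED]" if str(k).lower() in SENSITIVE_KEYS else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return SSN_PATTERN.sub("[REDACTED-SSN]", value)
    return value


def log_tool_call(tool: str, agent_id: str, tenant_id: str,
                  args: Dict[str, Any], result: Dict[str, Any]) -> Dict[str, Any]:
    """Build a sanitized audit record, emit it as JSON, and return it."""
    record = {
        "timestamp": time.time(),
        "tool": tool,
        "caller_agent_id": agent_id,  # security audit and cost attribution
        "tenant_id": tenant_id,
        "input": redact(args),
        "output": redact(result),
    }
    logger.info(json.dumps(record, default=str))
    return record
```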

Alternatives

Approach	When to Use	Trade-off
Native tool calling	LLM supports structured tool API	Strongly preferred; most reliable
Prompt-based tool calling	Model does not support native tool API	Fragile; parse errors are common
Code execution tools	Agent needs arbitrary computation	High security risk; requires sandboxing
Pre-compiled tool chains	Steps are known; tools always called in the same sequence	Less flexible; more predictable

Trade-offs

Dimension	Advantage	Cost
Descriptive schemas	Higher selection accuracy	Larger context per iteration
Idempotent write tools	Safe agent retries	Additional dedup infrastructure
HITL gates on external tools	Prevents irreversible mistakes	Adds human latency to workflow
Structured error responses	Agent can recover and retry	Tool implementation complexity
Tool versioning	Safe schema evolution	Registry management overhead

Interview Questions

Q1: What makes a tool description effective for LLM tool selection?

Category: Technical Depth
Difficulty: Senior
Role: AI Architect / ML Engineer

Answer Framework:

Effective tool descriptions do four things: state what the tool does in plain language, identify the triggering condition (when should the model use this tool?), specify what the tool returns, and disambiguate from similar tools by stating what the tool does NOT do. The triggering condition is the most commonly omitted element and the most important for selection accuracy.

The description is not documentation for humans: it is a classification signal that the LLM uses at inference time, fresh for every query. It should be written the way a senior engineer would explain a tool to a new team member in one paragraph: precise, without jargon, covering when and why, not just what.

Red Flags: "The description just says what the function does — isn't that enough?" No. The LLM needs discriminating signals between similar tools, not just a function definition.

Q2: Why must write-side tools in an agent be idempotent, and how do you implement it?

Category: Technical Depth
Difficulty: Senior
Role: AI Architect / ML Engineer

Answer Framework:

Agents retry when they encounter errors. If a tool creates a record in a database, a retry after a partial failure creates a duplicate record. In a clinical system, a duplicate prior auth submission triggers duplicate reviews, confuses payers, and may double-bill.

Idempotency is implemented via a deduplication key: the caller (agent) provides a stable key representing "this specific logical operation." The tool stores completed operations in a lookup table. Before executing, the tool checks whether the key has been processed. If yes, it returns the stored result. If no, it executes, stores the result, and returns it.

The agent generates the idempotency key by hashing the combination of the workflow run ID and the step index (or tool call index). This makes the key stable across retries and unique across different workflow runs.
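The sketch below covers both halves, key derivation and the dedup check, under simplifying assumptions. The dedup table is an in-process dictionary, and a single lock serializes executions so that two concurrent retries cannot both run the operation. A production system would more typically rely on a durable store with a unique constraint on the key. `idempotency_key` and `IdempotentExecutor` are hypothetical names.

```python
import hashlib
import threading
from typing import Any, Callable, Dict


def idempotency_key(workflow_run_id: str, step_index: int) -> str:
    """Stable across retries of the same step; distinct across runs and steps."""
    raw = f"{workflow_run_id}:{step_index}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class IdempotentExecutor:
    def __init__(self) -> None:
        self._completed: Dict[str, Any] = {}  # key -> stored result
        self._lock = threading.Lock()

    def run(self, key: str, operation: Callable[[], Any]) -> Any:
        # Holding the lock across the operation is coarse, but it keeps the
        # check-then-execute sequence atomic for this sketch.
        with self._lock:
            if key in self._completed:
                return self._completed[key]  # replay: no second side effect
            result = operation()  # if this raises, nothing is stored
            self._completed[key] = result
            return result
```

Because a failed operation stores nothing, a retry after an exception executes again. Only a completed operation is replayed.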

Red Flags: "Just check for duplicate records after the fact" — this is reactive, not preventive, and doesn't work for external API calls.

Key Takeaways

The tool schema (name + description + input schema) is the contract between developer intent and LLM reasoning — treat it as a first-class engineering artifact
Tool descriptions must state: what it does, when to use it, what it returns, and what it does NOT do
Classify every tool by side effect: Read, Write, Delete, External — and gate Delete/External tools behind human-in-the-loop approval
All write-side tools must implement idempotency via a deduplication key to enable safe agent retries
Tools must return structured error objects with a recoverable flag, not raw exceptions
Keep tool registries focused: 5–15 tools per agent maximizes selection accuracy
Log every tool invocation with inputs, outputs, and caller identity for auditability
