Tool Design Patterns
Executive Summary
Tools are the interface between an LLM's reasoning and the real world. The quality of an agent's tools determines the quality of the agent — a brilliant reasoning model paired with poorly designed tools will fail on tasks that a simpler model with excellent tools can complete reliably. This chapter covers tool schema design, idempotency, error handling, authorization, and versioning at a level of rigor appropriate for production enterprise systems. AI architects, platform engineers, and senior developers building agent tooling should read this chapter before writing any tool implementation.
Learning Objectives
- Design tool schemas that guide reliable LLM tool selection
- Classify tools by side-effect profile and apply appropriate safeguards
- Implement idempotent write-side tools
- Design error responses that allow agents to recover gracefully
- Apply the principle of least privilege to tool authorization
Business Problem
Agent tool failures are the most common cause of production agentic system failures. The failures are rarely in the LLM's reasoning — they are in the tools: a tool with an ambiguous description causes wrong selection; a tool that throws an unhandled exception halts the workflow; a non-idempotent write tool double-commits a transaction when the agent retries; a tool with excessive permissions exposes sensitive data.
Poorly designed tools create a class of bugs that are invisible in testing (the LLM selects the wrong tool but produces a plausible-sounding answer) and catastrophic in production (the wrong action is taken on a real patient record). Tool design is therefore a first-class engineering discipline, not an afterthought.
Why This Technology Exists
When Anthropic and OpenAI introduced native tool calling, they created a contract: the developer provides a JSON schema describing each tool; the LLM uses that schema to decide when and how to call the tool. The schema is not documentation for humans — it is the primary communication channel between the developer's intent and the LLM's reasoning. This makes schema design as important as API design was in the service-oriented architecture era.
The challenge: LLMs are stochastic. Unlike a human developer who reads documentation once, the LLM reasons about the schema fresh for every query. Tool schemas must be unambiguous, consistently named, and designed to minimize the chance that the model selects the wrong tool or calls a correct tool with wrong arguments.
Conceptual Explanation
A tool has three components visible to the LLM:
- Name — what the LLM calls in its tool invocation
- Description — the primary signal the LLM uses for tool selection
- Input schema — the parameters the LLM must provide (JSON Schema)
Everything else (the implementation, authorization logic, error handling) is invisible to the LLM. The LLM sees only the schema, so the schema must carry all the information the model needs to use the tool correctly.
Think of a tool schema as a contract: it specifies what the tool does, what inputs it requires, and implicitly, when it should be called. A well-designed schema makes the right tool selection obvious and the wrong tool selection unlikely.
Core Architecture
Tool Schema Structure
```json
{
  "name": "verb_noun_specificity",
  "description": "What this tool does. When to use it. What it returns. What it does NOT do.",
  "input_schema": {
    "type": "object",
    "properties": {
      "parameter_name": {
        "type": "string | number | array | object | boolean",
        "description": "What this parameter is, in what format, and any constraints.",
        "enum": ["allowed_value_1", "allowed_value_2"]
      }
    },
    "required": ["required_param_name"]
  }
}
```

Side-Effect Classification
Every tool must be classified by its side-effect profile before deployment:
| Class | Description | Examples | Safeguards Required |
|---|---|---|---|
| Read | Returns data; no state change | get_patient_labs, search_guidelines | Authentication only |
| Write | Creates or updates state | create_prior_auth, update_care_plan | Idempotency + audit log |
| Delete | Removes state | cancel_order, void_prescription | Confirmation + HITL |
| External | Triggers external system | submit_to_payer, send_notification | HITL required; non-retryable |
Architectural Rule: Never expose a Delete or External-class tool to an agent without a human-in-the-loop gate. The consequences of an incorrect invocation can be irreversible.
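One way to make the classification enforceable rather than advisory is to encode it as data that the registry can check at registration time. The following is a minimal sketch, assuming a hypothetical SideEffectClass enum and SAFEGUARDS mapping that mirror the table above; the safeguard labels are illustrative names chosen for this example, not part of any framework.

```python
from enum import Enum


class SideEffectClass(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    EXTERNAL = "external"


# Safeguards per class, copied from the classification table (labels are illustrative).
SAFEGUARDS: dict[SideEffectClass, frozenset[str]] = {
    SideEffectClass.READ: frozenset({"authentication"}),
    SideEffectClass.WRITE: frozenset({"idempotency", "audit_log"}),
    SideEffectClass.DELETE: frozenset({"confirmation", "hitl"}),
    SideEffectClass.EXTERNAL: frozenset({"hitl", "non_retryable"}),
}


def requires_hitl(side_effect_class: SideEffectClass) -> bool:
    """True when the class must sit behind a human-in-the-loop gate."""
    return "hitl" in SAFEGUARDS[side_effect_class]


def missing_safeguards(
    side_effect_class: SideEffectClass, implemented: set[str]
) -> set[str]:
    """Safeguards the class requires that the tool has not declared."""
    return set(SAFEGUARDS[side_effect_class]) - implemented


# A write tool that declares idempotency but no audit logging fails the check.
print(missing_safeguards(SideEffectClass.WRITE, {"idempotency"}))  # {'audit_log'}
```

Deriving the HITL requirement from the class, instead of setting it per tool, removes the chance that someone registers a Delete or External tool and forgets the flag.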
Architecture Diagram
Components
Tool Name
Convention: verb_noun or verb_noun_qualifier. Make the name self-describing.
Good: get_patient_medications, search_clinical_guidelines, create_prior_auth_draft
Bad: get_data, search, create, tool1, patient_func
Names must be consistent: if retrieval tools start with get_, all retrieval tools start with get_. If creation tools start with create_, all creation tools start with create_. Inconsistency forces the LLM to read descriptions more carefully and increases selection error rates.
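A naming convention is easier to keep when it is checked mechanically at registration time. The sketch below is one possible linter; the approved verb list and the set of generic nouns are assumptions made for this example, and a real registry would maintain its own. It catches structural problems and obviously generic names, not every vague name.

```python
import re

# Assumed verb prefixes for this example; each registry would define its own list.
APPROVED_VERBS = ("get", "search", "lookup", "create", "update", "cancel", "submit")

# Assumed list of nouns too generic to identify a tool on their own.
GENERIC_NOUNS = frozenset({"data", "info", "func", "item", "thing"})

# verb_noun or verb_noun_qualifier: lowercase parts joined by underscores, at least two parts.
_NAME_PATTERN = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")


def check_tool_name(name: str) -> list[str]:
    """Return a list of naming problems; an empty list means the name passes."""
    if not _NAME_PATTERN.match(name):
        return ["name must be lowercase verb_noun or verb_noun_qualifier"]

    problems = []
    verb, *rest = name.split("_")
    if verb not in APPROVED_VERBS:
        problems.append(f"'{verb}' is not an approved verb prefix")
    if all(part in GENERIC_NOUNS for part in rest):
        problems.append("noun is too generic to distinguish this tool")
    return problems


for candidate in ("get_patient_medications", "get_data", "tool1", "patient_func"):
    print(candidate, check_tool_name(candidate))
```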
Tool Description
The description is the most important field. It does four things:
- States what the tool does in a single sentence
- States when to use it — the triggering condition
- States what it returns — the LLM needs to know what it will receive
- States what it does NOT do — disambiguates from similar tools
```python
# Poor description: ambiguous, no trigger condition
poor_tool = {"description": "Gets patient information"}

# Good description: complete contract
good_tool = {
    "description": (
        "Retrieve a patient's current medication list, active diagnoses, and recent lab results "
        "from the EHR system. Use this tool when answering clinical questions that require "
        "patient-specific context. Returns a JSON object with medications, diagnoses, and labs. "
        "Does NOT return historical encounter notes; use get_encounter_notes() for that."
    )
}
```

Input Schema
Use the description field on each property: it is the parameter-level guidance the LLM uses to populate arguments correctly.

```python
input_schema = {
    "type": "object",
    "properties": {
        "patient_id": {
            "type": "string",
            "description": (
                "Epic patient MRN. Format: 7-digit numeric string (e.g., '1234567'). "
                "Do not include leading zeros or dashes."
            ),
        },
        "date_range_days": {
            "type": "integer",
            "description": "Number of past days to include in results. Default 30. Max 365.",
            "minimum": 1,
            "maximum": 365,
            "default": 30,
        },
    },
    "required": ["patient_id"],
}
```

Implementation Patterns
Pattern 1: Idempotent Write Tool

```python
"""
Idempotent write tool for creating prior authorization drafts.
Educational Example: Reference Implementation.
Not intended for clinical decision making.
"""
import hashlib
from datetime import datetime, timezone
from typing import Any

# Deduplication store: Redis or PostgreSQL in production
_processed_requests: dict[str, dict] = {}


def create_prior_auth_draft(
    patient_id: str,
    procedure_code: str,
    clinical_rationale: str,
    idempotency_key: str,
) -> dict[str, Any]:
    """
    Create a draft prior authorization request.

    Idempotency: If called twice with the same idempotency_key, the second call
    returns the result of the first call without creating a duplicate.
    Callers (including agents on retry) are responsible for providing a stable key.
    """
    # Idempotency check: critical for agent retry safety
    if idempotency_key in _processed_requests:
        existing = _processed_requests[idempotency_key]
        return {
            **existing,
            "idempotent_replay": True,
            "message": "Returning existing draft; idempotency key already processed.",
        }

    # Validate inputs before any state change
    if not patient_id or not procedure_code:
        return {
            "success": False,
            "error_code": "INVALID_INPUT",
            "error_message": "patient_id and procedure_code are required.",
            "recoverable": False,
        }

    # Create the draft (stub; in production: write to database)
    draft_id = f"DRAFT-{hashlib.sha256(idempotency_key.encode()).hexdigest()[:8].upper()}"
    result = {
        "success": True,
        "draft_id": draft_id,
        "patient_id": patient_id,
        "procedure_code": procedure_code,
        "status": "DRAFT",
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
        "created_at": datetime.now(timezone.utc).isoformat(),
        "idempotent_replay": False,
    }

    # Store for idempotency replay
    _processed_requests[idempotency_key] = result
    return result
```
Pattern 2: Recoverable Error Responses

```python
from typing import Any


# Tools must return structured errors that allow the agent to recover.
# Never raise exceptions directly: the agent cannot handle Python exceptions.
def search_drug_formulary(
    drug_name: str,
    include_alternatives: bool = True,
) -> dict[str, Any]:
    """
    Search the institutional drug formulary by drug name.
    Returns formulary status, tier, and alternatives.
    """
    if not drug_name or len(drug_name.strip()) < 2:
        # Recoverable: agent can retry with a corrected name
        return {
            "success": False,
            "error_code": "INVALID_DRUG_NAME",
            "error_message": "Drug name must be at least 2 characters. Provide the generic name.",
            "recoverable": True,
            "suggestion": "Use the generic drug name rather than brand name (e.g., 'lisinopril' not 'Zestril')",
        }

    # Simulated formulary lookup
    formulary_data = {
        "metformin": {"status": "FORMULARY", "tier": 1, "pa_required": False},
        "lisinopril": {"status": "FORMULARY", "tier": 1, "pa_required": False},
    }
    drug_key = drug_name.lower().strip()
    if drug_key not in formulary_data:
        # Recoverable: agent can search with an alternative name
        return {
            "success": False,
            "error_code": "DRUG_NOT_FOUND",
            "error_message": f"'{drug_name}' not found in formulary.",
            "recoverable": True,
            "suggestion": "Try the generic name, or search for the therapeutic class instead.",
        }

    return {
        "success": True,
        "drug_name": drug_name,
        **formulary_data[drug_key],
        "alternatives": [] if not include_alternatives else ["alternative_1", "alternative_2"],
    }
```

Pattern 3: Tool Registry Class
```python
from dataclasses import dataclass
from typing import Callable


class HITLRequired(Exception):
    """Raised when a tool requires human-in-the-loop approval."""


@dataclass
class ToolDefinition:
    """A tool definition with its schema and implementation."""
    name: str
    description: str
    input_schema: dict
    implementation: Callable
    side_effect_class: str  # "read", "write", "delete", "external"
    requires_hitl: bool = False


class ToolRegistry:
    """Centralized registry for agent tools."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolDefinition] = {}

    def register(self, tool: ToolDefinition) -> None:
        self._tools[tool.name] = tool

    def get_schemas(self) -> list[dict]:
        """Return schemas in Anthropic API format."""
        return [
            {
                "name": t.name,
                "description": t.description,
                "input_schema": t.input_schema,
            }
            for t in self._tools.values()
        ]

    def execute(self, tool_name: str, tool_input: dict) -> dict:
        if tool_name not in self._tools:
            return {
                "success": False,
                "error_code": "UNKNOWN_TOOL",
                "error_message": f"Tool '{tool_name}' is not registered.",
                "recoverable": False,
            }
        tool = self._tools[tool_name]
        if tool.requires_hitl:
            # In production: raise a HITL interrupt signal
            raise HITLRequired(
                f"Tool '{tool_name}' is classified as {tool.side_effect_class} "
                f"and requires human approval before execution."
            )
        return tool.implementation(**tool_input)
```

Enterprise Considerations
Tool registry governance. In large organizations, multiple teams build tools for shared agents. Establish a governance process: tool schemas go through review before registration, descriptions are tested against a set of disambiguation queries, and each tool has a documented owner and SLA. A poorly designed tool registered by one team can degrade the performance of every agent in the platform.
Tool versioning. When a tool's schema changes incompatibly (renamed parameter, removed field), existing agents break. Version tools explicitly: get_patient_labs_v2. Deprecate old versions gracefully: do not delete them until all agents using them are migrated. Use the description field to mark deprecated tools: "DEPRECATED: Use get_patient_labs_v2 instead."
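The deprecation marker can be applied by a small helper so that every deprecated tool is labeled the same way. This is a sketch only; deprecate_schema is a hypothetical helper, and the example schema contents are placeholders.

```python
def deprecate_schema(schema: dict, replacement_name: str) -> dict:
    """Return a copy of a tool schema whose description is marked as deprecated."""
    marked = dict(schema)  # shallow copy; the original schema is left unchanged
    marked["description"] = (
        f"DEPRECATED: Use {replacement_name} instead. {schema['description']}"
    )
    return marked


old_schema = {
    "name": "get_patient_labs",
    "description": "Retrieve recent lab results for a patient.",
    "input_schema": {"type": "object", "properties": {}},
}
print(deprecate_schema(old_schema, "get_patient_labs_v2")["description"])
```

Putting the marker at the start of the description keeps it visible to the model even when the rest of the description is long.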
Latency SLAs. Tools are in the critical path of every agent iteration. A tool that takes 5 seconds to respond adds 5 seconds to every agent step that uses it. Define and enforce latency SLAs for every tool: document expected p50/p95/p99 latency; implement timeouts; return a structured error if the timeout is exceeded.
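A timeout that returns a structured error might be sketched as follows, using only the standard library. The call_with_timeout wrapper, the TOOL_TIMEOUT error code, and the slow_lookup demo tool are assumptions for illustration. Note that a Python thread cannot be cancelled, so the timed-out call keeps running in the background; for write tools this is one more reason to require idempotency.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import Any, Callable


def call_with_timeout(
    tool: Callable[..., dict[str, Any]],
    tool_input: dict[str, Any],
    timeout_seconds: float,
) -> dict[str, Any]:
    """Run a tool in a worker thread and convert a timeout into a structured error."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(tool, **tool_input)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeoutError:
        return {
            "success": False,
            "error_code": "TOOL_TIMEOUT",
            "error_message": f"Tool did not respond within {timeout_seconds} seconds.",
            "recoverable": True,
        }
    finally:
        # Do not block on the worker; a timed-out call finishes in the background.
        executor.shutdown(wait=False)


def slow_lookup(delay_seconds: float) -> dict[str, Any]:
    """Hypothetical slow tool used only to demonstrate the timeout path."""
    time.sleep(delay_seconds)
    return {"success": True}


result = call_with_timeout(slow_lookup, {"delay_seconds": 0.2}, timeout_seconds=0.05)
print(result["error_code"])  # TOOL_TIMEOUT
```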
Cost attribution. In multi-tenant platforms, tools that query external APIs incur costs (per-call fees, database read units, etc.). Instrument every tool call with the calling agent's identity for cost attribution. This is also necessary for security auditing.
Security Considerations
Principle of least privilege. Each tool should have only the permissions required for its function. A tool that reads medications does not need access to billing records. Implement tool-level authorization that checks whether the calling agent's identity has permission to use that tool in that context.
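A tool-level authorization check can be as simple as an allow-list consulted before execution. The sketch below assumes a hypothetical AGENT_PERMISSIONS table and agent identifiers; it checks identity against tool name only, and a production check would also consider context such as which patient the call concerns.

```python
from typing import Any, Optional

# Hypothetical permission table: agent identity -> tool names it may call.
AGENT_PERMISSIONS: dict[str, frozenset[str]] = {
    "prior_auth_agent": frozenset(
        {"get_patient_clinical_summary", "create_prior_auth_draft"}
    ),
    "formulary_agent": frozenset({"search_drug_formulary"}),
}


def authorize(agent_id: str, tool_name: str) -> Optional[dict[str, Any]]:
    """Return None when the call is allowed, otherwise a structured error.

    Unknown agents receive an empty permission set, so the default is deny.
    """
    allowed = AGENT_PERMISSIONS.get(agent_id, frozenset())
    if tool_name in allowed:
        return None
    return {
        "success": False,
        "error_code": "NOT_AUTHORIZED",
        "error_message": f"Agent '{agent_id}' may not call '{tool_name}'.",
        "recoverable": False,
    }


print(authorize("prior_auth_agent", "create_prior_auth_draft"))  # None
print(authorize("formulary_agent", "create_prior_auth_draft")["error_code"])  # NOT_AUTHORIZED
```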
Input validation at the tool boundary. The LLM-generated tool inputs arrive as JSON. Treat them as untrusted input, exactly as you would treat user input to a web form. Validate types, ranges, and formats before using inputs in database queries or API calls. Unvalidated tool inputs give a prompt injection a path to a database injection.
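As an illustration, the constraints from the input schema shown earlier in this chapter (7-digit patient_id, date_range_days from 1 to 365 with a default of 30) can be enforced in the tool itself rather than trusted to the model. The validate_lab_query function below is a hand-rolled sketch for that one schema; a general-purpose JSON Schema validator would be the more usual choice in production.

```python
import re
from typing import Any

_MRN_PATTERN = re.compile(r"[0-9]{7}")
_ALLOWED_KEYS = {"patient_id", "date_range_days"}


def validate_lab_query(tool_input: dict[str, Any]) -> list[str]:
    """Return validation errors for the tool input; an empty list means valid."""
    errors = []

    patient_id = tool_input.get("patient_id")
    if not isinstance(patient_id, str) or not _MRN_PATTERN.fullmatch(patient_id):
        errors.append("patient_id must be a 7-digit numeric string")

    days = tool_input.get("date_range_days", 30)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(days, bool) or not isinstance(days, int) or not 1 <= days <= 365:
        errors.append("date_range_days must be an integer from 1 to 365")

    unexpected = set(tool_input) - _ALLOWED_KEYS
    if unexpected:
        errors.append(f"unexpected parameters: {sorted(unexpected)}")

    return errors


print(validate_lab_query({"patient_id": "1234567"}))  # []
print(validate_lab_query({"patient_id": "1234567' OR '1'='1"}))
```

Validation narrows what reaches the query layer; parameterized queries are still needed there.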
Audit logging. Every tool invocation, successful or failed, must be logged with: tool name, inputs (sanitized for PHI/PII), outputs (sanitized), calling agent identity, timestamp, and latency. This audit trail supports HIPAA audit requirements and enables post-incident forensics.
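The fields listed above could be captured by a thin wrapper around each tool call. The following sketch assumes a hypothetical SENSITIVE_FIELDS list and an in-memory list as the log sink; it redacts only top-level keys and assumes the tool returns a structured result rather than raising.

```python
import json
import time
from datetime import datetime, timezone
from typing import Any, Callable

# Assumed sensitive field names; a real deployment would maintain its own list.
SENSITIVE_FIELDS = frozenset(
    {"patient_id", "patient_name", "date_of_birth", "clinical_rationale"}
)


def sanitize(payload: dict[str, Any]) -> dict[str, Any]:
    """Redact top-level sensitive fields (nested values are not inspected here)."""
    return {
        key: "[REDACTED]" if key in SENSITIVE_FIELDS else value
        for key, value in payload.items()
    }


def audited_call(
    agent_id: str,
    tool_name: str,
    tool: Callable[..., dict[str, Any]],
    tool_input: dict[str, Any],
    sink: list[str],
) -> dict[str, Any]:
    """Execute a tool and append one JSON audit record to the sink."""
    started = time.perf_counter()
    result = tool(**tool_input)
    record = {
        "tool_name": tool_name,
        "agent_id": agent_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "latency_ms": round((time.perf_counter() - started) * 1000, 2),
        "inputs": sanitize(tool_input),
        "outputs": sanitize(result),
        "success": bool(result.get("success")),
    }
    sink.append(json.dumps(record, default=str))
    return result  # the caller still receives the unsanitized result
```

Recording the agent identity on every call also provides the data needed for the cost attribution described under Enterprise Considerations.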
Tool poisoning. In RAG-augmented agent systems, retrieved documents could contain malicious tool call instructions. Validate that tool calls originate from LLM reasoning, not from retrieved content injected into the context. See Chapter 10: Agentic Security for the full threat model.
Healthcare Example
Educational Example — Illustrative Workflow. Not intended for clinical decision making.
A Reference Healthcare Organization's prior authorization agent uses a tool registry with six tools, classified by side effect:
| Tool | Class | HITL Required |
|---|---|---|
| get_patient_clinical_summary | Read | No |
| search_clinical_guidelines | Read | No |
| lookup_payer_policy | Read | No |
| create_prior_auth_draft | Write | No (draft only) |
| update_prior_auth_with_physician_notes | Write | No (physician is already the actor) |
| submit_prior_auth_to_payer | External | Yes |
The submit_prior_auth_to_payer tool is never called autonomously: the agent always interrupts and routes to a physician before submission. This single architectural decision bounds the agent's maximum autonomous impact to creating a draft, not submitting a binding request to a payer.
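The interrupt-and-route step can be pictured as a queue that holds the gated call until a human approves it. The ApprovalGate class below is a hypothetical in-memory sketch of that idea, not the mechanism of any particular agent framework; the error codes and field names are assumptions.

```python
from typing import Any, Callable


class ApprovalGate:
    """Holds gated tool calls until a human approves them (in-memory sketch)."""

    def __init__(self) -> None:
        self._pending: dict[int, tuple[Callable[..., dict[str, Any]], dict[str, Any]]] = {}
        self._next_id = 1

    def request(
        self, tool: Callable[..., dict[str, Any]], tool_input: dict[str, Any]
    ) -> dict[str, Any]:
        """Queue the call instead of executing it, and tell the agent to wait."""
        request_id = self._next_id
        self._next_id += 1
        self._pending[request_id] = (tool, tool_input)
        return {
            "success": False,
            "error_code": "AWAITING_APPROVAL",
            "error_message": "Call queued for physician review; do not retry.",
            "recoverable": False,
            "approval_request_id": request_id,
        }

    def approve(self, request_id: int, approver: str) -> dict[str, Any]:
        """Execute a queued call exactly once, on behalf of the named approver."""
        if request_id not in self._pending:
            return {
                "success": False,
                "error_code": "UNKNOWN_APPROVAL_REQUEST",
                "error_message": f"No pending request with id {request_id}.",
                "recoverable": False,
            }
        tool, tool_input = self._pending.pop(request_id)
        return {**tool(**tool_input), "approved_by": approver}
```

Because approve removes the request before executing it, a second approval of the same request returns an error instead of submitting twice.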
Common Mistakes
Description is too short. "Gets patient data" tells the LLM nothing about when to use it, what it returns, or how it differs from similar tools. Every description should run 3–5 sentences.
Parameters have no descriptions. The LLM relies on parameter descriptions to format arguments correctly. A parameter named date_range with no description will be populated inconsistently (days vs. dates vs. timestamps).
Write tools are not idempotent. When an agent retries after a failure, it calls the same tool again. If the write is not idempotent, the state is written twice. Every write tool must implement deduplication via an idempotency key.
Returning raw exceptions. If a tool raises a Python exception, the agent framework typically crashes the workflow. Always catch exceptions inside tools and return structured error objects that include a recoverable flag.
Too many similar tools. If the registry has search_guidelines, lookup_guidelines, find_guidelines, and retrieve_guidelines, the LLM has no reliable basis for choosing among them and will select inconsistently. Consolidate; use discriminating names.
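For the raw-exception mistake in particular, a decorator can guarantee that no exception escapes a tool. This is a sketch under assumptions: the structured_errors name, the error codes, and the choice to treat connection and timeout errors as recoverable are illustrative decisions, not a standard.

```python
import functools
from typing import Any, Callable


def structured_errors(
    tool: Callable[..., dict[str, Any]]
) -> Callable[..., dict[str, Any]]:
    """Wrap a tool so that exceptions become structured error objects."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> dict[str, Any]:
        try:
            return tool(*args, **kwargs)
        except (TimeoutError, ConnectionError) as exc:
            # Assumed transient: the agent may retry later.
            return {
                "success": False,
                "error_code": "UPSTREAM_UNAVAILABLE",
                "error_message": f"{type(exc).__name__} while calling {tool.__name__}.",
                "recoverable": True,
            }
        except Exception as exc:
            # Anything else is treated as a bug; report the type only, so that
            # exception text containing sensitive data does not reach the model.
            return {
                "success": False,
                "error_code": "INTERNAL_ERROR",
                "error_message": f"{type(exc).__name__} in {tool.__name__}.",
                "recoverable": False,
            }

    return wrapper


@structured_errors
def lookup_payer_policy(policy_id: str) -> dict[str, Any]:
    """Hypothetical tool that fails as if its backend were down."""
    raise ConnectionError("policy service unreachable")


print(lookup_payer_policy("P-100")["error_code"])  # UPSTREAM_UNAVAILABLE
```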
Best Practices
- Write tool descriptions as 3–5 sentences: what it does, when to use it, what it returns, what it does NOT do
- Name tools with the verb_noun convention; use consistent verb prefixes across all tools of the same class
- Every write-side tool must accept an idempotency key and implement dedup logic
- Every tool must return a structured object with a success boolean and, on failure, error_code, error_message, and recoverable
- Classify every tool by side effect; require HITL before executing Delete or External class tools
- Set and enforce a latency SLA for every tool; return a structured timeout error if exceeded
- Log every tool call for auditability; sanitize PHI/PII before logging
- Keep tool registries focused: 5–15 tools per agent; specialize rather than generalize
Alternatives
| Approach | When to Use | Trade-off |
|---|---|---|
| Native tool calling | LLM supports structured tool API | Strongly preferred; most reliable |
| Prompt-based tool calling | Model does not support native tool API | Fragile; parse errors are common |
| Code execution tools | Agent needs arbitrary computation | High security risk; requires sandboxing |
| Pre-compiled tool chains | Steps are known; tools always called in the same sequence | Less flexible; more predictable |
Trade-offs
| Dimension | Advantage | Cost |
|---|---|---|
| Descriptive schemas | Higher selection accuracy | Larger context per iteration |
| Idempotent write tools | Safe agent retries | Additional dedup infrastructure |
| HITL gates on external tools | Prevents irreversible mistakes | Adds human latency to workflow |
| Structured error responses | Agent can recover and retry | Tool implementation complexity |
| Tool versioning | Safe schema evolution | Registry management overhead |
Interview Questions
Q1: What makes a tool description effective for LLM tool selection?
Category: Technical Depth | Difficulty: Senior | Role: AI Architect / ML Engineer
Answer Framework:
Effective tool descriptions do four things: state what the tool does in plain language, identify the triggering condition (when should the model use this tool?), specify what the tool returns, and disambiguate from similar tools by stating what the tool does NOT do. The triggering condition is the most commonly omitted element and the most important for selection accuracy.
The description is not documentation for humans — it is a classification signal that the LLM uses at inference time, fresh for every query. It should be written the way a senior engineer would explain a tool to a new team member in one paragraph: precise, without jargon, covering when and why not just what.
Red Flags: "The description just says what the function does — isn't that enough?" No. The LLM needs discriminating signals between similar tools, not just a function definition.
Q2: Why must write-side tools in an agent be idempotent, and how do you implement it?
Category: Technical Depth | Difficulty: Senior | Role: AI Architect / ML Engineer
Answer Framework:
Agents retry when they encounter errors. If a tool creates a record in a database, a retry after a partial failure creates a duplicate record. In a clinical system, a duplicate prior auth submission triggers duplicate reviews, confuses payers, and may double-bill.
Idempotency is implemented via a deduplication key: the caller (agent) provides a stable key representing "this specific logical operation." The tool stores completed operations in a lookup table. Before executing, the tool checks whether the key has been processed. If yes, it returns the stored result. If no, it executes, stores the result, and returns it.
The agent generates the idempotency key by hashing the combination of the workflow run ID and the step index (or tool call index). This makes the key stable across retries and unique across different workflow runs.
Red Flags: "Just check for duplicate records after the fact" — this is reactive, not preventive, and doesn't work for external API calls.
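The key derivation described in this answer can be sketched in a few lines. The make_idempotency_key function and the example run ID are hypothetical; the only requirement is that the inputs stay the same when a step is retried.

```python
import hashlib


def make_idempotency_key(workflow_run_id: str, step_index: int) -> str:
    """Derive a key that is stable across retries and unique per run and step."""
    material = f"{workflow_run_id}:{step_index}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()


first_attempt = make_idempotency_key("run-example-001", 3)
retry = make_idempotency_key("run-example-001", 3)
next_step = make_idempotency_key("run-example-001", 4)

print(first_attempt == retry)      # True: a retry reuses the key
print(first_attempt == next_step)  # False: a different step gets a new key
```

The key must not include anything that changes between attempts, such as a timestamp or a retry counter, or the deduplication check will never match.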
Key Takeaways
- The tool schema (name + description + input schema) is the contract between developer intent and LLM reasoning — treat it as a first-class engineering artifact
- Tool descriptions must state: what it does, when to use it, what it returns, and what it does NOT do
- Classify every tool by side effect: Read, Write, Delete, External — and gate Delete/External tools behind human-in-the-loop approval
- All write-side tools must implement idempotency via a deduplication key to enable safe agent retries
- Tools must return structured error objects with a recoverable flag, not raw exceptions
- Keep tool registries focused: 5–15 tools per agent maximizes selection accuracy
- Log every tool invocation with inputs, outputs, and caller identity for auditability
Glossary
| Term | Definition |
|---|---|
| Tool schema | The JSON definition of a tool's name, description, and input parameters, visible to the LLM |
| Idempotency | The property of an operation whose repeated execution with the same inputs has the same effect as a single execution |
| Idempotency key | A unique identifier provided by the caller that enables deduplication of write operations |
| Side effect class | Classification of a tool by the consequences of its execution: Read, Write, Delete, or External |
| Tool poisoning | An attack where malicious content in retrieved documents injects unauthorized tool call instructions |
| Tool registry | A centralized store of tool definitions and their implementations used by an agent system |
Further Reading
In This Repository:
- Agent Architecture Fundamentals — The agent loop that calls these tools
- Agentic Security — Tool misuse and authorization threats
- Human-in-the-Loop — HITL gates for high-risk tools
- examples/agents/01-basic-tool-calling-agent.py — Working tool calling implementation
External References:
- Anthropic Tool Use documentation — official schema specification and best practices
- OpenAI Function Calling documentation — parallel specification for comparison
- JSON Schema documentation: reference for input_schema construction