Tool Design Patterns
Conceptual Explanation
A tool has three components visible to the LLM:
- Name — what the LLM calls in its tool invocation
- Description — the primary signal the LLM uses for tool selection
- Input schema — the parameters the LLM must provide (JSON Schema)
Everything else (the implementation, authorization logic, error handling) is invisible to the LLM. The LLM sees only the schema, so the schema must carry all the information the model needs to use the tool correctly.
Think of a tool schema as a contract: it specifies what the tool does, what inputs it requires, and implicitly, when it should be called. A well-designed schema makes the right tool selection obvious and the wrong tool selection unlikely.
Core Architecture
Tool Schema Structure
```json
{
  "name": "verb_noun_specificity",
  "description": "What this tool does. When to use it. What it returns. What it does NOT do.",
  "input_schema": {
    "type": "object",
    "properties": {
      "parameter_name": {
        "type": "string | number | array | object | boolean",
        "description": "What this parameter is, in what format, and any constraints.",
        "enum": ["allowed_value_1", "allowed_value_2"]
      }
    },
    "required": ["required_param_name"]
  }
}
```
Side-Effect Classification
Every tool must be classified by its side-effect profile before deployment:
| Class | Description | Examples | Safeguards Required |
|---|---|---|---|
| Read | Returns data; no state change | get_patient_labs, search_guidelines | Authentication only |
| Write | Creates or updates state | create_prior_auth, update_care_plan | Idempotency + audit log |
| Delete | Removes state | cancel_order, void_prescription | Confirmation + HITL |
| External | Triggers external system | submit_to_payer, send_notification | HITL required; non-retryable |
Architectural Rule: Never expose a Delete or External-class tool to an agent without a human-in-the-loop (HITL) gate. The consequences of an incorrect invocation are often irreversible.
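A minimal sketch of how the classification and the HITL rule might be enforced at dispatch time. The SideEffect enum, the TOOL_CLASSES registry, the dispatch function, and the approve callback are hypothetical names chosen for illustration; in a real deployment approve would route the call to a review queue rather than return immediately.

```python
from enum import Enum
from typing import Any, Callable, Dict


class SideEffect(Enum):
    READ = "read"
    WRITE = "write"
    DELETE = "delete"
    EXTERNAL = "external"


# Classes whose effects may be irreversible; these require human approval.
HITL_REQUIRED = {SideEffect.DELETE, SideEffect.EXTERNAL}

# Hypothetical registry: every tool must be classified before it can run.
TOOL_CLASSES: Dict[str, SideEffect] = {
    "get_patient_labs": SideEffect.READ,
    "create_prior_auth": SideEffect.WRITE,
    "void_prescription": SideEffect.DELETE,
    "submit_to_payer": SideEffect.EXTERNAL,
}


def _error(code: str, message: str) -> Dict[str, Any]:
    return {"success": False, "error_code": code,
            "error_message": message, "recoverable": False}


def dispatch(tool_name: str, args: Dict[str, Any],
             tools: Dict[str, Callable[..., Dict[str, Any]]],
             approve: Callable[[str, Dict[str, Any]], bool]) -> Dict[str, Any]:
    """Run a tool, pausing for human approval on Delete/External tools."""
    side_effect = TOOL_CLASSES.get(tool_name)
    if side_effect is None or tool_name not in tools:
        # Unknown or unclassified tools are refused rather than assumed safe.
        return _error("UNKNOWN_TOOL", f"{tool_name} is not a registered, classified tool")
    if side_effect in HITL_REQUIRED and not approve(tool_name, args):
        return _error("HITL_REJECTED", "A human reviewer did not approve this call")
    return tools[tool_name](**args)
```

Read and Write tools pass straight through; the approval callback is consulted only for the two classes the rule names.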
Common Mistakes
Description is too short. "Gets patient data" tells the LLM nothing about when to use it, what it returns, or how it differs from similar tools. Aim for 3–5 sentences per description.
Parameters have no descriptions. The LLM relies on parameter descriptions to format arguments correctly. A parameter named date_range with no description is likely to be populated inconsistently: sometimes as a number of days, sometimes as a pair of dates, sometimes as timestamps.
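Both of these mistakes can be caught before deployment with a simple lint pass over the tool registry. The sketch below is a hypothetical helper, not part of any framework: the name check only confirms lowercase snake_case with at least two parts (it cannot tell whether the first word is a verb), and the sentence count is a crude punctuation split. GOOD_TOOL is an illustrative schema, not a real API.

```python
import re
from typing import Any, Dict, List

# Heuristic only: lowercase snake_case with two or more parts (e.g. get_patient_labs).
# It cannot verify that the first part is actually a verb.
NAME_PATTERN = re.compile(r"^[a-z]+(?:_[a-z0-9]+)+$")


def lint_tool_schema(tool: Dict[str, Any], min_sentences: int = 3) -> List[str]:
    """Return human-readable problems found in one tool schema (empty list if none)."""
    problems: List[str] = []

    name = tool.get("name", "")
    if not NAME_PATTERN.match(name):
        problems.append(f"name {name!r} does not follow the verb_noun convention")

    # Crude sentence count: split on ., ! or ? followed by whitespace or end of string.
    description = tool.get("description", "")
    sentences = [s for s in re.split(r"[.!?](?:\s+|$)", description) if s.strip()]
    if len(sentences) < min_sentences:
        problems.append(
            f"description has {len(sentences)} sentence(s); expected at least {min_sentences}"
        )

    # Every parameter needs a description so the model knows the expected format.
    properties = tool.get("input_schema", {}).get("properties", {})
    for param, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"parameter {param!r} has no description")

    return problems


# Hypothetical example schema that passes all three checks.
GOOD_TOOL = {
    "name": "get_patient_labs",
    "description": (
        "Returns laboratory results for a single patient within a date window. "
        "Use it when the user asks about specific test values or trends over time. "
        "Returns a list of results with test name, value, unit, and collection date. "
        "It does NOT return imaging reports or clinical notes."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "patient_id": {
                "type": "string",
                "description": "Internal patient identifier, for example 'P12345'.",
            },
            "start_date": {
                "type": "string",
                "description": "Start of the window as an ISO 8601 date (YYYY-MM-DD), inclusive.",
            },
        },
        "required": ["patient_id", "start_date"],
    },
}

print(lint_tool_schema(GOOD_TOOL))  # → []
```

Running the same function over a schema named GetData with the description "Gets patient data" and an undocumented date_range parameter reports all three problems.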
Write tools are not idempotent. When an agent retries after a failure, it calls the same tool again. If the write is not idempotent, the state is written twice. Every write tool must implement deduplication via an idempotency key.
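Returning raw exceptions. If a tool raises an unhandled Python exception, many agent frameworks abort the workflow or hand the model an unstructured stack trace it cannot act on. Always catch exceptions inside tools and return structured error objects that include a recoverable flag, so the agent can decide whether a retry makes sense.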
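A minimal sketch of the catch-and-structure pattern as a Python decorator. ToolError, structured_tool, and the get_patient_labs body are hypothetical illustrations. Treating unanticipated exceptions as non-recoverable is a conservative assumption: an unknown bug is unlikely to go away on retry.

```python
import functools
from typing import Any, Callable, Dict


class ToolError(Exception):
    """Raised inside a tool to signal an anticipated, classified failure."""

    def __init__(self, code: str, message: str, recoverable: bool):
        super().__init__(message)
        self.code = code
        self.recoverable = recoverable


def structured_tool(fn: Callable[..., Any]) -> Callable[..., Dict[str, Any]]:
    """Wrap a tool so it always returns a dict with a success flag and never raises."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Dict[str, Any]:
        try:
            return {"success": True, "data": fn(*args, **kwargs)}
        except ToolError as e:
            # Anticipated failure: the tool itself decides whether a retry can help.
            return {"success": False, "error_code": e.code,
                    "error_message": str(e), "recoverable": e.recoverable}
        except Exception as e:
            # Unanticipated failure: report only the exception type, not a stack trace.
            return {"success": False, "error_code": "INTERNAL_ERROR",
                    "error_message": type(e).__name__, "recoverable": False}

    return wrapper


@structured_tool
def get_patient_labs(patient_id: str) -> list:
    # Hypothetical tool body for illustration only.
    if not patient_id:
        raise ToolError("INVALID_ARGUMENT", "patient_id must be non-empty", recoverable=True)
    if patient_id == "boom":
        raise RuntimeError("unexpected bug")
    return [{"test": "HbA1c", "value": 6.1, "unit": "%"}]
```

Marking INVALID_ARGUMENT as recoverable tells the agent it can fix its arguments and call again; INTERNAL_ERROR tells it to stop or escalate.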
Too many similar tools. If the registry has search_guidelines, lookup_guidelines, find_guidelines, and retrieve_guidelines, the LLM has no basis for choosing among them and will pick more or less arbitrarily. Consolidate overlapping tools into one, and give the tools that remain names that clearly discriminate between them.
Best Practices
- Write tool descriptions as 3–5 sentences: what it does, when to use it, what it returns, what it does NOT do
- Name tools with the verb_noun convention; use consistent verb prefixes across all tools of the same class
- Every write-side tool must accept an idempotency key and implement dedup logic
- Every tool must return a structured object with a success boolean and, on failure, error_code, error_message, and recoverable
- Classify every tool by side effect; require HITL before executing Delete or External class tools
- Set and enforce a latency SLA for every tool; return a structured timeout error if exceeded
- Log every tool call for auditability; sanitize PHI/PII before logging
- Keep tool registries focused: 5–15 tools per agent; specialize rather than generalize
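The latency-SLA and audit-logging practices can be combined in one wrapper around each tool call. This is a sketch under stated assumptions: call_with_sla and redact are hypothetical helpers, tools are assumed to already return structured dicts (so they do not raise), and redacting by field name is a simplification, since real PHI/PII scrubbing usually needs more than a list of field names.

```python
import concurrent.futures
import json
import logging
import time
from typing import Any, Callable, Dict, Iterable

audit_log = logging.getLogger("tool_audit")

# Shared worker pool. A timed-out call is NOT cancelled (Python threads cannot be
# killed), so size the pool to tolerate occasional stragglers.
_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def redact(args: Dict[str, Any], sensitive_fields: Iterable[str]) -> Dict[str, Any]:
    """Replace the values of named sensitive fields before logging."""
    hidden = set(sensitive_fields)
    return {k: ("[REDACTED]" if k in hidden else v) for k, v in args.items()}


def call_with_sla(
    tool: Callable[..., Dict[str, Any]],
    args: Dict[str, Any],
    timeout_s: float,
    sensitive_fields: Iterable[str] = (),
) -> Dict[str, Any]:
    """Run a tool under a latency SLA and emit one sanitized audit record."""
    start = time.monotonic()
    future = _EXECUTOR.submit(tool, **args)
    try:
        result = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Structured timeout error instead of an exception; the agent may retry,
        # which is only safe if the tool is idempotent.
        result = {
            "success": False,
            "error_code": "TIMEOUT",
            "error_message": f"exceeded {timeout_s}s latency SLA",
            "recoverable": True,
        }
    audit_log.info(json.dumps({
        "tool": getattr(tool, "__name__", repr(tool)),
        "args": redact(args, sensitive_fields),
        "success": result.get("success"),
        "error_code": result.get("error_code"),
        "latency_ms": round((time.monotonic() - start) * 1000),
    }, default=str))
    return result
```

Redaction happens before json.dumps, so the sensitive values never reach the log handler at all.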
Alternatives
| Approach | When to Use | Trade-off |
|---|---|---|
| Native tool calling | LLM supports structured tool API | Strongly preferred; most reliable |
| Prompt-based tool calling | Model does not support native tool API | Fragile; parse errors are common |
| Code execution tools | Agent needs arbitrary computation | High security risk; requires sandboxing |
| Pre-compiled tool chains | Steps are known; tools always called in the same sequence | Less flexible; more predictable |
Trade-offs
| Dimension | Advantage | Cost |
|---|---|---|
| Descriptive schemas | Higher selection accuracy | Larger context per iteration |
| Idempotent write tools | Safe agent retries | Additional dedup infrastructure |
| HITL gates on external tools | Prevents irreversible mistakes | Adds human latency to workflow |
| Structured error responses | Agent can recover and retry | Tool implementation complexity |
| Tool versioning | Safe schema evolution | Registry management overhead |
Interview Questions
Q1: What makes a tool description effective for LLM tool selection?
Category: Technical Depth · Difficulty: Senior · Role: AI Architect / ML Engineer
Answer Framework:
Effective tool descriptions do four things: state what the tool does in plain language, identify the triggering condition (when should the model use this tool?), specify what the tool returns, and disambiguate from similar tools by stating what the tool does NOT do. The triggering condition is the most commonly omitted element and the most important for selection accuracy.
The description is not documentation for humans; it is a classification signal that the LLM reads at inference time, every time it decides which tool to call. It should be written the way a senior engineer would explain a tool to a new team member in one paragraph: precise, free of jargon, and covering when and why, not just what.
Red Flags: "The description just says what the function does — isn't that enough?" No. The LLM needs discriminating signals between similar tools, not just a function definition.
Q2: Why must write-side tools in an agent be idempotent, and how do you implement it?
Category: Technical Depth · Difficulty: Senior · Role: AI Architect / ML Engineer
Answer Framework:
Agents retry when they encounter errors. If a tool creates a record in a database, a retry after a partial failure (for example, the write succeeded but the response was lost) creates a duplicate record. In a clinical system, a duplicate prior auth submission triggers duplicate reviews, confuses payers, and may double-bill.
Idempotency is implemented via a deduplication key: the caller (agent) provides a stable key representing "this specific logical operation." The tool stores completed operations in a lookup table. Before executing, the tool checks whether the key has been processed. If yes, it returns the stored result. If no, it executes, stores the result, and returns it.
A common approach is for the agent runtime, rather than the model itself, to derive the idempotency key by hashing the workflow run ID together with the step index (or tool call index). This keeps the key stable across retries of the same step and unique across different workflow runs.
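A minimal in-memory sketch of the key derivation and the check-then-execute flow described above. idempotency_key and IdempotentExecutor are hypothetical names. A production system would keep completed keys in a durable store (for example, a database table with a unique constraint on the key) and, where possible, pass the key to the downstream system, so that a crash between executing and recording cannot produce a duplicate. This sketch caches only successful results so that a failed call can still be retried under the same key; that is one design choice, not the only option.

```python
import hashlib
import threading
from typing import Any, Callable, Dict


def idempotency_key(run_id: str, step_index: int) -> str:
    """Stable across retries of the same step; distinct across runs and steps."""
    return hashlib.sha256(f"{run_id}:{step_index}".encode("utf-8")).hexdigest()


class IdempotentExecutor:
    """In-memory dedup table mapping idempotency keys to completed results."""

    def __init__(self) -> None:
        self._results: Dict[str, Dict[str, Any]] = {}
        # One global lock keeps the sketch simple but serializes all calls;
        # a real store would lock per key or rely on a unique constraint.
        self._lock = threading.Lock()

    def run(self, key: str, operation: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
        with self._lock:
            if key in self._results:
                # Already processed: return the stored result without re-executing.
                return self._results[key]
            result = operation()
            if result.get("success"):
                self._results[key] = result
            return result
```

A retried step recomputes the same key from the same run ID and step index, so the second call returns the stored result instead of writing again.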
Red Flags: "Just check for duplicate records after the fact" — this is reactive, not preventive, and doesn't work for external API calls.
Key Takeaways
- The tool schema (name + description + input schema) is the contract between developer intent and LLM reasoning — treat it as a first-class engineering artifact
- Tool descriptions must state: what it does, when to use it, what it returns, and what it does NOT do
- Classify every tool by side effect: Read, Write, Delete, External — and gate Delete/External tools behind human-in-the-loop approval
- All write-side tools must implement idempotency via a deduplication key to enable safe agent retries
- Tools must return structured error objects with a recoverable flag, not raw exceptions
- Keep tool registries focused: 5–15 tools per agent helps keep selection accuracy high
- Log every tool invocation with inputs, outputs, and caller identity for auditability