2.1 (12 min)

Design effective tool interfaces with clear descriptions and boundaries

In production, the tool description is the main thing Claude uses to decide which tool to call. When several tools look alike, thin or overlapping descriptions cause misrouting, wrong data, and failed tasks. This lesson covers how to write differentiated descriptions, when to rename or split tools, and how system prompt wording can quietly override even a well-built interface.

Overlapping tool descriptions cause coin-flip misrouting (top). Renaming and rescoping so each request maps to exactly one purpose-specific tool fixes it (bottom).

The description is the model's primary selection signal

Every tool you give Claude, whether through the API tools array or an MCP server, is defined by three things: a name, a description, and an input_schema. When Claude chooses which tool to call, it reasons almost entirely from the description and the parameter docs. It never sees your source code or your intentions, so the description is the single highest-leverage field you control.

Anthropic's guidance is direct: write detailed descriptions, aiming for at least three to four sentences per tool and more for complex tools. Cover what the tool does, when to use it, when NOT to use it, and what each parameter means. A one-line description like 'Retrieves order details' gives the model almost nothing, and that thin signal collapses the moment two tools look alike.

name: lookup_order
description: Look up ONE order by its order ID (format ORD- followed by 8 digits). Returns line items, status, and totals. Use only when the user names a specific order or order number. Do NOT use to identify a customer; call get_customer first.
input: { order_id: string }  # e.g. ORD-10482913
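As a rough illustration, the same definition could be written as one entry in the API tools array. The name, description, and input_schema keys are the three fields described above; the regex pattern and the per-field description text are assumptions added for this sketch, not part of the original definition.

```python
import re

# Sketch of the lookup_order tool as a dict for the API `tools` array.
# The `pattern` value and the field-level description are illustrative assumptions.
lookup_order_tool = {
    "name": "lookup_order",
    "description": (
        "Look up ONE order by its order ID (format ORD- followed by 8 digits). "
        "Returns line items, status, and totals. "
        "Use only when the user names a specific order or order number. "
        "Do NOT use to identify a customer; call get_customer first."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "Order ID, e.g. ORD-10482913.",
                "pattern": r"^ORD-\d{8}$",
            }
        },
        "required": ["order_id"],
    },
}

# Local sanity check: the documented example value should satisfy the pattern.
ORDER_ID_RE = re.compile(
    lookup_order_tool["input_schema"]["properties"]["order_id"]["pattern"]
)
example_is_valid = bool(ORDER_ID_RE.match("ORD-10482913"))
```

Keeping the example value, the stated format, and the schema pattern in agreement means the model sees one consistent story about what a valid order_id looks like.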

What to put in a description: inputs, examples, edge cases, boundaries

A strong description answers four questions: what the tool does, what inputs it expects (formats, units, valid ranges, example values), what it returns, and when to use it versus similar tools. Add example queries and describe edge-case behavior, such as what happens on no match or on an ambiguous identifier.

Explicit boundary statements are what let Claude separate near neighbors. A line like 'Use this only for web search result lists, never for local document text' is worth more than any amount of generic prose. Boundaries turn an ambiguous menu into a decision the model can actually make.

This is exactly the fix in the customer-support scenario: get_customer and lookup_order both said only 'Retrieves ... information' and accepted similar identifier formats, so the agent routed 'check my order #12345' to get_customer. Expanding both descriptions with input formats, example queries, edge cases, and explicit boundaries is the first change to make, and it has higher leverage than few-shot examples or an external router.
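A minimal sketch of what the expanded pair might look like, with input_schema omitted for brevity. Only the two tool names and the order ID format come from the lesson; the customer ID format (CUST- plus digits), the email lookup, the no-match behavior, and the example queries are hypothetical. The small helper checks one property worth enforcing: each description names its neighbor in a boundary statement.

```python
# Hypothetical descriptions: the customer ID format, the email lookup, and the
# no-match behavior are assumptions made up for illustration.
support_tools = [
    {
        "name": "get_customer",
        "description": (
            "Retrieve ONE customer profile by customer ID (assumed format CUST- "
            "followed by digits) or by email address. Returns name, contact "
            "details, and account status. Example query: 'what email do you have "
            "on file for me?'. If no customer matches, returns an empty result. "
            "Do NOT use for order status or order contents; use lookup_order."
        ),
    },
    {
        "name": "lookup_order",
        "description": (
            "Look up ONE order by its order ID (format ORD- followed by 8 digits). "
            "Returns line items, status, and totals. Example query: 'where is "
            "order ORD-10482913?'. Do NOT use to identify a customer; use "
            "get_customer for that."
        ),
    },
]


def names_neighbors(tools):
    """Return True if every description mentions every other tool by name."""
    names = [tool["name"] for tool in tools]
    return all(
        other in tool["description"]
        for tool in tools
        for other in names
        if other != tool["name"]
    )
```

A check like this only makes sense for small sets of genuinely similar tools; requiring every description to mention every other tool would not scale to a large tool set.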

Overlapping descriptions cause misrouting

Misrouting is usually a symptom of overlap: two tools whose descriptions are so similar that the model has no basis to choose. The canonical example is analyze_content and analyze_document, both described as 'Analyzes content'. Whichever one the model picks is essentially a coin flip, and web results end up in the document analyzer or vice versa.

You have two levers. First, rename and rescope: change analyze_content to extract_web_results and rewrite its description so it is unambiguously about web search hits (URLs, titles, snippets), not documents. A precise name is a compressed hint the model reads before it even reaches the description. Second, split the generic tool (covered next). Either way, the goal is that any given request maps to exactly one obvious tool.
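Overlap of this kind can be caught before deployment with a crude check. The sketch below uses Python's standard difflib to flag description pairs that are nearly identical as strings. String similarity is only a rough proxy for what the model perceives, and the 0.8 threshold is an arbitrary assumption; the helper name is hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_overlapping_descriptions(tools, threshold=0.8):
    """Return (name_a, name_b, ratio) for description pairs at or above threshold."""
    flagged = []
    for a, b in combinations(tools, 2):
        ratio = SequenceMatcher(
            None, a["description"].lower(), b["description"].lower()
        ).ratio()
        if ratio >= threshold:
            flagged.append((a["name"], b["name"], round(ratio, 2)))
    return flagged


before = [
    {"name": "analyze_content", "description": "Analyzes content"},
    {"name": "analyze_document", "description": "Analyzes content"},
]

after = [
    {
        "name": "extract_web_results",
        "description": (
            "Extract structured hits (title, url, snippet, rank) from a WEB "
            "SEARCH result list. Use only for search-engine output, never for "
            "full document text."
        ),
    },
    {"name": "analyze_document", "description": "Analyzes content"},
]

overlaps_before = find_overlapping_descriptions(before)
overlaps_after = find_overlapping_descriptions(after)
```

Here overlaps_before contains the analyze_content / analyze_document pair, while overlaps_after is empty. A clean result does not prove the tools are well differentiated; it only rules out the most obvious copy-paste overlap.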

Splitting generic tools into purpose-specific tools

When one tool tries to do several jobs, its description cannot be precise about any of them, and its input/output contract stays vague. The fix is single responsibility: split the generic tool into purpose-specific tools, each with a defined input and output.

For a research system, a catch-all analyze_document becomes three sharp tools:

analyze_document  ->  split into:
  extract_data_points          in: doc text + fields[]     out: { field: value }
  summarize_content            in: doc text + max_length   out: summary string
  verify_claim_against_source  in: claim + source          out: { supported: bool, evidence }

Each tool now has an obvious purpose, a tighter schema, and clearer errors. The trade-off is more tools, so do not split without reason: giving one agent too many tools hurts selection on its own (see task 2.3). The principle here is purpose-specific, not tool-maximizing. Split when jobs are genuinely distinct, and keep each agent's set scoped to its role.
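One possible way to express the three split tools as definitions, assuming the name / description / input_schema shape used elsewhere in this lesson. The parameter names (document_text, source_text), the unit for max_length, and the description wording are assumptions for illustration; the lesson specifies only the tool names and their rough inputs and outputs.

```python
def make_tool(name, description, properties, required):
    """Build a tool definition in the name / description / input_schema shape."""
    return {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Assumed parameter spec shared by the two document-based tools.
DOC_TEXT = {"type": "string", "description": "Full text of one document."}

split_tools = [
    make_tool(
        "extract_data_points",
        "Extract the requested fields from ONE document's text. Returns an object "
        "mapping each field name to its extracted value. Use when the caller "
        "needs specific values. Do NOT use for summaries or for checking claims.",
        {
            "document_text": DOC_TEXT,
            "fields": {
                "type": "array",
                "items": {"type": "string"},
                "description": "Names of the fields to extract.",
            },
        },
        ["document_text", "fields"],
    ),
    make_tool(
        "summarize_content",
        "Summarize ONE document's text. Returns a single summary string no longer "
        "than max_length. Do NOT use to extract specific fields or verify claims.",
        {
            "document_text": DOC_TEXT,
            "max_length": {
                "type": "integer",
                "description": "Maximum summary length in words (assumed unit).",
            },
        },
        ["document_text", "max_length"],
    ),
    make_tool(
        "verify_claim_against_source",
        "Check whether ONE claim is supported by the given source text. Returns "
        "an object with a boolean 'supported' and the 'evidence' found. Do NOT "
        "use for summarizing or for general field extraction.",
        {
            "claim": {"type": "string", "description": "The claim to check."},
            "source_text": {
                "type": "string",
                "description": "Text of the source to check the claim against.",
            },
        },
        ["claim", "source_text"],
    ),
]
```

Each description states the output and points away from the other two jobs, so the input/output contract is visible to the model rather than living only in the implementation.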

System prompt wording can override good descriptions

Even a well-written tool interface can be overridden by the system prompt. Tool selection is keyword sensitive: if your system prompt says 'analyze the document' and you have a tool literally named analyze_document, that lexical match can pull the model toward it even when a better tool exists. The prompt created an unintended tool association.

So when tools misfire and the descriptions already look good, audit the system prompt for wording that echoes a tool name. Neutralize the phrasing (say 'review the file' or 'process the input' instead of a phrase that mirrors a tool name), or align the wording with the tool you actually want. The lesson: the tool interface is the name, the description, the input_schema, AND the surrounding prompt, all working together.
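The audit described above can be partly mechanized. The following is a heuristic sketch, not an established technique: it splits each tool name on underscores and flags the prompt if those words appear in the same order with at most two filler words between them. The function name and the two-word gap are assumptions.

```python
import re


def find_tool_name_echoes(system_prompt, tool_names):
    """Return tool names whose words appear in order, close together, in the prompt."""
    prompt = system_prompt.lower()
    echoes = []
    for name in tool_names:
        words = [re.escape(word) for word in name.lower().split("_")]
        # Between consecutive name parts, allow up to two filler words
        # such as "the", so "analyze the document" echoes analyze_document.
        gap = r"\b(?:\W+\w+){0,2}?\W+"
        pattern = r"\b" + gap.join(words) + r"\b"
        if re.search(pattern, prompt):
            echoes.append(name)
    return echoes


tool_names = ["analyze_document", "summarize_content"]
risky = find_tool_name_echoes(
    "Analyze the document and confirm each claim.", tool_names
)
neutral = find_tool_name_echoes(
    "Review the file and confirm each claim.", tool_names
)
```

With these inputs, risky is ['analyze_document'] and neutral is empty. A hit is a prompt for human review rather than an error: sometimes the echo is intentional because the wording is meant to steer toward that tool.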

Names, parameters, and MCP metadata are part of the interface

The same rules apply to MCP tools. An MCP server advertises each tool with a name, a description, and an inputSchema, and Claude Code discovers all of them at connection time. If those descriptions are thin, the agent may prefer a built-in tool (for example Grep) over a more capable MCP tool, so write MCP tool descriptions that spell out capabilities and outputs in detail.

Do not stop at the description. Parameter names and their per-field descriptions in the input_schema also guide both selection and argument filling. Prefer descriptive field names (order_id, not id), add a short description to each field, and mark required versus optional accurately. A clean, self-documenting schema reduces malformed calls and reduces the model's need to guess.
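These schema rules are easy to check automatically. The sketch below is a hypothetical lint helper, assuming JSON-Schema-style dicts with properties and required keys; the set of names treated as too generic is an arbitrary assumption.

```python
# Assumed list of field names considered too generic to guide the model.
GENERIC_NAMES = {"id", "data", "value", "input"}


def lint_input_schema(schema):
    """Return a list of readable problems found in an input schema dict."""
    problems = []
    properties = schema.get("properties", {})
    for field, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"{field}: missing description")
        if field in GENERIC_NAMES:
            problems.append(f"{field}: generic field name")
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"{field}: required but not defined")
    return problems


vague_schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "required": ["id"],
}

clean_schema = {
    "type": "object",
    "properties": {
        "order_id": {
            "type": "string",
            "description": "Order ID, e.g. ORD-10482913.",
        }
    },
    "required": ["order_id"],
}

vague_problems = lint_input_schema(vague_schema)
clean_problems = lint_input_schema(clean_schema)
```

The vague schema yields two problems (a missing description and a generic name), and the clean schema yields none. The same check applies unchanged to an MCP tool's inputSchema, since it uses the same JSON Schema structure.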

Anti-patterns to avoid

Avoid: When a tool is chosen incorrectly, add a routing classifier or keyword parser that pre-selects the tool before Claude sees the request.

Why it fails: It is over-engineered and bypasses the LLM's language understanding, and it does not fix the root cause, which is usually weak or overlapping descriptions. You now maintain a brittle rules layer that drifts out of sync with the tools.

Instead: First improve the descriptions: add input formats, example queries, edge cases, and explicit boundaries versus similar tools. Reserve external routing for genuine hard constraints, not for compensating for a vague interface.

Avoid: Keep descriptions to one line and rely on the tool name to convey what it does.

Why it fails: Names alone do not carry input formats, outputs, edge cases, or boundaries. Two similarly named tools become indistinguishable, and selection reliability drops sharply.

Instead: Write multi-sentence descriptions covering purpose, inputs, outputs, and when to use this tool versus its neighbors. Treat the description as the contract, not an afterthought.

Avoid: Fix overlapping tools by merging them into one generic tool (for example a single lookup_entity or analyze that auto-detects what to do).

Why it fails: Consolidation can be a valid architecture, but as a first response it just moves the ambiguity inside the tool and produces a vaguer description and a looser schema. It is also more effort than the immediate problem (thin descriptions) warrants.

Instead: Differentiate first: rename and rescope, or split into purpose-specific tools with defined input/output contracts. Consolidate only when the tools truly share one clean contract.

Avoid: Assume that once the tool descriptions are good, tool selection is fully controlled.

Why it fails: Keyword-sensitive system prompt wording can still override good descriptions by lexically matching a tool name, creating unintended associations.

Instead: Audit the system prompt for phrases that mirror tool names and neutralize or realign them. Consider the prompt part of the tool interface.

Worked example: Fixing tool misrouting in a multi-agent research system

Scenario: In your multi-agent research system (coordinator plus web-search, document-analysis, synthesis, and report subagents), the document-analysis agent keeps producing garbage. Web search hit lists get run through a tool meant for full papers, and long PDFs get treated like short snippets. Latency is fine and no tool errors are thrown; the outputs are just wrong.

Step 1, inspect the tool set. The agent exposes two tools:

analyze_content   description: 'Analyzes content'
analyze_document  description: 'Analyzes content'

The descriptions are near-identical and both accept a text blob. The model has no basis to route web results to one and papers to the other, so selection is effectively random. This is textbook overlap-driven misrouting.

Step 2, rename and rescope the overlapping tool. Rename analyze_content to extract_web_results and rewrite its description to be web-specific:

name: extract_web_results
description: Extract structured hits (title, url, snippet, rank) from a WEB SEARCH result list. Use only for search-engine output, never for full document text. Returns an array of results.

Step 3, split the generic document tool. analyze_document was trying to be an extractor, a summarizer, and a fact-checker at once. Split it:

extract_data_points          in: doc text + fields[]     out: { field: value }
summarize_content            in: doc text + max_length   out: summary string
verify_claim_against_source  in: claim + source          out: { supported: bool, evidence }

Now every request maps to exactly one obvious tool, each with a tight input/output contract.

Step 4, audit the system prompt. The synthesis agent's prompt said 'analyze the document and confirm each claim'. The phrase 'analyze the document' was lexically nudging the model back toward the old analyze_document name. Reword it to 'verify each claim against its source using verify_claim_against_source' so the prompt reinforces the intended tool instead of fighting it.

Result: selection becomes reliable in practice without adding a routing classifier or a pile of few-shot examples. This mirrors the exam's highest-leverage-first principle: fix the interface (descriptions, names, boundaries, and prompt wording) before reaching for heavier machinery.

Exam tips

✓ The tool description is the PRIMARY signal Claude uses to pick a tool. When tools misfire, expanding descriptions is the first and highest-leverage fix, not a routing classifier or few-shot patch.
✓ A good description states four things: what the tool does, expected inputs (formats and examples), outputs, and explicit boundaries versus similar tools.
✓ Two near-identical descriptions (analyze_content vs analyze_document, both 'Analyzes content') cause misrouting. Fix by renaming and rescoping (analyze_content -> extract_web_results) or by splitting into purpose-specific tools.
✓ Split a bloated generic tool by responsibility: analyze_document becomes extract_data_points, summarize_content, and verify_claim_against_source, each with a defined input/output contract.
✓ Keyword-sensitive system prompt wording can override good descriptions: a prompt saying 'analyze the document' can pull the model toward a tool named analyze_document. Audit and realign prompt wording.
✓ An external keyword/routing layer and consolidating tools into one mega-tool are common distractors. Both are over-engineered as a first response; improve the interface first.

Official exam objectives for 2.1

Knowledge of

Tool descriptions as the primary mechanism LLMs use for tool selection; minimal descriptions lead to unreliable selection among similar tools
The importance of including input formats, example queries, edge cases, and boundary explanations in tool descriptions
How ambiguous or overlapping tool descriptions cause misrouting (e.g., analyze_content vs analyze_document with near-identical descriptions)
The impact of system prompt wording on tool selection: keyword-sensitive instructions can create unintended tool associations

Skills in

Writing tool descriptions that clearly differentiate each tool's purpose, expected inputs, outputs, and when to use it versus similar alternatives
Renaming tools and updating descriptions to eliminate functional overlap (e.g., renaming analyze_content to extract_web_results with a web-specific description)
Splitting generic tools into purpose-specific tools with defined input/output contracts (e.g., splitting a generic analyze_document into extract_data_points, summarize_content, and verify_claim_against_source)
Reviewing system prompts for keyword-sensitive instructions that might override well-written tool descriptions

Flashcards from this lesson

What is the single most important field for reliable tool selection, and why?

The tool description. Claude picks tools by reasoning over descriptions and parameter docs, not your code or intent, so thin descriptions produce unreliable selection among similar tools.

get_customer and lookup_order both misfire and both have one-line descriptions. What is the first fix?

Expand each description with input formats, example queries, edge-case behavior, and explicit boundaries stating when to use it versus the other tool. This is higher leverage than few-shot examples or a router.

analyze_content and analyze_document have near-identical descriptions. Name two ways to fix the misrouting.

1) Rename and rescope one tool (analyze_content -> extract_web_results) with a web-specific description. 2) Split the generic tool into purpose-specific tools.

How would you split a generic analyze_document tool?

Into extract_data_points, summarize_content, and verify_claim_against_source, each with a defined input/output contract (single responsibility).

Descriptions look good but the agent still picks analyze_document. What else should you check?

The system prompt. Keyword-sensitive wording like 'analyze the document' can lexically pull the model toward a tool named analyze_document. Neutralize or realign the wording.

Why is building a keyword/routing classifier a poor first response to tool misrouting?

It is over-engineered, bypasses the LLM's language understanding, and does not address the root cause (weak or overlapping descriptions). Fix the interface first.
