4.3 · 12 min

Enforce structured output using tool use and JSON schemas

Downstream systems need machine-parseable output every single time, but asking Claude to "reply with only JSON" fails intermittently at scale. Defining a tool whose input_schema is a JSON Schema, then reading the tool_use block, gives you output whose shape is validated by the API. This lesson covers tool_choice modes, schema design that prevents fabrication, and the critical limit that schemas stop syntax errors but never semantic ones.

[Figure: the three-step tool-use extraction flow (define tool, set tool_choice, read tool_use.input), with a reminder that schemas guarantee shape but not meaning.]

Why tool use is the reliable path to structured output

Prompting the model to "respond with only valid JSON" works most of the time, and "most of the time" is exactly what breaks a production pipeline. Free-text JSON arrives wrapped in prose, fenced in markdown, carrying a trailing comma, or truncated at the token limit. Tool use removes that entire class of failure. You declare a tool whose input_schema is a JSON Schema, the model returns a tool_use content block whose input is an object conforming to that schema, and the API validates the structure for you. The object you read is already parsed and structurally valid.
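To make those failure modes concrete, here is a minimal sketch that feeds four replies to json.loads. The replies are hypothetical strings invented to mirror the cases above, not captured model output, and try_parse is an illustrative helper name.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built up to keep this listing readable

# Hypothetical replies, invented to mirror the failure modes described above.
SAMPLE_REPLIES = {
    "prose preamble": 'Here is the JSON you asked for: {"total": 1200.0}',
    "markdown fence": FENCE + 'json\n{"total": 1200.0}\n' + FENCE,
    "trailing comma": '{"total": 1200.0,}',
    "truncated": '{"invoice_number": "INV-7", "total": ',
}


def try_parse(text):
    """Return the parsed object, or None when json.loads rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


failures = [name for name, text in SAMPLE_REPLIES.items() if try_parse(text) is None]
print(failures)  # all four sample replies fail to parse
```

Each reply contains the right data, yet none of them survives a strict parser, which is why the failures only show up once volume is high enough to hit them.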

The extraction "tool" does not need to execute anything. It is a schema carrier: you define it purely to give the model a typed slot to fill. You then scan response.content for the block with type == "tool_use" and read block.input. When the model calls a tool, stop_reason is "tool_use".

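import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The tool never executes anything; it only carries the schema the model fills.
tools = [{
  "name": "extract_invoice",
  "description": "Record the fields extracted from an invoice document.",
  "input_schema": {
    "type": "object",
    "properties": {
      "invoice_number": {"type": "string"},
      "total": {"type": "number"}
    },
    "required": ["invoice_number", "total"]
  }
}]
# document holds the invoice text, loaded elsewhere.
msg = client.messages.create(
    model="claude-...", max_tokens=1024, tools=tools,
    # Force the extraction tool so the reply always contains a tool_use block.
    tool_choice={"type": "tool", "name": "extract_invoice"},
    messages=[{"role": "user", "content": document}])
# block.input is already a parsed dict, so no json.loads is needed.
data = next(b.input for b in msg.content if b.type == "tool_use")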
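The final next(...) call raises StopIteration when the reply holds no tool_use block, which can happen under tool_choice auto. A slightly more defensive reader might look like the sketch below. The helper name read_tool_input is hypothetical, and the SimpleNamespace objects are stand-ins shaped like SDK content blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def read_tool_input(content, tool_name=None):
    """Return the input of the first matching tool_use block, or None."""
    for block in content:
        if getattr(block, "type", None) != "tool_use":
            continue  # skip text blocks and anything else
        if tool_name is None or block.name == tool_name:
            return block.input
    return None


# Stand-in blocks (assumption: only the type, name, input, text attributes matter).
fake_content = [
    SimpleNamespace(type="text", text="Sure, extracting now."),
    SimpleNamespace(
        type="tool_use",
        name="extract_invoice",
        input={"invoice_number": "INV-7", "total": 1200.0},
    ),
]

print(read_tool_input(fake_content))      # the invoice dict
print(read_tool_input(fake_content[:1]))  # None: no tool_use block present
```

Returning None instead of raising lets the caller decide whether a missing block means retry, fallback, or human review.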

tool_choice: auto vs any vs forced

tool_choice controls whether and which tool the model calls, and it is the single most testable idea in this task statement.

{"type": "auto"} (the default when tools are present): the model decides. It may call a tool, or it may return an ordinary text response. For extraction this is a trap, because the model can choose to answer in prose and your parser gets nothing to read.
{"type": "any"}: the model must call one of the provided tools, but it chooses which. Use this when several extraction schemas exist and you do not yet know the document type.
{"type": "tool", "name": "extract_metadata"}: forced selection. The model must call that exact tool. Use this when exactly one schema applies, or when a specific extraction must run before later steps.
{"type": "none"}: prevents tool use entirely (rarely relevant here).

One behavioral detail worth memorizing: when you force a tool (either any or a specific name), the model goes straight to the tool call and cannot emit leading reasoning text, which also makes forced choice incompatible with extended thinking. If you need the model to reason before extracting, use auto with strong instructions, or split into two calls.
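The selection rules above can be condensed into a small helper. This is only a sketch of the decision logic as this lesson describes it; pick_tool_choice and its parameters are hypothetical names, not part of any SDK.

```python
def pick_tool_choice(tool_names, known_tool=None, needs_reasoning=False):
    """Map the guidance above onto a tool_choice dict (hypothetical helper)."""
    if not tool_names:
        raise ValueError("at least one tool is required")
    if needs_reasoning:
        # Forced modes suppress leading text, so fall back to auto
        # and rely on strong instructions in the prompt.
        return {"type": "auto"}
    if known_tool is not None:
        if known_tool not in tool_names:
            raise ValueError(f"unknown tool: {known_tool}")
        return {"type": "tool", "name": known_tool}
    if len(tool_names) == 1:
        # Exactly one schema applies, so force it by name.
        return {"type": "tool", "name": tool_names[0]}
    # Several schemas and an unknown document type: the model picks one.
    return {"type": "any"}


print(pick_tool_choice(["extract_invoice", "extract_receipt"]))
print(pick_tool_choice(["extract_invoice", "extract_receipt"], known_tool="extract_invoice"))
```

Keeping the choice in one function makes it harder to leave a pipeline on auto by accident.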

Schema design: required, optional, and nullable fields

A required field the model cannot find in the source is a hallucination generator. To satisfy the schema, the model will invent a plausible value rather than leave the slot empty. The fix is to make any field the document might legitimately lack either optional (simply leave it out of the required array) or nullable so the model has a sanctioned way to say "not present."

"properties": {
  "total": {"type": "number"},
  "due_date": {"type": ["string", "null"]},
  "po_number": {"type": ["string", "null"]}
},
"required": ["total"]

Here total is mandatory, while due_date and po_number can come back as null. Reserve required for fields that must always exist. This one design choice does more to cut fabrication than any prompt instruction, because it removes the pressure that caused the model to guess in the first place.
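On the consuming side, a field that is optional and a field that is nullable both need the same handling: treat "missing" and "null" as "not present in the source". A minimal sketch, assuming the schema shown above; nullable, schema_sketch, and absent_fields are illustrative names.

```python
def nullable(json_type):
    """Schema fragment for a value the document may legitimately lack."""
    return {"type": [json_type, "null"]}


schema_sketch = {
    "type": "object",
    "properties": {
        "total": {"type": "number"},
        "due_date": nullable("string"),
        "po_number": nullable("string"),
    },
    "required": ["total"],
}


def absent_fields(data, schema):
    """List non-required fields that came back missing or null."""
    optional = set(schema["properties"]) - set(schema["required"])
    return sorted(name for name in optional if data.get(name) is None)


print(absent_fields({"total": 99.5, "due_date": None}, schema_sketch))
# prints ['due_date', 'po_number']
```

Logging which fields came back absent is a cheap way to spot document types where a field you assumed was universal is routinely missing.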

Enums with unclear and other + detail

For categorical fields, an enum constrains the model to a fixed set of legal values, which is far more reliable than free text you later have to normalize. But rigid enums cause two problems: the model is forced to pick a category even when the source is ambiguous, and any real-world value outside your list forces a wrong choice.

Solve both with two additions. Add an "unclear" value so the model can flag genuine ambiguity instead of guessing. And add an "other" value paired with a free-text other_detail field so novel categories are captured rather than mis-slotted into the nearest enum member.

"document_type": {
  "type": "string",
  "enum": ["invoice", "receipt", "purchase_order", "other", "unclear"]
},
"other_detail": {"type": ["string", "null"]}

This keeps the schema strict and machine-friendly while remaining extensible, and it surfaces the cases a human should review instead of burying them inside a confident-looking but wrong category.
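A schema like this cannot express that other_detail should be filled in only when document_type is "other", so that rule is easiest to check in code. The sketch below is one possible triage; the function name and the four return labels are assumptions made for illustration.

```python
KNOWN_TYPES = {"invoice", "receipt", "purchase_order"}


def triage_document_type(data):
    """Classify an extraction result by how its category should be handled."""
    doc_type = data.get("document_type")
    detail = (data.get("other_detail") or "").strip()
    if doc_type in KNOWN_TYPES:
        return "ok"
    if doc_type == "unclear":
        return "needs_review"   # the model flagged genuine ambiguity
    if doc_type == "other" and detail:
        return "new_category"   # novel type, described in other_detail
    return "invalid"            # "other" without detail, or an unexpected value


print(triage_document_type({"document_type": "other", "other_detail": "credit note"}))
# prints new_category
```

Counting the new_category results over time also shows when a recurring "other" value deserves its own enum member.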

Syntax errors are gone, semantic errors are not

This is the highest-value nuance in the whole task statement. A strict JSON schema via tool use guarantees the output is well-formed and type-correct. It does not guarantee the output is correct. The model can still place the tax amount in the subtotal field, return line items that do not add up to total, or transpose the invoice date and the due date. The schema is completely blind to all of these because each one is a semantic error, not a shape error.

Design defensively. Have the model extract both a calculated_total (the sum it computed from the line items) and a stated_total (the figure printed on the document), then compare them in code. Add a conflict_detected boolean for sources that contradict themselves. The schema gets you a clean object to work with; your own validation logic decides whether that object is trustworthy. (Task 4.4 covers the retry-with-error-feedback loop that consumes these validation signals.)
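The code-side comparison might be sketched as follows. The calculated_total, stated_total, and conflict_detected fields come from the text above; the line_items list with an amount per item, the 0.01 tolerance, and the function name are assumptions added for illustration.

```python
def semantic_problems(data, tolerance=0.01):
    """Return a list of semantic issues a schema check cannot see."""
    problems = []
    if abs(data["calculated_total"] - data["stated_total"]) > tolerance:
        problems.append("calculated_total differs from stated_total")

    # Assumption: the schema also captures line_items, each with an "amount".
    items = data.get("line_items")
    if items:
        recomputed = sum(item["amount"] for item in items)
        if abs(recomputed - data["calculated_total"]) > tolerance:
            problems.append("line items do not sum to calculated_total")

    if data.get("conflict_detected"):
        problems.append("model flagged conflicting values in the source")
    return problems


sample = {
    "stated_total": 125.0,
    "calculated_total": 120.5,
    "conflict_detected": False,
    "line_items": [{"amount": 100.0}, {"amount": 20.5}],
}
print(semantic_problems(sample))
# prints ['calculated_total differs from stated_total']
```

Recomputing the sum in code, rather than trusting calculated_total alone, guards against the model making an arithmetic slip that happens to match the printed figure.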

Format normalization lives in the prompt, not the schema

A JSON Schema constrains structure and type, but it cannot normalize the content of a value. Source documents render the same amount as $1,200.00, 1200 USD, or 1.200,00, and dates as 03/04/25, 4 March 2025, or 2025-03-04. If you only supply a schema, the model may faithfully copy whatever ambiguous string it saw.

So pair the strict output schema with explicit normalization rules in the system or user prompt. For example: express all dates as ISO 8601 YYYY-MM-DD; return monetary amounts as a decimal number with no currency symbol or thousands separator, and put the currency in the dedicated currency field; strip surrounding whitespace from identifiers. The schema enforces the container; the prompt tells the model how to fill it. Using both together is what produces output that is consistent across a large, messy corpus rather than merely syntactically valid.
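As a rough sketch, the rules can live in a prompt constant while a small code-side check confirms the model followed them. NORMALIZATION_RULES and is_iso_date are illustrative names, and the wording of the rules simply restates the examples above.

```python
import re
from datetime import date

# Prompt text restating the normalization rules; append it to the system
# or user prompt alongside the tool definition.
NORMALIZATION_RULES = (
    "Express all dates as ISO 8601 (YYYY-MM-DD). "
    "Return monetary amounts as a decimal number with no currency symbol "
    "or thousands separator, and put the currency in the currency field. "
    "Strip surrounding whitespace from identifiers."
)

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True when value is a YYYY-MM-DD string naming a real calendar date."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False  # right shape, impossible date (for example 2025-02-30)
    return True


print(is_iso_date("2025-03-04"), is_iso_date("03/04/25"))
# prints True False
```

A failed check here is a useful signal for the validation step: the schema accepted the string, but the prompt rule was not followed.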

Anti-patterns to avoid

Avoid: Prompt the model to "reply with only valid JSON" and json.loads() the text content.

Why it fails: At scale a fraction of responses arrive with prose preambles, markdown code fences, trailing commas, or mid-object truncation, causing parse failures that are invisible in a quick demo.

Instead: Define a tool with an input_schema and read tool_use.input; the API validates the shape so the object is already parsed and structurally sound.

Avoid: Mark every field as required so the extraction is always "complete."

Why it fails: When a required field is absent from the source, the model fabricates a plausible value to satisfy the schema, silently injecting hallucinated data downstream.

Instead: Make fields the document may lack optional or nullable (type includes "null") so the model can legitimately return null.

Avoid: Treat schema-valid output as automatically correct and skip validation.

Why it fails: Tool use eliminates syntax and type errors but not semantic ones: mis-filed values, line items that do not sum to the total, or transposed dates all pass schema validation.

Instead: Add cross-check fields (calculated_total vs stated_total), conflict_detected booleans, and code-side validation, feeding failures into a retry loop.

Avoid: Leave tool_choice on auto when you always need structured data.

Why it fails: auto permits the model to answer in prose instead of calling the tool, so intermittently the parser receives no tool_use block at all.

Instead: Use tool_choice "any" when the document type is unknown and multiple schemas exist, or force a specific named tool when one schema applies or must run first.

Worked example: Scenario 6: extracting from documents of unknown type

Your extraction system ingests a mixed stream of financial documents. Some batches are invoices, some are receipts, some are purchase orders, and the sender does not label them. You need one call that always returns structured data and picks the right schema per document.

Step 1: Define one tool per document type, each carrying its own JSON Schema:

tools = [
  {"name": "extract_invoice", "description": "Use for supplier invoices with an invoice number and payment terms.", "input_schema": invoice_schema},
  {"name": "extract_receipt", "description": "Use for point-of-sale receipts showing purchased items and tender.", "input_schema": receipt_schema},
  {"name": "extract_purchase_order", "description": "Use for purchase orders that authorize a future purchase.", "input_schema": po_schema},
]

Step 2: Set tool_choice={"type": "any"}. This guarantees the model calls one of the three, but lets it choose which fits the document. The clear, differentiated descriptions (this maps to Task 2.1) are what make that choice reliable.

Step 3: Design the schemas to resist fabrication. Absent-able fields are nullable, categories use enum plus other, and you add a semantic cross-check:

{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "other"]},
    "currency_other": {"type": ["string", "null"]},
    "due_date": {"type": ["string", "null"]},
    "stated_total": {"type": "number"},
    "calculated_total": {"type": "number"},
    "conflict_detected": {"type": "boolean"}
  },
  "required": ["invoice_number", "currency", "stated_total", "calculated_total", "conflict_detected"]
}

Step 4: Add normalization rules in the prompt: return dates as ISO 8601, and amounts as plain decimals with the currency isolated in its own field.

Step 5: Read the result and validate. stop_reason is "tool_use"; find the tool_use block, and its name tells you which document type the model detected:

block = next(b for b in msg.content if b.type == "tool_use")
doc_type, data = block.name, block.input
if abs(data["calculated_total"] - data["stated_total"]) > 0.01:
    route_to_human_review(data)   # semantic mismatch the schema could not catch

If you instead needed metadata extracted before an enrichment pass, you would replace step 2 with tool_choice={"type": "tool", "name": "extract_metadata"} on a first call, then make a follow-up request for enrichment.
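One way that two-call sequence could be wired up is sketched below. Everything beyond the forced tool_choice is an assumption: the function name, the idea of passing metadata back as plain text in the follow-up message, and the create parameter, which stands for any callable with the keyword interface of client.messages.create (with model and max_tokens already bound, for example via functools.partial). The fake_create stub exists only so the sketch runs without an API call.

```python
import json
from types import SimpleNamespace


def extract_then_enrich(create, document, metadata_tool, enrichment_tools):
    """Run a forced metadata call, then a follow-up enrichment call."""
    first = create(
        tools=[metadata_tool],
        tool_choice={"type": "tool", "name": metadata_tool["name"]},
        messages=[{"role": "user", "content": document}],
    )
    metadata = next(b.input for b in first.content if b.type == "tool_use")

    # Assumption: the follow-up receives the metadata as plain text context.
    follow_up = (
        "Metadata already extracted:\n" + json.dumps(metadata)
        + "\n\nDocument:\n" + document
    )
    second = create(
        tools=enrichment_tools,
        tool_choice={"type": "any"},
        messages=[{"role": "user", "content": follow_up}],
    )
    return metadata, second


calls = []


def fake_create(**kwargs):
    """Stub standing in for the API: records kwargs, returns one tool_use block."""
    calls.append(kwargs)
    block = SimpleNamespace(
        type="tool_use", name=kwargs["tools"][0]["name"], input={"title": "Q3 report"}
    )
    return SimpleNamespace(content=[block], stop_reason="tool_use")


metadata, enrichment = extract_then_enrich(
    fake_create, "document text", {"name": "extract_metadata"}, [{"name": "enrich_entities"}]
)
print(metadata, [c["tool_choice"] for c in calls])
```

Splitting the work this way keeps each call forced, so neither step can drift into a prose answer.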

Exam tips

✓ Read structured data from the tool_use content block's input field; when a tool is called, stop_reason is "tool_use".
✓ tool_choice auto lets the model return prose; any forces it to call some tool; {"type":"tool","name":X} forces the exact tool X.
✓ Use tool_choice "any" when the document type is unknown and multiple schemas exist; force a named tool to run one extraction before enrichment steps.
✓ Tool use plus a strict JSON schema eliminates JSON syntax and type errors, but never semantic errors like sums that do not add up or values in the wrong field.
✓ Make absent-able fields optional or nullable to stop fabrication; required fields the model cannot find get guessed.
✓ Add enum value "unclear" for ambiguity and "other" plus a detail field for extensibility, and put format normalization rules (ISO dates, numeric amounts) in the prompt, not the schema.

Official exam objectives for 4.3

Knowledge of

Tool use (tool_use) with JSON schemas as the most reliable approach for guaranteed schema-compliant structured output, eliminating JSON syntax errors
The distinction between tool_choice: "auto" (model may return text instead of calling a tool), "any" (model must call a tool but can choose which), and forced tool selection (model must call a specific named tool)
That strict JSON schemas via tool use eliminate syntax errors but do not prevent semantic errors (e.g., line items that don't sum to total, values in wrong fields)
Schema design considerations: required vs optional fields, enum fields with "other" + detail string patterns for extensible categories

Skills in

Defining extraction tools with JSON schemas as input parameters and extracting structured data from the tool_use response
Setting tool_choice: "any" to guarantee structured output when multiple extraction schemas exist and the document type is unknown
Forcing a specific tool with tool_choice: {"type": "tool", "name": "extract_metadata"} to ensure a particular extraction runs before enrichment steps
Designing schema fields as optional (nullable) when source documents may not contain the information, preventing the model from fabricating values to satisfy required fields
Adding enum values like "unclear" for ambiguous cases and "other" + detail fields for extensible categorization
Including format normalization rules in prompts alongside strict output schemas to handle inconsistent source formatting

Flashcards from this lesson

How do you get guaranteed schema-compliant structured output from Claude?

Define a tool whose input_schema is a JSON Schema, then read the tool_use block's input; the API validates the shape, so it arrives already parsed.

What does tool_choice "any" guarantee that "auto" does not?

"any" forces the model to call some tool; "auto" lets it return a plain text response instead of calling anything.

How do you force a specific extraction (e.g. metadata) to run first?

Set tool_choice to {"type": "tool", "name": "extract_metadata"} so the model must call exactly that tool.

Does a strict JSON schema catch line items that do not sum to the total?

No. That is a semantic error; tool-use schemas only prevent syntax and type (shape) errors, so you validate semantics in code.

How do you stop the model fabricating values for fields missing from the source?

Make those fields optional or nullable (type includes "null") so the model can return null instead of inventing a value.

Where do format normalization rules like ISO dates and numeric amounts belong?

In the prompt. The schema constrains structure and type, but cannot normalize how a value is formatted.

How do enum "other" + detail and "unclear" values help schema design?

"other" plus a free-text detail field keeps categories extensible, and "unclear" lets the model flag ambiguity instead of forcing a wrong category.
