Implement multi-step workflows with enforcement and handoff patterns
Production agents that touch money, identity, or compliance cannot depend on the model choosing to follow the right order. This task teaches when to move workflow ordering out of the prompt and into programmatic gates that deterministically block a tool until its prerequisites are met, how to break a multi-concern request into items you resolve together, and how to package a self-contained handoff when a human must take over.
Enforcement versus guidance
There are two ways to make an agent do step A before step B. Prompt-based guidance puts the rule in the system prompt or in few-shot examples: "always call get_customer before process_refund." Programmatic enforcement puts the rule in code that runs outside the model, such as a hook or a prerequisite gate that refuses to let the downstream tool run until the prior step has completed.
Guidance shapes behavior probabilistically. The model usually complies, but it is still free to skip the step when a shortcut looks reasonable, for example when a customer volunteers an order number and the model decides it can go straight to the lookup. Enforcement removes that freedom: the tool call is intercepted and blocked regardless of what the model decided.
The exam frames this as a spectrum, not a rivalry. You keep the prompt guidance because it makes the common path smooth and cheap, and you add enforcement on the specific transitions where a single violation is unacceptable. The skill being tested is recognizing which transitions need the hard guarantee.
Prerequisite gates with a PreToolUse hook
The mechanism for a prerequisite gate in the Claude Agent SDK is a PreToolUse hook. The hook fires before a tool executes, inspects the tool name and input, checks whatever state records that the prerequisite ran, and either allows the call or denies it with a reason.
```python
from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

# Prerequisite state: set by a PostToolUse hook on get_customer.
verified = {"customer_id": None}

async def gate_order_ops(input_data, tool_use_id, context):
    # MCP tools arrive as "mcp__<server>__<tool>"; compare on the bare tool name.
    tool = input_data["tool_name"].rsplit("__", 1)[-1]
    if tool in ("lookup_order", "process_refund") and not verified["customer_id"]:
        # Deny with a reason; the reason is returned to the model.
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason":
                "Call get_customer and obtain a verified customer ID first."}}
    return {}  # empty dict: no objection, the call proceeds

options = ClaudeAgentOptions(
    hooks={"PreToolUse": [HookMatcher(hooks=[gate_order_ops])]})
```
When the gate denies a call, the reason is returned to the model rather than crashing the turn. The model reads "call get_customer first" and typically calls get_customer next; the PostToolUse hook records the verified ID, and the retried process_refund now passes the gate. The block is corrective, not fatal.
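The deny-then-retry cycle can be illustrated with a small offline simulation. This is not the SDK's agent loop: `scripted_model` is a hypothetical stand-in for the model that tries the shortcut first and follows the deny reason afterwards, and `run_turns` is a hypothetical driver that feeds each denial reason back, the way the SDK returns it to the model. The gate has the same shape as the PreToolUse hook above.

```python
import asyncio

verified = {"customer_id": None}

async def gate_order_ops(input_data, tool_use_id, context):
    # Same shape as the SDK PreToolUse hook: deny order ops until verified.
    tool = input_data["tool_name"].rsplit("__", 1)[-1]
    if tool in ("lookup_order", "process_refund") and not verified["customer_id"]:
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason":
                "Call get_customer and obtain a verified customer ID first."}}
    return {}

def scripted_model(last_feedback):
    # Hypothetical model: takes the shortcut first, then follows the deny reason.
    if last_feedback is None:
        return "process_refund"
    if "get_customer" in last_feedback and not verified["customer_id"]:
        return "get_customer"
    return "process_refund"

async def run_turns(max_steps=5):
    log, feedback = [], None
    for _ in range(max_steps):
        tool = scripted_model(feedback)
        decision = await gate_order_ops({"tool_name": tool}, None, None)
        out = decision.get("hookSpecificOutput", {})
        if out.get("permissionDecision") == "deny":
            feedback = out["permissionDecisionReason"]  # returned to the model
            log.append((tool, "denied"))
            continue
        if tool == "get_customer":
            verified["customer_id"] = "CUS-88231"  # stand-in for the PostToolUse hook
            feedback = "verified"
        log.append((tool, "ran"))
        if tool == "process_refund":
            break
    return log

print(asyncio.run(run_turns()))
# → [('process_refund', 'denied'), ('get_customer', 'ran'), ('process_refund', 'ran')]
```

The premature refund never executes; the only path to a successful process_refund runs through get_customer.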
Claude Code exposes the same idea declaratively in settings.json (a PreToolUse matcher whose command exits with code 2 or prints a deny decision blocks the tool). The TypeScript SDK offers an equivalent canUseTool callback that returns { behavior: 'deny', message }. All three share the property that matters for the exam: the decision is made in deterministic code, not by the model.
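For the Claude Code variant, the hook command receives the pending tool call as JSON on stdin, and exit code 2 blocks the call while its stderr is shown to Claude. The following is a minimal sketch of such a command written in Python, to be registered as a command hook under a PreToolUse entry in settings.json. The state file path, the assumption that a separate PostToolUse hook creates it after a successful get_customer, and the `decide` helper are all hypothetical; the sketch also ignores `session_id`, which a real gate would need to key on.

```python
import json
import os
import sys

# Hypothetical marker file, assumed to be written by a PostToolUse hook on get_customer.
STATE_FILE = os.path.join(".claude", "verified_customer")
GATED = {"lookup_order", "process_refund"}

def decide(payload, state_path=STATE_FILE):
    """Return (exit_code, message) for a PreToolUse payload read from stdin."""
    tool = payload.get("tool_name", "").rsplit("__", 1)[-1]  # strip mcp__<server>__
    if tool in GATED and not os.path.exists(state_path):
        # Exit code 2 blocks the tool call; stderr is fed back to Claude.
        return 2, "Call get_customer and obtain a verified customer ID first."
    return 0, ""

def main():
    raw = sys.stdin.read()
    if not raw.strip():
        return  # no payload (e.g. run by hand): nothing to check
    code, message = decide(json.loads(raw))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)

if __name__ == "__main__":
    main()
```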
Deciding when deterministic compliance is required
Not every ordering rule deserves a gate. The dividing line the exam draws is the cost of a single violation. When an out-of-order action is irreversible or has financial, legal, or safety consequences, prompt instructions are not enough, because their failure rate is small but non-zero. Identity verification before a refund, authorization before releasing funds, and a signed approval before a production deploy all belong here.
When the worst case of a skipped step is mild inconvenience or a slightly worse answer, guidance is the proportionate choice. Forcing a gate onto every soft preference adds brittleness and blocks legitimate variation for no benefit.
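One way to keep gates proportionate is to declare only the hard invariants in a table and leave every other preference to prompt guidance. A minimal sketch, assuming a hypothetical `PREREQUISITES` map and a `completed` set that a PostToolUse hook would fill as steps succeed; tool names other than process_refund and get_customer (`release_funds`, `authorize_payment`, `deploy_production`, `record_signed_approval`) are illustrative placeholders for the examples above.

```python
import asyncio

# Hard invariants only: each gated tool lists the steps that must have completed.
PREREQUISITES = {
    "process_refund": {"get_customer"},
    "release_funds": {"authorize_payment"},
    "deploy_production": {"record_signed_approval"},
}

completed = set()  # hypothetical: populated by a PostToolUse hook

async def prerequisite_gate(input_data, tool_use_id, context):
    tool = input_data["tool_name"].rsplit("__", 1)[-1]
    missing = PREREQUISITES.get(tool, set()) - completed
    if missing:
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason":
                f"Complete {', '.join(sorted(missing))} before {tool}."}}
    return {}  # tools without a hard invariant always pass; guidance covers them

denied = asyncio.run(prerequisite_gate({"tool_name": "process_refund"}, None, None))
print(denied["hookSpecificOutput"]["permissionDecisionReason"])
# → Complete get_customer before process_refund.
```

Adding a new invariant is a one-line change to the table, and anything absent from it stays a soft preference handled by the prompt.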
A subtle trap: enforcing ordering is not the same as restricting availability. A router that enables only a subset of tools per request type controls which tools exist for the turn, but it does not guarantee that get_customer runs before process_refund when both are enabled. If the requirement is sequence, the answer is a prerequisite gate, not a tool-availability switch.
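The distinction shows up clearly in an offline check over recorded tool-call sequences. Below, a hypothetical router defines which tools exist for a refund request, and a sequence check applies the same logic as the prerequisite gate one call at a time. Both functions are illustrative, not SDK APIs.

```python
# Hypothetical router: which tools are enabled for a request type (availability).
ROUTES = {"refund": {"get_customer", "lookup_order", "process_refund"}}

def router_allows(request_type, trajectory):
    """Availability check: every call uses an enabled tool."""
    return all(tool in ROUTES[request_type] for tool in trajectory)

def gate_allows(trajectory):
    """Sequence check: no order operation before get_customer has run."""
    seen_verification = False
    for tool in trajectory:
        if tool == "get_customer":
            seen_verification = True
        elif tool in ("lookup_order", "process_refund") and not seen_verification:
            return False
    return True

shortcut = ["lookup_order", "process_refund"]  # skips verification
correct = ["get_customer", "lookup_order", "process_refund"]

print(router_allows("refund", shortcut), gate_allows(shortcut))  # True False
print(router_allows("refund", correct), gate_allows(correct))    # True True
```

The shortcut passes the availability check because every tool it uses is enabled; only the state-tracking check rejects it.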
Decomposing multi-concern requests
Real support messages often bundle several problems: "My order arrived damaged, I think I was charged twice, and my subscription never cancelled." The failure mode is answering only the first concern, or working them one at a time and losing track of the rest.
The pattern is to decompose the message into distinct items up front, then investigate each item using shared context so you do not redo expensive prerequisites. The verified customer ID from a single get_customer call is reused across the damage lookup, the billing lookup, and the subscription check. Independent items can be investigated in parallel, then merged.
The final move is synthesis into one unified resolution rather than three disconnected replies. The customer gets a single coherent answer that addresses the damaged item, the duplicate charge, and the subscription in one place, which is both better service and easier to audit.
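The control-flow shape of this pattern can be sketched with asyncio. This is a minimal sketch under stated assumptions: `get_customer` and `investigate` are hypothetical async stand-ins for the real tool calls, the concerns arrive already split (in a real agent the model does the decomposition), and the `calls` counter exists only to show the prerequisite runs once.

```python
import asyncio

calls = {"get_customer": 0}  # counts how often the expensive prerequisite runs

async def get_customer(name):
    # Hypothetical stand-in for the get_customer tool.
    calls["get_customer"] += 1
    await asyncio.sleep(0)
    return "CUS-88231"

async def investigate(customer_id, concern):
    # Hypothetical stand-in for lookup_order / billing / subscription checks.
    await asyncio.sleep(0)
    return f"{concern}: checked against {customer_id}"

async def resolve(name, concerns):
    customer_id = await get_customer(name)  # verify once, reuse as shared context
    findings = await asyncio.gather(        # independent items run concurrently
        *(investigate(customer_id, c) for c in concerns))
    # Synthesize one reply covering every item; gather preserves input order.
    return f"Resolution for {customer_id}:\n" + "\n".join(f"- {f}" for f in findings)

concerns = ["damaged item", "duplicate charge", "subscription not cancelled"]
print(asyncio.run(resolve("Jane Doe", concerns)))
```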
Structured handoff protocols for escalation
When the agent escalates mid-process, for example because a refund exceeds an auto-approval limit or the policy is silent on the request, it hands off to a human. The critical constraint is that the human agent cannot see the conversation transcript. A bare "escalating this customer" forces the human to start the investigation over.
A structured handoff makes the escalation self-contained. The escalate_to_human tool should carry the verified customer ID, a concise root-cause analysis, the concrete numbers (refund amount, order IDs, statuses), and a recommended action, plus what the agent already attempted.
```json
{
  "customer_id": "CUS-88231",
  "verified": true,
  "issue_summary": "Duplicate charge on order #4471 plus item damaged in transit.",
  "root_cause": "Payment retry created a second capture; box crushed on delivery.",
  "refund_amount": 640.00,
  "recommended_action": "Approve full refund (exceeds $500 auto-limit) and ship replacement.",
  "attempted": ["get_customer", "lookup_order x2", "process_refund blocked: over $500"]
}
```
The human reads one record and acts. This is the same discipline as preserving case facts across a long conversation, applied at the boundary where context stops being shared automatically.
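The handoff shape can also be enforced in code rather than left to the prompt. A minimal sketch, assuming a hypothetical `HandoffSummary` dataclass whose fields mirror the record above and assuming escalate_to_human accepts a JSON object: because the required fields have no defaults and blank strings are rejected, a bare "escalating this customer" cannot be constructed.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class HandoffSummary:
    # Required fields (no defaults): a handoff cannot be built without them.
    customer_id: str
    verified: bool
    issue_summary: str
    root_cause: str
    refund_amount: float
    recommended_action: str
    attempted: list = field(default_factory=list)

    def __post_init__(self):
        # Reject blank text so a placeholder cannot stand in for real analysis.
        for name in ("customer_id", "issue_summary", "root_cause", "recommended_action"):
            if not getattr(self, name).strip():
                raise ValueError(f"handoff missing {name}")

    def to_json(self):
        # Payload for the escalate_to_human tool input (assumed to take JSON).
        return json.dumps(asdict(self), indent=2)

handoff = HandoffSummary(
    customer_id="CUS-88231",
    verified=True,
    issue_summary="Duplicate charge on order #4471 plus item damaged in transit.",
    root_cause="Payment retry created a second capture; box crushed on delivery.",
    refund_amount=640.00,
    recommended_action="Approve full refund (exceeds $500 auto-limit) and ship replacement.",
    attempted=["get_customer", "lookup_order x2", "process_refund blocked: over $500"],
)
print(handoff.to_json())
```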
Defense in depth: combine guidance and enforcement
The strongest designs use both layers deliberately. The system prompt still tells the agent to verify identity first and to compile a full handoff before escalating, so that on the happy path the model does the right thing without ever hitting a gate. That keeps latency and token cost low and the interaction natural.
The gate then sits underneath as a guarantee for the one invariant that must never break. If the model ever deviates, the hook catches it and feeds back a correction. Guidance optimizes the common case; enforcement bounds the worst case.
Think of the prompt as the intended path and the gate as a guardrail. You would not remove the guardrail because drivers usually stay on the road, and you would not rely on the guardrail to do the steering.
Anti-patterns to avoid
Anti-pattern: Hardening the system prompt ("you MUST verify first") to stop the agent from skipping get_customer.
Why it fails: Prompt instructions are probabilistic. Even emphatic "you MUST verify first" language has a non-zero failure rate, so a fraction of financial transactions still execute out of order, which is exactly what happened when the agent skipped get_customer in 12% of cases.
Instead: Add a PreToolUse prerequisite gate that blocks lookup_order and process_refund until get_customer has returned a verified customer ID. Keep the prompt guidance too, but let the gate carry the guarantee.
Anti-pattern: Adding a router that enables only certain tools per request type to fix an ordering bug.
Why it fails: Toggling which tools are available controls availability, not sequence. When both get_customer and process_refund are enabled for a refund request, nothing forces the verification step to run first.
Instead: Use a prerequisite gate keyed on completed-step state. Reserve tool-availability scoping for reducing selection complexity, not for enforcing order.
Anti-pattern: Escalating with a bare "escalating this customer" message.
Why it fails: The human agent has no access to the conversation transcript, so they must re-verify identity, re-run lookups, and reconstruct the root cause, defeating the purpose of the escalation and slowing resolution.
Instead: Compile a structured handoff summary containing the verified customer ID, root cause, concrete amounts, recommended action, and what was already attempted, so the human can act immediately.
Anti-pattern: Answering only the first concern in a multi-concern message.
Why it fails: The remaining concerns are silently dropped, the customer has to contact support again, and first-contact resolution falls below its target.
Instead: Decompose the message into distinct items first, reuse the single verified customer ID as shared context, investigate the items in parallel, and synthesize one unified resolution.
Worked example: Enforcing verify-before-refund and a clean escalation in the support agent
Scenario. You are building the Customer Support Resolution Agent (Scenario 1) on the Claude Agent SDK with MCP tools get_customer, lookup_order, process_refund, and escalate_to_human. Production logs show that in 12% of cases the agent calls lookup_order using only the customer's stated name and skips get_customer entirely, occasionally refunding the wrong account. Prompt tweaks have not closed the gap.
Step 1: make the invariant deterministic. Add a PostToolUse hook that records the verified ID whenever get_customer succeeds, and a PreToolUse gate that blocks the downstream tools until it exists.
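```python
from claude_agent_sdk import ClaudeAgentOptions, HookMatcher

verified = {"customer_id": None}  # single-session state for this sketch

def bare_name(tool_name):
    # MCP tools arrive as "mcp__<server>__<tool>"; keep only "<tool>".
    return tool_name.rsplit("__", 1)[-1]

async def record_verification(input_data, tool_use_id, context):
    # PostToolUse: record the ID once get_customer reports a verified match.
    if bare_name(input_data["tool_name"]) == "get_customer":
        # Assumes get_customer returns a dict with "status" and "customer_id".
        result = input_data.get("tool_response") or {}
        if isinstance(result, dict) and result.get("status") == "verified":
            verified["customer_id"] = result["customer_id"]
    return {}

async def gate_order_ops(input_data, tool_use_id, context):
    # PreToolUse: deny order operations until a verified ID has been recorded.
    if (bare_name(input_data["tool_name"]) in ("lookup_order", "process_refund")
            and not verified["customer_id"]):
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason":
                "Verify identity with get_customer before order operations."}}
    return {}

options = ClaudeAgentOptions(hooks={
    "PreToolUse": [HookMatcher(hooks=[gate_order_ops])],
    # No matcher: the hook filters on the bare tool name itself, so the
    # mcp__<server>__ prefix on MCP tool names cannot cause a silent miss.
    "PostToolUse": [HookMatcher(hooks=[record_verification])]})
```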
Now the 12% path self-corrects: the gate denies the premature lookup, the model reads the reason, calls get_customer, and retries. Compliance goes from probabilistic to guaranteed.
Step 2: decompose a multi-concern message. The customer writes: "My order #4471 came smashed and I think you charged me twice." The agent identifies two items, calls get_customer once (shared context), then investigates both against that single verified ID: lookup_order for the damaged item and a billing check for the duplicate capture. It prepares one unified reply covering both, not two separate threads.
Step 3: escalate with a structured handoff. The combined refund is $640, above the $500 auto-approval limit, so the agent must escalate rather than call process_refund (which the policy layer would also block). It calls escalate_to_human with a self-contained record: customer_id CUS-88231, root cause (payment retry double-capture plus transit damage), refund_amount 640.00, and recommended_action (approve refund, ship replacement), plus the attempted steps. The human, who never sees the chat, approves in one glance.
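The policy layer that would block an over-limit refund can be a second PreToolUse check that inspects the tool input instead of prior state. A minimal sketch, assuming process_refund carries the refund total in an `amount` input field and using the scenario's $500 limit; `AUTO_APPROVAL_LIMIT` and the field name are assumptions, not part of the scenario's tool definitions.

```python
import asyncio

AUTO_APPROVAL_LIMIT = 500.00  # the scenario's auto-approval limit (assumed constant)

async def gate_refund_limit(input_data, tool_use_id, context):
    tool = input_data["tool_name"].rsplit("__", 1)[-1]
    if tool != "process_refund":
        return {}
    # Assumes process_refund's input carries the refund total as "amount".
    amount = float(input_data.get("tool_input", {}).get("amount", 0))
    if amount > AUTO_APPROVAL_LIMIT:
        return {"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason":
                f"Refund ${amount:.2f} exceeds the ${AUTO_APPROVAL_LIMIT:.2f} auto-approval "
                "limit; call escalate_to_human with a structured handoff."}}
    return {}

decision = asyncio.run(gate_refund_limit(
    {"tool_name": "mcp__support__process_refund", "tool_input": {"amount": 640.00}},
    None, None))
print(decision["hookSpecificOutput"]["permissionDecision"])  # deny
```

The deny reason points the model at the escalation path, so the same corrective loop that enforces verification also steers over-limit refunds to a human.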
Why this is the exam-correct design. The reliability fix is the programmatic prerequisite (matching sample question 1's correct answer over prompt hardening, few-shot examples, or a tool-availability router), the multi-concern request is decomposed and synthesized rather than partially answered, and the escalation is a structured handoff rather than a bare ping.
Exam tips
- ✓ When a step must precede a financial or identity-sensitive action, enforce it with a PreToolUse prerequisite gate, not a 'mandatory' system-prompt line. Prompt instructions have a small but non-zero failure rate.
- ✓ A prerequisite gate denies the downstream tool with a reason; the model then runs the prerequisite and retries. It corrects the model rather than silently dropping or crashing the call.
- ✓ Enforcing tool ordering is different from restricting tool availability. A router that toggles which tools exist per request type does not guarantee sequence; only a state-checking gate does.
- ✓ A handoff summary must be self-contained (verified customer ID, root cause, refund amount, recommended action, what was attempted) because the human agent cannot see the conversation transcript.
- ✓ For a multi-concern message, decompose into distinct items, reuse one verified customer ID as shared context, investigate in parallel, then synthesize a single unified resolution instead of answering only the first concern.
- ✓ Keep prompt guidance and the gate together: guidance smooths the common path, the gate guarantees the invariant that must never break (defense in depth).
Official exam objectives for 1.4
- The difference between programmatic enforcement (hooks, prerequisite gates) and prompt-based guidance for workflow ordering
- When deterministic compliance is required (e.g., identity verification before financial operations), prompt instructions alone have a non-zero failure rate
- Structured handoff protocols for mid-process escalation that include customer details, root cause analysis, and recommended actions
- Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed (e.g., blocking process_refund until get_customer has returned a verified customer ID)
- Decomposing multi-concern customer requests into distinct items, then investigating each in parallel using shared context before synthesizing a unified resolution
- Compiling structured handoff summaries (customer ID, root cause, refund amount, recommended action) when escalating to human agents who lack access to the conversation transcript
Flashcards from this lesson
Why is a system-prompt rule like 'always verify identity before refunds' insufficient for a financial workflow?
Prompt instructions are probabilistic and have a non-zero failure rate, so the model can still skip the step. Deterministic compliance requires a programmatic prerequisite gate.
What Agent SDK mechanism blocks process_refund until get_customer has returned a verified ID?
A PreToolUse hook that checks the tool name and prerequisite state and returns permissionDecision 'deny' with a reason when the verified customer ID is missing.
When a prerequisite gate denies a tool call, what happens next?
The deny reason is returned to the model, which performs the missing prerequisite (get_customer), then retries the blocked tool, which now passes the gate. The block is corrective, not fatal.
Why must an escalation handoff summary be self-contained?
The human agent has no access to the conversation transcript, so the handoff must include the verified customer ID, root cause, refund amount, recommended action, and attempted steps.
A customer reports three separate problems in one message. What is the correct handling pattern?
Decompose into distinct items, verify the customer once and reuse that ID as shared context, investigate the items in parallel, then synthesize one unified resolution.
Does a router that enables only certain tools per request type fix a tool-ordering bug?
No. That controls tool availability, not sequence. Guaranteeing that get_customer runs before process_refund requires a state-checking prerequisite gate.