3.512 min

Apply iterative refinement techniques for progressive improvement

Iterative refinement is how you get production-quality output from Claude Code across multiple turns instead of in one shot. The exam tests which feedback mechanism fits which situation: concrete input/output examples when prose is interpreted inconsistently, test-driven iteration that guides fixes with real failing tests, the interview pattern to surface hidden design decisions, and knowing when to batch interacting fixes versus iterate on independent ones. Picking the right technique converges faster and avoids regression whack-a-mole.

The iterative refinement loop: prompt with concrete examples, run pre-written tests, and feed failing tests plus edge cases back until everything is green. Side notes show the interview pattern and the interacting-versus-independent rule.

The refinement mindset: steer, don't re-describe

Claude Code is most effective when you treat it as a collaborator you steer across several turns, not a one-shot code generator. The first output is a draft. The leverage comes from how precisely you correct it, and the fastest teams do not converge by re-explaining the goal in different words. They converge by giving Claude sharper, more concrete feedback signals: worked examples, failing tests, and targeted questions.

The testable idea in this task statement is choosing the right feedback mechanism for the situation. Prose that keeps producing inconsistent results is a signal to switch to examples. A behavior that is hard to describe but easy to check is a signal to switch to tests. An unfamiliar domain is a signal to have Claude interview you first. These are prompting and workflow techniques, not CLI features. There is no special flag that turns them on.

Concrete input/output examples beat prose

When a prose description ("normalize the phone numbers", "default missing fields") is interpreted inconsistently across runs, the single highest-leverage fix is to show 2-3 concrete input to output pairs rather than rewriting the sentence. Examples remove the ambiguity that natural language leaves open: exact formatting, casing, how blanks versus nulls differ, what an edge row maps to.

This is the same principle as few-shot prompting from domain 4, applied to code transformations. Two or three examples is the sweet spot the blueprint cites: enough to pin the pattern, few enough to stay readable.

Input:  "(415) 555-0100"   Output: "+14155550100"
Input:  ""                 Output: null

If you find yourself typing the same prose instruction a third time with different adjectives, stop and give examples instead.

Test-driven iteration: tests first, then share failures

For behavior that is hard to describe but easy to verify, write the test suite first, covering expected behavior, edge cases, and performance requirements, then have Claude implement against it. You then iterate by pasting the actual failing test output back into the conversation.

A failing test is an unambiguous, machine-checked feedback signal. "test_blank_phone_becomes_null failed: got '' expected None" tells Claude exactly what to fix, which is far more effective than "handle blanks better." Tests also act as a regression guard, so a fix for one case cannot silently break a case that already passed.

def test_blank_phone_becomes_null():
    assert migrate({"phone": ""})["phone"] is None

Sharing the failure text, not a paraphrase of it, is the key move. The exact assertion and observed value are the highest-signal input you can give.

Targeted test cases pin down edge cases

When a specific edge case is mishandled, provide a specific test case with example input and expected output rather than describing the bug abstractly. The blueprint's canonical example is null values in a migration script: instead of saying "it does not handle nulls," give the exact row and the exact expected result.

This is the debugging-time application of the examples principle. Abstract descriptions ("be robust to missing data") leave the model guessing about which of several plausible behaviors you want. A concrete case ("input {signup_date: null} should produce signup_date: \"1970-01-01\"") collapses that ambiguity to one answer and doubles as a permanent regression test.

The interview pattern for unfamiliar domains

Before implementing in a domain you do not know well, ask Claude to interview you: to ask clarifying questions that surface considerations you may not have anticipated. This front-loads design decisions instead of letting them get baked in implicitly and wrongly.

Before writing any code, ask me the questions you need
answered about requirements, edge cases, and failure modes.

Typical considerations the interview surfaces include cache invalidation strategy, failure and retry modes, concurrency and idempotency, and boundary conditions like timezones or pre-epoch dates. The pattern is most valuable precisely when you cannot fully specify the task yourself, because Claude's questions expose the gaps in your own mental model before code exists to throw away.

Batching feedback: interacting versus independent issues

When you have multiple issues to fix, whether to send them together depends on whether they interact. Provide all issues in a single detailed message when the fixes interact, so Claude can reason about them together and produce one coherent change. Fix issues sequentially when they are independent, which keeps each change small and easy to review.

Getting this backwards causes trouble. If two bugs share a helper (say a coerce() function) and you fix them one at a time, each fix can reintroduce or break the other, producing regression whack-a-mole. Conversely, dumping ten unrelated issues into one message makes the diff hard to review and hard to attribute. The rule to memorize: interacting problems go in one message, independent problems iterate sequentially.

Anti-patterns to avoid

avoid

Rewriting the prose prompt over and over when output is inconsistent

Why it fails: The same ambiguity remains, so the model keeps making different reasonable guesses each run. More adjectives do not narrow the space of valid interpretations.

instead Replace prose with 2-3 concrete input/output examples that pin the exact transformation, including how edge rows map.

avoid

Fixing interacting bugs one at a time

Why it fails: When fixes share code paths, each correction can reintroduce or break a related behavior, causing the output to oscillate instead of converge.

instead Send all interacting issues in a single detailed message so Claude reasons about them together; reserve sequential iteration for genuinely independent issues.

avoid

Iterating by eyeballing output and saying "still wrong"

Why it fails: Vague natural-language feedback gives the model no precise target and no protection against regressions in cases that already worked.

instead Write tests first, then paste the exact failing assertion and observed value so the fix targets the specific defect.

avoid

Jumping straight to implementation in an unfamiliar domain

Why it fails: Unstated design decisions (cache invalidation, failure modes, timezone handling) get made implicitly, and you often discover the wrong choice only in production.

instead Use the interview pattern: ask Claude to question you about requirements and edge cases before it writes any code.

Worked example: Fixing a data migration script (Code Generation with Claude Code)

Scenario 2. You ask Claude Code to write a script migrating a legacy customers table into a new schema. Your first prompt is prose: "Convert each row, normalize the phone numbers, and default missing fields." The output looks plausible, but three runs produce three different behaviors for blank phone fields and for rows where signup_date is null.

Step 1, replace prose with concrete examples. Instead of re-describing the rule, give 2-3 input/output pairs that fix the exact mapping:

Input:  {"phone": "(415) 555-0100", "signup_date": null}
Output: {"phone": "+14155550100", "signup_date": "1970-01-01"}

Input:  {"phone": "", "signup_date": "2021-03-02"}
Output: {"phone": null, "signup_date": "2021-03-02"}

Step 2, lock behavior with tests first. Before asking for more code, write the suite and hand it to Claude:

def test_null_signup_defaults_epoch():
    assert migrate({"phone": "(415) 555-0100", "signup_date": None})["signup_date"] == "1970-01-01"

def test_blank_phone_becomes_null():
    assert migrate({"phone": "", "signup_date": "2021-03-02"})["phone"] is None

Step 3, iterate on failures, not vibes. Run the suite and paste the real failure: "test_blank_phone_becomes_null failed: got '' expected None." That single unambiguous signal fixes the precise edge case far faster than "handle blanks better."

Step 4, batch interacting fixes. If correcting the null-date default also shifts how empty strings are coerced, because both flow through a shared coerce() helper, send both corrections in one message so Claude reasons about them together instead of oscillating between them.

Step 5, use the interview pattern for unfamiliar parts. If the migration touches an area you do not know well, such as timezone handling for signup_date, open with: "Before writing code, ask me any questions about edge cases and assumptions you need resolved." Claude surfaces decisions (UTC versus local, DST boundaries, pre-epoch dates) you would otherwise discover in production.

Exam tips

✓When a prose description is interpreted inconsistently, the recommended fix is 2-3 concrete input/output examples, not more prose and not "be careful" or "be conservative" instructions.
✓Test-driven iteration means write the test suite first, then guide fixes by sharing the actual failing test output, not a paraphrase of it.
✓The interview pattern is having Claude ask you clarifying questions before implementing; it is most valuable in unfamiliar domains to surface considerations like cache invalidation and failure modes.
✓Interacting issues (fixes that affect each other) go in one detailed message; independent issues are iterated sequentially.
✓To fix a specific edge case such as null values in a migration script, provide a concrete test case with example input and expected output, not an abstract bug description.
✓These are prompting and workflow techniques with no special flag; distractors like a --batch flag or a CLAUDE_HEADLESS variable are fabricated features.

Official exam objectives for 3.5

Knowledge of

Concrete input/output examples as the most effective way to communicate expected transformations when prose descriptions are interpreted inconsistently
Test-driven iteration: writing test suites first, then iterating by sharing test failures to guide progressive improvement
The interview pattern: having Claude ask questions to surface considerations the developer may not have anticipated before implementing
When to provide all issues in a single message (interacting problems) versus fixing them sequentially (independent problems)

Skills in

Providing 2-3 concrete input/output examples to clarify transformation requirements when natural language descriptions produce inconsistent results
Writing test suites covering expected behavior, edge cases, and performance requirements before implementation, then iterating by sharing test failures
Using the interview pattern to surface design considerations (e.g., cache invalidation strategies, failure modes) before implementing solutions in unfamiliar domains
Providing specific test cases with example input and expected output to fix edge case handling (e.g., null values in migration scripts)
Addressing multiple interacting issues in a single detailed message when fixes interact, versus sequential iteration for independent issues

Flashcards from this lesson

A prose description of a transformation keeps producing inconsistent code across runs. What is the most effective fix?

Provide 2-3 concrete input/output examples that pin the exact transformation, instead of rewriting the prose.

What is test-driven iteration with Claude Code?

Write the test suite first (expected behavior, edge cases, performance), have Claude implement against it, then iterate by sharing the actual failing test output.

When do you send all issues in one message versus fixing them sequentially?

One detailed message when the issues interact (fixes affect each other); sequential iteration when the issues are independent.

What is the interview pattern and when is it most useful?

Have Claude ask you clarifying questions before it writes code. It is most useful in unfamiliar domains to surface hidden considerations like cache invalidation and failure modes.

Best way to fix a specific edge case such as null values in a migration script?

Give a concrete test case with example input and expected output, not an abstract description of the bug.

Why prefer a failing test over saying "it's still wrong" as feedback?

A failing test is unambiguous and machine-checked, targets the exact defect, and guards against regressions in cases that already passed.

Study all flashcards with spaced repetition

Mark this lesson complete when you are confident.

← Previous

3.4 Determine when to use plan mode vs direct execution

3.6 Integrate Claude Code into CI/CD pipelines