The Task section is the intent encoding layer of your prompt. It is where you compress a potentially complex goal into two tightly coupled clauses: the directive (what operation should be performed) and the outcome (what state should exist when the operation is complete). Claude's language model architecture means it does not reason about goals the way a human engineer would — it performs next-token prediction conditioned on everything in its context window. The Task section's job is to make the intended goal maximally unambiguous inside that context so that the most probable token sequence it generates is also the most useful one.
A directive alone — "write a HubSpot connector" — is underspecified. It describes an action but not its success condition. Claude will fill the gap with its own prior over what "a HubSpot connector" looks like, drawn from training data. That prior may not match your codebase's conventions, your quality bar, or your integration requirements. Adding an outcome — "so that all contacts are normalised into the canonical table with no data loss, the mapper passes all existing tests, and the PR checklist is satisfied" — shifts the distribution of probable outputs dramatically. Claude now has a verifiable end-state to optimise toward.
Every ambiguity in your Task section is cognitive load that gets transferred to Claude — and Claude resolves ambiguity using priors, not your intent. The more precisely you specify what done looks like, the less Claude has to infer, and the less likely it is to infer incorrectly. Think of it as reducing the entropy of the output distribution: a vague task has high entropy (many plausible outputs); a precise task with outcome conditioning has low entropy (a narrow range of correct outputs).
The directive should identify: (1) the operation (write, refactor, debug, explain, design, test), (2) the object (which class, file, system, or concept), and (3) the scope (new from scratch, edit existing, extend interface). Missing any of these forces Claude to assume defaults that may be wrong.
The outcome clause should be falsifiable. "High-quality output" is not falsifiable — no test can determine pass/fail. "mvn test passes with zero new failures, all canonical fields populated, no hardcoded credentials" is falsifiable. You or a CI system can run it and get a binary result. When the outcome is falsifiable, Claude can self-evaluate its own output against it before presenting it to you — which measurably reduces iteration cycles.
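Falsifiable criteria can be made literal as executable checks. A minimal sketch, assuming the DataHarness names from the running example — the stub below stands in for the real mapper; only the binary pass/fail shape of each check matters:

```java
import java.util.List;

class OutcomeCheck {
    // Stub standing in for the real ZendeskTicketFieldMapper.
    static String sourceId() { return "zendesk-tickets"; }
    static List<String> mappedFields() { return List.of("id", "email", "createdAt"); }

    public static void main(String[] args) {
        // Each criterion is binary: it passes or it throws. No adjectives.
        if (!sourceId().equals("zendesk-tickets"))
            throw new AssertionError("wrong sourceId");
        if (mappedFields().isEmpty())
            throw new AssertionError("no canonical fields mapped");
    }
}
```

A CI step running this class gets an exit code, not an opinion.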
Putting directive and outcome together for the running example: "Write ZendeskTicketConnector and its corresponding ZendeskTicketFieldMapper from scratch so that: (1) all open Zendesk tickets are fetched via OAuth2 pagination and normalised into CanonicalRecord using the canonical fields in references/schema.md, (2) ZendeskTicketFieldMapperTest passes with coverage of happy path, null fields, and missing id, and (3) the PR checklist in references/adding-a-source.md is fully satisfied."
Claude operates with a finite context window. Everything inside that window is equally available from the model's perspective — there is no persistent memory between turns unless you explicitly carry it forward. Context Files is the mechanism by which you pre-load domain-specific knowledge into the window before any generation occurs, ensuring that Claude's outputs are conditioned on your project's actual state rather than statistical priors from training.
The phrase "completely before responding" is not stylistic — it is an explicit sequential processing constraint. Without it, Claude may begin generating in a streaming fashion before fully processing all referenced files, anchoring on early content and progressively discounting later material. This is especially dangerous when a later file (e.g., schema.md) contradicts an assumption formed from an earlier one. The ordering constraint forces a complete read-then-respond pattern.
Each entry in your Context Files list should follow the pattern: filename — one-sentence description of what it constrains. The description serves two functions: it tells Claude when to consult the file during reasoning (not just that it exists), and it exposes gaps — if you can't write the one-sentence description, the file probably doesn't have a clear purpose.
Over-inclusion is as harmful as under-inclusion. Every token of context you add consumes attention budget that could otherwise go to task-relevant reasoning. Include only files that constrain or inform the specific output you need. For a new connector: architecture + process + schema + testing. For a debugging task: architecture + debugging guide only. The selection itself communicates task scope to Claude — a long file list signals a complex, multi-constraint task; a short one signals a focused, bounded task.
Order matters. Place files that establish global constraints (SKILL.md, PROMPT.md) first — they set the frame. Place files that establish task-specific constraints (adding-a-source.md, schema.md) second. Place files that provide reference material (testing.md, debugging.md) last. Claude's attention is slightly recency-biased, so task-specific constraints placed later in the context are weighted more heavily during generation — a feature, not a bug.
References are the most powerful calibration tool in prompt engineering. They operate through the same mechanism as few-shot in-context learning — by providing concrete examples of desired output, you shift Claude's output distribution toward that example's style, structure, and quality level far more reliably than descriptive instructions alone. The reason is fundamental: language models are trained to complete patterns; an example is a direct pattern signal, whereas an adjective like "clean" or "idiomatic" requires the model to resolve the adjective against its training prior, which may not match your standard.
Consider the instruction "write clean, idiomatic, well-documented Java." Each adjective — clean, idiomatic, well-documented — has an enormous range of valid interpretations. "Clean" to one engineer means no Lombok; to another it means short methods; to another it means no raw types. When you provide the existing SalesforceLeadsConnector as a reference, Claude doesn't need to resolve any of those ambiguities — it reads the actual code and extracts the conventions directly. The example encodes intent with zero loss compared to natural language description.
A reference alone is incomplete. Without explicit rule extraction, Claude may identify the wrong properties as salient. If your reference uses AbstractFieldMapper, Claude might pattern-match on the class name rather than the null-handling contract. The reverse-engineering step — your list of "Always" and "Never" rules derived from the example — tells Claude which features of the reference are load-bearing versus incidental.
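The distinction can be sketched in code. Here the class names are incidental carryovers from the running example; the load-bearing feature is the null-handling contract in optional() — a missing source field becomes an explicit null in the canonical record, never an exception or a dropped key:

```java
import java.util.HashMap;
import java.util.Map;

abstract class AbstractFieldMapper {
    // Contract (load-bearing): an absent source key maps to an explicit null.
    protected String optional(Map<String, String> raw, String key) {
        return raw.getOrDefault(key, null);
    }
}

class ZendeskTicketFieldMapper extends AbstractFieldMapper {
    Map<String, String> map(Map<String, String> raw) {
        Map<String, String> canonical = new HashMap<>();
        canonical.put("email", optional(raw, "requester_email")); // null if absent
        return canonical;
    }
}
```

A rule list derived from this reference would say "Always emit the canonical key with an explicit null for missing optional fields", not "Always extend AbstractFieldMapper".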
Don't only show what you want — show what you explicitly don't want. If a previous connector used System.out.println for logging, show that connector and mark it as the anti-pattern. Negative examples constrain the output distribution from below, complementing how positive examples constrain it from above. Together they define a tight target band.
The Success Brief is an output specification contract. Its purpose is to close the gap between what Claude infers "good output" to mean (based on its training distribution) and what you actually need. Without it, Claude optimises for the most probable interpretation of the task — which is, statistically, a generic implementation that satisfies the literal directive but may miss your quality bar, scope constraints, and audience expectations entirely.
Specify the artifact type and size budget explicitly. Type: Java class, test file, YAML config snippet, inline explanation, architectural decision record. Length: approximate line count or file count. These constraints prevent two failure modes: scope creep (Claude produces a full framework when you needed a single class) and under-delivery (Claude produces a stub with TODOs when you needed a complete, runnable implementation). If you can't state the type and length, you haven't fully scoped the task.
This is the most technically powerful field and the most commonly omitted. Instead of describing output properties, describe what the recipient should be able to do immediately after reading. This is the "action test" — it forces outcome-orientation and eliminates vague quality descriptors. Compare "produce high-quality, well-documented code" with "a reviewer should be able to check out the branch, run mvn test, and merge the PR within 10 minutes."

Negative constraints are among the highest-ROI additions to any prompt. They eliminate entire failure modes before generation begins. Common anti-patterns to call out for DataHarness: generic boilerplate without project-specific conventions, missing null checks on optional fields, skipping pagination in the fetch loop, using System.out.println instead of ctx.debug(), hardcoding the base URL, returning null instead of List.of() for empty fetch results. Each one you name removes a class of bad output from consideration.
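Two of the named anti-patterns can be shown as a marked negative example beside its fix, a hedged sketch with illustrative names only:

```java
import java.util.List;

class EmptyFetchExample {
    // ANTI-PATTERN (do not copy): null forces a null check on every caller
    // and invites NullPointerExceptions downstream.
    static List<String> fetchBad() { return null; }

    // PATTERN: an empty immutable list is safe to iterate, share, and count.
    static List<String> fetchGood() { return List.of(); }
}
```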
This field should be a concrete, programmatically verifiable statement. Think of it as the acceptance criteria in a user story — not "high quality" but a precise list of conditions that can be checked. The more your success criteria map to things a CI pipeline could verify, the better. Examples: mvn test -Dtest=ZendeskTicketFieldMapperTest exits 0; no calls to System.out.println (checked by grep); sourceId() returns "zendesk-tickets"; all fields in references/schema.md §Identity are mapped or explicitly null.
Rules are hard constraints that are invariant across all tasks in a given project context. Unlike instructions (which are task-scoped) and references (which are output-scoped), rules are project-scoped: they apply regardless of what the task is, and they cannot be overridden by task-specific instructions without explicit acknowledgment. They encode the non-negotiables that your project has accumulated — architectural decisions, security requirements, team conventions, legal constraints — in a form Claude can check against during generation.
Effective rule sets have coverage across at least four categories: architectural decisions, security requirements, team conventions, and legal or compliance constraints.
A rule is well-formed if it satisfies all four properties: unconditional (no "usually" or "where possible"), unambiguous (a third party could apply it consistently), verifiable (can be checked by static analysis, grep, or code review), and scoped (it applies to a specific domain of output, not "everything").
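The "verifiable" property in practice: a rule like "never log via System.out.println in connector code" reduces to a scan a CI step could run. A minimal sketch, with inline strings standing in for real source files:

```java
class RuleCheck {
    // A well-formed rule collapses to a mechanical check: no judgment needed,
    // any third party applying it gets the same answer.
    static boolean violatesLoggingRule(String source) {
        return source.contains("System.out.println");
    }
}
```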
Without an explicit conflict-reporting directive, Claude has two default behaviors when a rule conflicts with a task: silent violation (comply with the task, ignore the rule) or silent refusal (apply the rule, produce incomplete output, explain vaguely). Neither is acceptable. The directive creates a third behavior: explicit conflict report. Claude names the rule, names the task requirement that violates it, and blocks on your decision. This turns rule violations from silent failures into collaborative decision points — you may grant a one-time exception, reframe the task, or discover the rule needs updating.
The Conversation section is a generation circuit-breaker. It prevents the most expensive failure mode in LLM-assisted development: Claude generating a complete, internally consistent, but fundamentally wrong solution because it resolved ambiguity silently in the wrong direction. The cost of this failure scales with output length — a 200-line class built on a wrong assumption requires more rework than a 2-line stub. Catching that assumption before generation costs one exchange; catching it after costs many.
Not all ambiguity is worth a clarifying exchange. The test is: would different reasonable interpretations produce materially different outputs? "Should I use OAuth2 or API key auth for this connector?" — yes, the entire auth implementation diverges. "Should I add a blank line after the constructor?" — no, the output is functionally identical. Only branch the solution space when the branches lead to structurally different code, significantly different scope, or divergent architectural choices.
Claude should ask questions that partition the solution space — each answer eliminates a class of possible outputs. Structurally, a good clarifying question has exactly two to four discrete answers, each leading to a meaningfully different implementation path. Questions with open-ended answers are better placed in the Plan section as explicit assumptions Claude states before you confirm them.
The phrase "step by step" is a serialisation constraint on the clarification dialogue. Without it, Claude may batch all its uncertainties into a single wall of questions — which overwhelms the user and prevents early answers from informing later questions. Serialised dialogue — one or two questions, answers, then next questions informed by those answers — produces better-targeted questions and faster convergence on a shared understanding. It mirrors how skilled engineers conduct requirement interviews.
For DataHarness specifically, high-value questions include: auth type (OAuth2 vs API key vs basic), pagination strategy (cursor vs offset), rate limit handling (retry with backoff vs fail-fast), whether the source supports incremental sync, and whether the canonical mapping requires any new schema fields not yet in references/schema.md.
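To make the pagination branch concrete, here is a hedged sketch of cursor pagination; the Page record and Source interface are hypothetical, not the DataHarness API:

```java
import java.util.ArrayList;
import java.util.List;

class CursorPagination {
    record Page(List<String> items, String nextCursor) {}

    interface Source { Page fetchPage(String cursor); }

    static List<String> fetchAll(Source source) {
        List<String> all = new ArrayList<>();
        String cursor = null;                 // null requests the first page
        do {
            Page page = source.fetchPage(cursor);
            all.addAll(page.items());
            cursor = page.nextCursor();       // null signals the last page
        } while (cursor != null);
        return all;
    }
}
```

An offset-based answer to the same question would replace the opaque cursor with a numeric skip count — structurally different code, which is exactly why the question is worth asking before generation.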
The Plan section forces reasoning externalisation — it asks Claude to make its interpretation of the task and its intended execution strategy legible to you before any code is written. This is technically significant: Claude's generation process is largely opaque. The plan step creates a visible checkpoint at which you can verify alignment between Claude's interpretation and your intent, catch errors in reasoning before they propagate through hundreds of lines of output, and redirect cheaply.
Asking Claude to list all applicable rules produces a laundry list — it transfers the prioritisation problem back to you. Asking for exactly three forces a ranking. Claude must decide which rules are architecturally load-bearing for this specific task. That decision is itself informative: if Claude's top three don't match yours, you have a misalignment to correct before execution. The rules Claude selects reveal how it has parsed your task.
Each element of the plan — the surfaced rules, the step ordering, the stated assumptions — tells you something about Claude's understanding of the task.
A plan is distinct from chain-of-thought reasoning. CoT is internal deliberation embedded in the output stream. A plan is a structured, reviewable contract that precedes output. Plans should be numbered, action-oriented, and bounded: "Step 1: Implement ZendeskTicketConnector.fetch() with cursor pagination. Step 2: Implement ZendeskTicketFieldMapper mapping the seven canonical identity fields. Step 3: Write ZendeskTicketFieldMapperTest covering happy path, null email, and missing id." This is legible, reviewable, and correctable in under 30 seconds.
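A minimal sketch of what Step 3 of such a plan could expand into, with plain checks standing in for the project's real test framework and an inline stub for the mapper under test:

```java
import java.util.HashMap;
import java.util.Map;

class MapperTestSketch {
    // Stub for the mapper under test.
    static Map<String, String> map(Map<String, String> raw) {
        if (raw.get("id") == null) throw new IllegalArgumentException("missing id");
        Map<String, String> out = new HashMap<>();
        out.put("id", raw.get("id"));
        out.put("email", raw.get("requester_email")); // explicit null if absent
        return out;
    }

    static void runAll() {
        // Happy path: both fields map through.
        Map<String, String> raw = new HashMap<>();
        raw.put("id", "42");
        raw.put("requester_email", "a@b.com");
        if (!"a@b.com".equals(map(raw).get("email"))) throw new AssertionError();

        // Null email: the canonical key is present with an explicit null.
        raw.remove("requester_email");
        if (!map(raw).containsKey("email")) throw new AssertionError();

        // Missing id: fails loudly, not silently.
        boolean threw = false;
        try { map(new HashMap<>()); } catch (IllegalArgumentException e) { threw = true; }
        if (!threw) throw new AssertionError();
    }
}
```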
Alignment is the execution gate — the formal handoff between the specification phase (Chapters I–VII) and the execution phase. It is not a rubber stamp or a courtesy check. It is the moment at which you and Claude establish a shared mental model of the task, its constraints, its success criteria, and its execution strategy. Everything before it prepares for this moment; everything after it builds on it. Skipping alignment is betting that every prior section was interpreted exactly as intended — and the odds of losing that bet grow with task complexity.
Requiring a maximum 5-step execution plan is a complexity budget, not an arbitrary limit. If a task genuinely requires more than five high-level steps, it should be decomposed into subtasks, each with its own prompt. Accepting a 12-step plan is accepting a task that is too broad to execute reliably in a single context window without accumulating compounding errors. When Claude can't compress the plan to five steps, that is signal to decompose the task, not to accept a longer plan.
Alignment is a bidirectional verification protocol. From your side: does Claude's plan match your intent? Does the step ordering match the logical dependency graph of the task? Are the three surfaced rules actually the most critical ones? From Claude's side: has it correctly parsed the Task, Success Brief, and Rules? Does it have all the context it needs, or are there gaps? A 30-second review of a well-formed plan catches the majority of misalignments before a single line of code is generated.
The eight sections of the prompt anatomy are not independent — they form a mutually reinforcing constraint system. Each section reduces the entropy of the output distribution further. Task narrows the operation space. Context Files loads domain knowledge. Reference calibrates the style distribution. Success Brief defines the acceptance boundary. Rules hard-constrains the solution space. Conversation resolves ambiguity forks. Plan externalises the reasoning path. Alignment confirms the shared model before execution.