Prompt Engineering
Core prompting patterns — zero-shot, few-shot, chain-of-thought, role prompting, system prompts, and structured output — with examples and when-to-use guidance.
Prompt engineering is the art and science of crafting inputs to LLMs that reliably produce the desired output. A well-designed prompt doesn’t just get a good answer — it gets consistently good answers, reducing the variance that makes LLM-based systems unreliable.
Prompting vs. Fine-tuning
Prompting is the first tool to reach for. It’s fast to iterate, costs nothing to change, and works with any model. Fine-tuning is for when you’ve exhausted what prompting can achieve and still need additional performance gains. See Fine-Tuning for when to make the switch.
The Anatomy of a Prompt
Every effective prompt has these components (not all are always needed):
┌──────────────────────────────────────┐
│ System Prompt                        │
│ Role, context, output format rules   │
├──────────────────────────────────────┤
│ Examples (Few-shot)                  │
│ Input/output demonstrations          │
├──────────────────────────────────────┤
│ User Message                         │
│ The actual task/question             │
└──────────────────────────────────────┘
Zero-Shot Prompting
Give the model a task without any examples. Relies entirely on the model’s training.
Classify the sentiment of this text as Positive, Negative, or Neutral.
Text: "The new update completely broke my workflow. Unacceptable."
Sentiment:
When to use: Simple, well-understood tasks. Classification, summarisation, translation, answering factual questions.
When NOT to use: Complex reasoning, unusual formats, tasks where the model hasn’t been trained on your specific domain.
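A zero-shot prompt is just the task description plus the input. As a minimal sketch (the `zeroShotSentimentPrompt` helper is illustrative, not from any library), the example above can be assembled as a template:

```typescript
// Build a zero-shot sentiment prompt: task description plus the input,
// with no examples -- the model relies entirely on its training.
function zeroShotSentimentPrompt(text: string): string {
  return [
    'Classify the sentiment of this text as Positive, Negative, or Neutral.',
    `Text: "${text}"`,
    'Sentiment:',
  ].join('\n');
}

const prompt = zeroShotSentimentPrompt(
  'The new update completely broke my workflow. Unacceptable.'
);
```

Ending the prompt with `Sentiment:` cues the model to complete with just the label rather than a full sentence.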
Few-Shot Prompting
Provide a few examples of input → output pairs before the actual task. This “primes” the model on the pattern you want.
Classify sentiment:
Text: "This product is absolutely fantastic!" → Positive
Text: "Arrived late and damaged." → Negative
Text: "It works as described." → Neutral
Text: "The new update completely broke my workflow. Unacceptable." →
When to use: When zero-shot gives inconsistent results. When you need a specific output format. Custom classification schemes. Domain-specific tasks.
Number of examples: 3-5 is usually enough. More doesn’t always help and adds cost. Make sure the examples cover edge cases and include negative examples.
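In code, few-shot prompts are usually assembled from a list of labelled examples. A sketch (the `fewShotPrompt` helper and `FewShotExample` type are illustrative):

```typescript
// One labelled example for the few-shot block.
interface FewShotExample {
  input: string;
  output: string;
}

// Render each example as a "Text: ... → Label" line, then append the real
// input with a trailing arrow so the model completes the pattern.
function fewShotPrompt(
  task: string,
  examples: FewShotExample[],
  input: string,
): string {
  const demos = examples.map((e) => `Text: "${e.input}" → ${e.output}`);
  return [task, ...demos, `Text: "${input}" →`].join('\n');
}
```

Keeping the examples as data rather than hard-coded text makes it easy to swap them out when testing which examples work best.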
Chain-of-Thought (CoT) Prompting
For complex reasoning tasks, ask the model to “think step by step” before giving the final answer. This dramatically improves performance on multi-step problems.
A train leaves Station A at 9:00 a.m. travelling at 60 mph.
Another train leaves Station B (120 miles away) at 10:00 a.m. travelling towards A at 80 mph.
At what time do they meet?
Think step by step:
The model will reason through the problem before arriving at the answer, catching errors it would make if jumping straight to a conclusion.
When to use: Maths problems, logical reasoning, multi-step analysis, complex decision-making, debugging.
Note: CoT prompting costs more tokens and adds latency. Only use it when the task genuinely requires multi-step reasoning.
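Because the cost is real, it can help to make CoT an explicit, opt-in wrapper rather than baking it into every prompt. A minimal sketch (the helper name is illustrative):

```typescript
// Append a step-by-step cue to any question. Worth the extra tokens and
// latency only for genuinely multi-step problems.
function withChainOfThought(question: string): string {
  return `${question}\n\nThink step by step:`;
}
```

This keeps the decision of whether to pay the CoT cost at the call site, per task.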
Few-Shot CoT
Combine few-shot with chain-of-thought by showing examples that include the reasoning:
Q: If I have 5 apples and give away 2, then buy 4 more, how many do I have?
A: Let me think step by step. Start with 5. Give away 2: 5 - 2 = 3. Buy 4 more: 3 + 4 = 7. Answer: 7.
Q: A store has 48 items. They sell 25% and receive a shipment of 20. How many items now?
A:
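Because the reasoning precedes the final answer, your code has to extract the answer from the completion. A sketch, assuming the few-shot examples establish an `Answer: N` convention as above:

```typescript
// Pull the final numeric answer out of a chain-of-thought completion that
// follows the "Answer: N" convention set by the few-shot examples.
function extractAnswer(completion: string): number | null {
  const match = completion.match(/Answer:\s*(-?\d+(?:\.\d+)?)/);
  return match ? Number(match[1]) : null;
}
```

Returning `null` on a missing match lets the caller detect completions that broke the convention instead of silently mis-parsing them.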
Role Prompting
Assign a persona or expertise role to the model. This activates relevant knowledge and tone.
You are a senior TypeScript engineer specialising in Domain-Driven Design
and Hexagonal Architecture. You write clean, well-typed code with comprehensive
error handling and always follow SOLID principles.
Review this code and identify any architectural violations:
[code here]
Why it works: Role framing activates patterns from the model’s training data associated with that expert persona — better terminology, more relevant knowledge, appropriate tone.
Best practice: Be specific about the role. “Senior TypeScript engineer specialising in DDD” is far better than “a software engineer.”
System Prompts
The system prompt is a persistent context that frames every interaction. It’s the ideal place for:
- Persona definition (role prompting)
- Hard rules and constraints
- Output format requirements
- Context about the application or domain
const systemPrompt = `
You are Aircury's internal engineering assistant, specialised in our
Framework (OpenSpec Extended with DDD and Hexagonal Architecture).
## Rules
- Always produce TypeScript code following SOLID principles
- Domain classes must never import from infrastructure packages
- All dependencies must be constructor-injected as interfaces
- Commit messages must follow Conventional Commits format
## Context
The codebase uses:
- TypeScript 5.4+
- Hexagonal Architecture (src/domain, src/application, src/infrastructure)
- PostgreSQL with a custom ORM wrapper
- Jest for testing, Cucumber for BDD
`;
System Prompt Persistence
In API usage, the system prompt is sent with every request. Keep it focused and concise — verbose system prompts waste tokens and can dilute key instructions. Put task-specific context in the user message, not the system prompt.
Structured Output
Force the model to output in a specific parseable format. Critical for building reliable pipelines.
// Force JSON output via prompt
const prompt = `
Analyse this code review and extract issues in JSON format:
\`\`\`typescript
${code}
\`\`\`
Respond with ONLY valid JSON (no markdown, no explanation):
{
  "issues": [
    {
      "severity": "error" | "warning" | "info",
      "principle": "SRP" | "OCP" | "LSP" | "ISP" | "DIP" | "other",
      "description": "string",
      "line": number | null,
      "suggestion": "string"
    }
  ],
  "summary": "string"
}
`;
Better: Use OpenAI’s JSON mode or function calling / tool use for reliable structured output:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' }, // Guaranteed valid JSON
  messages: [{ role: 'user', content: prompt }],
});
Or Anthropic’s tool use (forces specific schema):
const response = await anthropic.messages.create({
  model: 'claude-opus-4-5',
  max_tokens: 1024, // required by the Messages API
  tools: [{
    name: 'extract_code_review',
    description: 'Extract code review findings',
    input_schema: {
      type: 'object',
      properties: {
        issues: { type: 'array', ... },
        summary: { type: 'string' }
      }
    }
  }],
  tool_choice: { type: 'tool', name: 'extract_code_review' },
  messages: [{ role: 'user', content: prompt }],
});
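Even with JSON mode, it is worth validating the parsed result before it enters the pipeline. A minimal runtime check as a sketch (the `ReviewResult` types mirror the schema above; a validation library such as Zod would do this more thoroughly):

```typescript
interface ReviewIssue {
  severity: 'error' | 'warning' | 'info';
  principle: string;
  description: string;
  line: number | null;
  suggestion: string;
}

interface ReviewResult {
  issues: ReviewIssue[];
  summary: string;
}

// Parse the model's JSON output and verify the top-level shape
// before trusting it downstream.
function parseReview(raw: string): ReviewResult {
  const data = JSON.parse(raw);
  if (!Array.isArray(data.issues) || typeof data.summary !== 'string') {
    throw new Error('Model output does not match the expected schema');
  }
  return data as ReviewResult;
}
```

Failing loudly here is deliberate: a schema mismatch caught at the boundary is far easier to debug than a malformed object propagating through the pipeline.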
Prompt Engineering Best Practices
| Practice | Details |
|---|---|
| Be specific | “Write a TypeScript function” < “Write a TypeScript function that takes a UserId, queries PostgreSQL via the UserRepository interface, and throws a UserNotFoundError if the user doesn’t exist.” |
| Put context first | State what the model needs to know before what it needs to do. Context → Task → Format. |
| Use delimiters | Separate content from instructions: code blocks, XML-like <document> tags, or --- separators. |
| Specify the output format | “Respond with only valid JSON.” “Write a bulleted list.” “Provide a single TypeScript function.” |
| Avoid negatives | “Do not use var” → “Use const and let only.” Negatives are less reliable. |
| Test edge cases | Your prompt works on one example. Does it work on 100? On adversarial inputs? |
| Version your prompts | Treat prompts like code — track changes, use variables for dynamic content, test changes. |
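The delimiter practice from the table can be as simple as wrapping untrusted content in XML-like tags so instructions and data stay clearly separated. A sketch (the tag name is arbitrary):

```typescript
// Wrap user-supplied content in XML-like tags so the model can tell
// instructions apart from the data they operate on.
function withDelimiters(
  instruction: string,
  content: string,
  tag: string = 'document',
): string {
  return `${instruction}\n\n<${tag}>\n${content}\n</${tag}>`;
}
```

Beyond clarity, delimiters also reduce the chance that text inside the content is mistaken for an instruction.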
Prompt Optimisation Loop
Write prompt → Test on 10+ examples → Find failure cases →
Diagnose cause → Adjust prompt → Repeat
This is uncomfortable for engineers who expect binary pass/fail. Prompt optimisation is empirical — you’re tuning a fuzzy system. Use Evaluation to make this process rigorous.
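One pass of the loop can be automated: run the prompt over a labelled test set and collect the failures to diagnose. A sketch, where `callModel` stands in for any LLM client (all names here are illustrative):

```typescript
// A labelled test case for prompt evaluation.
interface TestCase {
  input: string;
  expected: string;
}

// Run the prompt over every case and return the pass count plus the
// failing cases, which are the raw material for the next diagnosis step.
async function evaluatePrompt(
  buildPrompt: (input: string) => string,
  callModel: (prompt: string) => Promise<string>,
  cases: TestCase[],
): Promise<{ passed: number; failures: TestCase[] }> {
  const failures: TestCase[] = [];
  for (const c of cases) {
    const output = await callModel(buildPrompt(c.input));
    if (output.trim() !== c.expected) failures.push(c);
  }
  return { passed: cases.length - failures.length, failures };
}
```

Exact-match comparison is the simplest scoring rule; fuzzier tasks need a fuzzier check, which is where a proper evaluation harness comes in.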