Context Engineering
The discipline of building and managing the context that AI agents receive — why what you put in the context window matters more than the model you choose, and how to do it well.
Context engineering is the discipline of deliberately constructing the information you provide to an LLM. The model you choose matters. The prompt you write matters. But what you put in the context window — what information the model can see when it reasons — has more impact on output quality than both of those combined.
This is especially true for agents, where context includes not just user instructions but file contents, tool results, conversation history, and retrieved documents. Managing that context well is the difference between an agent that works and one that produces expensive, incoherent results.
A useful reframe
Think of the context window not as a place to dump everything relevant, but as a workspace you’re preparing for a skilled collaborator. What does this person need to know right now? What would distract them? What should they be reminded of? The answers to those questions are your context engineering decisions.
Two Common Mistakes
The “Lost in the Middle” Problem
LLMs perform significantly worse with information placed in the middle of a long context. The model attends more strongly to information at the beginning and end of its context window.
The fix is structural: place the most important information — the task, key constraints, critical context — at the start and end of your prompt. If you have 20 retrieved documents, put the most relevant ones first and last, not buried in position 10–15.
This isn’t a workaround. It’s how you work with the model rather than against it.
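The reordering can be sketched as a small helper. This is a minimal illustration, not a library API: `order_for_context` and its alternating placement strategy are hypothetical names for the idea of pushing the weakest material toward the middle.

```python
def order_for_context(docs_by_relevance):
    """Arrange documents so the most relevant land at the start and end
    of the context, pushing the weakest toward the middle.

    `docs_by_relevance` is assumed to be sorted best-first.
    """
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate placement: best doc first, second-best last,
        # and so on inward, so the middle holds the weakest docs.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]
```

With five documents ranked 1 (best) to 5, this yields the order 1, 3, 5, 4, 2: the two strongest occupy the positions the model attends to most.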
More Context Is Not Better Context
A common instinct is to add more — more documentation, more examples, more history. This instinct is wrong. 15,000 tokens of precisely relevant context consistently outperforms 100,000 tokens of loosely relevant material. Noise in the context window actively hurts model performance: the model has to work harder to identify what matters, and it will make more mistakes doing so.
The goal is not to fill the context window. The goal is to give the model exactly what it needs to do the task well.
The Four-Layer Context Pipeline
Most teams implement basic RAG retrieval and stop there. The retrieval step gets you raw candidate chunks. The pipeline below is what determines whether those chunks actually help the model:
```
Raw candidate chunks
        ↓
[1] CLUSTERING  → Group semantically similar chunks, eliminate redundancy
        ↓
[2] SELECTION   → Choose the best representative from each cluster
        ↓
[3] RERANKING   → Reorder by actual relevance, not just embedding similarity
        ↓
[4] COMPRESSION → Summarise conversation history, compress verbose tool results
        ↓
Curated context
```
Clustering prevents the model from seeing the same information five times with slight variations — a common outcome when multiple documents cover the same topic.
Selection ensures you keep the most complete and accurate version of each piece of information, not just the one that happened to score highest in vector search.
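Clustering and selection can be combined in one greedy pass. The sketch below uses word-overlap (Jaccard) similarity as a cheap stand-in for embedding cosine similarity; in a real pipeline you would compare embeddings instead. The function names and the 0.6 threshold are illustrative.

```python
def jaccard(a, b):
    """Word-overlap similarity: a cheap stand-in for embedding cosine."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster_and_select(chunks, scores, threshold=0.6):
    """Greedy near-duplicate clustering: keep only the highest-scoring
    representative of each group of similar chunks."""
    # Visit chunks best-first so the kept representative is the strongest.
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    kept = []
    for chunk, score in ranked:
        # Keep a chunk only if it is not a near-duplicate of one already kept.
        if all(jaccard(chunk, k) < threshold for k, _ in kept):
            kept.append((chunk, score))
    return [chunk for chunk, _ in kept]
```

Given two near-identical cache notes and one unrelated chunk, only the higher-scoring cache note and the unrelated chunk survive.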
Reranking uses a cross-encoder (like Cohere Rerank) to score each chunk against the actual query for relevance, which is more accurate than embedding similarity alone. See the RAG page for implementation details.
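The reranking step has a simple shape regardless of which cross-encoder backs it. In the sketch below, `score_fn(query, chunk)` stands in for a real cross-encoder call (such as Cohere Rerank); here it is any callable returning a float, so the structure is the point, not the scorer.

```python
def rerank(query, chunks, score_fn, top_k=5):
    """Re-order candidate chunks by query relevance and keep the top_k.

    `score_fn(query, chunk)` is a stand-in for a real cross-encoder;
    any callable returning a float works for this sketch.
    """
    scored = [(score_fn(query, c), c) for c in chunks]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Even with a toy scorer that counts shared words, the chunk that actually answers the query rises to the top.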
Compression keeps the context from ballooning over a long session. Conversation history from 20 turns ago rarely needs to be preserved verbatim — a compressed summary preserves the essential information at a fraction of the token cost.
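A minimal compression policy keeps the most recent turns verbatim and folds everything older into one summary message. In this sketch, `summarise` stands in for an LLM summarisation call; the message shape and the `keep_recent=6` default are illustrative assumptions.

```python
def compress_history(turns, keep_recent=6, summarise=None):
    """Keep the last `keep_recent` turns verbatim; replace everything
    older with a single summary message.

    `summarise` is a stand-in for an LLM summarisation call: any
    callable mapping a list of turns to a string works here.
    """
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = (summarise(old) if summarise
               else f"[Summary of {len(old)} earlier turns]")
    # One compact summary message replaces the entire older history.
    return [{"role": "system", "content": summary}] + recent
```

Ten turns with `keep_recent=6` become seven messages: one summary plus the six most recent turns.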
Don't skip post-processing
Most teams implement retrieval (step 0) and skip the pipeline. The pipeline is where quality and cost savings actually come from. Raw retrieval gets you candidate chunks. The pipeline gets you signal.
The Economic Impact
Context engineering has a direct, measurable effect on cost and latency:
| Approach | Tokens sent | Cost per request | Latency |
|---|---|---|---|
| Unoptimised | 100K | $0.30 | ~8s |
| With context engineering | 15K | $0.045 | ~2s |
| Reduction | 85% | 85% | 75% |
At scale — say, 500 agent tasks per day — the difference between these two approaches is roughly $46,000 in annual input token costs, before accounting for the quality improvements that come from cleaner context.
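The arithmetic behind that figure, using the per-request costs from the table above and the stated 500 tasks per day:

```python
# Annual input-token savings at 500 agent tasks per day,
# using the per-request costs from the table above.
cost_unoptimised = 0.30    # $ per request at 100K tokens
cost_engineered = 0.045    # $ per request at 15K tokens
requests_per_year = 500 * 365

annual_savings = (cost_unoptimised - cost_engineered) * requests_per_year
# annual_savings ≈ 46537.5, i.e. roughly $46,000 per year
```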
The point isn’t to minimise tokens for its own sake. The point is that smaller, cleaner context also produces better outputs. Cost and quality improve together when you engineer context well.
Token Budgets
A token budget is a deliberate allocation of the available context window before you start filling it. Without explicit budgets, context tends to expand to fill whatever space is available — which usually means important elements (the task, key constraints) get crowded out by less important ones (verbose tool results, old conversation history).
Example allocation for a 200K token context window:
| Allocation | Tokens | Contents |
|---|---|---|
| System instructions | 5K | System prompt + AGENTS.md content |
| Tool definitions | 15K | Tool schemas and descriptions (or 2K with meta-MCP) |
| Retrieved context | 30K | Code files, docs, search results |
| Conversation history | 20K | Recent turns, compressed summaries of older turns |
| Headroom | 130K | Tool results, model reasoning, response generation |
Enforce budgets programmatically
Token budgets only work if they’re enforced in code, not by convention. A budget that’s aspirational gets violated under load. Build the enforcement into your context assembly logic so that each allocation has a hard cap and a fallback (compress, truncate, or exclude) when it would be exceeded.
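Enforcement can be as simple as a hard cap per section with a fallback applied at assembly time. This is a minimal sketch: `count_tokens` and `shrink` stand in for a real tokenizer and your compress/truncate/exclude strategy, and the function names are hypothetical.

```python
def fit_to_budget(sections, budgets, count_tokens, shrink):
    """Assemble context under per-section hard caps.

    `sections` maps section name -> text; `budgets` maps name -> max
    tokens. `count_tokens(text)` and `shrink(text, max_tokens)` are
    stand-ins for a real tokenizer and a compress/truncate fallback.
    """
    assembled = {}
    for name, text in sections.items():
        cap = budgets[name]
        if count_tokens(text) > cap:
            # Fallback fires automatically: the cap is never aspirational.
            text = shrink(text, cap)
        assembled[name] = text
    return assembled
```

With a word-count tokenizer and truncation as the fallback, an over-budget section is cut to its cap while in-budget sections pass through untouched.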
AGENTS.md: Persistent Context for Agents
AGENTS.md is a markdown file at the root of your project that tells AI agents how to work with your codebase. It’s the most cost-effective context engineering tool available because it replaces instructions you’d otherwise have to repeat in every session.
Projects that adopt AGENTS.md consistently report a 30–50% improvement in first-attempt success rate for agent-generated code. The reason is simple: the agent starts each task with accurate knowledge of your conventions, architecture, and common patterns instead of having to infer them or guess.
Anatomy of an Effective AGENTS.md
```markdown
# Project: [name]
Tech stack: TypeScript 5.3, Node 22, PostgreSQL 16, pnpm

## Quick Start
pnpm install → pnpm dev → pnpm test → pnpm build

## Architecture
/src/api → Routes and controllers
/src/services → Business logic
/src/db → Data access (never raw SQL outside this layer)

## Conventions
- camelCase for variables, PascalCase for types, SCREAMING_SNAKE_CASE for constants
- All functions require JSDoc
- Error handling with Result<T,E> pattern — never bare try/catch in service layer
- Single quotes, no semicolons (enforced by ESLint)

## Negative instructions (critical — these prevent the most common errors)
- NEVER use `any` types in TypeScript
- NEVER modify files in /config/secrets/
- NEVER commit .env files
- NEVER console.log in production — use the logger service

## Common patterns
To add a new API endpoint:
1. Create route handler in src/api/[resource].ts
2. Create service in src/services/[resource].ts
3. Register route in src/api/index.ts
4. Write tests in tests/api/[resource].test.ts
```
Rules That Make AGENTS.md Work
Be specific, not general. “Follow best practices” produces nothing useful. “Use Result<T,E> for error handling, never bare try/catch in service layer” produces correct behaviour.
Always include negative instructions. The most valuable lines in any AGENTS.md are the ones that say what not to do. These directly prevent the most common errors — the ones you’ve already encountered and fixed manually.
Keep it current. An outdated AGENTS.md is worse than none. When a convention changes, update the file immediately. Set a reminder to review it monthly.
Test it. Give an agent a representative task and tell it to rely only on AGENTS.md. If the output has errors you’d expect your AGENTS.md to prevent, the file is missing information.
Complementary Files
For Claude specifically, .claude/CLAUDE.md works alongside AGENTS.md for session-level instructions. For the Aircury Framework, the Playbooks page shows how to combine AGENTS.md with OpenSpec conventions.
Context for Different Task Types
Context engineering decisions aren’t one-size-fits-all. The right context structure depends on what the agent is doing:
| Task type | Context priority | What to compress/exclude |
|---|---|---|
| Code generation | File structure, conventions, patterns from similar files | Unrelated documentation, distant history |
| Bug investigation | Error logs, relevant stack traces, the specific file | Broad architectural docs |
| Code review | The diff, style guide, past review comments | Unrelated PR history |
| Documentation | The code being documented, examples of good docs | Implementation details not reflected in the interface |
| Refactoring | The full module being changed, tests, architecture rules | External dependencies not being touched |
The common thread: give the agent the information it needs for this specific task, and actively exclude information that would add noise without adding value.
Practical Checklist
Before sending context to an agent:
- Most important information at the start or end — not in the middle
- Duplicate or near-duplicate content removed (cluster + select)
- Retrieved chunks reranked by actual relevance, not just similarity
- Conversation history older than the most recent turns compressed into summaries
- AGENTS.md loaded for any code-related task
- Token budget allocated explicitly, not left to fill organically
- Verbose tool results from previous turns summarised or trimmed