Context Engineering
The discipline of building and managing the context that AI agents receive — why what you put in the context window matters more than the model you choose, and how to do it well.
Context engineering is the discipline of deliberately constructing the information you provide to an LLM. The model you choose matters. The prompt you write matters. But what you put in the context window — what information the model can see when it reasons — has more impact on output quality than both of those combined.
This is especially true for agents, where context includes not just user instructions but file contents, tool results, conversation history, and retrieved documents. Managing that context well is the difference between an agent that works and one that produces expensive, incoherent results.
A useful reframe
Think of the context window not as a place to dump everything relevant, but as a workspace you’re preparing for a skilled collaborator. What does this person need to know right now? What would distract them? What should they be reminded of? The answers to those questions are your context engineering decisions.
Two Common Mistakes
The “Lost in the Middle” Problem
LLMs perform significantly worse with information placed in the middle of a long context. The model attends more strongly to information at the beginning and end of its context window.
The fix is structural: place the most important information — the task, key constraints, critical context — at the start and end of your prompt. If you have 20 retrieved documents, put the most relevant ones first and last, not buried in position 10–15.
This isn’t a workaround. It’s how you work with the model rather than against it.
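The reordering can be sketched as a small helper. This is a minimal illustration, not a library API: `order_for_context` and its alternating placement strategy are hypothetical names for the idea of pushing the weakest material toward the middle.

```python
def order_for_context(docs_by_relevance):
    """Arrange documents so the most relevant land at the start and end
    of the context, pushing the weakest toward the middle.

    `docs_by_relevance` is assumed to be sorted best-first.
    """
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        # Alternate placement: best doc first, second-best last,
        # and so on inward, so the middle holds the weakest docs.
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]
```

With five documents ranked 1 (best) to 5, this yields the order 1, 3, 5, 4, 2: the two strongest occupy the positions the model attends to most.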
More Context Is Not Better Context
A common instinct is to add more — more documentation, more examples, more history. This instinct is wrong. 15,000 tokens of precisely relevant context consistently outperforms 100,000 tokens of loosely relevant material. Noise in the context window actively hurts model performance: the model has to work harder to identify what matters, and it will make more mistakes doing so.
The goal is not to fill the context window. The goal is to give the model exactly what it needs to do the task well.
The Four-Layer Context Pipeline
Most teams implement basic RAG retrieval and stop there. The retrieval step gets you raw candidate chunks. The pipeline below is what determines whether those chunks actually help the model:
```
Raw candidate chunks
        ↓
[1] CLUSTERING  → Group semantically similar chunks, eliminate redundancy
        ↓
[2] SELECTION   → Choose the best representative from each cluster
        ↓
[3] RERANKING   → Reorder by actual relevance, not just embedding similarity
        ↓
[4] COMPRESSION → Summarise conversation history, compress verbose tool results
        ↓
Curated context
```
Clustering prevents the model from seeing the same information five times with slight variations — a common outcome when multiple documents cover the same topic.
Selection ensures you keep the most complete and accurate version of each piece of information, not just the one that happened to score highest in vector search.
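Clustering and selection can be combined in one greedy pass. The sketch below uses word-overlap (Jaccard) similarity as a cheap stand-in for embedding cosine similarity; in a real pipeline you would compare embeddings instead. The function names and the 0.6 threshold are illustrative.

```python
def jaccard(a, b):
    """Word-overlap similarity: a cheap stand-in for embedding cosine."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster_and_select(chunks, scores, threshold=0.6):
    """Greedy near-duplicate clustering: keep only the highest-scoring
    representative of each group of similar chunks."""
    # Visit chunks best-first so the kept representative is the strongest.
    ranked = sorted(zip(chunks, scores), key=lambda p: p[1], reverse=True)
    kept = []
    for chunk, score in ranked:
        # Keep a chunk only if it is not a near-duplicate of one already kept.
        if all(jaccard(chunk, k) < threshold for k, _ in kept):
            kept.append((chunk, score))
    return [chunk for chunk, _ in kept]
```

Given two near-identical cache notes and one unrelated chunk, only the higher-scoring cache note and the unrelated chunk survive.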
Reranking uses a cross-encoder (like Cohere Rerank) to score each chunk against the actual query for relevance, which is more accurate than embedding similarity alone. See the RAG page for implementation details.
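The reranking step has a simple shape regardless of which cross-encoder backs it. In the sketch below, `score_fn(query, chunk)` stands in for a real cross-encoder call (such as Cohere Rerank); here it is any callable returning a float, so the structure is the point, not the scorer.

```python
def rerank(query, chunks, score_fn, top_k=5):
    """Re-order candidate chunks by query relevance and keep the top_k.

    `score_fn(query, chunk)` is a stand-in for a real cross-encoder;
    any callable returning a float works for this sketch.
    """
    scored = [(score_fn(query, c), c) for c in chunks]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Even with a toy scorer that counts shared words, the chunk that actually answers the query rises to the top.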
Compression keeps the context from ballooning over a long session. Conversation history from 20 turns ago rarely needs to be preserved verbatim — a compressed summary preserves the essential information at a fraction of the token cost.
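A minimal compression policy keeps the most recent turns verbatim and folds everything older into one summary message. In this sketch, `summarise` stands in for an LLM summarisation call; the message shape and the `keep_recent=6` default are illustrative assumptions.

```python
def compress_history(turns, keep_recent=6, summarise=None):
    """Keep the last `keep_recent` turns verbatim; replace everything
    older with a single summary message.

    `summarise` is a stand-in for an LLM summarisation call: any
    callable mapping a list of turns to a string works here.
    """
    if len(turns) <= keep_recent:
        return turns
    old, recent = turns[:-keep_recent], turns[-keep_recent:]
    summary = (summarise(old) if summarise
               else f"[Summary of {len(old)} earlier turns]")
    # One compact summary message replaces the entire older history.
    return [{"role": "system", "content": summary}] + recent
```

Ten turns with `keep_recent=6` become seven messages: one summary plus the six most recent turns.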
Don't skip post-processing
Most teams implement retrieval (step 0) and skip the pipeline. The pipeline is where quality and cost savings actually come from. Raw retrieval gets you candidate chunks. The pipeline gets you signal.
The Economic Impact
Context engineering has a direct, measurable effect on cost and latency:
| Approach | Tokens sent | Cost per request | Latency |
|---|---|---|---|
| Unoptimised | 100K | $0.30 | ~8s |
| With context engineering | 15K | $0.045 | ~2s |
| Reduction | 85% | 85% | 75% |
At scale — say, 500 agent tasks per day — the difference between these two approaches is roughly $46,000 in annual input token costs, before accounting for the quality improvements that come from cleaner context.
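The arithmetic behind that figure, using the per-request costs from the table above and the stated 500 tasks per day:

```python
# Annual input-token savings at 500 agent tasks per day,
# using the per-request costs from the table above.
cost_unoptimised = 0.30    # $ per request at 100K tokens
cost_engineered = 0.045    # $ per request at 15K tokens
requests_per_year = 500 * 365

annual_savings = (cost_unoptimised - cost_engineered) * requests_per_year
# annual_savings ≈ 46537.5, i.e. roughly $46,000 per year
```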
The point isn’t to minimise tokens for its own sake. The point is that smaller, cleaner context also produces better outputs. Cost and quality improve together when you engineer context well.
Token Budgets
A token budget is a deliberate allocation of the available context window before you start filling it. Without explicit budgets, context tends to expand to fill whatever space is available — which usually means important elements (the task, key constraints) get crowded out by less important ones (verbose tool results, old conversation history).
Example allocation for a 200K token context window:
| Allocation | Tokens | Contents |
|---|---|---|
| System instructions | 5K | System prompt + AGENTS.md content |
| Tool definitions | 15K | Tool schemas and descriptions (or 2K with meta-MCP) |
| Retrieved context | 30K | Code files, docs, search results |
| Conversation history | 20K | Recent turns, compressed summaries of older turns |
| Headroom | 130K | Tool results, model reasoning, response generation |
Enforce budgets programmatically
Token budgets only work if they’re enforced in code, not by convention. A budget that’s aspirational gets violated under load. Build the enforcement into your context assembly logic so that each allocation has a hard cap and a fallback (compress, truncate, or exclude) when it would be exceeded.
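Enforcement can be as simple as a hard cap per section with a fallback applied at assembly time. This is a minimal sketch: `count_tokens` and `shrink` stand in for a real tokenizer and your compress/truncate/exclude strategy, and the function names are hypothetical.

```python
def fit_to_budget(sections, budgets, count_tokens, shrink):
    """Assemble context under per-section hard caps.

    `sections` maps section name -> text; `budgets` maps name -> max
    tokens. `count_tokens(text)` and `shrink(text, max_tokens)` are
    stand-ins for a real tokenizer and a compress/truncate fallback.
    """
    assembled = {}
    for name, text in sections.items():
        cap = budgets[name]
        if count_tokens(text) > cap:
            # Fallback fires automatically: the cap is never aspirational.
            text = shrink(text, cap)
        assembled[name] = text
    return assembled
```

With a word-count tokenizer and truncation as the fallback, an over-budget section is cut to its cap while in-budget sections pass through untouched.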
AGENTS.md: Persistent Context for Agents
AGENTS.md is a markdown file at the root of your project that tells AI agents how to work with your codebase. It’s the most cost-effective context engineering tool available because it replaces instructions you’d otherwise have to repeat in every session.
Projects that adopt AGENTS.md consistently report a 30–50% improvement in first-attempt success rate for agent-generated code. The reason is simple: the agent starts each task with accurate knowledge of your conventions, architecture, and common patterns instead of having to infer them or guess.
Anatomy of an Effective AGENTS.md
```markdown
# Project: [name]
Tech stack: TypeScript 5.3, Node 22, PostgreSQL 16, pnpm

## Quick Start
pnpm install → pnpm dev → pnpm test → pnpm build

## Architecture
/src/api → Routes and controllers
/src/services → Business logic
/src/db → Data access (never raw SQL outside this layer)

## Conventions
- camelCase for variables, PascalCase for types, SCREAMING_SNAKE_CASE for constants
- All functions require JSDoc
- Error handling with Result<T,E> pattern — never bare try/catch in service layer
- Single quotes, no semicolons (enforced by ESLint)

## Negative instructions (critical — these prevent the most common errors)
- NEVER use `any` types in TypeScript
- NEVER modify files in /config/secrets/
- NEVER commit .env files
- NEVER console.log in production — use the logger service

## Common patterns
To add a new API endpoint:
1. Create route handler in src/api/[resource].ts
2. Create service in src/services/[resource].ts
3. Register route in src/api/index.ts
4. Write tests in tests/api/[resource].test.ts
```
Rules That Make AGENTS.md Work
Be specific, not general. “Follow best practices” produces nothing useful. “Use Result<T,E> for error handling, never bare try/catch in service layer” produces correct behaviour.
Always include negative instructions. The most valuable lines in any AGENTS.md are the ones that say what not to do. These directly prevent the most common errors — the ones you’ve already encountered and fixed manually.
Keep it current. An outdated AGENTS.md is worse than none. When a convention changes, update the file immediately. Set a reminder to review it monthly.
Test it. Give an agent a representative task and tell it to rely only on AGENTS.md. If the output has errors you’d expect your AGENTS.md to prevent, the file is missing information.
Complementary Files
For Claude specifically, .claude/CLAUDE.md works alongside AGENTS.md for session-level instructions. For the Aircury Framework, the Playbooks page shows how to combine AGENTS.md with OpenSpec conventions.
Context for Different Task Types
Context engineering decisions aren’t one-size-fits-all. The right context structure depends on what the agent is doing:
| Task type | Context priority | What to compress/exclude |
|---|---|---|
| Code generation | File structure, conventions, patterns from similar files | Unrelated documentation, distant history |
| Bug investigation | Error logs, relevant stack traces, the specific file | Broad architectural docs |
| Code review | The diff, style guide, past review comments | Unrelated PR history |
| Documentation | The code being documented, examples of good docs | Implementation details not reflected in the interface |
| Refactoring | The full module being changed, tests, architecture rules | External dependencies not being touched |
The common thread: give the agent the information it needs for this specific task, and actively exclude information that would add noise without adding value.
Practical Checklist
Before sending context to an agent:
- Most important information at the start or end — not in the middle
- Duplicate or near-duplicate content removed (cluster + select)
- Retrieved chunks reranked by actual relevance, not just similarity
- Conversation history older than the most recent turns compressed into summaries
- AGENTS.md loaded for any code-related task
- Token budget allocated explicitly, not left to fill organically
- Verbose tool results from previous turns summarised or trimmed