Testing Strategy
Aircury's full testing pyramid — Unit, Integration, and Behavioural (BDD/Gherkin) tests — and how they create a safety net that makes the codebase rebuildable.
Testing is the validation layer of Aircury’s Framework. A well-designed test suite doesn’t just catch bugs: it encodes the behaviour of your system in an executable, machine-verifiable form. Combined with OpenSpec specifications and architecture rules, a comprehensive test suite turns your codebase into a system that can be refactored, replaced, or rebuilt with confidence.
The Testing Pyramid
         ▲
        /B\           Behavioural (BDD/Gherkin)
       /B B\          Few, slow, high-value
      /─────\
     / I   I \        Integration Tests
    / I  I  I \       Medium number, moderate speed
   /───────────\
  / U  U   U  U \     Unit Tests
 / U  U  U  U  U \    Many, fast, precise
/─────────────────\
The pyramid describes the ratio and role of each layer — not just a count:
| Layer | What it tests | Speed | Volume |
|---|---|---|---|
| Unit | Pure logic, isolated functions, domain objects | Very fast | Many (~70%) |
| Integration | Component boundaries, DB, APIs, adapters | Moderate | Medium (~20%) |
| Behavioural | System behaviour from the outside (user perspective) | Slower | Few (~10%) |
The Ratio is a Guide, Not a Law
The 70/20/10 split is a starting point. Complex domain logic may justify more unit tests. Systems with complex integration points need more integration tests. Let the architecture guide the ratio, not the other way around.
Unit Tests
Unit tests validate individual functions, classes, or domain objects in isolation. The key word is isolation — no databases, no HTTP, no filesystem. Everything external is mocked or stubbed.
What to unit test:
- Domain entities and value objects (business logic)
- Use cases / application services (orchestration logic)
- Pure utility functions
- Complex algorithms and transformations
// Testing an Order domain entity
describe('Order', () => {
  describe('confirm()', () => {
    it('transitions from PENDING to CONFIRMED', () => {
      const order = Order.create({ customerId: 'cust-1', items: [mockItem] });
      expect(order.status).toBe(OrderStatus.PENDING);
      order.confirm();
      expect(order.status).toBe(OrderStatus.CONFIRMED);
    });

    it('throws when confirming an already-confirmed order', () => {
      const order = Order.create({ ... });
      order.confirm();
      expect(() => order.confirm()).toThrow(OrderAlreadyProcessedError);
    });
  });
});
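The same isolation applies to use cases. As a minimal sketch, assuming a hypothetical `ConfirmOrderUseCase` and an `OrderStore` port (neither name comes from the Framework), a hand-written in-memory stub can stand in for the database so the orchestration logic is exercised without any I/O. The sketch is synchronous and framework-free to keep it short; a real port would likely be async and the checks would live in `it(...)` blocks.

```typescript
// All names here are illustrative assumptions, not Framework APIs.
type Status = "PENDING" | "CONFIRMED";

interface StoredOrder {
  id: string;
  status: Status;
}

// Port the use case depends on; a real adapter would talk to a database.
interface OrderStore {
  findById(id: string): StoredOrder | undefined;
  save(order: StoredOrder): void;
}

// Hand-written stub: a Map in memory, no I/O. It copies on read and write
// so callers cannot mutate stored state by accident.
class InMemoryOrderStore implements OrderStore {
  private readonly orders = new Map<string, StoredOrder>();

  findById(id: string): StoredOrder | undefined {
    const found = this.orders.get(id);
    return found ? { ...found } : undefined;
  }

  save(order: StoredOrder): void {
    this.orders.set(order.id, { ...order });
  }
}

// Use case under test: pure orchestration over the port.
class ConfirmOrderUseCase {
  private readonly store: OrderStore;

  constructor(store: OrderStore) {
    this.store = store;
  }

  execute(orderId: string): StoredOrder {
    const order = this.store.findById(orderId);
    if (!order) {
      throw new Error(`Order ${orderId} not found`);
    }
    const confirmed: StoredOrder = { ...order, status: "CONFIRMED" };
    this.store.save(confirmed);
    return confirmed;
  }
}

// Usage: arrange the stub, then act through the use case.
const store = new InMemoryOrderStore();
store.save({ id: "order-1", status: "PENDING" });
const confirmedOrder = new ConfirmOrderUseCase(store).execute("order-1");
```

Because the use case only sees the `OrderStore` interface, the stub and a database-backed adapter are interchangeable from its point of view.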
Unit test rules:
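- Each test verifies ONE behaviour; keep to one assertion where possible, and add a second only to pin down that same behaviour (such as the starting state)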
- Tests are independent — no shared mutable state
- If a test requires 20+ lines of setup, the code under test has too many responsibilities
- Test names describe behaviour: `it('throws when confirming an already-confirmed order')`
Integration Tests
Integration tests validate that components work correctly when connected — typically testing repositories against a real (or in-memory) database, or verifying that adapters translate correctly between domain and infrastructure.
// Testing a repository against a real database (using test containers or SQLite)
describe('PostgresOrderRepository', () => {
  let repository: PostgresOrderRepository;
  let db: TestDatabase;

  beforeAll(async () => {
    db = await TestDatabase.start();
    repository = new PostgresOrderRepository(db.connection);
  });
  afterAll(() => db.stop());
  afterEach(() => db.reset());

  it('persists and retrieves an order by ID', async () => {
    const order = Order.create({ customerId: 'cust-1', items: [mockItem] });
    await repository.save(order);
    const retrieved = await repository.findById(order.id);
    expect(retrieved).toEqual(order);
  });

  it('returns null for a non-existent order', async () => {
    const result = await repository.findById(new OrderId('non-existent'));
    expect(result).toBeNull();
  });
});
Integration test rules:
- Use real infrastructure (test database, not mocks) for repository tests
- Reset state between tests — tests must be order-independent
- Test the full adapter contract, not implementation details
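One way to keep a test focused on the adapter contract rather than on implementation details is to write the cases once against the port and run them against every adapter. The sketch below is an illustration under stated assumptions: `OrderRepositoryPort`, `MapOrderRepository`, and `runRepositoryContract` are hypothetical names, the port is synchronous for brevity, and the in-memory adapter stands in for a database-backed one.

```typescript
// Hypothetical record and port, simplified from the repository test above.
interface OrderRecord {
  id: string;
  customerId: string;
}

interface OrderRepositoryPort {
  save(order: OrderRecord): void;
  findById(id: string): OrderRecord | null;
  reset(): void; // test-only hook that clears state between cases
}

// One adapter for the port; a database-backed adapter would be another.
class MapOrderRepository implements OrderRepositoryPort {
  private readonly rows = new Map<string, OrderRecord>();

  save(order: OrderRecord): void {
    this.rows.set(order.id, { ...order });
  }

  findById(id: string): OrderRecord | null {
    const row = this.rows.get(id);
    return row ? { ...row } : null;
  }

  reset(): void {
    this.rows.clear();
  }
}

// Runs every contract case against a freshly reset repository and
// returns the names of the cases that failed.
function runRepositoryContract(repository: OrderRepositoryPort): string[] {
  const cases: Array<[string, () => boolean]> = [
    [
      "persists and retrieves an order by ID",
      () => {
        repository.save({ id: "order-1", customerId: "cust-1" });
        const found = repository.findById("order-1");
        return found !== null && found.customerId === "cust-1";
      },
    ],
    [
      "returns null for a non-existent order",
      () => repository.findById("order-1") === null,
    ],
  ];

  const failed: string[] = [];
  for (const [caseName, run] of cases) {
    repository.reset();
    if (!run()) {
      failed.push(caseName);
    }
  }
  return failed;
}

// Usage: the list is empty when the adapter satisfies the contract.
const contractFailures = runRepositoryContract(new MapOrderRepository());
```

The second case deliberately looks up the ID saved by the first one, so it only passes if state really is reset between cases.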
Behavioural Tests (BDD/Gherkin)
Behavioural tests validate the system from the outside — from the perspective of a user or external caller. They use the Gherkin syntax (Given/When/Then) and are written as .feature files.
These tests are the bridge between OpenSpec specifications and executable validation. A well-written OpenSpec scenario maps directly to a Gherkin scenario.
OpenSpec → Gherkin
<!-- OpenSpec spec.md -->
### Requirement: User can reset password
The system SHALL send a reset email within 60 seconds.
#### Scenario: Successful password reset request
- **WHEN** user submits a valid email address on the reset form
- **THEN** system queues a password reset email to that address
- **AND** returns HTTP 200 with `{ message: "Email sent if account exists" }`
# password-reset.feature
Feature: Password Reset
  Scenario: Successful password reset request
    When the user submits a valid email address on the reset form
    Then the system queues a password reset email to that address
    And returns HTTP 200 with "Email sent if account exists"
The spec is human alignment. The .feature file is machine validation. They describe the same thing.
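Because the two formats line up keyword for keyword, a first draft of the steps can be derived mechanically. The helper below is a hypothetical sketch, not part of OpenSpec or Cucumber: `toGherkinSteps` is an invented name, and it assumes scenario bullets always take the `- **KEYWORD** text` shape shown above (with `GIVEN` assumed alongside the `WHEN`, `THEN`, and `AND` that appear in the example).

```typescript
// Hypothetical helper: turns OpenSpec scenario bullets into Gherkin step lines.
const GHERKIN_KEYWORDS: Record<string, string> = {
  GIVEN: "Given",
  WHEN: "When",
  THEN: "Then",
  AND: "And",
};

function toGherkinSteps(specLines: string[]): string[] {
  const steps: string[] = [];
  for (const line of specLines) {
    // Matches bullets such as "- **WHEN** user submits the form".
    const match = /^-\s+\*\*(GIVEN|WHEN|THEN|AND)\*\*\s+(.+)$/.exec(line.trim());
    if (match) {
      steps.push(`${GHERKIN_KEYWORDS[match[1]]} ${match[2]}`);
    }
    // Lines that are not scenario bullets (headings, prose) are skipped.
  }
  return steps;
}

// Usage with two bullets from the spec above.
const draftSteps = toGherkinSteps([
  "- **WHEN** user submits a valid email address on the reset form",
  "- **THEN** system queues a password reset email to that address",
]);
// draftSteps[0] === "When user submits a valid email address on the reset form"
```

The output is only a draft: the wording still needs a human pass (the feature file above says "the user" and drops the JSON wrapper), which is where the alignment between spec and test is actually checked.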
Step Definitions
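// password-reset.steps.ts
// Cucumber.js exports Given/When/Then but no `And`; an "And" step in the
// feature file matches whichever definition has the same text.
import { When, Then } from '@cucumber/cucumber';
// Assumption: Jest-style matchers come from the standalone `expect` package.
import { expect } from 'expect';

When('the user submits a valid email address on the reset form', async function () {
  this.response = await this.api.post('/auth/reset-password', {
    email: 'test@example.com'
  });
});

Then('the system queues a password reset email to that address', async function () {
  const queued = await this.emailQueue.find({ to: 'test@example.com' });
  expect(queued).toBeDefined();
});

// Matches the "And returns HTTP 200 with ..." step in the feature file.
Then('returns HTTP 200 with {string}', function (message: string) {
  expect(this.response.status).toBe(200);
  expect(this.response.body.message).toContain(message);
});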
Gherkin is for Behaviour, Not Implementation
BDD tests describe what the system does from the outside — not how it does it. A good BDD test survives a complete internal refactor. If your steps are testing implementation details (specific SQL queries, internal class methods), you’re testing the wrong layer.
The Safety Net Concept
This is the most important idea in Aircury’s testing philosophy: a comprehensive test suite is an implementation-independent contract.
                   What we know about the system
                 ┌─────────────────────────────┐
OpenSpec specs   │ Behaviour contracts         │
BDD tests        │ Executable validation       │
Unit tests       │ Logic contracts             │
Integration tests│ Boundary contracts          │
                 └─────────────────────────────┘
                                ↓
                   The implementation is just the
                   CURRENT fulfilment of those contracts
When everything is tested at every layer:
- Refactoring becomes safe — tests catch regressions
- Replacing AI-generated code becomes safe — tests validate the replacement
- Rebuilding from scratch becomes possible — the contracts define the target
This is the “rebuildable codebase” concept from Methodology: with good specs and tests, you’re not locked into your current implementation. You’re locked into your contracts.
Writing Testable Code
Testability is not an afterthought — it’s a design constraint, and one that produces better design as a side effect.
The testability rule: if a unit of code is hard to test in isolation, it’s revealing a design problem.
| Hard-to-test symptom | Likely design problem | Fix |
|---|---|---|
| Must mock 5+ things | Too many responsibilities | Split the class |
| Constructor instantiates dependencies | Missing dependency injection | Inject dependencies as interfaces |
| Test requires specific database state | Logic coupled to infrastructure | Move logic to pure domain |
| Feature tests are the only way to test it | Logic buried in controller | Extract to use case |
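The "inject dependencies as interfaces" fix can be sketched with a time-dependent rule. Everything named here (`Clock`, `SystemClock`, `FixedClock`, `InvoiceService`) is a hypothetical example rather than Framework code. If the service called `new Date()` itself, its result would change with the wall clock; taking a `Clock` through the constructor lets a test pin the current time.

```typescript
// Hypothetical example names; none of these come from the Framework.
interface Clock {
  now(): Date;
}

// Production implementation: reads the real system time.
class SystemClock implements Clock {
  now(): Date {
    return new Date();
  }
}

// Test double: always reports the instant it was given.
class FixedClock implements Clock {
  private readonly instant: Date;

  constructor(instant: Date) {
    this.instant = instant;
  }

  now(): Date {
    return this.instant;
  }
}

class InvoiceService {
  private readonly clock: Clock;

  // The dependency is injected, not instantiated inside the class.
  constructor(clock: Clock) {
    this.clock = clock;
  }

  isOverdue(dueDate: Date): boolean {
    return this.clock.now().getTime() > dueDate.getTime();
  }
}

// Usage in a test: the outcome no longer depends on when the test runs.
const invoiceService = new InvoiceService(
  new FixedClock(new Date("2024-01-10T00:00:00Z")),
);
const overdue = invoiceService.isOverdue(new Date("2024-01-01T00:00:00Z")); // true
const notYetDue = invoiceService.isOverdue(new Date("2024-02-01T00:00:00Z")); // false
```

Production wiring would pass `new SystemClock()` instead; the service itself does not change.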
TDD with AI
An effective AI-assisted workflow: write the failing BDD scenario first, then ask the AI to iterate on the implementation until the scenario passes. The test defines the contract; the AI fulfils it. This is Test-Driven Development at the scenario level, and it keeps AI output focused and verifiable.
Test Coverage as a Quality Signal
Coverage is a signal, not a goal. Chasing 100% coverage leads to testing implementation details. Instead, aim for:
- All happy paths tested at the BDD layer (the system does what it’s supposed to)
- All business rules tested at the unit layer (edge cases, error paths, invariants)
- All integration points tested at the integration layer (data in → data out)
- Zero “protected” branches: if there’s an `if`, there’s a test for both branches
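As a small illustration of the last point, the sketch below uses a made-up business rule (free shipping from a threshold of 50, otherwise a flat fee; both numbers are assumptions) and checks each side of its single `if`, plus the boundary value where the branch flips.

```typescript
// Hypothetical business rule: orders of 50 or more ship free.
const FREE_SHIPPING_THRESHOLD = 50;
const FLAT_SHIPPING_FEE = 4.99;

function shippingCost(orderTotal: number): number {
  if (orderTotal >= FREE_SHIPPING_THRESHOLD) {
    return 0;
  }
  return FLAT_SHIPPING_FEE;
}

// One check per branch, plus the boundary where the branch flips.
const branchChecks: Array<[string, boolean]> = [
  ["free at the threshold", shippingCost(50) === 0],
  ["free above the threshold", shippingCost(120) === 0],
  ["flat fee just below the threshold", shippingCost(49.99) === FLAT_SHIPPING_FEE],
];

// Labels of any checks that did not hold; empty when both branches behave.
const failedChecks = branchChecks
  .filter(([, passed]) => !passed)
  .map(([label]) => label);
```

In a real suite each entry would be its own unit test; the point is only that neither side of the condition is left unexercised.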