Testing Strategy

Testing is the validation layer of Aircury’s Framework. A well-designed test suite doesn’t just catch bugs: it encodes the behaviour of your system in an executable, machine-verifiable form. Combined with OpenSpec specifications and architecture rules, a comprehensive test suite gives you a codebase that can be refactored, replaced, or rebuilt with confidence.

The Testing Pyramid

           ▲
          /B\        Behavioural (BDD/Gherkin)
         /B B\       Few, slow, high-value
        /─────\
       / I   I \     Integration Tests
      / I  I  I \    Medium number, moderate speed
     /───────────\
    / U  U  U  U \   Unit Tests
   / U  U  U  U  U\  Many, fast, precise
  /─────────────────\

The pyramid describes the ratio and role of each layer — not just a count:

Layer	What it tests	Speed	Volume
Unit	Pure logic, isolated functions, domain objects	Very fast	Many (~70%)
Integration	Component boundaries, DB, APIs, adapters	Moderate	Medium (~20%)
Behavioural	System behaviour from the outside (user perspective)	Slower	Few (~10%)

The Ratio is a Guide, Not a Law

The 70/20/10 split is a starting point. Complex domain logic may justify more unit tests. Systems with complex integration points need more integration tests. Let the architecture guide the ratio, not the other way around.

Unit Tests

Unit tests validate individual functions, classes, or domain objects in isolation. The key word is isolation — no databases, no HTTP, no filesystem. Everything external is mocked or stubbed.

What to unit test:

Domain entities and value objects (business logic)
Use cases / application services (orchestration logic)
Pure utility functions
Complex algorithms and transformations

// Testing an Order domain entity
describe('Order', () => {
  describe('confirm()', () => {
    it('transitions from PENDING to CONFIRMED', () => {
      const order = Order.create({ customerId: 'cust-1', items: [mockItem] });
      expect(order.status).toBe(OrderStatus.PENDING);

      order.confirm();

      expect(order.status).toBe(OrderStatus.CONFIRMED);
    });

    it('throws when confirming an already-confirmed order', () => {
      const order = Order.create({ customerId: 'cust-1', items: [mockItem] });
      order.confirm();

      expect(() => order.confirm()).toThrow(OrderAlreadyProcessedError);
    });
  });
});
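
Use cases can be unit tested in the same isolated way by swapping the repository for an in-memory stub. The following is a minimal sketch rather than the framework's actual code: `ConfirmOrder`, `OrderRepository`, and `InMemoryOrderRepository` are hypothetical names, the `Order` type is heavily simplified, and the port is synchronous only to keep the example short.

```typescript
// Hypothetical, simplified types: the framework's real Order model is richer.
type OrderStatus = 'PENDING' | 'CONFIRMED';

interface Order {
  id: string;
  status: OrderStatus;
}

// Port the use case depends on. Synchronous here for brevity;
// a real repository port would normally return Promises.
interface OrderRepository {
  findById(id: string): Order | null;
  save(order: Order): void;
}

// Use case under test: orchestration only, no infrastructure.
class ConfirmOrder {
  private readonly orders: OrderRepository;

  constructor(orders: OrderRepository) {
    this.orders = orders;
  }

  execute(orderId: string): Order {
    const order = this.orders.findById(orderId);
    if (order === null) {
      throw new Error(`Order ${orderId} not found`);
    }
    if (order.status !== 'PENDING') {
      throw new Error(`Order ${orderId} has already been processed`);
    }
    const confirmed: Order = { ...order, status: 'CONFIRMED' };
    this.orders.save(confirmed);
    return confirmed;
  }
}

// Stub: a Map-backed repository, so the test never touches a database.
class InMemoryOrderRepository implements OrderRepository {
  private readonly rows = new Map<string, Order>();

  findById(id: string): Order | null {
    const found = this.rows.get(id);
    return found ? { ...found } : null;
  }

  save(order: Order): void {
    this.rows.set(order.id, { ...order });
  }
}
```

A test then builds `new ConfirmOrder(new InMemoryOrderRepository())`, seeds one pending order, and asserts on the returned status, with no database, HTTP, or filesystem involved.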

Unit test rules:

Each test verifies ONE behaviour, using only the assertions needed to express it
Tests are independent — no shared mutable state
If a test requires 20+ lines of setup, the code under test probably has too many responsibilities
Test names describe behaviour: it('throws when confirming an already-confirmed order')
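
One common way to keep tests independent and setup short is a small test-data factory. The sketch below is illustrative only: `OrderProps`, `OrderItem`, and `makeOrderProps` are hypothetical names, and the real input shape of `Order.create` is not specified in this document.

```typescript
// Hypothetical input shape for Order.create; the real one is not specified here.
interface OrderItem {
  sku: string;
  quantity: number;
}

interface OrderProps {
  customerId: string;
  items: OrderItem[];
}

// Returns a brand-new, valid props object on every call, so no test can
// observe another test's mutations. Overrides keep per-test setup to one line.
function makeOrderProps(overrides: Partial<OrderProps> = {}): OrderProps {
  return {
    customerId: 'cust-1',
    items: [{ sku: 'sku-1', quantity: 1 }],
    ...overrides,
  };
}

// Each test states only what matters to the behaviour it checks.
const defaults = makeOrderProps();
const otherCustomer = makeOrderProps({ customerId: 'cust-2' });
```

Because the factory builds fresh objects on each call, mutating `defaults.items` in one test has no effect on the props any other test receives.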

Integration Tests

Integration tests validate that components work correctly when connected — typically testing repositories against a real (or in-memory) database, or verifying that adapters translate correctly between domain and infrastructure.

// Testing a repository against a real database (using test containers or SQLite)
describe('PostgresOrderRepository', () => {
  let repository: PostgresOrderRepository;
  let db: TestDatabase;

  beforeAll(async () => {
    db = await TestDatabase.start();
    repository = new PostgresOrderRepository(db.connection);
  });

  afterAll(() => db.stop());
  afterEach(() => db.reset());

  it('persists and retrieves an order by ID', async () => {
    const order = Order.create({ customerId: 'cust-1', items: [mockItem] });
    await repository.save(order);

    const retrieved = await repository.findById(order.id);

    expect(retrieved).toEqual(order);
  });

  it('returns null for a non-existent order', async () => {
    const result = await repository.findById(new OrderId('non-existent'));
    expect(result).toBeNull();
  });
});

Integration test rules:

Use real infrastructure (test database, not mocks) for repository tests
Reset state between tests — tests must be order-independent
Test the full adapter contract, not implementation details
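
The translation half of an adapter contract can also be pinned down with a round-trip check on the mapping functions. This sketch complements, rather than replaces, the repository test against a real database. The `OrderRow` schema, the simplified `Order` shape, and the `toDomain`, `toRow`, and `parseStatus` functions are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real schema and domain model are not given here.
interface OrderRow {
  id: string;
  customer_id: string;
  status: string;
  created_at: string; // ISO-8601 timestamp as stored in the database
}

interface Order {
  id: string;
  customerId: string;
  status: 'PENDING' | 'CONFIRMED';
  createdAt: Date;
}

// Rejects status values the domain does not know about.
function parseStatus(raw: string): Order['status'] {
  if (raw === 'PENDING' || raw === 'CONFIRMED') {
    return raw;
  }
  throw new Error(`Unknown order status: ${raw}`);
}

// Infrastructure -> domain.
function toDomain(row: OrderRow): Order {
  return {
    id: row.id,
    customerId: row.customer_id,
    status: parseStatus(row.status),
    createdAt: new Date(row.created_at),
  };
}

// Domain -> infrastructure.
function toRow(order: Order): OrderRow {
  return {
    id: order.id,
    customer_id: order.customerId,
    status: order.status,
    created_at: order.createdAt.toISOString(),
  };
}
```

A contract-level test asserts that `toRow(toDomain(row))` reproduces the original row and that an unknown status is rejected, without caring how the mapping is written internally.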

Behavioural Tests (BDD/Gherkin)

Behavioural tests validate the system from the outside — from the perspective of a user or external caller. They use the Gherkin syntax (Given/When/Then) and are written as .feature files.

These tests are the bridge between OpenSpec specifications and executable validation. A well-written OpenSpec scenario maps directly to a Gherkin scenario.

OpenSpec → Gherkin

<!-- OpenSpec spec.md -->
### Requirement: User can reset password
The system SHALL send a reset email within 60 seconds.

#### Scenario: Successful password reset request
- **WHEN** user submits a valid email address on the reset form
- **THEN** system queues a password reset email to that address
- **AND** returns HTTP 200 with `{ message: "Email sent if account exists" }`

# password-reset.feature
Feature: Password Reset

  Scenario: Successful password reset request
    When the user submits a valid email address on the reset form
    Then the system queues a password reset email to that address
    And returns HTTP 200 with "Email sent if account exists"

The spec is human alignment. The .feature file is machine validation. They describe the same thing.

Step Definitions

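// password-reset.steps.ts
// cucumber-js has no `And` export; an `And` line in a feature file
// matches whichever step definition has the same step text.
import { When, Then } from '@cucumber/cucumber';
// Assumption: assertions come from Jest's standalone `expect` package.
import { expect } from 'expect';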

When('the user submits a valid email address on the reset form', async function() {
  this.response = await this.api.post('/auth/reset-password', {
    email: 'test@example.com'
  });
});

Then('the system queues a password reset email to that address', async function() {
  const queued = await this.emailQueue.find({ to: 'test@example.com' });
  expect(queued).toBeDefined();
});

Then('returns HTTP 200 with {string}', function(message: string) {
  expect(this.response.status).toBe(200);
  expect(this.response.body.message).toContain(message);
});

Gherkin is for Behaviour, Not Implementation

BDD tests describe what the system does from the outside — not how it does it. A good BDD test survives a complete internal refactor. If your steps are testing implementation details (specific SQL queries, internal class methods), you’re testing the wrong layer.

The Safety Net Concept

This is the most important idea in Aircury’s testing philosophy: a comprehensive test suite is an implementation-independent contract.

                   What we know about the system
                   ┌─────────────────────────────┐
  OpenSpec specs   │  Behaviour contracts         │
  BDD tests        │  Executable validation       │
  Unit tests       │  Logic contracts             │
  Integration tests│  Boundary contracts          │
                   └─────────────────────────────┘
                              ↓
                   The implementation is just the
                   CURRENT fulfilment of those contracts

When everything is tested at every layer:

Refactoring becomes safe — tests catch regressions
Replacing AI-generated code becomes safe — tests validate the replacement
Rebuilding from scratch becomes possible — the contracts define the target

This is the “rebuildable codebase” concept from Methodology: with good specs and tests, you’re not locked into your current implementation. You’re locked into your contracts.
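
The idea of an implementation-independent contract can be sketched in a few lines: one set of behavioural cases, run unchanged against two different implementations. Everything here is hypothetical, including the business rule (10% off orders of ten or more items) and the names `PriceCalculator`, `bulkDiscountContract`, and `verifyContract`.

```typescript
// Hypothetical business rule for illustration: orders of 10+ items get 10% off.
type PriceCalculator = (subtotalCents: number, itemCount: number) => number;

interface ContractCase {
  name: string;
  subtotalCents: number;
  itemCount: number;
  expectedCents: number;
}

// The contract: observable behaviour, with no reference to how it is computed.
const bulkDiscountContract: ContractCase[] = [
  { name: 'single item pays full price', subtotalCents: 1000, itemCount: 1, expectedCents: 1000 },
  { name: 'nine items pay full price', subtotalCents: 1000, itemCount: 9, expectedCents: 1000 },
  { name: 'ten items get 10% off', subtotalCents: 1000, itemCount: 10, expectedCents: 900 },
  { name: 'discount applies to larger orders', subtotalCents: 2550, itemCount: 12, expectedCents: 2295 },
];

// Runs every case against an implementation and returns the names of failures.
function verifyContract(impl: PriceCalculator, cases: ContractCase[]): string[] {
  return cases
    .filter((c) => impl(c.subtotalCents, c.itemCount) !== c.expectedCents)
    .map((c) => c.name);
}

// Current implementation.
const originalCalculator: PriceCalculator = (subtotalCents, itemCount) => {
  if (itemCount >= 10) {
    return Math.round(subtotalCents * 0.9);
  }
  return subtotalCents;
};

// A from-scratch rewrite with different internals.
const rewrittenCalculator: PriceCalculator = (subtotalCents, itemCount) => {
  const discountPercent = itemCount >= 10 ? 10 : 0;
  return Math.round((subtotalCents * (100 - discountPercent)) / 100);
};

const originalFailures = verifyContract(originalCalculator, bulkDiscountContract);
const rewriteFailures = verifyContract(rewrittenCalculator, bulkDiscountContract);
```

Both failure lists come back empty, which is what makes the rewrite a safe replacement: the contract, not the code, decides whether an implementation is acceptable.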

Writing Testable Code

Testability is not an afterthought — it’s a design constraint, and one that produces better design as a side effect.

The testability rule: if a unit of code is hard to test in isolation, it’s revealing a design problem.

Hard to test symptom	Likely design problem	Fix
Must mock 5+ things	Too many responsibilities	Split the class
Constructor instantiates dependencies	Missing dependency injection	Inject dependencies as interfaces
Test requires specific database state	Logic coupled to infrastructure	Move logic to pure domain
Feature tests are the only way to test it	Logic buried in controller	Extract to use case
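
The "inject dependencies as interfaces" fix from the table might look like the following sketch. `Clock`, `SystemClock`, `FixedClock`, and `TrialPolicy` are hypothetical names chosen for illustration. The point is that the class no longer calls `new Date()` itself, so a test can control time.

```typescript
// Hypothetical port: the only thing TrialPolicy needs from the outside world.
interface Clock {
  now(): Date;
}

// Production implementation, wired in at the application's composition root.
class SystemClock implements Clock {
  now(): Date {
    return new Date();
  }
}

// Test implementation: always returns the same instant.
class FixedClock implements Clock {
  private readonly instant: Date;

  constructor(instant: Date) {
    this.instant = instant;
  }

  now(): Date {
    return this.instant;
  }
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// The dependency arrives through the constructor instead of being created inside.
class TrialPolicy {
  private readonly clock: Clock;

  constructor(clock: Clock) {
    this.clock = clock;
  }

  isExpired(trialStartedAt: Date, trialDays: number): boolean {
    const elapsedMs = this.clock.now().getTime() - trialStartedAt.getTime();
    return elapsedMs > trialDays * MS_PER_DAY;
  }
}
```

With a `FixedClock`, the expiry rule becomes a deterministic unit test; with `SystemClock`, production behaviour is unchanged.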

TDD with AI

An effective AI-assisted workflow: write the failing BDD scenario first, then ask AI to implement until it passes. The test defines the contract; the AI fulfils it. This is Test-Driven Development at the scenario level, and it produces AI output that is focused and verifiable.

Test Coverage as a Quality Signal

Coverage is a signal, not a goal. Chasing 100% coverage leads to testing implementation details. Instead, aim for:

All happy paths tested at the BDD layer (the system does what it’s supposed to)
All business rules tested at the unit layer (edge cases, error paths, invariants)
All integration points tested at the integration layer (data in → data out)
Zero untested branches: if there’s an if, there’s a test for both outcomes
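
As a small illustration of covering both outcomes of a branch, the sketch below checks each side of a single `if`, plus the boundary where the result flips. The shipping rule, its constants, and the `expectEqual` helper are assumptions made for this example; in a real suite the checks would be ordinary `it(...)` cases.

```typescript
// Hypothetical business rule, used only to illustrate branch coverage.
const FREE_SHIPPING_THRESHOLD_CENTS = 5000;
const STANDARD_SHIPPING_CENTS = 499;

function shippingCostCents(subtotalCents: number): number {
  if (subtotalCents >= FREE_SHIPPING_THRESHOLD_CENTS) {
    return 0;
  }
  return STANDARD_SHIPPING_CENTS;
}

// Tiny assertion helper so the sketch runs without a test framework.
function expectEqual(actual: number, expected: number, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${expected}, got ${actual}`);
  }
}

// One check per branch, plus the boundary where the branch flips.
expectEqual(shippingCostCents(4999), 499, 'below threshold pays standard shipping');
expectEqual(shippingCostCents(5000), 0, 'at threshold ships free');
expectEqual(shippingCostCents(12000), 0, 'above threshold ships free');
```

The boundary case is the one most likely to catch an off-by-one change such as `>=` becoming `>`.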