Observability

Logs, metrics, and traces as a design constraint — built in from the first spec, not added after something breaks in production.


Observability isn’t monitoring. Monitoring tells you when something is wrong. Observability tells you why. The difference is that observability has to be designed in — it can’t be bolted on after the fact.

When agents generate code, they generate observable code only if the spec and design explicitly require it. Left to their defaults, they tend to produce code that passes its tests but gives no signal when it fails in production.

The silent failure problem

Agent-generated code is optimised for test passage, not production operation. Tests don’t need structured logs. Tests don’t emit metrics. Tests don’t need distributed traces. If observability isn’t in the spec, it won’t be in the code.

The three pillars

Logs — discrete events with context. What happened, when, and what was the state at that moment. Logs are for developers debugging specific incidents.

Metrics — numerical measurements over time. Request rates, error rates, latency percentiles, queue depths. Metrics are for understanding system behaviour at scale.

Traces — the path of a single request across services and layers. Traces are for understanding why a specific operation took as long as it did, or where it failed.

All three are necessary. Metrics tell you that the error rate spiked. Traces tell you which requests failed. Logs tell you what the application state was when they failed.

Observability in design documents

Every design document for a non-trivial feature should explicitly state:

  • What events this feature logs (and at which level)
  • What metrics this feature emits
  • What trace spans this feature creates
  • What an on-call engineer needs to diagnose a failure in this feature

This isn’t over-engineering. It takes one paragraph. The alternative is debugging a production incident with no instruments.

## Observability

### Logs
- INFO: Order created (order_id, customer_id, item_count, total_amount)
- INFO: Payment charged successfully (order_id, payment_id, amount)
- WARN: Payment retry attempt (order_id, attempt_number, reason)
- ERROR: Payment failed after all retries (order_id, final_error, attempts)

### Metrics
- orders.created (counter)
- orders.payment_success (counter)
- orders.payment_failed (counter)
- orders.processing_duration_ms (histogram)

### Traces
- Span: CreateOrder (includes payment charge as child span)
- Span: ProcessPayment (includes Stripe API call as child span)
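The span hierarchy listed above can be sketched with a tiny in-memory tracer. `InMemoryTracer`, `Span`, `startSpan`, and the `StripeApiCall` span name are hypothetical and chosen for illustration; they are not the API of any tracing library, and a real system would more likely put a tracing SDK behind a port.

```typescript
// Hypothetical in-memory tracer: it only illustrates parent/child spans.
interface Span {
  readonly name: string;
  readonly spanId: number;
  readonly parentId: number | null; // null marks a root span
  readonly startedAt: number;
  endedAt: number | null;
}

class InMemoryTracer {
  readonly spans: Span[] = [];
  private nextId = 1;

  // Start a span; pass the parent explicitly to record the hierarchy.
  startSpan(name: string, parent?: Span): Span {
    const span: Span = {
      name,
      spanId: this.nextId++,
      parentId: parent ? parent.spanId : null,
      startedAt: Date.now(),
      endedAt: null,
    };
    this.spans.push(span);
    return span;
  }

  // Mark the span as finished.
  end(span: Span): void {
    span.endedAt = Date.now();
  }
}

// CreateOrder -> ProcessPayment -> external API call, as in the design document.
const tracer = new InMemoryTracer();
const createOrderSpan = tracer.startSpan('CreateOrder');
const processPaymentSpan = tracer.startSpan('ProcessPayment', createOrderSpan);
const stripeCallSpan = tracer.startSpan('StripeApiCall', processPaymentSpan);
tracer.end(stripeCallSpan);
tracer.end(processPaymentSpan);
tracer.end(createOrderSpan);
```

Passing the parent explicitly keeps the sketch free of any context-propagation machinery, which is usually what a tracing library provides.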

Structured logging

Logs are only useful if they’re queryable. Free-text logs are not queryable at scale. Structured logs — JSON or key-value pairs — are.

// ❌ Unstructured — can't filter, can't aggregate
console.log(`Order ${orderId} failed with error: ${error.message}`);

// ✓ Structured — filterable by any field
logger.error('Order payment failed', {
  order_id: orderId.value,
  customer_id: customerId.value,
  error_code: error.code,
  error_message: error.message,
  attempt: attemptNumber,
});
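For context, here is a minimal sketch of what a logger behind such a call might do: emit one JSON object per line so every field can be filtered on. The `LoggerPort` method signatures, `LogFields`, and `JsonLineLogger` are assumptions made for this example; the source names the port but does not define it.

```typescript
// Hypothetical field type: flat key-value pairs only.
type LogFields = Record<string, string | number | boolean | null>;

// Assumed port shape; the real LoggerPort may differ.
interface LoggerPort {
  info(message: string, fields?: LogFields): void;
  warn(message: string, fields?: LogFields): void;
  error(message: string, fields?: LogFields): void;
}

// Writes one JSON object per line to an injectable sink.
class JsonLineLogger implements LoggerPort {
  private readonly write: (line: string) => void;

  constructor(write: (line: string) => void = (line) => console.log(line)) {
    this.write = write;
  }

  info(message: string, fields: LogFields = {}): void {
    this.emit('INFO', message, fields);
  }

  warn(message: string, fields: LogFields = {}): void {
    this.emit('WARN', message, fields);
  }

  error(message: string, fields: LogFields = {}): void {
    this.emit('ERROR', message, fields);
  }

  private emit(level: string, message: string, fields: LogFields): void {
    // Reserved keys come last so caller fields cannot overwrite them.
    this.write(
      JSON.stringify({ ...fields, timestamp: new Date().toISOString(), level, message }),
    );
  }
}

// Capture output in memory instead of printing it.
const capturedLines: string[] = [];
const jsonLogger = new JsonLineLogger((line) => capturedLines.push(line));
jsonLogger.error('Order payment failed', {
  order_id: 'ord_123',
  error_code: 'card_declined',
  attempt: 3,
});
```

Injecting the sink keeps the logger testable and leaves the choice of destination (stdout, a file, a collector) to the caller.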

Log levels

| Level | When to use | Examples |
| --- | --- | --- |
| ERROR | Something failed and requires attention | Payment charge failed, database connection lost |
| WARN | Something unexpected but handled | Payment retry, fallback triggered, rate limit hit |
| INFO | Normal, significant events | Order created, user authenticated, job completed |
| DEBUG | Diagnostic detail for development | Query parameters, response body, timing breakdown |

DEBUG should be off in production by default. INFO and above should always be on.
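One way to enforce that default is a severity threshold checked before anything is emitted. The sketch below assumes numeric severities and an environment name of `production`; `SEVERITY`, `shouldLog`, and `defaultMinLevel` are hypothetical names, not part of any logging library.

```typescript
type Level = 'DEBUG' | 'INFO' | 'WARN' | 'ERROR';

// Numeric severity so levels can be compared; the values are arbitrary but ordered.
const SEVERITY: Record<Level, number> = { DEBUG: 10, INFO: 20, WARN: 30, ERROR: 40 };

// Returns true when an event at `level` should be emitted under `minLevel`.
function shouldLog(level: Level, minLevel: Level): boolean {
  return SEVERITY[level] >= SEVERITY[minLevel];
}

// Assumed convention: production defaults to INFO, every other environment to DEBUG.
function defaultMinLevel(environment: string): Level {
  return environment === 'production' ? 'INFO' : 'DEBUG';
}

const productionLevel = defaultMinLevel('production');
const debugInProduction = shouldLog('DEBUG', productionLevel); // false
const infoInProduction = shouldLog('INFO', productionLevel); // true
```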

Metrics instrumentation

Metrics should be emitted at use case boundaries — not inside domain entities. The application layer knows when a business operation succeeds or fails; the domain layer doesn’t need to know it’s being measured.

class CreateOrderUseCase implements CreateOrderUseCasePort {
  constructor(
    private readonly orderRepository: OrderRepository,
    private readonly paymentGateway: PaymentGateway,
    private readonly metrics: MetricsPort,
    private readonly logger: LoggerPort,
  ) {}

  async execute(command: CreateOrderCommand): Promise<OrderId> {
    const start = Date.now();

    try {
      const order = Order.create(command.customerId, command.items);
      await this.paymentGateway.charge(order.total(), command.paymentMethod);
      await this.orderRepository.save(order);

      this.metrics.increment('orders.created');
      this.metrics.histogram('orders.processing_duration_ms', Date.now() - start);
      this.logger.info('Order created', { order_id: order.id.value });

      return order.id;
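    } catch (error) {
      this.metrics.increment('orders.creation_failed');
      // Log selected fields only: the raw command may carry PII or payment data.
      // Assumes customerId is a value object, as in the other examples.
      this.logger.error('Order creation failed', {
        customer_id: command.customerId.value,
        item_count: command.items.length,
        error_message: error instanceof Error ? error.message : String(error),
      });
      throw error;
    }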
  }
}

Note that MetricsPort and LoggerPort are injected as interfaces — the application layer doesn’t know whether metrics go to Datadog, Prometheus, or a test stub.
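As an illustration of the test-stub side, the sketch below defines a `MetricsPort` shape inferred from the two calls the use case makes (`increment` and `histogram`) and an in-memory implementation. The interface signature, `InMemoryMetrics`, and `recordOrderCreated` are assumptions for this example, not the project's actual definitions.

```typescript
// Assumed port shape, inferred from how the use case calls it.
interface MetricsPort {
  increment(name: string): void;
  histogram(name: string, value: number): void;
}

// In-memory stub for tests: records every call so assertions can inspect it.
class InMemoryMetrics implements MetricsPort {
  readonly counters = new Map<string, number>();
  readonly histograms = new Map<string, number[]>();

  increment(name: string): void {
    this.counters.set(name, (this.counters.get(name) ?? 0) + 1);
  }

  histogram(name: string, value: number): void {
    const values = this.histograms.get(name) ?? [];
    values.push(value);
    this.histograms.set(name, values);
  }
}

// Stand-in for a use case: it depends only on the MetricsPort interface.
function recordOrderCreated(metrics: MetricsPort, durationMs: number): void {
  metrics.increment('orders.created');
  metrics.histogram('orders.processing_duration_ms', durationMs);
}

const stubMetrics = new InMemoryMetrics();
recordOrderCreated(stubMetrics, 42);
recordOrderCreated(stubMetrics, 58);
```

A production adapter would implement the same interface and forward the calls to whichever backend is in use, leaving the use case unchanged.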

Observability rules for AGENTS.md

## Observability Rules

- All use cases must inject LoggerPort and emit INFO on success, ERROR on failure
- Log fields must be structured key-value pairs — no string interpolation in logs
- Use WARN for handled errors (retries, fallbacks), ERROR for unhandled failures
- Never log PII (email addresses, phone numbers, payment data, tokens)
- Metrics must be emitted at use case boundaries, not inside domain entities
- Error logs must include enough context to diagnose without access to the database
- Use trace spans for any operation that crosses a service or makes an external call
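The PII rule is easy to state and easy to break by accident. One possible backstop is a redaction step applied to log fields before they are written. The deny-list below and the names `PII_KEYS` and `redactPii` are hypothetical; a real project would agree on its own field names, and this does not replace keeping PII out of log calls in the first place.

```typescript
// Hypothetical deny-list of field names that must never reach the logs.
const PII_KEYS = new Set(['email', 'phone', 'card_number', 'token', 'password']);

type Fields = Record<string, unknown>;

// Returns a shallow copy of `fields` with deny-listed keys replaced by a marker.
// Nested objects are not inspected, which is one reason to keep log fields flat.
function redactPii(fields: Fields): Fields {
  const safe: Fields = {};
  for (const [key, value] of Object.entries(fields)) {
    safe[key] = PII_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : value;
  }
  return safe;
}

const redactedFields = redactPii({
  order_id: 'ord_123',
  email: 'jane@example.com',
  Token: 'abc',
});
// order_id is kept; email and Token are replaced with '[REDACTED]'.
```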

What goes in every error log

An error log is useful only if the on-call engineer can diagnose the issue without asking the user to reproduce it. Every error log needs:

  • What failed: the operation name
  • Why it failed: the error message and code
  • What the state was: the relevant IDs and context at the time of failure
  • What was attempted: inputs, retry count if applicable

logger.error('Payment charge failed', {
  order_id: order.id.value,
  customer_id: order.customerId.value,
  amount_cents: order.total().inCents(),
  currency: order.total().currency,
  payment_provider: 'stripe',
  error_code: stripeError.code,
  error_message: stripeError.message,
  attempt: attemptNumber,
  correlation_id: traceContext.traceId,
});

This log tells the on-call engineer everything they need to investigate — without touching the database.
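The `attempt` field usually comes from a retry loop. The sketch below shows one way such a loop might log WARN for each handled retry and a single ERROR once retries are exhausted, using the field names from the design document example. `withRetry` and `RetryLogger` are hypothetical, and backoff between attempts is left out to keep the example short.

```typescript
// Minimal logger shape needed by the retry helper (an assumption for this sketch).
interface RetryLogger {
  warn(message: string, fields: Record<string, unknown>): void;
  error(message: string, fields: Record<string, unknown>): void;
}

// Runs `operation` up to `maxAttempts` times.
// Logs WARN for each failure that will be retried, ERROR for the final one.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  logger: RetryLogger,
  context: Record<string, unknown>,
): Promise<T> {
  if (maxAttempts < 1) throw new RangeError('maxAttempts must be at least 1');

  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      const reason = error instanceof Error ? error.message : String(error);
      if (attempt < maxAttempts) {
        // Handled: another attempt follows.
        logger.warn('Payment retry attempt', { ...context, attempt_number: attempt, reason });
      } else {
        // Unhandled: retries are exhausted and the error propagates.
        logger.error('Payment failed after all retries', {
          ...context,
          final_error: reason,
          attempts: attempt,
        });
      }
    }
  }
  throw lastError;
}
```

Passing the IDs in `context` means every WARN and the final ERROR carry the same identifiers, so the whole retry sequence can be found with one query.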