Testing Strategies

Testing microservices is fundamentally harder than testing a monolith. Think of it like testing a single factory versus testing a supply chain — in a monolith, you can verify the whole assembly line in one place. With microservices, you need to verify that dozens of independent factories coordinate correctly, handle delayed shipments, and recover when one factory goes offline. This chapter covers comprehensive testing strategies that give you confidence without drowning in slow, brittle tests.
Learning Objectives:
  • Implement unit testing with mocks
  • Write integration tests for services
  • Use contract testing with Pact
  • Design end-to-end test strategies
  • Set up test environments

Testing Pyramid for Microservices

Before writing any test code, we need a mental model for where to invest testing effort. The testing pyramid is a resource allocation framework disguised as a diagram. Tests near the bottom (unit tests) are cheap to write, fast to run, and catch localized bugs. Tests near the top (E2E) are expensive and slow; they catch integration bugs, but they are also flaky and break for environmental reasons that have nothing to do with your code.
If you invert the pyramid and invest heavily in E2E tests, your CI pipeline becomes a coin flip: developers learn to ignore red builds because “the tests are flaky,” and real bugs slip through. The pyramid shape reflects a practical trade-off between coverage, speed, and signal quality. In microservices, we add two middle layers, contract tests and component tests, because service boundaries are the highest-risk part of the system and deserve dedicated test types.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    MICROSERVICES TESTING PYRAMID                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                              ▲                                               │
│                             ╱ ╲                                              │
│                            ╱   ╲                                             │
│                           ╱ E2E ╲        Few, slow, expensive               │
│                          ╱───────╲       Test critical user journeys         │
│                         ╱         ╲                                          │
│                        ╱ Component ╲     Service in isolation                │
│                       ╱─────────────╲    with real DB, mocked deps          │
│                      ╱               ╲                                       │
│                     ╱    Contract     ╲  Verify service compatibility       │
│                    ╱───────────────────╲ Consumer-driven contracts          │
│                   ╱                     ╲                                    │
│                  ╱     Integration       ╲  Test with real dependencies     │
│                 ╱─────────────────────────╲ Database, message queue         │
│                ╱                           ╲                                 │
│               ╱           Unit              ╲  Fast, isolated               │
│              ╱───────────────────────────────╲ Many, quick, cheap           │
│                                                                              │
│                                                                              │
│  ┌─────────────┬────────────┬────────────┬───────────────────────────────┐  │
│  │   Level     │   Count    │   Speed    │   What to Test                │  │
│  ├─────────────┼────────────┼────────────┼───────────────────────────────┤  │
│  │ Unit        │ Thousands  │ < 1ms      │ Business logic, utilities     │  │
│  │ Integration │ Hundreds   │ < 1s       │ DB queries, external calls    │  │
│  │ Contract    │ Tens       │ < 5s       │ API compatibility             │  │
│  │ Component   │ Tens       │ < 30s      │ Service endpoints             │  │
│  │ E2E         │ Few        │ < 5min     │ Critical paths                │  │
│  └─────────────┴────────────┴────────────┴───────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Testing Strategy Decision Framework

Choosing the right test type for a given scenario is where most teams get it wrong. They either over-invest in E2E tests (slow, flaky, expensive) or under-invest in contract tests (the single most impactful test type for microservices that almost nobody writes). Use this decision framework.
“What am I verifying?”
| Question | Best Test Type | Why |
| --- | --- | --- |
| Does my business logic produce correct results? | Unit test | Fast, isolated, easy to debug |
| Does my SQL/ORM actually work against a real DB? | Integration test | Catches schema mismatches, query bugs |
| Can Service A still talk to Service B after B’s latest deploy? | Contract test | Catches breaking API changes before production |
| Does my service handle the full request lifecycle correctly? | Component test | Catches wiring issues, middleware bugs |
| Does the critical user journey (checkout, signup) still work? | E2E test | Last line of defense for revenue-critical paths |
“How many should I write?”
| Test Type | Typical Count Per Service | Run Time Budget | CI Frequency |
| --- | --- | --- | --- |
| Unit | 200-500+ | less than 30 seconds total | Every commit |
| Integration | 30-100 | less than 2 minutes | Every commit |
| Contract | 10-30 per consumer | less than 1 minute | Every commit |
| Component | 10-20 | less than 5 minutes | Every PR |
| E2E | 5-10 (entire system) | less than 15 minutes | Pre-deploy only |
Edge case: when contract tests are not enough. Contract tests verify the shape of responses, not the semantics. If Service B returns {"status": "COMPLETED"} but the meaning of “COMPLETED” has subtly changed (say, it now means “payment authorized” instead of “payment captured”), contract tests pass while your system is broken. For these semantic edge cases, you need integration tests that validate actual behavior, not just schema compliance.
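The gap between shape and semantics can be sketched in a few lines of plain Node.js. Everything here is a hypothetical stand-in (matchesShape, the two provider functions, and the ledger object are not Pact APIs); the sketch only shows that a shape check passes for both provider versions while a behavioral check tells them apart.

```javascript
// Hypothetical shape check: every expected field must exist with the expected type.
// This stands in for what a contract verifier checks; it is not Pact's API.
function matchesShape(expected, actual) {
  return Object.entries(expected).every(
    ([field, type]) => typeof actual[field] === type
  );
}

const paymentContract = { status: 'string', amount: 'number' };

// Version 1 of an imaginary provider: COMPLETED means the money was captured.
function providerV1(ledger, amount) {
  ledger.captured += amount;
  return { status: 'COMPLETED', amount };
}

// Version 2: identical response, but COMPLETED now only means "authorized".
function providerV2(ledger, amount) {
  ledger.authorized += amount;
  return { status: 'COMPLETED', amount };
}

// Runs one provider version and reports the shape check and the behavioral check.
function check(provider) {
  const ledger = { authorized: 0, captured: 0 };
  const response = provider(ledger, 100);
  return {
    shapeOk: matchesShape(paymentContract, response),
    behaviorOk: ledger.captured === 100
  };
}

const v1Result = check(providerV1);
const v2Result = check(providerV2);
console.log(v1Result, v2Result);
```

Both versions satisfy the shape check; only the behavioral assertion notices that version 2 no longer captures the payment.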

Unit Testing

Testing Business Logic

The key insight with unit testing in microservices: you test your service’s logic in isolation by mocking its dependencies. This is no different from unit testing in a monolith, but the boundaries are cleaner because service APIs are explicit contracts rather than internal method calls.
Why mock dependencies? Because the alternative (spinning up real databases and network calls for every test) turns a 10-millisecond unit test into a 500-millisecond integration test. Across 3,000 tests run serially, that is the difference between 30 seconds and 25 minutes of CI time.
The trade-off: mocks can drift from reality. You might mock inventoryClient.checkAvailability to return { available: true }, but in production the real API returns a different field name. That is why we complement unit tests with contract tests (which verify the mock matches reality) and integration tests (which exercise real dependencies). Think of unit tests as verifying “my logic is correct assuming my dependencies behave” and other test layers as verifying “my dependencies actually behave the way I assumed.”
The dependency injection pattern below is what makes the code testable. If OrderService instantiated its own repository and clients internally, we would have no way to substitute mocks; we would be stuck running the real code against real infrastructure. Constructor injection externalizes these dependencies and flips control: the test decides what OrderService talks to. This applies equally to Node.js and Python; the syntax differs, but the discipline is identical.
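// services/OrderService.js
// Assumption: the error classes live in ../errors (the unit tests import them from there).
const {
  ValidationError,
  InsufficientInventoryError,
  InvalidDiscountError
} = require('../errors');
// Assumption: generateId is not defined in the original; aliasing Node's
// crypto.randomUUID is one way to make the example runnable.
const { randomUUID: generateId } = require('crypto');

// Notice the constructor injection pattern -- this is what makes the class testable.
// Each dependency is an interface we can replace with a mock in tests.
class OrderService {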
  constructor(orderRepository, pricingService, inventoryClient) {
    this.orderRepository = orderRepository;
    this.pricingService = pricingService;
    this.inventoryClient = inventoryClient;
  }

  async createOrder(customerId, items) {
    // Validate items
    if (!items || items.length === 0) {
      throw new ValidationError('Order must have at least one item');
    }

    if (items.length > 50) {
      throw new ValidationError('Order cannot have more than 50 items');
    }

    // Check inventory
    const availability = await this.inventoryClient.checkAvailability(items);
    const unavailable = availability.filter(item => !item.available);
    
    if (unavailable.length > 0) {
      throw new InsufficientInventoryError(unavailable);
    }

    // Calculate pricing
    const pricing = await this.pricingService.calculate(items, customerId);

    // Create order
    const order = {
      id: generateId(),
      customerId,
      items,
      subtotal: pricing.subtotal,
      tax: pricing.tax,
      total: pricing.total,
      status: 'PENDING',
      createdAt: new Date()
    };

    await this.orderRepository.save(order);
    
    return order;
  }

  calculateDiscount(order, discountCode) {
    const discounts = {
      'SAVE10': { type: 'percentage', value: 10 },
      'FLAT50': { type: 'fixed', value: 50 },
      'FREESHIP': { type: 'shipping', value: 0 }
    };

    const discount = discounts[discountCode];
    if (!discount) {
      throw new InvalidDiscountError(discountCode);
    }

    switch (discount.type) {
      case 'percentage':
        return order.subtotal * (discount.value / 100);
      case 'fixed':
        return Math.min(discount.value, order.subtotal);
      case 'shipping':
        return order.shippingCost;
      default:
        return 0;
    }
  }
}

// Named export, matching the require() in the unit tests.
module.exports = { OrderService };

Unit Tests with Jest and pytest

Notice how each mock returns sensible defaults in beforeEach. The pitfall to avoid is mocks that are tightly coupled to specific test scenarios: a single change in the service code then breaks dozens of tests. Default to the “happy path” in setup, then override per test.
Why is the happy-path-in-setup pattern so important? Because tests should read like specifications: “Given a valid order, when I call createOrder, then it returns a pending order.” The setup answers “given a valid X” once, at the top, and each individual test focuses only on what makes it different. If every test has to re-configure every mock, you end up with test files where the setup dwarfs the actual assertions, and changing a dependency signature requires touching hundreds of lines. The same principle applies whether you use Jest’s beforeEach or pytest’s fixtures. Fixtures compose better (pytest can inject only the fixtures a test needs), but the core idea of “shared happy-path setup, per-test overrides” is the same.
One more nuance: verify behavior, not implementation. Tests that assert “mock was called with these exact arguments” are brittle and tend to fail whenever you refactor. Prefer assertions on observable outcomes (what the function returned, what state changed). Only verify interactions when the interaction itself is the behavior you care about (e.g., “did we charge the customer?”).
Caveats & Common Pitfalls: Test Doubles That Do Not Match Production
  1. Hand-written mocks that lie. You hand-code mockInventoryClient.checkAvailability to return { available: true }. In production the real API returns { in_stock: true }. Every unit test passes; every production request 500s. Symptom: 100% green CI, broken in staging.
  2. Mocks returning sync values for async APIs. Your mock returns a plain object; the real client returns a Promise. Code that forgets to await still passes its tests, because it reads fields straight off the plain object. Against the real client, the same code reads fields off a pending Promise, gets undefined, and the bug surfaces at runtime.
  3. Mocks that never fail. Every mock call succeeds, so error paths are never exercised. Your service has beautiful unit tests and zero resilience; the first network blip in prod takes it down.
  4. Mocks that ignore time. Tests call Date.now() without freezing it, so tests pass at 10am and fail at 11:59pm when a midnight boundary hits.
Solutions & Patterns
  • Generate mocks from the real API schema. If the producer publishes an OpenAPI spec, generate TypeScript types and mock factories from it. A schema change then breaks compilation, so a mock cannot silently drift from the spec.
  • Complement unit tests with contract tests. The contract test verifies the mock matches production. If checkAvailability actually returns in_stock, the contract test fails immediately.
  • Test error paths explicitly. For every happy-path test, write at least one failure case: timeout, 500, malformed response, invalid schema.
  • Freeze time with jest.useFakeTimers() plus jest.setSystemTime() (Jest) or freezegun (Python). Any test that touches time must pin it; relying on wall-clock time is how you get flaky nightly builds.
  • Snapshot real responses. Record a real production response once (in staging), save it as a fixture, and use it as the baseline mock. Refresh when the schema changes.
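Pitfall 2 above is easy to reproduce. The following is a minimal sketch in plain Node.js, with no test framework; isAvailableBuggy, syncMock, and asyncClient are hypothetical names invented for the illustration. The function is deliberately missing an await, which a synchronous mock hides and an async client exposes.

```javascript
// Buggy on purpose: the call is not awaited.
function isAvailableBuggy(client, productId) {
  const result = client.checkAvailability(productId);
  return result.available === true;
}

// A hand-written mock that returns a plain object synchronously.
const syncMock = {
  checkAvailability: () => ({ available: true })
};

// A stand-in for the real client: same data, but wrapped in a Promise.
const asyncClient = {
  checkAvailability: async () => ({ available: true })
};

// With the sync mock the bug is invisible: the field is read off the object.
const withSyncMock = isAvailableBuggy(syncMock, 'p1');

// With the async client the field is read off a Promise, which has no
// `available` property, so the comparison is false.
const withAsyncClient = isAvailableBuggy(asyncClient, 'p1');

console.log({ withSyncMock, withAsyncClient });
```

In Jest, building mocks with mockResolvedValue (as the test file below does) keeps the mock asynchronous, so a missing await fails in the unit test instead of in production.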
// tests/unit/OrderService.test.js
const { OrderService } = require('../../services/OrderService');
const { ValidationError, InsufficientInventoryError } = require('../../errors');

describe('OrderService', () => {
  let orderService;
  let mockOrderRepository;
  let mockPricingService;
  let mockInventoryClient;

  beforeEach(() => {
    // Create mocks
    mockOrderRepository = {
      save: jest.fn().mockResolvedValue(undefined),
      findById: jest.fn()
    };

    mockPricingService = {
      calculate: jest.fn().mockResolvedValue({
        subtotal: 100,
        tax: 10,
        total: 110
      })
    };

    mockInventoryClient = {
      checkAvailability: jest.fn().mockResolvedValue([
        { productId: 'p1', available: true },
        { productId: 'p2', available: true }
      ])
    };

    orderService = new OrderService(
      mockOrderRepository,
      mockPricingService,
      mockInventoryClient
    );
  });

  describe('createOrder', () => {
    const customerId = 'customer-123';
    const validItems = [
      { productId: 'p1', quantity: 2 },
      { productId: 'p2', quantity: 1 }
    ];

    it('should create order with valid items', async () => {
      const order = await orderService.createOrder(customerId, validItems);

      expect(order).toMatchObject({
        customerId,
        items: validItems,
        subtotal: 100,
        tax: 10,
        total: 110,
        status: 'PENDING'
      });
      expect(order.id).toBeDefined();
      expect(mockOrderRepository.save).toHaveBeenCalledWith(
        expect.objectContaining({ customerId })
      );
    });

    it('should throw ValidationError for empty items', async () => {
      await expect(
        orderService.createOrder(customerId, [])
      ).rejects.toThrow(ValidationError);

      await expect(
        orderService.createOrder(customerId, [])
      ).rejects.toThrow('Order must have at least one item');
    });

    it('should throw ValidationError for too many items', async () => {
      const tooManyItems = Array(51).fill({ productId: 'p1', quantity: 1 });

      await expect(
        orderService.createOrder(customerId, tooManyItems)
      ).rejects.toThrow(ValidationError);
    });

    it('should throw InsufficientInventoryError when items unavailable', async () => {
      mockInventoryClient.checkAvailability.mockResolvedValue([
        { productId: 'p1', available: true },
        { productId: 'p2', available: false }
      ]);

      await expect(
        orderService.createOrder(customerId, validItems)
      ).rejects.toThrow(InsufficientInventoryError);
    });

    it('should check inventory before calculating pricing', async () => {
      await orderService.createOrder(customerId, validItems);

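      // toHaveBeenCalledBefore comes from jest-extended, not core Jest, so compare
      // the call order that Jest records on every mock instead.
      const [inventoryCall] = mockInventoryClient.checkAvailability.mock.invocationCallOrder;
      const [pricingCall] = mockPricingService.calculate.mock.invocationCallOrder;
      expect(inventoryCall).toBeLessThan(pricingCall);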
    });
  });

  describe('calculateDiscount', () => {
    const order = { subtotal: 200, shippingCost: 15 };

    it('should apply percentage discount correctly', () => {
      const discount = orderService.calculateDiscount(order, 'SAVE10');
      expect(discount).toBe(20); // 10% of 200
    });

    it('should apply fixed discount correctly', () => {
      const discount = orderService.calculateDiscount(order, 'FLAT50');
      expect(discount).toBe(50);
    });

    it('should not exceed subtotal for fixed discount', () => {
      const smallOrder = { subtotal: 30 };
      const discount = orderService.calculateDiscount(smallOrder, 'FLAT50');
      expect(discount).toBe(30);
    });

    it('should apply shipping discount correctly', () => {
      const discount = orderService.calculateDiscount(order, 'FREESHIP');
      expect(discount).toBe(15);
    });

    it('should throw for invalid discount code', () => {
      expect(() => {
        orderService.calculateDiscount(order, 'INVALID');
      }).toThrow('Invalid discount code');
    });
  });
});

Integration Testing

Database Integration Tests

Unit tests with mocked repositories can be 100% green while your SQL is completely broken. The mock does not know your column is customer_id and you typed customerid. The mock does not know your foreign key constraint will reject orphaned rows. The mock does not know your JSONB column needs a ::jsonb cast. Every one of these bugs only surfaces when code meets a real database. That is the purpose of integration tests: to verify that your persistence layer actually works against the real storage engine, with real schemas, real constraints, and real SQL dialects.
The architectural insight is that integration tests are where your “database per service” principle gets tested. In a microservices world, each service owns its schema. Integration tests verify that your service correctly manages its own schema without reaching into shared tables or assuming knowledge of other services’ data. They also catch performance cliffs: a query that runs in 2 ms on an empty table might take 2 seconds at 10 million rows. Seeding realistic data volumes in integration tests surfaces these issues early.
The trade-off: integration tests are slower than unit tests (seconds instead of milliseconds) and require infrastructure (a running Postgres, a migration system). That is fine: you write fewer of them, and you target the code paths that unit tests cannot cover, namely the SQL, the ORM mappings, and the transactional boundaries. If you skip integration tests, your first integration “test” runs in production, with real customer data, at 3 AM.
// tests/integration/OrderRepository.test.js
const { OrderRepository } = require('../../repositories/OrderRepository');
const { createTestDatabase, closeTestDatabase } = require('../helpers/database');

describe('OrderRepository Integration', () => {
  let orderRepository;
  let db;

  beforeAll(async () => {
    db = await createTestDatabase();
    orderRepository = new OrderRepository(db);
  });

  afterAll(async () => {
    await closeTestDatabase(db);
  });

  beforeEach(async () => {
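    // Delete child rows first so that a foreign key from order_items to orders
    // (assumed here) does not reject the cleanup.
    await db.query('DELETE FROM order_items');
    await db.query('DELETE FROM orders');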
  });

  describe('save', () => {
    it('should persist order to database', async () => {
      const order = {
        id: 'order-123',
        customerId: 'customer-456',
        items: [{ productId: 'p1', quantity: 2, price: 50 }],
        total: 100,
        status: 'PENDING'
      };

      await orderRepository.save(order);

      const result = await db.query(
        'SELECT * FROM orders WHERE id = $1',
        [order.id]
      );

      expect(result.rows[0]).toMatchObject({
        id: order.id,
        customer_id: order.customerId,
        total: '100.00',
        status: 'PENDING'
      });
    });

    it('should save order items', async () => {
      const order = {
        id: 'order-123',
        customerId: 'customer-456',
        items: [
          { productId: 'p1', quantity: 2, price: 50 },
          { productId: 'p2', quantity: 1, price: 30 }
        ],
        total: 130,
        status: 'PENDING'
      };

      await orderRepository.save(order);

      const result = await db.query(
        'SELECT * FROM order_items WHERE order_id = $1',
        [order.id]
      );

      expect(result.rows).toHaveLength(2);
    });
  });

  describe('findByCustomerId', () => {
    beforeEach(async () => {
      // Seed test data
      await orderRepository.save({
        id: 'order-1',
        customerId: 'customer-A',
        items: [],
        total: 100,
        status: 'COMPLETED'
      });
      await orderRepository.save({
        id: 'order-2',
        customerId: 'customer-A',
        items: [],
        total: 200,
        status: 'PENDING'
      });
      await orderRepository.save({
        id: 'order-3',
        customerId: 'customer-B',
        items: [],
        total: 150,
        status: 'COMPLETED'
      });
    });

    it('should return orders for specific customer', async () => {
      const orders = await orderRepository.findByCustomerId('customer-A');

      expect(orders).toHaveLength(2);
      expect(orders.every(o => o.customerId === 'customer-A')).toBe(true);
    });

    it('should return empty array for customer with no orders', async () => {
      const orders = await orderRepository.findByCustomerId('customer-C');
      expect(orders).toEqual([]);
    });
  });
});

API Integration Tests

Integration tests at the API level verify the entire request-response lifecycle: routing, middleware, authentication, request parsing, business logic, database writes, and response serialization. This is the layer where most “works on my machine” bugs live: a middleware that strips authentication headers in production but not in dev, a request body parser that silently truncates large payloads, a response serializer that exposes internal fields.
The supertest library in Node.js and httpx.AsyncClient in Python both let you test the application without deploying it or managing a server yourself. supertest binds the app to an ephemeral local port when it is not already listening and sends real HTTP requests over loopback; httpx.AsyncClient, when given an ASGI transport, calls the application in-process with no socket at all. Either way you exercise the real middleware and routing, without the latency and flakiness of a remote environment.
A common mistake in API integration tests is relying on side effects from previous tests (a test that creates an order, followed by a test that assumes the order exists). This creates order-dependent tests that pass locally but fail in CI when tests run in parallel or in a different order. Always reset state between tests. The beforeEach delete in the example below is not optional paranoia; it is the difference between a test suite that works and one that develops mysterious flakiness over time.
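The order-dependence problem described above can be shown without a database. This is a toy sketch in plain Node.js: createStore stands in for the orders table, and the two "tests" and runSuite are hypothetical helpers written only for this illustration.

```javascript
// Hypothetical in-memory store standing in for the orders table.
function createStore() {
  const orders = [];
  return {
    insert: order => orders.push(order),
    count: () => orders.length,
    reset: () => { orders.length = 0; }
  };
}

// Two toy "tests". The second one silently assumes an empty table.
const createsOrder = store => {
  store.insert({ id: 'order-1' });
  return store.count() >= 1;
};
const listIsEmptyForNewCustomer = store => store.count() === 0;

// Runs the tests in the given order, optionally resetting state before each one.
function runSuite(tests, { resetBetweenTests }) {
  const store = createStore();
  return tests.map(test => {
    if (resetBetweenTests) store.reset();
    return test(store);
  });
}

const orderA = [listIsEmptyForNewCustomer, createsOrder];
const orderB = [createsOrder, listIsEmptyForNewCustomer];

// Shared state: the outcome depends on execution order.
const sharedStateA = runSuite(orderA, { resetBetweenTests: false });
const sharedStateB = runSuite(orderB, { resetBetweenTests: false });

// Reset before every test: both tests pass regardless of order.
const isolatedB = runSuite(orderB, { resetBetweenTests: true });

console.log({ sharedStateA, sharedStateB, isolatedB });
```

With shared state, the suite passes in one order and fails in the other; with a reset before each test, the order stops mattering.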
Caveats & Common Pitfalls: Integration Tests Hitting Shared Staging DB
  1. Shared staging database across developers and CI. Two CI runs collide; one deletes the other’s test data mid-test; both runs fail intermittently. Engineers learn to “just rerun the pipeline” and stop trusting CI entirely.
  2. No cleanup, accumulating test data. Your test_orders table grows to 10M rows over months. Suddenly the find_by_customer_id test takes 40 seconds because the index is bloated and the seed phase churns through locks.
  3. Version drift between environments. Local Postgres 14, staging Postgres 15, prod Aurora Postgres 13. A JSONB behavior difference passes locally and fails in prod, or worse, passes in staging and fails in prod.
  4. Tests depending on test data that does not exist in fresh DBs. A test assumes customer_id cust-1 exists because someone seeded it manually two years ago. A fresh-spin environment has no such row; the test fails with a cryptic foreign key error.
Solutions & Patterns
  • Testcontainers per test suite. Every CI run spins up its own ephemeral Postgres container. Zero shared state, zero cross-run contamination, and, with a pinned image tag, the same Postgres version as production.
  • Transactional rollback per test. Start a transaction in beforeEach, roll back in afterEach. The test runs against real SQL but leaves zero trace. Dramatically faster than TRUNCATE between tests.
  • Pin the Postgres version to match production. If prod runs Aurora PostgreSQL 14.7, use postgres:14.7-alpine as your Testcontainers image. Drift between test and prod DB versions is a silent bug generator.
  • Ephemeral schema per test run. Create a uniquely-named schema (test_<run-id>) at start, drop it at end. Multiple CI runs can share a single container without collision.
  • Factories, not fixtures. Every test creates the data it needs via factories (FactoryBot, factory_boy). No test depends on data created by another test or a seed script.
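For the "factories, not fixtures" pattern, a library is not strictly necessary in JavaScript. The following is a minimal sketch, assuming an order shape like the one used in the repository tests; buildOrder and its default values are hypothetical.

```javascript
let sequence = 0;

// Hypothetical factory: returns a valid order with unique IDs and lets each
// test override only the fields it cares about.
function buildOrder(overrides = {}) {
  sequence += 1;
  return {
    id: `order-${sequence}`,
    customerId: `customer-${sequence}`,
    items: [{ productId: 'p1', quantity: 1, price: 50 }],
    total: 50,
    status: 'PENDING',
    ...overrides
  };
}

// Each test states only what matters for its scenario.
const pending = buildOrder();
const completed = buildOrder({ customerId: 'customer-A', status: 'COMPLETED' });

console.log(pending.id, completed.id, completed.status);
```

In an integration test, this would typically be combined with the repository, for example `await orderRepository.save(buildOrder({ customerId: 'customer-A' }))`, so that no test relies on rows seeded elsewhere.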
// tests/integration/OrderAPI.test.js
const request = require('supertest');
const { createApp } = require('../../app');
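const { createTestDatabase, closeTestDatabase } = require('../helpers/database');
// getTestAuthToken was used without an import; '../helpers/auth' is an assumed location.
const { getTestAuthToken } = require('../helpers/auth');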

describe('Order API Integration', () => {
  let app;
  let db;
  let authToken;

  beforeAll(async () => {
    db = await createTestDatabase();
    app = createApp({ db });
    
    // Get auth token
    authToken = await getTestAuthToken();
  });

  afterAll(async () => {
    await closeTestDatabase(db);
  });

  beforeEach(async () => {
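    // Clear child rows first; assumes order_items references orders,
    // as in the repository tests.
    await db.query('DELETE FROM order_items');
    await db.query('DELETE FROM orders');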
  });

  describe('POST /orders', () => {
    it('should create order and return 201', async () => {
      const orderData = {
        items: [
          { productId: 'product-1', quantity: 2 }
        ]
      };

      const response = await request(app)
        .post('/orders')
        .set('Authorization', `Bearer ${authToken}`)
        .send(orderData)
        .expect(201);

      expect(response.body).toMatchObject({
        id: expect.any(String),
        status: 'PENDING',
        items: expect.arrayContaining([
          expect.objectContaining({ productId: 'product-1' })
        ])
      });
    });

    it('should return 400 for invalid order data', async () => {
      const response = await request(app)
        .post('/orders')
        .set('Authorization', `Bearer ${authToken}`)
        .send({ items: [] })
        .expect(400);

      expect(response.body.error).toContain('at least one item');
    });

    it('should return 401 without auth token', async () => {
      await request(app)
        .post('/orders')
        .send({ items: [{ productId: 'p1', quantity: 1 }] })
        .expect(401);
    });
  });

  describe('GET /orders/:id', () => {
    it('should return order by id', async () => {
      // Create order first
      const createResponse = await request(app)
        .post('/orders')
        .set('Authorization', `Bearer ${authToken}`)
        .send({ items: [{ productId: 'p1', quantity: 1 }] });

      const orderId = createResponse.body.id;

      const response = await request(app)
        .get(`/orders/${orderId}`)
        .set('Authorization', `Bearer ${authToken}`)
        .expect(200);

      expect(response.body.id).toBe(orderId);
    });

    it('should return 404 for non-existent order', async () => {
      await request(app)
        .get('/orders/non-existent-id')
        .set('Authorization', `Bearer ${authToken}`)
        .expect(404);
    });
  });
});

Contract Testing with Pact

Contract testing is the unsung hero of microservices testing. Here is the core problem it solves: Service A depends on Service B’s API. Service B deploys a change that breaks Service A. Without contract tests, you only discover this in staging (or worse, production). With contract tests, Service B’s CI pipeline fails before the breaking change ever merges. Think of it like a legal contract between two businesses: the consumer writes down what they expect, and the provider agrees to deliver it. If the provider changes the deal, the contract breaks visibly.
Why not just use integration tests across services? Because integration tests require running both services together, which means you need an integration environment, shared test data, and coordinated deployments. Contract tests invert the relationship: the consumer runs its test against a Pact mock server (fast, isolated), which records what the consumer expects. The provider later replays those expectations against its real implementation (also fast, also isolated). No shared environment is required. Each team tests independently, on their own CI, and incompatibilities are detected at PR time rather than at integration time. This is the testing strategy that makes independent deployability actually work.
The trade-off: contract tests verify syntax (field names, types, HTTP status codes), not semantics. A provider that renames a field fails the contract; a provider that changes the meaning of a field still passes it. For semantic verification, you need integration tests or explicit versioning. Contract tests are necessary but not sufficient: they are the first line of defense, not the only line.
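The record-then-replay flow can be modeled in a few lines before looking at the real tooling. This is a toy sketch in plain Node.js and none of it is Pact's API: the pact object, verifyProvider, and the two handlers are hypothetical, and real Pact files carry much more detail (matchers, provider states, headers).

```javascript
// Consumer side: write down the request it sends and the response fields it reads.
const pact = {
  consumer: 'OrderService',
  provider: 'PaymentService',
  interactions: [
    {
      description: 'a request to process payment',
      request: { method: 'POST', path: '/payments' },
      response: { status: 200, bodyFields: { id: 'string', status: 'string' } }
    }
  ]
};

// Provider side: replay every recorded interaction against the provider's handler
// and check the status code plus the type of each field the consumer relies on.
function verifyProvider(pactFile, handler) {
  return pactFile.interactions.map(({ description, request, response }) => {
    const actual = handler(request);
    const statusOk = actual.status === response.status;
    const fieldsOk = Object.entries(response.bodyFields).every(
      ([field, type]) => typeof actual.body[field] === type
    );
    return { description, passed: statusOk && fieldsOk };
  });
}

// Hypothetical provider handlers: the second one renames `status` to `state`.
const currentHandler = () => ({
  status: 200,
  body: { id: 'pay_1', status: 'COMPLETED' }
});
const renamedHandler = () => ({
  status: 200,
  body: { id: 'pay_1', state: 'COMPLETED' }
});

const currentResult = verifyProvider(pact, currentHandler);
const renamedResult = verifyProvider(pact, renamedHandler);

console.log(currentResult, renamedResult);
```

The renamed field fails verification on the provider's side, in the provider's own pipeline, without the consumer service ever being started.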

Consumer Contract Test

// tests/contract/OrderService.consumer.test.js
const { Pact } = require('@pact-foundation/pact');
const { PaymentClient } = require('../../clients/PaymentClient');
const path = require('path');

describe('Order Service - Payment Service Contract', () => {
  const provider = new Pact({
    consumer: 'OrderService',
    provider: 'PaymentService',
    port: 8081,
    log: path.resolve(process.cwd(), 'logs', 'pact.log'),
    dir: path.resolve(process.cwd(), 'pacts'),
    logLevel: 'warn'
  });

  beforeAll(() => provider.setup());
  afterAll(() => provider.finalize());
  afterEach(() => provider.verify());

  describe('processPayment', () => {
    it('should process payment successfully', async () => {
      // Define expected interaction
      await provider.addInteraction({
        state: 'customer has valid payment method',
        uponReceiving: 'a request to process payment',
        withRequest: {
          method: 'POST',
          path: '/payments',
          headers: {
            'Content-Type': 'application/json'
          },
          body: {
            orderId: '12345',
            amount: 100.00,
            currency: 'USD',
            customerId: 'customer-123'
          }
        },
        willRespondWith: {
          status: 200,
          headers: {
            'Content-Type': 'application/json'
          },
          body: {
            id: like('pay_abc123'),
            orderId: '12345',
            status: 'COMPLETED',
            amount: 100.00,
            processedAt: like('2024-01-15T10:30:00Z')
          }
        }
      });

      // Execute the actual client
      const client = new PaymentClient(`http://localhost:8081`);
      const result = await client.processPayment({
        orderId: '12345',
        amount: 100.00,
        currency: 'USD',
        customerId: 'customer-123'
      });

      expect(result.status).toBe('COMPLETED');
      expect(result.orderId).toBe('12345');
    });

    it('should handle payment failure', async () => {
      await provider.addInteraction({
        state: 'customer has insufficient funds',
        uponReceiving: 'a request to process payment',
        withRequest: {
          method: 'POST',
          path: '/payments',
          headers: {
            'Content-Type': 'application/json'
          },
          body: {
            orderId: '12345',
            amount: 10000.00,
            currency: 'USD',
            customerId: 'customer-456'
          }
        },
        willRespondWith: {
          status: 402,
          headers: {
            'Content-Type': 'application/json'
          },
          body: {
            error: 'INSUFFICIENT_FUNDS',
            message: like('Payment declined due to insufficient funds')
          }
        }
      });

      const client = new PaymentClient(`http://localhost:8081`);
      
      await expect(
        client.processPayment({
          orderId: '12345',
          amount: 10000.00,
          currency: 'USD',
          customerId: 'customer-456'
        })
      ).rejects.toThrow('INSUFFICIENT_FUNDS');
    });
  });
});

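// Pact matchers. This require only works at the bottom of the file because the
// `it` callbacks run after the module has finished loading; in a real file,
// move it to the top with the other imports.
const { like, eachLike, term } = require('@pact-foundation/pact').Matchers;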

Provider Contract Test

The provider side of contract testing is where the consumer’s expectations meet the provider’s reality. The provider downloads pact files (usually from a Pact Broker), then replays each interaction against its real, running service. Critically, the provider must restore its database state before each interaction so that the “given” preconditions hold. This is what stateHandlers are for.
If you skip provider verification, contract tests give you false confidence. The consumer test only verifies that the consumer code works against its own assumptions. Until the provider runs verification, you have no proof that the real provider actually behaves that way. Teams that “have Pact tests” but never run provider verification are in the worst of both worlds: they carry the maintenance burden of Pact without the benefit.
Caveats & Common Pitfalls: Contract Tests Not Run in CI
  1. Consumer writes contracts, provider never verifies them. Pact files accumulate in a broker, nobody enforces them, and the provider ships a breaking change that only surfaces in staging. The contracts became documentation, not gates.
  2. Provider verification runs only nightly, not on every PR. A breaking merge happens at 10am; the verification failure lands at 2am; by morning it is blocking 20 other PRs. Verification must be a PR gate, not a nightly job.
  3. Contracts without can-i-deploy checks. Even with verification passing on both sides, you deploy a provider version the consumer has never seen. Pact Broker’s can-i-deploy answers “is this deploy safe for all current consumers?” — skip it and you ship risk blindly.
  4. Contracts covering only the happy path. The consumer records 200 OK responses but never the 404 or 429 error shapes. Provider changes the error envelope; consumer parses it wrong; contract passes because nobody wrote an error-case contract.
Solutions & Patterns
  • Wire provider verification into the PR pipeline. Every provider PR runs pact-verifier against the latest consumer contracts before merging. If verification fails, the PR is blocked.
  • Use Pact Broker’s webhooks + can-i-deploy. Consumer CI asks the broker: “based on the verified pacts for provider version X, can I deploy consumer version Y?” If no, halt the deploy.
  • Write contracts for error cases. Document 404, 400, 401, 429, 5xx response shapes as explicit interactions. These are where breakages most often hide.
  • Version the broker state. Tag pacts with git SHA and environment (dev, staging, prod). A production deploy only proceeds if the prod tag is compatible, not the floating latest.
  • Make the consumer publish on every CI run, not just merges. This surfaces the consumer-provider mismatch at PR review time, long before merge.
// tests/contract/PaymentService.provider.test.js
const { Verifier } = require('@pact-foundation/pact');
const { createApp } = require('../../app');
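const path = require('path');
// Hypothetical helper that seeds the provider's test database; adjust the path to your project
const { seedTestData } = require('../helpers/seedTestData');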

describe('Payment Service - Provider Verification', () => {
  let server;

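  beforeAll(async () => {
    const app = createApp();
    // Wait until the server is actually listening before verification starts
    await new Promise(resolve => {
      server = app.listen(8082, resolve);
    });
  });

  afterAll(async () => {
    // Wait for the server to shut down so the port is free for the next suite
    await new Promise(resolve => server.close(resolve));
  });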

  it('should validate the expectations of Order Service', async () => {
    const opts = {
      provider: 'PaymentService',
      providerBaseUrl: 'http://localhost:8082',
      pactUrls: [
        path.resolve(process.cwd(), 'pacts', 'orderservice-paymentservice.json')
      ],
      // Or use Pact Broker
      // pactBrokerUrl: 'https://your-broker.pactflow.io',
      // pactBrokerToken: process.env.PACT_BROKER_TOKEN,
      stateHandlers: {
        'customer has valid payment method': async () => {
          // Set up test data for valid payment method
          await seedTestData({
            customerId: 'customer-123',
            paymentMethods: [{ type: 'card', valid: true }]
          });
        },
        'customer has insufficient funds': async () => {
          // Set up test data for insufficient funds
          await seedTestData({
            customerId: 'customer-456',
            balance: 0
          });
        }
      },
      publishVerificationResult: process.env.CI === 'true',
      providerVersion: process.env.GIT_COMMIT || '1.0.0'
    };

    await new Verifier(opts).verifyProvider();
  });
});

Component Testing

Component tests sit between integration and E2E tests. The idea: spin up your service with real infrastructure (databases, message brokers) but mock other services. This catches the bugs that unit tests miss (ORM misconfigurations, SQL syntax errors, message serialization issues) without the fragility of full E2E tests. Testcontainers makes this practical by spinning up real Docker containers that are destroyed after each test suite.
Why not just use more integration tests? Because integration tests typically scope to one module (a repository, a handler). Component tests scope to the whole service: HTTP in, HTTP out, with real infrastructure underneath. They are the test you trust when you ask “if I deploy this service to prod alone, does it work?” They catch wiring bugs (did I register that middleware?), configuration bugs (did I wire the correct environment variable?), and startup ordering bugs (did I await the database connection before accepting requests?). No other test type catches these.
Production pitfall: The 60-second timeout on beforeAll is not optional. Container startup is unpredictable: if your CI runner is under load, Postgres might take 30 seconds to initialize. Flaky tests here almost always trace back to insufficient timeouts. A related trap: reusing containers across test runs for speed. It works until a previous run’s state leaks into the current one. Prefer fresh containers per suite and invest in parallelism to keep total time reasonable.
// tests/component/OrderService.component.test.js
const { GenericContainer, Wait } = require('testcontainers');
const request = require('supertest');
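const { createApp } = require('../../app');
// Hypothetical helper that applies schema migrations; adjust the path to your project
const { runMigrations } = require('../helpers/migrations');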

describe('Order Service Component Test', () => {
  let postgresContainer;
  let kafkaContainer;
  let app;

  beforeAll(async () => {
    // Start PostgreSQL container
    postgresContainer = await new GenericContainer('postgres:15-alpine')
      .withEnvironment({
        POSTGRES_DB: 'orders_test',
        POSTGRES_USER: 'test',
        POSTGRES_PASSWORD: 'test'
      })
      .withExposedPorts(5432)
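      // Postgres logs this line twice (once during init, once after the
      // restart that follows), so wait for the second occurrence
      .withWaitStrategy(Wait.forLogMessage(/database system is ready to accept connections/, 2))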
      .start();

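    // Start Kafka container (optional). Note: cp-kafka exits at startup unless
    // its KAFKA_* settings are supplied (docker-compose.test.yml shows a KRaft
    // example); pass them via .withEnvironment() or use the
    // @testcontainers/kafka module, which configures the listeners for you.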
    kafkaContainer = await new GenericContainer('confluentinc/cp-kafka:7.5.0')
      .withExposedPorts(9092)
      .start();

    // Create app with real containers
    const dbUrl = `postgresql://test:test@${postgresContainer.getHost()}:${postgresContainer.getMappedPort(5432)}/orders_test`;
    
    app = createApp({
      databaseUrl: dbUrl,
      kafkaBrokers: `${kafkaContainer.getHost()}:${kafkaContainer.getMappedPort(9092)}`
    });

    // Run migrations
    await runMigrations(dbUrl);
  }, 60000);

  afterAll(async () => {
    await postgresContainer?.stop();
    await kafkaContainer?.stop();
  });

  describe('Order Lifecycle', () => {
    it('should complete full order flow', async () => {
      // 1. Create order
      const createResponse = await request(app)
        .post('/orders')
        .set('Authorization', 'Bearer test-token')
        .send({
          customerId: 'customer-123',
          items: [
            { productId: 'product-1', quantity: 2 }
          ]
        })
        .expect(201);

      const orderId = createResponse.body.id;
      expect(createResponse.body.status).toBe('PENDING');

      // 2. Confirm order
      await request(app)
        .post(`/orders/${orderId}/confirm`)
        .set('Authorization', 'Bearer test-token')
        .expect(200);

      // 3. Check order status
      const statusResponse = await request(app)
        .get(`/orders/${orderId}`)
        .set('Authorization', 'Bearer test-token')
        .expect(200);

      expect(statusResponse.body.status).toBe('CONFIRMED');
    });
  });
});

End-to-End Testing

E2E tests verify complete user journeys across multiple services. They are the most realistic but also the most expensive and fragile. The rule of thumb: have fewer than 10 E2E tests covering only your critical revenue paths. If your checkout flow breaks, you lose money. If a profile avatar upload is slightly slow, nobody notices.
Why limit E2E tests? Because every additional service in the test multiplies the failure modes. A test that crosses 5 services has 5 networks to fail, 5 databases to desynchronize, 5 version mismatches to worry about, and 5 containers to start. The math is brutal: even if each service has 99% reliability during tests, a test that crosses all five passes only 0.99^5 ≈ 95% of the time, meaning roughly 1 in 20 runs fails for environmental reasons alone. This is how “just rerun CI” becomes a team’s default debugging technique, which in turn makes real regressions invisible. Keep E2E tests scarce and scope them to journeys where failure costs real money.
Production pitfall: The setTimeout in the test below (waiting for async processing) is a code smell but sometimes unavoidable. In production E2E suites, replace it with a polling mechanism that checks for the expected state with a timeout, rather than a fixed delay. Fixed delays are the single biggest source of E2E flakiness: they either wait too little (test fails intermittently) or too much (test suite takes 30 minutes when it should take 3).
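The compounding arithmetic above is easy to check for other service counts. This is a minimal sketch, assuming every service fails independently and with the same per-run reliability (both simplifications); `suiteReliability` is an illustrative name, not part of any library.

```javascript
// Probability that a test crossing `serviceCount` services passes,
// assuming independent failures and equal per-service reliability.
function suiteReliability(perServiceReliability, serviceCount) {
  return Math.pow(perServiceReliability, serviceCount);
}

for (const n of [1, 5, 10, 20]) {
  const pass = suiteReliability(0.99, n);
  console.log(
    `${n} services: ${(pass * 100).toFixed(1)}% pass, ` +
    `${((1 - pass) * 100).toFixed(1)}% environmental failures`
  );
}
```

Under the same assumptions, a journey that crosses 10 services passes about 90% of the time and one that crosses 20 passes about 82% of the time, which is why the number of services per E2E test matters as much as the number of tests.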
Caveats & Common Pitfalls: E2E Tests So Slow Nobody Runs Them
  1. Full E2E suite takes 45 minutes. Engineers stop running it locally, run it only in a scheduled nightly job, and merge PRs without E2E coverage. Bugs land on main and are caught 12 hours later.
  2. E2E suite covers happy paths for every feature. Fifty tests for CRUD flows. The ratio of value to runtime is terrible — most of these could be integration tests.
  3. E2E against shared staging environment. Developer A’s manual test data mutates what developer B’s E2E assumes. Flakes fly. People stop trusting failures. Real regressions merge.
  4. Retries as an anti-flakiness solution. “Retry failed E2E tests 3 times” hides genuine bugs. A test that fails once in five runs and passes on retry is telling you something — probably about production — and you are silencing it.
Solutions & Patterns
  • Cap E2E at 5-10 tests, covering only revenue-critical journeys (checkout, signup, payment). Every other behavior belongs in integration or contract tests.
  • Run E2E pre-deploy, not pre-merge. E2E gates the promotion to production, not the merge to main. PR CI runs unit + integration + contract; a separate pipeline runs E2E against a freshly-built release candidate.
  • Ephemeral E2E environments per test run. Spin up the full stack via docker-compose or a dedicated k8s namespace, run the suite, tear it down. Zero shared state, zero cross-test contamination.
  • Polling with timeouts, never fixed sleeps. await poll_until(condition, timeout=10) is reliable; await sleep(5) is either slow or flaky. Almost always both.
  • When a test flakes, quarantine or fix it same day. Flaky tests are technical debt that compounds. Each “flaky” test teaches the team to ignore failures.
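The polling pattern recommended above can be sketched in a few lines of JavaScript. `pollUntil` is a hypothetical helper written for this illustration (it is not a Jest or axios API), and the default timeout and interval are arbitrary starting points.

```javascript
// Hypothetical helper: resolves with the first truthy value returned by
// `condition`, or rejects once `timeoutMs` has elapsed.
async function pollUntil(condition, { timeoutMs = 10000, intervalMs = 200 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await condition();
    if (result) return result;
    // Give up if another sleep would take us past the deadline
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`pollUntil: condition not met within ${timeoutMs}ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Possible usage in place of a fixed sleep (api() and orderId are the names
// used by the checkout test in this chapter):
//
//   const order = await pollUntil(async () => {
//     const res = await api().get(`/orders/${orderId}`);
//     return res.data.status === 'CONFIRMED' ? res.data : null;
//   }, { timeoutMs: 10000 });
```

The test then finishes as soon as the order is confirmed and fails with a clear message if it never is, instead of always paying the worst-case delay.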
// tests/e2e/checkout.e2e.test.js
const axios = require('axios');

describe('Checkout E2E Flow', () => {
  const baseUrl = process.env.E2E_BASE_URL || 'http://localhost:8080';
  let authToken;
  let userId;

  beforeAll(async () => {
    // Login and get auth token
    const loginResponse = await axios.post(`${baseUrl}/auth/login`, {
      email: 'e2e-test@example.com',
      password: 'test-password'
    });
    
    authToken = loginResponse.data.accessToken;
    userId = loginResponse.data.userId;
  });

  const api = () => axios.create({
    baseURL: baseUrl,
    headers: { Authorization: `Bearer ${authToken}` }
  });

  describe('Complete Checkout Journey', () => {
    let orderId;

    it('should add items to cart', async () => {
      const response = await api().post('/cart/items', {
        productId: 'product-123',
        quantity: 2
      });

      expect(response.status).toBe(200);
      expect(response.data.items).toHaveLength(1);
    });

    it('should create order from cart', async () => {
      const response = await api().post('/orders', {
        shippingAddress: {
          street: '123 Test St',
          city: 'Test City',
          country: 'US',
          postalCode: '12345'
        }
      });

      expect(response.status).toBe(201);
      orderId = response.data.id;
      expect(response.data.status).toBe('PENDING');
    });

    it('should process payment', async () => {
      const response = await api().post(`/orders/${orderId}/pay`, {
        paymentMethod: {
          type: 'card',
          token: 'tok_visa'
        }
      });

      expect(response.status).toBe(200);
      expect(response.data.paymentStatus).toBe('COMPLETED');
    });

    it('should show order as confirmed', async () => {
      // Wait for async processing
      await new Promise(resolve => setTimeout(resolve, 2000));

      const response = await api().get(`/orders/${orderId}`);

      expect(response.status).toBe(200);
      expect(response.data.status).toBe('CONFIRMED');
      expect(response.data.paymentId).toBeDefined();
    });

    it('should update inventory', async () => {
      const response = await api().get('/inventory/product-123');

      expect(response.status).toBe(200);
      // Verify inventory was reduced
    });
  });
});

Test Environment Setup

A dedicated test environment via Docker Compose gives every developer the same reproducible test infrastructure. Notice the tmpfs mount for Postgres below: it stores the database entirely in memory, making integration tests 3-5x faster since there are no disk writes. This single trick can cut your CI pipeline time significantly.
Why dedicate infrastructure for tests? Because sharing dev infrastructure creates order-dependent flakiness. Developer A’s lunch-break test run leaves stale data; developer B’s CI run fails mysteriously. A test-only compose file ensures every run starts from a clean, known state. The memory-backed database trick works because tests rarely need durability: a disk crash during CI is no worse than any other CI failure. Trading durability for speed is exactly the kind of test-environment-specific optimization that is wrong in production but right here.
# docker-compose.test.yml
version: '3.8'

services:
  test-db:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: test
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
    ports:
      - "5433:5432"
    tmpfs:
      - /var/lib/postgresql/data  # Speed up tests

  test-redis:
    image: redis:7-alpine
    ports:
      - "6380:6379"

  test-kafka:
    image: confluentinc/cp-kafka:7.5.0
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9093
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@test-kafka:9093
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
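      # KRaft expects a base64-encoded UUID here; this is the sample value from Confluent's KRaft examples
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'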
    ports:
      - "9093:9092"

Test Runner Configuration

Splitting tests into projects (unit/integration/e2e) lets developers run fast tests often and slow tests rarely. The coverageThreshold gate prevents silent coverage regressions — if a PR drops coverage below 80%, CI fails. This is more useful than it sounds: in practice, it forces engineers to at least think about testing before merging, which is a surprisingly effective cultural lever.
// jest.config.js
module.exports = {
  projects: [
    {
      displayName: 'unit',
      testMatch: ['<rootDir>/tests/unit/**/*.test.js'],
      testEnvironment: 'node',
      setupFilesAfterEnv: ['<rootDir>/tests/setup/unit.js']
    },
    {
      displayName: 'integration',
      testMatch: ['<rootDir>/tests/integration/**/*.test.js'],
      testEnvironment: 'node',
      setupFilesAfterEnv: ['<rootDir>/tests/setup/integration.js'],
      globalSetup: '<rootDir>/tests/setup/integration-global-setup.js',
      globalTeardown: '<rootDir>/tests/setup/integration-global-teardown.js'
    },
    {
      displayName: 'e2e',
      testMatch: ['<rootDir>/tests/e2e/**/*.test.js'],
      testEnvironment: 'node',
      testTimeout: 30000
    }
  ],
  collectCoverageFrom: [
    'src/**/*.js',
    '!src/**/*.test.js'
  ],
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 80,
      lines: 80,
      statements: 80
    }
  }
};

Interview Questions

Answer:
Approaches:
  1. Contract Testing (Pact)
    • Consumer defines expected interactions
    • Provider verifies it can meet expectations
    • Catches breaking changes early
  2. Integration Testing
    • Test with real dependencies (test containers)
    • Slower but more realistic
    • Good for database interactions
  3. Service Virtualization
    • Mock external services
    • Consistent test data
    • Faster than real services
Best Practice:
  • Contract tests for API compatibility
  • Integration tests for data integrity
  • Mocks for unit tests
Answer:
Contract Testing:
  • Consumer defines expectations (contract)
  • Provider verifies it meets contract
  • Changes detected before deployment
Why important:
  • Prevents integration failures
  • Enables independent deployment
  • Faster feedback than E2E tests
  • Documents service interactions
Consumer-Driven Contracts:
  1. Consumer writes contract tests
  2. Generates contract file
  3. Provider runs verification
  4. Breaks build if contract broken
Tools: Pact, Spring Cloud Contract
Answer:
Strategies:
  1. Test Containers
    • Spin up real databases
    • Isolated per test suite
    • Clean state each run
  2. Test Data Builders / Factories
OrderBuilder.create()
  .withCustomer('123')
  .withItems([...])
  .build();
  3. Database Seeding
    • Known state before tests
    • Fixtures or factories
  4. Data Cleanup
    • Truncate after tests
    • Transaction rollback
    • Isolated test databases
Best Practices:
  • Each test manages own data
  • No dependencies between tests
  • Use factories for complex objects
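The Test Data Builder mentioned in the strategies above could look roughly like the following. This is a sketch under assumptions: the field names, default values, and the extra `withStatus` method are illustrative and not taken from the services in this chapter.

```javascript
// Hypothetical builder; field names and defaults are illustrative.
class OrderBuilder {
  constructor() {
    // Sensible defaults so each test only states what it cares about
    this.order = {
      customerId: 'customer-default',
      items: [{ productId: 'product-1', quantity: 1 }],
      status: 'PENDING'
    };
  }

  static create() {
    return new OrderBuilder();
  }

  withCustomer(customerId) {
    this.order.customerId = customerId;
    return this;
  }

  withItems(items) {
    this.order.items = items;
    return this;
  }

  withStatus(status) {
    this.order.status = status;
    return this;
  }

  build() {
    // Return a copy so tests cannot mutate each other's data
    return { ...this.order, items: this.order.items.map(item => ({ ...item })) };
  }
}

const order = OrderBuilder.create()
  .withCustomer('123')
  .withItems([{ productId: 'product-9', quantity: 3 }])
  .build();
```

Because every field has a default, a test about customers does not need to mention items at all, which keeps each test focused on the data it actually depends on.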

Testing Anti-Patterns and Edge Cases

The “Ice Cream Cone” Anti-Pattern

Many teams invert the testing pyramid: heavy on E2E and manual tests, light on unit and contract tests. This creates slow CI pipelines (45+ minute builds), flaky test suites that everyone ignores, and a false sense of security because the tests that pass are testing the wrong things.
How to detect it: If your team says “just run it in staging and check” more than twice a week, you have an ice cream cone.

Edge Case: Testing Async Event Flows

Testing event-driven interactions (Kafka, RabbitMQ) is where microservices testing gets genuinely hard. The challenge: you publish an event, another service consumes it, and you need to verify the side effect. You cannot just assert immediately because the consumer processes asynchronously.
The main approaches and their trade-offs:
| Approach | Reliability | Speed | Complexity |
| --- | --- | --- | --- |
| Poll for expected state with timeout | High | Slow (seconds) | Low |
| Use an in-memory event bus in tests | High | Fast (milliseconds) | Medium |
| Testcontainers with real Kafka | Highest | Slow (startup overhead) | High |
| Mock the message broker entirely | Low (misses serialization bugs) | Fastest | Low |
A senior engineer would say: “For unit tests, mock the broker. For integration tests, use Testcontainers with real Kafka. For contract tests, verify the event schema independently of the transport.”
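To make the in-memory event bus option concrete, here is a minimal sketch built on Node's built-in EventEmitter. The `InMemoryEventBus` class, the `order.created` topic, and the inventory consumer are all hypothetical names chosen for illustration; a real service would hide its Kafka or RabbitMQ client behind the same publish/subscribe interface.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for the broker: same publish/subscribe shape, no network.
class InMemoryEventBus {
  constructor() {
    this.emitter = new EventEmitter();
  }

  subscribe(topic, handler) {
    this.emitter.on(topic, handler);
  }

  publish(topic, event) {
    // Round-trip through JSON so basic serialization problems still surface
    // (for example, Date objects arriving as strings)
    this.emitter.emit(topic, JSON.parse(JSON.stringify(event)));
  }
}

// Hypothetical consumer: reserves stock when an order is created.
function createInventoryConsumer(bus, stock) {
  bus.subscribe('order.created', ({ productId, quantity }) => {
    stock.set(productId, (stock.get(productId) ?? 0) - quantity);
  });
}

const bus = new InMemoryEventBus();
const stock = new Map([['product-1', 10]]);
createInventoryConsumer(bus, stock);

bus.publish('order.created', { orderId: 'o-1', productId: 'product-1', quantity: 2 });
console.log(stock.get('product-1')); // 8
```

EventEmitter delivers synchronously, so the assertion needs no waiting. That is also what this approach gives up: ordering, redelivery, and partition behavior are not exercised, which is why the real-broker tests remain necessary.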

Edge Case: Testing Database Migrations

Migrations that work on an empty database often fail on a production database with millions of rows. Your test suite should include:
  1. Forward migration test — apply migration to a seeded database with realistic data volumes
  2. Rollback test — verify the down migration restores the previous schema
  3. Data integrity test — verify existing data survives the migration without corruption
  4. Performance test — ensure the migration completes within your maintenance window (an ALTER TABLE on a 500M row table can take hours)

Summary

Key Takeaways

  • Follow the testing pyramid
  • Unit tests for business logic
  • Contract tests for API compatibility
  • Integration tests with test containers
  • E2E for critical paths only

Next Steps

In the next chapter, we’ll cover Interview Preparation - common questions and system design exercises.

Interview Deep-Dive

Strong Answer:
45-minute integration tests that break frequently are a symptom of two problems: the test suite is doing too much, and the test environment is not isolated.
The first fix is pushing tests down the pyramid. Most of those 45 minutes are probably spent on end-to-end tests that spin up multiple services. I would audit the test suite and reclassify: if a test verifies business logic, it should be a unit test with mocked dependencies (runs in milliseconds). If it verifies database queries, it should be an integration test against a local database using Testcontainers (runs in seconds). If it verifies the API contract between two services, it should be a contract test with Pact (runs in seconds). Only critical user journeys (place order, process payment, receive confirmation) should be E2E tests.
The second fix is test isolation. Each test should spin up its own dependencies and clean up after itself. Testcontainers creates disposable Docker containers for PostgreSQL, Redis, and Kafka per test suite. No shared staging database that accumulates test data and creates flaky failures when two CI runs interfere with each other.
The third fix is parallelization. If I have 15 services, their test suites should run in parallel in CI, not sequentially. Each service’s pipeline runs independently: Service A’s tests should never wait for Service B’s tests. The only shared gate is contract tests, which verify compatibility between services.
After these changes, the typical result is: unit tests (30 seconds), integration tests with Testcontainers (2-3 minutes), contract tests (1-2 minutes), and E2E tests for 3-5 critical paths (5-10 minutes). Total: under 15 minutes, running mostly in parallel.
Follow-up: “How do contract tests work in practice, and who is responsible for writing them, the consumer or the provider?”
The consumer writes the contract. This is called Consumer-Driven Contract Testing, and Pact is the tool used in this chapter. The Order Service (consumer) writes a test that says “when I call GET /users/123 on User Service, I expect a response with {id, name, email}.” This contract is published to a Pact Broker. The User Service (provider) CI pipeline downloads all consumer contracts and verifies that its API satisfies them. If User Service renames “email” to “emailAddress,” the contract test fails before the change reaches production. The beauty is that neither team runs the other’s service: the contract is the intermediary. This catches breaking changes without requiring a full integration environment.
Strong Answer:
You test the saga at three levels, and only the last level requires multiple services.
Level one: unit test the saga orchestrator logic. The orchestrator is a state machine: given state X and event Y, it transitions to state Z and sends command W. I test every state transition and every compensation path with unit tests. The downstream services are mocked interfaces. These tests verify the workflow logic: “if payment fails after inventory is reserved, the orchestrator sends a release-inventory command and then a cancel-order command.” This catches logic errors in the saga definition itself.
Level two: integration test each service’s saga participant independently. Test that the Payment Service correctly handles a “charge-payment” command and publishes a “payment-charged” event. Test that it handles a “refund-payment” compensation command. These tests use Testcontainers for the service’s own database and a local Kafka/RabbitMQ instance. Each service is tested in isolation with real infrastructure but without other services.
Level three: a focused E2E test for the happy path and the most critical failure paths. I run all 4 services (using docker-compose in CI) and test: (1) successful order placement end-to-end, (2) payment failure triggers compensation across all services, and (3) partial failure mid-saga (kill inventory service during processing) to verify the orchestrator retries and eventually completes. These tests are slow (30-60 seconds each) so I keep them to 3-5 scenarios.
The key insight: level one catches 80% of saga bugs (wrong state transitions, missing compensation steps) and runs in milliseconds. Level two catches integration bugs (serialization mismatches, database constraint violations) and runs in seconds. Level three is a confidence check, not a primary testing layer.
Follow-up: “How do you test idempotency, the guarantee that processing the same message twice produces the same result?”
I write explicit idempotency tests at level two. For each saga participant, I send the same command twice (with the same idempotency key) and verify that the second invocation is a no-op: no duplicate database records, no duplicate side effects (no double charge), and the response is the same as the first invocation. I also test the edge case where the service crashes after processing but before acknowledging the message: on restart, it receives the message again and must handle it idempotently. Testcontainers makes this testable by letting me kill and restart the service mid-test.
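The shape of such an idempotency test can be sketched without any infrastructure. In the example below, `PaymentHandler` is a hypothetical in-memory participant: the Map and array stand in for a deduplication table and a payments table, and the command fields are assumptions made for illustration.

```javascript
// Hypothetical payment participant that deduplicates by idempotency key.
class PaymentHandler {
  constructor() {
    this.processed = new Map(); // idempotencyKey -> stored result
    this.charges = [];          // stand-in for the payments table
  }

  handleChargeCommand({ idempotencyKey, orderId, amount }) {
    const previous = this.processed.get(idempotencyKey);
    if (previous) return previous; // replayed message: return the original result

    const result = {
      paymentId: `pay_${this.charges.length + 1}`,
      orderId,
      status: 'CHARGED'
    };
    this.charges.push({ orderId, amount });
    this.processed.set(idempotencyKey, result);
    return result;
  }
}

const handler = new PaymentHandler();
const command = { idempotencyKey: 'order-12345-charge', orderId: '12345', amount: 100 };

const first = handler.handleChargeCommand(command);
const second = handler.handleChargeCommand(command); // same message delivered again

console.log(handler.charges.length); // 1
console.log(first === second);       // true
```

In a real participant the key lookup and the charge would need to share one database transaction (or rely on a unique constraint on the key), and the level-two test would make the same two assertions against the real database rather than an in-memory Map.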
Strong Answer:
The traditional testing pyramid (many unit tests, fewer integration tests, even fewer E2E tests) was designed for monoliths where the boundaries are all in-process function calls. In microservices, the most valuable bugs are at the integration boundaries: API contract mismatches, serialization errors, incorrect event schemas, timeout behavior under load. Unit tests do not catch these because they mock the boundaries away.
The testing diamond (or testing honeycomb, as Spotify calls it) places the most emphasis on integration tests and contract tests, the middle layer. It has fewer unit tests (only for complex business logic), more integration tests (service + real database, service + real message broker), more contract tests (verify API compatibility between services), and fewer E2E tests (only for critical user journeys).
The rationale: in a microservice with a typical CRUD API, a unit test that mocks the database verifies that your mock returns what you told it to return. It does not verify that the SQL query actually works, that the database constraints are correct, or that the ORM mapping is accurate. An integration test with Testcontainers running a real PostgreSQL instance catches all of these.
However, I do not completely abandon unit tests. Complex business logic (pricing calculations, saga state machine transitions, validation rules) should absolutely be unit tested because the logic is intricate and the tests are fast. The diamond shape means: thin unit layer (only for pure logic), thick integration layer (service + real dependencies), thick contract layer (API compatibility), thin E2E layer (critical paths only).
The pragmatic approach I use: if a test can catch the bug with mocked dependencies, make it a unit test. If it can only catch the bug with real dependencies, make it an integration test. If it can only catch the bug with multiple services, make it a contract test or E2E test. This naturally produces a diamond shape for most microservices.
Follow-up: “How do you keep integration tests fast enough that developers actually run them before pushing?”
Testcontainers with reusable containers. Instead of starting a fresh PostgreSQL container for every test file, I start one container per test suite run and reset the database between tests (truncate tables or use transactional test isolation). A Testcontainers PostgreSQL instance starts in 2-3 seconds and stays alive for the entire test suite. With this approach, 100 integration tests run in under 30 seconds. The key is test isolation through transactions (start transaction before test, rollback after) rather than through container recreation.

Scenario-Driven Interview Questions

Strong Answer Framework
  1. Start with the risk map. Which integrations are financial (payments), which are eventual-consistency safe (notifications), which affect user data? The testing investment follows risk: payments gets contract + integration + an explicit E2E; notifications gets contract + a single smoke test.
  2. Unit tests for internal logic. Inject each of the 4 dependencies as interfaces. Test business rules in isolation with mocks that return controlled responses: happy path, timeout, 4xx, 5xx, malformed body. Target 200-500 unit tests running in under 30 seconds total.
  3. Integration tests for your own persistence. Spin up real Postgres via Testcontainers. Exercise every repository method, every migration, every complex query. No mocks for your own DB — ever. Target 30-100 tests in under 2 minutes.
  4. Contract tests for every downstream. Write consumer-driven contracts for each of the 4 services using Pact. The contracts live in your repo; publish them to a broker on every merge. Each downstream’s CI verifies the contracts before merging their own changes. This is the single most valuable test tier for multi-service architectures.
  5. Component tests for the service boundary. Start your service with Testcontainers Postgres, real Kafka, and WireMock stubs for the 4 downstreams. Make HTTP requests against the service. This catches wiring bugs (middleware order, config injection, startup sequence) that unit tests cannot.
  6. A handful of E2E tests for critical paths. One test per revenue-critical flow (e.g., full order placement end-to-end). Run these in a dedicated staging pipeline, not per-commit. Keep total count under 10.
  7. Production verification. Synthetic transactions every 60 seconds from a third-party prober (or Kubernetes CronJob). Alerts fire if the user-facing journey breaks. This is your continuous E2E.
Real-World ExampleMonzo Bank’s engineering team wrote publicly in 2020 about how they test ~1,500 microservices. The core insight: they rely overwhelmingly on contract tests (they built their own fork of Pact called Coach) and integration tests with ephemeral environments. E2E tests are kept under 20 across the entire bank, each covering a critical customer journey (money transfer, card payment, account opening). Unit and contract tests run per commit; E2E runs on release candidates only. Their deploy frequency reached ~100/day by 2020 with this model.Senior Follow-up Questions
Q: What about chaos tests and performance tests? Where do they fit?
A: Orthogonal to the functional pyramid. Chaos tests run in staging on a schedule (weekly), injecting failures (kill a downstream, slow a DB). Performance tests run pre-release against a production-shaped environment. Both produce alerts rather than gating deploys: you catch regressions over time, not per-commit.
Q: How do you test failure modes across 4 downstreams (one down, two slow, etc.)?
A: Use WireMock or a similar programmable stub. Write component tests that configure each downstream to return errors, timeouts, or slow responses, then assert that your service degrades gracefully (circuit breaker opens, fallback activates, correct error returned to caller). These tests live at the component tier so you exercise real middleware behavior.
Q: How do you measure success of this testing strategy?
A: Four signals: (1) escape rate: bugs that reach prod per 1,000 commits, trending down; (2) CI time: under 15 minutes for the full pre-merge pipeline; (3) flakiness: test failure rate when nothing changed, under 1%; (4) confidence: engineers merge on green without “let me test this in staging first” qualifiers.
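The failure-mode answer above can be sketched in a few lines of Python. This is a minimal illustration, not a WireMock setup: `StubInventory`, `CircuitBreaker`, and `check_stock` are hypothetical names, and an in-process stub stands in for a programmable HTTP stub. The point is the shape of the test: switch the downstream into a failure mode, then assert that the fallback activates and the breaker stops calling the broken dependency.

```python
import time
from typing import Callable, Optional


class DownstreamError(Exception):
    """Raised by the stub to simulate a failing downstream."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures, stays open for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def _is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close the breaker and allow a trial call.
            self.opened_at = None
            self.failures = 0
            return False
        return True

    def call(self, fn, fallback):
        if self._is_open():
            return fallback()  # do not touch the downstream at all
        try:
            result = fn()
        except DownstreamError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result


class StubInventory:
    """Programmable stand-in for one downstream; the test flips `mode`."""

    def __init__(self) -> None:
        self.mode = "ok"
        self.calls = 0

    def get_stock(self, sku: str) -> dict:
        self.calls += 1
        if self.mode == "error":
            raise DownstreamError("inventory unavailable")
        return {"sku": sku, "in_stock": True, "source": "live"}


def check_stock(breaker: CircuitBreaker, inventory: StubInventory, sku: str) -> dict:
    """Hypothetical service code under test."""
    return breaker.call(
        lambda: inventory.get_stock(sku),
        fallback=lambda: {"sku": sku, "in_stock": None, "source": "fallback"},
    )


def test_falls_back_and_opens_breaker() -> None:
    now = [0.0]  # fake clock so the test never sleeps
    stub = StubInventory()
    breaker = CircuitBreaker(threshold=2, cooldown=30.0, clock=lambda: now[0])

    assert check_stock(breaker, stub, "sku-1")["source"] == "live"

    stub.mode = "error"  # downstream goes down
    assert check_stock(breaker, stub, "sku-1")["source"] == "fallback"
    assert check_stock(breaker, stub, "sku-1")["source"] == "fallback"
    calls_when_opened = stub.calls

    # Breaker is open: fallback is served without calling the downstream.
    assert check_stock(breaker, stub, "sku-1")["source"] == "fallback"
    assert stub.calls == calls_when_opened

    stub.mode = "ok"  # downstream recovers and the cooldown passes
    now[0] = 31.0
    assert check_stock(breaker, stub, "sku-1")["source"] == "live"
```

In a real component test the stub would be an HTTP server and the breaker would be whatever middleware the service already uses; injecting the clock is what keeps the test fast and deterministic.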
Common Wrong Answers
  • “Lots of E2E tests covering every feature.” Inverts the pyramid. Slow, flaky, and catches bugs too late. Discussed at length as the ice cream cone anti-pattern.
  • “Mock all 4 services in unit tests, skip contracts.” Mocks drift from reality. The first time you find out the real service’s schema changed is in staging, if you are lucky, or production if not.
  • “Run everything against a shared staging environment.” Cross-test contamination guarantees flakiness. Shared state is the enemy of reliable tests.
Further Reading
  • Martin Fowler’s “Microservice Testing” series: martinfowler.com/articles/microservice-testing — the canonical reference.
  • The testing chapter of “Building Microservices” (2nd Edition) by Sam Newman.
  • Monzo Engineering Blog: “How we test at Monzo” — real-world example at scale.
Strong Answer Framework
  1. Instrument before you guess. Turn on per-test timing and failure counts. The 45-minute runtime is usually 20% of tests consuming 80% of time; the flakiness is usually 5% of tests producing 90% of failures. Data first, fixes second.
  2. Sort tests by flakiness rate. For each test, compute failures / total runs over the last 30 days. Any test above 2% is either genuinely racy, depends on shared state, or asserts something non-deterministic.
  3. Categorize the flaky tests. Three buckets: (a) timing — setTimeout-based waits, insufficient polling; (b) shared state — order-dependent, shared DB/environment; (c) external dependency — hitting a real API, random data from the network.
  4. Fix, do not retry. Never wrap flaky tests in retry logic — that hides real bugs. Instead: replace fixed sleeps with polling, replace shared state with ephemeral state, replace external APIs with Testcontainers or stubs.
  5. Attack the slowest tests. For tests above 5 seconds each, ask: can this be a unit test? Can I seed fixtures once instead of per-test? Can I run this test in parallel? Can I use transactional rollback instead of TRUNCATE?
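  6. Parallelize the suite. With proper isolation (each test owns its data), integration tests parallelize trivially. pytest -n auto (provided by the pytest-xdist plugin) and Jest’s --maxWorkers can cut wall-clock time by up to roughly the worker count; the real speedup is lower when a few slow tests dominate or workers contend for the same database or CPU.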
  7. Quarantine, fix, reinstate. A test that is flaky and cannot be immediately fixed goes to a quarantine suite — still runs, still reports, but does not block merges. File a ticket, fix within a sprint, or delete.
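Steps 2 and 7 above can be sketched as a small script over CI history. This is an illustration under stated assumptions: the input is a list of `(test_id, passed)` tuples already filtered to the last 30 days, the function names are hypothetical, and the 2% threshold is the one quoted in step 2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def flakiness_rates(runs: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return failures / total runs for each test id."""
    totals: Dict[str, int] = defaultdict(int)
    failures: Dict[str, int] = defaultdict(int)
    for test_id, passed in runs:
        totals[test_id] += 1
        if not passed:
            failures[test_id] += 1
    return {test_id: failures[test_id] / totals[test_id] for test_id in totals}


def quarantine_candidates(rates: Dict[str, float], threshold: float = 0.02) -> List[str]:
    """Tests whose failure rate exceeds the threshold, worst first."""
    flagged = [test_id for test_id, rate in rates.items() if rate > threshold]
    return sorted(flagged, key=lambda test_id: rates[test_id], reverse=True)


if __name__ == "__main__":
    history = (
        [("test_checkout", True)] * 90 + [("test_checkout", False)] * 10
        + [("test_login", True)] * 99 + [("test_login", False)]
        + [("test_search", True)] * 50
    )
    rates = flakiness_rates(history)
    for test_id in quarantine_candidates(rates):
        print(f"quarantine {test_id}: {rates[test_id]:.1%} failure rate")
```

A raw failure rate also counts legitimate failures caused by real bugs. If the CI system records which runs were reruns on an unchanged commit, restricting the input to those runs gives a cleaner flakiness signal.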
Real-World Example
Spotify’s famous “Flaky Test Management” initiative (2018-2019) discovered that 1% of their tests generated 99% of CI failures. They built tooling that auto-quarantined tests with failure rate above 2%, notified owners, and blocked deploys after 30 days of unfixed quarantine. Within a quarter, CI reliability went from 80% to 97%. The counter-intuitive lesson: aggressively removing tests made the suite more valuable because the remaining tests were trusted.
Senior Follow-up Questions
Q: How do you prevent the flakiness from creeping back in?
A: Three mechanisms. (1) CI reports flakiness rates on every test file, visible in PR reviews. (2) Add a pre-merge check that tests cannot call real external APIs or Date.now() without a mock or frozen clock. (3) Flakiness budgets: each service owns a metric and pages the team if it exceeds threshold.
Q: Is it ever OK to retry a test?
A: Only when the flakiness is in the infrastructure, not the test. Example: a Testcontainer failed to start because Docker was slow, so a retry is legitimate. Counter-example: the test inconsistently asserts on timing, so a retry is covering a bug. Rule of thumb: if the retry policy lives in your CI config, it is infra; if it lives in the test file, it is a bug.
Q: How do you measure success of the effort?
A: Track three metrics weekly. (1) CI time P95: should drop from 45 min to under 15. (2) Test flakiness rate (failures when nothing changed): should drop below 1%. (3) Developer trust, measured indirectly by “retry count after failure”: if the average engineer retries once and then investigates, CI is trusted; if they retry three times before reading the failure, it is not.
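The advice to replace fixed sleeps with polling, and to keep tests off the real clock, can be sketched as one small helper. This is a minimal example, not a library API: `wait_until` and `FakeClock` are hypothetical names. The helper polls a condition until a deadline, and both the clock and the sleep function are injectable so that the helper itself can be tested without waiting.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = clock() + timeout
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        sleep(interval)


class FakeClock:
    """Frozen clock for tests: time only advances when `sleep` is called."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    fake = FakeClock()
    # The condition becomes true after one simulated second; no real waiting.
    wait_until(lambda: fake.now >= 1.0, timeout=5.0, interval=0.25,
               clock=fake.time, sleep=fake.sleep)
    print(f"condition met at t={fake.now}s of simulated time")
```

Compared with a fixed sleep, polling returns as soon as the condition holds (faster on the happy path) and fails with a clear timeout instead of a misleading assertion error when the system is slow.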
Common Wrong Answers
  • “Add retries to the flaky tests.” Hides bugs, increases CI time, teaches engineers to ignore failures.
  • “Delete the slow tests.” Loses coverage without fixing the cause. Only valid after confirming the deleted test is redundant.
  • “Move everything to E2E so we do not need integration tests.” Inverts the pyramid entirely. Ten-minute integration tests become forty-minute E2E tests with worse signal.
Further Reading
  • Google Testing Blog: “Flaky Tests at Google and How We Mitigate Them” (2016), a foundational blog post on the topic.
  • Spotify Engineering: “Flaky Test Management at Scale” — practical quarantine workflow.
  • “Working Effectively with Legacy Code” by Michael Feathers — the chapter on fast feedback applies directly to CI speed.
Strong Answer Framework
  1. Reproduce the miss. Pull the exact pact file at the time of the breaking User-service merge. Did it contain the assertion that would have caught the break? If not, the consumer’s contract was incomplete; if yes, the provider’s verification pipeline was not running on this PR.
  2. Check the verification pipeline. Was pact-verifier a mandatory gate on User-service PRs? Most contract-test failures trace to “we had contracts, but verification was a nightly job, not a PR gate.” Fix: wire verifier into the PR required checks.
  3. Check contract completeness. Contract tests verify the shapes and status codes that the consumer actually exercises. If Order never called the specific endpoint that User broke, the contract never covered it. Fix: audit consumer code paths against contract coverage, file tickets for gaps.
  4. Check can-i-deploy in the deploy pipeline. Even with green verification, deploying User-service v2.5 without checking compatibility against the consumer version running in prod is risky. can-i-deploy answers: “are all consumers currently in prod compatible with what I am about to deploy?” If not wired in, wire it now.
  5. Audit tag hygiene. Pact Broker tags like prod, staging, master must move atomically on successful deploys. Drifted tags (stale prod) make can-i-deploy report false positives. Verify tag update is part of the deploy pipeline, not a manual step.
  6. Backfill with a regression test. Write a contract test that would have caught this specific break. Future consumer contracts inherit this shape, so the next team cannot accidentally ship the same mistake.
  7. Postmortem the gap, not the break. The real failure is the process: contracts existed but did not gate. Document the process change (PR gate, can-i-deploy, tag hygiene) and treat this as a lesson for all service teams.
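The question that can-i-deploy answers in step 4 can be modeled in a few lines. This sketch is not the Pact Broker API or CLI: it assumes a hypothetical in-memory record of verification results and of which consumer versions are currently in prod, and the service names and version numbers are invented for illustration.

```python
from typing import Dict, List, Tuple

# (consumer, consumer_version, provider_version) -> did verification pass?
Verifications = Dict[Tuple[str, str, str], bool]


def can_i_deploy(provider_version: str,
                 prod_consumers: Dict[str, str],
                 verifications: Verifications) -> Tuple[bool, List[str]]:
    """Return (ok, blockers) for deploying `provider_version`.

    A consumer blocks the deploy if its prod version failed verification
    against this provider version, or was never verified against it.
    """
    blockers: List[str] = []
    for consumer, consumer_version in sorted(prod_consumers.items()):
        key = (consumer, consumer_version, provider_version)
        if not verifications.get(key, False):
            blockers.append(f"{consumer}@{consumer_version}")
    return (not blockers, blockers)


if __name__ == "__main__":
    prod = {"order-service": "1.4.0", "billing-service": "3.1.0"}
    results = {
        ("order-service", "1.4.0", "2.4.0"): True,
        ("billing-service", "3.1.0", "2.4.0"): True,
        ("order-service", "1.4.0", "2.5.0"): False,  # the breaking change
        ("billing-service", "3.1.0", "2.5.0"): True,
    }
    print(can_i_deploy("2.4.0", prod, results))
    print(can_i_deploy("2.5.0", prod, results))
```

The model also shows why tag hygiene in step 5 matters: the answer is only as good as `prod_consumers`. If that mapping lists a consumer version that is no longer the one running in prod, the check passes against the wrong contract.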
Real-World Example
SoundCloud wrote about a 2017 incident where the search service shipped a response-envelope change that broke the recommendations service for 4 hours. Both teams had Pact tests. The provider’s verification ran nightly, not on PRs; the breaking merge happened at 10am and was caught at 2am the next day. Their postmortem outcome: verification became a required PR check across all 500+ services, and can-i-deploy was added as a deploy gate. Similar incidents dropped to zero over the following year.
Senior Follow-up Questions
Q: What if the breaking change was semantic, not structural (the shape stayed the same but the meaning changed)?
A: Contract tests cannot catch semantics by design. Complement with integration tests on the consumer side for behaviorally-critical flows, and use stronger typing with runtime validation (Zod, Pydantic). For truly critical semantics, add explicit version fields and reject unknown versions.
Q: How do you handle the case where the provider has dozens of consumers and cannot coordinate with all of them?
A: This is exactly what Pact Broker’s can-i-deploy solves at scale. The provider asks: “given all consumer contracts currently in prod, can I ship this new version?” If even one consumer is incompatible, deploy halts and the provider contacts that specific team. Alternative: backward compatibility as a policy, where providers support both old and new shapes for a deprecation window.
Q: How do you measure whether the fixed process is working?
A: Track breaking-change-related incidents per quarter. If the count is nonzero, something in the chain is still broken. Also monitor: % of services with required pact gates, % of deploys that triggered can-i-deploy failures (high is good: it means the gate caught real issues), time from provider PR to consumer awareness of break.
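The semantic-change answer above (explicit version fields plus runtime validation) can be sketched with the standard library alone. Zod or Pydantic would do the same job with less code; plain Python is used here only to keep the example dependency-free. The payload fields `schema_version`, `user_id`, and `balance_cents` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# Versions whose meaning this consumer has been written and tested against.
SUPPORTED_VERSIONS = {1}


class ContractViolation(ValueError):
    """The provider response does not match what this consumer understands."""


@dataclass(frozen=True)
class UserBalance:
    user_id: str
    balance_cents: int


def parse_user_balance(payload: Mapping[str, Any]) -> UserBalance:
    """Validate a provider response at the consumer boundary."""
    version = payload.get("schema_version")
    # Reject unknown versions: same shape with a new meaning must fail loudly.
    if type(version) is not int or version not in SUPPORTED_VERSIONS:
        raise ContractViolation(f"unsupported schema_version: {version!r}")

    user_id = payload.get("user_id")
    if not isinstance(user_id, str) or not user_id:
        raise ContractViolation("user_id must be a non-empty string")

    balance = payload.get("balance_cents")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(balance, int) or isinstance(balance, bool):
        raise ContractViolation("balance_cents must be an integer")

    return UserBalance(user_id=user_id, balance_cents=balance)


if __name__ == "__main__":
    ok = parse_user_balance(
        {"schema_version": 1, "user_id": "u-42", "balance_cents": 1250}
    )
    print(ok)
    try:
        # Same shape, but version 2 might mean "balance in dollars".
        parse_user_balance(
            {"schema_version": 2, "user_id": "u-42", "balance_cents": 12}
        )
    except ContractViolation as exc:
        print(f"rejected: {exc}")
```

Failing closed on an unknown version turns a silent semantic drift into an immediate, attributable error at the service boundary, which is the part a structural contract test cannot provide.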
Common Wrong Answers
  • “We need more E2E tests.” E2E tests would have caught this break once, in a shared environment, probably after merge. Contract tests catch it at PR time. E2E is a fallback, not a primary defense against API compatibility.
  • “The provider should announce breaking changes.” Relies on humans remembering and writing correct announcements. The whole point of contract testing is that the test fails automatically; process changes that rely on human discipline regress over time.
Further Reading
  • Pact documentation: “Versioning and Branches” and “Can I Deploy” — docs.pact.io.
  • “Consumer-Driven Contracts: A Service Evolution Pattern” by Ian Robinson — martinfowler.com/articles/consumerDrivenContracts.html.
  • SoundCloud Engineering: “Scaling Pact Across 500 Services” — real adoption story with pitfalls.