Testing Strategies
Testing microservices is fundamentally harder than testing a monolith. Think of it like testing a single factory versus testing a supply chain: in a monolith, you can verify the whole assembly line in one place. With microservices, you need to verify that dozens of independent factories coordinate correctly, handle delayed shipments, and recover when one factory goes offline. This chapter covers comprehensive testing strategies that give you confidence without drowning in slow, brittle tests.
- Implement unit testing with mocks
- Write integration tests for services
- Use contract testing with Pact
- Design end-to-end test strategies
- Set up test environments
Testing Pyramid for Microservices
Before we dive into a single line of test code, we need a mental model for where to invest testing effort. The testing pyramid is a resource allocation framework disguised as a diagram. The fundamental insight is that tests closer to the bottom (unit tests) are cheap to write, fast to run, and catch localized bugs. Tests closer to the top (E2E) are expensive, slow, and catch integration bugs, but they are also flaky and break for environmental reasons that have nothing to do with your code. If you invert the pyramid and invest heavily in E2E tests, your CI pipeline becomes a coin flip: developers learn to ignore red builds because “the tests are flaky,” and real bugs slip through. The pyramid shape exists because it represents the optimal trade-off between coverage, speed, and signal quality. In microservices, we add two middle layers (contract and component tests) because service boundaries are the highest-risk part of the system and deserve dedicated test types.
Testing Strategy Decision Framework
Choosing the right test type for a given scenario is where most teams get it wrong. They either over-invest in E2E tests (slow, flaky, expensive) or under-invest in contract tests (the single most impactful test type for microservices that almost nobody writes). Use this decision framework: “What am I verifying?”

| Question | Best Test Type | Why |
|---|---|---|
| Does my business logic produce correct results? | Unit test | Fast, isolated, easy to debug |
| Does my SQL/ORM actually work against a real DB? | Integration test | Catches schema mismatches, query bugs |
| Can Service A still talk to Service B after B’s latest deploy? | Contract test | Catches breaking API changes before production |
| Does my service handle the full request lifecycle correctly? | Component test | Catches wiring issues, middleware bugs |
| Does the critical user journey (checkout, signup) still work? | E2E test | Last line of defense for revenue-critical paths |

| Test Type | Typical Count Per Service | Run Time Budget | CI Frequency |
|---|---|---|---|
| Unit | 200-500+ | less than 30 seconds total | Every commit |
| Integration | 30-100 | less than 2 minutes | Every commit |
| Contract | 10-30 per consumer | less than 1 minute | Every commit |
| Component | 10-20 | less than 5 minutes | Every PR |
| E2E | 5-10 (entire system) | less than 15 minutes | Pre-deploy only |
Contract tests have a blind spot: if a provider still returns {"status": "COMPLETED"} but the meaning of “COMPLETED” has subtly changed (say, it now means “payment authorized” instead of “payment captured”), contract tests pass while your system is broken. For these semantic edge cases, you need integration tests that validate actual behavior, not just schema compliance.
Unit Testing
Testing Business Logic
The key insight with unit testing in microservices: you test your service’s logic in isolation by mocking its dependencies. This is no different from unit testing in a monolith, but the boundaries are cleaner because service APIs are explicit contracts rather than internal method calls.
Why mock dependencies? Because the alternative (spinning up real databases and network calls for every test) turns a 10-millisecond unit test into a 500-millisecond integration test. Across thousands of tests, that difference is minutes vs hours of CI time. The trade-off: mocks can drift from reality. You might mock inventoryClient.checkAvailability to return { available: true }, but in production the real API returns a different field name. That is why we complement unit tests with contract tests (which verify the mock matches reality) and integration tests (which exercise real dependencies). Think of unit tests as verifying “my logic is correct assuming my dependencies behave” and other test layers as verifying “my dependencies actually behave the way I assumed.”
The dependency injection pattern below is what makes the code testable. If OrderService instantiated its own repository and clients internally, we would have no way to substitute mocks — we would be stuck running the real code against real infrastructure. Constructor injection externalizes these dependencies and flips control: the test decides what OrderService talks to. This applies equally to Node.js and Python; the syntax differs, but the discipline is identical.
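As a minimal sketch of constructor injection in Python: the Order fields, the check_availability method, and the fixed order ID are hypothetical names chosen for illustration, not the original service code. The point is only that OrderService never constructs its own collaborators, so a test can hand it mocks.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class Order:
    id: str
    customer_id: str
    status: str


class OrderService:
    """Receives its collaborators instead of constructing them."""

    def __init__(self, repository, inventory_client):
        self.repository = repository
        self.inventory_client = inventory_client

    def create_order(self, customer_id: str, sku: str, quantity: int) -> Order:
        stock = self.inventory_client.check_availability(sku, quantity)
        if not stock["available"]:
            raise ValueError(f"{sku} is out of stock")
        # Hypothetical fixed ID; a real service would generate one.
        order = Order(id="order-1", customer_id=customer_id, status="PENDING")
        self.repository.save(order)
        return order


# The caller (here, a test) decides what OrderService talks to.
repository = Mock()
inventory_client = Mock()
inventory_client.check_availability.return_value = {"available": True}

service = OrderService(repository, inventory_client)
order = service.create_order("cust-1", "sku-42", quantity=2)
```

In production wiring, the same constructor would receive the real repository and HTTP client; nothing inside OrderService changes.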
Unit Tests with Jest and pytest
Notice how each mock returns sensible defaults in beforeEach. This is a production pitfall to watch: if your mocks are too tightly coupled to specific test scenarios, a single change in the service code breaks dozens of tests. Default to the “happy path” in setup, then override per test.
Why is the happy-path-in-setup pattern so important? Because tests should read like specifications: “Given a valid order, when I call createOrder, then it returns a pending order.” The setup answers “given a valid X” once, at the top, and each individual test focuses only on what makes it different. If every test has to re-configure every mock, you end up with test files where the setup dwarfs the actual assertions, and changing a dependency signature requires touching hundreds of lines. The same principle applies whether you use Jest’s beforeEach or pytest’s fixtures — fixtures compose better (pytest can inject only the fixtures a test needs), but the core idea of “shared happy-path setup, per-test overrides” is the same.
One more nuance: verify behavior, not implementation. Tests that assert “mock was called with these exact arguments” are brittle — they fail every time you refactor. Prefer assertions on observable outcomes (what the function returned, what state changed). Only verify interactions when the interaction itself is the behavior you care about (e.g., “did we charge the customer?”).
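The three ideas above (happy-path defaults in setup, per-test overrides, and interaction checks only where the interaction is the behavior) might look like this with the standard library's unittest. The OrderService here is a hypothetical stand-in kept deliberately small; setUp plays the role that beforeEach or a pytest fixture would play.

```python
import unittest
from unittest.mock import Mock


class OrderService:
    """Hypothetical service, minimal so the test structure stays the focus."""

    def __init__(self, repository, inventory_client, payment_client):
        self.repository = repository
        self.inventory_client = inventory_client
        self.payment_client = payment_client

    def create_order(self, customer_id, sku, amount_cents):
        if not self.inventory_client.check_availability(sku)["available"]:
            raise ValueError("out of stock")
        self.payment_client.charge(customer_id, amount_cents)
        order = {"customer_id": customer_id, "sku": sku, "status": "PENDING"}
        self.repository.save(order)
        return order


class CreateOrderTest(unittest.TestCase):
    def setUp(self):
        # Shared happy-path defaults: every dependency behaves.
        self.repository = Mock()
        self.inventory = Mock()
        self.inventory.check_availability.return_value = {"available": True}
        self.payments = Mock()
        self.service = OrderService(self.repository, self.inventory, self.payments)

    def test_valid_order_is_pending(self):
        # Asserts on the observable outcome, not on how it was produced.
        order = self.service.create_order("cust-1", "sku-42", 1999)
        self.assertEqual(order["status"], "PENDING")

    def test_out_of_stock_is_rejected(self):
        # Override only the one default that makes this test different.
        self.inventory.check_availability.return_value = {"available": False}
        with self.assertRaises(ValueError):
            self.service.create_order("cust-1", "sku-42", 1999)
        self.payments.charge.assert_not_called()

    def test_customer_is_charged_once(self):
        # Here the interaction itself is the behavior we care about.
        self.service.create_order("cust-1", "sku-42", 1999)
        self.payments.charge.assert_called_once_with("cust-1", 1999)
```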
Integration Testing
Database Integration Tests
Unit tests with mocked repositories can be 100% green while your SQL is completely broken. The mock does not know your column is customer_id and you typed customerid. The mock does not know your foreign key constraint will reject orphaned rows. The mock does not know your JSONB column needs a ::jsonb cast. Every one of these bugs only surfaces when code meets a real database. That is the purpose of integration tests: to verify that your persistence layer actually works against the real storage engine, with real schemas, real constraints, and real SQL dialects.
The architectural insight is that integration tests are where your “database per service” principle gets tested. In a microservices world, each service owns its schema. Integration tests verify that your service correctly manages its own schema without reaching into shared tables or assuming knowledge of other services’ data. They also catch performance cliffs — a query that runs in 2 ms on an empty table might take 2 seconds at 10 million rows. Seeding realistic data volumes in integration tests surfaces these issues early.
The trade-off: integration tests are slower than unit tests (seconds instead of milliseconds) and require infrastructure (a running Postgres, a migration system). That is fine — you write fewer of them, and you target the code paths that unit tests cannot cover: the SQL, the ORM mappings, the transactional boundaries. If you skip integration tests, your first integration “test” runs in production, with real customer data, at 3 AM.
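A rough sketch of the idea, with one loud assumption: the standard library's sqlite3 stands in for Postgres so the example runs without infrastructure. A real suite would point the same repository at the production engine, because dialect differences are exactly what these tests exist to catch. The schema and the OrderRepository class are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (id TEXT PRIMARY KEY);
CREATE TABLE orders (
    id TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL
);
"""


class OrderRepository:
    def __init__(self, conn):
        self.conn = conn

    def save(self, order_id, customer_id, total_cents):
        self.conn.execute(
            "INSERT INTO orders (id, customer_id, total_cents) VALUES (?, ?, ?)",
            (order_id, customer_id, total_cents),
        )

    def find_by_customer(self, customer_id):
        rows = self.conn.execute(
            "SELECT id, total_cents FROM orders WHERE customer_id = ? ORDER BY id",
            (customer_id,),
        )
        return rows.fetchall()


def fresh_database():
    """Each test gets its own empty database with the real schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def test_round_trip():
    conn = fresh_database()
    conn.execute("INSERT INTO customers (id) VALUES ('cust-1')")
    repo = OrderRepository(conn)
    repo.save("order-1", "cust-1", 1999)
    assert repo.find_by_customer("cust-1") == [("order-1", 1999)]


def test_orphaned_order_is_rejected():
    # A mocked repository would happily accept this row; the database does not.
    repo = OrderRepository(fresh_database())
    try:
        repo.save("order-2", "missing-customer", 500)
    except sqlite3.IntegrityError:
        return
    raise AssertionError("expected the foreign key constraint to reject the row")
```

A misspelled column name in either query would fail the first test immediately, which is the class of bug a mock cannot see.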
API Integration Tests
Integration tests at the API level verify the entire request-response lifecycle: routing, middleware, authentication, request parsing, business logic, database writes, and response serialization. This is the layer where most “works on my machine” bugs live: a middleware that strips authentication headers in production but not in dev, a request body parser that silently truncates large payloads, a response serializer that exposes internal fields.
The supertest library in Node.js and httpx.AsyncClient in Python both test the application without a separately deployed server. supertest binds the app to an ephemeral local port for the duration of the test, while httpx.AsyncClient with an ASGI transport invokes the application’s request handler directly, in-process, with a synthetic request. Either way this is faster than starting and managing a standalone HTTP server, and it gives you the same level of realism for middleware and routing. You are still testing the real app, just without an external network hop that would add latency and potential flakiness.
A common mistake in API integration tests is relying on side effects from previous tests (a test that creates an order, followed by a test that assumes the order exists). This creates order-dependent tests that pass locally but fail in CI when tests run in parallel or a different order. Always reset state between tests. The beforeEach delete in the example below is not optional paranoia — it is the difference between a test suite that works and one that develops mysterious flakiness over time.
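To illustrate both points (in-process invocation and per-test state reset) without third-party dependencies, here is a sketch that uses a tiny hypothetical WSGI app and the standard library instead of supertest or httpx. The /orders route and the in-memory ORDERS list are assumptions made for the example.

```python
import io
import json
from wsgiref.util import setup_testing_defaults

ORDERS = []  # stand-in for the service's database


def app(environ, start_response):
    """Tiny WSGI app: POST /orders creates an order, GET /orders lists them."""
    headers = [("Content-Type", "application/json")]
    if environ["PATH_INFO"] != "/orders":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    if environ["REQUEST_METHOD"] == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        ORDERS.append(json.loads(environ["wsgi.input"].read(length)))
        start_response("201 Created", headers)
        return [json.dumps(ORDERS[-1]).encode()]
    start_response("200 OK", headers)
    return [json.dumps(ORDERS).encode()]


def call(method, path, body=None):
    """Invoke the app in-process with a synthetic request; no socket is opened."""
    payload = json.dumps(body).encode() if body is not None else b""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    }
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = int(status.split()[0])

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


def reset_state():
    """Run before every test so no test depends on another's leftovers."""
    ORDERS.clear()


def test_create_order():
    reset_state()
    status, body = call("POST", "/orders", {"sku": "sku-42"})
    assert status == 201 and body == {"sku": "sku-42"}


def test_list_starts_empty():
    reset_state()
    status, body = call("GET", "/orders")
    assert status == 200 and body == []
```

Because each test resets state first, the two tests pass in either order; without reset_state, test_list_starts_empty would fail whenever it ran after test_create_order.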
Contract Testing with Pact
Contract testing is the unsung hero of microservices testing. Here is the core problem it solves: Service A depends on Service B’s API. Service B deploys a change that breaks Service A. Without contract tests, you only discover this in staging (or worse, production). With contract tests, Service B’s CI pipeline fails before the breaking change ever merges. Think of it like a legal contract between two businesses: the consumer writes down what they expect, and the provider agrees to deliver it. If the provider changes the deal, the contract breaks visibly.
Why not just use integration tests across services? Because integration tests require running both services together, which means you need an integration environment, shared test data, and coordinated deployments. Contract tests invert the relationship: the consumer runs its test against a Pact mock server (fast, isolated), which records what the consumer expects. The provider later replays those expectations against its real implementation (also fast, also isolated). No shared environment is required. Each team tests independently, on their own CI, and incompatibilities are detected at PR time rather than at integration time. This is the testing strategy that makes independent deployability actually work.
The trade-off: contract tests verify syntax (field names, types, HTTP status codes), not semantics. A provider that renames a field fails the contract; a provider that changes the meaning of a field can still pass it. For semantic verification, you need integration tests or explicit versioning. Contract tests are necessary but not sufficient: they are the first line of defense, not the only line.
Consumer Contract Test
Provider Contract Test
The provider side of contract testing is where the consumer’s expectations meet the provider’s reality. The provider downloads pact files (usually from a Pact Broker), then replays each interaction against its real, running service. Critically, the provider must restore its database state before each interaction so that the “given” preconditions hold; this is what stateHandlers are for.
If you skip provider verification, contract tests give you false confidence. The consumer test only verifies that the consumer code works against its own assumptions. Until the provider runs verification, you have no proof that the real provider actually behaves that way. Teams that “have Pact tests” but never run provider verification are in the worst of both worlds — they have the maintenance burden of Pact without the benefit.
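The record-and-replay mechanics can be sketched in a few lines. This is a hand-rolled illustration of the concept and is not the Pact API: the contract format, the state handler registry, and the user endpoint are all hypothetical.

```python
# Hand-rolled illustration of consumer-driven contracts; this is NOT the Pact API.

# 1. Consumer side: the expectations the consumer's tests recorded.
CONTRACT = [
    {
        "state": "user 42 exists",
        "request": {"method": "GET", "path": "/users/42"},
        "response": {"status": 200, "body_types": {"id": int, "email": str}},
    },
]

# 2. Provider side: state handlers put the data store into the "given" state.
USERS = {}


def seed_user_42():
    USERS.clear()
    USERS[42] = {"id": 42, "email": "a@example.com", "plan": "free"}


STATE_HANDLERS = {"user 42 exists": seed_user_42}


def provider_handle(method, path):
    """The provider's real request handler (heavily simplified)."""
    user = USERS.get(int(path.rsplit("/", 1)[1]))
    return (200, user) if user else (404, {})


def renamed_provider(method, path):
    """Same provider after a breaking change: 'email' became 'email_address'."""
    status, body = provider_handle(method, path)
    body = dict(body)
    body["email_address"] = body.pop("email")
    return status, body


def verify(contract, handler, state_handlers):
    """Replay every recorded interaction and collect mismatches."""
    failures = []
    for interaction in contract:
        state_handlers[interaction["state"]]()  # restore the precondition
        status, body = handler(**interaction["request"])
        expected = interaction["response"]
        if status != expected["status"]:
            failures.append(f"{interaction['state']}: status {status}")
        for field, field_type in expected["body_types"].items():
            if not isinstance(body.get(field), field_type):
                failures.append(f"{interaction['state']}: bad field {field!r}")
    return failures
```

Verifying provider_handle returns no failures even though the response carries an extra plan field, because the consumer only pins what it actually reads. Verifying renamed_provider reports the missing email field, which is the failure that should block the provider's PR.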
Component Testing
Component tests sit between integration and E2E tests. The idea: spin up your service with real infrastructure (databases, message brokers) but mock other services. This catches the bugs that unit tests miss (ORM misconfigurations, SQL syntax errors, message serialization issues) without the fragility of full E2E tests. Testcontainers makes this practical by spinning up real Docker containers that are destroyed after each test suite.
Why not just use more integration tests? Because integration tests typically scope to one module (a repository, a handler). Component tests scope to the whole service: HTTP in, HTTP out, with real infrastructure underneath. They are the test you trust when you ask “if I deploy this service to prod alone, does it work?” They catch wiring bugs (did I register that middleware?), configuration bugs (did I wire the correct environment variable?), and startup ordering bugs (did I await the database connection before accepting requests?). No other test type catches these.
Production pitfall: The 60-second timeout on beforeAll is not optional. Container startup is unpredictable: if your CI runner is under load, Postgres might take 30 seconds to initialize. Flaky tests here almost always trace back to insufficient timeouts. A related trap: reusing containers across test runs for speed. It works until a previous run’s state leaks into the current one. Prefer fresh containers per suite and invest in parallelism to keep total time reasonable.
End-to-End Testing
E2E tests verify complete user journeys across multiple services. They are the most realistic but also the most expensive and fragile. The rule of thumb: have fewer than 10 E2E tests covering only your critical revenue paths. If your checkout flow breaks, you lose money. If a profile avatar upload is slightly slow, nobody notices.
Why limit E2E tests? Because every additional service in the test adds a multiplier to failure modes. A test that crosses 5 services has 5 networks to fail, 5 databases to desynchronize, 5 version mismatches to worry about, and 5 containers to start. The math is brutal: even if each service has 99% reliability during tests, a run that needs all five has 0.99^5 ≈ 95% reliability, meaning roughly 1 in 20 runs fails for environmental reasons alone. This is how “just rerun CI” becomes a team’s default debugging technique, which in turn makes real regressions invisible. Keep E2E tests scarce and scope them to journeys where failure costs real money.
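The compounding effect is easy to reproduce. The sketch below assumes each service fails independently with the same probability, which is a simplification; real environments often have correlated failures.

```python
def suite_pass_rate(per_service_reliability: float, services: int) -> float:
    """Chance that no service fails for environmental reasons, assuming independence."""
    return per_service_reliability ** services


# How the environmental pass rate decays as a journey touches more services.
for n in (1, 5, 10, 15):
    print(f"{n:>2} services: {suite_pass_rate(0.99, n):.1%}")
```

At 15 services the same 99% per-service figure leaves only about an 86% chance of a clean run, which is why adding services to an E2E journey is never free.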
Production pitfall: The setTimeout in the test below (waiting for async processing) is a code smell but sometimes unavoidable. In production E2E suites, replace it with a polling mechanism that checks for the expected state with a timeout, rather than a fixed delay. Fixed delays are the single biggest source of E2E flakiness — they either wait too little (test fails intermittently) or too much (test suite takes 30 minutes when it should take 3).
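A polling helper of the kind described might look like the following. The wait_until name, its default timeout, and the simulated order dictionary are illustrative assumptions, not part of any test framework.

```python
import threading
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Simulated async processing: the order is confirmed shortly after the request.
order = {"status": "PENDING"}
threading.Timer(0.1, lambda: order.update(status="CONFIRMED")).start()

# Returns as soon as the state is reached instead of always sleeping the maximum.
wait_until(lambda: order["status"] == "CONFIRMED", timeout=2.0)
```

The test finishes as soon as the condition holds, and a genuinely stuck flow fails with a clear TimeoutError rather than a confusing assertion on stale state.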
Test Environment Setup
A dedicated test environment via Docker Compose gives every developer the same reproducible test infrastructure. Notice the tmpfs mount for Postgres below: it stores the database entirely in memory, which can make integration tests 3-5x faster because the database never waits on disk writes. This single trick can cut your CI pipeline time significantly.
Why dedicate infrastructure for tests? Because sharing dev infrastructure creates order-dependent flakiness. Developer A’s lunch-break test run leaves stale data; developer B’s CI run fails mysteriously. A test-only compose file ensures every run starts from a clean, known state. The memory-backed database trick works because tests rarely need durability — a disk crash during CI is no worse than any other CI failure. Trading durability for speed is exactly the kind of test-environment-specific optimization that is wrong in production but right here.
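A test-only compose file along these lines would apply the memory-backed trick. This is a sketch: the service name, image tag, credentials, and host port are placeholders to adapt, and only the tmpfs entry is the point of the example.

```yaml
# docker-compose.test.yml (hypothetical file name)
services:
  postgres-test:
    image: postgres:16
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
      POSTGRES_DB: orders_test
    ports:
      - "5433:5432"   # avoid clashing with a local dev Postgres on 5432
    tmpfs:
      - /var/lib/postgresql/data   # data directory lives in RAM and vanishes on stop
```

Because the data directory is never persisted, every fresh container starts from an empty database, which also gives the clean known state described above.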
Test Runner Configuration
Splitting tests into projects (unit/integration/e2e) lets developers run fast tests often and slow tests rarely. The coverageThreshold gate prevents silent coverage regressions: if a PR drops coverage below 80%, CI fails. This is more useful than it sounds: in practice, it forces engineers to at least think about testing before merging, which is a surprisingly effective cultural lever.
Interview Questions
Q1: How do you test microservices interactions?
- Contract Testing (Pact)
  - Consumer defines expected interactions
  - Provider verifies it can meet expectations
  - Catches breaking changes early
- Integration Testing
  - Test with real dependencies (test containers)
  - Slower but more realistic
  - Good for database interactions
- Service Virtualization
  - Mock external services
  - Consistent test data
  - Faster than real services

- Contract tests for API compatibility
- Integration tests for data integrity
- Mocks for unit tests
Q2: What is contract testing and why is it important?
- Consumer defines expectations (contract)
- Provider verifies it meets contract
- Changes detected before deployment
- Prevents integration failures
- Enables independent deployment
- Faster feedback than E2E tests
- Documents service interactions
- Consumer writes contract tests
- Generates contract file
- Provider runs verification
- Breaks build if contract broken
Q3: How do you handle test data in microservices?
- Test Containers
  - Spin up real databases
  - Isolated per test suite
  - Clean state each run
- Test Data Builders / Factories
- Database Seeding
  - Known state before tests
  - Fixtures or factories
- Data Cleanup
  - Truncate after tests
  - Transaction rollback
  - Isolated test databases
- Each test manages own data
- No dependencies between tests
- Use factories for complex objects
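A test data factory for the list above could be as small as this. The Order shape and the make_order helper are hypothetical; the pattern is what matters: valid defaults, unique identifiers, and keyword overrides for the one field a test cares about.

```python
import itertools
from dataclasses import dataclass

_sequence = itertools.count(1)


@dataclass(frozen=True)
class Order:
    id: str
    customer_id: str
    status: str
    items: tuple
    total_cents: int


def make_order(**overrides) -> Order:
    """Build a valid order with unique IDs; tests override only what they need."""
    n = next(_sequence)
    defaults = {
        "id": f"order-{n}",
        "customer_id": f"cust-{n}",
        "status": "PENDING",
        "items": (("sku-42", 1),),
        "total_cents": 1999,
    }
    return Order(**{**defaults, **overrides})


# Each test states only the detail that makes it different.
pending = make_order()
cancelled = make_order(status="CANCELLED")
```

Unique IDs per call are what let tests create their own data in a shared database without colliding with each other.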
Testing Anti-Patterns and Edge Cases
The “Ice Cream Cone” Anti-Pattern
Many teams invert the testing pyramid: heavy on E2E and manual tests, light on unit and contract tests. This creates slow CI pipelines (45+ minute builds), flaky test suites that everyone ignores, and a false sense of security because the tests that pass are testing the wrong things. How to detect it: If your team says “just run it in staging and check” more than twice a week, you have an ice cream cone.
Edge Case: Testing Async Event Flows
Testing event-driven interactions (Kafka, RabbitMQ) is where microservices testing gets genuinely hard. The challenge: you publish an event, another service consumes it, and you need to verify the side effect. You cannot just assert immediately because the consumer processes asynchronously. Common approaches and how they compare on reliability, speed, and complexity:

| Approach | Reliability | Speed | Complexity |
|---|---|---|---|
| Poll for expected state with timeout | High | Slow (seconds) | Low |
| Use an in-memory event bus in tests | High | Fast (milliseconds) | Medium |
| Testcontainers with real Kafka | Highest | Slow (startup overhead) | High |
| Mock the message broker entirely | Low (misses serialization bugs) | Fastest | Low |
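The in-memory event bus row can be sketched as follows. The bus and the InventoryProjection consumer are hypothetical; the one deliberate design choice is that the bus round-trips every event through JSON, so serialization bugs still surface even though no broker is running.

```python
import json
from collections import defaultdict


class InMemoryEventBus:
    """Test double for a broker: delivers events synchronously, in publish order."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.handlers[topic].append(handler)

    def publish(self, topic, event):
        payload = json.dumps(event)  # serialize as a real broker client would
        for handler in self.handlers[topic]:
            handler(json.loads(payload))


class InventoryProjection:
    """Hypothetical consumer that reserves stock when an order is created."""

    def __init__(self, bus, stock):
        self.stock = stock
        bus.subscribe("order.created", self.on_order_created)

    def on_order_created(self, event):
        self.stock[event["sku"]] -= event["quantity"]


def test_order_created_reserves_stock():
    bus = InMemoryEventBus()
    inventory = InventoryProjection(bus, stock={"sku-42": 10})
    bus.publish("order.created", {"sku": "sku-42", "quantity": 3})
    # Delivery is synchronous, so the side effect can be asserted immediately.
    assert inventory.stock["sku-42"] == 7
```

What this does not cover is broker behavior itself (partitioning, redelivery, ordering across consumers), which is where the Testcontainers row earns its cost.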
Edge Case: Testing Database Migrations
Migrations that work on an empty database often fail on a production database with millions of rows. Your test suite should include:
- Forward migration test — apply migration to a seeded database with realistic data volumes
- Rollback test — verify the down migration restores the previous schema
- Data integrity test — verify existing data survives the migration without corruption
- Performance test — ensure the migration completes within your maintenance window (an ALTER TABLE on a 500M row table can take hours)
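The first three checks in this list can be combined into one round-trip test. As a sketch, it uses the standard library's sqlite3 as a stand-in engine and a hypothetical add-a-currency-column migration; a real suite would run against the production engine with production-like volumes, and would add a timing assertion for the performance check.

```python
import sqlite3

# Hypothetical migration: add a currency column, and a down step that removes it.
UP = "ALTER TABLE orders ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'"
DOWN = """
CREATE TABLE orders_old (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL);
INSERT INTO orders_old (id, total_cents) SELECT id, total_cents FROM orders;
DROP TABLE orders;
ALTER TABLE orders_old RENAME TO orders;
"""


def seeded_database(rows=10_000):
    """Start from a populated table, not an empty one."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, total_cents INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
        ((i, i * 100) for i in range(1, rows + 1)),
    )
    conn.commit()
    return conn


def columns(conn):
    return [row[1] for row in conn.execute("PRAGMA table_info(orders)")]


def checksum(conn):
    """Cheap integrity fingerprint: row count plus sum of a stable column."""
    return conn.execute("SELECT COUNT(*), SUM(total_cents) FROM orders").fetchone()


def test_migration_round_trip():
    conn = seeded_database()
    before = checksum(conn)

    conn.execute(UP)  # forward migration on seeded data
    assert columns(conn) == ["id", "total_cents", "currency"]
    assert checksum(conn) == before  # existing data survived
    assert conn.execute("SELECT DISTINCT currency FROM orders").fetchall() == [("USD",)]

    conn.executescript(DOWN)  # rollback restores the previous schema
    assert columns(conn) == ["id", "total_cents"]
    assert checksum(conn) == before
```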
Summary
Key Takeaways
- Follow the testing pyramid
- Unit tests for business logic
- Contract tests for API compatibility
- Integration tests with test containers
- E2E for critical paths only
Interview Deep-Dive
'Your team has 15 microservices. Integration tests take 45 minutes to run and break frequently due to test environment instability. How do you fix this?'
'How do you test a saga that spans 4 services (Order, Payment, Inventory, Notification) without running all 4 services?'
'What is the testing diamond and how does it differ from the testing pyramid for microservices?'
Scenario-Driven Interview Questions
You are building a new microservice that calls 4 other services (payments, inventory, user, notifications). Design the testing strategy from unit to production.
- Start with the risk map. Which integrations are financial (payments), which are eventual-consistency safe (notifications), which affect user data? The testing investment follows risk: payments gets contract + integration + an explicit E2E; notifications gets contract + a single smoke test.
- Unit tests for internal logic. Inject each of the 4 dependencies as interfaces. Test business rules in isolation with mocks that return controlled responses: happy path, timeout, 4xx, 5xx, malformed body. Target 200-500 unit tests running in under 30 seconds total.
- Integration tests for your own persistence. Spin up real Postgres via Testcontainers. Exercise every repository method, every migration, every complex query. No mocks for your own DB — ever. Target 30-100 tests in under 2 minutes.
- Contract tests for every downstream. Write consumer-driven contracts for each of the 4 services using Pact. The contracts live in your repo; publish them to a broker on every merge. Each downstream’s CI verifies the contracts before merging their own changes. This is the single most valuable test tier for multi-service architectures.
- Component tests for the service boundary. Start your service with Testcontainers Postgres, real Kafka, and WireMock stubs for the 4 downstreams. Make HTTP requests against the service. This catches wiring bugs (middleware order, config injection, startup sequence) that unit tests cannot.
- A handful of E2E tests for critical paths. One test per revenue-critical flow (e.g., full order placement end-to-end). Run these in a dedicated staging pipeline, not per-commit. Keep total count under 10.
- Production verification. Synthetic transactions every 60 seconds from a third-party prober (or Kubernetes CronJob). Alerts fire if the user-facing journey breaks. This is your continuous E2E.
- “Lots of E2E tests covering every feature.” Inverts the pyramid. Slow, flaky, and catches bugs too late. Discussed at length as the ice cream cone anti-pattern.
- “Mock all 4 services in unit tests, skip contracts.” Mocks drift from reality. The first time you find out the real service’s schema changed is in staging, if you are lucky, or production if not.
- “Run everything against a shared staging environment.” Cross-test contamination guarantees flakiness. Shared state is the enemy of reliable tests.
Your team's integration test suite takes 45 minutes and breaks 3 times a week due to flakiness. What is your diagnostic process and what fixes do you apply?
- Instrument before you guess. Turn on per-test timing and failure counts. The 45-minute runtime is usually 20% of tests consuming 80% of time; the flakiness is usually 5% of tests producing 90% of failures. Data first, fixes second.
- Sort tests by flakiness rate. For each test, compute failures / total runs over the last 30 days. Any test above 2% is either genuinely racy, depends on shared state, or asserts something non-deterministic.
- Categorize the flaky tests. Three buckets: (a) timing: setTimeout-based waits, insufficient polling; (b) shared state: order-dependent, shared DB/environment; (c) external dependency: hitting a real API, random data from the network.
- Fix, do not retry. Never wrap flaky tests in retry logic, because that hides real bugs. Instead: replace fixed sleeps with polling, replace shared state with ephemeral state, replace external APIs with Testcontainers or stubs.
- Attack the slowest tests. For tests above 5 seconds each, ask: can this be a unit test? Can I seed fixtures once instead of per-test? Can I run this test in parallel? Can I use transactional rollback instead of TRUNCATE?
- Parallelize the suite. With proper isolation (each test owns its data), integration tests parallelize trivially. pytest -n auto and Jest’s --maxWorkers can cut wall-clock time by up to a factor of the worker count.
- Quarantine, fix, reinstate. A test that is flaky and cannot be immediately fixed goes to a quarantine suite: it still runs and still reports, but does not block merges. File a ticket, fix within a sprint, or delete.
Date.now() without a mock or frozen clock. (3) Flakiness budgets: each service owns a metric and pages the team if it exceeds threshold.
- “Add retries to the flaky tests.” Hides bugs, increases CI time, teaches engineers to ignore failures.
- “Delete the slow tests.” Loses coverage without fixing the cause. Only valid after confirming the deleted test is redundant.
- “Move everything to E2E so we do not need integration tests.” Inverts the pyramid entirely. Ten-minute integration tests become forty-minute E2E tests with worse signal.
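The flakiness-rate step from the diagnostic process can be sketched as a small function over CI history. The input shape, a list of (test name, passed) pairs, is an assumption; real data would come from whatever your CI system exports.

```python
from collections import Counter


def flaky_tests(runs, threshold=0.02):
    """Return {test_name: failure_rate} for tests above threshold, worst first.

    runs: iterable of (test_name, passed) tuples, e.g. the last 30 days of CI.
    """
    totals, failures = Counter(), Counter()
    for name, passed in runs:
        totals[name] += 1
        if not passed:
            failures[name] += 1
    rates = {name: failures[name] / totals[name] for name in totals}
    offenders = [(name, rate) for name, rate in rates.items() if rate > threshold]
    return dict(sorted(offenders, key=lambda item: item[1], reverse=True))


# Hypothetical history: checkout fails 5% of the time, search 1%, login never.
history = (
    [("test_checkout", True)] * 95 + [("test_checkout", False)] * 5
    + [("test_search", True)] * 99 + [("test_search", False)] * 1
    + [("test_login", True)] * 100
)
print(flaky_tests(history))
```

Only test_checkout crosses the 2% line here, so that is where categorization and fixing should start.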
A production incident happened because a breaking API change shipped from the User service and broke the Order service. You had contract tests. Why did they not catch it, and how do you fix the process?
- Reproduce the miss. Pull the exact pact file at the time of the breaking User-service merge. Did it contain the assertion that would have caught the break? If not, the consumer’s contract was incomplete; if yes, the provider’s verification pipeline was not running on this PR.
- Check the verification pipeline. Was pact-verifier a mandatory gate on User-service PRs? Most contract-test failures trace to “we had contracts, but verification was a nightly job, not a PR gate.” Fix: wire verifier into the PR required checks.
- Check contract completeness. Contract tests verify the shapes and status codes that the consumer actually exercises. If Order never called the specific endpoint that User broke, the contract never covered it. Fix: audit consumer code paths against contract coverage, file tickets for gaps.
- Check can-i-deploy in the deploy pipeline. Even with green verification, deploying User-service v2.5 without checking compatibility against the consumer version running in prod is risky. can-i-deploy answers: “are all consumers currently in prod compatible with what I am about to deploy?” If not wired in, wire it now.
- Audit tag hygiene. Pact Broker tags like prod, staging, master must move atomically on successful deploys. Drifted tags (stale prod) make can-i-deploy report false positives. Verify tag update is part of the deploy pipeline, not a manual step.
- Backfill with a regression test. Write a contract test that would have caught this specific break. Future consumer contracts inherit this shape, so the next team cannot accidentally ship the same mistake.
- Postmortem the gap, not the break. The real failure is the process: contracts existed but did not gate. Document the process change (PR gate, can-i-deploy, tag hygiene) and treat this as a lesson for all service teams.
can-i-deploy was added as a deploy gate. Similar incidents dropped to zero over the following year.
Senior Follow-up Questions
can-i-deploy solves at scale. The provider asks: “given all consumer contracts currently in prod, can I ship this new version?” If even one consumer is incompatible, deploy halts and the provider contacts that specific team. Alternative: backward compatibility as a policy, where providers support both old and new shapes for a deprecation window.
can-i-deploy failures (high is good: it means the gate caught real issues), time from provider PR to consumer awareness of break.
- “We need more E2E tests.” E2E tests would have caught this break once, in a shared environment, probably after merge. Contract tests catch it at PR time. E2E is a fallback, not a primary defense against API compatibility.
- “The provider should announce breaking changes.” Relies on humans remembering and writing correct announcements. The whole point of contract testing is that the test fails automatically; process changes that rely on human discipline regress over time.
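The question a deploy gate answers can be modelled in a few lines. This is a toy model of the idea only, not the Pact Broker's implementation or CLI; the data shapes and version labels are hypothetical.

```python
def can_i_deploy(provider_version, consumers_in_prod, verified):
    """Toy deploy gate: is every consumer in prod verified against this provider?

    consumers_in_prod: {consumer_name: contract_version_running_in_prod}
    verified: set of (provider_version, consumer_name, contract_version) tuples,
              recorded each time provider verification passed.
    Returns (ok, blocking_consumers).
    """
    blocking = sorted(
        name
        for name, contract_version in consumers_in_prod.items()
        if (provider_version, name, contract_version) not in verified
    )
    return (not blocking, blocking)


# Hypothetical state: order-service in prod runs contract c8, but the new
# provider build was only verified against the older c7.
verified = {
    ("user-2.5", "order-service", "c7"),
    ("user-2.5", "billing-service", "c3"),
}
in_prod = {"order-service": "c8", "billing-service": "c3"}
print(can_i_deploy("user-2.5", in_prod, verified))
```

The model also shows why tag hygiene matters: if the in_prod mapping is stale, the gate answers a question about a world that no longer exists.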