Domain-Driven Design for Microservices
Domain-Driven Design (DDD) is the foundation for properly decomposing systems into microservices. Without DDD, you’ll likely end up with a distributed monolith. Think of DDD like city planning. A city is not organized by building material (all brick buildings here, all glass buildings there); it is organized by purpose (residential district, commercial district, industrial zone). Each district has its own rules, its own vocabulary, and its own governance. A “permit” means something different in the building department than in the parking department. DDD gives you the same clarity for software: organize by business purpose, not by technical layer, and accept that the same word can mean different things in different contexts.
- Understand DDD strategic patterns
- Identify bounded contexts in a domain
- Map subdomains to microservices
- Design aggregate boundaries
- Apply context mapping patterns
Why DDD for Microservices?
DDD provides the conceptual framework to:
- Identify Service Boundaries - Bounded contexts become services
- Define Data Ownership - Aggregates define what a service owns
- Design Communication - Context maps define how services interact
- Align with Business - Ubiquitous language bridges tech and business
Strategic DDD Patterns
Subdomains
Every business has different types of domains:
- Core Domain
- Supporting Domain
- Generic Domain
For the Core Domain:
- Invest most resources here
- Custom-built, in-house
- Highest quality code
- Best engineers assigned
Examples of core domains:
- Personalized recommendations
- Dynamic pricing engine
- Customer experience optimization
Why Classify Subdomains Before Writing Code
The subdomain classification exercise is not an academic formality; it is a direct input to your staffing and architecture decisions. Core domains get your best engineers, the most architectural investment, and the largest test suites. Supporting domains get pragmatic “good enough” solutions. Generic domains get outsourced to a vendor via a thin integration layer. If you skip this step, you end up treating all subdomains as equally important, which means you either under-invest in what differentiates your business, over-invest in commodity concerns, or both. Within the broader microservices architecture, subdomain classification tells you which services deserve to be separate microservices (likely your core and most supporting domains) and which should be thin wrappers around third-party SaaS (your generic domains). The tradeoff to watch: classifications drift over time. What is generic today can become core when your business pivots, and what is core today can become commodity when the industry catches up. Revisit these classifications at least annually.
Bounded Contexts
A bounded context is a boundary within which a particular domain model is defined and applicable.
Interview: You join a team where 3 services share a single User table. How do you fix it?
Interview: Two teams want to share a single 'Order' model across Order Service and Shipping Service. Why is this a bad idea, and what's the alternative?
Ubiquitous Language
Each bounded context has its own vocabulary that everyone (developers and business) uses. This is not just a naming convention; it is a communication discipline. When the code says publish() and the business says “make it live,” you have a language gap that will cause bugs. Ubiquitous language means the code reads like the business speaks, within each context.
Why Different Contexts Deserve Different Models
New developers often see three “Product” classes and assume it is code duplication that should be consolidated. That instinct is wrong and it is the most common way teams destroy bounded contexts. The Catalog Product, Sales Product, and Shipping Product share a name, but they represent fundamentally different concerns. The Catalog team cares about merchandising (descriptions, images, categories). The Sales team cares about money (prices, discounts, promotions). The Shipping team cares about physics (weight, dimensions, fragility). Merging these into one “Product” class forces the catalog team to coordinate with shipping every time they add a new image size, and forces shipping to coordinate with sales every time they change the weight unit. If you ignore this principle and share a single Product model across contexts, you will build the most coupling-heavy class in your system. Everyone will depend on it, and no one will own it. In the broader microservices architecture, ubiquitous language per context is what makes independent team ownership possible: each team owns their model, their vocabulary, and their evolution. The tradeoff to watch: you will have data duplication and translation overhead at context boundaries. That cost is the price of independence, and it is almost always worth paying.
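As a minimal sketch of the three “Product” models described above, each context could define its own class with only the fields it cares about. All class names, field names, and the to_shipping_product helper are illustrative assumptions, not taken from any real codebase.

```python
from dataclasses import dataclass
from decimal import Decimal


# Catalog context: merchandising concerns only.
@dataclass(frozen=True)
class CatalogProduct:
    product_id: str
    description: str
    image_urls: tuple[str, ...]


# Sales context: pricing concerns only.
@dataclass(frozen=True)
class SalesProduct:
    product_id: str
    price: Decimal
    discount_percent: Decimal = Decimal("0")

    def final_price(self) -> Decimal:
        # Apply the percentage discount using exact decimal arithmetic.
        return self.price * (Decimal("100") - self.discount_percent) / Decimal("100")


# Shipping context: physical concerns only.
@dataclass(frozen=True)
class ShippingProduct:
    product_id: str
    weight_kg: Decimal
    fragile: bool


def to_shipping_product(product_id: str, weight_grams: int, fragile: bool) -> ShippingProduct:
    """Translate at the context boundary (the upstream fields are hypothetical)."""
    return ShippingProduct(product_id, Decimal(weight_grams) / Decimal(1000), fragile)
```

The only thing the three models share is the product ID. The translation function is where the duplication and translation overhead mentioned above becomes visible.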
Context Mapping
Context maps show how bounded contexts relate to each other.
Relationship Patterns
E-Commerce Context Map
Tactical DDD Patterns
Aggregates
An aggregate is a cluster of domain objects treated as a single unit. Think of an aggregate like a legal document with attachments: you can reference the document by its case number (the aggregate root ID), but you cannot independently modify an attachment without going through the document. The document enforces the rules for what attachments are valid, and when you file the document, all attachments are filed atomically. The key trade-off: smaller aggregates mean better concurrency (fewer conflicts) but more eventual consistency between them. Larger aggregates mean stronger consistency guarantees but worse performance under concurrent writes. In practice, most teams start with aggregates that are too large and gradually split them as contention becomes visible.
Why Aggregates Matter More Than You Think
Aggregates are the most practically important tactical pattern in DDD, because they define your transaction boundary. Everything inside an aggregate is consistent together, atomically, within a single database transaction. Everything outside the aggregate is eventually consistent, via domain events. That one rule, “one aggregate, one transaction,” is what prevents the accidental distributed transactions that wreck microservices architectures. If you ignore aggregate boundaries and let external code modify OrderItems directly without going through the Order root, you lose the ability to enforce invariants. The rule “an order cannot be modified after submission” disappears the moment someone can run UPDATE order_items SET quantity = 5 WHERE order_id = ... without asking the Order aggregate. Within the broader architecture, aggregates map directly to the “consistency guarantees” column of your service contract: within-aggregate changes are atomic, cross-aggregate changes are eventually consistent. Tradeoff: large aggregates create lock contention under high concurrency (two customers trying to update the same order’s items will serialize), so the practical advice is to start small and combine only when you actually need the invariants.
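The rule “an order cannot be modified after submission” can be sketched as an Order aggregate root that owns its items and is the only way to change them. This is an illustrative sketch: the class names, the positive-quantity check, and the “no empty orders” invariant are assumptions added for the example.

```python
from dataclasses import dataclass, field


class OrderAlreadySubmitted(Exception):
    """Raised when code tries to change an order after submission."""


@dataclass
class OrderItem:
    product_id: str  # other aggregates are referenced by ID only
    quantity: int


@dataclass
class Order:
    order_id: str
    _items: dict[str, OrderItem] = field(default_factory=dict)
    _submitted: bool = False

    def add_item(self, product_id: str, quantity: int) -> None:
        self._ensure_modifiable()
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        existing = self._items.get(product_id)
        if existing:
            existing.quantity += quantity
        else:
            self._items[product_id] = OrderItem(product_id, quantity)

    def submit(self) -> None:
        self._ensure_modifiable()
        if not self._items:
            raise ValueError("cannot submit an empty order")  # assumed invariant
        self._submitted = True

    @property
    def items(self) -> tuple[OrderItem, ...]:
        # Return copies so callers cannot mutate internal state behind the root.
        return tuple(OrderItem(i.product_id, i.quantity) for i in self._items.values())

    def _ensure_modifiable(self) -> None:
        if self._submitted:
            raise OrderAlreadySubmitted(self.order_id)
```

Because every change goes through add_item or submit, the invariant lives in exactly one place. A repository would load and save the whole Order in one transaction.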
Interview: Your team has a 'UserService' with 20 methods, all of which take a User data object and return it modified. Is this DDD? What's wrong?
- Identify the anti-pattern. This is an anemic domain model — all logic is in the service, the User class is just a data bag. It is using DDD vocabulary without the substance.
- Explain the hidden cost. Invariants are impossible to enforce because any code can construct or mutate User directly, bypassing UserService.
- Refactor toward a rich aggregate. Methods like changeEmail, changePassword, and deactivate belong on the User class itself. UserService becomes thin: it coordinates repository, event publishing, and transaction boundaries.
- Distinguish domain services from application services. Application services (UserService at the use-case level) coordinate workflows. Domain services (e.g., PasswordHashingService) encapsulate logic that does not belong to a single aggregate but is still domain behavior.
- Show the test benefit. Rich aggregates are testable without a database. A unit test of user.changeEmail("new@x.com") runs in microseconds. Testing anemic models requires the whole service layer.
- “When is a service layer legitimate, vs. anemic?” An application service is legitimate — it orchestrates a use case (load aggregate, invoke behavior, persist, publish events). A domain service is legitimate when the logic genuinely spans multiple aggregates (e.g., TransferMoney between two Accounts). What is illegitimate is business rules inside these services when they belong on an aggregate.
- “What about ORMs that want plain data classes?” Separate the persistence model from the domain model. SQLAlchemy 2.0, JPA, and Entity Framework all support mapping rich domain objects. Or use a repository pattern to translate between the two.
- “How do you retrofit rich aggregates into a legacy anemic codebase?” Incrementally. Pick one aggregate. Introduce methods for one specific use case (Order.cancel()). Deprecate the equivalent service method. Repeat. This can take months but is non-disruptive.
- “It’s fine because we’re using DDD terminology.” Fails because DDD is not a naming convention. The semantic substance (invariants, rich behavior, bounded contexts) matters more than class names.
- “We need service classes because of dependency injection.” Fails because DI is compatible with rich aggregates — inject dependencies via constructors into services that use aggregates, not into aggregates themselves.
- Martin Fowler, “AnemicDomainModel” (2003) — the original article.
- Vaughn Vernon, Implementing Domain-Driven Design (Addison-Wesley, 2013), Chapters 7-10.
- Greg Young’s talk “7 Reasons Why Your DDD is Failing” (DDDEU 2019).
Interview: How do you handle a business rule that spans two aggregates -- like 'a customer cannot place an order if their credit limit is exceeded'?
- Identify the type of rule. Rules spanning aggregates should either (a) be eventually consistent via events, or (b) live in a domain service that coordinates the two aggregates within a single transaction.
- Prefer eventual consistency where possible. If the rule is “warn after exceeding limit,” let credit-limit-exceeded be an event that Customer subscribes to.
- For strict rules, use a domain service. OrderPlacementService.place(order, customer) loads both aggregates, checks the invariant, and persists within a transaction. Both aggregates remain rich; the service is the coordinator.
- Avoid distributed transactions across services. If Customer and Order are separate services, you cannot atomically check limit + create order. You must choose: (a) optimistic placement + async compensation (saga), or (b) real-time query to Customer at placement time.
- Document the consistency guarantee explicitly. “Orders are placed optimistically; exceeded-limit cases compensate by cancelling within 5 seconds.” This is a product contract, not a bug.
- “What if the merchant cannot tolerate eventual consistency?” Offer a synchronous fraud check (RadarRules.evaluate() before charging). Trade latency for stronger guarantees. This is a product-level choice.
- “How do you handle failures in the compensation path?” Compensation must be idempotent and retried until successful. Use the transactional outbox pattern to ensure compensation events are never lost.
- “What’s the risk of relying on domain services for cross-aggregate rules?” The service can grow to be a god-object that does “everything about orders.” Keep domain services narrow — one use case each. Use application services for orchestration.
- “Put the rule in both aggregates.” Fails because now the rule is duplicated in two places, diverges over time, and is impossible to keep in sync.
- “Use a distributed transaction across the two services.” Fails because 2PC is a known anti-pattern for microservices — high latency, single point of failure, poor availability.
- Chris Richardson, Microservices Patterns (Manning, 2018), Chapter 4 (Sagas) and Chapter 5 (Business Logic Organization).
- Vaughn Vernon, Implementing Domain-Driven Design, Chapter 7 (Services).
- Stripe Engineering, “How Stripe handles Idempotency” (2017).
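The strict-rule option from the credit-limit question above, a domain service coordinating two aggregates, might look like the following sketch. It assumes Customer and Order live in the same service and that the calling application service loads both, invokes place, and persists both in one database transaction. The Customer and Order fields and the CreditLimitExceeded exception are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


class CreditLimitExceeded(Exception):
    """Raised when placing the order would exceed the customer's limit."""


@dataclass
class Customer:
    customer_id: str
    credit_limit: Decimal
    outstanding: Decimal = Decimal("0")

    def reserve_credit(self, amount: Decimal) -> None:
        # The Customer aggregate owns and enforces its own limit.
        if self.outstanding + amount > self.credit_limit:
            raise CreditLimitExceeded(self.customer_id)
        self.outstanding += amount


@dataclass
class Order:
    order_id: str
    customer_id: str
    total: Decimal
    placed: bool = False

    def place(self) -> None:
        self.placed = True


class OrderPlacementService:
    """Domain service: coordinates two aggregates for a single use case."""

    def place(self, order: Order, customer: Customer) -> None:
        if order.customer_id != customer.customer_id:
            raise ValueError("order does not belong to this customer")
        # Reserve first: if this raises, the order is left untouched.
        customer.reserve_credit(order.total)
        order.place()
```

The service holds no business rule of its own. The limit check stays on Customer, which keeps the service narrow and both aggregates rich.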
Aggregate Design Rules
Rule 1: Reference by ID
Rule 2: Small Aggregates
Rule 3: Consistency Boundary
Rule 4: Eventual Consistency
Value Objects
Immutable objects defined by their attributes, not identity. Two Money objects with the same amount and currency are equal; you do not care which specific 10-dollar bill you hold. This is the opposite of entities, where two Users with identical names are still different people (they have different IDs). Production pitfall: The most common value object mistake is representing money as a plain number. Floating-point arithmetic means 0.1 + 0.2 !== 0.3 in JavaScript (and 0.1 + 0.2 == 0.30000000000000004 in Python). Use integer cents, Python’s Decimal, or a dedicated Money class. This is not a theoretical concern: rounding errors in financial calculations compound across millions of transactions and trigger real reconciliation failures.
Why Immutability Matters for Value Objects
Value objects feel like a small detail but they are load-bearing for domain correctness. When Money, Address, and Quantity are immutable, you can pass them around freely, share them across aggregates, and cache them without fear of action-at-a-distance bugs where one caller mutates a shared reference and breaks another caller. Equality is by value, not identity, which matches how the domain thinks: “these are the same address” is a claim about the content, not the memory location. If you make value objects mutable, you reintroduce the class of bugs immutability was designed to prevent: one piece of code changes an Address field on a shared reference, and three unrelated features suddenly break because they were relying on the original value. In the broader microservices architecture, immutable value objects map cleanly to message payloads: you can serialize them into a Kafka event and know they will survive the round trip intact. Tradeoff: every “modification” allocates a new object. For hot paths with millions of operations per second, this can add GC pressure, but in practice this is almost never the bottleneck in a business application.
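A dedicated Money class along the lines described above could be sketched with a frozen dataclass and integer cents. The field names and the add method are illustrative choices, not a prescribed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    amount_cents: int  # integer minor units avoid floating-point rounding
    currency: str

    def __post_init__(self) -> None:
        if not isinstance(self.amount_cents, int):
            raise TypeError("amount_cents must be an int")

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        # A "modification" returns a new object; the originals never change.
        return Money(self.amount_cents + other.amount_cents, self.currency)


ten = Money(1000, "USD")
also_ten = Money(1000, "USD")
print(ten == also_ten)   # equality is by value
print(ten.add(also_ten))
```

The frozen=True flag makes attribute assignment raise an error, and the generated __eq__ compares field values, which gives both properties of a value object with little code.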
Domain Events
Events that indicate something important happened in the domain.
Why Events Decouple the Future From the Past
Domain events are the mechanism that allows cross-aggregate and cross-service consistency without tight coupling. When an Order is submitted, the Order aggregate does not need to know that Inventory Service will reserve stock, that Email Service will send a confirmation, that Analytics will record a funnel step, and that Recommendations will update the user’s history. It just publishes OrderSubmittedEvent and the interested services subscribe to it. This is the essence of dependency inversion at the architecture level.
If you skip domain events and instead have Order Service directly call Inventory, Email, Analytics, and Recommendations, you have coupled one change (submitting an order) to four downstream systems. Adding a fifth subscriber means modifying Order Service. In the broader microservices architecture, events are what enables true independent evolution — a new team can start consuming OrderSubmittedEvent tomorrow and the Order team never knows. Tradeoff: events are asynchronous and eventually consistent, so you cannot assume all subscribers have processed an event before the next one arrives. Design for idempotency and out-of-order delivery.
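The publish/subscribe relationship and the idempotency advice above can be sketched with an in-process event bus. In a real deployment the bus would be a message broker; InMemoryEventBus, InventoryReservationHandler, and the event_id field are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OrderSubmittedEvent:
    event_id: str  # unique per event, used for deduplication
    order_id: str


class InMemoryEventBus:
    """Stand-in for a broker: the publisher never sees its subscribers."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: object) -> None:
        for handler in self._handlers[type(event)]:
            handler(event)


class InventoryReservationHandler:
    """Idempotent subscriber: a redelivered event is processed only once."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.reserved_orders: list[str] = []

    def __call__(self, event: OrderSubmittedEvent) -> None:
        if event.event_id in self._seen:
            return  # duplicate delivery, ignore
        self._seen.add(event.event_id)
        self.reserved_orders.append(event.order_id)
```

Adding another subscriber is a call to subscribe in the consuming code; nothing on the publishing side changes.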
Mapping DDD to Microservices
From Bounded Context to Service
Service Structure Based on DDD
A typical folder structure for a DDD microservice follows a layered architecture in the hexagonal (ports-and-adapters) or onion style. The four layers (domain, application, infrastructure, api) exist so that business logic stays pure and independent of technical details. The domain layer knows nothing about HTTP, PostgreSQL, or RabbitMQ; it just describes what an Order is and how it behaves. This separation pays off when you need to swap Postgres for DynamoDB, or REST for gRPC: only the infrastructure and api layers change. If you skip this structure and mix business logic with HTTP controllers or ORM queries, you get the “fat controller” anti-pattern where a 500-line route handler does validation, database work, external calls, and business rules all in one place. In the broader microservices architecture, layered structure is what makes unit-testing the business logic feasible without spinning up a test database, which in turn enables fast CI pipelines. Tradeoff: the directory count is larger, and junior engineers initially find the indirection confusing (“why do I need an interface AND an implementation?”). The payoff comes around month six, when refactoring the infrastructure does not ripple through business code. In a Python project, __init__.py files make the layers importable.
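To make the “interface AND implementation” split concrete, here is a minimal single-file sketch of three of the layers. In a real service each section would sit in its own package; the file paths in the comments and all class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


# --- domain layer (hypothetical path: domain/order.py) ---
@dataclass
class Order:
    order_id: str
    status: str = "draft"

    def submit(self) -> None:
        if self.status != "draft":
            raise ValueError("only draft orders can be submitted")
        self.status = "submitted"


class OrderRepository(Protocol):
    """Port: the domain states what it needs, not how it is stored."""

    def get(self, order_id: str) -> Optional[Order]: ...

    def save(self, order: Order) -> None: ...


# --- application layer (hypothetical path: application/submit_order.py) ---
class SubmitOrder:
    def __init__(self, orders: OrderRepository) -> None:
        self._orders = orders

    def execute(self, order_id: str) -> None:
        order = self._orders.get(order_id)
        if order is None:
            raise LookupError(order_id)
        order.submit()            # business rule lives in the domain
        self._orders.save(order)  # persistence lives behind the port


# --- infrastructure layer (hypothetical path: infrastructure/memory_repo.py) ---
class InMemoryOrderRepository:
    """Adapter: could be replaced by a Postgres or DynamoDB implementation."""

    def __init__(self) -> None:
        self._rows: dict[str, Order] = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order
```

The use case depends only on the OrderRepository protocol, so a unit test can pass the in-memory adapter and never touch a database.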
Anti-Corruption Layer Example
Use an Anti-Corruption Layer (ACL) when integrating with external systems that have a different model. The ACL is like a professional translator at a diplomatic meeting: each side speaks their own language, and the translator ensures the meaning is preserved without either side needing to learn the other’s language. Without this layer, external model changes ripple through your entire domain.
When to Reach for an ACL (and When Not To)
The ACL is one of the most misused patterns in DDD. Teams either over-apply it (wrapping every external API in a translator “just in case”) or under-apply it (letting Stripe’s field names leak into their domain classes). The right question to ask is: “If this external system doubled the number of fields it returns or renamed half of them, how many files in my codebase would need to change?” If the answer is “every file that deals with payments,” you need an ACL. If the answer is “just the one client class,” you are already fine. Without an ACL when you need one, external model changes become an existential threat. Stripe releases v3 of their API, half your codebase breaks, and you spend two weeks untangling “charge_id” vs “payment_intent_id” across 40 files. Within the broader architecture, ACLs are what allow generic-subdomain vendors to be swapped with acceptable effort: migrating from Stripe to Adyen stops being a project and becomes a task. Tradeoff: an ACL is extra code, an extra abstraction, and sometimes an extra allocation per call. For truly stable, well-designed external APIs (like Stripe itself, in most cases), conforming directly is often better than adding a layer for its own sake.
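A translator of this kind might be sketched as follows. The payload keys (txn_ref, amount_minor, ccy, state) belong to a made-up gateway and are deliberately not Stripe’s or any real vendor’s field names; the Payment model and PaymentGatewayTranslator are likewise assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class PaymentStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass(frozen=True)
class Payment:
    """Domain model: named in the domain's own language."""

    payment_id: str
    amount_cents: int
    currency: str
    status: PaymentStatus


class PaymentGatewayTranslator:
    """ACL: the only class that knows the external field names."""

    _STATUS_MAP = {
        "succeeded": PaymentStatus.COMPLETED,
        "processing": PaymentStatus.PENDING,
        "declined": PaymentStatus.FAILED,
    }

    def to_domain(self, payload: dict) -> Payment:
        try:
            status = self._STATUS_MAP[payload["state"]]
        except KeyError as exc:
            # Missing or unknown state: fail loudly at the boundary.
            raise ValueError(f"unrecognized gateway payload: {exc}") from exc
        return Payment(
            payment_id=payload["txn_ref"],
            amount_cents=int(payload["amount_minor"]),
            currency=payload["ccy"].upper(),
            status=status,
        )
```

If the vendor renames a field or the vendor itself is replaced, only this translator changes; the rest of the domain keeps working with Payment.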
Event Storming Workshop
Event Storming is a workshop technique to discover bounded contexts.
Steps
Event Storming Result for E-Commerce
Interview Questions
Q1: What is a Bounded Context?
- Same term can mean different things in different contexts (e.g., “Product” in Catalog vs Sales)
- Each context has its own models, rules, and language
- Maps naturally to microservices boundaries
- Reduces complexity by limiting scope
Q2: What is an Aggregate?
- Reference other aggregates by ID only
- Keep aggregates small
- Changes within aggregate are atomic (consistency boundary)
- Cross-aggregate consistency is eventual (via events)
Q3: What is an Anti-Corruption Layer?
- Integrating with third-party APIs (Stripe, SendGrid)
- Communicating with legacy systems
- When upstream model doesn’t fit your domain
Q4: How do you identify service boundaries using DDD?
- Event Storming: Workshop to discover domain events
- Identify Bounded Contexts: Group related events and concepts
- Define Aggregates: What data changes together?
- Map Context Relationships: Customer-Supplier, ACL, etc.
- Each Bounded Context → Microservice
- High cohesion within the service
- Loose coupling between services
- Clear ownership of data
- Matches team structure
Summary
Key Takeaways
- DDD provides the framework for microservice boundaries
- Bounded contexts map to microservices
- Aggregates define data ownership
- Domain events enable loose coupling
- ACL protects from external model pollution
Interview Deep-Dive
'You are designing service boundaries for a new e-commerce platform. Two teams disagree: one wants a single OrderService, the other wants separate CartService and OrderService. How do you resolve this?'
'Explain what an Anti-Corruption Layer is and give me a real scenario where you would use one versus just conforming to the upstream API.'
'How would you run an Event Storming session for a system you know nothing about, and what would you do with the output?'