Domain-Driven Design (DDD) is the foundation for properly decomposing systems into microservices. Without DDD, you’ll likely end up with a distributed monolith.

Think of DDD like city planning. A city is not organized by building material (all brick buildings here, all glass buildings there); it is organized by purpose (residential district, commercial district, industrial zone). Each district has its own rules, its own vocabulary, and its own governance. A “permit” means something different in the building department than in the parking department. DDD gives you the same clarity for software: organize by business purpose, not by technical layer, and accept that the same word can mean different things in different contexts.
The subdomain classification exercise is not an academic formality; it is a direct input to your staffing and architecture decisions. Core domains get your best engineers, the most architectural investment, and the largest test suites. Supporting domains get pragmatic “good enough” solutions. Generic domains get outsourced to a vendor via a thin integration layer. If you skip this step, you end up treating all subdomains as equally important, which means you either under-invest in what differentiates your business, over-invest in commodity concerns, or both.

Within the broader microservices architecture, subdomain classification tells you which services deserve to be separate microservices (likely your core and most supporting domains) and which should be thin wrappers around third-party SaaS (your generic domains). The tradeoff to watch: classifications drift over time. What is generic today can become core when your business pivots, and what is core today can become commodity when the industry catches up. Revisit these classifications at least annually.
Node.js

    // Domain Classification Example
    // This classification drives your investment strategy. Getting it wrong is expensive:
    // building a custom email service (generic domain) wastes engineering months,
    // while outsourcing your pricing engine (core domain) hands your competitive
    // advantage to a vendor.
    const domains = {
      core: {
        // Build in-house with best resources. This is where 60-70% of engineering
        // effort should go. These services deserve the best engineers, the most
        // thorough testing, and the most sophisticated architecture.
        recommendationEngine: 'Custom ML pipeline',
        pricingEngine: 'Dynamic pricing algorithm',
        searchRelevance: 'Custom ranking algorithm'
      },
      supporting: {
        // Build in-house, standard quality. These need to work reliably but do not
        // need to be world-class. A "good enough" order management system is fine --
        // nobody switches e-commerce platforms because the order status page is 50ms faster.
        orderManagement: 'Custom service',
        inventoryTracking: 'Custom service',
        customerReviews: 'Custom service'
      },
      generic: {
        // Buy or integrate. Every hour spent building a custom email sender
        // is an hour NOT spent on your core domain. Use third-party services
        // behind an Anti-Corruption Layer so you can swap vendors later.
        emailService: 'SendGrid',
        paymentProcessing: 'Stripe',
        authentication: 'Auth0',
        analytics: 'Segment + Amplitude'
      }
    };

    // Production pitfall: Domain classification is not permanent. What starts as a generic
    // or supporting domain can become core as your business evolves. Netflix's streaming
    // infrastructure was once a supporting concern (the core was content licensing). When
    // streaming became the product, that infrastructure moved to core. Re-evaluate
    // classifications annually.

Python

    # Domain Classification Example
    # This classification drives your investment strategy. Getting it wrong is expensive:
    # building a custom email service (generic domain) wastes engineering months,
    # while outsourcing your pricing engine (core domain) hands your competitive
    # advantage to a vendor.
    from enum import Enum

    from pydantic import BaseModel


    class DomainType(str, Enum):
        CORE = "core"
        SUPPORTING = "supporting"
        GENERIC = "generic"


    class SubDomain(BaseModel):
        name: str
        implementation: str
        domain_type: DomainType


    domains: list[SubDomain] = [
        # Build in-house with best resources. This is where 60-70% of engineering
        # effort should go. These services deserve the best engineers, the most
        # thorough testing, and the most sophisticated architecture.
        SubDomain(name="recommendation_engine", implementation="Custom ML pipeline", domain_type=DomainType.CORE),
        SubDomain(name="pricing_engine", implementation="Dynamic pricing algorithm", domain_type=DomainType.CORE),
        SubDomain(name="search_relevance", implementation="Custom ranking algorithm", domain_type=DomainType.CORE),
        # Build in-house, standard quality. These need to work reliably but do not
        # need to be world-class. A "good enough" order management system is fine --
        # nobody switches e-commerce platforms because the order status page is 50ms faster.
        SubDomain(name="order_management", implementation="Custom service", domain_type=DomainType.SUPPORTING),
        SubDomain(name="inventory_tracking", implementation="Custom service", domain_type=DomainType.SUPPORTING),
        SubDomain(name="customer_reviews", implementation="Custom service", domain_type=DomainType.SUPPORTING),
        # Buy or integrate. Every hour spent building a custom email sender
        # is an hour NOT spent on your core domain. Use third-party services
        # behind an Anti-Corruption Layer so you can swap vendors later.
        SubDomain(name="email_service", implementation="SendGrid", domain_type=DomainType.GENERIC),
        SubDomain(name="payment_processing", implementation="Stripe", domain_type=DomainType.GENERIC),
        SubDomain(name="authentication", implementation="Auth0", domain_type=DomainType.GENERIC),
        SubDomain(name="analytics", implementation="Segment + Amplitude", domain_type=DomainType.GENERIC),
    ]

    # Production pitfall: Domain classification is not permanent. What starts as a generic
    # or supporting domain can become core as your business evolves. Netflix's streaming
    # infrastructure was once a supporting concern (the core was content licensing). When
    # streaming became the product, that infrastructure moved to core. Re-evaluate
    # classifications annually.
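The "thin integration layer" for a generic domain can be sketched as a small port plus a vendor adapter. This is a minimal illustration, not a real vendor integration: `EmailMessage`, `EmailSender`, `VendorEmailAdapter`, `FakeVendorClient`, and the payload keys are all hypothetical names invented here, and no actual SendGrid API is shown.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EmailMessage:
    """Our own model of an email; no vendor field names leak into it."""
    to: str
    subject: str
    body: str


class EmailSender(Protocol):
    """The port the rest of the codebase depends on."""
    def send(self, message: EmailMessage) -> str: ...


class FakeVendorClient:
    """Stand-in for a vendor SDK (hypothetical; not a real vendor API)."""
    def __init__(self) -> None:
        self.sent: list[dict] = []

    def post_mail(self, payload: dict) -> dict:
        self.sent.append(payload)
        return {"message_id": f"msg-{len(self.sent)}"}


class VendorEmailAdapter:
    """Anti-Corruption Layer: translates our model into the vendor payload."""
    def __init__(self, client: FakeVendorClient) -> None:
        self._client = client

    def send(self, message: EmailMessage) -> str:
        payload = {
            "recipient": message.to,
            "title": message.subject,
            "content": message.body,
        }
        return self._client.post_mail(payload)["message_id"]


def send_welcome(sender: EmailSender, address: str) -> str:
    """Application code sees only the EmailSender port."""
    return sender.send(EmailMessage(address, "Welcome", "Thanks for signing up."))
```

Swapping vendors then means writing a second adapter; callers such as `send_welcome` stay untouched.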
Caveats & Common Pitfalls — Bounded Context Violations
Sharing a single “Product” class across services. This is the most common first mistake. Catalog, Sales, and Shipping each conceptualize “Product” differently. Sharing one class forces all three teams to coordinate on every change. Three months in, the class has 80 fields, only 20 of which any given consumer uses.
Context boundaries drawn along technical layers instead of business domains. “UserAPI service,” “UserDB service,” “UserCache service” — you now have three services where one belongs. Split by business capability (user identity, user preferences, authentication) if anything.
Boundaries based on what exists today, not what should exist. A legacy database table is not a bounded context. Do not model your services to mirror the legacy schema; model them to mirror the business domain.
Skipping the domain-expert conversation entirely. Engineers drawing context boundaries from the code alone will preserve existing coupling, because the code reflects the coupling. You need business stakeholders in the room.
Solutions & Patterns — Finding Real Context Boundaries

Use the “same word, different meaning” test. If two parts of the business use the same word but mean different things by it, you have found a context boundary.

Decision rule: Before drawing a context boundary, you must be able to cite three specific cases where the same domain term (e.g., “Order”, “Customer”, “Product”) means materially different things on either side of the boundary. If you cannot, the “boundary” is fictional.

Before/after example (e-commerce):

Before: One giant “Product” service owned by a shared team. Product team, Shipping team, and Pricing team all contributed PRs. Deploys required coordinating 3 teams; deploy frequency was weekly.

After: Event Storming revealed that “Product” meant three different things:
Catalog Product — SKU, description, images, categories. Owned by Merchandising team.
Sales Product — Price, discount, availability. Owned by Pricing team.
Shipping Product — Weight, dimensions, fragility. Owned by Logistics team.
Each context became its own service with its own team. Product data is replicated at ingestion time (via events), and each service stores only the attributes it cares about. Deploy frequency went from weekly to multiple-per-day per team, and cross-team coordination dropped by ~70%.

Implementation pattern: Use Context Maps (from Eric Evans’ DDD) to document relationships between contexts explicitly: Customer-Supplier, Conformist, Anti-Corruption Layer, Shared Kernel, Published Language.
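To make "each service stores only the attributes it cares about" concrete, here is a rough sketch of one upstream event being projected into three context-specific models. The event shape and every field name are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass

# Hypothetical upstream event; the field names are assumptions.
PRODUCT_CREATED = {
    "sku": "SKU-1",
    "title": "Mug",
    "description": "Ceramic mug",
    "price_cents": 1200,
    "weight_grams": 350,
    "fragile": True,
}


@dataclass(frozen=True)
class CatalogProduct:
    sku: str
    title: str
    description: str


@dataclass(frozen=True)
class SalesProduct:
    product_id: str
    price_cents: int


@dataclass(frozen=True)
class ShippingProduct:
    product_id: str
    weight_grams: int
    is_fragile: bool


def to_catalog(event: dict) -> CatalogProduct:
    """Catalog keeps merchandising data only."""
    return CatalogProduct(event["sku"], event["title"], event["description"])


def to_sales(event: dict) -> SalesProduct:
    """Sales keeps pricing data only."""
    return SalesProduct(event["sku"], event["price_cents"])


def to_shipping(event: dict) -> ShippingProduct:
    """Shipping keeps physical attributes only."""
    return ShippingProduct(event["sku"], event["weight_grams"], event["fragile"])
```

Each translation function would live inside its own service, so a new catalog field never forces a change in Shipping.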
Interview: You join a team where 3 services share a single User table. How do you fix it?
Strong Answer Framework:
Stop the bleeding first. Freeze new writes to User from non-owner services. Any new feature must go through a designated “User Service” API, even if it is temporarily just a thin wrapper over the existing table.
Identify the real bounded contexts. Which services care about authentication? Profile data? Preferences? Billing info? These are likely 2-3 different contexts masquerading as one.
Establish ownership. One team owns User. Other services become consumers via API, not DB clients.
Introduce a read API. Services that currently SELECT from the User table now call GET /users/:id. Start with caching heavily to absorb the latency hit.
Migrate writes using Change Data Capture. Debezium streams changes from the legacy User table; the new User Service gradually takes over writes while CDC keeps legacy in sync.
Split the table by context. Once writes are centralized, extract user-profile data, user-preferences data, and user-auth data into separate schemas/tables. Each can later become its own service if warranted.
Decommission legacy paths. Delete the old direct-SQL code in the two consumer services once the new API is stable.
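The "thin wrapper" and "read API with heavy caching" steps above might look roughly like this. It is a sketch under assumptions: `UserReadApi`, the injected `load_from_legacy` function, and the TTL value are hypothetical, and the HTTP layer in front of it (the GET /users/:id route) is omitted.

```python
import time
from typing import Callable, Optional


class UserReadApi:
    """Single read path for User data, backed for now by the legacy table."""

    def __init__(
        self,
        load_from_legacy: Callable[[str], Optional[dict]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_legacy
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, dict]] = {}

    def get_user(self, user_id: str) -> Optional[dict]:
        """Return the user, serving from cache while the entry is fresh."""
        now = self._clock()
        hit = self._cache.get(user_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        user = self._load(user_id)
        if user is not None:
            self._cache[user_id] = (now, user)
        return user

    def invalidate(self, user_id: str) -> None:
        """Explicit invalidation, e.g. when a change event for this user arrives."""
        self._cache.pop(user_id, None)
```

Because the loader is injected, the legacy SQL query can later be replaced by the new User Service's own storage without touching consumers.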
Real-World Example: Airbnb (2016-2019). Airbnb’s famous “service-oriented architecture” (SOA) migration started by extracting exactly this kind of shared-table dependency. Their “Identity” extraction took ~18 months and required CDC (via their homegrown SpinalTap tool) to keep legacy and new systems in sync during the migration. Jessica Tai’s QCon talks (2018-2019) describe the detailed playbook.

Senior Follow-up Questions:
“What if two services ‘really need’ the same field atomically?” They probably belong in the same context, or one of them is computing a denormalized view that should be refreshed via events. Check whether the “need” is actually real-time (rare) or just “close enough” (usually).
“How do you handle the latency hit of switching from SQL JOINs to API calls?” Three tools: (1) API composition at the gateway for read-heavy paths, (2) materialized views in the consuming service updated via events, (3) aggressive caching with explicit invalidation. For high-cardinality reads, option 2 is almost always the right answer.
“What if the migration takes 18 months and leadership wants faster results?” Show the risk of not doing it: every quarter, the cost of the shared table compounds (more services couple to it, harder to untangle). Propose a 3-month milestone: one service fully migrated, with measurable impact on deploy frequency and incident rate.
Common Wrong Answers:
“Just add a view layer in the DB so each service sees its own ‘User’ view.” Fails because the coupling is still at the DB schema layer. Schema changes still ripple through all consumers.
“Use distributed transactions (2PC) for cross-service writes.” Fails because 2PC is a known anti-pattern for microservices — it introduces single points of failure and massive latency.
Further Reading:
Jessica Tai (Airbnb), “Migrating to SOA at Airbnb” (QCon, 2018).
Scott W. Ambler and Pramod J. Sadalage, “Refactoring Databases” (Addison-Wesley, 2006), part of the Martin Fowler Signature Series.
Debezium docs on CDC-based migration patterns.
Interview: Two teams want to share a single 'Order' model across Order Service and Shipping Service. Why is this a bad idea, and what's the alternative?
Strong Answer Framework:
Diagnose the underlying want. What do they actually need to share — the full Order model or a specific view of it (e.g., shipping address, weight, items)?
Apply the Single Responsibility Principle at the context level. Order Service owns the commercial concerns (customer, total, status). Shipping Service owns the fulfillment concerns (address, carrier, tracking).
Propose a Published Language. Order Service publishes OrderShippable events containing only the fields Shipping needs. This becomes the contract.
Explicitly reject the Shared Kernel pattern here. Shared Kernel (a small shared model between two contexts) is reserved for cases where both teams are closely aligned and the shared model is stable. Order and Shipping are typically separate teams with different release cadences — bad Shared Kernel candidates.
Implement an Anti-Corruption Layer on Shipping’s side. Shipping consumes OrderShippable events and translates them into its internal Shipment model.
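A rough sketch of the Published Language plus the Anti-Corruption Layer on Shipping's side follows. The fields of `OrderShippable`, the `Shipment` model, and the `MAX_PARCEL_GRAMS` limit are assumptions chosen for illustration, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderShippable:
    """Published Language: only what Shipping needs (field names are assumptions)."""
    order_id: str
    address: str
    total_weight_grams: int


@dataclass
class Shipment:
    """Shipping's internal model; Order Service never sees it."""
    order_ref: str
    destination: str
    parcel_count: int
    weight_kg: float
    status: str = "PENDING"


MAX_PARCEL_GRAMS = 20_000  # hypothetical carrier limit per parcel


def shipment_from_event(event: OrderShippable) -> Shipment:
    """Anti-Corruption Layer: translate the published event into a Shipment."""
    # Ceiling division without floats; always ship at least one parcel.
    parcels = max(1, -(-event.total_weight_grams // MAX_PARCEL_GRAMS))
    return Shipment(
        order_ref=event.order_id,
        destination=event.address,
        parcel_count=parcels,
        weight_kg=event.total_weight_grams / 1000,
    )
```

Commercial fields such as customer or total never appear in the event, so Order Service can change them without Shipping noticing.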
Real-World Example: Amazon’s order/fulfillment split (2010s-present). Amazon explicitly separates “Order” (the commercial record) from “Shipment” (the fulfillment entity). When you buy 3 items, you get one Order but potentially multiple Shipments, each with its own tracking number. The bounded contexts are different because the lifecycles differ: Order is closed when payment clears; Shipments close when delivered, which might be days later.

Senior Follow-up Questions:
“What if ‘OrderShippable’ events contain 90% of the Order model anyway?” That is a signal the two contexts are more coupled than you think. Re-examine whether the boundary is in the right place, or whether the events are over-fetching. Published Languages should contain only what the consumer truly needs.
“How do you version the Published Language?” Additive changes (new optional fields) are safe. Breaking changes require schema registry enforcement (Confluent Schema Registry, Apicurio) and dual-publishing both versions during migration windows.
“What if Shipping needs to know an Order was cancelled?” That is a separate event, OrderCancelled, and Shipping subscribes to it. The events are the coordination mechanism, not a shared model.
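The claim that additive changes are safe relies on consumers reading tolerantly. One way this might look on Shipping's side is sketched below; the `OrderCancelled` fields and the `parse_order_cancelled` helper are hypothetical, and a real deployment would typically validate against the registered schema instead of a dataclass.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class OrderCancelled:
    order_id: str
    # Assumed to be added in a later schema version; optional so that
    # events from older producers still parse.
    reason: Optional[str] = None


def parse_order_cancelled(payload: dict[str, Any]) -> OrderCancelled:
    """Tolerant reader: ignore unknown keys, default missing optional ones."""
    known = {f.name for f in fields(OrderCancelled)}
    return OrderCancelled(**{k: v for k, v in payload.items() if k in known})
```

A payload missing the required `order_id` still fails (with a `TypeError`), which is the intended behavior for a breaking change.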
Common Wrong Answers:
“Just put the Order class in a shared library.” Fails because every Order schema change now requires both teams to upgrade the library in lockstep. Deploy independence is lost.
“Use a shared database view.” Fails because DB-level coupling is the worst kind — you cannot change the schema without breaking consumers invisibly.
Further Reading:
Eric Evans, Domain-Driven Design (Addison-Wesley, 2003), Chapter 14 (“Maintaining Model Integrity”), which covers context mapping.
Each bounded context has its own vocabulary that everyone (developers and business) uses. This is not just a naming convention — it is a communication discipline. When the code says publish() and the business says “make it live,” you have a language gap that will cause bugs. Ubiquitous language means the code reads like the business speaks, within each context.
New developers often see three “Product” classes and assume it is code duplication that should be consolidated. That instinct is wrong and it is the most common way teams destroy bounded contexts. The Catalog Product, Sales Product, and Shipping Product share a name, but they represent fundamentally different concerns. The Catalog team cares about merchandising (descriptions, images, categories). The Sales team cares about money (prices, discounts, promotions). The Shipping team cares about physics (weight, dimensions, fragility). Merging these into one “Product” class forces the catalog team to coordinate with shipping every time they add a new image size, and forces shipping to coordinate with sales every time they change the weight unit.

If you ignore this principle and share a single Product model across contexts, you will build the most coupling-heavy class in your system. Everyone will depend on it, and no one will own it. In the broader microservices architecture, ubiquitous language per context is what makes independent team ownership possible: each team owns their model, their vocabulary, and their evolution. The tradeoff to watch: you will have data duplication and translation overhead at context boundaries. That cost is the price of independence, and it is almost always worth paying.
Node.js

    // Catalog Context - Language
    class CatalogProduct {
      sku: string;
      title: string;
      description: string;
      categories: Category[];
      attributes: ProductAttribute[];
      images: Image[];

      // "Publishing" a product means making it visible
      publish(): void { /* ... */ }

      // "Archiving" removes from catalog
      archive(): void { /* ... */ }
    }

    // Sales Context - Different Language
    class SalesProduct {
      productId: string;
      basePrice: Money;
      currentPrice: Money;
      discount?: Discount;

      // "Adding to cart" is meaningful here
      addToCart(cart: Cart, quantity: number): void { /* ... */ }

      // "Applying promotion" is sales concept
      applyPromotion(promo: Promotion): Money { /* ... */ }
    }

    // Shipping Context - Yet Another Language
    class ShippingProduct {
      productId: string;
      weight: Weight;
      dimensions: Dimensions;
      isFragile: boolean;
      requiresSignature: boolean;

      // "Calculating shipping" makes sense here
      calculateShippingCost(destination: Address): Money { /* ... */ }
    }
    ┌─────────────────┐     ┌─────────────────┐
    │    Context A    │     │    Context B    │
    │                 │     │                 │
    │        ┌───────────────────────┐        │
    │        │     SHARED KERNEL     │        │
    │        │     (Shared Code)     │        │
    │        └───────────────────────┘        │
    │                 │     │                 │
    └─────────────────┘     └─────────────────┘

Use when: Teams are closely aligned, small shared model
Caution: Changes require coordination
Upstream supplies data, downstream consumes.
    ┌─────────────────┐     ┌─────────────────┐
    │    SUPPLIER     │────▶│    CUSTOMER     │
    │   (Upstream)    │     │  (Downstream)   │
    │                 │     │                 │
    │  Provides API   │     │  Consumes API   │
    │  Sets contract  │     │  Adapts to it   │
    └─────────────────┘     └─────────────────┘

Use when: Clear dependency direction
Example: Order Service → Inventory Service
Downstream fully conforms to upstream model.
    ┌─────────────────┐     ┌─────────────────┐
    │    UPSTREAM     │────▶│   CONFORMIST    │
    │   (Dominant)    │     │  (Subordinate)  │
    │                 │     │                 │
    │   No changes    │     │  Accepts model  │
    │ for downstream  │     │      as-is      │
    └─────────────────┘     └─────────────────┘

Use when: Upstream won't change (external API, legacy)
Example: Using Stripe's payment model exactly
Downstream creates translation layer.
    ┌─────────────────┐     ┌─────────────────┐
    │    UPSTREAM     │────▶│       ACL       │
    │   (External)    │     │  ┌──────────┐   │
    │                 │     │  │Translator│   │
    │  Foreign model  │     │  └──────────┘   │
    │                 │     │    Our Model    │
    └─────────────────┘     └─────────────────┘

Use when: External system has undesirable model
Example: Legacy system integration
Caveats & Common Pitfalls — The Shared Kernel Trap

Shared Kernel (two teams sharing a small piece of domain code) is the most dangerous pattern in the DDD context-mapping toolkit. It looks like reasonable DRY; in practice, it is a landmine.
It requires both teams to coordinate on every change. You cannot change the Shared Kernel without alignment, which destroys team autonomy — the whole point of bounded contexts.
It drifts toward the kitchen sink. “While we’re sharing this, let’s also share X and Y.” A Shared Kernel is supposed to be tiny (maybe a handful of value objects); within a year it is 5000 LOC of shared types that no one team owns.
Version skew becomes a compatibility matrix. Team A uses v1.3; Team B uses v1.5. They disagree on what Money.add() returns. Bugs appear at the integration boundary and are painful to trace.
It’s frequently introduced for convenience, not strategic reasons. The legitimate use case for Shared Kernel is very narrow: two tightly-aligned teams working on closely-related contexts, with a stable shared model. Most “Shared Kernel” implementations in the wild are violating this.
Solutions & Patterns — Published Language Instead Of Shared Kernel

Prefer a Published Language (a well-documented schema, typically a Protobuf / Avro / JSON Schema definition) over a Shared Kernel (shared code).

Decision rule: Shared schema, not shared code. Each team owns their own implementation of the schema; the schema is the contract.

Before/after example:

Before (Shared Kernel anti-pattern): Team A (Orders) and Team B (Billing) both import @company/core-domain with a shared Invoice class. Any change to Invoice requires coordinated deploys. Over 18 months, the shared lib grew to 12,000 LOC and no team fully owns it.

After (Published Language): Teams agree on an InvoiceCreated event schema (Protobuf) managed in a schema registry. Orders produces events conforming to the schema. Billing consumes and translates into its own internal Invoice model via an ACL. Schema changes are backwards-compatible by default; breaking changes go through a deprecation cycle with dual-publishing.

Implementation: Use Confluent Schema Registry or Apicurio to enforce contract compatibility. Consumers validate incoming messages against the registered schema. Publishers cannot register incompatible changes without explicit override.

Rare legitimate Shared Kernel use cases (use sparingly):
Identity value objects shared across closely-aligned teams (e.g., a shared UserId type).
Generic utilities with no domain semantics (e.g., a Money class used by 10 teams where currency rules are truly universal).
Even in these cases, treat the shared module as a product: explicit owner, semver, deprecation policy, changelogs.
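The idea that publishers cannot register incompatible changes can be illustrated with a toy compatibility check. This is deliberately simplified and is not how Confluent Schema Registry or Apicurio implement their rules; the `Schema` shape and the three rules in `breaking_changes` are assumptions for illustration only.

```python
# A schema here is just: field name -> {"type": str, "required": bool}.
Schema = dict[str, dict]


def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List the ways `new` would break parties still using `old`."""
    problems: list[str] = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"changed type: {name}")
    for name, spec in new.items():
        if name not in old and spec.get("required", False):
            problems.append(f"new required field: {name}")
    return problems


def register(old: Schema, new: Schema, allow_breaking: bool = False) -> Schema:
    """Reject incompatible schemas unless the publisher explicitly overrides."""
    problems = breaking_changes(old, new)
    if problems and not allow_breaking:
        raise ValueError("; ".join(problems))
    return new
```

Adding an optional field passes; removing a field or adding a required one is rejected unless `allow_breaking=True`, mirroring the "explicit override" described above.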
An aggregate is a cluster of domain objects treated as a single unit. Think of an aggregate like a legal document with attachments: you can reference the document by its case number (the aggregate root ID), but you cannot independently modify an attachment without going through the document. The document enforces the rules for what attachments are valid, and when you file the document, all attachments are filed atomically.

The key trade-off: smaller aggregates mean better concurrency (fewer conflicts) but more eventual consistency between them. Larger aggregates mean stronger consistency guarantees but worse performance under concurrent writes. In practice, most teams start with aggregates that are too large and gradually split them as contention becomes visible.
Aggregates are the most practically important tactical pattern in DDD, because they define your transaction boundary. Everything inside an aggregate is consistent together, atomically, within a single database transaction. Everything outside the aggregate is eventually consistent, via domain events. That one rule, “one aggregate, one transaction,” is what prevents the accidental distributed transactions that wreck microservices architectures.

If you ignore aggregate boundaries and let external code modify OrderItems directly without going through the Order root, you lose the ability to enforce invariants. The rule “an order cannot be modified after submission” disappears the moment someone can UPDATE order_items SET quantity = 5 WHERE order_id = ... without asking the Order aggregate. Within the broader architecture, aggregates map directly to the “consistency guarantees” column of your service contract: within-aggregate changes are atomic, cross-aggregate changes are eventually consistent. Tradeoff: large aggregates create lock contention under high concurrency (two customers trying to update the same order’s items will serialize), so the practical advice is to start small and combine only when you actually need the invariants.
Node.js

    // Order Aggregate
    // The Order is the Aggregate Root -- the only entry point for external code.
    // External services reference this by OrderId, never by reaching into OrderItem directly.
    class Order {
      private _id: OrderId;
      private _customerId: CustomerId;
      private _items: OrderItem[];       // Part of aggregate
      private _status: OrderStatus;
      private _shippingAddress: Address; // Value Object

      // Only the Aggregate Root is referenced from outside
      // Other objects (OrderItem) are internal
      constructor(customerId: CustomerId, shippingAddress: Address) {
        this._id = new OrderId();
        this._customerId = customerId;
        this._items = [];
        this._status = OrderStatus.DRAFT;
        this._shippingAddress = shippingAddress;
      }

      // Business logic lives in the aggregate
      addItem(productId: ProductId, quantity: number, price: Money): void {
        if (this._status !== OrderStatus.DRAFT) {
          throw new Error('Cannot modify non-draft order');
        }
        const existingItem = this._items.find(i => i.productId.equals(productId));
        if (existingItem) {
          existingItem.increaseQuantity(quantity);
        } else {
          this._items.push(new OrderItem(productId, quantity, price));
        }
      }

      removeItem(productId: ProductId): void {
        if (this._status !== OrderStatus.DRAFT) {
          throw new Error('Cannot modify non-draft order');
        }
        this._items = this._items.filter(i => !i.productId.equals(productId));
      }

      submit(): DomainEvent[] {
        // Guard against double submission, which would emit the event twice
        if (this._status !== OrderStatus.DRAFT) {
          throw new Error('Cannot submit non-draft order');
        }
        if (this._items.length === 0) {
          throw new Error('Cannot submit empty order');
        }
        this._status = OrderStatus.SUBMITTED;
        return [
          new OrderSubmittedEvent(this._id, this._customerId, this.total())
        ];
      }

      total(): Money {
        return this._items.reduce(
          (sum, item) => sum.add(item.subtotal()),
          Money.zero()
        );
      }
    }

    // OrderItem is part of the Order aggregate (not a separate aggregate)
    class OrderItem {
      constructor(
        public readonly productId: ProductId,
        private _quantity: number,
        public readonly unitPrice: Money
      ) {}

      increaseQuantity(amount: number): void {
        this._quantity += amount;
      }

      subtotal(): Money {
        return this.unitPrice.multiply(this._quantity);
      }
    }

Python

    # Order Aggregate
    # The Order is the Aggregate Root -- the only entry point for external code.
    # External services reference this by OrderId, never by reaching into OrderItem directly.
    # OrderId, CustomerId, ProductId, Money, Address, DomainEvent and OrderSubmittedEvent
    # are assumed to be defined elsewhere in the Order context.
    from __future__ import annotations

    from dataclasses import dataclass
    from enum import Enum


    class OrderStatus(str, Enum):
        DRAFT = "draft"
        SUBMITTED = "submitted"
        PAID = "paid"
        CANCELLED = "cancelled"


    @dataclass
    class OrderItem:
        """Part of the Order aggregate -- not a separate aggregate."""
        product_id: "ProductId"
        quantity: int
        unit_price: "Money"

        def increase_quantity(self, amount: int) -> None:
            self.quantity += amount

        def subtotal(self) -> "Money":
            return self.unit_price.multiply(self.quantity)


    class Order:
        """Aggregate Root. External services reference this by OrderId only."""

        def __init__(self, customer_id: "CustomerId", shipping_address: "Address") -> None:
            self._id: OrderId = OrderId()
            self._customer_id = customer_id
            self._items: list[OrderItem] = []
            self._status = OrderStatus.DRAFT
            self._shipping_address = shipping_address

        # Business logic lives in the aggregate
        def add_item(self, product_id: "ProductId", quantity: int, price: "Money") -> None:
            if self._status != OrderStatus.DRAFT:
                raise ValueError("Cannot modify non-draft order")
            existing = next((i for i in self._items if i.product_id == product_id), None)
            if existing is not None:
                existing.increase_quantity(quantity)
            else:
                self._items.append(OrderItem(product_id, quantity, price))

        def remove_item(self, product_id: "ProductId") -> None:
            if self._status != OrderStatus.DRAFT:
                raise ValueError("Cannot modify non-draft order")
            self._items = [i for i in self._items if i.product_id != product_id]

        def submit(self) -> list["DomainEvent"]:
            # Guard against double submission, which would emit the event twice
            if self._status != OrderStatus.DRAFT:
                raise ValueError("Cannot submit non-draft order")
            if not self._items:
                raise ValueError("Cannot submit empty order")
            self._status = OrderStatus.SUBMITTED
            return [
                OrderSubmittedEvent(
                    order_id=self._id,
                    customer_id=self._customer_id,
                    total=self.total(),
                )
            ]

        def total(self) -> "Money":
            total = Money.zero()
            for item in self._items:
                total = total.add(item.subtotal())
            return total
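The contention tradeoff described earlier (two writers updating the same aggregate) is often handled with optimistic concurrency instead of locks. That technique is not named in the text above, so treat this as one possible approach: the sketch below uses a simplified stand-in `Order` with a `version` field and a hypothetical in-memory repository.

```python
import copy
from dataclasses import dataclass, field


class ConcurrencyError(Exception):
    """Raised when the aggregate changed since it was loaded."""


@dataclass
class Order:
    """Simplified stand-in for the Order aggregate, with a version counter."""
    order_id: str
    items: list[str] = field(default_factory=list)
    version: int = 0  # incremented on every successful save


class InMemoryOrderRepository:
    """Saves the whole aggregate at once: one aggregate, one transaction."""

    def __init__(self) -> None:
        self._rows: dict[str, Order] = {}

    def add(self, order: Order) -> None:
        self._rows[order.order_id] = copy.deepcopy(order)

    def get(self, order_id: str) -> Order:
        # Return a copy so callers cannot mutate stored state directly.
        return copy.deepcopy(self._rows[order_id])

    def save(self, order: Order) -> None:
        stored = self._rows[order.order_id]
        if stored.version != order.version:
            raise ConcurrencyError(f"order {order.order_id} was modified concurrently")
        updated = copy.deepcopy(order)
        updated.version += 1
        self._rows[order.order_id] = updated
        order.version = updated.version
```

The second writer gets a `ConcurrencyError` and can reload and retry. In a SQL store the same check is commonly expressed as an UPDATE conditioned on the version column.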
Caveats & Common Pitfalls — Anemic Domain Models

An anemic domain model is one where entities are just data bags (getters and setters) and all the business logic lives in separate “Service” classes. Martin Fowler famously calls this an anti-pattern, and it is the single most common failure mode in DDD adoption.
OrderService.submitOrder(order) containing all the validation logic. The Order class itself has no behavior; the service class has all the rules. You have just re-implemented Transaction Script with extra ceremony.
Invariants enforced nowhere. Because rules live in services, you can construct an invalid Order (e.g., zero items, negative total) by bypassing the service. The Order class should refuse to exist in an invalid state.
Business logic scattered across many services. The rule “an order cannot be modified after submission” ends up duplicated in 5 places: OrderService, BillingService, ShippingService, AdminService, ReportingService. Each one implements the check slightly differently.
Inability to reason about domain rules locally. A new developer reads the Order class and learns nothing about what an order is. To understand the domain, they must read 10 service classes.
Solutions & Patterns — Rich Aggregates With Invariants

Move business logic into the aggregate. The aggregate is responsible for enforcing its own invariants and protecting its internal state.

Decision rule: Any rule about “when can X happen?” or “what makes an X valid?” belongs inside the X aggregate, not in a service class. Services orchestrate; aggregates enforce.

Before/after example:
    # Anemic (BAD) -- rules live in a service; Order is a dumb data bag
    from decimal import Decimal


    class Order:
        items: list
        status: str
        total: Decimal


    class OrderService:
        def submit(self, order: Order):
            if order.status != "DRAFT":
                raise ValueError("Already submitted")
            if not order.items:
                raise ValueError("Empty order")
            if order.total < 0:
                raise ValueError("Negative total")
            order.status = "SUBMITTED"
            # ... publish event
    # Rich (GOOD) -- Order enforces its own rules; external code cannot bypass them
    class Order:
        def __init__(self, customer_id, shipping_address):
            self._id = OrderId()
            self._items = []
            self._status = OrderStatus.DRAFT
            self._customer_id = customer_id
            self._shipping_address = shipping_address

        def add_item(self, product_id, quantity, price):
            if self._status != OrderStatus.DRAFT:
                raise InvalidOperationError("Cannot modify submitted order")
            if quantity <= 0:
                raise ValueError("Quantity must be positive")
            # ... add item

        def submit(self) -> list[DomainEvent]:
            # Same "already submitted" rule the anemic service checked, now owned by Order
            if self._status != OrderStatus.DRAFT:
                raise InvalidOperationError("Already submitted")
            if not self._items:
                raise InvalidOperationError("Cannot submit empty order")
            self._status = OrderStatus.SUBMITTED
            return [OrderSubmittedEvent(self._id, self.total())]
The difference is that external code cannot construct or mutate an Order into an invalid state. The invariants are enforced at the boundary. This is the foundation of why aggregates are load-bearing — they are the only point in your architecture where you can trust the data.
Interview: Your team has a 'UserService' with 20 methods, all of which take a User data object and return it modified. Is this DDD? What's wrong?
Strong Answer Framework:
Identify the anti-pattern. This is an anemic domain model — all logic is in the service, the User class is just a data bag. It is using DDD vocabulary without the substance.
Explain the hidden cost. Invariants are impossible to enforce because any code can construct or mutate User directly, bypassing UserService.
Refactor toward a rich aggregate. Methods like changeEmail, changePassword, deactivate belong on the User class itself. UserService becomes thin — it coordinates repository, event publishing, and transaction boundaries.
Distinguish domain services from application services. Application services (UserService at the use-case level) coordinate workflows. Domain services (e.g., PasswordHashingService) encapsulate logic that does not belong to a single aggregate but is still domain behavior.
Show the test benefit. Rich aggregates are testable without a database. A unit test of user.changeEmail("new@x.com") runs in microseconds. Testing anemic models requires the whole service layer.
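As a rough sketch of the refactoring and the test benefit described above, here is a small rich `User` aggregate. The specific rules (email normalization, no email change after deactivation) and the `DomainError` type are assumptions made for the example, not rules taken from any particular codebase.

```python
class DomainError(Exception):
    """Raised when an operation would violate a User invariant."""


def _normalized(email: str) -> str:
    """Validate the basic shape of an email and lowercase its domain part."""
    local, sep, domain = email.strip().partition("@")
    if not (local and sep and domain):
        raise DomainError(f"invalid email: {email!r}")
    return f"{local}@{domain.lower()}"


class User:
    """Rich aggregate: state changes only through methods that enforce rules."""

    def __init__(self, user_id: str, email: str) -> None:
        self._id = user_id
        self._email = _normalized(email)  # a User cannot exist with a bad email
        self._active = True

    @property
    def email(self) -> str:
        return self._email

    @property
    def is_active(self) -> bool:
        return self._active

    def change_email(self, new_email: str) -> None:
        if not self._active:
            raise DomainError("deactivated users cannot change email")
        self._email = _normalized(new_email)

    def deactivate(self) -> None:
        self._active = False
```

Testing it needs no database, repository, or service layer: construct a `User`, call a method, and assert on the result.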
Real-World Example: Vaughn Vernon’s IDDD sample code (2013-present). Vernon’s canonical Implementing Domain-Driven Design reference implementation (on GitHub) explicitly shows rich aggregates across identity, collaboration, and agile PM contexts. The User aggregate owns its validation, password rotation rules, and deactivation logic. No “UserService” orchestrates mutations.

Senior Follow-up Questions:
“When is a service layer legitimate, vs. anemic?” An application service is legitimate — it orchestrates a use case (load aggregate, invoke behavior, persist, publish events). A domain service is legitimate when the logic genuinely spans multiple aggregates (e.g., TransferMoney between two Accounts). What is illegitimate is business rules inside these services when they belong on an aggregate.
“What about ORMs that want plain data classes?” Separate the persistence model from the domain model. SQLAlchemy 2.0, JPA, and Entity Framework all support mapping rich domain objects. Or use a repository pattern to translate between the two.
“How do you retrofit rich aggregates into a legacy anemic codebase?” Incrementally. Pick one aggregate. Introduce methods for one specific use case (Order.cancel()). Deprecate the equivalent service method. Repeat. This can take months but is non-disruptive.
Common Wrong Answers:
“It’s fine because we’re using DDD terminology.” Fails because DDD is not a naming convention. The semantic substance (invariants, rich behavior, bounded contexts) matters more than class names.
“We need service classes because of dependency injection.” Fails because DI is compatible with rich aggregates — inject dependencies via constructors into services that use aggregates, not into aggregates themselves.
Further Reading:
Martin Fowler, “AnemicDomainModel” (2003) — the original article.
Greg Young’s talk “7 Reasons Why Your DDD is Failing” (DDDEU 2019).
Interview: How do you handle a business rule that spans two aggregates -- like 'a customer cannot place an order if their credit limit is exceeded'?
Strong Answer Framework:
Identify the type of rule. Rules spanning aggregates should either (a) be eventually consistent via events, or (b) live in a domain service that coordinates the two aggregates within a single transaction.
Prefer eventual consistency where possible. If the rule is “warn after exceeding limit,” let credit-limit-exceeded be an event that Customer subscribes to.
For strict rules, use a domain service. OrderPlacementService.place(order, customer) takes both aggregates (loaded by the caller), checks the invariant, and the result is persisted within a single transaction. Both aggregates remain rich; the service is the coordinator.
Avoid distributed transactions across services. If Customer and Order are separate services, you cannot atomically check limit + create order. You must choose: (a) optimistic placement + async compensation (saga), or (b) real-time query to Customer at placement time.
Document the consistency guarantee explicitly. “Orders are placed optimistically; exceeded-limit cases compensate by cancelling within 5 seconds.” This is a product contract, not a bug.
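For the strict, single-transaction variant, a domain service might look like the sketch below. It assumes both aggregates live in the same service and database; the field names (`credit_limit`, `outstanding_balance`), the exception `CreditLimitExceeded`, and the status strings are hypothetical, and the surrounding transaction is left to the calling application service.

```python
from dataclasses import dataclass
from decimal import Decimal


class CreditLimitExceeded(Exception):
    """Raised when placing the order would push the customer over their limit."""


@dataclass
class Customer:
    customer_id: str
    credit_limit: Decimal
    outstanding_balance: Decimal = Decimal("0")

    def reserve_credit(self, amount: Decimal) -> None:
        # Customer still owns its own invariant.
        if self.outstanding_balance + amount > self.credit_limit:
            raise CreditLimitExceeded(
                f"Customer {self.customer_id} would exceed limit {self.credit_limit}"
            )
        self.outstanding_balance += amount


@dataclass
class Order:
    order_id: str
    customer_id: str  # the other aggregate is referenced by ID only
    total: Decimal
    status: str = "DRAFT"

    def place(self) -> None:
        if self.status != "DRAFT":
            raise ValueError("Only draft orders can be placed")
        self.status = "PLACED"


class OrderPlacementService:
    """Domain service: coordinates two aggregates for exactly one use case."""

    def place(self, order: Order, customer: Customer) -> None:
        if order.customer_id != customer.customer_id:
            raise ValueError("Order does not belong to this customer")
        # Reserve first: if this raises, the order is left untouched.
        customer.reserve_credit(order.total)
        order.place()
```

The service holds no state and no business rule of its own beyond the ordering of the two calls, which is what keeps it from growing into a god-object. If Customer and Order were separate services, this in-process call would have to become a saga or a synchronous query, as described above.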
Real-World Example: Stripe’s fraud rule engine (2018-present). Stripe places charges optimistically (low-latency path) and runs fraud rules asynchronously. When fraud is detected post-placement, they issue a compensating refund. The consistency model is explicit and documented, so merchants build their own flows around it.
Senior Follow-up Questions:
“What if the merchant cannot tolerate eventual consistency?” Offer a synchronous fraud check (RadarRules.evaluate() before charging). Trade latency for stronger guarantees. This is a product-level choice.
“How do you handle failures in the compensation path?” Compensation must be idempotent and retried until successful. Use the transactional outbox pattern to ensure compensation events are never lost.
“What’s the risk of relying on domain services for cross-aggregate rules?” The service can grow to be a god-object that does “everything about orders.” Keep domain services narrow — one use case each. Use application services for orchestration.
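The "idempotent and retried until successful" requirement for compensation can be sketched like this. It is an in-memory illustration under stated assumptions: `CompensationHandler`, `deliver_with_retry`, and the `cancel_order` callable are hypothetical, the processed-ID set would be a durable table in a real system, and the transactional outbox itself is not modeled here.

```python
from typing import Callable


class CompensationHandler:
    """Applies a cancellation at most once per event ID, however often it is delivered."""

    def __init__(self, cancel_order: Callable[[str], None]) -> None:
        self._cancel_order = cancel_order
        self._processed: set[str] = set()  # stand-in for a durable dedup table

    def handle(self, event_id: str, order_id: str) -> bool:
        if event_id in self._processed:
            return False  # duplicate delivery: nothing to do
        # If this raises, the event is NOT marked processed, so redelivery retries it.
        self._cancel_order(order_id)
        self._processed.add(event_id)
        return True


def deliver_with_retry(
    handler: CompensationHandler, event_id: str, order_id: str, max_attempts: int = 3
) -> int:
    """Returns the attempt number that succeeded; raises if every attempt failed."""
    for attempt in range(1, max_attempts + 1):
        try:
            handler.handle(event_id, order_id)
            return attempt
        except ConnectionError:
            continue  # transient failure: try again
    raise RuntimeError("Compensation still failing; the event must stay queued for later")
```

Because a failed attempt may have partially succeeded downstream, `cancel_order` itself also needs to tolerate being called twice for the same order; the dedup set only protects against redelivery after a confirmed success.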
Common Wrong Answers:
“Put the rule in both aggregates.” Fails because now the rule is duplicated in two places, diverges over time, and is impossible to keep in sync.
“Use a distributed transaction across the two services.” Fails because 2PC is a known anti-pattern for microservices — high latency, single point of failure, poor availability.
Further Reading:
Chris Richardson, Microservices Patterns (Manning, 2018), Chapter 4 (Sagas) and Chapter 5 (Business Logic Organization).
Value objects are immutable objects defined by their attributes, not identity. Two Money objects with the same amount and currency are equal: you do not care which specific $10 bill you have, just that you have $10. This is the opposite of entities, where two Users with identical names are still different people (they have different IDs).
Production pitfall: The most common value object mistake is representing money as a plain number. Floating-point arithmetic means 0.1 + 0.2 !== 0.3 in JavaScript (and 0.1 + 0.2 == 0.30000000000000004 in Python). Use integer cents, Python’s Decimal, or a dedicated Money class. This is not a theoretical concern: rounding errors in financial calculations compound across millions of transactions and trigger real reconciliation failures.
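The floating-point pitfall is easy to reproduce. The snippet below is a small demonstration using only the standard library; the variable names are illustrative.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_total = 0.1 + 0.2

# Decimal built from strings keeps exact base-10 values.
decimal_total = Decimal("0.10") + Decimal("0.20")

# Integer minor units (cents) sidestep fractions entirely.
cents_total = 10 + 20

print(float_total == 0.3)                 # False
print(decimal_total == Decimal("0.30"))   # True
print(cents_total == 30)                  # True
```

Note that `Decimal(0.1)` (constructed from a float rather than a string) inherits the float's imprecision, so amounts should enter the system as strings or integers.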
Value objects feel like a small detail but they are load-bearing for domain correctness. When Money, Address, and Quantity are immutable, you can pass them around freely, share them across aggregates, and cache them without fear of action-at-a-distance bugs where one caller mutates a shared reference and breaks another caller. Equality is by value, not identity, which matches how the domain thinks: “these are the same address” is a claim about the content, not the memory location.
If you make value objects mutable, you reintroduce the class of bugs immutability was designed to prevent: one piece of code changes an Address field on a shared reference, and three unrelated features suddenly break because they were relying on the original value. In the broader microservices architecture, immutable value objects map cleanly to message payloads: you can serialize them into a Kafka event and know they will survive the round trip intact. Tradeoff: every “modification” allocates a new object. For hot paths with millions of operations per second, this can add GC pressure, but in practice this is almost never the bottleneck in a business application.
Node.js
// Value Object: Money
// Immutable, compared by value, and enforces domain rules (no negative amounts).
// Notice Object.freeze() -- this prevents accidental mutation, which is critical
// when the same Money instance is shared across multiple order items.
class Money {
  constructor(
    private readonly amount: number,
    private readonly currency: string
  ) {
    if (amount < 0) throw new Error('Amount cannot be negative');
    Object.freeze(this);
  }

  add(other: Money): Money {
    if (this.currency !== other.currency) {
      throw new Error('Cannot add different currencies');
    }
    return new Money(this.amount + other.amount, this.currency);
  }

  multiply(factor: number): Money {
    return new Money(this.amount * factor, this.currency);
  }

  equals(other: Money): boolean {
    return this.amount === other.amount && this.currency === other.currency;
  }

  static zero(currency: string = 'USD'): Money {
    return new Money(0, currency);
  }
}

// Value Object: Address
class Address {
  constructor(
    public readonly street: string,
    public readonly city: string,
    public readonly state: string,
    public readonly zipCode: string,
    public readonly country: string
  ) {
    Object.freeze(this);
  }

  equals(other: Address): boolean {
    return (
      this.street === other.street &&
      this.city === other.city &&
      this.state === other.state &&
      this.zipCode === other.zipCode &&
      this.country === other.country
    );
  }

  formatted(): string {
    return `${this.street}, ${this.city}, ${this.state} ${this.zipCode}, ${this.country}`;
  }
}
Python
# Value Object: Money
# Immutable (frozen=True), compared by value (eq=True), and enforces domain rules.
# Using Decimal to avoid float precision issues that would silently corrupt totals.
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str = "USD"

    def __post_init__(self) -> None:
        # Validate invariant -- negative amounts are not allowed in our domain
        if self.amount < Decimal("0"):
            raise ValueError("Amount cannot be negative")

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("Cannot add different currencies")
        return Money(self.amount + other.amount, self.currency)

    def multiply(self, factor: int | Decimal) -> "Money":
        return Money(self.amount * Decimal(factor), self.currency)

    @classmethod
    def zero(cls, currency: str = "USD") -> "Money":
        return cls(Decimal("0"), currency)


# Value Object: Address
# frozen=True makes the dataclass immutable -- any attempt to reassign a field raises.
# This is the Python equivalent of Object.freeze() in JavaScript.
@dataclass(frozen=True)
class Address:
    street: str
    city: str
    state: str
    zip_code: str
    country: str

    def formatted(self) -> str:
        return f"{self.street}, {self.city}, {self.state} {self.zip_code}, {self.country}"

    # Equality is automatic with @dataclass -- two Addresses with identical fields are equal,
    # regardless of whether they are the same Python object. This is value-object semantics.
Domain events are the mechanism that allows cross-aggregate and cross-service consistency without tight coupling. When an Order is submitted, the Order aggregate does not need to know that Inventory Service will reserve stock, that Email Service will send a confirmation, that Analytics will record a funnel step, and that Recommendations will update the user’s history. It just publishes OrderSubmittedEvent and the interested services subscribe to it. This is the essence of inversion of dependency at the architecture level.
If you skip domain events and instead have Order Service directly call Inventory, Email, Analytics, and Recommendations, you have coupled one change (submitting an order) to four downstream systems. Adding a fifth subscriber means modifying Order Service. In the broader microservices architecture, events are what enables true independent evolution: a new team can start consuming OrderSubmittedEvent tomorrow and the Order team never knows. Tradeoff: events are asynchronous and eventually consistent, so you cannot assume all subscribers have processed an event before the next one arrives. Design for idempotency and out-of-order delivery.
Node.js
// Assumes the `uuid` npm package; crypto.randomUUID() is an alternative on recent Node.js versions.
import { v4 as uuid } from 'uuid';

// Base Domain Event
abstract class DomainEvent {
  public readonly occurredAt: Date;
  public readonly eventId: string;

  constructor() {
    this.occurredAt = new Date();
    this.eventId = uuid();
  }

  abstract get eventType(): string;
}

// Specific Events
class OrderSubmittedEvent extends DomainEvent {
  constructor(
    public readonly orderId: OrderId,
    public readonly customerId: CustomerId,
    public readonly total: Money,
    public readonly items: Array<{productId: string, quantity: number}>
  ) {
    super();
  }

  get eventType(): string { return 'order.submitted'; }
}

class OrderPaidEvent extends DomainEvent {
  constructor(
    public readonly orderId: OrderId,
    public readonly paymentId: string,
    public readonly amount: Money
  ) {
    super();
  }

  get eventType(): string { return 'order.paid'; }
}

class InventoryReservedEvent extends DomainEvent {
  constructor(
    public readonly orderId: OrderId,
    public readonly reservations: Array<{productId: string, quantity: number}>
  ) {
    super();
  }

  get eventType(): string { return 'inventory.reserved'; }
}
Python
# Base Domain Event
# Using Pydantic BaseModel so events serialize/deserialize cleanly when crossing
# service boundaries (Kafka, RabbitMQ, etc.) -- same model shape in all languages.
from datetime import datetime, timezone
from typing import ClassVar
from uuid import UUID, uuid4

from pydantic import BaseModel, ConfigDict, Field


class DomainEvent(BaseModel):
    # Events are immutable facts about the past (Pydantic v2 configuration style,
    # consistent with the model_validate() call used in the ACL example)
    model_config = ConfigDict(frozen=True)

    event_id: UUID = Field(default_factory=uuid4)
    occurred_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
    event_type: ClassVar[str]  # set by each subclass


class OrderLineItem(BaseModel):
    product_id: str
    quantity: int


# Specific Events
class OrderSubmittedEvent(DomainEvent):
    event_type: ClassVar[str] = "order.submitted"
    order_id: "OrderId"
    customer_id: "CustomerId"
    total: "Money"
    items: list[OrderLineItem]


class OrderPaidEvent(DomainEvent):
    event_type: ClassVar[str] = "order.paid"
    order_id: "OrderId"
    payment_id: str
    amount: "Money"


class InventoryReservedEvent(DomainEvent):
    event_type: ClassVar[str] = "inventory.reserved"
    order_id: "OrderId"
    reservations: list[OrderLineItem]
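On the consuming side, "design for idempotency and out-of-order delivery" might look like the sketch below. It is a simplified in-memory projection, and it assumes something the event classes above do not define: a per-order `sequence` number assigned by the publisher. `OrderEvent` and `OrderStatusProjection` are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderEvent:
    event_id: str
    order_id: str
    event_type: str
    sequence: int  # hypothetical per-order version number set by the publisher


class OrderStatusProjection:
    """Subscriber that tolerates duplicate and out-of-order deliveries."""

    def __init__(self) -> None:
        self._seen_event_ids: set[str] = set()
        self._latest: dict[str, OrderEvent] = {}

    def handle(self, event: OrderEvent) -> None:
        # Idempotency: a redelivered event is ignored.
        if event.event_id in self._seen_event_ids:
            return
        self._seen_event_ids.add(event.event_id)

        # Ordering: an event older than what we already applied is recorded
        # as seen but does not overwrite newer state.
        current = self._latest.get(event.order_id)
        if current is None or event.sequence > current.sequence:
            self._latest[event.order_id] = event

    def status(self, order_id: str) -> Optional[str]:
        event = self._latest.get(order_id)
        return event.event_type if event else None
```

A "last writer wins by sequence" rule fits read models like this one. Subscribers that must apply every event in order (for example, a ledger) would instead buffer or reject gaps rather than skip stale events.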
The folder structure of a DDD microservice typically follows a layered architecture, in the dependency-inverted form often called hexagonal or onion architecture. The four layers (domain, application, infrastructure, api) exist so that business logic stays pure and independent of technical details. The domain layer knows nothing about HTTP, PostgreSQL, or RabbitMQ; it just describes what an Order is and how it behaves. This separation pays off when you need to swap Postgres for DynamoDB, or REST for gRPC: only the infrastructure and api layers change.
If you skip this structure and mix business logic with HTTP controllers or ORM queries, you get the “fat controller” anti-pattern where a 500-line route handler does validation, database work, external calls, and business rules all in one place. In the broader microservices architecture, layered structure is what makes unit-testing the business logic feasible without spinning up a test database, which in turn enables fast CI pipelines. Tradeoff: the directory count is larger, and junior engineers initially find the indirection confusing (“why do I need an interface AND an implementation?”). The payoff comes around month six, when refactoring the infrastructure does not ripple through business code.
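The dependency direction between the layers can be sketched in a single file. This is an illustrative reduction, not a prescribed layout: `OrderRepository`, `CancelOrder`, and `InMemoryOrderRepository` are hypothetical names, and in a real service each commented section would live in its own package.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


# --- domain layer: pure business rules, no I/O ---
@dataclass
class Order:
    order_id: str
    status: str = "PENDING"

    def cancel(self) -> None:
        if self.status == "SHIPPED":
            raise ValueError("Shipped orders cannot be cancelled")
        self.status = "CANCELLED"


class OrderRepository(Protocol):
    """Port: the domain states what it needs, not how it is stored."""

    def get(self, order_id: str) -> Optional[Order]: ...
    def save(self, order: Order) -> None: ...


# --- application layer: orchestrates one use case ---
class CancelOrder:
    def __init__(self, orders: OrderRepository) -> None:
        self._orders = orders

    def execute(self, order_id: str) -> Order:
        order = self._orders.get(order_id)
        if order is None:
            raise LookupError(f"Order {order_id} not found")
        order.cancel()  # the business rule stays in the domain object
        self._orders.save(order)
        return order


# --- infrastructure layer: one adapter per technology ---
class InMemoryOrderRepository:
    """Test adapter; a PostgreSQL adapter would expose the same two methods."""

    def __init__(self) -> None:
        self._rows: dict[str, Order] = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._rows.get(order_id)

    def save(self, order: Order) -> None:
        self._rows[order.order_id] = order
```

This is the answer to "why do I need an interface AND an implementation?": `CancelOrder` can be unit-tested against the in-memory adapter, and swapping the storage technology means writing a new adapter without touching `Order` or `CancelOrder`.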
Use an Anti-Corruption Layer (ACL) when integrating with external systems that have a different model. The ACL is like a professional translator at a diplomatic meeting: each side speaks its own language, and the translator ensures the meaning is preserved without either side needing to learn the other’s language. Without this layer, external model changes ripple through your entire domain.
The ACL is one of the most misused patterns in DDD. Teams either over-apply it (wrapping every external API in a translator “just in case”) or under-apply it (letting Stripe’s field names leak into their domain classes). The right question to ask is: “If this external system doubled the number of fields it returns or renamed half of them, how many files in my codebase would need to change?” If the answer is “every file that deals with payments,” you need an ACL. If the answer is “just the one client class,” you are already fine.
Without an ACL when you need one, external model changes become an existential threat. Stripe releases v3 of their API, half your codebase breaks, and you spend two weeks untangling “charge_id” vs “payment_intent_id” across 40 files. Within the broader architecture, ACLs are what allow generic-subdomain vendors to be swapped with acceptable effort: migrating from Stripe to Adyen stops being a project and becomes a task. Tradeoff: an ACL is extra code, an extra abstraction, and sometimes an extra allocation per call. For truly stable, well-designed external APIs (like Stripe itself, in most cases), conforming directly is often better than adding a layer for its own sake.
Node.js
// External Payment Provider (Stripe-like)
interface ExternalPaymentResponse {
  charge_id: string;
  amount_cents: number;
  currency: string;
  status: 'succeeded' | 'failed' | 'pending';
  failure_code?: string;
  failure_message?: string;
  created_at: number; // Unix timestamp
}

// Our Domain Model
interface PaymentResult {
  paymentId: PaymentId;
  amount: Money;
  status: PaymentStatus;
  failureReason?: string;
  processedAt: Date;
}

// Anti-Corruption Layer
// This class is the single point where external models are translated to our domain.
// If Stripe changes their API response format, ONLY this class needs to change --
// the rest of our codebase continues using PaymentResult, Money, and PaymentStatus.
// Note: this assumes Money exposes integer minor units via `cents`, `currency`, and a
// `Money.fromCents()` factory, none of which appear in the Money sketch shown earlier.
class PaymentProviderACL {
  constructor(private externalClient: ExternalPaymentClient) {}

  async processPayment(
    amount: Money,
    paymentMethod: PaymentMethod
  ): Promise<PaymentResult> {
    // Translate our model to external format
    const externalRequest = {
      amount: amount.cents,
      currency: amount.currency.toLowerCase(),
      source: this.translatePaymentMethod(paymentMethod)
    };

    // Call external service
    const externalResponse = await this.externalClient.createCharge(externalRequest);

    // Translate response back to our model
    return this.translateToPaymentResult(externalResponse);
  }

  private translatePaymentMethod(method: PaymentMethod): string {
    // Translation logic
    switch (method.type) {
      case 'CREDIT_CARD':
        return method.token;
      case 'BANK_TRANSFER':
        return method.accountToken;
      default:
        throw new Error(`Unsupported payment method: ${method.type}`);
    }
  }

  private translateToPaymentResult(response: ExternalPaymentResponse): PaymentResult {
    return {
      paymentId: new PaymentId(response.charge_id),
      amount: Money.fromCents(response.amount_cents, response.currency.toUpperCase()),
      status: this.translateStatus(response.status),
      failureReason: response.failure_message,
      processedAt: new Date(response.created_at * 1000)
    };
  }

  private translateStatus(externalStatus: string): PaymentStatus {
    const statusMap: Record<string, PaymentStatus> = {
      'succeeded': PaymentStatus.COMPLETED,
      'failed': PaymentStatus.FAILED,
      'pending': PaymentStatus.PENDING
    };
    return statusMap[externalStatus] || PaymentStatus.UNKNOWN;
  }
}
Python
# External Payment Provider (Stripe-like)
# Pydantic models enforce the external schema at the ACL boundary -- if Stripe
# changes a field type, validation fails loudly in one place instead of silently
# corrupting data across the codebase.
from datetime import datetime, timezone
from decimal import Decimal
from enum import Enum
from typing import Literal

from pydantic import BaseModel


class ExternalPaymentResponse(BaseModel):
    charge_id: str
    amount_cents: int
    currency: str
    status: Literal["succeeded", "failed", "pending"]
    failure_code: str | None = None
    failure_message: str | None = None
    created_at: int  # Unix timestamp


# Our Domain Model
class PaymentStatus(str, Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    PENDING = "pending"
    UNKNOWN = "unknown"


class PaymentResult(BaseModel):
    payment_id: "PaymentId"
    amount: "Money"
    status: PaymentStatus
    failure_reason: str | None = None
    processed_at: datetime


# Anti-Corruption Layer
# This class is the single point where external models are translated to our domain.
# If Stripe changes their API response format, ONLY this class needs to change --
# the rest of our codebase continues using PaymentResult, Money, and PaymentStatus.
class PaymentProviderACL:
    _STATUS_MAP = {
        "succeeded": PaymentStatus.COMPLETED,
        "failed": PaymentStatus.FAILED,
        "pending": PaymentStatus.PENDING,
    }

    def __init__(self, external_client: "ExternalPaymentClient") -> None:
        self._external_client = external_client

    async def process_payment(
        self,
        amount: "Money",
        payment_method: "PaymentMethod",
    ) -> PaymentResult:
        # Translate our model to external format
        external_request = {
            # Money stores a Decimal amount in major units; the provider expects
            # integer cents (assumes a two-decimal currency)
            "amount": int(amount.amount * 100),
            "currency": amount.currency.lower(),
            "source": self._translate_payment_method(payment_method),
        }

        # Call external service
        raw_response = await self._external_client.create_charge(external_request)
        external_response = ExternalPaymentResponse.model_validate(raw_response)

        # Translate response back to our model
        return self._translate_to_payment_result(external_response)

    def _translate_payment_method(self, method: "PaymentMethod") -> str:
        match method.type:
            case "CREDIT_CARD":
                return method.token
            case "BANK_TRANSFER":
                return method.account_token
            case _:
                raise ValueError(f"Unsupported payment method: {method.type}")

    def _translate_to_payment_result(
        self, response: ExternalPaymentResponse
    ) -> PaymentResult:
        return PaymentResult(
            payment_id=PaymentId(response.charge_id),
            amount=Money(
                amount=Decimal(response.amount_cents) / Decimal(100),
                currency=response.currency.upper(),
            ),
            status=self._translate_status(response.status),
            failure_reason=response.failure_message,
            processed_at=datetime.fromtimestamp(response.created_at, tz=timezone.utc),
        )

    def _translate_status(self, external_status: str) -> PaymentStatus:
        return self._STATUS_MAP.get(external_status, PaymentStatus.UNKNOWN)
Q1: What is a Bounded Context?
Answer:
A bounded context is a conceptual boundary within which a particular domain model is defined and applicable. It’s where a specific ubiquitous language is consistent.
Key characteristics:
Same term can mean different things in different contexts (e.g., “Product” in Catalog vs Sales)
Each context has its own models, rules, and language
Maps naturally to microservices boundaries
Reduces complexity by limiting scope
Q2: What is an Aggregate?
Answer:
An aggregate is a cluster of domain objects that can be treated as a single unit, with one entity acting as the root.
Rules:
Reference other aggregates by ID only
Keep aggregates small
Changes within an aggregate are atomic (consistency boundary)
Cross-aggregate consistency is eventual (via events)
Example: Order (root) contains OrderItems, but references Customer by ID.
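A minimal sketch of that example, under stated assumptions: the class and field names are illustrative, and `MAX_ITEMS` is a hypothetical invariant added only to show the root enforcing a rule over its children.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderItem:
    product_id: str
    quantity: int
    unit_price: Decimal


class Order:
    """Aggregate root: the only way to reach or change OrderItems."""

    MAX_ITEMS = 50  # hypothetical invariant, for illustration only

    def __init__(self, order_id: str, customer_id: str) -> None:
        self.order_id = order_id
        self.customer_id = customer_id  # Customer aggregate referenced by ID only
        self._items: list[OrderItem] = []

    def add_item(self, product_id: str, quantity: int, unit_price: Decimal) -> None:
        if quantity <= 0:
            raise ValueError("Quantity must be positive")
        if len(self._items) >= self.MAX_ITEMS:
            raise ValueError("Order has too many items")
        self._items.append(OrderItem(product_id, quantity, unit_price))

    @property
    def items(self) -> tuple[OrderItem, ...]:
        # Read-only view: callers cannot mutate the internal list.
        return tuple(self._items)

    @property
    def total(self) -> Decimal:
        return sum((i.unit_price * i.quantity for i in self._items), Decimal("0"))
```

Because `Order` holds only `customer_id`, loading or saving an order never drags the Customer aggregate into the same transaction, which is what keeps the consistency boundary small.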
Q3: What is an Anti-Corruption Layer?
Answer:
An ACL is a translation layer between two systems with different models. It prevents external/legacy models from “corrupting” your domain model.
Use cases:
Integrating with third-party APIs (Stripe, SendGrid)
Communicating with legacy systems
When the upstream model doesn’t fit your domain
The ACL translates between external representations and your domain objects.
Q4: How do you identify service boundaries using DDD?
Answer:
Event Storming: Workshop to discover domain events
Identify Bounded Contexts: Group related events and concepts
Define Aggregates: What data changes together?
Map Context Relationships: Customer-Supplier, ACL, etc.
'You are designing service boundaries for a new e-commerce platform. Two teams disagree: one wants a single OrderService, the other wants separate CartService and OrderService. How do you resolve this?'
Strong Answer:
This is a classic bounded context question, and the answer depends on whether Cart and Order have genuinely different lifecycles and languages. In my experience, they usually do.
A Cart is ephemeral, mutable, and user-facing. Users add items, remove items, change quantities, and abandon carts constantly. The data model is optimized for fast reads and writes, so Redis is often the right storage. A Cart has no concept of payment, shipping, or fulfillment.
An Order is permanent, append-only (status changes, not item changes), and drives backend workflows. Once an order is placed, it triggers payment processing, inventory reservation, and fulfillment. The data model needs ACID guarantees, audit trails, and complex status machines, so PostgreSQL is the natural fit.
The transition from Cart to Order is a domain event: “CartCheckedOut” creates an Order and the Cart is either archived or deleted. After that moment, changes to the Cart codebase (like adding a “save for later” feature) should never require changes to the Order codebase.
I would validate this boundary using the “change frequency” test: if someone asks “add coupon support to the cart,” does that require changing Order? If no, the boundary is correct. If yes, you need to look deeper at where coupon logic actually belongs, possibly in a separate Pricing context.
The team that wants a single OrderService is probably thinking about technical convenience (one codebase, shared models). The team that wants separation is thinking about domain clarity. DDD sides with domain clarity because technical convenience erodes over time as the models diverge.
Follow-up: “What happens to items in the cart if a product price changes between when the user added it and when they check out?”
This is where bounded contexts earn their keep. The Cart stores a snapshot of the product price at add-time (or refreshes it on each page load, depending on product requirements). At checkout, the Order Service fetches the current price from the Product/Pricing context. If prices diverged, you have a business decision: honor the cart price (better customer experience) or use the current price (revenue accuracy). This decision belongs in the Pricing bounded context, not in Cart or Order. The outcome is recorded as a domain event, “PriceVerifiedAtCheckout”, with business rules that the pricing team owns. The fact that this logic naturally sits outside both Cart and Order further validates that they are separate contexts.
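The price-verification step described in the follow-up could be sketched as a small Pricing-owned function. Everything here is a hypothetical illustration: `PricePolicy`, `CartLine`, `OrderLine`, and `verify_prices_at_checkout` are invented names, and the current prices are passed in as a plain dictionary rather than fetched from a real service.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class PricePolicy(Enum):
    HONOR_CART_PRICE = "honor_cart_price"    # better customer experience
    USE_CURRENT_PRICE = "use_current_price"  # revenue accuracy


@dataclass(frozen=True)
class CartLine:
    product_id: str
    quantity: int
    price_at_add: Decimal  # snapshot taken when the item was added to the cart


@dataclass(frozen=True)
class OrderLine:
    product_id: str
    quantity: int
    unit_price: Decimal


def verify_prices_at_checkout(
    cart_lines: list[CartLine],
    current_prices: dict[str, Decimal],
    policy: PricePolicy,
) -> list[OrderLine]:
    """Pricing-context rule: decide which price each order line is created with."""
    order_lines = []
    for line in cart_lines:
        current = current_prices[line.product_id]
        if policy is PricePolicy.HONOR_CART_PRICE:
            unit_price = line.price_at_add
        else:
            unit_price = current
        order_lines.append(OrderLine(line.product_id, line.quantity, unit_price))
    return order_lines
```

Cart only supplies `CartLine` snapshots and Order only receives `OrderLine` values, so changing the policy is a Pricing-team change that touches neither of the other two contexts.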
'Explain what an Anti-Corruption Layer is and give me a real scenario where you would use one versus just conforming to the upstream API.'
Strong Answer:
An Anti-Corruption Layer (ACL) is a translation boundary that prevents an external system’s model from leaking into your domain. It is essentially an adapter pattern at the domain level.
You use an ACL when the external model would distort your internal domain language. The classic example: integrating with a legacy ERP system that represents customers as “CUST_ACCT” records with fields like “ACCT_BAL_CR” and “ACCT_STAT_CD.” If you let those names and structures leak into your domain, your codebase becomes a translation nightmare. The ACL translates “ACCT_STAT_CD = ‘A’” into “customer.status = CustomerStatus.ACTIVE” at the boundary, and your domain code never sees the legacy terminology.
Conversely, you conform (skip the ACL) when the upstream model is well-designed and aligns with your domain. Stripe’s payment API is a good example. Their concepts (charges, refunds, customers, payment methods) map cleanly to what most e-commerce domains need. Building an ACL on top of Stripe would add indirection without adding clarity. You conform to their model, accept their terminology, and move on.
The decision heuristic I use: if more than 30% of the upstream fields/concepts need renaming or restructuring to make sense in your domain, build an ACL. If the mapping is mostly one-to-one with minor naming differences, conform. The risk of conforming when you should not is that the upstream model’s quirks infect your codebase gradually: you end up with methods like “getAccountBalance” in a domain that calls it “available credit.”
A real scenario from my experience: integrating with a shipping provider whose API returned all measurements in imperial units with field names like “PKG_WGT_LBS” and status codes that were two-letter abbreviations. Our domain used metric units and descriptive enums. The ACL converted units, translated status codes, and normalized field names. When we later switched shipping providers, only the ACL changed, with zero impact on domain logic.
Follow-up: “How do you test an ACL effectively, given that the upstream system might change without notice?”
Contract testing is essential here. I use a tool like Pact to record the expected responses from the external system and verify that my ACL correctly translates them. But more importantly, I add a “canary integration test” that runs against the real upstream system (in staging or sandbox) on a schedule, say every 6 hours. If the upstream response format changes, the canary catches it before production traffic does. I also version the ACL translation logic, so if the upstream introduces a v2 API, I can add a new translator without removing the v1 one until migration is complete.
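The shipping scenario can be sketched as a small translator plus a recorded-payload check. Only the field name `PKG_WGT_LBS` comes from the answer above; the other field names (`TRK_NO`, `STAT_CD`), the two-letter codes, and the class names are hypothetical, and the assertion-style check at the end is a plain stand-in for a Pact-style contract test rather than Pact itself.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum

# One international pound is defined as exactly 0.45359237 kg.
KG_PER_LB = Decimal("0.45359237")


class ShipmentStatus(Enum):
    IN_TRANSIT = "in_transit"
    DELIVERED = "delivered"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Shipment:
    """Domain model: metric units and descriptive statuses."""
    tracking_id: str
    weight_kg: Decimal
    status: ShipmentStatus


class ShippingProviderACL:
    # Hypothetical provider codes, invented for this sketch.
    _STATUS_MAP = {"IT": ShipmentStatus.IN_TRANSIT, "DL": ShipmentStatus.DELIVERED}

    def to_shipment(self, payload: dict) -> Shipment:
        # str() first so a float in the payload does not leak binary noise into Decimal.
        pounds = Decimal(str(payload["PKG_WGT_LBS"]))
        return Shipment(
            tracking_id=payload["TRK_NO"],
            weight_kg=(pounds * KG_PER_LB).quantize(Decimal("0.001")),
            status=self._STATUS_MAP.get(payload["STAT_CD"], ShipmentStatus.UNKNOWN),
        )


# A recorded provider response, replayed against the translator.
recorded_payload = {"TRK_NO": "T-1001", "PKG_WGT_LBS": 10, "STAT_CD": "DL"}
shipment = ShippingProviderACL().to_shipment(recorded_payload)
print(shipment.weight_kg, shipment.status.value)  # 4.536 delivered
```

Mapping unrecognized codes to `UNKNOWN` keeps a new upstream status from crashing the domain, while the scheduled canary test described above is what surfaces that the mapping needs updating.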
'How would you run an Event Storming session for a system you know nothing about, and what would you do with the output?'
Strong Answer:
Event Storming is the most effective technique I have used for discovering bounded contexts, but it only works if you get the right people in the room. I need domain experts: the people who actually do the work, not managers who describe the work. Product managers, customer support leads, operations staff, and engineers. No laptops, just sticky notes and a long wall.
The session runs in phases. Phase one: I ask “What happens in this business?” and have everyone write domain events on orange sticky notes in past tense: “Order Placed,” “Payment Received,” “Shipment Dispatched.” No filtering, no organizing, just dump everything. This takes 20-30 minutes and usually produces 50-100 events.
Phase two: we arrange events chronologically along the wall, left to right. Disagreements are gold. When two people argue about whether “Payment Authorized” comes before or after “Inventory Reserved,” that reveals a real business process ambiguity that needs resolution.
Phase three: we add commands (blue stickies): what triggers each event? “Submit Order” triggers “Order Placed.” We add actors (yellow): who initiates the command? Customer? System? Admin? We add aggregates (yellow, large): what data is needed to process the command?
Phase four: we draw boundaries. Events that cluster together, share aggregates, and use the same language become bounded contexts. The biggest insight is usually that the same word means different things in different clusters, such as “Product” in catalog versus “Product” in shipping. That semantic divergence is the clearest signal that you have found a context boundary.
What I do with the output: each bounded context becomes a candidate microservice. The events between contexts become the async messages or API calls between services. The aggregates become the data each service owns. I document this in a context map diagram showing the relationships (Customer-Supplier, Conformist, ACL) between contexts, and this becomes the architectural blueprint that guides the first 6-12 months of development.
Follow-up: “What do you do when the Event Storming session reveals that the domain experts disagree about how the business actually works?”
That is the single most valuable outcome of Event Storming, even though it feels uncomfortable. When the head of operations says “we always verify payment before reserving inventory” and the head of fulfillment says “we always reserve inventory first to avoid stockouts,” you have discovered a real process inconsistency that is probably causing bugs or customer complaints in the current system. I capture both perspectives, escalate to the product owner for a decision, and design the system to handle whichever flow is chosen, while keeping the other flow technically possible via feature flag, because decisions like this often get revisited.