Capstone Project: E-Commerce Platform
Build a production-ready e-commerce platform using all the microservices patterns you have learned. This is where theory meets reality. The patterns that sounded clean in isolation (sagas, event sourcing, circuit breakers) start bumping into each other in interesting ways when you wire them together. That friction is the point: wrestling with real trade-offs is what transforms knowledge into skill.
- Apply all microservices patterns in practice
- Build a portfolio-worthy project
- Gain hands-on experience with real-world challenges
- Prepare for system design interviews
Project Overview
Phase 1: Project Setup
Directory Structure
Initial Setup Script
Phase 2: Core Services
User Service
Why this service exists and its bounded context. The User Service owns everything related to identity: authentication credentials, profile data, addresses, and user preferences. Its bounded context is deliberately narrow: it does not know about orders, carts, or payments. It answers one question: “Who is this person, and what do we know about them?” Separating identity from the rest of the domain means we can swap authentication providers (e.g., move from password-based auth to OAuth) without touching order logic.
How it communicates with other services. The User Service exposes a synchronous REST API for user lookups (e.g., GET /users/:id, called by the Order Service when creating an order). It also publishes domain events (user.created, user.updated, user.deleted) to Kafka so downstream services like Notification and Cart can react without polling. Inter-service reads are sync because callers need the data immediately to complete their own request; fan-out notifications are async because latency tolerance is higher.
What data it owns and why. It owns the canonical user record, including the email (the unique identifier) and the password hash. No other service stores passwords — that would multiply the attack surface. Other services reference users by userId and fetch additional profile data on demand or subscribe to profile-updated events and cache what they need.
Key design decisions. MongoDB was chosen because user profiles evolve (new fields get added frequently) and have natural document shape (addresses nested inside users). PostgreSQL was considered and rejected — migrating schemas for every new profile field would slow feature velocity. We rejected a shared auth library for multiple services (tempting but creates hidden coupling) in favor of short-lived JWTs issued here and verified everywhere.
Order Service with Saga
Why this service exists and its bounded context. The Order Service is the orchestrator of the checkout journey. Its bounded context covers order lifecycle: creation, confirmation, fulfillment, and cancellation. Crucially, it does not directly reserve inventory or charge cards; it coordinates those operations through a saga. This separation means the Order Service remains the single source of truth for “what orders exist and what state are they in,” even as downstream services come and go.
How it communicates with other services. The Order Service talks to the outside world (API Gateway, clients) synchronously via REST, because the client needs an immediate order ID to show on screen. Internally, the saga uses Kafka for all coordination. The Order Service publishes commands (inventory.reserve, payment.process) and subscribes to outcomes (inventory.reserved, payment.completed, payment.failed). Async messaging is non-negotiable here because sync chaining across inventory + payment + notification would mean any slow downstream service blocks checkout.
What data it owns and why. Orders, order items, shipping addresses (snapshotted at order time, not referenced — addresses change, but the order’s shipping address must not), and the saga state machine. The saga state (STARTED, INVENTORY_RESERVED, COMPLETED, COMPENSATING, FAILED) is stored alongside the order so that a crashed service can recover mid-saga by reading the last known state.
Key design decisions. We chose orchestration over choreography for the saga. Choreography (each service reacts to events from other services) scales poorly once sagas exceed three steps — debugging becomes a nightmare of “who sent what to whom.” Orchestration centralizes the flow at the cost of a slight coupling increase. PostgreSQL was chosen because order state transitions need ACID guarantees; a half-written order is worse than no order.
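As a rough illustration of the orchestrated saga, the state machine can be written as a transition table keyed by (current state, incoming event). The states and the `inventory.reserve`, `payment.process`, `inventory.reserved`, `payment.completed` and `payment.failed` names come from the text above; `inventory.rejected`, `inventory.release` and `inventory.released` are hypothetical names added so the compensation path is complete. The `outbox` list stands in for publishing to Kafka, and persistence is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class SagaState(str, Enum):
    STARTED = "STARTED"
    INVENTORY_RESERVED = "INVENTORY_RESERVED"
    COMPLETED = "COMPLETED"
    COMPENSATING = "COMPENSATING"
    FAILED = "FAILED"


# (current state, incoming event) -> (next state, command to publish or None)
TRANSITIONS = {
    (SagaState.STARTED, "inventory.reserved"): (SagaState.INVENTORY_RESERVED, "payment.process"),
    (SagaState.STARTED, "inventory.rejected"): (SagaState.FAILED, None),  # hypothetical event
    (SagaState.INVENTORY_RESERVED, "payment.completed"): (SagaState.COMPLETED, None),
    # Payment failed after stock was reserved: compensate by releasing the stock.
    (SagaState.INVENTORY_RESERVED, "payment.failed"): (SagaState.COMPENSATING, "inventory.release"),
    (SagaState.COMPENSATING, "inventory.released"): (SagaState.FAILED, None),  # hypothetical event
}


@dataclass
class OrderSaga:
    order_id: str
    state: SagaState = SagaState.STARTED
    outbox: List[Tuple[str, str]] = field(default_factory=list)  # stand-in for Kafka

    def start(self) -> None:
        """Kick off the saga by asking the Inventory Service to reserve stock."""
        self.outbox.append(("inventory.reserve", self.order_id))

    def handle(self, event: str) -> SagaState:
        """Apply one outcome event; unknown or duplicate events are ignored."""
        transition = TRANSITIONS.get((self.state, event))
        if transition is None:
            return self.state
        self.state, command = transition
        if command is not None:
            self.outbox.append((command, self.order_id))
        return self.state


saga = OrderSaga("order-42")
saga.start()
saga.handle("inventory.reserved")
saga.handle("payment.failed")
print(saga.state.value, saga.outbox[-1])
```

Ignoring events that have no transition from the current state is what makes redelivered Kafka messages harmless, and storing `state` with the order row is what lets a restarted orchestrator resume from the last known step.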
Payment Service
Why this service exists and its bounded context. The Payment Service is the only component allowed to talk to payment providers (Stripe, PayPal, etc.). Its bounded context is narrow on purpose: capture charges, record transactions, and expose idempotent payment outcomes. Keeping this logic isolated also keeps PCI-DSS scope minimal: only one service (and its hosts) is in scope for the audit.
How it communicates with other services. Payment Service is almost entirely async. It consumes payment.process commands from Kafka and emits payment.completed / payment.failed events. Avoiding sync HTTP to the Order Service decouples checkout latency from Stripe latency: if Stripe has a 5-second spike, the Order Service is not blocked. The Payment Service does make one sync outbound call: to Stripe itself, because that is the nature of third-party payment APIs.
What data it owns and why. Payment intents, transaction records, and idempotency keys. It does not store raw card data — Stripe tokenizes cards, and we only store Stripe’s customer ID and payment method ID. This is the core of keeping PCI scope manageable: if we never hold a PAN (primary account number), we do not have to secure one.
Key design decisions. Idempotency is implemented at three layers: an idempotency key per incoming command (deduplicates retries), Stripe’s Idempotency-Key header (deduplicates upstream calls), and a database unique constraint on (order_id, payment_intent_id) (last-line defense). We rejected an in-memory-only idempotency store — it loses state on pod restart, and deploys would produce duplicate charges. Redis or the database are the only acceptable stores.
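The three idempotency layers can be sketched with a durable store and a fake provider. This is only an illustration under stated assumptions: SQLite stands in for the service's database, `FakeProvider` is a hypothetical stand-in that mimics a provider deduplicating on an idempotency key (no real Stripe call is made), and the table and function names are invented for the example.

```python
import sqlite3
import uuid


class FakeProvider:
    """Hypothetical provider that returns the same result for a repeated key."""

    def __init__(self):
        self.charges = {}

    def charge(self, order_id, idempotency_key):
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = "pi_" + uuid.uuid4().hex[:12]
        return self.charges[idempotency_key]


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS processed_commands (
            idempotency_key TEXT PRIMARY KEY,
            payment_intent_id TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS payments (
            order_id TEXT NOT NULL,
            payment_intent_id TEXT NOT NULL,
            UNIQUE (order_id, payment_intent_id)
        );
        """
    )
    return conn


def handle_payment_command(conn, provider, idempotency_key, order_id):
    # Layer 1: a command we already processed replays its stored outcome.
    row = conn.execute(
        "SELECT payment_intent_id FROM processed_commands WHERE idempotency_key = ?",
        (idempotency_key,),
    ).fetchone()
    if row is not None:
        return row[0]

    # Layer 2: forward the same key upstream so a retry after a crash
    # between "charged" and "recorded" does not charge twice.
    payment_intent_id = provider.charge(order_id, idempotency_key)

    # Layer 3: the unique constraints reject a duplicate record if another
    # worker handled the same command concurrently.
    try:
        with conn:  # one transaction: both inserts commit or neither does
            conn.execute(
                "INSERT INTO payments (order_id, payment_intent_id) VALUES (?, ?)",
                (order_id, payment_intent_id),
            )
            conn.execute(
                "INSERT INTO processed_commands (idempotency_key, payment_intent_id) VALUES (?, ?)",
                (idempotency_key, payment_intent_id),
            )
    except sqlite3.IntegrityError:
        pass  # already recorded elsewhere; the outcome is the same
    return payment_intent_id
```

Handling the same `payment.process` command twice returns the same payment intent and leaves a single row in `payments`, which is the behavior an at-least-once Kafka consumer needs.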
Phase 3: Infrastructure
Docker Compose
Phase 4: Testing
Contract Test Example
Why contract tests here. Contract tests verify that the Order Service (the consumer) and the Payment Service (the provider) agree on the shape of their interaction, without requiring both services to be deployed at the same time during CI. The test generates a pact file from the consumer side that the provider verifies independently. This lets teams ship independently without a shared staging gate.
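The idea behind consumer-driven contracts can be shown without any tooling. The sketch below is hand-rolled and is not the Pact API: the consumer writes down the fields it relies on, and the provider's test suite checks a sample message against that description. The field names (`order_id`, `payment_intent_id`, `amount_cents`) are assumptions chosen for illustration.

```python
# Written by the consumer (Order Service) and shared with the provider,
# playing the role a pact file plays in a real setup.
CONSUMER_CONTRACT = {
    "interaction": "payment.completed",
    "required_fields": {
        "order_id": str,
        "payment_intent_id": str,
        "amount_cents": int,
    },
}


def verify_against_contract(contract, message):
    """Run on the provider side; returns a list of human-readable violations."""
    problems = []
    for name, expected_type in contract["required_fields"].items():
        if name not in message:
            problems.append("missing field: " + name)
        elif not isinstance(message[name], expected_type):
            problems.append(
                "wrong type for %s: expected %s, got %s"
                % (name, expected_type.__name__, type(message[name]).__name__)
            )
    # Extra fields are allowed: the provider may add data the consumer ignores.
    return problems


sample = {
    "order_id": "order-1",
    "payment_intent_id": "pi_123",
    "amount_cents": 4999,
    "currency": "usd",
}
print(verify_against_contract(CONSUMER_CONTRACT, sample))
```

Tolerating extra fields while rejecting missing or retyped ones is the property that lets the provider evolve without coordinating every release with its consumers.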
E2E Test
Why this E2E test exists. Unit and contract tests validate individual services; E2E tests validate the wired-up system. This test drives a full checkout flow through the API Gateway and asserts that the saga completes. It is slow (seconds, not milliseconds) and prone to flakiness (async timing), so we only use it for critical revenue paths, not for every branch.
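Because the saga completes asynchronously, an E2E test usually polls for the final state instead of sleeping for a fixed time. A minimal polling helper might look like the following; the name `wait_for` and its defaults are assumptions, and the status lookup is faked here where a real test would call the API Gateway.

```python
import time


def wait_for(condition, timeout_s=10.0, interval_s=0.2,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns a truthy value or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("condition not met within %ss" % timeout_s)
        sleep(interval_s)


# Stand-in for repeated "GET order status" calls made through the gateway.
fake_statuses = iter(["STARTED", "INVENTORY_RESERVED", "COMPLETED"])


def order_completed():
    return next(fake_statuses) == "COMPLETED"


print(wait_for(order_completed, timeout_s=2.0, interval_s=0.0))
```

Polling with a deadline keeps the test as fast as the system allows on a good run, while still failing with a clear timeout on a bad one instead of a misleading assertion error.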
Technology Selection Rationale
Choosing the right database, broker, and cache for each service is one of the most impactful decisions in the project. Here is why the capstone uses what it does:
| Service | Technology | Why This Choice | Alternative Considered | Why Not |
|---|---|---|---|---|
| User Service | MongoDB | Flexible user profile schema; no complex joins needed | PostgreSQL | Profile data is document-shaped; schema evolves frequently |
| Order Service | PostgreSQL | ACID transactions for order lifecycle; complex queries for reporting | MongoDB | Order state machines need transactional guarantees |
| Cart Service | Redis | Ephemeral data; sub-millisecond reads; natural TTL for cart expiry | PostgreSQL | Carts are session-scoped, not permanent records |
| Product Catalog | MongoDB | Varied product attributes per category; nested reviews | PostgreSQL | Each product category has different fields (clothing vs electronics) |
| Inventory Service | PostgreSQL | Exact counts require strong consistency; decrement-and-check is transactional | Redis | Race conditions on concurrent stock decrements without DB transactions |
| Payment Service | External (Stripe) | PCI compliance offloaded; no card data stored locally | Self-hosted | PCI-DSS compliance costs more than the Stripe fee for most businesses |
| Event Bus | Kafka | Durable, replayable event log; supports consumer groups; handles backpressure | RabbitMQ | Need event replay for rebuilding projections; Kafka’s log model fits event sourcing |
Edge Cases to Handle in Your Implementation
These are the scenarios that separate a portfolio project from a production system. Addressing even a few of them demonstrates senior-level thinking:
- Double-submit on checkout: User clicks “Pay” twice. Without idempotency keys, you charge them twice. Solution: generate a client-side idempotency key and check it server-side before processing.
- Inventory reserved but payment times out: The saga reserved inventory, but the payment provider never responds. If you do not set a reservation TTL, those items are locked forever. Solution: inventory reservations expire after 15 minutes; a background job releases expired reservations.
- Kafka consumer lag during a flash sale: Your order service publishes events faster than the notification service can consume them. Users get order confirmations 30 minutes late. Solution: monitor consumer lag in Grafana; auto-scale consumers based on lag metrics.
- Partial failure in the observability stack: Jaeger is down, but your services are fine. If your health check includes Jaeger connectivity, you’ll mark healthy services as unhealthy. Solution: observability dependencies should never be in the critical path; degrade tracing gracefully.
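The reservation-TTL fix from the second scenario can be sketched as follows. This is an in-memory, single-process illustration only: the class and method names are hypothetical, the clock is passed in explicitly so the behavior is easy to test, and a real Inventory Service would keep this state in its database and run `release_expired` from a scheduled job.

```python
from dataclasses import dataclass

RESERVATION_TTL_SECONDS = 15 * 60  # the 15-minute expiry mentioned above


@dataclass
class Reservation:
    order_id: str
    product_id: str
    qty: int
    expires_at: float


class InventoryStore:
    def __init__(self, stock):
        self.available = dict(stock)   # product_id -> units not reserved
        self.reservations = {}         # order_id -> Reservation

    def reserve(self, order_id, product_id, qty, now):
        """Hold stock for an order; returns False when there is not enough."""
        if self.available.get(product_id, 0) < qty:
            return False
        self.available[product_id] -= qty
        self.reservations[order_id] = Reservation(
            order_id, product_id, qty, now + RESERVATION_TTL_SECONDS
        )
        return True

    def confirm(self, order_id):
        """Payment succeeded: the hold becomes a sale and can no longer expire."""
        self.reservations.pop(order_id, None)

    def release_expired(self, now):
        """Background job: return stock from reservations whose TTL has passed."""
        expired = [r for r in self.reservations.values() if r.expires_at <= now]
        for reservation in expired:
            self.available[reservation.product_id] += reservation.qty
            del self.reservations[reservation.order_id]
        return [r.order_id for r in expired]


store = InventoryStore({"sku-1": 3})
store.reserve("order-1", "sku-1", 2, now=0)
print(store.release_expired(now=RESERVATION_TTL_SECONDS), store.available)
```

Releasing by expiry time rather than by waiting for a payment event means a provider that never responds cannot lock stock indefinitely.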
Caveats & Common Pitfalls: Integrating All the Patterns
Interview Questions: Greenfield E-commerce Architecture
You are the architect for a greenfield e-commerce platform expected to hit 10K orders/day in year one, 500K/day by year three. Walk me through your microservices decisions from domain decomposition to deployment.
- Start with domain decomposition, not services. Map the bounded contexts (Identity, Catalog, Cart, Order, Payment, Inventory, Fulfillment, Notification). Services follow contexts, not the other way around. If two “services” share a database, they are one service with an extra network hop.
- Pick the sync/async axis per interaction. Checkout fan-out (order then inventory then payment) uses async via Kafka; user lookup uses sync REST because the caller cannot proceed without the answer.
- Choose storage per service based on access pattern. Postgres for transactional state (orders, payments), MongoDB for document-shaped entities (users, products), Redis for ephemeral state (carts, sessions).
- Decide on saga orchestration vs choreography. Orchestration for checkout because the flow is linear and debuggable; choreography for non-critical fan-out (notifications, analytics).
- Define the deployment substrate. Kubernetes + GitOps (ArgoCD) + per-service Helm charts. Start with rolling deploys, add canary when volume justifies the tooling cost.
- Observability from day one, not day 90. OpenTelemetry SDK in every service. Prometheus for metrics, Jaeger or Tempo for traces, Loki or CloudWatch for logs. Start with 5% sampling and RED metrics per service.
- Security baselines. mTLS via service mesh or in-app TLS, secrets in Vault or a managed secrets store, per-service IAM/service account scoping. PCI scope is isolated to payment service only.
- Testing strategy. Contract tests between every service pair that communicates; integration tests per service against real dependencies in CI; E2E only for revenue-critical flows.
- Plan for v2. Year three (500K/day, roughly 6 orders/sec on average, with peaks several times higher) is well within the reach of a well-tuned Postgres + 10-service cluster. Do not over-architect for imagined scale; plan to re-shard inventory when it hits 200M rows.
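A quick back-of-envelope check of the order volumes quoted in this question. The daily figures come from the question itself; the peak factor of 10 is an assumption for illustration, since real peak-to-average ratios depend on the traffic pattern.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_orders_per_second(orders_per_day):
    return orders_per_day / SECONDS_PER_DAY


def peak_orders_per_second(orders_per_day, peak_factor):
    """Peak load estimated as a multiple of the daily average."""
    return average_orders_per_second(orders_per_day) * peak_factor


ASSUMED_PEAK_FACTOR = 10  # assumption, not from the source

for label, volume in (("year one", 10_000), ("year three", 500_000)):
    average = average_orders_per_second(volume)
    peak = peak_orders_per_second(volume, ASSUMED_PEAK_FACTOR)
    print("%s: %.2f orders/sec average, %.1f orders/sec at assumed peak"
          % (label, average, peak))
```

Even with a tenfold peak, year three stays in the tens of orders per second, which supports the advice not to over-architect for imagined scale.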
Run UPDATE inventory SET qty = qty - 1 WHERE product_id = ? AND qty >= 1, then check the affected-rows count. When two requests race for the last unit, only one UPDATE succeeds. The saga’s inventory-reservation step then uses a short-TTL lock (15 minutes). This is more scalable than pessimistic locks and avoids the distributed-lock-via-Redis trap, where a network partition can leave two clients each believing they hold the lock.
- “Start with 20 microservices, one per noun in the requirements.” This is the distributed monolith trap: you get all the operational cost of microservices with none of the team-autonomy benefit. Start with 3-5 coarse services, split when pain justifies it.
- “Use event sourcing for everything because it’s the cleanest architecture.” Event sourcing is expensive (replay complexity, schema evolution, projection cost) and only pays off for domains with complex audit/temporal requirements. Using it for product catalog is tool-driven thinking, not problem-driven.
- Sam Newman, Building Microservices (2nd ed., O’Reilly, 2021) — the definitive reference on decomposition and bounded contexts.
- Shopify Engineering blog posts on “deconstructing the monolith” and “Shopify Pods” (2017-2023).
- Chris Richardson, Microservices Patterns (Manning, 2018) — saga, CQRS, and event sourcing patterns grounded in production use cases.
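The conditional UPDATE from the inventory answer above can be tried out directly. In this sketch SQLite stands in for PostgreSQL so the example is self-contained; the helper names are hypothetical, and the `inventory` table has only the two columns the statement needs.

```python
import sqlite3


def open_inventory(stock):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory (product_id TEXT PRIMARY KEY, qty INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", list(stock.items()))
    conn.commit()
    return conn


def try_decrement(conn, product_id):
    """Atomically take one unit; returns False when none are left."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE inventory SET qty = qty - 1 WHERE product_id = ? AND qty >= 1",
            (product_id,),
        )
    # The WHERE clause does the check and the decrement in one statement,
    # so the affected-rows count tells us whether we got the unit.
    return cursor.rowcount == 1


conn = open_inventory({"sku-1": 1})
print(try_decrement(conn, "sku-1"), try_decrement(conn, "sku-1"))
```

The second call fails without ever reading the quantity first, which is what removes the read-then-write race that a separate SELECT followed by an UPDATE would have.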
Six months after launch, your Order service is responsible for 60% of your on-call pages. How do you diagnose whether the problem is architectural, operational, or code-quality?
- Pull the actual data before theorizing. Get 90 days of incidents with root-cause tags from the incident tracker. Do not trust your vibes about “which service is broken” — teams are systematically bad at this.
- Categorize the incidents. Code bugs (fix with better testing), infrastructure flakes (fix with infra investment), upstream failures (fix with resilience patterns), or design flaws (fix with rearchitecting).
- Look at the distribution. If 80% of incidents are one failure mode, that’s the thing to fix. If incidents are spread evenly across many modes, your Order service is probably just too busy — it owns too many responsibilities.
- Check the change log. Is the Order service the highest-churn codebase? High churn + high incidents = quality problem. Low churn + high incidents = architectural problem (the design amplifies external instability).
- Decide the intervention. Code quality: add integration tests, pair rotations, SLO-backed error budget. Operational: fix alerting thresholds, improve runbooks. Architectural: split the service or redesign the saga.
- “The Order service needs to be rewritten in Go/Rust.” Language is almost never the cause of on-call load. Pattern issues and coupling are.
- “Add more automated tests and the incidents will stop.” Tests help code-quality incidents but not architectural ones. If the service fails because it owns too many responsibilities, tests just make the existing complexity more tolerable.
- Google SRE Book, Chapter 11 (“Being On-Call”) — metrics and thresholds for on-call health.
- Will Larson, An Elegant Puzzle (Stripe Press, 2019) — “How to invest in technical infrastructure” chapter on diagnosing system pain.
- Monzo Engineering blog, “How Monzo runs its payments platform” (2020).
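The "categorize, then look at the distribution" steps from the diagnosis answer above can be turned into a small script. The 80% threshold and the churn heuristic come from that answer; the function names and the tag strings are hypothetical, and a real analysis would load the tags from the incident tracker.

```python
from collections import Counter


def dominant_failure_mode(root_cause_tags, threshold=0.8):
    """Return (tag, share) if one failure mode reaches the threshold, else None."""
    if not root_cause_tags:
        return None
    tag, count = Counter(root_cause_tags).most_common(1)[0]
    share = count / len(root_cause_tags)
    return (tag, share) if share >= threshold else None


def likely_problem_type(high_churn):
    """Heuristic from the answer: applies only when incident volume is high."""
    return "code quality" if high_churn else "architectural"


incidents = ["upstream_timeout"] * 8 + ["code_bug", "infra_flake"]
print(dominant_failure_mode(incidents))
print(likely_problem_type(high_churn=False))
```

A `None` result from `dominant_failure_mode` corresponds to the "spread evenly across many modes" case, which the answer treats as a hint that the service owns too many responsibilities.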
Evaluation Checklist
Architecture
- Clear service boundaries
- API Gateway implemented
- Service discovery working
- Database per service
- Event-driven communication
Resilience
- Circuit breakers
- Retry with backoff
- Fallback strategies
- Health checks
- Graceful degradation
Data
- Saga pattern for orders
- Event sourcing (optional)
- Idempotency handling
- Data consistency
Observability
- Distributed tracing
- Centralized logging
- Metrics & dashboards
- Alerting configured
Summary
Congratulations on completing the Microservices Mastery course! You now have:
- Deep understanding of microservices patterns
- Hands-on experience with Node.js and Python implementations
- Production-ready code for your portfolio
- Interview preparation for top tech companies
What's Next?
- Deploy your capstone to Kubernetes (local or cloud)
- Add more features (search, recommendations)
- Practice system design interviews
- Share your project on GitHub
Interview Deep-Dive
'You built this e-commerce platform as a capstone project. If you had to deploy it to production for real paying customers tomorrow, what are the top 3 things you would change or add?'
'Walk me through how you would handle a Black Friday traffic spike that is 10x your normal load for this e-commerce platform.'
'In your capstone project, how do you ensure that a payment is never charged twice for the same order, even if the system crashes and retries?'