Microservices Mastery
A comprehensive, interview-focused curriculum designed for engineers targeting Senior/Staff roles at top tech companies (Google, Amazon, Netflix, Uber, Stripe, etc.). This course covers everything from microservices fundamentals to production-ready distributed systems.
Target Outcome: Senior+ Engineer at FAANG / Top-tier microservices expertise
Prerequisites: Node.js or Python basics, REST APIs, basic database knowledge
Languages: Every code example is shown in both Node.js/TypeScript and Python so you can learn in the stack you use at work
Projects: 10+ hands-on projects including a full e-commerce platform
Chapters: 27 in-depth chapters covering all aspects of microservices
Why This Course?
Interview Ready
Production Patterns
Hands-On Projects
Deep Technical Foundation
Before You Begin: Organizational Readiness
Before you write a single docker-compose.yml, stop and check whether your organization is ready. The most expensive microservices mistakes are not technical; they are organizational. Teams adopt microservices because they read about Netflix or Amazon, not because their own context calls for it. The result is a distributed monolith with twice the operational cost and half the velocity of what they had before.
Interview: A VP of Engineering asks you 'Why shouldn't we just adopt microservices now so we're ready when we grow?' How do you respond?
- Reframe the question from “when” to “why.” Microservices are not an upgrade path — they are a different set of trade-offs. Ask what specific problems they are trying to solve.
- Quantify the current pain. Deploy frequency, failed deploys, onboarding time, cross-team blockers. If these numbers are fine, the VP is optimizing for a future problem.
- Calculate the distributed-systems tax. Roughly 30-40% of engineering capacity goes to infrastructure for the first year post-migration.
- Propose the modular monolith as a bridge. You get boundary discipline, separate schemas, and architectural tests without the network hop. Extract later when the pain is real.
- Agree on explicit triggers for extraction. “When deploy coordination costs exceed X hours/week per team” beats “when we’re bigger.”
Senior Follow-up Questions:
- “What metrics would prove to the VP that microservices are working?” Deploy frequency per team, change failure rate, mean time to recovery, and team-independence index (how often does team A block team B on a deploy?).
- “How would you handle a direct order to proceed anyway?” Propose a pilot: extract one service, run it for one quarter, and measure the metrics above. If they improve, continue. If not, pause.
- “What’s the cheapest experiment that de-risks the decision?” Run a two-week modular-monolith spike: introduce strict module boundaries with architectural tests and see if the deploy coupling pain drops. If it does, you may not need microservices at all.
Common Wrong Answers:
- “Sure, let’s start extracting services next sprint.” Fails because it skips the readiness audit and the org analysis. You will end up with a distributed monolith.
- “Microservices are always better at scale; we should migrate.” Fails because it treats microservices as universally superior. Shopify, Basecamp, and Stack Overflow all run successful monoliths at scale.
Further Reading:
- Martin Fowler, “Microservice Prerequisites”: the canonical readiness checklist.
- Sam Newman, “Monolith to Microservices” (O’Reilly, 2019) — when and how to extract.
- Segment’s “Goodbye Microservices” blog post (2018) — the pivot-back story in detail.
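The answer above recommends a modular monolith whose boundaries are enforced by architectural tests. Below is a minimal sketch of such a test, not the course's own implementation. It uses Python's standard-library ast module to flag imports that reach into another bounded context's internals. The context names (orders, billing, catalog) and the rule that contexts may only talk through a public "api" submodule are hypothetical assumptions.

```python
import ast

# Hypothetical layout: each bounded context is a top-level package that
# exposes its public surface only through a "<context>.api" submodule.
CONTEXTS = {"orders", "billing", "catalog"}

def imported_names(source: str) -> list[str]:
    """Return dotted names imported by a module (relative imports are skipped)."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from billing.api import charge" -> "billing.api.charge"
            names.extend(f"{node.module}.{alias.name}" for alias in node.names)
    return names

def boundary_violations(modules: dict[str, str]) -> list[tuple[str, str]]:
    """Flag imports that reach into another context's internals.

    `modules` maps a dotted module name (e.g. "orders.checkout") to its source.
    """
    violations = []
    for module_name, source in modules.items():
        own_context = module_name.split(".")[0]
        for name in imported_names(source):
            parts = name.split(".")
            if parts[0] not in CONTEXTS or parts[0] == own_context:
                continue  # stdlib, third-party, or same-context import
            if parts[:2] != [parts[0], "api"]:
                violations.append((module_name, name))
    return violations

# In a real test suite you would load these sources from the files on disk.
sources = {
    "orders.checkout": "from billing.api import charge\nimport json\n",
    "orders.refunds": "from billing.internal.ledger import Ledger\n",
}
print(boundary_violations(sources))
# [('orders.refunds', 'billing.internal.ledger.Ledger')]
```

Wired into CI as an assertion that the violation list is empty, a check like this turns "boundary discipline" into a failing build. It is the same signal the two-week modular-monolith spike is meant to produce, with no network hop involved.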
Interview: 'When would you tell a team microservices would hurt more than help?' Give me your explicit decision criteria.
- Team size under 15-20 engineers. Operational overhead per service is roughly constant; fixed costs swamp small teams.
- Unclear or unstable domain boundaries. If you cannot name the contexts, extraction will freeze the wrong lines into APIs.
- Strong consistency requirements across most workflows. If 70% of your flows need ACID across domains, distributed sagas will hurt every feature.
- Limited DevOps maturity. No CI/CD, no observability, manual deploys — microservices will amplify these weaknesses.
- MVP or rapid-pivot stage. You need to change data models weekly; API contracts slow you down.
- Uniform scaling profile. If every part of the system scales together, you gain nothing by scaling independently.
Real-world example: in 2020, Istio consolidated its separately deployed control-plane components into a single binary, istiod. Reason: the microservices pattern added operational pain for users without buying internal development velocity, because a small team owned all four components.
Senior Follow-up Questions:
- “What about a 10-person team building a real-time trading platform?” Still probably a monolith, because latency budgets are tight and network hops are measurable. Extract only the pieces with truly different scaling profiles.
- “You said ‘strong consistency across domains kills microservices’ — but Amazon uses microservices for orders and payments.” Amazon uses saga patterns with idempotent compensations, and they accept eventual consistency on non-critical paths. They invested a decade of platform engineering to make that cheap — it is not cheap for most teams.
- “How do you spot ‘unclear domain boundaries’ early?” Run Event Storming. If the domain experts disagree on basic vocabulary (“is a customer the same as a user?”), freeze extraction until the business model clarifies.
Common Wrong Answers:
- “Microservices are always right once you’re big enough.” Size is necessary but not sufficient. Basecamp (37signals) has stayed monolithic at scale by choice.
- “You should extract when the codebase hits X lines of code.” LOC is a terrible metric. Coupling and team friction are the real signals.
Further Reading:
- Istio project blog on consolidating the control plane into istiod (2020).
- DHH, “The Majestic Monolith” — Basecamp’s explicit anti-microservices position.
- Matt Klein (Envoy creator), “Microservices: A Retrospective” on why service meshes should not themselves be mesh-of-services.
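One way to make the criteria above explicit in a planning review is to encode them as a checklist that lists every reason to hold off. The sketch below is a hypothetical illustration, not a validated scoring model. The field names are invented. The defaults are assumptions: 15 engineers is the low end of the 15-20 range above, and "most workflows" is read as more than half.

```python
from dataclasses import dataclass

@dataclass
class TeamProfile:
    """Hypothetical inputs mirroring the decision criteria listed above."""
    engineers: int
    domain_boundaries_stable: bool   # can you name the bounded contexts?
    cross_domain_acid_share: float   # fraction of workflows needing ACID across domains
    has_ci_cd: bool
    has_observability: bool
    automated_deploys: bool
    data_model_changes_weekly: bool  # MVP / rapid-pivot stage
    parts_scale_differently: bool    # does any component have a distinct scaling profile?

def reasons_microservices_would_hurt(p: TeamProfile, min_engineers: int = 15,
                                     acid_threshold: float = 0.5) -> list[str]:
    """Return every criterion that argues against extracting services right now."""
    reasons = []
    if p.engineers < min_engineers:
        reasons.append("team size: fixed per-service costs swamp a small team")
    if not p.domain_boundaries_stable:
        reasons.append("domain: unclear boundaries would be frozen into APIs")
    if p.cross_domain_acid_share > acid_threshold:
        reasons.append("consistency: most workflows need cross-domain ACID")
    if not (p.has_ci_cd and p.has_observability and p.automated_deploys):
        reasons.append("devops: missing CI/CD, observability, or deploy automation")
    if p.data_model_changes_weekly:
        reasons.append("stage: API contracts would slow weekly data-model changes")
    if not p.parts_scale_differently:
        reasons.append("scaling: uniform profile, nothing gains from independent scaling")
    return reasons

team = TeamProfile(engineers=8, domain_boundaries_stable=True, cross_domain_acid_share=0.3,
                   has_ci_cd=True, has_observability=False, automated_deploys=True,
                   data_model_changes_weekly=False, parts_scale_differently=False)
for reason in reasons_microservices_would_hurt(team):
    print(reason)
```

The function returns reasons rather than a single yes/no. In an interview or a leadership review, the conversation is about which constraints apply and which could change, not about a score.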
Interview: Your team of 8 engineers has a 300K-LOC monolith and leadership wants to break it into 20 services in 6 months. What do you recommend?
- Push back on the timeline, not the goal. 20 services in 6 months with 8 engineers is a recipe for a distributed monolith. Propose extracting 1-2 services and reassessing.
- Audit readiness first (2-4 weeks). CI/CD, observability, deployment automation. Fix gaps before extracting.
- Identify the highest-pain boundary. Which module causes the most merge conflicts, has the most distinct scaling needs, or requires the most cross-team coordination? That is extraction #1.
- Use the Strangler Fig pattern. Route new traffic to the new service behind the existing API gateway. Old code path remains until the new service is proven.
- Define success metrics explicitly. Deploy frequency for the extracted service, number of cross-service incidents, team velocity changes.
- Plan to stop. Commit to not extracting service #3 until #1 and #2 are fully stable.
Senior Follow-up Questions:
- “What if leadership insists on all 20?” Present the risk in business terms: projected production incident rate, feature velocity drop, time-to-recovery projections. Offer a 3-month checkpoint with explicit kill criteria.
- “How do you pick extraction #1?” Two factors: highest current pain (pick the module causing the most friction) and lowest coupling to the rest (pick something with a clean data boundary). You want an early win that builds confidence.
- “What’s the sign you should stop extracting?” When cross-service debugging starts costing more engineering hours than the benefits of independence provide. Measure this explicitly.
Common Wrong Answers:
- “Yes, we can do 20 services in 6 months if we hire contractors.” Staffing up mid-migration multiplies coordination cost. You will end up with more services and less expertise per service.
- “Let’s do a big-bang rewrite.” Historically, rewrites from monolith to microservices-at-once have a failure rate above 70% (per DORA research). Incremental extraction is the only safe path.
Further Reading:
- Shopify Engineering blog, “Deconstructing the Monolith” (2019).
- Sam Newman, “Building Microservices” 2nd edition, Chapter 3 (Strangler Fig).
- DORA Accelerate State of DevOps reports on rewrite failure rates.
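The Strangler Fig step above sends traffic for one extracted capability to the new service while everything else still reaches the monolith. Below is a minimal sketch of that routing decision as a gateway might make it. It assumes a hypothetical /invoices capability, made-up upstream URLs, and a per-route rollout percentage. Production gateways such as Kong or Envoy express this in configuration rather than application code.

```python
import hashlib
from dataclasses import dataclass

LEGACY_UPSTREAM = "http://monolith.internal"  # hypothetical monolith address

@dataclass(frozen=True)
class ExtractedRoute:
    prefix: str           # path owned by the new service, e.g. "/invoices"
    upstream: str         # hypothetical address of the new service
    rollout_percent: int  # 0-100: share of users sent to the new service

def user_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in 0..99.

    Uses sha256 rather than hash(), whose output for str is randomized per process.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100

def choose_upstream(path: str, user_id: str, routes: list[ExtractedRoute]) -> str:
    """Route migrated paths to the new service for users in the rollout, else to the monolith."""
    for route in routes:
        if path == route.prefix or path.startswith(route.prefix + "/"):
            if user_bucket(user_id) < route.rollout_percent:
                return route.upstream
            return LEGACY_UPSTREAM  # old code path stays live until the new service is proven
    return LEGACY_UPSTREAM

routes = [ExtractedRoute("/invoices", "http://invoice-svc.internal", 25)]
print(choose_upstream("/orders/7", "user-42", routes))  # http://monolith.internal
```

Hashing the user ID keeps each user on one backend, so a session does not bounce between implementations. Rolling back is a one-field change (set rollout_percent to 0), which keeps the old path available as the safety net the Strangler Fig pattern depends on.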
What Companies Ask
| Company | Common Microservices Topics |
|---|---|
| Amazon | Service decomposition, eventual consistency, DynamoDB patterns, SQS/SNS |
| Netflix | Circuit breakers, chaos engineering, service mesh, Eureka |
| Uber | Event-driven architecture, Kafka, real-time systems, CQRS |
| Stripe | Distributed transactions, idempotency, exactly-once delivery |
| | gRPC, service discovery, load balancing, observability |
| Meta | Graph services, fan-out patterns, caching at scale |
Complete Curriculum
| # | Chapter | Topics |
|---|---|---|
| 00 | Overview | Course structure, learning path, prerequisites |
| 01 | Foundations | Monolith vs microservices, when to use, trade-offs |
| 02 | Domain-Driven Design | Bounded contexts, aggregates, service decomposition |
| 03 | Sync Communication | REST, gRPC, protocol buffers, API versioning |
| 04 | Async Communication | Message queues, RabbitMQ, Kafka, event-driven |
| 05 | API Gateway | Routing, authentication, rate limiting, Kong |
| 06 | Data Management | Database per service, saga pattern, CQRS |
| 07 | Resilience Patterns | Circuit breaker, retry, bulkhead, timeout |
| 08 | Service Discovery | Consul, Eureka, DNS-based discovery |
| 09 | Observability | Tracing, logging, metrics, Prometheus, Jaeger |
| 10 | Security | OAuth2, JWT, mTLS, secrets management |
| 11 | Containerization | Docker, multi-stage builds, best practices |
| 12 | Kubernetes | Deployments, services, ConfigMaps, scaling |
| 13 | Testing | Unit, integration, contract, E2E testing |
| 14 | Interview Prep | Common questions, system design, coding |
| 15 | Capstone Project | E-commerce platform with 10+ services |
| 16 | Service Mesh | Istio, Linkerd, traffic management, mTLS |
| 17 | Configuration Management | Consul, feature flags, hot reload |
| 18 | CI/CD | GitOps, ArgoCD, GitHub Actions, canary deploys |
| 19 | Database Patterns | Data partitioning, migrations, replication |
| 20 | Caching Strategies | Redis, cache-aside, invalidation, distributed |
| 21 | Chaos Engineering | Chaos Monkey, LitmusChaos, game days |
| 22 | Case Studies | Netflix, Uber, Amazon, Spotify architectures |
| 23 | Load Balancing | Client/server-side, algorithms, health checks |
| 24 | Migration Patterns | Strangler fig, branch by abstraction, CDC |
| 25 | Event Sourcing Deep Dive | Event stores, projections, snapshots |
| 26 | GraphQL Federation | Apollo Federation, schema composition |
Course Structure
The curriculum is organized into 10 tracks progressing from fundamentals to Staff+ expertise.
Learning Path
Foundations (Week 1-2)
Communication Patterns (Week 3-4)
Data Management (Week 5-6)
Resilience Patterns (Week 7-8)
Deployment & DevOps (Week 9-10)
Observability (Week 11-12)
Security & Advanced (Week 13-14)
Case Studies (Week 15-16)
GraphQL & Event Sourcing (Week 17-18)
Conway’s Law as a Warning, Not a Tattoo
Every course on microservices mentions Conway’s Law: organizations design systems that mirror their own communication structures. Most treat it as an inspirational quote. It is not. It is a load-bearing constraint on your architecture, and ignoring it is the fastest way to a distributed monolith.
Projects You’ll Build
API Gateway
Event Bus
Saga Orchestrator
Service Mesh Demo
Observability Stack
Chaos Engineering Lab
GraphQL Federation
E-Commerce Platform
Interview Topics Covered
System Design Questions
- Design a URL shortener with microservices
- Design an e-commerce checkout system
- Design a notification service at scale
- Design a real-time messaging system
- Design a rate limiter service
Coding Questions
- Implement circuit breaker from scratch
- Design an event-driven order processing system
- Build a distributed cache invalidation system
- Implement saga pattern with compensation
- Create a service discovery mechanism
Behavioral/Architecture Questions
- When would you choose microservices over monolith?
- How do you handle distributed transactions?
- How do you debug issues in a microservices system?
- How do you handle service failures gracefully?
- How do you ensure data consistency across services?
Tech Stack
| Category | Technologies |
|---|---|
| Languages | Node.js/TypeScript and Python 3.11+ (every example shown in both) |
| Frameworks | Express, Fastify, NestJS (Node) / FastAPI (Python) |
| Databases | PostgreSQL, MongoDB, Redis |
| Message Queues | RabbitMQ, Apache Kafka |
| Containers | Docker, Docker Compose |
| Orchestration | Kubernetes, Helm |
| API Gateway | Kong, Express Gateway |
| Service Mesh | Istio, Linkerd |
| Observability | Prometheus, Grafana, Jaeger, OpenTelemetry |
| CI/CD | GitHub Actions, ArgoCD, GitOps |
| GraphQL | Apollo Server, Apollo Federation |
| Chaos Engineering | Chaos Monkey, LitmusChaos |
| Configuration | Consul, ConfigMaps, Feature Flags |
Prerequisites
Node.js Fundamentals
- JavaScript/TypeScript basics
- async/await, Promises
- Express.js basics
- npm/yarn package management
Database Knowledge
- SQL basics (PostgreSQL preferred)
- NoSQL concepts (MongoDB)
- Basic caching concepts
Web Development
- REST API design
- HTTP methods and status codes
- JSON data format
- Basic authentication concepts
DevOps Basics
- Command line basics
- Git version control
- Basic Docker knowledge (helpful but not required)
Languages Used in This Course
Every non-trivial code example in this course is shown in both Node.js/TypeScript and Python, side by side in tabbed blocks. Pick the stack you already use at work; the underlying distributed-systems patterns are identical in either language.
Node.js/TypeScript Stack
| Layer | Library | Purpose |
|---|---|---|
| HTTP framework | Express, Fastify | REST APIs, middleware |
| HTTP client | axios, undici | Outbound calls with retries |
| ORM / data access | Prisma, TypeORM | Type-safe queries, migrations |
| Validation | zod, class-validator | Request/response schemas |
| Message broker | amqplib, kafkajs | RabbitMQ, Kafka producers/consumers |
| Resilience | opossum, cockatiel | Circuit breakers, retries, timeouts |
| Observability | @opentelemetry/api, pino | Tracing, structured logs |
| GraphQL | Apollo Server, @apollo/federation | Federated gateway + subgraphs |
| Testing | Jest, Vitest, Pact | Unit, integration, contract tests |
Python Stack
| Layer | Library | Purpose |
|---|---|---|
| HTTP framework | FastAPI | Async REST APIs with Pydantic validation |
| HTTP client | httpx | Async outbound calls with timeouts |
| ORM / data access | SQLAlchemy 2.0 (async) | Typed queries, Alembic migrations |
| Validation | Pydantic v2 | Request/response schemas |
| Message broker | aiokafka, aio-pika | Kafka, RabbitMQ async clients |
| Resilience | pybreaker, tenacity | Circuit breakers, retry with backoff |
| Observability | opentelemetry-sdk, structlog | Tracing, structured logs |
| GraphQL | strawberry-graphql | Federation-compatible subgraphs |
| Testing | pytest, pytest-asyncio, schemathesis | Unit, integration, contract tests |
Learning Path by Background
Not everyone should read this course linearly. Pick the path that matches how you plan to use it.
Backend Engineer Migrating to Microservices
- 01 Foundations — Just the failure modes and trade-offs section, so you know what to watch for.
- 02 Domain-Driven Design — Bounded contexts are how you decide where to cut the monolith. This is non-negotiable.
- 24 Migration Patterns — Strangler fig, branch by abstraction, CDC. Read this before writing any new service.
- 03 Sync Communication + 04 Async Communication — You’ll use both. Understand when to reach for which.
- 06 Data Management + 19 Database Patterns — The hardest part of migration is the data. Budget weeks here.
- 07 Resilience Patterns + 09 Observability — Non-negotiable before production traffic hits the new service.
- 13 Testing — Contract tests are how you avoid 3am pages when your service schema drifts.
SRE / Platform Engineer
- 11 Containerization + 12 Kubernetes — Your bread and butter. Read carefully.
- 09 Observability — Distributed tracing, metrics, log aggregation. This is your on-call lifeline.
- 08 Service Discovery + 23 Load Balancing — How requests actually find services in production.
- 16 Service Mesh — Istio, Linkerd. Decide whether you need it (usually only at 20+ services).
- 07 Resilience Patterns + 21 Chaos Engineering — You’ll be the one running game days.
- 10 Security — mTLS, secrets rotation, zero-trust networking.
- 17 Configuration Management + 18 CI/CD — Platform plumbing.
- 22 Case Studies — How Netflix, Uber, Amazon structured their platform teams.
Full-Stack Engineer Preparing for Interviews
- 14 Interview Prep — Read this first to calibrate what’s actually asked.
- 01 Foundations + 02 Domain-Driven Design — The “why” questions.
- 03 + 04 Communication — Sync vs async trade-offs, the most common follow-up.
- 06 Data Management — Sagas, CQRS, eventual consistency. Expect a drill-down here.
- 07 Resilience Patterns — Circuit breakers come up in almost every interview.
- 09 Observability — “How would you debug this?” is a senior-level screener.
- 22 Case Studies — Memorize 2-3 you can cite by name (Netflix Chaos Monkey, Uber Cadence, Amazon service ownership).
- 15 Capstone — Skim the architecture diagrams. You don’t need to build it.
Engineering Manager
- 01 Foundations — Read the trade-offs and “hidden costs” sections carefully. Skim the code.
- 22 Case Studies — How Netflix, Uber, Amazon, Spotify structured their teams. This is Conway’s Law in practice.
- 24 Migration Patterns — If you’re leading a decomposition, this chapter is what you’ll reference for stakeholder conversations.
- 14 Interview Prep — Read the “What are the hidden costs?” answer. You’ll use this framing in budget discussions.
- 13 Testing + 18 CI/CD — Skim. Understand what contract testing buys you so you can advocate for it.
- 09 Observability — Skim the “infrastructure cost” commentary. Know the rough price tag.
- 21 Chaos Engineering — Just the philosophy section. Know what game days are and why they matter.
How to Use This Course
There’s no single right way to work through 27 chapters. Pick the strategy that matches your timeline and goals.
Strategy 1: Linear Deep Dive
Strategy 2: Topic-Focused
Strategy 3: Interview Prep Only
Strategy 4: Reference Guide
Ready to Begin?
Start with Foundations
Interview Deep-Dive
You are interviewing for a Senior Backend role. The interviewer says: 'We have a successful monolith serving 5 million users. Leadership wants to move to microservices. Walk me through how you would advise them.'
An interviewer asks: 'You mentioned that microservices are not always the right answer. Give me a concrete example of a company or team that adopted microservices too early and what went wrong.'
'What are the hidden costs of microservices that most teams underestimate during planning?'