# 11. Containerization with Docker

> Master Docker for microservices with multi-stage builds, optimization, and docker-compose orchestration

Docker provides consistent, isolated environments for microservices. Learn to build optimized containers and orchestrate multiple services.

<Info>
  **Learning Objectives:**

  * Write production-ready Dockerfiles
  * Implement multi-stage builds
  * Optimize image size and build time
  * Use docker-compose for local development
  * Apply container security best practices
</Info>

***

## Docker Fundamentals for Microservices

Before we dive into Dockerfiles, it's worth slowing down to ask: why did the industry settle on containers as the unit of deployment for microservices? The answer is that microservices fundamentally change the operational problem. When you had one monolith, you had one runtime to standardize. When you have 50 services written in 4 languages with different library versions, you either standardize the host (which freezes everyone to a single Node version) or you standardize the unit of deployment. Containers let you standardize the latter while keeping each team free to choose their own language, framework, and dependency versions.

The deeper tradeoff is isolation versus overhead. A virtual machine gives you strong, hardware-enforced isolation, but it boots a full guest OS, which typically takes tens of seconds and reserves hundreds of megabytes to gigabytes of RAM. Running 50 VMs on a single host is impractical. Containers share the host kernel and provide namespace-level isolation, which is "good enough" for most workloads, and they typically start in well under a second with only megabytes of overhead. The catch: containers are not a security boundary in the same way VMs are. A kernel exploit in one container can potentially escape to the host. For untrusted code, you still want VMs or micro-VMs like Firecracker underneath.

If you skip containerization in a microservices architecture, you'll feel the pain quickly: environment drift between dev and prod, dependency conflicts between services on the same host, slow and unreliable deployments, and no clean way to scale individual services independently. Docker solves all of these at once, which is why it became the de facto standard.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY DOCKER FOR MICROSERVICES?                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    WITHOUT CONTAINERS                                │    │
│  │                                                                      │    │
│  │  Developer Machine         Staging              Production          │    │
│  │  ┌──────────────────┐     ┌──────────────┐     ┌──────────────┐    │    │
│  │  │ Node 16          │     │ Node 14      │     │ Node 18      │    │    │
│  │  │ npm 7            │     │ npm 6        │     │ npm 9        │    │    │
│  │  │ Linux Mint       │     │ Ubuntu 20.04 │     │ Amazon Linux │    │    │
│  │  └──────────────────┘     └──────────────┘     └──────────────┘    │    │
│  │                                                                      │    │
│  │  "It works on my machine!" 🤷                                        │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    WITH CONTAINERS                                   │    │
│  │                                                                      │    │
│  │  Developer Machine         Staging              Production          │    │
│  │  ┌──────────────────┐     ┌──────────────────┐ ┌──────────────────┐│    │
│  │  │ ┌──────────────┐ │     │ ┌──────────────┐ │ │ ┌──────────────┐ ││    │
│  │  │ │  Container   │ │     │ │  Container   │ │ │ │  Container   │ ││    │
│  │  │ │  Node 20     │ │     │ │  Node 20     │ │ │ │  Node 20     │ ││    │
│  │  │ │  Alpine      │ │     │ │  Alpine      │ │ │ │  Alpine      │ ││    │
│  │  │ └──────────────┘ │     │ └──────────────┘ │ │ └──────────────┘ ││    │
│  │  └──────────────────┘     └──────────────────┘ └──────────────────┘│    │
│  │                                                                      │    │
│  │  Same container everywhere! ✅                                       │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Production-Ready Dockerfile

### Dockerfile Stages: Why the Order Matters

A Dockerfile is not a shell script — it's a declarative build recipe that the Docker daemon compiles into an immutable image. Each instruction creates a discrete layer that is content-addressed by its hash, and that hash is the basis for every caching decision Docker makes. This means the ordering of your instructions is not a stylistic choice; it's the single most consequential factor in build performance and image correctness. The stages we'll build through — base image, system packages, dependency manifest, dependency install, source code, runtime configuration — are deliberately sequenced from "rarely changes" to "changes on every commit."

What goes wrong when you ignore this? In the best case, your CI pipeline takes 10x longer than it should because dependencies reinstall on every build. In the worst case, you ship stale dependencies because a cached layer higher in the chain masks an update you thought you made. Teams routinely waste hundreds of engineer-hours per year on slow builds that a 30-second Dockerfile reorder would have prevented. The tradeoff to be aware of: aggressive caching can occasionally mask bugs where a build works in CI but fails with `--no-cache`, so every production image should be rebuilt without cache at least weekly to catch latent issues.

### Basic Node.js Dockerfile

A Dockerfile is a build recipe. Every instruction produces a new layer in the image, and each layer is cached based on the instruction, its inputs, and everything before it. The ordering of instructions is not cosmetic: it controls caching behavior, which in turn controls your build time and your CI pipeline cost. The basic Dockerfile below demonstrates several non-obvious principles: copy the dependency manifest before the source code (so dependency installation is cached), set `NODE_ENV=production` so libraries skip development-only code paths, create a dedicated non-root user (so a compromised process has far less ability to modify the filesystem or escalate privileges), and declare a `HEALTHCHECK` so the container runtime can detect when the process is alive but not responding.

If you skip the non-root user step, your container runs as UID 0 inside the namespace, and in some container escape scenarios that maps to root on the host. This is one of the most consequential security mistakes you can make. Fixing it is trivially cheap; not fixing it can be catastrophic. Similarly, if you omit the healthcheck, Docker and Compose cannot distinguish a frozen process from a healthy one, and you end up routing traffic to containers that are up but not serving. Note that Kubernetes ignores the Dockerfile `HEALTHCHECK` and relies on its own liveness and readiness probes, so define those as well if you deploy there.

<Warning>
  **Caveats & Common Pitfalls in Dockerfile Construction**

  * **Running as root.** The default `FROM node:20` or `FROM python:3.12` runs as UID 0. A code execution bug in the app now runs as root inside the container — and with privilege escalation CVEs (e.g., CVE-2022-0185, CVE-2024-21626) can escape to the host as root.
  * **Base images with known CVEs.** Pulling `node:20` once and never rebuilding means you inherit every CVE disclosed in the base image since then. Some of these are critical (glibc, OpenSSL, libxml2).
  * **No `HEALTHCHECK`.** The container runtime cannot tell a hung process from a working one. Traffic routes to frozen containers; users see timeouts.
  * **`latest` tag in production.** `node:latest` today is not `node:latest` tomorrow — a silent base-image change at 3am means your build is no longer reproducible and a broken upstream release breaks your deploy.
</Warning>

<Tip>
  **Solutions & Patterns**

  * **Always create and switch to a non-root user** with explicit UID/GID (1001 is conventional and non-privileged). This is a 3-line Dockerfile change and one of the highest-leverage security improvements you can make.
  * **Rebuild base images weekly** in CI (scheduled job) so newly disclosed CVEs get picked up via OS package updates. Pair with a Trivy / Grype scan that fails the build on CRITICAL findings.
  * **Pin base images by digest**, not tag: `FROM node:20-alpine@sha256:abc123...`. Reproducible builds, no silent upgrades. Bump the digest intentionally as part of a PR.
  * **Add a small, local `HEALTHCHECK`** that calls the app's `/health` over `localhost` — no external dependencies. Use `curl`/`wget` only if already present; otherwise use the language's stdlib HTTP client to avoid adding a binary.
  * Prefer **distroless** (`gcr.io/distroless/nodejs20-debian12`) or **Chainguard Images** for production — no shell, no package manager, minimal attack surface.
</Tip>
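As a rough illustration of the pinning advice above, the following Python sketch scans Dockerfile text and reports `FROM` lines that use a floating tag or lack a digest. The function name `check_base_images` and the message wording are hypothetical, and the parser is deliberately naive: it does not resolve `ARG` substitutions in image names.

```python
import re

# Captures the image reference of a FROM line, skipping flags such as --platform=...
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)
# Captures the stage alias in "FROM image AS alias".
AS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def check_base_images(dockerfile_text: str) -> list[str]:
    """Return findings for base images that are not pinned by digest."""
    findings = []
    stage_names = set()
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        # "FROM builder" refers to an earlier stage, and "scratch" has no tag.
        is_stage_ref = image.lower() in stage_names or image.lower() == "scratch"
        alias = AS_RE.search(line)
        if alias:
            stage_names.add(alias.group(1).lower())
        if is_stage_ref:
            continue
        name, _, digest = image.partition("@")
        if digest.startswith("sha256:"):
            continue  # pinned by digest, reproducible regardless of the tag
        # Look for the tag only in the last path segment to skip registry ports.
        tag = name.rsplit("/", 1)[-1].partition(":")[2]
        if tag in ("", "latest"):
            findings.append(f"line {lineno}: {image} uses a floating tag")
        else:
            findings.append(f"line {lineno}: {image} is not pinned by digest")
    return findings
```

A CI step could read the Dockerfile, call `check_base_images`, and fail the job when the returned list is non-empty.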

```dockerfile theme={null}
# Dockerfile - Basic
FROM node:20-alpine

WORKDIR /app

# Copy package files first (better caching)
COPY package*.json ./

# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev

# Copy application code
COPY . .

# Set environment
ENV NODE_ENV=production
ENV PORT=3000

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Start application
CMD ["node", "src/index.js"]
```

The same principles apply to a Python service. The layering is identical — manifest first, install, then source — but the mechanics differ. `requirements.txt` is your `package.json` equivalent, `pip install` is your `npm ci`, and you want `PYTHONUNBUFFERED=1` so stdout flushes immediately (critical for containerized logging). Setting `PYTHONDONTWRITEBYTECODE=1` prevents Python from writing `.pyc` files at runtime, which are useless in an ephemeral container and just bloat the writable layer.

```dockerfile theme={null}
# Dockerfile - Basic Python FastAPI
FROM python:3.12-alpine

WORKDIR /app

# Copy dependency manifest first (better caching)
COPY requirements.txt ./

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PORT=8000

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# Expose port
EXPOSE 8000

# Health check (urllib is built in — no extra deps)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)" || exit 1

# Start application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
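If the inline `python -c` one-liner becomes hard to read, the same check could live in a small script. This is a minimal sketch using only the standard library; the file name `healthcheck.py`, the function names, and the default URL are assumptions chosen to match the Dockerfile above.

```python
"""healthcheck.py: report whether the local /health endpoint answers HTTP 200."""
import http.client
import urllib.error
import urllib.request

# Assumed default, matching the port and path used in the Dockerfile above.
DEFAULT_URL = "http://localhost:8000/health"


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the endpoint answers 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, timeouts, non-2xx responses (HTTPError),
        # protocol errors, and malformed URLs.
        return False


def main(argv: list[str]) -> int:
    """Return the exit code Docker expects: 0 for healthy, 1 for unhealthy."""
    url = argv[1] if len(argv) > 1 else DEFAULT_URL
    return 0 if is_healthy(url) else 1
```

To use it as a script, add `raise SystemExit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard (plus `import sys`) and point the instruction at it, for example `HEALTHCHECK CMD ["python", "healthcheck.py"]`.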

### Multi-Stage Build (Optimized)

The single biggest lever for reducing image size and improving security is multi-stage builds. The idea is deceptively simple: you define several independent build stages in one Dockerfile and copy only the artifacts you need from earlier stages into the final image. The build stage can contain compilers, test tools, source code, and devDependencies — none of which ship to production. The final stage contains only the runtime, the compiled output, and production dependencies.

Why does this matter in a microservices context? Because you're building dozens of images per day across many services. A 1GB image versus a 150MB image means 10x more pull time during autoscaling events, 10x more registry storage cost, and 10x more attack surface. The build tools you use at build time (TypeScript compiler, webpack, test frameworks) are often the source of CVEs — shipping them to production is gratuitous risk. Multi-stage builds also enforce a clean separation between "what you need to build the app" and "what you need to run it," which is a discipline that pays dividends as services grow.

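One gotcha worth flagging: the final stage below installs `dumb-init` (`tini` works the same way) and runs it as PID 1. Without it, Node.js becomes PID 1, and the kernel does not apply default signal actions to PID 1. Unless your code registers its own handler, `docker stop` sends SIGTERM, nothing happens, and after the 10-second grace period Docker escalates to SIGKILL. Your in-flight requests are cut mid-response. With `dumb-init` as PID 1, signals are forwarded to Node as an ordinary child process and zombie processes are reaped. Graceful shutdown still requires the app to handle SIGTERM by draining requests and closing connections.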
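Signal forwarding only helps if the application reacts to SIGTERM. The sketch below shows one way a plain Python worker process might do that with the standard library. The names `install_signal_handlers`, `serve`, and `handle_one_request` are hypothetical, and servers such as gunicorn and uvicorn already install their own handlers, so this applies mainly to custom loops.

```python
import signal
import threading

# Set once SIGTERM or SIGINT arrives; the loop below checks it between requests.
shutdown_requested = threading.Event()


def _request_shutdown(signum, frame):
    """Signal handler: record the request and return immediately."""
    shutdown_requested.set()


def install_signal_handlers():
    """Register handlers so `docker stop` (SIGTERM) starts a graceful exit.

    Must be called from the main thread, as required by signal.signal().
    """
    signal.signal(signal.SIGTERM, _request_shutdown)
    signal.signal(signal.SIGINT, _request_shutdown)


def serve(handle_one_request, poll_interval=0.1):
    """Process work until shutdown is requested; return how many items ran.

    `handle_one_request` is a placeholder callable standing in for real work.
    The current item always finishes before the loop exits.
    """
    handled = 0
    while not shutdown_requested.is_set():
        handle_one_request()
        handled += 1
        # Sleep between items, waking early if a shutdown signal arrives.
        shutdown_requested.wait(poll_interval)
    return handled
```

After `serve` returns, the process has until the end of the stop grace period (10 seconds by default for `docker stop`) to close database connections and flush logs before SIGKILL arrives.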

<Warning>
  **Caveats & Common Pitfalls with Multi-Stage Builds**

  * **1GB+ images in production.** Typical symptoms: using `node:20` full image (not Alpine/slim), copying `devDependencies`, including the `.git` directory, or forgetting to prune dev deps. A 1GB image means 5-10x longer pod startup during autoscaling, 10x higher registry egress cost, and a much larger CVE surface.
  * **Multi-stage confusion.** People write a multi-stage Dockerfile that copies from the wrong stage, ends up with both dev and prod deps, or rebuilds dependencies in the final stage — defeating the whole point.
  * **`COPY --from=builder /app /app` copies everything including `.git`, test fixtures, `__pycache__`, and `node_modules/.cache`.** The final image still balloons.
  * **Shipping build tools (gcc, make, TypeScript compiler, webpack) to production.** Each is an attack-surface expansion and none are needed at runtime.
</Warning>

<Tip>
  **Solutions & Patterns**

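  * **Multi-stage with explicit copies.** Name stages (`AS builder`, `AS production`) and copy specific paths, not the entire directory. For Node, use one `COPY` per directory (`COPY --from=builder /app/dist ./dist`, then `COPY --from=builder /app/node_modules ./node_modules`); listing several source directories in a single `COPY` merges their contents into the destination. For Python: build wheels in one stage, `pip install --no-index --find-links=/wheels` in the final stage with no compiler in sight.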
  * **Use `node:20-alpine` or `python:3.12-slim`** as the production base — roughly 50-150 MB vs. 350-900 MB for the full variants.
  * **Aggressive `.dockerignore`.** Exclude `.git`, `node_modules`, `__pycache__`, `.venv`, tests, docs, local `.env` files. This shrinks the build context and prevents accidental leaks (like `.env` with secrets getting baked in via `COPY . .`).
  * **Verify image size in CI.** Fail the build if the final image exceeds a budget (e.g., 200 MB for a Node service, 300 MB for a Python service with typical deps). Use `docker image inspect --format='{{.Size}}'` or `dive` for layer analysis.
  * For the smallest possible images, try **distroless** or **static-linked Go / Rust binaries on `scratch`** — sub-20 MB is achievable.
</Tip>
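The size-budget check mentioned above can be sketched in a few lines of Python. This is an illustrative outline rather than a finished CI tool: `image_size_bytes` shells out to `docker image inspect --format '{{.Size}}'`, while `check_budget` is a hypothetical helper that holds the comparison logic so it can be tested without Docker.

```python
import subprocess

MB = 1000 * 1000  # decimal megabytes


def image_size_bytes(image: str) -> int:
    """Ask the local Docker CLI for an image's size in bytes."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def check_budget(size_bytes: int, budget_mb: int) -> tuple[bool, str]:
    """Compare a size against a budget; return (ok, human-readable message)."""
    size_mb = size_bytes / MB
    ok = size_mb <= budget_mb
    verdict = "within" if ok else "over"
    return ok, f"{size_mb:.1f} MB is {verdict} the {budget_mb} MB budget"
```

In a pipeline this might be called as `check_budget(image_size_bytes("order-service:ci"), 200)`, where the image tag is a made-up example, with the job failing whenever the first element of the result is `False`.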

```dockerfile theme={null}
# Dockerfile - Multi-stage build
# ================================

# Stage 1: Dependencies
FROM node:20-alpine AS deps

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install ALL dependencies (including devDependencies for build)
RUN npm ci

# ================================

# Stage 2: Build
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy source code
COPY . .

# Build the application (TypeScript, bundling, etc.)
RUN npm run build

# Prune devDependencies (--omit=dev replaces the deprecated --production flag)
RUN npm prune --omit=dev

# ================================

# Stage 3: Production
FROM node:20-alpine AS production

WORKDIR /app

# Add labels for image metadata
LABEL org.opencontainers.image.source="https://github.com/myorg/order-service"
LABEL org.opencontainers.image.description="Order Service"
LABEL org.opencontainers.image.version="1.0.0"

# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Copy built application
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./

# Set environment
ENV NODE_ENV=production
ENV PORT=3000

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

# Use dumb-init as entrypoint
ENTRYPOINT ["dumb-init", "--"]

# Start application
CMD ["node", "dist/index.js"]
```

### TypeScript Specific Dockerfile

TypeScript adds an extra wrinkle: you need the compiler (`tsc`), type definitions, and your source files at build time, but none of them belong in production. The pattern below explicitly separates "build with types" from "run compiled JS." Running `npm run type-check && npm run build` ensures the image build fails fast if there are type errors, catching problems in CI before they reach production. Forgetting the type-check step is a subtle mistake: `noEmitOnError` is off by default, so `tsc` writes JavaScript even when it reports type errors, and transpile-only build tools such as esbuild or swc never check types at all. If the build script relies on one of those, or ignores the compiler's exit code, broken code can ship.

```dockerfile theme={null}
# TypeScript Service Dockerfile
FROM node:20-alpine AS builder

WORKDIR /app

COPY package*.json ./
COPY tsconfig*.json ./

RUN npm ci

COPY src ./src

# Type checking and build
RUN npm run type-check && npm run build

# Production stage
FROM node:20-alpine

WORKDIR /app

RUN apk add --no-cache dumb-init && \
    addgroup -S app && adduser -S app -G app

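# Install production dependencies first so this layer stays cached
# until the package manifests change
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

# Copy compiled output last, since it changes on every build
COPY --from=builder /app/dist ./dist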

USER app

ENV NODE_ENV=production

EXPOSE 3000

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
```

For Python microservices, the same multi-stage pattern applies, but the details differ. Python doesn't have a compilation step in the same way TypeScript does, but it does have wheel building (for packages with C extensions), which is expensive and requires build tools you don't want in production. The pattern below uses a builder stage to install dependencies into a virtualenv, then copies the virtualenv into a minimal runtime image built from the same base, so the interpreter path recorded inside the virtualenv stays valid.

```dockerfile theme={null}
# Python FastAPI Service Dockerfile (multi-stage)
FROM python:3.12-alpine AS builder

WORKDIR /app

# Install build dependencies (only in builder stage)
RUN apk add --no-cache gcc musl-dev libffi-dev

# Create virtualenv
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.12-alpine

WORKDIR /app

RUN apk add --no-cache dumb-init && \
    addgroup -S app && adduser -S app -G app

# Copy virtualenv from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY --chown=app:app src ./src

USER app

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

EXPOSE 8000

ENTRYPOINT ["dumb-init", "--"]
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

### Python FastAPI Multi-Stage with Wheel Trick

For production FastAPI workloads, the "wheel trick" is a step beyond a plain virtualenv copy. Instead of installing packages directly into the virtualenv during build, you first build wheels (pre-compiled binary distributions) in the builder stage, then install them in the runtime stage. This separates compilation (which needs gcc, make, and headers) from installation (which only needs pip). The benefit: the runtime stage never sees a C compiler, and the wheel cache can be reused across multiple services that share dependencies. For a service with packages like `pydantic-core` (Rust-compiled), `uvloop` (C extension), or `asyncpg` (C extension), this shaves significant time and image size.

Production Python services typically run under `gunicorn` with `uvicorn` workers rather than `uvicorn` directly. `gunicorn` gives you battle-tested process management, graceful restarts, and worker lifecycle hooks; `uvicorn` provides the ASGI event loop that FastAPI needs. The combination — `gunicorn` as the supervisor, `uvicorn.workers.UvicornWorker` as the worker class — is the standard production pattern.

```dockerfile theme={null}
# Dockerfile - Production Python FastAPI (wheel trick)
# ================================

# Stage 1: Build wheels
FROM python:3.12-slim AS builder

WORKDIR /build

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./

# Build wheels for all dependencies into /build/wheels
RUN pip wheel --no-cache-dir --wheel-dir /build/wheels -r requirements.txt

# ================================

# Stage 2: Production runtime
FROM python:3.12-slim AS production

WORKDIR /app

# Add image metadata
LABEL org.opencontainers.image.source="https://github.com/myorg/order-service"
LABEL org.opencontainers.image.description="Order Service (FastAPI)"
LABEL org.opencontainers.image.version="1.0.0"

# Install only runtime deps (dumb-init for signal handling)
RUN apt-get update && apt-get install -y --no-install-recommends \
    dumb-init \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user with specific UID/GID
RUN groupadd -r -g 1001 appgroup && \
    useradd -r -u 1001 -g appgroup appuser

# Copy wheels from builder and install (no compiler needed)
COPY --from=builder /build/wheels /wheels
COPY requirements.txt ./
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt && \
    rm -rf /wheels

# Copy application with correct ownership
COPY --chown=appuser:appgroup src ./src
COPY --chown=appuser:appgroup gunicorn_conf.py ./

# Runtime environment
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PORT=8000
ENV WORKERS=4

USER appuser

EXPOSE 8000

# Health check using stdlib urllib (no curl/wget needed)
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)"

ENTRYPOINT ["dumb-init", "--"]

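# gunicorn with uvicorn workers is the production standard for FastAPI.
# Exec-form CMD does not expand $WORKERS or $PORT, so the values below are literal.
# gunicorn only auto-loads ./gunicorn.conf.py, so gunicorn_conf.py is passed explicitly;
# command-line flags take precedence over settings in that file.
CMD ["gunicorn", "src.main:app", \
     "--config", "gunicorn_conf.py", \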
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000", \
     "--access-logfile", "-", \
     "--error-logfile", "-", \
     "--timeout", "30", \
     "--graceful-timeout", "30", \
     "--keep-alive", "5"]
```

***

## Docker Optimization

### Layer Caching: The Invisible Performance Lever

Layer caching is one of those topics that feels academic until the first time you watch a junior engineer wait 12 minutes for a CI build that should take 90 seconds. The reason it matters in microservices specifically: you are not building one image, you are building dozens, across multiple pipelines, many times per day. If each of 30 services merges 15 times a day and every build is a 10-minute cache miss, that is 4,500 build-minutes (75 hours) of CI time per day. That's real money, and it's recovered by getting the layer order right.

What goes wrong without proper caching? CI pipelines become slow enough that developers batch changes instead of pushing small commits (because "builds are slow, I'll wait"). Feedback loops lengthen from minutes to hours. Developers lose the flow state of "commit, see test result, iterate." The downstream cost is not just raw time — it's degraded engineering culture. The tradeoff to know: aggressive caching can hide transitive dependency issues (a bumped lockfile that wasn't actually installed because the layer was cached against stale content), so scheduled `--no-cache` rebuilds are a hygiene practice worth scripting.

### Layer Caching Strategy

Each instruction in a Dockerfile creates a layer, and Docker caches each layer based on the instruction plus the files it depends on. If the instruction, its input files, and every layer before it are unchanged, Docker reuses the cached layer instead of re-executing the instruction; the first change invalidates that layer and everything after it. The implication: put the instructions that change rarely (base image, system packages) at the top, and the instructions that change frequently (source code) at the bottom.

The classic mistake is copying the entire project with `COPY . .` before running `npm ci`. Every time any file in the build context changes, even a comment in a README, the cache invalidates for the `npm ci` layer and you reinstall all dependencies from scratch. On a large project with 500 dependencies, that's 2-3 minutes wasted on every single build. By splitting `COPY package*.json ./` and `RUN npm ci` into their own earlier layers, dependency installation stays cached until `package.json` or `package-lock.json` changes. This single reordering can cut CI time dramatically on dependency-heavy projects.

```dockerfile theme={null}
# Optimize layer caching
# ================================

# Base image (rarely changes)
FROM node:20-alpine

WORKDIR /app

# 1. System dependencies (rarely change)
RUN apk add --no-cache dumb-init

# 2. Package files (change sometimes)
COPY package*.json ./

# 3. Dependencies (change when package.json or package-lock.json changes)
RUN npm ci --omit=dev

# 4. Application code (changes frequently)
COPY . .

# Order matters: least changing layers first!
```
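To see why the ordering above matters, here is a toy Python model of chained cache keys. It is a simplification for intuition, not Docker's actual algorithm: each step's key is a hash of the previous key, the instruction text, and the content of the files that step reads. The step lists and file contents are invented for the example.

```python
import hashlib


def layer_keys(steps, files):
    """Compute one chained cache key per step.

    steps: list of (instruction, [context paths the instruction reads]).
    files: dict mapping path -> file content.
    """
    keys = []
    parent = ""
    for instruction, inputs in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())
        digest.update(instruction.encode())
        for path in sorted(inputs):
            digest.update(path.encode())
            digest.update(files[path].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, before, after):
    """Return the instructions whose key differs between two file snapshots."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [instr for (instr, _), a, b in zip(steps, old, new) if a != b]


ALL_FILES = ["package.json", "package-lock.json", "src/index.js"]
GOOD = [
    ("COPY package*.json ./", ["package.json", "package-lock.json"]),
    ("RUN npm ci", []),
    ("COPY . .", ALL_FILES),
]
BAD = [
    ("COPY . .", ALL_FILES),
    ("RUN npm ci", []),
]

before = {
    "package.json": '{"name": "order-service"}',
    "package-lock.json": "{}",
    "src/index.js": "console.log('v1')",
}
after = dict(before, **{"src/index.js": "console.log('v2')"})

print(rebuilt_steps(GOOD, before, after))  # ['COPY . .']
print(rebuilt_steps(BAD, before, after))   # ['COPY . .', 'RUN npm ci']
```

With the manifest copied first, a source-only change re-runs just the final `COPY`; with `COPY . .` first, the same change also forces the dependency install to run again.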

### Reducing Image Size

Image size is not just an aesthetic concern. In Kubernetes, a larger image means slower pod startup (more time to pull from the registry), which means slower autoscaling response to traffic spikes. It also means more egress bandwidth cost from your registry and more storage cost. For organizations running hundreds of services, a 500MB reduction per image compounds into meaningful savings. The four levers are: minimal base images, cleanup in the same layer, `.dockerignore`, and multi-stage builds.

The "same layer" rule is subtle: each `RUN` command creates a layer, and deleting files in a later layer doesn't remove them from the earlier layer. If you download a 200MB tarball in one `RUN` and delete it in the next `RUN`, the tarball still lives in your image, just hidden. That's why you see those long `&&` chains in production Dockerfiles — everything that creates and deletes transient files must happen within a single layer.

```dockerfile theme={null}
# Size optimization techniques

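# 1. Use slim base images
# node:20-alpine is roughly 50MB compressed vs roughly 350MB for node:20.
# (Dockerfile comments must start the line; a trailing "# ..." after FROM is a syntax error.)
FROM node:20-alpine

# 2. Clean up in same layer
RUN npm ci --omit=dev && \
    npm cache clean --force && \
    rm -rf /tmp/*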

# 3. Use .dockerignore
# .dockerignore file:
# node_modules
# npm-debug.log
# Dockerfile*
# docker-compose*
# .git
# .gitignore
# .env*
# *.md
# tests
# coverage
# .nyc_output

# 4. Multi-stage builds (shown above)

# 5. Don't install dev dependencies in production
RUN npm ci --omit=dev
```
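The "same layer" rule can be made concrete with a small Python model. This is only a sketch of the accounting, under the simplifying assumption that an image stores the sum of what each layer adds and that a later delete merely hides a file; the function name and file sizes are made up.

```python
def stored_and_visible_mb(layers):
    """Return (stored_mb, visible_mb) for a stack of layers.

    layers: list of dicts with optional keys
      'add'    -> {path: size_mb} files present when the layer is committed
      'delete' -> [paths] removed during the layer's RUN step
    """
    stored = 0
    visible = {}
    for layer in layers:
        added = dict(layer.get("add", {}))
        for path in layer.get("delete", []):
            if path in added:
                # Created and removed within one layer: never stored.
                del added[path]
            else:
                # Removed in a later layer: hidden, but still stored below.
                visible.pop(path, None)
        stored += sum(added.values())
        visible.update(added)
    return stored, sum(visible.values())


# Download in one RUN, delete in the next: the tarball stays in the image.
two_runs = [
    {"add": {"/tmp/src.tar": 200}},
    {"add": {"/app/bin": 20}, "delete": ["/tmp/src.tar"]},
]
# Download, build, and delete in a single RUN chained with &&.
one_run = [
    {"add": {"/tmp/src.tar": 200, "/app/bin": 20}, "delete": ["/tmp/src.tar"]},
]

print(stored_and_visible_mb(two_runs))  # (220, 20)
print(stored_and_visible_mb(one_run))   # (20, 20)
```

Both variants show the same 20 MB filesystem at runtime, but the two-step version still ships the 200 MB tarball inside an earlier layer.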

### .dockerignore

The `.dockerignore` file is the most underappreciated tool in the Docker ecosystem. It works like `.gitignore` but for the Docker build context — the set of files Docker sends to the daemon when you run `docker build`. Without a `.dockerignore`, every build ships your entire `.git` directory, local `node_modules`, `.env` files (which may contain secrets), test fixtures, and documentation to the daemon. This bloats the build context, slows down builds, and risks baking secrets into images. The fix takes five minutes; the cost of skipping it can be much higher.

A particularly nasty scenario: a developer has `.env` in their local directory with production database credentials for debugging. They run `docker build` without a `.dockerignore`. The `.env` file is included in the context, and a later `COPY . .` bakes it into the image. The image is pushed to a public registry. The credentials leak. This has happened to real companies.

```text theme={null}
# .dockerignore
node_modules
npm-debug.log*
yarn-debug.log*
yarn-error.log*

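# Python (unlike .gitignore, patterns match from the context root,
# so the **/ prefix is needed to match at any depth)
**/__pycache__
**/*.pyc
**/*.pyo
**/*.egg-info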
.pytest_cache
.mypy_cache
.ruff_cache
.venv
venv

# Git
.git
.gitignore
.gitattributes

# Docker
Dockerfile*
docker-compose*
.docker

# IDE
.idea
.vscode
*.swp
*.swo

# Test
tests
__tests__
coverage
.nyc_output
jest.config.*

# Docs
*.md
docs

# Environment files
.env*
!.env.example

# Build artifacts
dist
build

# Misc
.DS_Store
Thumbs.db
```
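One way to keep a `.dockerignore` from silently losing its most important entries is a small lint in CI. The Python sketch below is a minimal illustration: it compares rules literally and does not evaluate Docker's pattern semantics, and both the function names and the `REQUIRED_RULES` list are assumptions to adapt per repository.

```python
# Rules every service is expected to carry; an illustrative list, not a standard.
REQUIRED_RULES = [".git", "node_modules", ".env*"]


def parse_rules(dockerignore_text):
    """Return the rules in the file, skipping blank lines and # comments."""
    rules = []
    for raw in dockerignore_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Treat "node_modules/" and "node_modules" as the same rule.
        rules.append(line.rstrip("/"))
    return rules


def missing_rules(dockerignore_text, required=REQUIRED_RULES):
    """Return the required rules that do not appear literally in the file."""
    present = set(parse_rules(dockerignore_text))
    return [rule for rule in required if rule not in present]
```

A pipeline step could read each service's `.dockerignore`, call `missing_rules`, and fail when the result is non-empty, which would have caught the leaked `.env` scenario described above before the image was built.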

***

## Docker Compose for Microservices

### Why Compose Exists and What It Saves You

Compose is the answer to a very specific question: how do you bring up an entire microservices topology on a developer laptop without losing your mind? In production you have Kubernetes doing this job, but Kubernetes on a laptop (minikube, kind, k3d) is heavy — it adds a layer of etcd, a scheduler, a control plane, and requires you to understand Pods, Deployments, Services, and Ingresses just to see logs. Compose strips all of that away and gives you "a YAML file that starts all the containers." For 99% of local development, that's exactly right.

What goes wrong without Compose? Teams invent fragile shell scripts that run `docker run` for each service, with hardcoded IP addresses, race conditions on startup order, and broken cleanup when one service crashes. New developers spend their first week fighting "the setup" instead of writing code. With Compose, `docker compose up` (or `docker-compose up` with the legacy standalone binary) brings everything online, `docker compose down` tears it all down cleanly, and the YAML file lives in version control as executable documentation of how services fit together. The tradeoff: Compose is not production-grade. Don't try to run it on a server "just for staging": you lose scheduling, self-healing, rolling updates, and every other feature that makes orchestration valuable. Use Compose for dev; use Kubernetes for anything shared.

Docker Compose is the pragmatic choice for local development in a microservices architecture. The problem it solves: you have 8 services, 3 databases, a message broker, and a cache, and starting them individually with correct network configuration, environment variables, and startup order is tedious and error-prone. Compose lets you declare the entire topology in one YAML file and spin it up with `docker compose up`. It's not production-grade (that's Kubernetes's job), but for local development and simple CI environments, it's dramatically simpler than Kubernetes.

The key capabilities Compose provides: a shared network where services can reach each other by service name (no IP addresses to hardcode), volume management for persistent data, dependency ordering via `depends_on`, and healthchecks to ensure dependencies are actually ready (not just started). The `depends_on` with `condition: service_healthy` pattern is particularly important — without it, services start before their databases are ready and fail on the first query. With it, Compose waits for the database's healthcheck to pass before starting the dependent service.

If you try to replicate this setup without Compose (running each service manually with `docker run`), you'll spend more time fighting the tooling than writing code. If you try to use Kubernetes locally (via minikube or kind), you get production-like behavior but with much higher mental overhead. Compose hits the sweet spot for local development.

### Complete Microservices Stack

```yaml theme={null}
# docker-compose.yml
version: '3.8'

services:
  # ===================
  # API Gateway
  # ===================
  api-gateway:
    build:
      context: ./services/api-gateway
      dockerfile: Dockerfile
    ports:
      - "8080:3000"
    environment:
      - NODE_ENV=development
      - ORDER_SERVICE_URL=http://order-service:3000
      - PAYMENT_SERVICE_URL=http://payment-service:3000
      - INVENTORY_SERVICE_URL=http://inventory-service:3000
      - JWT_SECRET=${JWT_SECRET}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - order-service
      - payment-service
      - inventory-service
      - redis
    networks:
      - microservices
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # ===================
  # Order Service
  # ===================
  order-service:
    build:
      context: ./services/order-service
      dockerfile: Dockerfile
      target: development
    volumes:
      - ./services/order-service:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:postgres@order-db:5432/orders
      - KAFKA_BROKERS=kafka:9092
      - REDIS_URL=redis://redis:6379
    depends_on:
      order-db:
        condition: service_healthy
      kafka:
        condition: service_healthy
    networks:
      - microservices

  order-db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=orders
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - order-db-data:/var/lib/postgresql/data
      - ./services/order-service/db/init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - microservices
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  # ===================
  # Payment Service (Python FastAPI)
  # ===================
  payment-service:
    build:
      context: ./services/payment-service
      dockerfile: Dockerfile
      target: development
    volumes:
      - ./services/payment-service:/app
      # Keep container's virtualenv isolated from host bind mount
      - /app/.venv
    environment:
      - PYTHONUNBUFFERED=1
      - PYTHONDONTWRITEBYTECODE=1
      - MONGODB_URI=mongodb://payment-db:27017/payments
      - STRIPE_API_KEY=${STRIPE_API_KEY}
      - KAFKA_BROKERS=kafka:9092
      - LOG_LEVEL=debug
    ports:
      - "8001:8000"
    depends_on:
      payment-db:
        condition: service_healthy
    networks:
      - microservices
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  payment-db:
    image: mongo:6
    volumes:
      - payment-db-data:/data/db
    networks:
      - microservices
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 10s
      timeout: 5s
      retries: 5

  # ===================
  # Inventory Service
  # ===================
  inventory-service:
    build:
      context: ./services/inventory-service
      dockerfile: Dockerfile
      target: development
    volumes:
      - ./services/inventory-service:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:postgres@inventory-db:5432/inventory
      - REDIS_URL=redis://redis:6379
    depends_on:
      inventory-db:
        condition: service_healthy
    networks:
      - microservices

  inventory-db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=inventory
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - inventory-db-data:/var/lib/postgresql/data
    networks:
      - microservices
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  # ===================
  # Infrastructure
  # ===================
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - microservices
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - kafka-data:/var/lib/kafka/data
    networks:
      - microservices
    healthcheck:
      test: kafka-topics --bootstrap-server localhost:9092 --list
      interval: 30s
      timeout: 10s
      retries: 5

  # ===================
  # Observability
  # ===================
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./infrastructure/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - microservices

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
      - ./infrastructure/grafana/provisioning:/etc/grafana/provisioning
    networks:
      - microservices

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
      - "4317:4317"
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - microservices

networks:
  microservices:
    driver: bridge

volumes:
  order-db-data:
  payment-db-data:
  inventory-db-data:
  redis-data:
  kafka-data:
  prometheus-data:
  grafana-data:
```

### Development Dockerfile with Hot Reload

Developer ergonomics matter. If every code change requires rebuilding the container (30-60 seconds), developers will disengage from containerized workflows and go back to running services directly on their laptops — which reintroduces all the "works on my machine" problems. The solution is a development target in your Dockerfile that mounts source code as a volume and runs with a file watcher (`nodemon` for Node, `uvicorn --reload` for Python). Changes to source files trigger automatic reloads without rebuilding the container.

The key insight is the `target` keyword in Compose. One Dockerfile, two targets: `development` (with file watching) and `production` (compiled, locked down). In CI/CD, you build the production target. In local development, Compose builds the development target. Same Dockerfile, same base image, same base stage; only the final stage differs. This is leagues better than maintaining two separate Dockerfiles.

```dockerfile theme={null}
# Dockerfile with development and production targets (Node.js)
FROM node:20-alpine AS base

WORKDIR /app
RUN apk add --no-cache dumb-init

# Development stage
FROM base AS development

# Install all dependencies including devDependencies
COPY package*.json ./
RUN npm install

# Mount source code as volume in docker-compose
# No COPY needed - will use volume mount

ENV NODE_ENV=development

EXPOSE 3000

# Use nodemon for hot reload
CMD ["npx", "nodemon", "--watch", "src", "src/index.js"]

# Production stage
FROM base AS production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY . .

RUN addgroup -S app && adduser -S app -G app
USER app

ENV NODE_ENV=production
EXPOSE 3000

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "src/index.js"]
```

The Python equivalent uses `uvicorn --reload` as the file watcher during development and `gunicorn` with uvicorn workers in production. Two important mechanics: in development we mount the source code as a volume and install all requirements (including dev tooling); in production we bake the code into the image and install only runtime dependencies. The `--reload` flag uses `watchfiles` to detect file changes when that package is installed (it ships with `uvicorn[standard]`) and otherwise falls back to polling file modification times; either way it restarts the ASGI server on change. The production command assumes `gunicorn` is listed in `requirements.txt`.

```dockerfile theme={null}
# Dockerfile with development and production targets (Python FastAPI)
FROM python:3.12-slim AS base

WORKDIR /app

# Runtime deps needed in both dev and prod
RUN apt-get update && apt-get install -y --no-install-recommends \
    dumb-init \
    && rm -rf /var/lib/apt/lists/*

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Development stage
FROM base AS development

# Install build deps (devs may need to reinstall packages with C extensions)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

# Install all deps including dev tooling (pytest, black, mypy, etc.)
COPY requirements.txt requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements.txt -r requirements-dev.txt

# Source mounted as volume via docker-compose — no COPY needed

EXPOSE 8000

# uvicorn --reload watches filesystem and restarts on change
CMD ["uvicorn", "src.main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--reload", \
     "--reload-dir", "/app/src"]

# Build stage for wheels (keeps compilers out of the production image)
FROM python:3.12-slim AS prod-builder

WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ libffi-dev && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip wheel --no-cache-dir --wheel-dir /build/wheels -r requirements.txt

# Production stage: this is what `--target production` builds
FROM base AS production

WORKDIR /app

RUN groupadd -r -g 1001 app && useradd -r -u 1001 -g app app

COPY --from=prod-builder /build/wheels /wheels
COPY requirements.txt ./
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt && \
    rm -rf /wheels

COPY --chown=app:app src ./src

USER app

EXPOSE 8000

ENTRYPOINT ["dumb-init", "--"]
CMD ["gunicorn", "src.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000"]
```

### Docker Compose Override for Development

`docker-compose.override.yml` is automatically merged with `docker-compose.yml` when you run `docker-compose up`. This lets you keep the base file production-like and layer in development-only settings (volume mounts, debug ports, hot reload commands) without duplicating the entire configuration. The same pattern works for environment-specific overrides such as `docker-compose.staging.yml` or `docker-compose.production.yml`, but only `docker-compose.override.yml` is picked up automatically; any other file has to be passed explicitly, for example `docker-compose -f docker-compose.yml -f docker-compose.staging.yml up`. Keep the base file as the source of truth and use overrides for the deltas.

```yaml theme={null}
# docker-compose.override.yml
version: '3.8'

# This file is automatically merged with docker-compose.yml
# for local development

services:
  order-service:
    build:
      target: development
    volumes:
      - ./services/order-service/src:/app/src
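    command: npx nodemon --inspect=0.0.0.0:9229 --watch src src/index.js
    ports:
      - "3002:3000"  # host port 3001 is already published by grafana in the base file
      - "9229:9229"  # Node debugger (needs the --inspect flag above)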

  # Python FastAPI payment-service: mount source + enable reload
  payment-service:
    build:
      target: development
    volumes:
      - ./services/payment-service/src:/app/src
      # Shield the container's virtualenv from host bind mount
      - /app/.venv
    command: >
      uvicorn src.main:app
      --host 0.0.0.0
      --port 8000
      --reload
      --reload-dir /app/src
      --log-level debug
    ports:
      - "8001:8000"
      - "5678:5678"  # debugpy remote debugger

  inventory-service:
    build:
      target: development
    volumes:
      - ./services/inventory-service/src:/app/src
    command: npx nodemon --inspect=0.0.0.0:9229 --watch src src/index.js
    ports:
      - "3003:3000"
      - "9231:9229"
```
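
To build intuition for how an override file layers onto the base file, here is a deliberately simplified merge in Python. It is a sketch under stated assumptions, not Compose's algorithm: Compose applies per-attribute rules that this function does not reproduce, `merge_compose` is a hypothetical helper, and the two dictionaries are made-up fragments rather than the files above.

```python
from typing import Any


def merge_compose(base: Any, override: Any) -> Any:
    """Simplified merge: mappings merge key by key, lists are concatenated
    without duplicates, and any other value is replaced by the override."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for key, value in override.items():
            merged[key] = merge_compose(base[key], value) if key in base else value
        return merged
    if isinstance(base, list) and isinstance(override, list):
        return base + [item for item in override if item not in base]
    return override


# Hypothetical base fragment (production-like defaults).
base = {
    "services": {
        "order-service": {
            "build": {"context": "./services/order-service"},
            "command": "node src/index.js",
            "ports": ["3000:3000"],
        }
    }
}

# Hypothetical override fragment (development-only deltas).
override = {
    "services": {
        "order-service": {
            "build": {"target": "development"},
            "command": "npx nodemon --watch src src/index.js",
            "ports": ["9229:9229"],
        }
    }
}

merged = merge_compose(base, override)
print(merged["services"]["order-service"])
```

The override only states what differs: the `build` mapping gains a `target`, the scalar `command` is replaced, and the extra debug port is added to the published ports. To see what Compose itself produces for real files, `docker-compose config` prints the fully merged configuration.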

***

## Container Security

### Security Hardening: Why Defaults Are Dangerous

Docker's defaults were designed for ease of onboarding, not production security. Out of the box, your container runs as UID 0, has a writable root filesystem, keeps a default set of Linux capabilities that most services never use, and has unrestricted outbound network access. Every one of those defaults is a liability in a microservice that may be exposed to untrusted input. The threat model is not theoretical: real production breaches in the last few years have started with a code-execution vulnerability in a service (deserialization bug, SSRF, prototype pollution) and escalated because the container defaults gave the attacker far more than they needed.

What goes wrong if you skip hardening? Best case, you pass a compliance audit with a stern finding and spend the next quarter doing it properly under pressure. Worst case, a single RCE in a dependency becomes a lateral movement vector into your node, your cloud metadata service, or your Kubernetes API. The fixes (non-root user, read-only filesystem, capability drops, resource limits) are cheap to add at build time and far more expensive to add after a breach. The tradeoff: hardened containers are slightly harder to debug live (you can't write files, you can't install diagnostic tools), which is why we pair them with observability, remote debuggers, and ephemeral debug pods rather than shell access.

Container security is an area where defaults betray you. Out of the box, a container runs as root, can modify its own filesystem, keeps a default capability set it rarely needs, and can read/write anywhere in the image. Every one of those defaults is wrong for a production microservice. The threat model is straightforward: if an attacker achieves remote code execution in your container (via command injection, a deserialization bug, or a dependency vulnerability), you want the blast radius to be as small as possible. Security hardening is about shrinking that blast radius.

The hardening below combines several techniques: non-root user (so the process cannot modify OS files), read-only filesystem (so the attacker cannot drop a malicious binary), dropped capabilities (so the process cannot perform privileged operations like raw socket access), and careful file permissions (so even if the attacker runs as `appuser`, they cannot modify application code). Each technique individually is cheap; combined, they dramatically raise the cost of exploitation.

The tradeoff is that hardened containers are slightly harder to debug. You cannot `exec` in and write diagnostic files if the filesystem is read-only. The fix is to mount a writable `tmpfs` at `/tmp` for transient files, and use remote debugging tools (kubectl port-forward to a debug endpoint) rather than shell access for production troubleshooting.
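
One way to keep these properties from silently regressing is a startup self-check inside the service. The sketch below is an illustrative example, not a standard tool: `RuntimeFacts`, `evaluate`, and `probe` are hypothetical names, and the write test only reports whether this process can create files in a directory (which is also false for an unprivileged user on a writable filesystem).

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RuntimeFacts:
    euid: int            # effective user ID of the current process
    root_writable: bool  # can this process create files in "/"?
    tmp_writable: bool   # can this process create files in "/tmp"?


def evaluate(facts: RuntimeFacts) -> List[str]:
    """Return human-readable findings; an empty list means no issue detected."""
    findings = []
    if facts.euid == 0:
        findings.append("process runs as root (UID 0)")
    if facts.root_writable:
        findings.append("process can write to the root filesystem")
    if not facts.tmp_writable:
        findings.append("/tmp is not writable; mount a tmpfs there")
    return findings


def _can_write(directory: str) -> bool:
    """Try to create (and immediately discard) a temporary file."""
    try:
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        return False


def probe() -> RuntimeFacts:
    """Collect the facts from the running process (Unix only)."""
    return RuntimeFacts(
        euid=os.geteuid(),
        root_writable=_can_write("/"),
        tmp_writable=_can_write("/tmp"),
    )


if __name__ == "__main__":
    for finding in evaluate(probe()):
        print(f"hardening warning: {finding}")
```

Keeping `evaluate` separate from `probe` makes the policy testable without a container: the decision logic takes plain values, and only `probe` touches the real environment.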

<Warning>
  **Caveats & Common Pitfalls in Container Runtime**

  * **No resource limits = noisy neighbor problem.** A pod with no memory limit can consume all node memory; Kubernetes has to evict other pods on the same node to recover. A CPU-unbounded pod starves every other container on the node. One buggy service takes down five innocent ones.
  * **`privileged: true` or `hostNetwork: true` in production.** `privileged` disables almost all container isolation, and `hostNetwork` places the pod directly in the host's network namespace. With either one, an RCE is much closer to a host compromise.
  * **Writable root filesystem.** An attacker with code execution can drop a malicious binary into `/usr/local/bin/` and keep it for the life of the container. A read-only root filesystem removes that foothold.
  * **Docker socket mounted into a pod** (`/var/run/docker.sock`). Container escape is trivial — the pod can now start arbitrary containers on the host with any privileges it wants. This pattern shows up in CI runners and "ops helper" pods and is how several cluster-wide compromises happened.
</Warning>

<Tip>
  **Solutions & Patterns**

  * **Always set `requests` and `limits` for CPU and memory.** Base `requests` on actual p90 usage from production metrics; set `limits` to roughly 1.5-2x `requests` to absorb spikes. For memory, `requests == limits` is a common pattern for latency-sensitive services; note that the Guaranteed QoS class requires `requests == limits` for both CPU and memory on every container in the pod.
  * **`readOnlyRootFilesystem: true` + writable `tmpfs` at `/tmp`.** Pair with `allowPrivilegeEscalation: false` and `capabilities.drop: ["ALL"]` in the pod `securityContext`.
  * **Use Kubernetes `PodSecurity` admission** with the `restricted` profile enforced at the namespace level — this structurally denies privileged pods, host network, host PID, and other high-risk configurations.
  * **Never mount the Docker socket** into application pods. If a pod genuinely needs to manage containers (rare — usually only CI runners), use rootless Docker or Buildah / Kaniko instead.
  * **Sign images with Sigstore `cosign`** or a similar tool, and admit only signed images into the cluster (via OPA Gatekeeper / Kyverno policy).
</Tip>
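
The sizing rule above (requests from observed p90, limits at roughly 1.5-2x) can be sketched as a small calculation. The helper names, the nearest-rank percentile method, the optional `headroom` parameter, and the sample values are illustrative assumptions; real numbers should come from your metrics system.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_memory(
    samples_mib: Sequence[float],
    headroom: float = 0.0,      # extra fraction on top of p90, e.g. 0.2 for 20%
    limit_factor: float = 1.5,  # limit as a multiple of the request
) -> Dict[str, Dict[str, str]]:
    """Derive memory requests/limits (in Mi) from observed usage samples."""
    p90 = percentile(samples_mib, 90)
    request = math.ceil(p90 * (1 + headroom))
    limit = math.ceil(request * limit_factor)
    return {
        "requests": {"memory": f"{request}Mi"},
        "limits": {"memory": f"{limit}Mi"},
    }


if __name__ == "__main__":
    # Hypothetical working-set samples in MiB collected over a sampling window.
    observed = [300, 310, 320, 330, 340, 350, 360, 380, 400, 512]
    print(suggest_memory(observed))
```

Using p90 rather than the maximum keeps a single outlier (the 512 MiB sample here) from inflating the request, while the limit still leaves room for that kind of spike.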

<AccordionGroup>
  <Accordion title="Scenario: Your container keeps getting OOMKilled in production with 2 GB memory limit, but runs fine locally at 512 MB. How do you diagnose and fix?">
    **Strong Answer Framework:**

    1. **Confirm it is actually an OOMKill, not just a crash.** `kubectl describe pod` shows `Last State: Terminated, Reason: OOMKilled, Exit Code: 137`. `dmesg` on the node shows the kernel OOM-killer event. If the exit code is 137 with OOMKilled reason, you have confirmed the memory-limit hit.
    2. **Understand why local differs from production.** A local run typically sees little concurrent load. Production can have 100-1000x more concurrent requests, each allocating request/response objects, DB result sets, and connection buffers. Memory grows with concurrency, so a service that is comfortable in 512 MB at 10 RPS can need 2+ GB at 1000 RPS.
    3. **Runtime-specific root causes:**
       * *Node.js:* V8's default old-space heap limit depends on the Node.js version and is not guaranteed to track the container's memory limit. Off-heap memory (Buffers, native bindings, worker threads) comes on top of the heap. Fix: `NODE_OPTIONS=--max-old-space-size=1536` for a 2 GB container, so V8 leaves room for Buffers and OS overhead.
       * *Python:* Each uvicorn worker is a full process (\~100-300 MB baseline). `--workers 8` is at least 800 MB before your app allocates anything. Fix: reduce workers, or switch to threads/async if appropriate.
       * *JVM:* Before JDK 10, the JVM didn't respect cgroup memory limits (container support was later backported to 8u191). On modern JDKs, verify with `java -XX:+PrintFlagsFinal -version | grep MaxHeapSize` inside the container.
    4. **Check for memory leaks.** Take two heap snapshots 10 minutes apart at steady load. Node: `v8.writeHeapSnapshot()` triggered by admin endpoint, analyzed in Chrome DevTools. Python: `tracemalloc` or `memray`. Growth of retained objects between snapshots is the leak.
    5. **Common leak culprits.** Event listeners never removed, global caches with no eviction, unreleased DB connections on error paths, closures capturing request objects. Fix each systematically.
    6. **Set the correct limits.** Set `requests.memory` to observed p90 usage (e.g., 512 Mi) and `limits.memory` to \~2x that (1 Gi). Alert on `container_memory_working_set_bytes / limit > 0.8` so you know before an OOMKill happens.

    **Real-World Example:** A 2018 Heroku engineering blog covered exactly this pattern — Node.js apps happily allocating past cgroup limits because V8 did not know about cgroups. The fix: set `--max-old-space-size` to 75% of the container limit. Every Node container in production should have this set explicitly.

    **Senior Follow-up Questions:**

    * *"How do you distinguish a memory leak from normal load-based growth?"* A leak keeps growing even at constant load; load-based growth plateaus at steady state. Plot `container_memory_working_set_bytes` over 24 hours with constant traffic — if it grows linearly, it is a leak. If it plateaus after warm-up, it is working-set size (fix with limits, not leak hunting).
    * *"What is the difference between RSS, working set, and heap?"* RSS is everything the process has in physical RAM (heap + stack + shared libraries + file mappings). Working set, as reported by `container_memory_working_set_bytes`, is the cgroup's total memory usage minus inactive file cache, which makes it the closest metric to what the kernel enforces against `limits.memory`. Heap is just the V8/Python/JVM managed allocator, a subset of RSS. Because the limit applies to the whole cgroup, heap-only profiling misses things like Buffers, native memory, and mmap'd files.
    * *"How would you design a canary deployment to catch memory issues before full rollout?"* Deploy to 5% of traffic, watch `container_memory_working_set_bytes` slope over 30 minutes, auto-rollback if slope exceeds a threshold (e.g., 10 MB/min growth at steady load). Argo Rollouts or Flagger can do this declaratively with Prometheus queries as success criteria.

    **Common Wrong Answers:**

    * "Just increase the memory limit to 4 GB." Works until next quarter when it OOMs again at 4 GB. You have deferred the problem and doubled the infrastructure cost.
    * "Restart the pod every hour via a liveness probe hack." Masks the leak, causes request drops during restarts, and leaves the actual bug in place.

    **Further Reading:**

    * Heroku blog: ["Node.js memory limits in containers"](https://devcenter.heroku.com/articles/node-memory-use)
    * Kubernetes docs: [Resource management for Pods and Containers](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)
    * Python: [`memray`](https://github.com/bloomberg/memray) by Bloomberg for production-grade memory profiling
  </Accordion>

  <Accordion title="Scenario: A security scan flags 40 HIGH/CRITICAL CVEs in your production Node.js base image, but rebuilding hasn't happened in 6 months. How do you handle this without breaking production?">
    **Strong Answer Framework:**

    1. **Triage which CVEs are actually exploitable in your context.** Not every CVE in a base image is a real risk. `libxml2` CVE in an image that never parses XML is a paper tiger. Use Trivy with `--ignore-unfixed` (or Grype with `--only-fixed`) and review the remaining list; cross-reference each HIGH/CRITICAL against actual usage.
    2. **Rebuild first, upgrade second.** Many CVEs disappear just by rebuilding against the current base image tag (since the upstream image has been patched). Run `docker pull node:20-alpine` and rebuild; those findings go away without any deliberate upgrade.
    3. **Pin by digest after rebuild.** Now that you have a fresh, patched image, pin the digest so the build is reproducible: `FROM node:20-alpine@sha256:...`. Future builds will not silently pick up new changes.
    4. **Canary the new image.** Deploy to 5-10% of traffic for 30-60 minutes, watch RED metrics (Rate, Errors, Duration) plus CPU/memory, then roll forward. Unexpected base-image regressions (e.g., a glibc change breaking `dns.lookup()`) have happened — always canary.
    5. **Automate from here on.** Weekly scheduled rebuild + scan + canary deploy. Dependabot / Renovate watches the base image and opens PRs when upstream updates. The goal: no image in production is more than 14 days behind its base.
    6. **Policy at admission.** OPA Gatekeeper / Kyverno rule: no image older than 30 days can be admitted to the cluster. Combined with signed images, this enforces hygiene structurally.

    **Real-World Example:** The 2021 `log4j` (CVE-2021-44228) incident was a textbook case — images that had not been rebuilt in months were exposed for weeks longer than necessary because teams did not have an automated rebuild pipeline. Teams with weekly rebuilds and Dependabot on base images patched within 24 hours; teams without it took 2-4 weeks.

    **Senior Follow-up Questions:**

    * *"What do you do when a CVE has no fix yet (0-day)?"* Three layers: (1) WAF / rate-limit rules that block the exploit pattern, (2) network-level segmentation so the vulnerable service has minimal lateral reach, (3) temporary feature flag to disable the vulnerable code path if it is isolatable. Accept residual risk and monitor for indicators of compromise until a patch ships.
    * *"How do you prevent alert fatigue from CVE scans?"* Scan policy: fail build only on CRITICAL and HIGH with `fix-available: true`. Warn (do not fail) on MEDIUM. Ignore LOW. Pair with a monthly triage where an engineer reviews the warning backlog so things do not silently rot.
    * *"What about CVEs in the Node.js or Python packages themselves, not just OS packages?"* Dependabot / Renovate / Snyk for `package.json` and `requirements.txt` / `pyproject.toml`. Prefer `npm audit --omit=dev` / `pip-audit` in CI. Commit the lockfile and use `npm ci` / `pip install -r requirements.txt --require-hashes` to make the build reproducible and tamper-evident.

    **Common Wrong Answers:**

    * "We'll wait until the next release cycle to patch." Not acceptable for HIGH/CRITICAL — you are leaving a known exploitable window open. Emergency patches get fast-tracked outside the regular cycle.
    * "Ignore CVEs in base image and patch only app dependencies." Base image CVEs are real — `glibc`, `openssl`, `zlib` bugs affect everyone regardless of app code.

    **Further Reading:**

    * Trivy: [Aqua Security Trivy docs](https://aquasecurity.github.io/trivy/)
    * FIRST: [CVSS v3.1 specification](https://www.first.org/cvss/v3.1/specification-document), which defines what HIGH vs. CRITICAL actually means
    * Chainguard: ["Why minimal container images"](https://www.chainguard.dev/unchained/minimal-container-images-starting-with-the-essentials)
  </Accordion>

  <Accordion title="Scenario: You inherited a fleet of 30 services where every container runs as root and no pod has resource limits. How do you migrate to hardened containers without causing incidents?">
    **Strong Answer Framework:**

    1. **Measure before changing.** Collect 2 weeks of `container_memory_working_set_bytes` and `container_cpu_usage_seconds_total` per service. This gives you p90/p99 usage to set reasonable initial `requests` and `limits` from data, not guesses.
    2. **Attack in waves by risk, not alphabetically.** Start with one low-traffic, non-critical internal service as the pilot. Get the hardening pattern right there, document it, then roll out in waves of 3-5 services. Never do 30 services at once — one subtle issue (e.g., `readOnlyRootFilesystem` breaking a library that writes to `/etc/hosts`) will take down everything.
    3. **Non-root first, by itself.** One PR per service that only changes: add `USER 1001`, add `runAsNonRoot: true` and `runAsUser: 1001` in pod spec. Deploy. Watch for 48 hours. Some services will break because they bind to port 80 (requires root) — switch to port 8080 + Kubernetes Service mapping 80→8080. Some will break because they write to `/var/log` — switch to stdout.
    4. **Resource limits next.** Set `requests` to p90 observed + 20% headroom, `limits` to 1.5x requests. Deploy canary, watch for OOMKills and CPU throttling (a steadily rising `container_cpu_cfs_throttled_periods_total` suggests your CPU limit is too tight). Tune based on data.
    5. **Read-only filesystem third.** This is the most likely to break something. Identify writable paths first: `strace -e openat` in a test pod to see what the app tries to write. Mount `emptyDir` at those paths (`/tmp`, `/var/cache`, language-specific dirs). Then enable `readOnlyRootFilesystem: true`.
    6. **Capability drop last.** Add `capabilities.drop: ["ALL"]`, then add back only what is needed (typically nothing; a load balancer may need `NET_BIND_SERVICE` for sub-1024 ports but you already moved off those).
    7. **Enforce structurally.** After all services are migrated, enable the Kubernetes `PodSecurity` admission `restricted` profile at the namespace level. Now any new workload is required to be hardened from day one — no regression possible.

    **Real-World Example:** Shopify published (around 2020-2021) a multi-year journey going from a permissive security posture to strict `restricted` pod security across thousands of services. Their key insight: per-service migration with metrics-based canaries took quarters, not weeks, but the gradual approach had zero incidents while the "big bang" approach at another org (they referenced anonymously) caused multi-hour outages.

    **Senior Follow-up Questions:**

    * *"How do you handle services that genuinely need privileged operations, like a logging sidecar?"* Narrow-scoped capabilities. `CAP_DAC_READ_SEARCH` for a log tailer that reads files owned by other users, not `privileged: true`. Document the capability and why, so future reviewers understand. For truly privileged workloads (node-problem-detector, storage drivers), they get their own namespace with relaxed policy — not the application namespace.
    * *"What about legacy services that cannot be changed (vendor container, no source)?"* Run them in a dedicated namespace with relaxed `PodSecurity`. Isolate via NetworkPolicy. Put a service mesh gateway in front. If the vendor's container is a persistent security debt, that becomes a procurement signal to replace the vendor.
    * *"How do you validate the migration actually improved security, not just moved things around?"* Run a before/after pen-test or automated tool like `kube-bench` and `kube-hunter`. Number of findings should drop dramatically. Also track: % of pods running as non-root (target: 100% in app namespaces), % with resource limits (target: 100%), % with read-only root FS (target: >90%).

    **Common Wrong Answers:**

    * "Roll out hardening via a single admission policy change." Breaks everything that was not migrated, incidents everywhere.
    * "Only harden new services going forward." The 30 existing services remain vulnerable indefinitely. Security debt is real debt with interest.

    **Further Reading:**

    * Kubernetes: [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/) and [Pod Security Admission](https://kubernetes.io/docs/concepts/security/pod-security-admission/)
    * CIS Kubernetes Benchmark — authoritative hardening checklist
    * Shopify Engineering blog: [shopify.engineering](https://shopify.engineering/), various posts on their reliability/security journey
  </Accordion>
</AccordionGroup>
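
The leak test described in the OOMKilled scenario above (memory that keeps growing at constant load, with rollback when growth exceeds a threshold) comes down to estimating a slope. A minimal sketch, assuming you have already exported `(minute, working_set_mb)` samples; the function names and the 10 MB/min default are illustrative, and tools such as Argo Rollouts or Flagger would express the same check as a Prometheus query rather than Python.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (minutes since start, working set in MB)


def growth_rate_mb_per_min(samples: Sequence[Sample]) -> float:
    """Least-squares slope of working-set size over time."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t


def should_rollback(samples: Sequence[Sample], threshold_mb_per_min: float = 10.0) -> bool:
    """Flag a canary whose memory grows faster than the threshold at steady load."""
    return growth_rate_mb_per_min(samples) > threshold_mb_per_min


if __name__ == "__main__":
    # Hypothetical 30-minute canary windows, sampled every 5 minutes.
    leaking = [(t, 500 + 12 * t) for t in range(0, 31, 5)]
    steady = [(t, 640) for t in range(0, 31, 5)]
    print(should_rollback(leaking), should_rollback(steady))
```

Fitting a line over the whole window is less sensitive to a single garbage-collection dip or spike than comparing only the first and last sample, which is why the sketch uses a regression slope rather than a simple difference.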

### Security Best Practices

```dockerfile theme={null}
# Security-hardened Dockerfile
FROM node:20-alpine AS builder

# Update and patch base image
RUN apk update && apk upgrade

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Production stage with security hardening
FROM node:20-alpine

# Security updates
RUN apk update && \
    apk upgrade && \
    apk add --no-cache dumb-init && \
    rm -rf /var/cache/apk/*

WORKDIR /app

# Create non-root user with specific UID/GID
RUN addgroup -S -g 1001 appgroup && \
    adduser -S -u 1001 -G appgroup appuser

# Copy with correct ownership
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app .

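# Remove unnecessary files. Prefer excluding these via .dockerignore:
# deleting them here does not remove them from earlier image layers.
# /app itself must belong to appuser, and directories must keep their
# execute bit, or Node cannot traverse into node_modules.
RUN rm -rf .git .gitignore .env* Dockerfile* docker-compose* && \
    chown appuser:appgroup /app && \
    chmod -R u-w,go-rwx /app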

# Set security environment variables
ENV NODE_ENV=production
ENV NPM_CONFIG_LOGLEVEL=warn

# Read-only root filesystem support
# (use with: docker run --read-only --tmpfs /tmp)
RUN mkdir -p /tmp && chown appuser:appgroup /tmp

# Switch to non-root user
USER appuser

# No capabilities needed
# Use with: docker run --cap-drop=ALL

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node healthcheck.js

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
```

The Python equivalent applies the same hardening principles: minimal base image, non-root user with explicit UID/GID, restrictive file permissions, and read-only filesystem compatibility. One Python-specific note: set `PYTHONDONTWRITEBYTECODE=1` so the interpreter does not try to write `.pyc` files into `__pycache__` at import time. On a read-only mount those writes fail (CPython ignores the failure and keeps running), so the attempt is pure wasted work.

```dockerfile theme={null}
# Security-hardened Python FastAPI Dockerfile
FROM python:3.12-slim AS builder

RUN apt-get update && apt-get upgrade -y && \
    apt-get install -y --no-install-recommends gcc g++ libffi-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY requirements.txt ./
RUN pip wheel --no-cache-dir --wheel-dir /build/wheels -r requirements.txt

# Production stage
FROM python:3.12-slim

# Security updates
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends dumb-init && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/*

WORKDIR /app

# Non-root user with specific UID/GID
RUN groupadd -r -g 1001 appgroup && \
    useradd -r -u 1001 -g appgroup appuser

# Install from wheels (no compilers in final image)
COPY --from=builder /build/wheels /wheels
COPY requirements.txt ./
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt && \
    rm -rf /wheels

# Copy app with correct ownership
COPY --chown=appuser:appgroup src ./src

# Remove anything sensitive that may have snuck into context
RUN find /app -name '.env*' -delete && \
    find /app -name '*.pyc' -delete && \
    find /app -name '__pycache__' -type d -exec rm -rf {} + 2>/dev/null || true && \
    chmod -R 500 /app/src

# Read-only filesystem compatibility (pair --read-only with --tmpfs /tmp at runtime)
RUN mkdir -p /tmp && chown appuser:appgroup /tmp

# Security / runtime env
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PIP_NO_CACHE_DIR=1
ENV PIP_DISABLE_PIP_VERSION_CHECK=1

USER appuser

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)"

ENTRYPOINT ["dumb-init", "--"]
CMD ["gunicorn", "src.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000"]
```
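
The hardening in these Dockerfiles only takes effect when the container is started with matching runtime flags. As a rough sketch, the helper below assembles the equivalent keyword arguments for `client.containers.run()` in the Python Docker SDK. The function names, the `64m` tmpfs size, and the default ports are illustrative assumptions, not part of the setup above.

```python
from typing import Any, Dict


def hardened_run_kwargs(host_port: int = 8000, container_port: int = 8000) -> Dict[str, Any]:
    """Keyword arguments mirroring `docker run --read-only --cap-drop=ALL ...`."""
    return {
        "detach": True,
        "read_only": True,                               # --read-only
        "cap_drop": ["ALL"],                             # --cap-drop=ALL
        "tmpfs": {"/tmp": "rw,noexec,nosuid,size=64m"},  # --tmpfs /tmp:... (size is an assumption)
        "user": "1001:1001",                             # UID:GID created in the Dockerfile
        "ports": {f"{container_port}/tcp": host_port},   # -p host:container
    }


def run_hardened(image: str, **overrides: Any):
    """Start `image` with the hardened defaults (needs the `docker` package and a daemon)."""
    import docker  # imported lazily so the helper above has no third-party dependency

    client = docker.from_env()
    return client.containers.run(image, **{**hardened_run_kwargs(), **overrides})
```

Calling `run_hardened("order-service:1.0.0")` would then start the container read-only, with no capabilities and a small writable `/tmp`; keeping the defaults in one function makes it harder for a test script to drift from the flags used in production.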

### Healthcheck Script: What It Actually Needs to Do

A healthcheck is a contract between your process and the container runtime (Docker, Kubernetes, ECS). It answers exactly one question: "Is this process able to respond right now?" It should not answer "Is the entire dependency graph healthy?" — that conflation is the single most common mistake in production healthchecks. If your liveness check pings the database, the cache, and three downstream APIs, then a 30-second blip in any one of them causes cascading container restarts across your fleet, which typically makes the outage worse, not better.

The right healthcheck is small, fast, and local. It calls the process's own `/health` endpoint, which returns 200 if the process's event loop is responsive. Deep dependency checks belong in a separate `/ready` endpoint that Kubernetes uses to decide whether to route traffic — a distinction we'll explore in the Kubernetes chapter. The healthcheck script itself should use the language's stdlib HTTP client (no `curl`, no extra binaries) so the image stays small and the check has no extra dependencies to break.
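
To make the liveness/readiness split concrete, here is a minimal sketch of a server exposing both endpoints, using only the Python standard library. The `make_handler` and `fetch_status` names and the shape of `dependency_checks` are assumptions for illustration; a real service would register these routes in its own web framework.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Dict


def _safe(check: Callable[[], bool]) -> bool:
    """Run one dependency check; any exception counts as 'not ready'."""
    try:
        return bool(check())
    except Exception:
        return False


def make_handler(dependency_checks: Dict[str, Callable[[], bool]]):
    """Build a request handler with separate liveness and readiness routes."""

    class Handler(BaseHTTPRequestHandler):
        def _send(self, status: int, body: dict) -> None:
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def do_GET(self) -> None:
            if self.path == "/health":
                # Liveness: answering at all proves the process is alive. No I/O.
                self._send(200, {"status": "ok"})
            elif self.path == "/ready":
                # Readiness: run every dependency check and report the result.
                results = {name: _safe(check) for name, check in dependency_checks.items()}
                status = 200 if all(results.values()) else 503
                self._send(status, {"ready": status == 200, "checks": results})
            else:
                self._send(404, {"error": "not found"})

        def log_message(self, *args) -> None:
            pass  # keep probe traffic out of the logs

    return Handler


# Opener that ignores any HTTP proxy configured in the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url: str) -> int:
    """Return the HTTP status code for `url`, including 4xx/5xx responses."""
    try:
        with _opener.open(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
```

With a failing database check registered, `/health` still returns 200 while `/ready` returns 503: the container keeps running, but an orchestrator that honors readiness stops routing traffic to it until the dependency recovers.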

### Healthcheck Script

The scripts below implement that contract for each runtime. Both call the app's own `/health` endpoint with the standard library's HTTP client (no `curl`, no `wget`, either of which would add binaries to the image) and translate the response into an exit code: 0 for healthy, non-zero for unhealthy. Dependency health stays out of these scripts; it belongs in the readiness check covered in the Kubernetes chapter.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // healthcheck.js
    const http = require('http');

    const options = {
      hostname: 'localhost',
      port: process.env.PORT || 3000,
      path: '/health',
      method: 'GET',
      timeout: 5000
    };

    const req = http.request(options, (res) => {
      process.exit(res.statusCode === 200 ? 0 : 1);
    });

    req.on('error', () => {
      process.exit(1);
    });

    req.on('timeout', () => {
      req.destroy();
      process.exit(1);
    });

    req.end();
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # healthcheck.py
    """
    Minimal healthcheck: uses only Python stdlib (urllib) so no extra
    dependencies are required in the image. Exit code 0 => healthy,
    non-zero => unhealthy (Docker / Kubernetes interpret accordingly).
    """
    import os
    import sys
    import urllib.request
    import urllib.error

    def main() -> None:
        port = os.environ.get("PORT", "8000")
        path = os.environ.get("HEALTHCHECK_PATH", "/health")
        url = f"http://localhost:{port}{path}"
        timeout = float(os.environ.get("HEALTHCHECK_TIMEOUT", "5"))

        req = urllib.request.Request(url, method="GET")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as response:
                sys.exit(0 if 200 <= response.status < 300 else 1)
        except (urllib.error.URLError, TimeoutError, ConnectionError, OSError):
            # Any network/connection error => unhealthy
            sys.exit(1)

    if __name__ == "__main__":
        main()
    ```
  </Tab>
</Tabs>

### Security Scanning

Automated vulnerability scanning in CI is non-negotiable for any serious microservices deployment. The attack surface of a modern Node.js or Python service is primarily the transitive dependency tree — hundreds of third-party packages, any of which could have a known CVE. Trivy, Snyk, and Grype are the common tools; they compare your image's installed packages against public vulnerability databases and fail the build if any HIGH or CRITICAL vulnerabilities are found. Running this check weekly (via scheduled rebuilds) catches newly disclosed CVEs that weren't known when you built the image originally.

The pitfall with automated scanning is alert fatigue. If your policy is "fail on any vulnerability," you'll soon have a backlog of findings that nobody addresses. A pragmatic policy: fail on CRITICAL and HIGH in production code paths, warn on MEDIUM, ignore LOW. Pair this with a scheduled rebuild job that rebuilds images weekly against the latest base image, so you automatically pick up OS-level patches.
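
The severity policy above can be sketched as a small gate over a scanner report. This example assumes a Trivy JSON report (as produced by `trivy image --format json`), whose top-level `Results` list holds per-target `Vulnerabilities` entries with a `Severity` field; the `apply_policy` and `count_severities` names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

FAIL_ON = {"CRITICAL", "HIGH"}  # block the build
WARN_ON = {"MEDIUM"}            # report, but let the build pass


def count_severities(report: Dict[str, Any]) -> Counter:
    """Tally findings by severity across every scanned target."""
    counts: Counter = Counter()
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def apply_policy(report: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Return (exit_code, messages): exit code 1 on CRITICAL/HIGH, else 0."""
    counts = count_severities(report)
    messages: List[str] = []
    for severity in sorted(FAIL_ON):
        if counts[severity]:
            messages.append(f"FAIL: {counts[severity]} {severity} finding(s)")
    for severity in sorted(WARN_ON):
        if counts[severity]:
            messages.append(f"WARN: {counts[severity]} {severity} finding(s)")
    exit_code = 1 if any(counts[s] for s in FAIL_ON) else 0
    return exit_code, messages
```

A CI step could load the report with `json.load`, print the messages, and pass the exit code to `sys.exit`. LOW findings are counted but never reported, which is the "ignore LOW" part of the policy.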

```yaml theme={null}
# .github/workflows/docker-security.yml
name: Docker Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # weekly rescan to catch newly disclosed CVEs

jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # needed by upload-sarif
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .

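      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master  # pin to a release tag or commit SHA in real pipelines
        with:
          image-ref: 'myapp:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'  # fail the job when matching findings exist

      - name: Upload Trivy scan results
        if: always()  # upload the report even when the scan step fails
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'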

      - name: Run Hadolint
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
```

***

## Programmatic Docker Management

Sometimes you need to manage containers from your own scripts or services: building images in CI runners, running integration tests against real databases in ephemeral containers, or managing developer sandbox environments. Python has an official Docker SDK (`docker`), and Node.js has the widely used community library `dockerode`; both wrap the Docker Engine API. The pattern below shows a typical use case: programmatically building and running a container for integration testing.

The key consideration: these scripts must handle cleanup carefully. Containers started by scripts that crash mid-execution stay around until someone removes them, consuming disk space and port allocations. Always wrap the container lifecycle in `try/finally` (or a context manager in Python) to guarantee cleanup even on failure.

<Tabs>
  <Tab title="Node.js">
    ```javascript theme={null}
    // scripts/run-integration-test.js
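    const net = require('net');
    const Docker = require('dockerode');
    const docker = new Docker();

    // Poll until the mapped port accepts TCP connections, or give up after timeoutMs.
    // An open port is only a rough readiness signal; real tests may need a retrying query.
    async function waitForPostgres(port = 5433, timeoutMs = 30000) {
      const deadline = Date.now() + timeoutMs;
      while (Date.now() < deadline) {
        const open = await new Promise((resolve) => {
          const socket = net.connect({ host: '127.0.0.1', port });
          socket.once('connect', () => { socket.destroy(); resolve(true); });
          socket.once('error', () => { socket.destroy(); resolve(false); });
        });
        if (open) return;
        await new Promise((r) => setTimeout(r, 500));
      }
      throw new Error(`Postgres not reachable on port ${port} within ${timeoutMs}ms`);
    }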

    async function runIntegrationTest() {
      // Pull the image
      await new Promise((resolve, reject) => {
        docker.pull('postgres:15-alpine', (err, stream) => {
          if (err) return reject(err);
          docker.modem.followProgress(stream, (err) => err ? reject(err) : resolve());
        });
      });

      // Start a test database
      const container = await docker.createContainer({
        Image: 'postgres:15-alpine',
        Env: [
          'POSTGRES_PASSWORD=test',
          'POSTGRES_DB=testdb'
        ],
        HostConfig: {
          PortBindings: { '5432/tcp': [{ HostPort: '5433' }] },
          AutoRemove: true
        }
      });

      try {
        await container.start();
        console.log('Test DB started on port 5433');

        // Run your tests here against localhost:5433
        await waitForPostgres();
        // ... test code ...
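      } finally {
        // AutoRemove deletes the container once it stops. If start() failed, the
        // container never ran, so remove it explicitly; ignore "already gone" errors.
        await container.stop().catch(() => {});
        await container.remove({ force: true }).catch(() => {});
      }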
    }

    runIntegrationTest().catch((err) => {
      console.error(err);
      process.exitCode = 1; // signal failure to CI
    });
    ```
  </Tab>

  <Tab title="Python">
    ```python theme={null}
    # scripts/run_integration_test.py
    """
    Programmatic container management using the official Python Docker SDK.
    Uses a context manager so the Postgres container is always cleaned up,
    even if tests raise an exception mid-run.
    """
    import time
    import socket
    from contextlib import contextmanager
    from typing import Iterator

    import docker
    from docker.models.containers import Container
    from docker.errors import NotFound, APIError

    client = docker.from_env()

    def _wait_for_port(host: str, port: int, timeout: float = 30.0) -> None:
        """Poll until a TCP port accepts connections, or timeout."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=1.0):
                    return
            except OSError:
                time.sleep(0.5)
        raise TimeoutError(f"{host}:{port} not ready within {timeout}s")

    @contextmanager
    def test_database(
        image: str = "postgres:15-alpine",
        host_port: int = 5433,
        password: str = "test",
        db_name: str = "testdb",
    ) -> Iterator[Container]:
        """Spin up an ephemeral Postgres container for the duration of tests."""
        # Pull image (noop if already local)
        client.images.pull(image)

        container = client.containers.run(
            image,
            environment={
                "POSTGRES_PASSWORD": password,
                "POSTGRES_DB": db_name,
            },
            ports={"5432/tcp": host_port},
            detach=True,
            auto_remove=True,
            name=f"pytest-pg-{int(time.time())}",
        )
        try:
            _wait_for_port("localhost", host_port, timeout=30)
            print(f"Test DB ready on localhost:{host_port}")
            yield container
        finally:
            try:
                container.stop(timeout=5)
            except (NotFound, APIError):
                pass  # auto_remove may have already cleaned it up

    def build_service_image(context: str, tag: str) -> str:
        """Build a service image from a Dockerfile context and return its ID."""
        image, build_logs = client.images.build(
            path=context,
            tag=tag,
            rm=True,          # remove intermediate containers
            forcerm=True,     # always remove intermediate even on failure
            pull=True,        # always pull newer base image
        )
        for chunk in build_logs:
            if "stream" in chunk:
                print(chunk["stream"], end="")
        return image.id

    def list_running_services(label: str = "app=microservices") -> list[dict]:
        """Inspect all running containers matching a label selector."""
        containers = client.containers.list(filters={"label": label})
        return [
            {
                "name": c.name,
                "image": c.image.tags[0] if c.image.tags else c.image.id,
                "status": c.status,
                "ports": c.ports,
            }
            for c in containers
        ]

    def run_integration_test() -> None:
        with test_database() as pg:
            # Run your tests here against localhost:5433
            # e.g. pytest.main(["-xvs", "tests/integration"])
            print(f"Container {pg.short_id} serving tests")

    if __name__ == "__main__":
        run_integration_test()
    ```
  </Tab>
</Tabs>

***

## Docker Commands Reference

```bash theme={null}
# Build commands
docker build -t order-service:1.0.0 .
docker build -t order-service:1.0.0 --target production .
docker build --no-cache -t order-service:1.0.0 .

# Run commands
docker run -d --name order-service -p 3000:3000 order-service:1.0.0
docker run -d --read-only --tmpfs /tmp --cap-drop=ALL -p 3000:3000 order-service:1.0.0
docker run --rm -it order-service:1.0.0 /bin/sh

# Compose commands (Compose v2 spells these "docker compose ..." with the same subcommands)
docker-compose up -d
docker-compose up --build
docker-compose down
docker-compose down -v  # Remove volumes
docker-compose logs -f order-service
docker-compose exec order-service /bin/sh

# Debug commands
docker logs -f order-service
docker exec -it order-service /bin/sh
docker inspect order-service
docker stats

# Cleanup
docker system prune -a
docker volume prune
docker image prune -a
```

***

## Interview Questions

<AccordionGroup>
  <Accordion title="Q1: What is multi-stage build and why use it?">
    **Answer:**

    **Multi-stage build** uses multiple `FROM` statements to create intermediate images.

    **Benefits:**

    * Smaller final images (only runtime dependencies)
    * Separate build and runtime environments
    * Don't expose build tools in production
    * Better security (fewer attack vectors)

    **Example:**

    ```dockerfile theme={null}
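    FROM node:20 AS builder
    WORKDIR /app
    COPY package*.json ./
    RUN npm ci
    COPY . .
    # Build, then drop devDependencies so only runtime packages are copied forward
    RUN npm run build && npm prune --omit=dev

    FROM node:20-alpine AS production
    WORKDIR /app
    COPY --from=builder /app/dist ./dist
    COPY --from=builder /app/node_modules ./node_modules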
    ```

    **Typical reduction:** 500MB → 100MB
  </Accordion>

  <Accordion title="Q2: How do you optimize Docker layer caching?">
    **Answer:**

    **Key principles:**

    1. Order instructions by change frequency (least → most)
    2. Copy dependency files before source code
    3. Combine RUN commands to reduce layers

    **Example:**

    ```dockerfile theme={null}
    # Good: package.json rarely changes
    COPY package*.json ./
    RUN npm ci

    # Then source code (changes often)
    COPY . .
    ```

    **Tips:**

    * Use `.dockerignore` to exclude unnecessary files
    * Pin versions to ensure consistent builds
    * Use `--no-cache` only when needed
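
    To see why instruction order matters, here is a toy model of layer caching, assuming each layer's cache key is a hash of its parent key, the instruction text, and a digest of the files it copies. This is a simplification for intuition, not Docker's actual implementation; `layer_keys` and `cache_hits` are hypothetical names.

    ```python
    import hashlib
    from typing import List, Tuple

    # (instruction text, digest of the files that instruction copies; "" if none)
    Step = Tuple[str, str]


    def layer_keys(steps: List[Step]) -> List[str]:
        """Chain hashes so each layer's key depends on every layer before it."""
        keys: List[str] = []
        parent = ""
        for instruction, content_digest in steps:
            raw = f"{parent}|{instruction}|{content_digest}".encode()
            parent = hashlib.sha256(raw).hexdigest()
            keys.append(parent)
        return keys


    def cache_hits(previous: List[Step], current: List[Step]) -> List[bool]:
        """True for each step of `current` whose key matches the previous build."""
        return [old == new for old, new in zip(layer_keys(previous), layer_keys(current))]
    ```

    With dependency files copied first, a source-only change leaves the `COPY package*.json` and `RUN npm ci` keys untouched and only the final `COPY . .` layer is rebuilt. With `COPY . .` first, the changed digest alters the first key, and every later key changes with it, so `npm ci` reruns on every edit.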
  </Accordion>

  <Accordion title="Q3: What are Docker security best practices?">
    **Answer:**

    1. **Don't run as root**
       ```dockerfile theme={null}
       RUN adduser -S app
       USER app
       ```

    2. **Use minimal base images**
       * Alpine, distroless, slim variants

    3. **Scan for vulnerabilities**
       * Trivy, Snyk, Clair

    4. **Drop capabilities**
       ```bash theme={null}
       docker run --cap-drop=ALL
       ```

    5. **Read-only filesystem**
       ```bash theme={null}
       docker run --read-only
       ```

    6. **No secrets in images**
       * Use environment variables or secrets managers

    7. **Keep images updated**
       * Regularly rebuild with patched base images
  </Accordion>
</AccordionGroup>

***

## Summary

<CardGroup cols={2}>
  <Card title="Key Takeaways" icon="lightbulb">
    * Multi-stage builds for optimized images
    * Layer caching for faster builds
    * Run as non-root user
    * docker-compose for local development
    * Scan images for vulnerabilities
  </Card>

  <Card title="Next Steps" icon="arrow-right">
    In the next chapter, we'll deploy to **Kubernetes** - the industry standard for container orchestration.
  </Card>
</CardGroup>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="'Your Docker image for a Node.js microservice is 1.2GB. Walk me through how you would reduce it and why size matters in a microservices context.'">
    **Strong Answer:**

    A 1.2GB image typically means using a full `node:20` base image (which includes Debian, build tools, and development dependencies) and including `node_modules` with devDependencies. In microservices, image size matters for three concrete reasons: pull time during scaling events (pulling 1.2GB across 10 new pods takes minutes, not seconds), registry storage costs (20 services x 1.2GB x 50 builds/week adds up), and security attack surface (every unnecessary package is a potential CVE).

    Step one: multi-stage build. The build stage uses `node:20` (full image with build tools) to run `npm ci` and compile TypeScript. The production stage uses `node:20-alpine` (50MB base) and copies only the production node\_modules and compiled output. This alone typically drops the image from 1.2GB to 200-300MB.

    Step two: production-only dependencies. Use `npm ci --omit=dev` in the production stage (the older `--only=production` flag is deprecated). Development dependencies (TypeScript, Jest, ESLint) are huge and not needed at runtime.

    Step three: `.dockerignore` to exclude `.git`, `test/`, `docs/`, local `.env` files, and `node_modules` (they get reinstalled in the container anyway). I have seen images that included the entire git history because there was no `.dockerignore`.

    Step four: if the application does not need native bindings, use `node:20-alpine` for every stage and install with `--omit=dev` throughout. Alpine-based images are 50MB versus 900MB for Debian-based. The caveat: some npm packages with native bindings (bcrypt, sharp) need additional build tools on Alpine (`apk add --no-cache python3 make g++`).

    The result is typically a 100-150MB image. For extreme optimization, I have seen teams use distroless images (Google's `gcr.io/distroless/nodejs20-debian12`) that contain only the Node.js runtime and nothing else -- no shell, no package manager, no utilities. This is the most secure option but makes debugging harder because there is no shell to exec into.

    **Follow-up: "How do you handle the security scanning of container images in a CI/CD pipeline with 20 microservices?"**

    Every image gets scanned as part of the CI pipeline before it can be pushed to the registry. I use Trivy (open source) or Snyk Container integrated into the GitHub Actions workflow. The pipeline fails if any CRITICAL or HIGH severity CVE is detected in the base image or dependencies. The key insight is pinning base image digests, not just tags: `node:20-alpine@sha256:abc123` ensures reproducibility, while `node:20-alpine` can silently change when the upstream tag is updated.
  </Accordion>

  <Accordion title="'How do you handle configuration differences between local development (docker-compose), staging, and production (Kubernetes) without maintaining three separate sets of config?'">
    **Strong Answer:**

    The 12-Factor App principle says configuration should come from the environment, not from code. The application reads configuration from environment variables, and the deployment platform provides the values.

    For local development, I use docker-compose with a `.env` file that mirrors the production variable names but with local values (for example `DATABASE_URL=postgres://postgres:5432/mydb`, where the host is the Compose service name, since `localhost` inside a container refers to the container itself). The `docker-compose.override.yml` file adds development-specific settings (volume mounts for hot reload, debug ports) without modifying the base `docker-compose.yml`.

    For staging and production on Kubernetes, ConfigMaps hold non-sensitive config and Secrets hold sensitive values. Both are injected as environment variables into the pod spec. The same application binary reads the same environment variable names regardless of whether the value comes from `.env`, ConfigMap, or Vault.

    The common mistake I see is teams creating environment-specific Dockerfiles or building different images per environment. The image should be identical across all environments -- only the configuration differs. Build once, deploy everywhere. This guarantees that the image you tested in staging is the exact same binary running in production.

    For complex configuration that does not fit in environment variables (feature flag rulesets, routing tables), I use a configuration service (Consul KV, AWS AppConfig) that the application polls at startup and watches for changes. The configuration service has its own environment hierarchy (dev -> staging -> production) with inheritance and overrides.

    **Follow-up: "How do you handle secrets in the local docker-compose development environment without committing them to the repo?"**

    I use a `.env.example` file in the repo with placeholder values, and the actual `.env` file is in `.gitignore`. New developers copy `.env.example` to `.env` and fill in their local values. For shared development secrets (a staging API key), I store them in a team password manager (1Password, Vault) and reference them in documentation. Some teams use `docker-compose` with Vault integration even locally, which is more secure but adds setup complexity. The pragmatic middle ground: use a tool like `direnv` that loads `.env` files automatically and warns if required variables are missing.
  </Accordion>

  <Accordion title="'A container is using 2GB of memory and getting OOM-killed in production. The same service runs fine locally with 512MB. What is going on?'">
    **Strong Answer:**

    The most common cause in Node.js is the V8 heap limit, which is not tied to the container's memory limit in any way you can rely on. The default old-space ceiling depends on the Node.js version and on how much memory V8 believes is available: older releases defaulted to roughly 1.5GB on 64-bit systems, and newer ones can go higher on large hosts. With a 2GB container limit, the heap is free to grow until heap plus non-heap overhead (V8 internals, native buffers, connection pools) pushes total RSS past 2GB, at which point the kernel OOM-kills the container and Kubernetes reports `OOMKilled`.

    Locally, you run one instance with low traffic and the heap never grows beyond 512MB. In production, with hundreds of concurrent requests, each holding state (request objects, response buffers, database query results), the heap grows.

    The fix is to set `--max-old-space-size` to about 75% of the container memory limit. If the container has 2GB, set `--max-old-space-size=1536` (1.5GB). This leaves 512MB for V8 internals, native buffers, and OS overhead. In the Dockerfile: `CMD ["node", "--max-old-space-size=1536", "src/index.js"]`.
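
    The 75% rule is simple arithmetic, sketched here as a helper that an entrypoint or deploy script could use to derive the flag from the container limit. The function names and the default `heap_fraction` are illustrative assumptions.

    ```python
    from typing import List

    MIB = 1024 * 1024


    def max_old_space_mb(limit_bytes: int, heap_fraction: float = 0.75) -> int:
        """Heap budget in MiB for a container with the given memory limit."""
        if limit_bytes <= 0 or not 0 < heap_fraction < 1:
            raise ValueError("limit must be positive and heap_fraction between 0 and 1")
        return int(limit_bytes * heap_fraction) // MIB


    def node_command(limit_bytes: int, entrypoint: str = "src/index.js") -> List[str]:
        """Build the node argv with a heap limit sized to the container."""
        return ["node", f"--max-old-space-size={max_old_space_mb(limit_bytes)}", entrypoint]
    ```

    For a 2GiB limit this yields `--max-old-space-size=1536`, matching the value above and leaving 512MiB for everything outside the V8 heap.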

    If the memory issue persists after setting the heap limit, you have a memory leak. Common culprits in microservices: event listeners that are registered but never removed (adding a new listener on every request), growing in-memory caches without eviction policies, unreleased database connections in error paths, and closures that capture large objects and prevent garbage collection.

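    To diagnose, I take heap snapshots in production using `v8.writeHeapSnapshot()` triggered by an admin endpoint, optionally running with `--expose-gc` so I can force a collection first. Writing a snapshot blocks the event loop and needs extra memory, so I do it on one instance that has been taken out of rotation. Comparing two snapshots taken 10 minutes apart reveals which objects are growing. Chrome DevTools can analyze these snapshots to find the retention path.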

    **Follow-up: "How do you set memory limits in Kubernetes to prevent one misbehaving service from affecting others?"**

    Every pod gets resource requests and limits. Requests are the guaranteed amount (used for scheduling), limits are the maximum (OOM-killed if exceeded). I set requests to the P90 memory usage from production metrics and limits to 2x the requests. For a service that typically uses 512MB, I would set `requests.memory: 512Mi` and `limits.memory: 1Gi`. The requests ensure Kubernetes schedules the pod on a node with enough free memory. The limits prevent a memory leak from consuming the entire node. I also set up alerts when memory usage exceeds 80% of the limit -- that gives the team time to investigate before the OOM kill happens.
  </Accordion>
</AccordionGroup>
