Containerization with Docker

Docker provides consistent, isolated environments for microservices. Learn to build optimized containers and orchestrate multiple services.
Learning Objectives:
  • Write production-ready Dockerfiles
  • Implement multi-stage builds
  • Optimize image size and build time
  • Use docker-compose for local development
  • Apply container security best practices

Docker Fundamentals for Microservices

Before diving into Dockerfiles, it is worth asking why the industry settled on containers as the unit of deployment for microservices. Microservices change the operational problem. With one monolith, you had one runtime to standardize. With 50 services written in 4 languages with different library versions, you either standardize the host (which pins everyone to a single Node version) or you standardize the unit of deployment. Containers do the latter while leaving each team free to choose its own language, framework, and dependency versions.

The deeper tradeoff is isolation versus overhead. A virtual machine gives strong isolation but typically takes tens of seconds to minutes to boot and reserves gigabytes of RAM, so running 50 VMs on a single host is impractical. Containers share the host kernel and provide namespace-level isolation, which is “good enough” for most workloads, and they usually start in well under a second with megabytes of overhead. The catch: containers are not a security boundary in the same way VMs are. A kernel exploit in one container can potentially escape to the host. For untrusted code, you still want VMs or micro-VMs like Firecracker underneath.

If you skip containerization in a microservices architecture, you feel the pain quickly: environment drift between dev and prod, dependency conflicts between services on the same host, slow and unreliable deployments, and no clean way to scale individual services independently. Docker addresses all of these, which is why it became the de facto standard.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY DOCKER FOR MICROSERVICES?                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    WITHOUT CONTAINERS                                │    │
│  │                                                                      │    │
│  │  Developer Machine         Staging              Production          │    │
│  │  ┌──────────────────┐     ┌──────────────┐     ┌──────────────┐    │    │
│  │  │ Node 16          │     │ Node 14      │     │ Node 18      │    │    │
│  │  │ npm 7            │     │ npm 6        │     │ npm 9        │    │    │
│  │  │ Linux Mint       │     │ Ubuntu 20.04 │     │ Amazon Linux │    │    │
│  │  └──────────────────┘     └──────────────┘     └──────────────┘    │    │
│  │                                                                      │    │
│  │  "It works on my machine!" 🤷                                        │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                    WITH CONTAINERS                                   │    │
│  │                                                                      │    │
│  │  Developer Machine         Staging              Production          │    │
│  │  ┌──────────────────┐     ┌──────────────────┐ ┌──────────────────┐│    │
│  │  │ ┌──────────────┐ │     │ ┌──────────────┐ │ │ ┌──────────────┐ ││    │
│  │  │ │  Container   │ │     │ │  Container   │ │ │ │  Container   │ ││    │
│  │  │ │  Node 20     │ │     │ │  Node 20     │ │ │ │  Node 20     │ ││    │
│  │  │ │  Alpine      │ │     │ │  Alpine      │ │ │ │  Alpine      │ ││    │
│  │  │ └──────────────┘ │     │ └──────────────┘ │ │ └──────────────┘ ││    │
│  │  └──────────────────┘     └──────────────────┘ └──────────────────┘│    │
│  │                                                                      │    │
│  │  Same container everywhere! ✅                                       │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Production-Ready Dockerfile

Dockerfile Stages: Why the Order Matters

A Dockerfile is not a shell script — it’s a declarative build recipe that the Docker daemon compiles into an immutable image. Each instruction creates a discrete layer that is content-addressed by its hash, and that hash is the basis for every caching decision Docker makes. This means the ordering of your instructions is not a stylistic choice; it’s the single most consequential factor in build performance and image correctness. The stages we’ll build through — base image, system packages, dependency manifest, dependency install, source code, runtime configuration — are deliberately sequenced from “rarely changes” to “changes on every commit.” What goes wrong when you ignore this? In the best case, your CI pipeline takes 10x longer than it should because dependencies reinstall on every build. In the worst case, you ship stale dependencies because a cached layer higher in the chain masks an update you thought you made. Teams routinely waste hundreds of engineer-hours per year on slow builds that a 30-second Dockerfile reorder would have prevented. The tradeoff to be aware of: aggressive caching can occasionally mask bugs where a build works in CI but fails with --no-cache, so every production image should be rebuilt without cache at least weekly to catch latent issues.

Basic Node.js Dockerfile

A Dockerfile is a build recipe. Every instruction produces a new layer in the image, and those layers are cached independently. The ordering of instructions is not cosmetic — it controls caching behavior, which in turn controls your build time and your CI pipeline cost. The basic Dockerfile below demonstrates several non-obvious principles: copy the dependency manifest before the source code (so dependency installation is cached), set NODE_ENV=production so libraries skip development-only code paths, create a dedicated non-root user (so a compromised process cannot modify the filesystem or escalate privileges), and declare a HEALTHCHECK so the container runtime can detect when the app is alive but not responsive. If you skip the non-root user step, your container runs as UID 0 inside the namespace — and in some container escape scenarios, that maps to root on the host. This is one of the most consequential security mistakes you can make. Fixing it is trivially cheap; not fixing it can be catastrophic. Similarly, if you omit the healthcheck, orchestrators cannot distinguish a frozen process from a healthy one, and you end up routing traffic to dead pods.
Caveats & Common Pitfalls in Dockerfile Construction
  • Running as root. The default FROM node:20 or FROM python:3.12 runs as UID 0. A code execution bug in the app then runs as root inside the container, and a container-escape vulnerability (e.g., CVE-2022-0185 in the Linux kernel, CVE-2024-21626 in runc) can turn that into root on the host.
  • Base images with known CVEs. Pulling node:20 once and never rebuilding means you inherit every CVE disclosed in the base image since then. Some of these are critical (glibc, OpenSSL, libxml2).
  • No HEALTHCHECK. Orchestrators cannot tell a zombie process from a working one. Traffic routes to frozen pods; users see timeouts.
  • latest tag in production. node:latest today is not node:latest tomorrow — a silent base-image change at 3am means your build is no longer reproducible and a broken upstream release breaks your deploy.
Solutions & Patterns
  • Always create and switch to a non-root user with explicit UID/GID (1001 is conventional and non-privileged). This is a 3-line Dockerfile change and one of the highest-leverage security improvements you can make.
  • Rebuild base images weekly in CI (scheduled job) so newly disclosed CVEs get picked up via OS package updates. Pair with a Trivy / Grype scan that fails the build on CRITICAL findings.
  • Pin base images by digest, not tag: FROM node:20-alpine@sha256:abc123.... Reproducible builds, no silent upgrades. Bump the digest intentionally as part of a PR.
  • Add a small, local HEALTHCHECK that calls the app’s /health over localhost — no external dependencies. Use curl/wget only if already present; otherwise use the language’s stdlib HTTP client to avoid adding a binary.
  • Prefer distroless (gcr.io/distroless/nodejs20-debian12) or Chainguard Images for production — no shell, no package manager, minimal attack surface.
# Dockerfile - Basic
FROM node:20-alpine

WORKDIR /app

# Copy package files first (better caching)
COPY package*.json ./

# Install production dependencies only
# (--omit=dev replaces the deprecated --only=production flag)
RUN npm ci --omit=dev

# Copy application code
COPY . .

# Set environment
ENV NODE_ENV=production
ENV PORT=3000

# Create non-root user with an explicit UID/GID
RUN addgroup -S -g 1001 appgroup && adduser -S -u 1001 -G appgroup appuser
USER appuser

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Start application
CMD ["node", "src/index.js"]
The same principles apply to a Python service. The layering is identical — manifest first, install, then source — but the mechanics differ. requirements.txt is your package.json equivalent, pip install is your npm ci, and you want PYTHONUNBUFFERED=1 so stdout flushes immediately (critical for containerized logging). Setting PYTHONDONTWRITEBYTECODE=1 prevents Python from writing .pyc files at runtime, which are useless in an ephemeral container and just bloat the writable layer.
# Dockerfile - Basic Python FastAPI
FROM python:3.12-alpine

WORKDIR /app

# Copy dependency manifest first (better caching)
COPY requirements.txt ./

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Set environment
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PORT=8000

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

# Expose port
EXPOSE 8000

# Health check (urllib is built in — no extra deps)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)" || exit 1

# Start application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
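The inline `python -c` health check above is compact but hard to read and hard to test. A minimal sketch of the same probe as a standalone module follows. The file name `healthcheck.py`, the `HEALTH_URL` variable, and the function names are assumptions for illustration; none of them appear in the original setup.

```python
"""healthcheck.py: hypothetical standalone version of the inline HEALTHCHECK probe."""
import os
import urllib.error
import urllib.request

# Assumed default; matches the port and path used in the Dockerfile above.
DEFAULT_URL = "http://localhost:8000/health"


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers HTTP 200, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if response.status == 200 else 1
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return 1


def main() -> int:
    """Probe the URL from HEALTH_URL (hypothetical variable) or the default."""
    return probe(os.environ.get("HEALTH_URL", DEFAULT_URL))
```

To use it as the container health check, one option is to end the file with a `sys.exit(main())` call under an `if __name__ == "__main__":` guard, copy it into the image, and point `HEALTHCHECK` at `python healthcheck.py`.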

Multi-Stage Build (Optimized)

The single biggest lever for reducing image size and improving security is the multi-stage build. The idea is deceptively simple: you define several independent build stages in one Dockerfile and copy only the artifacts you need from earlier stages into the final image. The build stage can contain compilers, test tools, source code, and devDependencies, none of which ship to production. The final stage contains only the runtime, the compiled output, and production dependencies.

Why does this matter in a microservices context? Because you are building dozens of images per day across many services. A 1GB image versus a 150MB image means roughly 7x more data to pull during autoscaling events, roughly 7x more registry storage, and far more installed software that can carry vulnerabilities. The tools you use at build time (TypeScript compiler, webpack, test frameworks) are often the source of CVEs, so shipping them to production is gratuitous risk. Multi-stage builds also enforce a clean separation between “what you need to build the app” and “what you need to run it,” a discipline that pays dividends as services grow.

One gotcha worth flagging: the production stage installs dumb-init (tini works too) and runs it as PID 1. Without it, Node.js becomes PID 1, and the kernel does not apply default signal actions to PID 1. Unless the app registers its own SIGTERM handler, docker stop sends SIGTERM, nothing happens, and after the default 10-second grace period Docker escalates to SIGKILL. In-flight requests are cut mid-response. With dumb-init, signals are forwarded to a child process that gets normal signal behavior. Finishing in-flight requests cleanly still requires the app to handle SIGTERM itself.
Caveats & Common Pitfalls with Multi-Stage Builds
  • 1GB+ images in production. Typical symptoms: using node:20 full image (not Alpine/slim), copying devDependencies, including the .git directory, or forgetting to prune dev deps. A 1GB image means 5-10x longer pod startup during autoscaling, 10x higher registry egress cost, and a much larger CVE surface.
  • Multi-stage confusion. People write a multi-stage Dockerfile that copies from the wrong stage, ends up with both dev and prod deps, or rebuilds dependencies in the final stage — defeating the whole point.
  • COPY --from=builder /app /app copies everything including .git, test fixtures, __pycache__, and node_modules/.cache. The final image still balloons.
  • Shipping build tools (gcc, make, TypeScript compiler, webpack) to production. Each is an attack-surface expansion and none are needed at runtime.
Solutions & Patterns
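  • Multi-stage with explicit copies. Name stages (AS builder, AS production) and copy specific paths, not the entire directory. For Node, use one COPY per directory: COPY --from=builder /app/dist ./dist and COPY --from=builder /app/node_modules ./node_modules (listing several source directories in a single COPY merges their contents into the destination). For Python: build wheels in one stage, then pip install --no-index --find-links=/wheels in the final stage with no compiler in sight.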
  • Use node:20-alpine or python:3.12-slim as the production base — roughly 50-150 MB vs. 350-900 MB for the full variants.
  • Aggressive .dockerignore. Exclude .git, node_modules, __pycache__, .venv, tests, docs, local .env files. This shrinks the build context and prevents accidental leaks (like .env with secrets getting baked in via COPY . .).
  • Verify image size in CI. Fail the build if the final image exceeds a budget (e.g., 200 MB for a Node service, 300 MB for a Python service with typical deps). Use docker image inspect --format='{{.Size}}' or dive for layer analysis.
  • For the smallest possible images, try distroless or static-linked Go / Rust binaries on scratch — sub-20 MB is achievable.
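The size-budget check from the list above can be sketched as a small CI guard. This is an illustration under stated assumptions: the function names and the decimal-megabyte convention are hypothetical choices, and the only Docker call used is the `docker image inspect --format` command already mentioned.

```python
"""Hypothetical CI guard: fail when an image exceeds its size budget."""
import subprocess

MB = 1000 * 1000  # decimal megabytes; an assumed convention for the budget figures


def image_size_bytes(image: str) -> int:
    """Ask the Docker CLI for the image size in bytes."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", image],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def check_budget(size_bytes: int, budget_mb: int) -> tuple[bool, str]:
    """Return (ok, message) for a size in bytes against a budget in MB."""
    size_mb = size_bytes / MB
    ok = size_mb <= budget_mb
    verdict = "within" if ok else "over"
    return ok, f"{size_mb:.1f} MB is {verdict} the {budget_mb} MB budget"
```

In a pipeline, something like `ok, msg = check_budget(image_size_bytes("order-service:ci"), 200)` followed by a non-zero exit when `ok` is false would enforce the budget; the image tag here is a placeholder.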
# Dockerfile - Multi-stage build
# ================================

# Stage 1: Dependencies
FROM node:20-alpine AS deps

WORKDIR /app

# Copy package files
COPY package*.json ./

# Install ALL dependencies (including devDependencies for build)
RUN npm ci

# ================================

# Stage 2: Build
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependencies from deps stage
COPY --from=deps /app/node_modules ./node_modules

# Copy source code
COPY . .

# Build the application (TypeScript, bundling, etc.)
RUN npm run build

# Prune devDependencies (--omit=dev replaces the deprecated --production flag)
RUN npm prune --omit=dev

# ================================

# Stage 3: Production
FROM node:20-alpine AS production

WORKDIR /app

# Add labels for image metadata
LABEL org.opencontainers.image.source="https://github.com/myorg/order-service"
LABEL org.opencontainers.image.description="Order Service"
LABEL org.opencontainers.image.version="1.0.0"

# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init

# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Copy built application
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/package.json ./

# Set environment
ENV NODE_ENV=production
ENV PORT=3000

# Switch to non-root user
USER appuser

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

# Use dumb-init as entrypoint
ENTRYPOINT ["dumb-init", "--"]

# Start application
CMD ["node", "dist/index.js"]
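dumb-init only forwards signals; the process it supervises still decides what to do with them. The sketch below shows the cooperative-shutdown idea in Python for a plain worker loop (servers such as uvicorn and gunicorn already ship their own handling). All names are hypothetical, and this is one possible shape rather than a prescribed implementation.

```python
"""Sketch of cooperative shutdown for a worker loop that receives SIGTERM."""
import signal
import threading

# Set by the signal handler; the work loop checks it between units of work.
shutdown_requested = threading.Event()


def _handle_signal(signum, frame):
    """Record the request instead of exiting in the middle of a job."""
    shutdown_requested.set()


def install_handlers() -> None:
    """Register handlers for SIGTERM (sent by `docker stop`) and SIGINT (Ctrl-C)."""
    signal.signal(signal.SIGTERM, _handle_signal)
    signal.signal(signal.SIGINT, _handle_signal)


def run_worker(jobs, handle) -> int:
    """Process jobs until they run out or shutdown is requested.

    Returns the number of jobs completed. A job that has already started is
    always allowed to finish before the loop exits.
    """
    completed = 0
    for job in jobs:
        if shutdown_requested.is_set():
            break
        handle(job)
        completed += 1
    return completed
```

The design choice is to set a flag rather than exit inside the handler, so in-flight work finishes within the grace period before the runtime escalates to SIGKILL.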

TypeScript Specific Dockerfile

TypeScript adds an extra wrinkle: you need the compiler (tsc), type definitions, and your source files at build time, but none of them belong in production. The pattern below explicitly separates “build with types” from “run compiled JS.” Running npm run type-check && npm run build ensures the image build fails fast if there are type errors, catching problems in CI before they reach production. Forgetting the type-check step is a subtle mistake: by default tsc still emits JavaScript when type errors are present unless noEmitOnError is enabled, and transpile-only build tools skip type checking entirely, so broken code can ship.
# TypeScript Service Dockerfile
FROM node:20-alpine AS builder

WORKDIR /app

COPY package*.json ./
COPY tsconfig*.json ./

RUN npm ci

COPY src ./src

# Type checking and build
RUN npm run type-check && npm run build

# Production stage
FROM node:20-alpine

WORKDIR /app

RUN apk add --no-cache dumb-init && \
    addgroup -S app && adduser -S app -G app

COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package*.json ./

RUN npm ci --omit=dev && npm cache clean --force

USER app

ENV NODE_ENV=production

EXPOSE 3000

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
For Python microservices, the same multi-stage pattern applies, but the details differ. Python has no compilation step like TypeScript’s, but packages with C extensions must be built, which is expensive and requires build tools you don’t want in production. The pattern below uses a builder stage to install dependencies into a virtualenv, then copies the virtualenv into a clean Alpine runtime image that has no compiler installed.
# Python FastAPI Service Dockerfile (multi-stage)
FROM python:3.12-alpine AS builder

WORKDIR /app

# Install build dependencies (only in builder stage)
RUN apk add --no-cache gcc musl-dev libffi-dev

# Create virtualenv
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Production stage
FROM python:3.12-alpine

WORKDIR /app

RUN apk add --no-cache dumb-init && \
    addgroup -S app && adduser -S app -G app

# Copy virtualenv from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY --chown=app:app src ./src

USER app

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

EXPOSE 8000

ENTRYPOINT ["dumb-init", "--"]
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

Python FastAPI Multi-Stage with Wheel Trick

For production FastAPI workloads, the “wheel trick” is a step beyond a plain virtualenv copy. Instead of installing packages directly into the virtualenv during build, you first build wheels (pre-compiled binary distributions) in the builder stage, then install them in the runtime stage. This separates compilation (which needs gcc, make, and headers) from installation (which only needs pip). The benefit: the runtime stage never sees a C compiler, and the wheel cache can be reused across multiple services that share dependencies. For a service with packages like pydantic-core (Rust-compiled), uvloop (C extension), or asyncpg (C extension), this shaves significant time and image size. Production Python services typically run under gunicorn with uvicorn workers rather than uvicorn directly. gunicorn gives you battle-tested process management, graceful restarts, and worker lifecycle hooks; uvicorn provides the ASGI event loop that FastAPI needs. The combination — gunicorn as the supervisor, uvicorn.workers.UvicornWorker as the worker class — is the standard production pattern.
# Dockerfile - Production Python FastAPI (wheel trick)
# ================================

# Stage 1: Build wheels
FROM python:3.12-slim AS builder

WORKDIR /build

# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./

# Build wheels for all dependencies into /build/wheels
RUN pip wheel --no-cache-dir --wheel-dir /build/wheels -r requirements.txt

# ================================

# Stage 2: Production runtime
FROM python:3.12-slim AS production

WORKDIR /app

# Add image metadata
LABEL org.opencontainers.image.source="https://github.com/myorg/order-service"
LABEL org.opencontainers.image.description="Order Service (FastAPI)"
LABEL org.opencontainers.image.version="1.0.0"

# Install only runtime deps (dumb-init for signal handling)
RUN apt-get update && apt-get install -y --no-install-recommends \
    dumb-init \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user with specific UID/GID
RUN groupadd -r -g 1001 appgroup && \
    useradd -r -u 1001 -g appgroup appuser

# Copy wheels from builder and install (no compiler needed)
COPY --from=builder /build/wheels /wheels
COPY requirements.txt ./
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt && \
    rm -rf /wheels

# Copy application with correct ownership
COPY --chown=appuser:appgroup src ./src
COPY --chown=appuser:appgroup gunicorn_conf.py ./

# Runtime environment
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PORT=8000
# Exec-form CMD does not expand $WORKERS; keep this value in sync with --workers below
ENV WORKERS=4

USER appuser

EXPOSE 8000

# Health check using stdlib urllib (no curl/wget needed)
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)"

ENTRYPOINT ["dumb-init", "--"]

# gunicorn with uvicorn workers is the production standard for FastAPI
CMD ["gunicorn", "src.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000", \
     "--access-logfile", "-", \
     "--error-logfile", "-", \
     "--timeout", "30", \
     "--graceful-timeout", "30", \
     "--keep-alive", "5"]
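The Dockerfile above copies a `gunicorn_conf.py` but the file itself is not shown. The following is a hypothetical sketch of what it could contain, mirroring the flags in the CMD. The setting names (`bind`, `workers`, `worker_class`, `timeout`, `graceful_timeout`, `keepalive`, `accesslog`, `errorlog`) are standard Gunicorn settings; reading `PORT` and `WORKERS` from the environment is an assumption.

```python
"""gunicorn_conf.py: hypothetical contents; the original does not show this file."""
import os

# Listen on all interfaces inside the container; PORT matches the Dockerfile's ENV.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# Worker count from the WORKERS variable set in the Dockerfile (default 4).
workers = int(os.environ.get("WORKERS", "4"))

# Run the ASGI app under uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Same values as the command-line flags in the Dockerfile's CMD.
timeout = 30
graceful_timeout = 30
keepalive = 5

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

Gunicorn only auto-loads a file named `gunicorn.conf.py` from the working directory, so a file with this name would need to be passed explicitly (for example `--config gunicorn_conf.py`). Command-line flags take precedence over the config file, so with the config in use the duplicated flags in CMD could be dropped.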

Docker Optimization

Layer Caching: The Invisible Performance Lever

Layer caching feels academic until the first time you watch a junior engineer wait 12 minutes for a CI build that should take 90 seconds. It matters in microservices specifically because you are not building one image: you are building dozens, across multiple pipelines, many times per day. A 10-minute cache-miss build multiplied by 30 services multiplied by 15 daily merges per service is 4,500 build-minutes, or 75 hours of CI wall-clock time, every day. That’s real money, and much of it is recovered by getting the layer order right.

What goes wrong without proper caching? CI pipelines become slow enough that developers batch changes instead of pushing small commits (because “builds are slow, I’ll wait”). Feedback loops lengthen from minutes to hours. Developers lose the flow state of “commit, see test result, iterate.” The downstream cost is not just raw time; it is degraded engineering culture. The tradeoff to know: aggressive caching can hide transitive dependency issues (a bumped lockfile that wasn’t actually installed because the layer was cached against stale content), so scheduled --no-cache rebuilds are a hygiene practice worth scripting.

Layer Caching Strategy

Each instruction in a Dockerfile creates a layer, and Docker caches each layer based on the instruction plus the files it depends on. If neither that instruction nor anything above it has changed, Docker reuses the cached layer instead of re-executing the instruction. The implication: put the instructions that change rarely (base image, system packages) at the top, and the instructions that change frequently (source code) at the bottom. The classic mistake is copying the entire project with COPY . . before running npm ci. Every time any file in the build context changes, even a comment in a README, the cache for the npm ci layer is invalidated and you reinstall all dependencies from scratch. On a large project with 500 dependencies, that’s 2-3 minutes wasted on every single build. By splitting COPY package*.json ./ and RUN npm ci into their own earlier layers, dependency installation stays cached until package.json or package-lock.json changes. This single reordering can cut CI time by as much as 80% on dependency-heavy projects.
# Optimize layer caching
# ================================

# Base image (rarely changes)
FROM node:20-alpine

WORKDIR /app

# 1. System dependencies (rarely change)
RUN apk add --no-cache dumb-init

# 2. Package files (change sometimes)
COPY package*.json ./

# 3. Dependencies (change when package.json changes)
RUN npm ci --omit=dev

# 4. Application code (changes frequently)
COPY . .

# Order matters: least changing layers first!
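The cascade described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Docker's or BuildKit's actual cache algorithm: it assumes each layer's cache key is a hash of its parent's key, the instruction text, and a fingerprint of the files that instruction copies. All function names are hypothetical.

```python
"""Toy model of Docker layer caching: each layer's key depends on its parent."""
import hashlib


def layer_keys(steps: list[tuple[str, str]]) -> list[str]:
    """Return one cache key per step.

    Each step is (instruction, content_fingerprint). A key hashes the parent
    key, the instruction text, and the fingerprint of any files it copies, so
    a change in one step changes the key of every step after it.
    """
    keys = []
    parent = ""
    for instruction, fingerprint in steps:
        payload = f"{parent}|{instruction}|{fingerprint}".encode()
        digest = hashlib.sha256(payload).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def cache_hits(before: list[tuple[str, str]], after: list[tuple[str, str]]) -> list[bool]:
    """Compare two builds step by step; True means the layer is reused."""
    return [old == new for old, new in zip(layer_keys(before), layer_keys(after))]


def good_order(manifest: str, source: str) -> list[tuple[str, str]]:
    """Manifest and install first, source last."""
    return [
        ("COPY package*.json ./", manifest),
        ("RUN npm ci", ""),
        ("COPY . .", source),
    ]


def bad_order(manifest: str, source: str) -> list[tuple[str, str]]:
    """Everything copied before the install step."""
    return [
        ("COPY . .", manifest + source),
        ("RUN npm ci", ""),
    ]
```

Comparing `good_order("m1", "s1")` with `good_order("m1", "s2")` through `cache_hits` models a source-only change: the manifest copy and the install step stay cached, and only the final copy is rebuilt. The same change under `bad_order` invalidates every step.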

Reducing Image Size

Image size is not just an aesthetic concern. In Kubernetes, a larger image means slower pod startup (more time to pull from the registry), which means slower autoscaling response to traffic spikes. It also means more egress bandwidth cost from your registry and more storage cost. For organizations running hundreds of services, a 500MB reduction per image compounds into meaningful savings. The four levers are: minimal base images, cleanup in the same layer, .dockerignore, and multi-stage builds. The “same layer” rule is subtle: each RUN command creates a layer, and deleting files in a later layer doesn’t remove them from the earlier layer. If you download a 200MB tarball in one RUN and delete it in the next RUN, the tarball still lives in your image, just hidden. That’s why you see those long && chains in production Dockerfiles — everything that creates and deletes transient files must happen within a single layer.
# Size optimization techniques

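# 1. Use slim base images
# node:20-alpine is ~50MB vs ~350MB for node:20
# (comments must sit on their own line; a trailing "# ..." after FROM is parsed as arguments)
FROM node:20-alpine

# 2. Clean up in same layer
RUN npm ci --omit=dev && \
    npm cache clean --force && \
    rm -rf /tmp/*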

# 3. Use .dockerignore
# .dockerignore file:
# node_modules
# npm-debug.log
# Dockerfile*
# docker-compose*
# .git
# .gitignore
# .env*
# *.md
# tests
# coverage
# .nyc_output

# 4. Multi-stage builds (shown above)

# 5. Don't install dev dependencies in production
RUN npm ci --omit=dev
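The “same layer” rule is easier to see with numbers. The sketch below is a toy model with made-up file names and sizes: it treats an image as a stack of layers, each recording the bytes it adds, and treats a deletion in a later layer as a zero-byte marker that hides the file without shrinking the layer that stored it.

```python
"""Toy model of why cleanup must happen in the same RUN as the download."""

MB = 1000 * 1000


def image_size(layers: list[dict[str, int]]) -> int:
    """Sum the bytes stored by every layer.

    Each layer maps a file path to the bytes that layer adds. Deleting a file
    in a later layer is modelled as a 0-byte marker: it hides the file but
    does not remove the bytes held by the earlier layer.
    """
    return sum(sum(files.values()) for files in layers)


# Two RUN instructions: download and unpack in one layer, delete in the next.
separate_layers = [
    {"/tmp/toolchain.tar.gz": 200 * MB, "/usr/local/bin/tool": 15 * MB},
    {"/tmp/toolchain.tar.gz": 0},  # deletion marker only
]

# One RUN instruction chained with &&: the tarball is gone before the layer is committed.
single_layer = [
    {"/usr/local/bin/tool": 15 * MB},
]
```

Under these assumptions `image_size(separate_layers)` is 215 MB while `image_size(single_layer)` is 15 MB, even though both images expose the same files at runtime.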

.dockerignore

The .dockerignore file is the most underappreciated tool in the Docker ecosystem. It works like .gitignore but for the Docker build context: the set of files Docker sends to the daemon when you run docker build. Without a .dockerignore, every build ships your entire .git directory, local node_modules, .env files (which may contain secrets), test fixtures, and documentation to the daemon. This bloats the build context, slows down builds, and risks baking secrets into images. The fix takes five minutes; the cost of skipping it can be much higher. A particularly nasty scenario: a developer has .env in their local directory with production database credentials for debugging. They run docker build without a .dockerignore. The .env file is included in the context, and a later COPY . . bakes it into the image. The image is pushed to a public registry. The credentials leak. This has happened to real companies.
# .dockerignore
node_modules
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Python
**/__pycache__
**/*.pyc
**/*.pyo
*.egg-info
.pytest_cache
.mypy_cache
.ruff_cache
.venv
venv

# Git
.git
.gitignore
.gitattributes

# Docker
Dockerfile*
docker-compose*
.docker

# IDE
.idea
.vscode
*.swp
*.swo

# Test
tests
__tests__
coverage
.nyc_output
jest.config.*

# Docs
*.md
docs

# Environment files
.env*
!.env.example

# Build artifacts
dist
build

# Misc
**/.DS_Store
**/Thumbs.db
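As a belt-and-braces check for the leaked-.env scenario, a pre-build step can list env files sitting in the build context. The sketch below is a hypothetical guard, not a reimplementation of .dockerignore matching: it does not read the ignore file at all and only reports files whose names start with `.env`, apart from an allow-list.

```python
"""Hypothetical pre-build guard: list env files present in the build context."""
import os

# Directories that are skipped while scanning; an assumed list for illustration.
SKIPPED_DIRS = {".git", "node_modules", ".venv"}


def find_env_files(context_dir: str, allowed: tuple[str, ...] = (".env.example",)) -> list[str]:
    """Return relative paths of files named `.env*`, excluding allowed names.

    The result shows which files would be dangerous to copy into an image if
    the ignore rules were missing or wrong.
    """
    found = []
    for root, dirs, files in os.walk(context_dir):
        # Prune in place so os.walk does not descend into skipped directories.
        dirs[:] = [d for d in dirs if d not in SKIPPED_DIRS]
        for name in files:
            if name.startswith(".env") and name not in allowed:
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, context_dir))
    return sorted(found)
```

A CI job could call `find_env_files(".")` before `docker build` and fail when the list is non-empty, which catches the mistake even if someone deletes or misconfigures the ignore file.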

Docker Compose for Microservices

Why Compose Exists and What It Saves You

Compose answers a very specific question: how do you bring up an entire microservices topology on a developer laptop without losing your mind? You have 8 services, 3 databases, a message broker, and a cache, and starting them individually with the correct network configuration, environment variables, and startup order is tedious and error-prone. In production Kubernetes does this job, but Kubernetes on a laptop (minikube, kind, k3d) is heavy: it adds etcd, a scheduler, and a control plane, and requires you to understand Pods, Deployments, Services, and Ingresses just to see logs. Compose strips all of that away and gives you “a YAML file that starts all the containers.” For the vast majority of local development, that’s exactly right.

What goes wrong without Compose? Teams invent fragile shell scripts that run docker run for each service, with hardcoded IP addresses, race conditions on startup order, and broken cleanup when one service crashes. New developers spend their first week fighting “the setup” instead of writing code. With Compose, docker-compose up brings everything online and docker-compose down tears it all down cleanly (docker compose up and docker compose down with the Compose v2 plugin), and the YAML file lives in version control as executable documentation of how services fit together.

The tradeoff: Compose is not production-grade. Don’t try to run it on a server “just for staging”; you lose scheduling, self-healing, rolling updates, and every other feature that makes orchestration valuable. Use Compose for local development and simple CI environments; use Kubernetes for anything shared.

The key capabilities Compose provides: a shared network where services can reach each other by service name (no IP addresses to hardcode), volume management for persistent data, dependency ordering via depends_on, and healthchecks to ensure dependencies are actually ready (not just started). The depends_on with condition: service_healthy pattern is particularly important. Without it, services start before their databases are ready and fail on the first query. With it, Compose waits for the database’s healthcheck to pass before starting the dependent service.
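The idea behind waiting for a healthy dependency can be sketched as a small retry loop. This is a simplified illustration, not how Compose is implemented: the function and parameter names are hypothetical, and the defaults only mirror the `interval` and `retries` style of settings used in Compose healthchecks.

```python
"""Sketch of a retry loop in the spirit of `condition: service_healthy`."""
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    interval: float = 10.0,
    retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `probe` up to `retries` times, pausing `interval` seconds between tries.

    Returns True as soon as one probe succeeds, False if every attempt fails.
    The `sleep` parameter exists so tests can run without real delays.
    """
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

A dependent service would only be started once this returns True; for a database, `probe` would be something like a connection attempt or a readiness command.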

Complete Microservices Stack

# docker-compose.yml
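# No top-level "version" key: it is obsolete in the Compose Specification
# and Docker Compose v2 ignores it with a warning.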

services:
  # ===================
  # API Gateway
  # ===================
  api-gateway:
    build:
      context: ./services/api-gateway
      dockerfile: Dockerfile
    ports:
      - "8080:3000"
    environment:
      - NODE_ENV=development
      - ORDER_SERVICE_URL=http://order-service:3000
      - PAYMENT_SERVICE_URL=http://payment-service:3000
      - INVENTORY_SERVICE_URL=http://inventory-service:3000
      - JWT_SECRET=${JWT_SECRET}
      - REDIS_URL=redis://redis:6379
    depends_on:
      - order-service
      - payment-service
      - inventory-service
      - redis
    networks:
      - microservices
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  # ===================
  # Order Service
  # ===================
  order-service:
    build:
      context: ./services/order-service
      dockerfile: Dockerfile
      target: development
    volumes:
      - ./services/order-service:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:postgres@order-db:5432/orders
      - KAFKA_BROKERS=kafka:9092
      - REDIS_URL=redis://redis:6379
    depends_on:
      order-db:
        condition: service_healthy
      kafka:
        condition: service_healthy
    networks:
      - microservices

  order-db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=orders
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - order-db-data:/var/lib/postgresql/data
      - ./services/order-service/db/init.sql:/docker-entrypoint-initdb.d/init.sql
    networks:
      - microservices
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  # ===================
  # Payment Service (Python FastAPI)
  # ===================
  payment-service:
    build:
      context: ./services/payment-service
      dockerfile: Dockerfile
      target: development
    volumes:
      - ./services/payment-service:/app
      # Keep container's virtualenv isolated from host bind mount
      - /app/.venv
    environment:
      - PYTHONUNBUFFERED=1
      - PYTHONDONTWRITEBYTECODE=1
      - MONGODB_URI=mongodb://payment-db:27017/payments
      - STRIPE_API_KEY=${STRIPE_API_KEY}
      - KAFKA_BROKERS=kafka:9092
      - LOG_LEVEL=debug
    ports:
      - "8001:8000"
    depends_on:
      payment-db:
        condition: service_healthy
    networks:
      - microservices
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s

  payment-db:
    image: mongo:6
    volumes:
      - payment-db-data:/data/db
    networks:
      - microservices
    healthcheck:
      test: echo 'db.runCommand("ping").ok' | mongosh localhost:27017/test --quiet
      interval: 10s
      timeout: 5s
      retries: 5

  # ===================
  # Inventory Service
  # ===================
  inventory-service:
    build:
      context: ./services/inventory-service
      dockerfile: Dockerfile
      target: development
    volumes:
      - ./services/inventory-service:/app
      - /app/node_modules
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://postgres:postgres@inventory-db:5432/inventory
      - REDIS_URL=redis://redis:6379
    depends_on:
      inventory-db:
        condition: service_healthy
    networks:
      - microservices

  inventory-db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=inventory
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - inventory-db-data:/var/lib/postgresql/data
    networks:
      - microservices
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  # ===================
  # Infrastructure
  # ===================
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    networks:
      - microservices
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - kafka-data:/var/lib/kafka/data
    networks:
      - microservices
    healthcheck:
      test: kafka-topics --bootstrap-server localhost:9092 --list
      interval: 30s
      timeout: 10s
      retries: 5

  # ===================
  # Observability
  # ===================
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./infrastructure/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - microservices

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana
      - ./infrastructure/grafana/provisioning:/etc/grafana/provisioning
    networks:
      - microservices

  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"
      - "4317:4317"
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - microservices

networks:
  microservices:
    driver: bridge

volumes:
  order-db-data:
  payment-db-data:
  inventory-db-data:
  redis-data:
  kafka-data:
  prometheus-data:
  grafana-data:

Development Dockerfile with Hot Reload

Developer ergonomics matter. If every code change requires rebuilding the container (30-60 seconds), developers will disengage from containerized workflows and go back to running services directly on their laptops — which reintroduces all the “works on my machine” problems. The solution is a development target in your Dockerfile that mounts source code as a volume and runs with a file watcher (nodemon for Node, uvicorn --reload for Python). Changes to source files trigger automatic reloads without rebuilding the container. The key insight is the target keyword in Compose. One Dockerfile, two targets: development (with file watching) and production (compiled, locked down). In CI/CD, you build the production target. In local development, Compose builds the development target. Same Dockerfile, same base image, same dependencies — only the final layer differs. This is leagues better than maintaining two separate Dockerfiles.
# Dockerfile with development and production targets (Node.js)
FROM node:20-alpine AS base

WORKDIR /app
RUN apk add --no-cache dumb-init

# Development stage
FROM base AS development

# Install all dependencies including devDependencies
COPY package*.json ./
RUN npm install

# Mount source code as volume in docker-compose
# No COPY needed - will use volume mount

ENV NODE_ENV=development

EXPOSE 3000

# Use nodemon for hot reload
CMD ["npx", "nodemon", "--watch", "src", "src/index.js"]

# Production stage
FROM base AS production

COPY package*.json ./
RUN npm ci --omit=dev && npm cache clean --force

COPY . .

RUN addgroup -S app && adduser -S app -G app
USER app

ENV NODE_ENV=production
EXPOSE 3000

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "src/index.js"]
The Python equivalent uses uvicorn --reload as the file watcher during development and gunicorn with uvicorn workers in production. Two important mechanics: in development we mount the source code as a volume and install all requirements (including dev tooling); in production we bake the code into the image and install only runtime dependencies. The --reload flag uses watchfiles under the hood when that package is installed (it is part of the uvicorn[standard] extra) to detect file changes and restart the ASGI server.
# Dockerfile with development and production targets (Python FastAPI)
FROM python:3.12-slim AS base

WORKDIR /app

# Runtime deps needed in both dev and prod
RUN apt-get update && apt-get install -y --no-install-recommends \
    dumb-init \
    && rm -rf /var/lib/apt/lists/*

ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Development stage
FROM base AS development

# Install build deps (devs may need to reinstall packages with C extensions)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    g++ \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

# Install all deps including dev tooling (pytest, black, mypy, etc.)
COPY requirements.txt requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements.txt -r requirements-dev.txt

# Source mounted as volume via docker-compose — no COPY needed

EXPOSE 8000

# uvicorn --reload watches filesystem and restarts on change
CMD ["uvicorn", "src.main:app", \
     "--host", "0.0.0.0", \
     "--port", "8000", \
     "--reload", \
     "--reload-dir", "/app/src"]

# Build stage for wheels (compilers stay out of the final image)
FROM python:3.12-slim AS prod-builder

WORKDIR /build
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ libffi-dev && rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip wheel --no-cache-dir --wheel-dir /build/wheels -r requirements.txt

# Production stage: this is the stage that `target: production` builds
FROM base AS production

WORKDIR /app

RUN groupadd -r -g 1001 app && useradd -r -u 1001 -g app app

COPY --from=prod-builder /build/wheels /wheels
COPY requirements.txt ./
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt && \
    rm -rf /wheels

COPY --chown=app:app src ./src

USER app

EXPOSE 8000

ENTRYPOINT ["dumb-init", "--"]
CMD ["gunicorn", "src.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000"]

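The reload mechanics described above boil down to "notice that a source file changed, then restart the server". The sketch below shows only the detection half, using plain modification-time polling rather than the watchfiles library, so it is a simplified stand-in and not what uvicorn actually runs. The function names snapshot, changed_files, and demo are made up for this illustration.

```python
import os
import tempfile
from pathlib import Path


def snapshot(root: str) -> dict[str, float]:
    """Map every .py file under root to its last-modified time."""
    return {str(p): p.stat().st_mtime for p in Path(root).rglob("*.py")}


def changed_files(before: dict[str, float], after: dict[str, float]) -> set[str]:
    """Files that were added, removed, or modified between two snapshots."""
    added_or_removed = before.keys() ^ after.keys()
    modified = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    return added_or_removed | modified


def demo() -> set[str]:
    """Simulate one edit in a temporary directory and report what changed."""
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "main.py"
        target.write_text("print('v1')\n")
        before = snapshot(root)
        os.utime(target, (0, 0))  # force a different mtime to stand in for an edit
        after = snapshot(root)
        return {Path(p).name for p in changed_files(before, after)}
```

A reloader would call snapshot in a loop and restart the server whenever changed_files returns a non-empty set. This also shows why the source directory must be bind-mounted: the watcher can only see edits that reach the container's filesystem.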
Docker Compose Override for Development

docker-compose.override.yml is automatically merged with docker-compose.yml when you run docker-compose up. This lets you keep the base file production-like and layer in development-only settings (volume mounts, debug ports, hot reload commands) without duplicating the entire configuration. The same pattern works for environment-specific overrides such as docker-compose.staging.yml and docker-compose.production.yml, but only the override file is picked up automatically; any other file has to be passed explicitly, for example docker-compose -f docker-compose.yml -f docker-compose.staging.yml up. Keep the base file as the source of truth and use overrides for the deltas.
# docker-compose.override.yml
version: '3.8'

# This file is automatically merged with docker-compose.yml
# for local development

services:
  order-service:
    build:
      target: development
    volumes:
      - ./services/order-service/src:/app/src
    command: npx nodemon --watch src src/index.js
    ports:
      - "3002:3000"  # host port 3001 is already taken by Grafana in docker-compose.yml
      - "9229:9229"  # Node debugger

  # Python FastAPI payment-service: mount source + enable reload
  payment-service:
    build:
      target: development
    volumes:
      - ./services/payment-service/src:/app/src
      # Shield the container's virtualenv from host bind mount
      - /app/.venv
    command: >
      uvicorn src.main:app
      --host 0.0.0.0
      --port 8000
      --reload
      --reload-dir /app/src
      --log-level debug
    ports:
      - "8001:8000"
      - "5678:5678"  # debugpy remote debugger

  inventory-service:
    build:
      target: development
    volumes:
      - ./services/inventory-service/src:/app/src
    command: npx nodemon --watch src src/index.js
    ports:
      - "3003:3000"
      - "9231:9229"

Container Security

Security Hardening: Why Defaults Are Dangerous

Docker’s defaults were designed for ease of onboarding, not production security. Out of the box, a container runs as UID 0, has a writable root filesystem, keeps Docker’s default set of Linux capabilities (more than most services need), and can talk to any network endpoint it can reach. Every one of those defaults is a liability in a microservice that may be exposed to untrusted input. The threat model is not theoretical: real production breaches in the last few years have started with a code-execution vulnerability in a service (SQL injection, deserialization bug, SSRF, prototype pollution, or a vulnerable dependency) and escalated because the container defaults gave the attacker far more than they needed. Security hardening is about shrinking that blast radius.
What goes wrong if you skip hardening? Best case, you pass a compliance audit with a stern finding and spend the next quarter doing it properly under pressure. Worst case, a single RCE in a dependency becomes a lateral movement vector into your node, your cloud metadata service, or your Kubernetes API. The fixes (non-root user, read-only filesystem, capability drops, resource limits) are cheap to add at build time and far more expensive to add after a breach.
The hardening below combines several techniques: non-root user (so the process cannot modify OS files), read-only filesystem (so the attacker cannot drop a malicious binary), dropped capabilities (so the process cannot perform privileged operations like raw socket access), and careful file permissions (so even if the attacker runs as appuser, they cannot easily modify application code). Each technique individually is cheap; combined, they dramatically raise the cost of exploitation. The tradeoff is that hardened containers are slightly harder to debug live: you cannot exec in and write diagnostic files or install diagnostic tools if the filesystem is read-only. The fix is to mount a writable tmpfs at /tmp for transient files, and to rely on observability, remote debugging tools (kubectl port-forward to a debug endpoint), and ephemeral debug pods rather than shell access for production troubleshooting.
Caveats & Common Pitfalls in Container Runtime
  • No resource limits = noisy neighbor problem. A pod with no memory limit can consume all node memory; Kubernetes has to evict other pods on the same node to recover. A CPU-unbounded pod starves every other container on the node. One buggy service takes down five innocent ones.
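  • privileged: true or hostNetwork: true in production. privileged gives the container near-full root access to the host (all capabilities plus host devices); hostNetwork places it directly in the host’s network namespace. With either one, an RCE is a short step away from a host compromise.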
  • Writable root filesystem. Attacker with code execution can drop a malicious binary into /usr/local/bin/ and persist across restarts. Read-only root blocks this entirely.
  • Docker socket mounted into a pod (/var/run/docker.sock). Container escape is trivial — the pod can now start arbitrary containers on the host with any privileges it wants. This pattern shows up in CI runners and “ops helper” pods and is how several cluster-wide compromises happened.
Solutions & Patterns
  • Always set requests and limits for CPU and memory. Base requests on actual p90 usage from production metrics; set limits to roughly 1.5-2x requests to absorb spikes. For memory, requests == limits is a common pattern (guaranteed QoS class) for latency-sensitive services.
  • readOnlyRootFilesystem: true + writable tmpfs at /tmp. Pair with allowPrivilegeEscalation: false and capabilities.drop: ["ALL"] in the container’s securityContext (these three are container-level fields).
  • Use Kubernetes PodSecurity admission with the restricted profile enforced at the namespace level — this structurally denies privileged pods, host network, host PID, and other high-risk configurations.
  • Never mount the Docker socket into application pods. If a pod genuinely needs to manage containers (rare — usually only CI runners), use rootless Docker or Buildah / Kaniko instead.
  • Use Sigstore’s cosign or a similar tool to sign images, and admit only signed images into the cluster (via OPA Gatekeeper / Kyverno policy).
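The patterns above can be turned into a simple pre-deploy check. The sketch below inspects a plain dict shaped like one entry of a pod's containers list and reports which of the settings discussed here are missing. The function name hardening_gaps and the message wording are made up for this illustration; an admission controller would enforce the same rules in the cluster itself.

```python
def hardening_gaps(container: dict) -> list[str]:
    """Return the hardening settings a container spec is missing."""
    security = container.get("securityContext", {})
    resources = container.get("resources", {})
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})

    gaps = []
    if security.get("privileged") is True:
        gaps.append("privileged must not be true")
    if security.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem should be true")
    if security.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation should be false")
    if "ALL" not in security.get("capabilities", {}).get("drop", []):
        gaps.append("capabilities.drop should include ALL")
    for resource in ("cpu", "memory"):
        if resource not in requests or resource not in limits:
            gaps.append(f"{resource} needs both a request and a limit")
    return gaps


# Example spec that satisfies every check (values are illustrative).
HARDENED = {
    "securityContext": {
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    },
    "resources": {
        "requests": {"cpu": "250m", "memory": "512Mi"},
        "limits": {"cpu": "500m", "memory": "512Mi"},
    },
}
```

hardening_gaps(HARDENED) returns an empty list, while an empty spec produces one entry per missing control. A check like this is useful in CI as an early warning, with namespace-level policy as the actual enforcement point.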
Strong Answer Framework:
  1. Confirm it is actually an OOMKill, not just a crash. kubectl describe pod shows Last State: Terminated, Reason: OOMKilled, Exit Code: 137. dmesg on the node shows the kernel OOM-killer event. If the exit code is 137 with OOMKilled reason, you have confirmed the memory-limit hit.
  2. Understand why staging differs from production. Staging typically sees low concurrent load. Production has 100-1000x more concurrent requests, each allocating request/response objects, DB result sets, and connection buffers. Memory grows linearly with concurrency — 512 MB fine at 10 RPS becomes 2+ GB at 1000 RPS.
  3. Runtime-specific root causes:
    • Node.js: V8 sizes its heap independently of the container memory limit. Older Node versions default to roughly 1.5 GB on 64-bit systems, and newer versions derive the default from the memory they detect, which may not match the cgroup limit. On top of the heap comes off-heap memory: Buffers, native bindings, worker threads. Fix: NODE_OPTIONS=--max-old-space-size=1536 for a 2 GB container, so V8 leaves room for Buffers and OS overhead.
    • Python: Each uvicorn worker is a full process (~100-300 MB baseline), so --workers 8 costs at least 800 MB before your app allocates anything. Fix: reduce workers, or switch to threads/async if appropriate.
    • JVM: Before JDK 10, the JVM didn’t respect cgroup memory limits. On modern JDKs, verify with java -XX:+PrintFlagsFinal -version | grep MaxHeapSize inside the container.
  4. Check for memory leaks. Take two heap snapshots 10 minutes apart at steady load. Node: v8.writeHeapSnapshot() triggered by admin endpoint, analyzed in Chrome DevTools. Python: tracemalloc or memray. Growth of retained objects between snapshots is the leak.
  5. Common leak culprits. Event listeners never removed, global caches with no eviction, unreleased DB connections on error paths, closures capturing request objects. Fix each systematically.
  6. Set the correct limits. Set requests.memory to observed p90 usage (e.g., 512 Mi) and limits.memory to ~2x that (1 Gi). Alert on container_memory_working_set_bytes / limit > 0.8 so you know before an OOMKill happens.
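The sizing arithmetic in the steps above is easy to get wrong by hand, so here is a small sketch of it. The 75% heap fraction is an assumption chosen so that a 2 GB container yields the 1536 MB value used earlier; the 0.8 alert threshold comes from step 6. The function names are made up for this illustration.

```python
def max_old_space_size_mb(container_limit_mb: int, heap_fraction: float = 0.75) -> int:
    """Heap cap for V8, leaving the rest of the container for off-heap memory."""
    return int(container_limit_mb * heap_fraction)


def node_options(container_limit_mb: int) -> str:
    """Value to place in the NODE_OPTIONS environment variable."""
    return f"--max-old-space-size={max_old_space_size_mb(container_limit_mb)}"


def memory_alert(working_set_bytes: int, limit_bytes: int, threshold: float = 0.8) -> bool:
    """True when the working set has crossed the alerting threshold."""
    return working_set_bytes / limit_bytes > threshold
```

For a 2048 MB container, node_options(2048) gives --max-old-space-size=1536. With a 1 GiB limit, a 900 MiB working set trips memory_alert while 512 MiB does not.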
Real-World Example: A 2018 Heroku engineering blog covered exactly this pattern: Node.js apps happily allocating past cgroup limits because V8 did not know about cgroups. The fix: set --max-old-space-size to 75% of the container limit. Every Node container in production should have this set explicitly.
Senior Follow-up Questions:
  • “How do you distinguish a memory leak from normal load-based growth?” A leak keeps growing even at constant load; load-based growth plateaus at steady state. Plot container_memory_working_set_bytes over 24 hours with constant traffic — if it grows linearly, it is a leak. If it plateaus after warm-up, it is working-set size (fix with limits, not leak hunting).
  • “What is the difference between RSS, working set, and heap?” RSS is what the process itself holds in physical RAM (heap + stack + shared libraries + file mappings). Working set, as reported by container_memory_working_set_bytes, is the container’s total memory usage minus inactive, easily reclaimable file cache; this is the number to compare against limits.memory. Heap is just the V8/Python/JVM managed allocator, a subset of RSS. The OOM killer acts on the container’s total memory, not on the heap, so heap-only profiling misses things like Buffers, native memory, and mmap’d files.
  • “How would you design a canary deployment to catch memory issues before full rollout?” Deploy to 5% of traffic, watch container_memory_working_set_bytes slope over 30 minutes, auto-rollback if slope exceeds a threshold (e.g., 10 MB/min growth at steady load). Argo Rollouts or Flagger can do this declaratively with Prometheus queries as success criteria.
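The "slope at steady load" test from the answers above can be sketched as a least-squares fit over memory samples. The input is a list of (minute, working_set_mb) pairs, which in practice would come from a Prometheus query; the 10 MB/min threshold is the example value from the canary answer, and the function names are made up for this illustration.

```python
def growth_mb_per_min(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (minute, working_set_mb) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / len(samples)
    mean_m = sum(m for _, m in samples) / len(samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    covariance = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return covariance / variance


def looks_like_leak(samples: list[tuple[float, float]],
                    threshold_mb_per_min: float = 10.0) -> bool:
    """True when memory keeps climbing faster than the threshold."""
    return growth_mb_per_min(samples) > threshold_mb_per_min
```

A series that climbs by a constant amount every minute is flagged, while a series that rises during warm-up and then flattens is not. The window matters: measure after warm-up and at constant traffic, otherwise normal load-based growth looks like a leak.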
Common Wrong Answers:
  • “Just increase the memory limit to 4 GB.” Works until next quarter when it OOMs again at 4 GB. You have deferred the problem and doubled the infrastructure cost.
  • “Restart the pod every hour via a liveness probe hack.” Masks the leak, causes request drops during restarts, and leaves the actual bug in place.
Strong Answer Framework:
  1. Triage which CVEs are actually exploitable in your context. Not every CVE in a base image is a real risk. A libxml2 CVE in an image that never parses XML is a paper tiger. Use Trivy with --ignore-unfixed (or Grype with --only-fixed) and review the remaining list; cross-reference each HIGH/CRITICAL against actual usage.
  2. Rebuild first, upgrade second. Many CVEs disappear just by rebuilding against the current base image tag (since the upstream image has been patched). Run docker pull node:20-alpine and rebuild; many findings evaporate without any deliberate upgrade.
  3. Pin by digest after rebuild. Now that you have a fresh, patched image, pin the digest so the build is reproducible: FROM node:20-alpine@sha256:.... Future builds will not silently pick up new changes.
  4. Canary the new image. Deploy to 5-10% of traffic for 30-60 minutes, watch RED metrics (Rate, Errors, Duration) plus CPU/memory, then roll forward. Unexpected base-image regressions (e.g., a glibc change breaking dns.lookup()) have happened — always canary.
  5. Automate from here on. Weekly scheduled rebuild + scan + canary deploy. Dependabot / Renovate watches the base image and opens PRs when upstream updates. The goal: no image in production is more than 14 days behind its base.
  6. Policy at admission. OPA Gatekeeper / Kyverno rule: no image older than 30 days can be admitted to the cluster. Combined with signed images, this enforces hygiene structurally.
Real-World Example: The 2021 log4j (CVE-2021-44228) incident was a textbook case: images that had not been rebuilt in months were exposed for weeks longer than necessary because teams did not have an automated rebuild pipeline. Teams with weekly rebuilds and Dependabot on base images patched within 24 hours; teams without it took 2-4 weeks.
Senior Follow-up Questions:
  • “What do you do when a CVE has no fix yet (0-day)?” Three layers: (1) WAF / rate-limit rules that block the exploit pattern, (2) network-level segmentation so the vulnerable service has minimal lateral reach, (3) temporary feature flag to disable the vulnerable code path if it is isolatable. Accept residual risk and monitor for indicators of compromise until a patch ships.
  • “How do you prevent alert fatigue from CVE scans?” Scan policy: fail build only on CRITICAL and HIGH with fix-available: true. Warn (do not fail) on MEDIUM. Ignore LOW. Pair with a monthly triage where an engineer reviews the warning backlog so things do not silently rot.
  • “What about CVEs in the Node.js or Python packages themselves, not just OS packages?” Dependabot / Renovate / Snyk for package.json and requirements.txt / pyproject.toml. Prefer npm audit --omit=dev / pip-audit in CI. Commit the lockfile and use npm ci / pip install -r requirements.txt --require-hashes to make the build reproducible and tamper-evident.
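The scan policy from the alert-fatigue answer can be written down as a small decision function. The finding shape (a dict with severity and fix_available keys) is hypothetical and not the output format of any particular scanner. One assumption goes beyond the text: a CRITICAL or HIGH finding with no fix available is downgraded to a warning rather than ignored.

```python
def scan_verdict(findings: list[dict]) -> str:
    """Return 'fail', 'warn', or 'pass' for a list of scanner findings."""
    verdict = "pass"
    for finding in findings:
        severity = finding["severity"].upper()
        if severity in {"CRITICAL", "HIGH"} and finding["fix_available"]:
            return "fail"  # fixable and serious: block the build
        if severity in {"CRITICAL", "HIGH", "MEDIUM"}:
            verdict = "warn"  # goes to the triage backlog (assumption for unfixed HIGH)
        # LOW findings are ignored
    return verdict
```

A CI step would exit non-zero only on "fail" and surface "warn" results in the build summary, which keeps the pipeline strict about what can be fixed today without blocking on what cannot.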
Common Wrong Answers:
  • “We’ll wait until the next release cycle to patch.” Not acceptable for HIGH/CRITICAL — you are leaving a known exploitable window open. Emergency patches get fast-tracked outside the regular cycle.
  • “Ignore CVEs in base image and patch only app dependencies.” Base image CVEs are real — glibc, openssl, zlib bugs affect everyone regardless of app code.
Strong Answer Framework:
  1. Measure before changing. Collect 2 weeks of container_memory_working_set_bytes and container_cpu_usage_seconds_total per service. This gives you p90/p99 usage to set reasonable initial requests and limits from data, not guesses.
  2. Attack in waves by risk, not alphabetically. Start with one low-traffic, non-critical internal service as the pilot. Get the hardening pattern right there, document it, then roll out in waves of 3-5 services. Never do 30 services at once — one subtle issue (e.g., readOnlyRootFilesystem breaking a library that writes to /etc/hosts) will take down everything.
  3. Non-root first, by itself. One PR per service that only changes: add USER 1001, add runAsNonRoot: true and runAsUser: 1001 in pod spec. Deploy. Watch for 48 hours. Some services will break because they bind to port 80 (requires root) — switch to port 8080 + Kubernetes Service mapping 80→8080. Some will break because they write to /var/log — switch to stdout.
  4. Resource limits next. Set requests to p90 observed + 20% headroom, limits to 1.5x requests. Deploy canary, watch for OOMKills and CPU throttling (container_cpu_cfs_throttled_periods_total > 0 means your CPU limit is too tight). Tune based on data.
  5. Read-only filesystem third. This is the most likely to break something. Identify writable paths first: strace -e openat in a test pod to see what the app tries to write. Mount emptyDir at those paths (/tmp, /var/cache, language-specific dirs). Then enable readOnlyRootFilesystem: true.
  6. Capability drop last. Add capabilities.drop: ["ALL"], then add back only what is needed (typically nothing; a load balancer may need NET_BIND_SERVICE for sub-1024 ports but you already moved off those).
  7. Enforce structurally. After all services are migrated, enable the Kubernetes PodSecurity admission restricted profile at the namespace level. Now any new workload is required to be hardened from day one — no regression possible.
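Steps 1 and 4 above describe a calculation: take observed usage, find the p90, add 20% headroom for the request, and set the limit to 1.5x the request. A minimal sketch follows, using a nearest-rank percentile. The sample values and function names are made up for this illustration, and real inputs would be exported from your metrics system.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def size_resources(usage_mib: list[float], headroom: float = 0.20,
                   limit_factor: float = 1.5) -> dict[str, int]:
    """Request = p90 plus headroom; limit = request times limit_factor."""
    # round() guards against floating-point noise before rounding up
    request = math.ceil(round(percentile(usage_mib, 90) * (1 + headroom), 6))
    limit = math.ceil(round(request * limit_factor, 6))
    return {"request_mib": request, "limit_mib": limit}


# Hypothetical memory samples (MiB) collected over the measurement window.
SAMPLES = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
```

For SAMPLES the p90 is 180 MiB, so the request comes out at 216 MiB and the limit at 324 MiB. These are starting points to be tuned during the canary, not final values.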
Real-World Example: Shopify published (around 2020-2021) a multi-year journey going from a permissive security posture to strict restricted pod security across thousands of services. Their key insight: per-service migration with metrics-based canaries took quarters, not weeks, but the gradual approach had zero incidents while the “big bang” approach at another org (they referenced anonymously) caused multi-hour outages.
Senior Follow-up Questions:
  • “How do you handle services that genuinely need privileged operations, like a logging sidecar?” Narrow-scoped capabilities. CAP_DAC_READ_SEARCH for a log tailer that reads files owned by other users, not privileged: true. Document the capability and why, so future reviewers understand. For truly privileged workloads (node-problem-detector, storage drivers), they get their own namespace with relaxed policy — not the application namespace.
  • “What about legacy services that cannot be changed (vendor container, no source)?” Run them in a dedicated namespace with relaxed PodSecurity. Isolate via NetworkPolicy. Put a service mesh gateway in front. If the vendor’s container is a persistent security debt, that becomes a procurement signal to replace the vendor.
  • “How do you validate the migration actually improved security, not just moved things around?” Run a before/after pen-test or automated tool like kube-bench and kube-hunter. Number of findings should drop dramatically. Also track: % of pods running as non-root (target: 100% in app namespaces), % with resource limits (target: 100%), % with read-only root FS (target: >90%).
Common Wrong Answers:
  • “Roll out hardening via a single admission policy change.” Breaks everything that was not migrated, incidents everywhere.
  • “Only harden new services going forward.” The 30 existing services remain vulnerable indefinitely. Security debt is real debt with interest.

Security Best Practices

# Security-hardened Dockerfile
FROM node:20-alpine AS builder

# Update and patch base image
RUN apk update && apk upgrade

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Production stage with security hardening
FROM node:20-alpine

# Security updates
RUN apk update && \
    apk upgrade && \
    apk add --no-cache dumb-init && \
    rm -rf /var/cache/apk/*

WORKDIR /app

# Create non-root user with specific UID/GID
RUN addgroup -S -g 1001 appgroup && \
    adduser -S -u 1001 -G appgroup appuser

# Copy with correct ownership
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app .

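# Remove unnecessary files and make the app tree read-only for its owner.
# Files deleted here still exist in earlier layers, so also list them in
# .dockerignore. Symbolic modes keep the execute bit on directories, which
# node needs in order to traverse node_modules.
RUN rm -rf .git .gitignore .env* Dockerfile* docker-compose* && \
    chmod -R u-w,go-rwx /app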

# Set security environment variables
ENV NODE_ENV=production
ENV NPM_CONFIG_LOGLEVEL=warn

# Read-only root filesystem support
# (use with: docker run --read-only --tmpfs /tmp
#  so the app still has a writable scratch directory)
RUN mkdir -p /tmp && chown appuser:appgroup /tmp

# Switch to non-root user
USER appuser

# No capabilities needed
# Use with: docker run --cap-drop=ALL

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
  CMD node healthcheck.js

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
The Python equivalent applies the same hardening principles: minimal base image, non-root user with explicit UID/GID, restrictive file permissions, and read-only filesystem compatibility. One Python-specific note: set PYTHONDONTWRITEBYTECODE=1 so the interpreter does not try to write .pyc files into __pycache__ directories at import time. On a read-only mount those writes fail (Python skips them silently), so disabling them avoids pointless work on every start and keeps the behaviour explicit.
# Security-hardened Python FastAPI Dockerfile
FROM python:3.12-slim AS builder

RUN apt-get update && apt-get upgrade -y && \
    apt-get install -y --no-install-recommends gcc g++ libffi-dev && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /build
COPY requirements.txt ./
RUN pip wheel --no-cache-dir --wheel-dir /build/wheels -r requirements.txt

# Production stage
FROM python:3.12-slim

# Security updates
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get install -y --no-install-recommends dumb-init && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/*

WORKDIR /app

# Non-root user with specific UID/GID
RUN groupadd -r -g 1001 appgroup && \
    useradd -r -u 1001 -g appgroup appuser

# Install from wheels (no compilers in final image)
COPY --from=builder /build/wheels /wheels
COPY requirements.txt ./
RUN pip install --no-cache-dir --no-index --find-links=/wheels -r requirements.txt && \
    rm -rf /wheels

# Copy app with correct ownership
COPY --chown=appuser:appgroup src ./src

# Remove anything sensitive that may have snuck into context
RUN find /app -name '.env*' -delete && \
    find /app -name '*.pyc' -delete && \
    find /app -name '__pycache__' -type d -exec rm -rf {} + 2>/dev/null || true && \
    chmod -R 500 /app/src

# Required for read-only filesystem compatibility
RUN mkdir -p /tmp && chown appuser:appgroup /tmp

# Security / runtime env
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV PIP_NO_CACHE_DIR=1
ENV PIP_DISABLE_PIP_VERSION_CHECK=1

USER appuser

EXPOSE 8000

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
  CMD python -c "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://localhost:8000/health',timeout=5).status==200 else 1)"

ENTRYPOINT ["dumb-init", "--"]
CMD ["gunicorn", "src.main:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000"]

Healthcheck Script: What It Actually Needs to Do

A healthcheck is a contract between your process and the container runtime (Docker, Kubernetes, ECS). It answers exactly one question: “Is this process able to respond right now?” It should not answer “Is the entire dependency graph healthy?” — that conflation is the single most common mistake in production healthchecks. If your liveness check pings the database, the cache, and three downstream APIs, then a 30-second blip in any one of them causes cascading container restarts across your fleet, which typically makes the outage worse, not better. The right healthcheck is small, fast, and local. It calls the process’s own /health endpoint, which returns 200 if the process’s event loop is responsive. Deep dependency checks belong in a separate /ready endpoint that Kubernetes uses to decide whether to route traffic — a distinction we’ll explore in the Kubernetes chapter. The healthcheck script itself should use the language’s stdlib HTTP client (no curl, no extra binaries) so the image stays small and the check has no extra dependencies to break.

Healthcheck Script

The script below implements that contract for the Node.js image. It uses only the built-in http module (no curl, no wget, no extra binaries to bloat the image), requests the app’s own /health endpoint, and exits 0 on a 200 response and 1 on any other status, connection error, or timeout. Keep the /health handler itself limited to basic liveness (“can this process respond to HTTP?”); dependency health belongs in readiness checks, which we’ll cover in the Kubernetes chapter.
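// healthcheck.js
// Exits 0 if the app's /health endpoint returns 200, otherwise exits 1.
const http = require('http');

const options = {
  // Use the IPv4 loopback address explicitly: on newer Node.js versions
  // 'localhost' can resolve to ::1 first, which fails if the app only binds IPv4.
  hostname: '127.0.0.1',
  port: process.env.PORT || 3000,
  path: '/health',
  method: 'GET',
  timeout: 5000 // Milliseconds of socket inactivity before 'timeout' fires
};

const req = http.request(options, (res) => {
  process.exit(res.statusCode === 200 ? 0 : 1);
});

// Connection refused, resets, and similar failures all count as unhealthy
req.on('error', () => {
  process.exit(1);
});

// The 'timeout' event does not abort the request by itself
req.on('timeout', () => {
  req.destroy();
  process.exit(1);
});

req.end();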
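
The script above assumes the service exposes a /health endpoint and keeps dependency checks on a separate /ready endpoint. What follows is a minimal sketch of that split using only Node's built-in http module. The names resolveStatus, createServer, and dependenciesReady are hypothetical, and the dependency probe is a stub you would replace with real checks.

```javascript
const http = require('http');

// Hypothetical dependency probe: replace with real checks (database ping, cache ping).
async function dependenciesReady() {
  return true;
}

// Map a request path to a status code.
// /health stays local and cheap; only /ready consults dependencies.
async function resolveStatus(url, checkReady = dependenciesReady) {
  const path = url.split('?')[0];
  if (path === '/health') return 200;
  if (path === '/ready') {
    try {
      return (await checkReady()) ? 200 : 503;
    } catch {
      // A failing probe means "not ready", not "crash the process"
      return 503;
    }
  }
  return 404;
}

// Build an HTTP server that answers only the two probe endpoints.
function createServer(checkReady = dependenciesReady) {
  return http.createServer(async (req, res) => {
    const status = await resolveStatus(req.url, checkReady);
    res.writeHead(status, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ ok: status === 200 }));
  });
}

module.exports = { resolveStatus, createServer };
```

In a real service these routes would live alongside your application routes; calling createServer().listen(process.env.PORT || 3000) is enough to try the sketch on its own. Note that a failing dependency changes only the /ready response, so a database blip never causes the liveness check to restart the container.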

Security Scanning

Automated vulnerability scanning in CI is non-negotiable for any serious microservices deployment. The attack surface of a modern Node.js or Python service is primarily the transitive dependency tree: hundreds of third-party packages, any of which could have a known CVE. Trivy, Snyk, and Grype are the common tools; they compare your image’s installed packages against public vulnerability databases and can be configured to fail the build when HIGH or CRITICAL vulnerabilities are found. The pitfall with automated scanning is alert fatigue. If your policy is “fail on any vulnerability,” you’ll soon have a backlog of findings that nobody addresses. A pragmatic policy: fail on CRITICAL and HIGH in production code paths, warn on MEDIUM, ignore LOW. Pair this with a scheduled job that rebuilds and rescans images weekly against the latest base image, so you pick up OS-level patches automatically and catch CVEs that were disclosed after the image was originally built.
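# .github/workflows/docker-security.yml
name: Docker Security Scan

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly rescan to catch newly disclosed CVEs

permissions:
  contents: read
  security-events: write  # Required to upload SARIF results

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Run Trivy vulnerability scanner
        # Pin to a release tag or commit SHA in real pipelines; @master can change without notice
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'myapp:${{ github.sha }}'
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          limit-severities-for-sarif: 'true'  # Apply the severity filter to SARIF output too
          exit-code: '1'  # Fail the job when matching vulnerabilities are found

      - name: Upload Trivy scan results
        if: always()  # Upload findings even when the scan step fails
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Run Hadolint
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile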
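
The severity policy described above (fail on CRITICAL and HIGH, warn on MEDIUM, ignore LOW) can also be enforced in a small script of your own when you want warnings as well as failures. The sketch below assumes a Trivy JSON report whose Results entries each carry a Vulnerabilities array with VulnerabilityID, PkgName, and Severity fields. The function name evaluateReport and the POLICY table are hypothetical, and unknown severities are treated as ignored, which you may want to tighten.

```javascript
// Severity policy from the text: fail on CRITICAL/HIGH, warn on MEDIUM, ignore LOW.
const POLICY = { CRITICAL: 'fail', HIGH: 'fail', MEDIUM: 'warn', LOW: 'ignore' };

// Sort every finding in a parsed report into "fail" and "warn" buckets.
function evaluateReport(report, policy = POLICY) {
  const outcome = { fail: [], warn: [] };
  for (const result of report.Results || []) {
    // Results without findings may omit the Vulnerabilities key entirely
    for (const vuln of result.Vulnerabilities || []) {
      const action = policy[vuln.Severity] || 'ignore';
      if (action === 'ignore') continue;
      outcome[action].push(`${vuln.VulnerabilityID} (${vuln.PkgName}, ${vuln.Severity})`);
    }
  }
  return outcome;
}

module.exports = { evaluateReport, POLICY };
```

A CI step could produce the report with trivy image --format json --output report.json myapp:TAG, parse it with JSON.parse, print the warn list, and set process.exitCode = 1 when the fail list is not empty.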

Programmatic Docker Management

Sometimes you need to manage containers from your own scripts or services: building images in CI runners, running integration tests against real databases in ephemeral containers, or managing developer sandbox environments. Docker publishes an official SDK for Python, and Node.js has the widely used community library dockerode; both wrap the Docker Engine API. The pattern below shows a typical use case: programmatically starting a database container for integration testing. The key consideration: these scripts must handle cleanup carefully. Containers started by a script that crashes mid-execution keep running until someone removes them, consuming disk space and port allocations. Always wrap the container lifecycle in try/finally (or a context manager in Python) so cleanup runs even on failure.
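// scripts/run-integration-test.js
const net = require('net');
const Docker = require('dockerode');
const docker = new Docker();

// Poll the published port until it accepts TCP connections.
// An open port does not prove Postgres is ready for queries, so the
// test code should still retry its first connection.
async function waitForPostgres(port = 5433, retries = 30, delayMs = 1000) {
  for (let attempt = 0; attempt < retries; attempt++) {
    const open = await new Promise((resolve) => {
      const socket = net.connect({ host: '127.0.0.1', port });
      socket.once('connect', () => { socket.destroy(); resolve(true); });
      socket.once('error', () => { socket.destroy(); resolve(false); });
    });
    if (open) return;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Postgres did not accept connections on port ${port}`);
}

async function runIntegrationTest() {
  // Pull the image
  await new Promise((resolve, reject) => {
    docker.pull('postgres:15-alpine', (err, stream) => {
      if (err) return reject(err);
      docker.modem.followProgress(stream, (err) => err ? reject(err) : resolve());
    });
  });

  // Start a test database
  const container = await docker.createContainer({
    Image: 'postgres:15-alpine',
    Env: [
      'POSTGRES_PASSWORD=test',
      'POSTGRES_DB=testdb'
    ],
    HostConfig: {
      PortBindings: { '5432/tcp': [{ HostPort: '5433' }] },
      AutoRemove: true
    }
  });

  try {
    await container.start();
    console.log('Test DB started on port 5433');

    // Run your tests here against localhost:5433
    await waitForPostgres();
    // ... test code ...
  } finally {
    // Force-remove stops the container if it is running and also cleans up
    // a container that never started. Ignore errors in case AutoRemove
    // has already removed it.
    await container.remove({ force: true }).catch(() => {});
  }
}

runIntegrationTest().catch((err) => {
  console.error(err);
  process.exitCode = 1; // Signal failure to CI
});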

Docker Commands Reference

# Build commands
docker build -t order-service:1.0.0 .
docker build -t order-service:1.0.0 --target production .
docker build --no-cache -t order-service:1.0.0 .

# Run commands
docker run -d --name order-service -p 3000:3000 order-service:1.0.0
docker run -d --read-only --cap-drop=ALL -p 3000:3000 order-service:1.0.0
docker run --rm -it order-service:1.0.0 /bin/sh

# Compose commands
docker-compose up -d
docker-compose up --build
docker-compose down
docker-compose down -v  # Remove volumes
docker-compose logs -f order-service
docker-compose exec order-service /bin/sh

# Debug commands
docker logs -f order-service
docker exec -it order-service /bin/sh
docker inspect order-service
docker stats

# Cleanup
docker system prune -a
docker volume prune
docker image prune -a

Interview Questions

Answer: A multi-stage build uses multiple FROM statements in a single Dockerfile. Earlier stages build the application, and the final stage copies in only the artifacts it needs.
Benefits:
  • Smaller final images (only runtime dependencies)
  • Separate build and runtime environments
  • Don’t expose build tools in production
  • Better security (fewer attack vectors)
Example:
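FROM node:20 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS production
WORKDIR /app
COPY --from=builder /app/dist ./dist
# Includes devDependencies; native modules built on Debian may not load on Alpine
COPY --from=builder /app/node_modules ./node_modules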
Typical reduction: 500MB → 100MB
Answer: Key principles:
  1. Order instructions by change frequency (least → most)
  2. Copy dependency files before source code
  3. Combine RUN commands to reduce layers
Example:
# Good: package.json rarely changes
COPY package*.json ./
RUN npm ci

# Then source code (changes often)
COPY . .
Tips:
  • Use .dockerignore to exclude unnecessary files
  • Pin versions to ensure consistent builds
  • Use --no-cache only when needed
Answer:
  1. Don’t run as root
    RUN adduser -S app
    USER app
    
  2. Use minimal base images
    • Alpine, distroless, slim variants
  3. Scan for vulnerabilities
    • Trivy, Snyk, Clair
  4. Drop capabilities
    docker run --cap-drop=ALL
    
  5. Read-only filesystem
    docker run --read-only
    
  6. No secrets in images
    • Use environment variables or secrets managers
  7. Keep images updated
    • Regularly rebuild with patched base images

Summary

Key Takeaways

  • Multi-stage builds for optimized images
  • Layer caching for faster builds
  • Run as non-root user
  • docker-compose for local development
  • Scan images for vulnerabilities

Next Steps

In the next chapter, we’ll deploy to Kubernetes - the industry standard for container orchestration.

Interview Deep-Dive

Strong Answer: A 1.2GB image typically means the Dockerfile uses the full node:20 base image (which includes Debian and build tools) and ships node_modules with devDependencies. In microservices, image size matters for three concrete reasons: pull time during scaling events (pulling 1.2GB across 10 new pods takes minutes, not seconds), registry storage costs (20 services x 1.2GB x 50 builds/week adds up), and security attack surface (every unnecessary package is a potential CVE).
Step one: multi-stage build. The build stage uses node:20 (the full image with build tools) to run npm ci and compile TypeScript. The production stage uses node:20-alpine (a base of roughly 50MB compressed) and copies only the production node_modules and compiled output. This alone typically drops the image from 1.2GB to 200-300MB.
Step two: production-only dependencies. Run npm ci --omit=dev in the production stage (the older --only=production flag is deprecated in current npm). Development dependencies (TypeScript, Jest, ESLint) are huge and not needed at runtime.
Step three: add a .dockerignore that excludes .git, test/, docs/, local .env files, and node_modules (they get reinstalled in the container anyway). I have seen images that included the entire git history because there was no .dockerignore.
Step four: if the application does not need native bindings, consider using node:20-alpine for every stage. The Alpine-based image is on the order of 50MB compressed, versus several hundred MB for the Debian-based one. The caveat: some npm packages with native bindings (bcrypt, sharp) need additional build tools on Alpine (apk add --no-cache python3 make g++).
The result is typically a 100-150MB image. For extreme optimization, I have seen teams use Google’s distroless images (for example gcr.io/distroless/nodejs20-debian12) that contain only the Node.js runtime: no shell, no package manager, no utilities. This is the most secure option but makes debugging harder because there is no shell to exec into.
Follow-up: “How do you handle the security scanning of container images in a CI/CD pipeline with 20 microservices?”
Every image gets scanned as part of the CI pipeline before it can be pushed to the registry. I use Trivy (open source) or Snyk Container integrated into the GitHub Actions workflow. The pipeline fails if any CRITICAL or HIGH severity CVE is detected in the base image or dependencies. The key insight is pinning base image digests, not just tags: node:20-alpine@sha256:<digest> ensures reproducibility, while node:20-alpine can silently change when the upstream tag is updated.
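The digest-pinning advice is easy to state and easy to forget, so some teams lint for it in CI. Here is a rough sketch of such a check, assuming a conventional Dockerfile where each base image appears on a single FROM line. The function name findUnpinnedImages is hypothetical, and the parser deliberately ignores edge cases such as ARG-substituted image names.

```javascript
// A pinned reference ends with @sha256: followed by 64 hex characters.
const DIGEST = /@sha256:[a-f0-9]{64}$/;

// Return the base images in a Dockerfile that are referenced by tag only.
function findUnpinnedImages(dockerfileText) {
  const stageNames = new Set();
  const unpinned = [];
  for (const rawLine of dockerfileText.split('\n')) {
    const match = rawLine
      .trim()
      .match(/^FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?/i);
    if (!match) continue;
    const [, image, alias] = match;
    // "FROM builder" refers to an earlier stage, not to a registry image
    const isStageRef = stageNames.has(image.toLowerCase());
    if (image !== 'scratch' && !isStageRef && !DIGEST.test(image)) {
      unpinned.push(image);
    }
    if (alias) stageNames.add(alias.toLowerCase());
  }
  return unpinned;
}

module.exports = { findUnpinnedImages };
```

A pipeline step could read the Dockerfile with fs.readFileSync, call findUnpinnedImages, and fail the build when the returned list is not empty. Keep in mind that pinned digests only change when you update them, so this check works best alongside the scheduled rebuild described earlier.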
Strong Answer: The 12-Factor App principle says configuration should come from the environment, not from code. The application reads configuration from environment variables, and the deployment platform provides the values.
For local development, I use docker-compose with a .env file that mirrors the production variable names but with local values (DATABASE_URL=postgres://localhost:5432/mydb). The docker-compose.override.yml file adds development-specific settings (volume mounts for hot reload, debug ports) without modifying the base docker-compose.yml.
For staging and production on Kubernetes, ConfigMaps hold non-sensitive config and Secrets hold sensitive values. Both are injected as environment variables into the pod spec. The same application binary reads the same environment variable names regardless of whether the value comes from .env, ConfigMap, or Vault.
The common mistake I see is teams creating environment-specific Dockerfiles or building different images per environment. The image should be identical across all environments; only the configuration differs. Build once, deploy everywhere. This guarantees that the image you tested in staging is the exact same binary running in production.
For complex configuration that does not fit in environment variables (feature flag rulesets, routing tables), I use a configuration service (Consul KV, AWS AppConfig) that the application polls at startup and watches for changes. The configuration service has its own environment hierarchy (dev -> staging -> production) with inheritance and overrides.
Follow-up: “How do you handle secrets in the local docker-compose development environment without committing them to the repo?”
I use a .env.example file in the repo with placeholder values, and the actual .env file is in .gitignore. New developers copy .env.example to .env and fill in their local values. For shared development secrets (a staging API key), I store them in a team password manager (1Password, Vault) and reference them in documentation. Some teams use docker-compose with Vault integration even locally, which is more secure but adds setup complexity. The pragmatic middle ground: use a tool like direnv that loads .env files automatically and warns if required variables are missing.
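On the application side, "configuration comes from the environment" usually comes down to one small module that reads the variables and fails fast when a required one is missing. The sketch below is one possible shape. DATABASE_URL and PORT appear earlier in this chapter, while LOG_LEVEL, the REQUIRED list, and the loadConfig name are hypothetical.

```javascript
// Variables the service cannot start without.
const REQUIRED = ['DATABASE_URL'];

// Read configuration from an environment object (process.env by default).
// The same code runs unchanged whether values come from .env, a ConfigMap, or Vault.
function loadConfig(env = process.env) {
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    // Fail at startup rather than on the first request that needs the value
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return Object.freeze({
    databaseUrl: env.DATABASE_URL,
    port: Number(env.PORT || 3000),
    logLevel: env.LOG_LEVEL || 'info' // Hypothetical optional setting
  });
}

module.exports = { loadConfig };
```

Passing the environment in as a parameter keeps the function easy to test, and freezing the result discourages code from mutating configuration at runtime.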
Strong Answer: The most common cause in Node.js is the default V8 heap limit, which is not something you should expect to track the container’s memory limit. Older Node.js versions used a fixed default of roughly 1.5GB on 64-bit systems; newer versions derive the default from the memory they detect, and the result varies by version and cgroup setup. If the container limit is lower than the heap limit V8 picked, say a 1GB container, the heap keeps growing until total RSS (heap plus V8 internals, native buffers, and connection pools) exceeds the container limit and the container is OOM-killed.
Locally, you run one instance with low traffic and the heap never grows beyond 512MB. In production, with hundreds of concurrent requests, each holding state (request objects, response buffers, database query results), the heap grows.
The fix is to set --max-old-space-size to about 75% of the container memory limit. If the container has 2GB, set --max-old-space-size=1536 (1.5GB). This leaves 512MB for V8 internals, native buffers, and OS overhead. In the Dockerfile: CMD ["node", "--max-old-space-size=1536", "src/index.js"].
If the memory issue persists after setting the heap limit, you have a memory leak. Common culprits in microservices: event listeners that are registered but never removed (adding a new listener on every request), growing in-memory caches without eviction policies, unreleased database connections in error paths, and closures that capture large objects and prevent garbage collection.
To diagnose, I enable --expose-gc and take heap snapshots in production using v8.writeHeapSnapshot() triggered by an admin endpoint. Comparing two snapshots taken 10 minutes apart reveals which objects are growing. Chrome DevTools can analyze these snapshots to find the retention path.
Follow-up: “How do you set memory limits in Kubernetes to prevent one misbehaving service from affecting others?”
Every pod gets resource requests and limits. Requests are the guaranteed amount (used for scheduling), limits are the maximum (OOM-killed if exceeded). I set requests to the P90 memory usage from production metrics and limits to 2x the requests. For a service that typically uses 512MB, I would set requests.memory: 512Mi and limits.memory: 1Gi. The requests ensure Kubernetes schedules the pod on a node with enough free memory. The limits prevent a memory leak from consuming the entire node. I also set up alerts when memory usage exceeds 80% of the limit, which gives the team time to investigate before the OOM kill happens.
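The sizing rules in this answer are simple arithmetic, and writing them down once avoids hand-calculating them for every service. The helpers below are a sketch of that arithmetic, with hypothetical function names. currentHeapLimitMb uses Node's built-in v8.getHeapStatistics() to report the limit the running process actually received, which is typically a little above the --max-old-space-size value because it covers more than the old generation.

```javascript
const v8 = require('v8');

const MIB = 1024 * 1024;

// Value for --max-old-space-size (in MB) given the container memory limit in bytes.
function maxOldSpaceMb(containerLimitBytes, fraction = 0.75) {
  return Math.floor((containerLimitBytes / MIB) * fraction);
}

// Kubernetes memory settings from observed P90 usage: requests = P90, limits = 2x requests.
function memoryResources(p90Mi, limitMultiplier = 2) {
  return {
    requests: { memory: `${p90Mi}Mi` },
    limits: { memory: `${p90Mi * limitMultiplier}Mi` }
  };
}

// Heap limit of the current process, useful for confirming the flag took effect.
function currentHeapLimitMb() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / MIB);
}

module.exports = { maxOldSpaceMb, memoryResources, currentHeapLimitMb };
```

For the 2GB container in the example, maxOldSpaceMb(2 * 1024 ** 3) returns 1536, and memoryResources(512) yields a 512Mi request with a 1024Mi (1Gi) limit. Logging currentHeapLimitMb() at startup is a cheap way to catch a missing or mistyped flag.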