Containerization with Docker
Docker provides consistent, isolated environments for microservices. Learn to build optimized containers and orchestrate multiple services.
- Write production-ready Dockerfiles
- Implement multi-stage builds
- Optimize image size and build time
- Use docker-compose for local development
- Apply container security best practices
Docker Fundamentals for Microservices
Before we dive into Dockerfiles, it’s worth slowing down to ask: why did the industry settle on containers as the unit of deployment for microservices? The answer is that microservices fundamentally change the operational problem. With one monolith, you had one runtime to standardize. With 50 services written in 4 languages with different library versions, you have two choices. You can standardize the host, which freezes everyone to a single Node version. Or you can standardize the unit of deployment. Containers let you do the latter while keeping each team free to choose its own language, framework, and dependency versions.
The deeper tradeoff is isolation versus overhead. A virtual machine gives you much stronger isolation. But it takes seconds to minutes to boot and carries the memory cost of a full guest operating system, so running 50 VMs on a single host is impractical. Containers share the host kernel and provide namespace-level isolation, which is “good enough” for most workloads. They start in milliseconds with megabytes of overhead. The catch: containers are not a security boundary in the same way VMs are. A kernel exploit in one container can potentially escape to the host. For untrusted code, you still want VMs or micro-VMs like Firecracker underneath.
If you skip containerization in a microservices architecture, you’ll feel the pain quickly:
- environment drift between dev and prod
- dependency conflicts between services on the same host
- slow and unreliable deployments
- no clean way to scale individual services independently

Docker addresses all of these at once, which is why it became the de facto standard.
Production-Ready Dockerfile
Dockerfile Stages: Why the Order Matters
A Dockerfile is not a shell script. It’s a declarative build recipe that the Docker builder compiles into an immutable image. Each filesystem-changing instruction (RUN, COPY, ADD) creates a discrete, content-addressed layer, and those hashes are the basis for every caching decision Docker makes. This means the ordering of your instructions is not a stylistic choice. It’s the single most consequential factor in build performance and image correctness. The stages we’ll build through are deliberately sequenced from “rarely changes” to “changes on every commit”: base image, system packages, dependency manifest, dependency install, source code, and runtime configuration.
What goes wrong when you ignore this? In the best case, your CI pipeline takes 10x longer than it should because dependencies reinstall on every build. In the worst case, you ship stale dependencies because a cached layer higher in the chain masks an update you thought you made. A badly ordered Dockerfile can cost a team hundreds of engineer-hours per year in slow builds that a 30-second reorder would have prevented. The tradeoff to be aware of: aggressive caching can occasionally mask bugs where a build works in CI but fails with --no-cache. Rebuild every production image without cache at least weekly to catch latent issues.
Basic Node.js Dockerfile
A Dockerfile is a build recipe. Every RUN, COPY, and ADD instruction produces a new layer in the image, and those layers are cached independently. The ordering of instructions is not cosmetic: it controls caching behavior, which in turn controls your build time and your CI pipeline cost. The basic Dockerfile below demonstrates several non-obvious principles:
- Copy the dependency manifest before the source code, so dependency installation is cached.
- Set NODE_ENV=production so libraries skip development-only code paths.
- Create a dedicated non-root user, so a compromised process cannot modify root-owned files or use root’s privileges.
- Declare a HEALTHCHECK so the container runtime can detect when the app is running but not responsive.
If you skip the non-root user step, your container runs as UID 0. Unless user-namespace remapping is enabled, that is the same UID 0 as root on the host, so a container escape lands the attacker as root on the node. This is one of the most consequential security mistakes you can make: fixing it is trivially cheap, and not fixing it can be catastrophic. Similarly, if you omit the healthcheck, the runtime cannot distinguish a frozen process from a healthy one, so nothing replaces a dead instance or stops sending it traffic. Kubernetes ignores the Dockerfile HEALTHCHECK entirely and relies on the liveness and readiness probes in the pod spec, so define those too.
The Python version follows the same principles. requirements.txt is your package.json equivalent, and pip install -r requirements.txt is your npm ci. Set PYTHONUNBUFFERED=1 so stdout is flushed immediately, which is critical for containerized logging. Setting PYTHONDONTWRITEBYTECODE=1 prevents Python from writing .pyc files at runtime. Those files are of little use in an ephemeral container and just bloat the writable layer.
Multi-Stage Build (Optimized)
The single biggest lever for reducing image size and improving security is multi-stage builds. The idea is deceptively simple: you define several independent build stages in one Dockerfile and copy only the artifacts you need from earlier stages into the final image. The build stage can contain compilers, test tools, source code, and devDependencies, none of which ship to production. The final stage contains only the runtime, the compiled output, and production dependencies.
Why does this matter in a microservices context? Because you’re building dozens of images per day across many services. A 1 GB image versus a 150 MB image means roughly 6-7x more data to pull during autoscaling events, higher registry storage cost, and a larger attack surface. The tools you use at build time (TypeScript compiler, webpack, test frameworks) are often the source of CVEs, so shipping them to production is gratuitous risk. Multi-stage builds also enforce a clean separation between “what you need to build the app” and “what you need to run it.” That discipline pays dividends as services grow.
One gotcha worth flagging: dumb-init (or tini) is installed as PID 1 inside the container. Without it, Node.js becomes PID 1, and the kernel does not apply default signal actions to PID 1. Unless your code installs its own SIGTERM handler, the SIGTERM sent by docker stop is simply ignored. After 10 seconds (the default stop timeout), Docker escalates to SIGKILL, and your in-flight requests are cut mid-response. With dumb-init, Node is no longer PID 1. Signals are forwarded to it, and SIGTERM either terminates the process normally or triggers the shutdown handler you register.
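The same PID 1 problem applies to Python services. The sketch below assumes the process runs under dumb-init or tini, so SIGTERM actually reaches it. It registers a SIGTERM handler that switches the service into a draining state, rejects new work, and waits for in-flight work to finish before exiting. `ShutdownCoordinator` and its methods are hypothetical names, and production servers such as gunicorn and uvicorn ship their own shutdown logic.

```python
import signal
import threading


class ShutdownCoordinator:
    """Hypothetical helper: tracks in-flight work and drains on SIGTERM."""

    def __init__(self) -> None:
        self._draining = False
        self._in_flight = 0
        self._cond = threading.Condition()

    def install(self) -> None:
        # signal.signal() must be called from the main thread.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny and lock-free: just record the request.
        self._draining = True

    @property
    def draining(self) -> bool:
        return self._draining

    def begin_request(self) -> bool:
        """Return False (caller should answer 503) once shutdown has started."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def wait_for_drain(self, timeout: float) -> bool:
        """Block until in-flight work reaches zero; True if it did within timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout=timeout)
```

Call install() at startup. Once draining becomes true, stop accepting connections and call wait_for_drain() with a timeout shorter than the runtime’s stop grace period, then exit.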
TypeScript Specific Dockerfile
TypeScript adds an extra wrinkle: you need the compiler (tsc), type definitions, and your source files at build time, but none of them belong in production. The pattern below explicitly separates “build with types” from “run compiled JS.” Running npm run type-check && npm run build makes the image build fail fast on type errors, catching problems in CI before they reach production. Forgetting the type-check step is a subtle mistake. By default, tsc still emits JavaScript when there are type errors (unless noEmitOnError is enabled). Transpile-only tools such as esbuild, swc, or Babel strip types without checking them at all. Either way, broken code can ship.
Python FastAPI Multi-Stage with Wheel Trick
For production FastAPI workloads, the “wheel trick” is a step beyond a plain virtualenv copy. Instead of installing packages directly into the virtualenv during build, you first build wheels (pre-compiled binary distributions) in the builder stage, then install them in the runtime stage. This separates compilation (which needs gcc, make, and headers) from installation (which only needs pip). The benefit: the runtime stage never sees a C compiler, and the wheel cache can be reused across multiple services that share dependencies. For a service with packages like pydantic-core (Rust-compiled), uvloop (C extension), or asyncpg (C extension), this shaves significant time and image size.
Production Python services typically run under gunicorn with uvicorn workers rather than uvicorn directly. gunicorn gives you battle-tested process management, graceful restarts, and worker lifecycle hooks; uvicorn provides the ASGI event loop that FastAPI needs. The combination — gunicorn as the supervisor, uvicorn.workers.UvicornWorker as the worker class — is the standard production pattern.
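Because gunicorn reads its configuration from a Python file, the production setup can be expressed directly in Python. This is a minimal sketch, not a definitive configuration. It assumes the FastAPI app lives at app.main:app and listens on port 8000 (both hypothetical). `PORT` and `GUNICORN_WORKERS` are assumed environment variables of this sketch, not gunicorn built-ins.

```python
# gunicorn.conf.py  (start with: gunicorn -c gunicorn.conf.py app.main:app)
import multiprocessing
import os

# Listen on all interfaces inside the container; PORT is an assumed env var.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# gunicorn supervises the processes; each worker runs uvicorn's ASGI event loop.
worker_class = "uvicorn.workers.UvicornWorker"

# Each worker is a full process with its own memory baseline. cpu_count()
# reports the host's CPUs, not the container's CPU limit, so cap the default
# and let the deployment set GUNICORN_WORKERS (hypothetical) explicitly.
workers = int(os.environ.get("GUNICORN_WORKERS", min(4, multiprocessing.cpu_count())))

# After SIGTERM, give in-flight requests time to finish before workers are killed.
graceful_timeout = 25
timeout = 60

# "-" sends access logs to stdout and error logs to stderr for the container runtime.
accesslog = "-"
errorlog = "-"
```

The graceful_timeout of 25 seconds assumes a Kubernetes deployment with the default 30-second terminationGracePeriodSeconds. Under plain docker stop, whose default is 10 seconds, either lower it or raise the container’s stop timeout.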
Docker Optimization
Layer Caching: The Invisible Performance Lever
Layer caching is one of those topics that feels academic until the first time you watch a junior engineer wait 12 minutes for a CI build that should take 90 seconds. It matters in microservices specifically because you are not building one image. You are building dozens, across multiple pipelines, many times per day. Take a 10-minute cache-miss build, multiply by 30 services and then by 15 daily merges: that is 4,500 minutes, or 75 hours of CI wall-clock time every day, more than a full-time engineer’s entire week. That’s real money, and most of it is recovered by getting the layer order right.
What goes wrong without proper caching? CI pipelines become slow enough that developers batch changes instead of pushing small commits (“builds are slow, I’ll wait”). Feedback loops lengthen from minutes to hours. Developers lose the flow state of “commit, see test result, iterate.” The downstream cost is not just raw time; it’s degraded engineering culture. The tradeoff to know: aggressive caching can hide transitive dependency issues, such as a bumped lockfile that wasn’t actually installed because the layer was cached against stale content. Scheduled --no-cache rebuilds are a hygiene practice worth scripting.
Layer Caching Strategy
Each instruction in a Dockerfile gets a cache entry. Its key is derived from the instruction text, the cache key of the step before it, and, for COPY and ADD, the checksums of the files being copied. If nothing above a layer has changed, Docker reuses the cached layer instead of re-executing the instruction. The implication: put the instructions that change rarely (base image, system packages) at the top, and the instructions that change frequently (source code) at the bottom.
The classic mistake is copying the entire project with COPY . . before running npm ci. Every time any file in the build context changes (even a comment in a README), the cache invalidates for the npm ci layer and you reinstall all dependencies from scratch. On a large project with 500 dependencies, that’s 2-3 minutes wasted on every single build. By splitting COPY package*.json ./ and RUN npm ci into their own earlier layers, dependency installation stays cached until package.json or the lockfile changes. This single reordering can cut CI time by as much as 80%.
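The caching rule can be modeled in a few lines of Python. This is a deliberately simplified sketch, not BuildKit’s real algorithm. Each layer’s key hashes its parent’s key, the instruction text, and the contents of any files it copies, so the first changed key forces every later step to re-run. The example compares the two orderings from the text after a source-only edit. The file contents are made-up illustration values.

```python
import hashlib
from typing import Dict, List, Tuple

# A build step: (instruction text, build-context files the step copies).
Step = Tuple[str, List[str]]


def layer_keys(steps: List[Step], context: Dict[str, str]) -> List[str]:
    """Simplified cache keys: hash(parent key + instruction + copied file contents)."""
    keys: List[str] = []
    parent = ""
    for instruction, files in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        for path in sorted(files):
            h.update(path.encode())
            h.update(context[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def first_cache_miss(old: List[str], new: List[str]) -> int:
    """Index of the first step whose key changed; it and every later step re-run."""
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return i
    return len(new)


ctx = {"package.json": '{"name": "api"}', "package-lock.json": "lock-v1", "src/app.js": "v1"}
manifests = ["package.json", "package-lock.json"]
everything = manifests + ["src/app.js"]

# Manifests first: a source change only invalidates the final COPY.
good = [("FROM node:20-alpine", []), ("COPY package*.json ./", manifests),
        ("RUN npm ci", []), ("COPY . .", everything)]
# Everything first: a source change also invalidates RUN npm ci.
bad = [("FROM node:20-alpine", []), ("COPY . .", everything), ("RUN npm ci", [])]

ctx_after_edit = {**ctx, "src/app.js": "v2"}
```

With the manifests-first order, editing src/app.js re-runs only step 3. With COPY . . first, the miss happens at step 1, so npm ci re-runs as well.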
Reducing Image Size
Image size is not just an aesthetic concern. In Kubernetes, a larger image means slower pod startup (more time to pull from the registry), which means slower autoscaling response to traffic spikes. It also means more egress bandwidth cost from your registry and more storage cost. For organizations running hundreds of services, a 500MB reduction per image compounds into meaningful savings. The four levers are:
- minimal base images
- cleanup in the same layer
- .dockerignore
- multi-stage builds
The “same layer” rule is subtle: each RUN command creates a layer, and deleting files in a later layer doesn’t remove them from the earlier layer. If you download a 200MB tarball in one RUN and delete it in the next RUN, the tarball still lives in your image, just hidden. That’s why you see those long && chains in production Dockerfiles — everything that creates and deletes transient files must happen within a single layer.
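A toy model makes the same-layer rule concrete. It assumes each layer records the files it adds and the lower-layer paths it deletes, as whiteouts do in overlay filesystems. The running container sees the merged view, but the image you push contains every layer’s bytes. The sizes are made-up illustration values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

MB = 1024 * 1024


@dataclass
class Layer:
    added: Dict[str, int] = field(default_factory=dict)  # path -> bytes stored in this layer
    deleted: Set[str] = field(default_factory=set)       # whiteouts hiding lower-layer paths


def visible_files(layers: List[Layer]) -> Dict[str, int]:
    """The merged view a running container sees."""
    view: Dict[str, int] = {}
    for layer in layers:
        for path in layer.deleted:
            view.pop(path, None)
        view.update(layer.added)
    return view


def image_size(layers: List[Layer]) -> int:
    """What gets pushed and pulled: every layer's own bytes, hidden or not."""
    return sum(sum(layer.added.values()) for layer in layers)


# RUN download ... then a separate RUN rm ...: the tarball survives in layer 1.
split = [Layer(added={"/tmp/sdk.tar.gz": 200 * MB, "/opt/sdk": 50 * MB}),
         Layer(deleted={"/tmp/sdk.tar.gz"})]
# RUN download ... && extract ... && rm ...: the tarball never lands in a layer.
chained = [Layer(added={"/opt/sdk": 50 * MB})]
```

Both variants expose the same files at runtime, but the split version ships 250 MB of layers against 50 MB for the chained one.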
.dockerignore
The .dockerignore file is the most underappreciated tool in the Docker ecosystem. It works like .gitignore, but for the Docker build context: the set of files Docker sends to the builder when you run docker build. Without a .dockerignore, every build ships the following to the daemon:
- your entire .git directory
- local node_modules
- .env files, which may contain secrets
- test fixtures and documentation

This bloats the build context, slows down builds, and risks baking secrets into images. The fix takes five minutes; the cost of skipping it can be much higher.
A particularly nasty scenario: a developer keeps a .env file in their local directory with production database credentials for debugging. They run docker build without a .dockerignore. The .env file is included in the context, and a later COPY . . bakes it into the image. The image is pushed to a public registry, and the credentials leak. This has happened to real companies.
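Here is a rough Python approximation for seeing which files a pattern list keeps out of the build context. It is not Docker’s matcher. It uses fnmatch, whose * also matches /, whereas in Docker * stops at path separators and **/ is needed for nested matches. It does follow the documented rules that the last matching line wins, that ! re-includes a path, and that excluding a directory excludes its contents. `is_ignored` and `build_context` are hypothetical names.

```python
import fnmatch
from typing import Iterable, List


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Approximate .dockerignore: last matching pattern wins, '!' re-includes,
    and a match on any parent directory excludes everything beneath it."""
    parts = path.split("/")
    candidates = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    ignored = False
    for raw in patterns:
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # blank lines and comments
        negate = raw.startswith("!")
        pattern = (raw[1:] if negate else raw).strip("/")
        if any(fnmatch.fnmatchcase(c, pattern) for c in candidates):
            ignored = not negate
    return ignored


def build_context(paths: Iterable[str], patterns: List[str]) -> List[str]:
    """Paths that would be sent to the builder."""
    return [p for p in paths if not is_ignored(p, patterns)]


DOCKERIGNORE = """
.git
node_modules
.env
*.md
!README.md
""".splitlines()

files = [".git/config", "node_modules/express/index.js", ".env",
         "src/app.js", "docs.md", "README.md", "package.json"]
```

With these patterns, only src/app.js, README.md, and package.json reach the builder, and the .env file never enters the context.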
Docker Compose for Microservices
Why Compose Exists and What It Saves You
Compose is the answer to a very specific question: how do you bring up an entire microservices topology on a developer laptop without losing your mind? In production you have Kubernetes doing this job, but Kubernetes on a laptop (minikube, kind, k3d) is heavy. It adds etcd, a scheduler, and a control plane, and it requires you to understand Pods, Deployments, Services, and Ingresses just to see logs. Compose strips all of that away and gives you “a YAML file that starts all the containers.” For the vast majority of local development, that’s exactly right.
What goes wrong without Compose? Teams invent fragile shell scripts that run docker run for each service, with hardcoded IP addresses, race conditions on startup order, and broken cleanup when one service crashes. New developers spend their first week fighting “the setup” instead of writing code. With Compose, docker-compose up brings everything online and docker-compose down tears it all down cleanly. The YAML file lives in version control as executable documentation of how services fit together. The tradeoff: Compose is not production-grade. Don’t try to run it on a server “just for staging.” You lose scheduling, self-healing, rolling updates, and every other feature that makes orchestration valuable. Use Compose for dev; use Kubernetes for anything shared.
Docker Compose is the pragmatic choice for local development in a microservices architecture. The problem it solves: you have 8 services, 3 databases, a message broker, and a cache — starting them individually with correct network configuration, environment variables, and startup order is tedious and error-prone. Compose lets you declare the entire topology in one YAML file and spin it up with docker-compose up. It’s not production-grade (that’s Kubernetes’s job), but for local development and simple CI environments, it’s dramatically simpler than Kubernetes.
The key capabilities Compose provides: a shared network where services can reach each other by service name (no IP addresses to hardcode), volume management for persistent data, dependency ordering via depends_on, and healthchecks to ensure dependencies are actually ready (not just started). The depends_on with condition: service_healthy pattern is particularly important — without it, services start before their databases are ready and fail on the first query. With it, Compose waits for the database’s healthcheck to pass before starting the dependent service.
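The gating logic behind condition: service_healthy amounts to a bounded polling loop. The sketch below shows that loop in plain Python with a hypothetical wait_until_healthy helper. It is also worth keeping a loop like this in the application itself. depends_on only orders startup, so if the database restarts later, Compose will not hold your service back and the app still needs to retry.

```python
import time
from typing import Callable


def wait_until_healthy(check: Callable[[], bool], timeout: float = 30.0,
                       interval: float = 1.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    A dependent service should start only once the dependency's check passes,
    not merely once its container exists. Exceptions raised by `check`
    (for example, connection refused while a database boots) count as
    "not healthy yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, the check could open a TCP connection to the database’s Compose service name and port. The service keeps polling until the connection succeeds or the timeout expires.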
If you try to replicate this setup without Compose (running each service manually with docker run), you’ll spend more time fighting the tooling than writing code. If you try to use Kubernetes locally (via minikube or kind), you get production-like behavior but with much higher mental overhead. Compose hits the sweet spot for local development.
Complete Microservices Stack
Development Dockerfile with Hot Reload
Developer ergonomics matter. If every code change requires rebuilding the container (30-60 seconds), developers will disengage from containerized workflows and go back to running services directly on their laptops — which reintroduces all the “works on my machine” problems. The solution is a development target in your Dockerfile that mounts source code as a volume and runs with a file watcher (nodemon for Node, uvicorn --reload for Python). Changes to source files trigger automatic reloads without rebuilding the container.
The key insight is the target option under build in Compose. One Dockerfile, two targets: development (with file watching) and production (compiled, locked down). In CI/CD, you build the production target. In local development, Compose builds the development target. Same Dockerfile, same base image, same dependencies; only the final stage differs. This is far better than maintaining two separate Dockerfiles.
For the Python service, the same pattern uses uvicorn --reload as the file watcher during development and gunicorn with uvicorn workers in production. Two important mechanics differ between the modes:
- In development, we mount the source code as a volume and install all requirements, including dev tooling.
- In production, we bake the code into the image and install only runtime dependencies.

The --reload flag detects file changes with watchfiles when it is installed (uvicorn[standard] includes it). Otherwise it falls back to polling file modification times. Either way, it restarts the ASGI server on each change.
Docker Compose Override for Development
docker-compose.override.yml is automatically merged with docker-compose.yml when you run docker-compose up. This lets you keep the base file production-like and layer in development-only settings (volume mounts, debug ports, hot reload commands) without duplicating the entire configuration. The same pattern works for environment-specific overrides such as docker-compose.staging.yml and docker-compose.production.yml. Unlike docker-compose.override.yml, these are not picked up automatically. Pass them explicitly, for example docker-compose -f docker-compose.yml -f docker-compose.staging.yml up. Keep the base file as the source of truth and use overrides for the deltas.
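The merge can be approximated in a few lines. This sketch is simplified and assumes map-style environment blocks:
- Mappings merge key by key.
- Scalars and most lists are replaced by the override.
- A handful of multi-value options (ports, expose, dns, and similar) are concatenated.

Real Compose also merges volumes and devices keyed by their container path, which this sketch reduces to concatenation. `merge` is a hypothetical helper, and running docker-compose config shows the exact merged result Compose will use.

```python
from typing import Any

# Options whose list values are concatenated rather than replaced (simplified;
# real Compose merges volumes/devices keyed by container path).
CONCAT_KEYS = {"ports", "expose", "dns", "dns_search", "tmpfs", "external_links", "volumes"}


def merge(base: Any, override: Any, key: str = "") -> Any:
    """Return a new structure with `override` layered on top of `base`."""
    if isinstance(base, dict) and isinstance(override, dict):
        out = dict(base)
        for k, v in override.items():
            out[k] = merge(base[k], v, k) if k in base else v
        return out
    if isinstance(base, list) and isinstance(override, list) and key in CONCAT_KEYS:
        return base + [item for item in override if item not in base]
    return override  # scalars and other lists: the override wins


base = {"services": {"api": {
    "image": "api:latest",
    "ports": ["8080:8080"],
    "environment": {"LOG_LEVEL": "info", "DB_HOST": "postgres"},
}}}
override = {"services": {"api": {
    "command": ["npm", "run", "dev"],
    "ports": ["9229:9229"],              # Node inspector for debugging
    "volumes": ["./src:/app/src"],       # hot reload source mount
    "environment": {"LOG_LEVEL": "debug"},
}}}
merged = merge(base, override)
```

The merged api service keeps the base image, gains the debug port and source mount, and takes LOG_LEVEL from the override while keeping DB_HOST from the base.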
Container Security
Security Hardening: Why Defaults Are Dangerous
Docker’s defaults were designed for ease of onboarding, not production security. Out of the box, your container:
- runs as UID 0
- has a writable root filesystem
- keeps Docker’s default set of Linux capabilities (including CHOWN, SETUID, and NET_RAW, which most services never need)
- has unrestricted outbound network access

Every one of those defaults is a liability in a microservice that may be exposed to untrusted input. The threat model is not theoretical. Real production breaches have started with a code-execution vulnerability in a service (deserialization bug, SSRF, prototype pollution) and escalated because the container defaults gave the attacker far more than they needed.
What goes wrong if you skip hardening? Best case, you pass a compliance audit with a stern finding and spend the next quarter doing it properly under pressure. Worst case, a single RCE in a dependency becomes a lateral movement vector into your node, your cloud metadata service, or your Kubernetes API. The fixes (non-root user, read-only filesystem, capability drops, resource limits) are cheap to add at build time and dramatically expensive to add after a breach. The tradeoff: hardened containers are slightly harder to debug live, because you can’t write files or install diagnostic tools. That is why we pair them with observability, remote debuggers, and ephemeral debug pods rather than shell access.
Put simply, container security is an area where defaults betray you. Suppose an attacker achieves remote code execution in your container via SQL injection, a deserialization bug, or a dependency vulnerability. You want the blast radius to be as small as possible, and security hardening is about shrinking it.
The hardening below combines several techniques:
- non-root user, so the process cannot modify OS files
- read-only filesystem, so the attacker cannot drop a malicious binary
- dropped capabilities, so the process cannot perform privileged operations like raw socket access
- careful file permissions, so even if the attacker runs as appuser, they cannot modify application code

Each technique individually is cheap; combined, they dramatically raise the cost of exploitation.
The tradeoff is that hardened containers are slightly harder to debug. You cannot exec in and write diagnostic files if the filesystem is read-only. The fix is to mount a writable tmpfs at /tmp for transient files, and use remote debugging tools (kubectl port-forward to a debug endpoint) rather than shell access for production troubleshooting.
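A cheap complement is a startup probe that fails fast when an expected writable mount is missing, rather than crashing later on the first temp-file write. In this sketch, probe_writable is a hypothetical helper, and the paths you pass (such as /tmp) depend on your service. It performs a real write instead of calling os.access(), because os.access() checks permissions against the real rather than the effective user ID and can give misleading answers.

```python
import os
import tempfile
from typing import Dict, Iterable


def probe_writable(paths: Iterable[str]) -> Dict[str, bool]:
    """Create and delete a scratch file in each directory.

    The write fails with EROFS on a read-only mount, EACCES when the
    non-root user lacks permission, or ENOENT if the directory is missing.
    """
    results: Dict[str, bool] = {}
    for path in paths:
        try:
            with tempfile.NamedTemporaryFile(dir=path):
                pass  # the file is removed automatically on close
            results[path] = True
        except OSError:
            results[path] = False
    return results
```

At startup, a service could call probe_writable(["/tmp"]) and exit with a clear error message if any required path reports False.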
Scenario: Your container keeps getting OOMKilled in production with 2 GB memory limit, but runs fine locally at 512 MB. How do you diagnose and fix?
- Confirm it is actually an OOMKill, not just a crash. kubectl describe pod shows Last State: Terminated, Reason: OOMKilled, Exit Code: 137, and dmesg on the node shows the kernel OOM-killer event. If the exit code is 137 with the OOMKilled reason, you have confirmed the memory-limit hit.
- Understand why local and staging runs differ from production. Staging typically sees low concurrent load. Production has 100-1000x more concurrent requests, each allocating request/response objects, DB result sets, and connection buffers. Memory grows roughly linearly with concurrency: 512 MB that is fine at 10 RPS can become 2+ GB at 1000 RPS.
- Runtime-specific root causes:
  - Node.js: depending on the Node.js version, V8’s default heap limit may not track the container’s memory limit (historically around 1.5 GB on 64-bit). On top of the heap come off-heap allocations: Buffers, native bindings, worker threads. Fix: NODE_OPTIONS=--max-old-space-size=1536 for a 2 GB container, so V8 leaves room for Buffers and OS overhead.
  - Python: each uvicorn worker is a full process (~100-300 MB baseline). --workers 8 is 800 MB to 2.4 GB before your app allocates anything. Fix: reduce workers, or switch to threads/async if appropriate.
  - JVM: before JDK 10 (and 8u191 on JDK 8), the JVM didn’t respect cgroup memory limits. On modern JDKs, verify the effective heap ceiling inside the container with java -XX:+PrintFlagsFinal -version | grep MaxHeapSize.
- Check for memory leaks. Take two heap snapshots 10 minutes apart at steady load. Node: v8.writeHeapSnapshot() triggered by an admin endpoint, analyzed in Chrome DevTools. Python: tracemalloc or memray. Growth of retained objects between snapshots is the leak.
- Common leak culprits. Event listeners never removed, global caches with no eviction, unreleased DB connections on error paths, closures capturing request objects. Fix each systematically.
- Set the correct limits. Set requests.memory to observed p90 usage (e.g., 512Mi) and limits.memory to ~2x that (1Gi). Alert on container_memory_working_set_bytes / limit > 0.8 so you know before an OOMKill happens.
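The sizing rules above reduce to simple arithmetic. This sketch reads the container’s memory limit from the cgroup filesystem, trying v2 first and then v1. It derives a V8 heap ceiling at 75% of the limit and a Python worker count. The 256 MiB reserve for the rest of the process and the per-worker baseline you pass in are assumptions to tune from your own measurements, and all three function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

MiB = 1024 * 1024


def read_cgroup_memory_limit() -> Optional[int]:
    """Container memory limit in bytes, or None if unlimited/unknown.

    cgroup v2: /sys/fs/cgroup/memory.max ("max" means no limit).
    cgroup v1: /sys/fs/cgroup/memory/memory.limit_in_bytes (a huge value means no limit).
    """
    v2 = Path("/sys/fs/cgroup/memory.max")
    v1 = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
    try:
        if v2.exists():
            raw = v2.read_text().strip()
            return None if raw == "max" else int(raw)
        if v1.exists():
            value = int(v1.read_text().strip())
            return None if value >= 2**60 else value
    except (OSError, ValueError):
        pass
    return None


def node_max_old_space_mb(limit_bytes: int, fraction: float = 0.75) -> int:
    """Value for --max-old-space-size, leaving headroom for Buffers and native memory."""
    return int(limit_bytes * fraction // MiB)


def max_python_workers(limit_bytes: int, per_worker_bytes: int,
                       reserve_bytes: int = 256 * MiB) -> int:
    """How many full worker processes fit under the limit at their baseline size."""
    return max(1, (limit_bytes - reserve_bytes) // per_worker_bytes)
```

For a 2 GiB limit, node_max_old_space_mb returns 1536, matching the Node.js fix above. At a 300 MiB per-worker baseline, only 5 Python workers fit, not 8.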
Set --max-old-space-size to 75% of the container limit. Every Node container in production should have this set explicitly.
Senior Follow-up Questions:
- “How do you distinguish a memory leak from normal load-based growth?” A leak keeps growing even at constant load; load-based growth plateaus at steady state. Plot container_memory_working_set_bytes over 24 hours with constant traffic. If it grows linearly, it is a leak. If it plateaus after warm-up, it is working-set size, which you fix with limits, not leak hunting.
- “What is the difference between RSS, working set, and heap?”
  - RSS is everything the process has resident in physical RAM: heap, stack, shared libraries, and mapped files.
  - Working set is the cgroup’s memory usage minus inactive, easily reclaimable file cache. It is the metric Kubernetes watches and the best proxy for how close you are to an OOMKill.
  - Heap is just the V8/Python/JVM managed allocator, a subset of RSS.
  - Because the OOM decision is made on the whole cgroup, heap-only profiling misses Buffers, native memory, and mmap’d files.
- “How would you design a canary deployment to catch memory issues before full rollout?” Deploy to 5% of traffic and watch the container_memory_working_set_bytes slope over 30 minutes. Roll back automatically if the slope exceeds a threshold (e.g., 10 MB/min growth at steady load). Argo Rollouts or Flagger can do this declaratively with Prometheus queries as success criteria.
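The leak-versus-plateau test and the canary threshold can both be expressed as a slope over working-set samples. This minimal sketch assumes you have already fetched (timestamp in seconds, container_memory_working_set_bytes) pairs from your metrics system. It assumes the first half of the window is warm-up. The 10 MiB/min default mirrors the roughly 10 MB/min canary figure in the text. The function names are hypothetical.

```python
from typing import List, Tuple

MiB = 1024 * 1024
Sample = Tuple[float, float]  # (unix seconds, working-set bytes)


def growth_mib_per_min(samples: List[Sample]) -> float:
    """Least-squares slope of memory over time, converted to MiB per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    num = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return (num / den) * 60 / MiB


def looks_like_leak(samples: List[Sample], threshold_mib_per_min: float = 10.0,
                    warmup_fraction: float = 0.5) -> bool:
    """Skip the warm-up window, then flag sustained growth above the threshold."""
    steady = samples[int(len(samples) * warmup_fraction):]
    return growth_mib_per_min(steady) > threshold_mib_per_min
```

A short canary window suits a fairly high threshold like this one. For a 24-hour leak hunt at constant traffic, any sustained positive slope in the steady-state window deserves a look.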
Common wrong answers:
- “Just increase the memory limit to 4 GB.” Works until next quarter, when it OOMs again at 4 GB. You have deferred the problem and doubled the infrastructure cost.
- “Restart the pod every hour via a liveness probe hack.” Masks the leak, causes request drops during restarts, and leaves the actual bug in place.
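Rather than masking a leak, find it. For Python services, the standard-library tracemalloc module supports the two-snapshot technique from the leak-checking step. Take a snapshot, let the service run under load, take another, and rank allocation sites by growth. The unbounded `_cache` list below is a deliberately leaky, hypothetical stand-in for a global cache with no eviction.

```python
import tracemalloc
from typing import List

_cache: List[bytearray] = []  # hypothetical global cache with no eviction (the leak)


def handle_request() -> None:
    _cache.append(bytearray(10_000))  # retained forever


def top_growth(before: tracemalloc.Snapshot, after: tracemalloc.Snapshot,
               limit: int = 3) -> List[tracemalloc.StatisticDiff]:
    """Allocation sites (file:line) whose retained size grew the most."""
    diffs = after.compare_to(before, "lineno")  # sorted by largest absolute change
    return [d for d in diffs if d.size_diff > 0][:limit]


tracemalloc.start()
for _ in range(10):       # warm-up traffic
    handle_request()
snapshot_1 = tracemalloc.take_snapshot()
for _ in range(200):      # stands in for "ten minutes later" at steady load
    handle_request()
snapshot_2 = tracemalloc.take_snapshot()
tracemalloc.stop()        # snapshots remain usable after tracing stops

for stat in top_growth(snapshot_1, snapshot_2):
    print(stat)  # the top entry should be the _cache.append line
```

In a real service, you would trigger the snapshots from an admin endpoint, like the Node heap-snapshot approach, rather than from module-level code.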
Scenario: A security scan flags 40 HIGH/CRITICAL CVEs in your production Node.js base image, but rebuilding hasn't happened in 6 months. How do you handle this without breaking production?
- Triage which CVEs are actually exploitable in your context. Not every CVE in a base image is a real risk: a libxml2 CVE in an image that never parses XML is a paper tiger. Use Grype or Trivy with --ignore-unfixed and review the remaining list, cross-referencing each HIGH/CRITICAL finding against actual usage.
- Rebuild first, upgrade second. Many CVEs disappear just by rebuilding against the current base image tag, since the upstream image has been patched in the meantime. Run docker pull node:20-alpine (or build with --pull) and rebuild; many findings evaporate without any deliberate upgrade.
- Pin by digest after rebuild. Now that you have a fresh, patched image, pin the digest so the build is reproducible: FROM node:20-alpine@sha256:.... Future builds will not silently pick up new changes.
- Canary the new image. Deploy to 5-10% of traffic for 30-60 minutes, watch RED metrics (Rate, Errors, Duration) plus CPU/memory, then roll forward. Unexpected base-image regressions have happened (e.g., a C library change breaking dns.lookup()), so always canary.
- Automate from here on. Run a weekly scheduled rebuild, scan, and canary deploy. Dependabot or Renovate watches the base image and opens PRs when upstream updates. The goal: no image in production is more than 14 days behind its base.
- Policy at admission. Add an OPA Gatekeeper or Kyverno rule so that no image older than 30 days can be admitted to the cluster. Combined with signed images, this enforces hygiene structurally.
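Digest pinning is easy to check in CI. This rough sketch assumes one FROM instruction per line. It flags base images that are not pinned with @sha256:, skipping scratch and references to earlier build stages. It does not resolve ARG substitutions such as FROM ${BASE_IMAGE}, so treat it as a lint, not a guarantee. `unpinned_base_images` is a hypothetical name.

```python
import re
from typing import List, Set

# FROM [--platform=<p>] <image>[:tag][@sha256:<digest>] [AS <stage>]
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
AS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return FROM images that are not pinned by digest."""
    stages: Set[str] = set()
    unpinned: List[str] = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        is_stage = image.lower() in stages  # e.g. FROM builder
        if image.lower() != "scratch" and not is_stage and "@sha256:" not in image:
            unpinned.append(image)
        alias = AS_RE.search(line)
        if alias:
            stages.add(alias.group(1).lower())
    return unpinned
```

A CI step can fail the build whenever the returned list is non-empty. Renovate or Dependabot then keep the pinned digests current.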
The log4j (CVE-2021-44228) incident was a textbook case. Images that had not been rebuilt in months stayed exposed for weeks longer than necessary because teams did not have an automated rebuild pipeline. Teams with weekly rebuilds and automated dependency updates (Dependabot or similar) typically patched within 24 hours; teams without them took 2-4 weeks.
Senior Follow-up Questions:
- “What do you do when a CVE has no fix yet (0-day)?” Three layers: (1) WAF / rate-limit rules that block the exploit pattern, (2) network-level segmentation so the vulnerable service has minimal lateral reach, (3) a temporary feature flag to disable the vulnerable code path if it is isolatable. Accept residual risk and monitor for indicators of compromise until a patch ships.
- “How do you prevent alert fatigue from CVE scans?” Scan policy: fail the build only on CRITICAL and HIGH findings that have a fix available. Warn (do not fail) on MEDIUM. Ignore LOW. Pair this with a monthly triage where an engineer reviews the warning backlog so things do not silently rot.
- “What about CVEs in the Node.js or Python packages themselves, not just OS packages?” Use Dependabot, Renovate, or Snyk for package.json and requirements.txt / pyproject.toml. Run npm audit --omit=dev and pip-audit in CI. Commit the lockfile and use npm ci or pip install -r requirements.txt --require-hashes to make the build reproducible and tamper-evident.
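The scan policy described above (fail on fixable CRITICAL/HIGH, warn on MEDIUM, ignore LOW) can be enforced with a small gate script. This sketch assumes Trivy’s JSON report layout, for example as produced by trivy image --format json --output report.json. In that layout, a Results list holds entries whose Vulnerabilities carry VulnerabilityID, PkgName, Severity, and FixedVersion fields; verify these names against your scanner version. Warning on unfixed HIGH/CRITICAL findings, rather than ignoring them, is a design choice of this sketch.

```python
import json
from typing import Dict, List, Tuple


def gate(report: Dict) -> Tuple[List[str], List[str]]:
    """Return (failures, warnings) for a Trivy-style JSON report."""
    failures: List[str] = []
    warnings: List[str] = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            severity = str(vuln.get("Severity", "UNKNOWN")).upper()
            fixable = bool(vuln.get("FixedVersion"))
            label = f"{vuln.get('VulnerabilityID')} in {vuln.get('PkgName')} ({severity})"
            if severity in ("CRITICAL", "HIGH") and fixable:
                failures.append(label)   # blocks the build
            elif severity in ("CRITICAL", "HIGH", "MEDIUM"):
                warnings.append(label)   # goes to the monthly triage backlog
            # LOW and UNKNOWN are ignored
    return failures, warnings


def main(report_path: str) -> int:
    """CI exit code: 1 if any finding must block the build, else 0."""
    with open(report_path) as fh:
        failures, warnings = gate(json.load(fh))
    for label in warnings:
        print("WARN", label)
    for label in failures:
        print("FAIL", label)
    return 1 if failures else 0
```

In CI, call main() on the report file and use its return value as the step’s exit code.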
Common wrong answers:
- “We’ll wait until the next release cycle to patch.” Not acceptable for HIGH/CRITICAL: you are leaving a known exploitable window open. Emergency patches get fast-tracked outside the regular cycle.
- “Ignore CVEs in the base image and patch only app dependencies.” Base image CVEs are real: glibc, openssl, and zlib bugs affect everyone regardless of app code.
Scenario: You inherited a fleet of 30 services where every container runs as root and no pod has resource limits. How do you migrate to hardened containers without causing incidents?
- Measure before changing. Collect 2 weeks of container_memory_working_set_bytes and container_cpu_usage_seconds_total per service. This gives you p90/p99 usage, so you can set initial requests and limits from data, not guesses.
- Attack in waves by risk, not alphabetically. Start with one low-traffic, non-critical internal service as the pilot. Get the hardening pattern right there, document it, then roll out in waves of 3-5 services. Never do 30 services at once. One subtle issue (e.g., readOnlyRootFilesystem breaking a library that writes to /etc/hosts) will take down everything.
- Non-root first, by itself. Open one PR per service with only these changes: add USER 1001 to the Dockerfile, and add runAsNonRoot: true and runAsUser: 1001 to the pod spec. Deploy, then watch for 48 hours.
  - Some services will break because they bind to port 80 (ports below 1024 require root or the NET_BIND_SERVICE capability). Switch them to port 8080 with a Kubernetes Service mapping 80→8080.
  - Some will break because they write to /var/log. Switch them to stdout.
- Resource limits next. Set requests to p90 observed + 20% headroom, and limits to 1.5x requests. Deploy a canary and watch for OOMKills and CPU throttling. If container_cpu_cfs_throttled_periods_total is a large fraction of container_cpu_cfs_periods_total, your CPU limit is too tight. Tune based on data.
- Read-only filesystem third. This is the most likely to break something. Identify writable paths first: run strace -f -e trace=openat in a test pod to see which files the app opens for writing. Mount emptyDir volumes at those paths (/tmp, /var/cache, language-specific dirs), then enable readOnlyRootFilesystem: true.
- Capability drop last. Add capabilities.drop: ["ALL"], then add back only what is needed. That is typically nothing; a load balancer may need NET_BIND_SERVICE for sub-1024 ports, but you already moved off those.
- Enforce structurally. After all services are migrated, enable Kubernetes Pod Security Admission with the restricted profile at the namespace level. It requires every new workload to run as non-root, drop capabilities, disallow privilege escalation, and use a seccomp profile. Read-only root filesystems and resource limits are not part of that profile, so enforce them with a policy engine such as Kyverno or OPA Gatekeeper.
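The sizing rule in the resource-limits step is mechanical enough to script. This sketch assumes you have exported per-service samples of working-set bytes and CPU cores from your metrics system, and it uses a nearest-rank percentile. throttle_ratio expects the increases of the two cAdvisor counters, container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total, over the same window. The function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend(memory_bytes: List[float], cpu_cores: List[float]) -> Dict[str, Dict[str, float]]:
    """requests = p90 observed + 20% headroom; limits = 1.5x requests."""
    mem_req = percentile(memory_bytes, 90) * 1.2
    cpu_req = percentile(cpu_cores, 90) * 1.2
    return {
        "requests": {"memory": mem_req, "cpu": cpu_req},
        "limits": {"memory": mem_req * 1.5, "cpu": cpu_req * 1.5},
    }


def throttle_ratio(throttled_periods: float, total_periods: float) -> float:
    """Share of CFS periods in which the container hit its CPU limit."""
    return throttled_periods / total_periods if total_periods else 0.0
```

Round the recommended values to Kubernetes quantities (Mi, millicores) before writing them into manifests. Re-run the calculation after each canary instead of treating the first result as final.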
One large organization migrated to restricted pod security across thousands of services. Its key insight: per-service migration with metrics-based canaries took quarters, not weeks, but the gradual approach had zero incidents. The “big bang” approach at another organization (referenced anonymously) caused multi-hour outages.
Senior Follow-up Questions:
- “How do you handle services that genuinely need privileged operations, like a logging sidecar?” Use narrow-scoped capabilities, not privileged: true. For example, grant CAP_DAC_READ_SEARCH to a log tailer that reads files owned by other users. Document the capability and why, so future reviewers understand. Truly privileged workloads (node-problem-detector, storage drivers) get their own namespace with a relaxed policy, not the application namespace.
- “What about legacy services that cannot be changed (vendor container, no source)?” Run them in a dedicated namespace with a relaxed Pod Security level. Isolate them via NetworkPolicy and put a service mesh gateway in front. If the vendor’s container is a persistent security debt, that becomes a procurement signal to replace the vendor.
- “How do you validate the migration actually improved security, not just moved things around?” Run a before/after pen-test or automated tools like kube-bench and kube-hunter. The number of findings should drop dramatically. Also track:
  - % of pods running as non-root (target: 100% in app namespaces)
  - % with resource limits (target: 100%)
  - % with read-only root FS (target: >90%)
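The tracking percentages in the last answer can be computed straight from the cluster’s pod manifests. This sketch assumes pod objects as returned in the items list of kubectl get pods -A -o json. It checks declared settings only: a pod without runAsNonRoot may still run as a non-root UID set in the image. As in Kubernetes, a container-level runAsNonRoot overrides the pod-level one. `container_checks` and `fleet_report` are hypothetical names.

```python
from typing import Dict, List


def container_checks(pod: Dict) -> List[Dict[str, bool]]:
    """Evaluate each container of one Pod manifest against the hardening checklist."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext", {})
    rows: List[Dict[str, bool]] = []
    for c in spec.get("containers", []):
        ctx = c.get("securityContext", {})
        # Container-level runAsNonRoot takes precedence over the pod-level value.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        rows.append({
            "non_root": bool(non_root),
            "read_only_root_fs": bool(ctx.get("readOnlyRootFilesystem", False)),
            "drops_all_caps": "ALL" in ctx.get("capabilities", {}).get("drop", []),
            "has_memory_limit": "memory" in c.get("resources", {}).get("limits", {}),
        })
    return rows


def fleet_report(pods: List[Dict]) -> Dict[str, float]:
    """Percentage of containers passing each check across the fleet."""
    rows = [row for pod in pods for row in container_checks(pod)]
    if not rows:
        return {}
    return {key: 100.0 * sum(r[key] for r in rows) / len(rows) for key in rows[0]}
```

To use it, load the kubectl JSON output and pass its items list to fleet_report. Run it before and after each migration wave to track progress against the targets.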
Common wrong answers:
- “Roll out hardening via a single admission policy change.” Breaks everything that was not migrated; incidents everywhere.
- “Only harden new services going forward.” The 30 existing services remain vulnerable indefinitely. Security debt is real debt with interest.
- Kubernetes: Pod Security Standards and Pod Security Admission
- CIS Kubernetes Benchmark — authoritative hardening checklist
- Shopify Engineering: Building resilient GraphQL APIs — various posts on their reliability/security journey
Security Best Practices
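__pycache__ directories don’t cause issues with a read-only filesystem by setting PYTHONDONTWRITEBYTECODE=1. Without it, Python attempts to write .pyc files at import time; on a read-only mount those writes fail (CPython skips the failed cache write rather than aborting), so every start pays for the failed attempts and the cache is never persisted.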
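One quick way to confirm the variable has the intended effect is to start an interpreter with and without it and read sys.dont_write_bytecode, which Python sets from PYTHONDONTWRITEBYTECODE (or the -B flag). A minimal sketch; the helper name writes_bytecode is hypothetical:

```python
"""Confirm that PYTHONDONTWRITEBYTECODE disables .pyc writes in a fresh interpreter."""
import os
import subprocess
import sys
from typing import Mapping


def writes_bytecode(env: Mapping[str, str]) -> bool:
    """Start a child interpreter with `env` and report whether it would write .pyc files."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=dict(env),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "False"


if __name__ == "__main__":
    base = {k: v for k, v in os.environ.items() if k != "PYTHONDONTWRITEBYTECODE"}
    print("default writes bytecode:", writes_bytecode(base))
    print("with variable set:", writes_bytecode({**base, "PYTHONDONTWRITEBYTECODE": "1"}))
```

Against a built image, the equivalent check is docker run --rm IMAGE python -c "import sys; print(sys.dont_write_bytecode)", which should print True when the Dockerfile sets the variable (IMAGE is a placeholder).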
Healthcheck Script: What It Actually Needs to Do
A healthcheck is a contract between your process and the container runtime (Docker, Kubernetes, ECS). It answers exactly one question: “Is this process able to respond right now?” It should not answer “Is the entire dependency graph healthy?”; that conflation is the single most common mistake in production healthchecks. If your liveness check pings the database, the cache, and three downstream APIs, then a 30-second blip in any one of them causes cascading container restarts across your fleet, which typically makes the outage worse, not better. The right healthcheck is small, fast, and local. It calls the process’s own /health endpoint, which returns 200 if the process’s event loop is responsive. Deep dependency checks belong in a separate /ready endpoint that Kubernetes uses to decide whether to route traffic, a distinction we’ll explore in the Kubernetes chapter. The healthcheck script itself should use the language’s stdlib HTTP client (no curl, no extra binaries) so the image stays small and the check has no extra dependencies to break.
Healthcheck Script
The healthcheck script is what the container runtime calls periodically to ask “are you alive?” A well-written healthcheck is minimal, fast, and does not require additional binaries (which would bloat the image). Using the language’s built-in HTTP client is the standard pattern: no curl, no wget. The healthcheck should call the app’s own /health endpoint, which in turn should check basic liveness (not deep dependencies; those belong in a separate readiness check).
A common mistake is making the healthcheck too ambitious: checking the database, the cache, and three downstream services. When any one of those is briefly unavailable, your container is marked unhealthy and restarted, even though the app itself is fine. Liveness checks should verify “can this process respond to HTTP?” — nothing more. Dependency health belongs in readiness checks, which we’ll cover in the Kubernetes chapter.
- Node.js
- Python
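A minimal Python sketch of such a script, using only urllib.request from the standard library. The default URL (port 8000, path /health) and the HEALTHCHECK_URL / HEALTHCHECK_TIMEOUT environment variables are assumptions for illustration; adjust them to your service.

```python
"""Minimal container healthcheck: exit 0 if /health answers 200, otherwise exit 1."""
import os
import sys
import urllib.error
import urllib.request

# Assumed defaults; override through the environment if your service differs.
HEALTH_URL = os.environ.get("HEALTHCHECK_URL", "http://127.0.0.1:8000/health")
TIMEOUT_SECONDS = float(os.environ.get("HEALTHCHECK_TIMEOUT", "2"))


def check(url: str = HEALTH_URL, timeout: float = TIMEOUT_SECONDS) -> int:
    """Return a process exit code: 0 = healthy, 1 = unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-2xx status
        # (urlopen raises HTTPError, a URLError subclass, for those).
        return 1


if __name__ == "__main__":
    sys.exit(check())
```

It could be wired up with a Dockerfile line such as HEALTHCHECK --interval=30s --timeout=3s CMD ["python", "/app/healthcheck.py"]; Docker treats exit code 0 as healthy and 1 as unhealthy. The check only touches the local /health endpoint, never the database or downstream services.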
Security Scanning
Automated vulnerability scanning in CI is non-negotiable for any serious microservices deployment. The attack surface of a modern Node.js or Python service is primarily the transitive dependency tree: hundreds of third-party packages, any of which could have a known CVE. Trivy, Snyk, and Grype are the common tools; they compare your image’s installed packages against public vulnerability databases and can be configured to fail the build when HIGH or CRITICAL vulnerabilities are found. Running this check weekly (via scheduled rebuilds) catches newly disclosed CVEs that weren’t known when you originally built the image.
The pitfall with automated scanning is alert fatigue. If your policy is “fail on any vulnerability,” you’ll soon have a backlog of findings that nobody addresses. A pragmatic policy: fail on CRITICAL and HIGH in production code paths, warn on MEDIUM, ignore LOW. Pair this with a scheduled job that rebuilds images weekly against the latest base image, so you automatically pick up OS-level patches.
Programmatic Docker Management
Sometimes you need to manage containers from your own scripts or services: building images in CI runners, running integration tests against real databases in ephemeral containers, or managing developer sandbox environments. Both Node.js and Python have mature libraries that wrap the Docker Engine API (Docker maintains the official Python SDK; dockerode is the common community library for Node.js). The pattern below shows a typical use case: programmatically building and running a container for integration testing. The key consideration is cleanup. Containers started by scripts that crash mid-execution stick around indefinitely, consuming disk space and port allocations. Always wrap the container lifecycle in try/finally (or a with / async with context manager in Python) to guarantee cleanup even on failure.
- Node.js
- Python
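A minimal Python sketch of that try/finally pattern around the Docker SDK for Python (pip install docker). The context manager takes the client as a parameter so it can be exercised without a Docker daemon. The helper name ephemeral_container, the postgres:16-alpine image, and the environment values are illustrative assumptions, and a real test would also wait for the database to accept connections before using it.

```python
"""Run a throwaway container for integration tests and always clean it up."""
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def ephemeral_container(client: Any, image: str, **run_kwargs: Any) -> Iterator[Any]:
    """Start `image` detached via client.containers.run, yield the container,
    and force-remove it on exit, even if the body raised."""
    container = client.containers.run(image, detach=True, **run_kwargs)
    try:
        yield container
    finally:
        # force=True stops a running container before removing it.
        container.remove(force=True)


if __name__ == "__main__":
    import docker  # Docker SDK for Python

    client = docker.from_env()
    with ephemeral_container(
        client,
        "postgres:16-alpine",  # example image; use whatever your tests need
        environment={"POSTGRES_PASSWORD": "test"},
        ports={"5432/tcp": None},  # None lets Docker pick a free host port
    ) as pg:
        pg.reload()  # refresh attrs so the port mapping is populated
        host_port = pg.attrs["NetworkSettings"]["Ports"]["5432/tcp"][0]["HostPort"]
        print(f"Postgres container listening on localhost:{host_port}")
        # ... wait for readiness, then run integration tests here ...
```

Because removal lives in the finally clause, a failing assertion or an exception inside the with block still removes the container.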
Docker Commands Reference
Interview Questions
Q1: What is multi-stage build and why use it?
Multi-stage builds use multiple FROM statements to create intermediate images.
Benefits:
- Smaller final images (only runtime dependencies)
- Separate build and runtime environments
- Don’t expose build tools in production
- Better security (fewer attack vectors)
Q2: How do you optimize Docker layer caching?
- Order instructions by change frequency (least → most)
- Copy dependency files before source code
- Combine RUN commands to reduce layers
- Use .dockerignore to exclude unnecessary files
- Pin versions to ensure consistent builds
- Use --no-cache only when needed
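To see why ordering matters, here is a toy model of the build cache in Python. It is a simplification, not Docker’s actual algorithm: each layer’s key is a hash of the parent layer’s key, the instruction text, and the content of any files the instruction copies, so once one layer misses, every layer after it misses too. The function and constant names are hypothetical.

```python
"""Toy model of Docker's layer cache (a simplification, not Docker's algorithm)."""
import hashlib
from typing import Dict, List, Sequence, Tuple

# One build step: (instruction text, paths of the files it copies into the image).
Step = Tuple[str, Tuple[str, ...]]


def layer_keys(steps: Sequence[Step], files: Dict[str, str]) -> List[str]:
    """Chain a cache key per layer: hash(parent key, instruction, copied contents)."""
    keys: List[str] = []
    parent = ""
    for instruction, copied in steps:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(b"\0" + instruction.encode())
        for path in sorted(copied):
            h.update(b"\0" + path.encode() + b"\0" + files[path].encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: Sequence[Step], old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Instructions whose cache key changed between two versions of the source tree."""
    before, after = layer_keys(steps, old), layer_keys(steps, new)
    return [instr for (instr, _), a, b in zip(steps, before, after) if a != b]


DEPS_FIRST: List[Step] = [
    ("FROM python:3.12-slim", ()),
    ("COPY requirements.txt .", ("requirements.txt",)),
    ("RUN pip install -r requirements.txt", ()),
    ("COPY app.py .", ("app.py",)),
]
SOURCE_FIRST: List[Step] = [
    ("FROM python:3.12-slim", ()),
    ("COPY . .", ("requirements.txt", "app.py")),
    ("RUN pip install -r requirements.txt", ()),
]

if __name__ == "__main__":
    v1 = {"requirements.txt": "flask==3.0.0\n", "app.py": "print('v1')\n"}
    v2 = {**v1, "app.py": "print('v2')\n"}  # only application code changed
    print("deps first   ->", rebuilt_steps(DEPS_FIRST, v1, v2))
    print("source first ->", rebuilt_steps(SOURCE_FIRST, v1, v2))
```

Changing only app.py rebuilds just the final COPY when dependency files are copied first, but re-runs the dependency install when everything is copied in one step.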
Q3: What are Docker security best practices?
- Don’t run as root
- Use minimal base images
  - Alpine, distroless, slim variants
- Scan for vulnerabilities
  - Trivy, Snyk, Clair
- Drop capabilities
- Read-only filesystem
- No secrets in images
  - Use environment variables or secrets managers
- Keep images updated
  - Regularly rebuild with patched base images
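The “scan for vulnerabilities” item pairs naturally with a pragmatic severity policy: fail on CRITICAL and HIGH, warn on MEDIUM, ignore LOW. Trivy’s own --severity and --exit-code flags already cover the fail-only case; the sketch below adds the warn tier by post-processing a JSON report. It assumes the report shape that trivy image --format json produces (a Results list whose entries carry Target and Vulnerabilities, each vulnerability with VulnerabilityID, PkgName, and Severity), so check it against your Trivy version. It does not distinguish production from non-production code paths.

```python
"""Severity gate over a scanner JSON report: fail on CRITICAL/HIGH, warn on MEDIUM."""
import json
import sys
from typing import Any, Dict, List, Tuple

FAIL_SEVERITIES = {"CRITICAL", "HIGH"}
WARN_SEVERITIES = {"MEDIUM"}  # LOW and UNKNOWN are ignored


def classify(report: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (failures, warnings) as human-readable labels."""
    failures: List[str] = []
    warnings: List[str] = []
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:  # may be missing or null
            severity = str(vuln.get("Severity", "UNKNOWN")).upper()
            label = (f"{vuln.get('VulnerabilityID')} ({severity}) "
                     f"in {vuln.get('PkgName')} [{result.get('Target')}]")
            if severity in FAIL_SEVERITIES:
                failures.append(label)
            elif severity in WARN_SEVERITIES:
                warnings.append(label)
    return failures, warnings


def main(path: str) -> int:
    with open(path, encoding="utf-8") as fh:
        failures, warnings = classify(json.load(fh))
    for label in warnings:
        print(f"WARN {label}")
    for label in failures:
        print(f"FAIL {label}")
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

Usage: trivy image --format json --output report.json myservice:latest, then python severity_gate.py report.json; the script name and image tag are placeholders. A non-zero exit fails the CI step.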
Summary
Key Takeaways
- Multi-stage builds for optimized images
- Layer caching for faster builds
- Run as non-root user
- docker-compose for local development
- Scan images for vulnerabilities
Next Steps
Interview Deep-Dive
'Your Docker image for a Node.js microservice is 1.2GB. Walk me through how you would reduce it and why size matters in a microservices context.'
node:20 base image (which includes Debian, build tools, and development dependencies) and including node_modules with devDependencies. In microservices, image size matters for three concrete reasons: pull time during scaling events (pulling 1.2GB onto 10 new nodes takes minutes, not seconds), registry storage costs (20 services x 1.2GB x 50 builds/week adds up), and security attack surface (every unnecessary package is a potential CVE).
Step one: multi-stage build. The build stage uses node:20 (the full image with build tools) to run npm ci and compile TypeScript. The production stage uses node:20-alpine (roughly 50MB compressed) and copies only the production node_modules and the compiled output. This alone typically drops the image from 1.2GB to 200-300MB.
Step two: production-only dependencies. Use npm ci --omit=dev (npm ci --only=production on older npm versions) in the production stage. Development dependencies (TypeScript, Jest, ESLint) are large and not needed at runtime.
Step three: a .dockerignore that excludes .git, test/, docs/, local .env files, and node_modules (they get reinstalled in the container anyway). I have seen images that included the entire git history because there was no .dockerignore.
Step four: check native bindings before committing to Alpine. Alpine-based Node images are several times smaller than the Debian-based ones, but some npm packages with native bindings (bcrypt, sharp) need additional build tools on Alpine (apk add --no-cache python3 make g++) in the build stage.
The result is typically a 100-150MB image. For extreme optimization, I have seen teams use distroless images (Google’s gcr.io/distroless/nodejs20) that contain only the Node.js runtime and nothing else: no shell, no package manager, no utilities.
This is the most secure option, but it makes debugging harder because there is no shell to exec into.
Follow-up: “How do you handle the security scanning of container images in a CI/CD pipeline with 20 microservices?”
Every image gets scanned as part of the CI pipeline before it can be pushed to the registry. I use Trivy (open source) or Snyk Container integrated into the GitHub Actions workflow. The pipeline fails if any CRITICAL or HIGH severity CVE is detected in the base image or dependencies. The key insight is pinning base image digests, not just tags: node:20-alpine@sha256:abc123 ensures reproducibility, while node:20-alpine can silently change when the upstream tag is updated.
'How do you handle configuration differences between local development (docker-compose), staging, and production (Kubernetes) without maintaining three separate sets of config?'
.env file that mirrors the production variable names but with local values (DATABASE_URL=postgres://localhost:5432/mydb). The docker-compose.override.yml file adds development-specific settings (volume mounts for hot reload, debug ports) without modifying the base docker-compose.yml.
For staging and production on Kubernetes, ConfigMaps hold non-sensitive config and Secrets hold sensitive values. Both are injected as environment variables into the pod spec. The same application binary reads the same environment variable names regardless of whether the value comes from .env, a ConfigMap, or Vault.
The common mistake I see is teams creating environment-specific Dockerfiles or building different images per environment. The image should be identical across all environments; only the configuration differs. Build once, deploy everywhere. This guarantees that the image you tested in staging is the exact same binary running in production.
For complex configuration that does not fit in environment variables (feature flag rulesets, routing tables), I use a configuration service (Consul KV, AWS AppConfig) that the application polls at startup and watches for changes. The configuration service has its own environment hierarchy (dev -> staging -> production) with inheritance and overrides.
Follow-up: “How do you handle secrets in the local docker-compose development environment without committing them to the repo?”
I use a .env.example file in the repo with placeholder values, and the actual .env file is in .gitignore. New developers copy .env.example to .env and fill in their local values. For shared development secrets (a staging API key), I store them in a team password manager (1Password, Vault) and reference them in documentation. Some teams use docker-compose with Vault integration even locally, which is more secure but adds setup complexity.
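The “same variable names everywhere” rule is easiest to enforce with a single loader that reads the process environment and fails fast at startup. A minimal Python sketch: REDIS_URL and LOG_LEVEL are hypothetical variable names alongside the DATABASE_URL from the example above, and load_settings / MissingConfig are illustrative names.

```python
"""One config loader for every environment: values always come from the process
environment, whether docker-compose's .env, a ConfigMap/Secret, or Vault put them there."""
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("DATABASE_URL", "REDIS_URL")  # REDIS_URL is a hypothetical example


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    log_level: str


class MissingConfig(RuntimeError):
    """Raised at startup when required variables are absent or empty."""


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report every missing name at once instead of failing one at a time.
        raise MissingConfig("missing required environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        log_level=env.get("LOG_LEVEL", "info"),  # optional, with a default
    )
```

Because the loader never branches on “which environment am I in,” the same image behaves identically wherever the values come from; only the injected values differ.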
The pragmatic middle ground: use a tool like direnv that loads .env files automatically and warns if required variables are missing.
'A container is using 2GB of memory and getting OOM-killed in production. The same service runs fine locally with 512MB. What is going on?'
--max-old-space-size to about 75% of the container memory limit. If the container has 2GB, set --max-old-space-size=1536 (1.5GB). This leaves 512MB for V8 internals, native buffers, and OS overhead. In the Dockerfile: CMD ["node", "--max-old-space-size=1536", "src/index.js"].
If the memory issue persists after setting the heap limit, you have a memory leak. Common culprits in microservices: event listeners that are registered but never removed (adding a new listener on every request), growing in-memory caches without eviction policies, unreleased database connections in error paths, and closures that capture large objects and prevent garbage collection.
To diagnose, I enable --expose-gc and take heap snapshots in production using v8.writeHeapSnapshot() triggered by an admin endpoint. Comparing two snapshots taken 10 minutes apart reveals which objects are growing. Chrome DevTools can analyze these snapshots to find the retention path.
Follow-up: “How do you set memory limits in Kubernetes to prevent one misbehaving service from affecting others?”
Every pod gets resource requests and limits. Requests are the guaranteed amount (used for scheduling), limits are the maximum (OOM-killed if exceeded). I set requests to the P90 memory usage from production metrics and limits to 2x the requests. For a service that typically uses 512MB, I would set requests.memory: 512Mi and limits.memory: 1Gi. The requests ensure Kubernetes schedules the pod on a node with enough free memory. The limits prevent a memory leak from consuming the entire node. I also set up alerts when memory usage exceeds 80% of the limit, which gives the team time to investigate before the OOM kill happens.
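The sizing arithmetic from this answer can be captured in a small helper. A minimal Python sketch: it assumes a cgroup v2 host (where the container’s limit is exposed at /sys/fs/cgroup/memory.max, with the literal max meaning unlimited), uses a nearest-rank P90, and the function names are hypothetical.

```python
"""Memory sizing helpers: Node heap flag and Kubernetes requests/limits."""
from pathlib import Path
from typing import Iterable, Optional, Tuple

MIB = 1024 * 1024


def node_heap_mib(limit_bytes: int, fraction: float = 0.75) -> int:
    """--max-old-space-size (in MiB) that keeps the V8 heap below the container limit."""
    return int(limit_bytes * fraction) // MIB


def cgroup_memory_limit(path: str = "/sys/fs/cgroup/memory.max") -> Optional[int]:
    """Container memory limit in bytes on cgroup v2, or None if unlimited/unreadable."""
    try:
        raw = Path(path).read_text().strip()
    except OSError:
        return None
    return None if raw == "max" else int(raw)


def requests_and_limits(samples_mib: Iterable[float]) -> Tuple[int, int]:
    """Request = nearest-rank P90 of observed usage (MiB); limit = 2x the request."""
    data = sorted(samples_mib)
    if not data:
        raise ValueError("need at least one memory sample")
    rank = (9 * len(data) + 9) // 10  # ceil(0.9 * n) using integer arithmetic
    request = int(-(-data[rank - 1] // 1))  # round the P90 sample up to a whole MiB
    return request, 2 * request


if __name__ == "__main__":
    limit = cgroup_memory_limit()
    if limit is not None:
        print(f"node --max-old-space-size={node_heap_mib(limit)} src/index.js")
```

For a 2GiB limit, node_heap_mib returns 1536, matching --max-old-space-size=1536 above; steady usage of 512MiB yields a 512Mi request and a 1Gi limit.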