Docker Best Practices
Build secure, optimized, and production-ready Docker images and containers.
1. Security Best Practices
Don’t Run as Root
By default, Docker containers run as root. This is a security risk because if an attacker escapes the container (through a kernel vulnerability, for instance), they land on the host with root privileges. Think of it like leaving the master key under the doormat: the lock on the door is irrelevant. Fix: Create a non-root user in your Dockerfile.
Keep Images Minimal
Smaller images have a smaller attack surface: fewer binaries means fewer things an attacker can exploit. Think of it like packing for a trip: every item you leave behind is one less thing that can get lost or stolen.
- Use Alpine (~7MB) or Distroless (~2MB) images instead of full Ubuntu (~77MB).
- Remove build tools (compilers, debuggers) after use via multi-stage builds. A Go compiler in your production image is a gift to any attacker who gets shell access.
- Avoid installing packages “just in case.” Every apt-get install you add is a potential CVE waiting to happen.
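The two points above (a minimal base image and a non-root user) can be sketched together as follows. This is only an illustration: the alpine:3.19 tag, the user and group name app, and the prebuilt binary ./server are assumptions, not part of the original guide.

```dockerfile
# Small base image; the tag is an assumption, pin whichever version you have tested.
FROM alpine:3.19

# Create an unprivileged system group and user (the name "app" is hypothetical).
RUN addgroup -S app && adduser -S -G app app

WORKDIR /app

# Copy a prebuilt, statically linked binary (hypothetical name) and
# give ownership to the unprivileged user.
COPY --chown=app:app ./server /app/server

# Later build steps and the running container use the non-root user.
USER app

CMD ["/app/server"]
```

Anything that needs root (installing packages, creating users) has to come before the USER line.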
Scan for Vulnerabilities
Use tools like Docker Scout (replacing the older docker scan) or Trivy. Run these in CI/CD, not just locally, because new CVEs are published daily.
2. Optimization & Performance
Leverage Build Cache
Docker caches each layer and reuses it if nothing has changed. But once a layer is invalidated, every subsequent layer is rebuilt from scratch. This is like a row of dominoes: knock one over and everything after it falls. Order instructions from least changed to most changed:
- Install OS dependencies (rarely changes).
- Copy dependency manifests (package.json, go.mod).
- Install language dependencies (changes when you add/remove packages).
- Copy source code (changes on every commit, so put this last).
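The ordering above might look like this for a Node.js project. It is a sketch under assumptions: the base tag is borrowed from the example used later in this guide, ca-certificates stands in for whatever OS packages the app needs, and the project is assumed to have a package-lock.json and a server.js entry point.

```dockerfile
FROM node:18.19.0-alpine3.19
WORKDIR /app

# 1. OS dependencies: rarely change, so this layer is almost always cached.
#    (ca-certificates is a placeholder for your real OS packages.)
RUN apk add --no-cache ca-certificates

# 2. Dependency manifests only. This layer is invalidated only when
#    package.json or package-lock.json change.
COPY package.json package-lock.json ./

# 3. Language dependencies: reused from cache unless step 2 changed.
RUN npm ci

# 4. Source code: changes on every commit, so it goes last.
COPY . .

CMD ["node", "server.js"]
```

With this layout, a commit that only touches source files rebuilds step 4 and nothing before it.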
Use .dockerignore
Before a build starts, Docker sends the entire build context (your project directory) to the daemon. Without a .dockerignore, this can mean shipping hundreds of megabytes of irrelevant files — slowing builds and risking secret leakage.
- node_modules: install fresh in the image for reproducibility; shipping host node_modules often causes platform-specific binary issues anyway.
- .git: the full repo history can be tens of megabytes and is never needed at runtime.
- .env, secrets.txt: never bake secrets into an image; anyone with access to the image can inspect its layers (docker history, docker save) and extract them.
Multi-Stage Builds
Separate build environment from runtime environment. This is like cooking in a commercial kitchen (build stage) but serving from a clean plate (runtime stage): the customer never sees the mess.
3. Operational Best Practices
Healthchecks
A container can be “running” but completely broken: maybe the app hit an unrecoverable error but the process did not exit. Healthchecks let Docker mark the container as unhealthy, so that an orchestrator (Docker Swarm, or Kubernetes through its own liveness probes) can take action, like restarting the container.
Logging
Log to stdout and stderr. Do not log to files inside the container. This is counterintuitive if you come from a traditional ops background where /var/log was sacred, but containers are ephemeral, and files inside them vanish when the container dies.
- Docker captures stdout automatically via docker logs.
- Log drivers (Fluentd, Splunk, CloudWatch) can ship these logs to a central system without changing your application code.
- If you must write to files (legacy apps), use a sidecar pattern or volume-mount a log directory.
Graceful Shutdown
Handle SIGTERM signals in your application to shut down cleanly: close database connections, finish in-flight requests, flush caches. Without this, users experience dropped connections and you risk data corruption.
- Docker sends SIGTERM first, waits 10 seconds (default --stop-timeout), then sends SIGKILL.
- The 10-second window is often too short for apps draining long-running requests. Override with --stop-timeout=30 or set stop_grace_period in Compose.
- Common mistake: If your Dockerfile uses shell form (CMD node server.js), the shell process (PID 1) receives SIGTERM but does not forward it to your Node process. Use exec form (CMD ["node", "server.js"]) so your app is PID 1 and receives signals directly.
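The shell-form versus exec-form distinction in the last bullet can be sketched as below. The base tag and the server.js entry point are assumptions carried over from the examples in this guide.

```dockerfile
FROM node:18.19.0-alpine3.19
WORKDIR /app
COPY . .

# Shell form (avoid): Docker runs this as /bin/sh -c "node server.js",
# so the shell is PID 1 and Node may never see SIGTERM.
# CMD node server.js

# Exec form: Node itself is PID 1 and receives SIGTERM directly.
CMD ["node", "server.js"]
```

Exec form only delivers the signal. As PID 1, the process gets no default SIGTERM behavior from the kernel, so the application still needs to register its own SIGTERM handler to shut down cleanly.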
4. The “Golden Rules”
One Process Per Container
Immutable Infrastructure
Statelessness
Environment Config
Interview Deep-Dive
Your team just inherited a production Dockerfile that runs as root, has no healthcheck, and uses the latest tag. Walk me through how you would remediate this, and in what order.
- The first priority is pinning the base image tag. Running latest means any upstream change can break your build overnight without any code change on your side. I would check the current resolved digest with docker inspect, find the exact version, and pin to that (e.g., node:18.19.0-alpine3.19). This restores reproducibility immediately.
- Second, I would tackle the root user issue because it is the highest security risk. I would add a dedicated non-root user with addgroup and adduser, set file ownership during COPY with --chown, and switch with the USER directive. The gotcha here is that some legacy apps write to directories they no longer own after this change, so I would grep the codebase for file writes and ensure those paths are writable.
- Third, I would add a HEALTHCHECK. The specific implementation depends on the runtime: for a Node.js app, I would use a lightweight HTTP check against a /health endpoint rather than installing curl in a minimal image. If the team uses Kubernetes, I would skip the Dockerfile HEALTHCHECK entirely and use liveness/readiness probes instead, since Kubernetes ignores the Dockerfile directive.
- Throughout, I would run docker scout cves or Trivy after each change to confirm the image’s vulnerability profile is actually improving.
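A remediated Dockerfile following the three steps above might look roughly like this. It is a sketch, not the team's actual file: the user name app, port 3000, the /health path, the server.js entry point, and the presence of a package-lock.json are all assumptions.

```dockerfile
# Step 1: pinned tag instead of "latest" (version taken from the example above).
FROM node:18.19.0-alpine3.19

# Step 2: dedicated non-root user (the name "app" is hypothetical).
RUN addgroup -S app && adduser -S -G app app

WORKDIR /app
COPY --chown=app:app package.json package-lock.json ./
RUN npm ci --omit=dev
COPY --chown=app:app . .

USER app

# Step 3: HTTP healthcheck using Node itself, so no curl is needed.
# Port 3000 and /health are assumptions about the application.
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD ["node", "-e", "require('http').get('http://127.0.0.1:3000/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"]

CMD ["node", "server.js"]
```

The check uses 127.0.0.1 rather than localhost so that it does not depend on how localhost resolves inside the container.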
/tmp, log directories, or npm cache (/root/.npm). The fix depends on the situation: either chown the specific directory to the app user during build, or redirect writes to a directory you control (e.g., set npm_config_cache to a path under /app). The deeper lesson is that switching to non-root is not just one line; it requires auditing every filesystem write your application performs.
Explain the difference between shell form and exec form for CMD and ENTRYPOINT, and describe a production incident that could result from using the wrong one.
- Shell form (CMD node server.js) wraps the command in /bin/sh -c, meaning the shell process becomes PID 1 inside the container. Exec form (CMD ["node", "server.js"]) runs the binary directly as PID 1.
- The critical difference is signal handling. When Docker stops a container, it sends SIGTERM to PID 1. If PID 1 is a shell, the shell receives SIGTERM but does not forward it to child processes by default. The Node.js process never gets the signal, never runs its graceful shutdown handler, and after the 10-second stop timeout, Docker sends SIGKILL, killing the process immediately.
- In production, this manifests as dropped in-flight HTTP requests during deployments, incomplete database transactions, or corrupted file writes. I saw this exact issue at a company where rolling deployments caused 502 errors on every deploy because containers took the full 10 seconds to die instead of shutting down gracefully in under a second.
- The fix is straightforward: always use exec form for CMD and ENTRYPOINT. If you need shell features (variable expansion, piping), use exec form with an explicit shell: CMD ["sh", "-c", "exec node server.js"]; the exec replaces the shell process with Node, so Node becomes PID 1 and receives signals directly.
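For the case where shell features are needed, the explicit-shell variant from the last bullet could be sketched like this. The PORT variable and the --port argument are hypothetical; they assume the application reads its port from the command line.

```dockerfile
FROM node:18.19.0-alpine3.19
WORKDIR /app
COPY . .

# Hypothetical setting that the app is assumed to accept as an argument.
ENV PORT=3000

# The shell expands ${PORT}, then "exec" replaces the shell with Node,
# so Node ends up as PID 1 and receives SIGTERM directly.
CMD ["sh", "-c", "exec node server.js --port ${PORT}"]
```

Without the exec keyword, the shell would typically stay alive as PID 1 with Node as its child, which is the failure mode described above.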
--stop-timeout flag, and when would you increase it?
The default is 10 seconds. For applications that drain long-running requests (e.g., WebSocket connections, batch jobs), 10 seconds is often too short. I would increase it to 30-60 seconds for APIs behind a load balancer that is already draining connections, and up to 300 seconds for background workers finishing in-progress tasks. The key insight is that the stop timeout should match your application’s drain time, not be an arbitrary number. In Kubernetes, this is controlled by terminationGracePeriodSeconds, which defaults to 30 seconds.
A developer on your team argues that multi-stage builds are unnecessary overhead and they should just use one big Dockerfile. How do you respond?
- I would reframe the conversation around concrete numbers. A single-stage Go image with the full toolchain is roughly 800MB. A multi-stage build producing a distroless image is around 15MB. That is a 50x reduction. In a CI/CD pipeline pushing images on every commit, this translates to faster pushes, faster pulls across regions, and faster pod startup in Kubernetes. At scale (say, 50 microservices deployed 10 times a day), those savings compound into hours of pipeline time recovered per week.
- More importantly, the security argument is decisive. Every binary in your production image is a potential attack vector. A Go compiler, a package manager, source code, test files — none of these belong in production. If an attacker gets shell access (via a vulnerability in your app), the first thing they look for is tools to escalate. A distroless image has no shell, no package manager, nothing to work with.
- The overhead of multi-stage builds is essentially zero in terms of Dockerfile complexity: you add a second FROM line and a COPY --from=builder directive. Build time is often faster because the runtime stage has fewer layers to push.
- The one legitimate exception is rapid local development iteration where you want to docker exec into the container for debugging. In that case, I would use a multi-stage build with a dev target that includes debugging tools, and a prod target that strips them out. Same Dockerfile, different targets.
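The dev and prod targets described above might be laid out as in the sketch below. The image tags, the choice of curl and strace as debugging tools, and the presence of go.mod and go.sum at the project root are assumptions made for illustration.

```dockerfile
# Build stage: full Go toolchain (the tag is an assumption).
FROM golang:1.22 AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Static binary so it runs on a base image without a C library.
RUN CGO_ENABLED=0 go build -o /out/server .

# dev target: keeps a shell and debugging tools for docker exec.
FROM alpine:3.19 AS dev
RUN apk add --no-cache curl strace
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]

# prod target: distroless, with no shell or package manager.
FROM gcr.io/distroless/static-debian12:nonroot AS prod
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```

Building with docker build --target dev selects the debugging image. Because prod is the last stage, a plain docker build produces the production image by default.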
You discover that a secret (database password) was accidentally baked into a Docker image layer and pushed to your registry two weeks ago. What do you do?
- First, rotate the credential immediately. The image has been pullable for two weeks, so you have to assume the secret is compromised regardless of whether anyone actually extracted it. This is non-negotiable and should happen within minutes.
- Second, audit access logs on the registry (ECR, GCR, Docker Hub) to determine who has pulled that image tag. This tells you the blast radius.
- Third, rebuild the image using BuildKit secret mounts (RUN --mount=type=secret) so the secret is never written to any layer. Push the new clean image.
- Fourth, delete the compromised image tags from the registry. However, this is not sufficient on its own: anyone who already pulled the image still has the secret locally, and Docker layer caches on CI runners may still contain it.
- Fifth, add a CI step that scans images for secrets before pushing (tools like ggshield or trufflehog can do this). Also add a .dockerignore entry for .env files and a pre-commit hook that blocks adding secrets to the build context.
- The deeper systemic fix is to never pass secrets at build time. Production secrets should come from runtime injection (environment variables from a secrets manager like Vault, AWS Secrets Manager, or Kubernetes Secrets). Build-time secrets (like private npm tokens) should use --mount=type=secret, which is ephemeral and never written to a layer.
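For the build-time case (a private npm token), a BuildKit secret mount could be sketched as follows. The secret id npmrc is a hypothetical name, and the example assumes the token lives in an .npmrc file on the build machine.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18.19.0-alpine3.19
WORKDIR /app
COPY package.json package-lock.json ./

# The secret is mounted only for the duration of this RUN step and is
# not written to any image layer. The id "npmrc" is hypothetical and
# must match the id passed to docker build.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

COPY . .
CMD ["node", "server.js"]
```

The secret is supplied at build time with something like docker build --secret id=npmrc,src=$HOME/.npmrc . so the file never enters the build context or the image history.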
.env file exists in the layer where it was COPY’d. The RUN rm .env creates a new layer that adds a “whiteout” marker hiding the file, but the original layer with the file contents still exists in the image. Anyone who extracts the layers with docker save can recover the file, and docker history shows which instruction added it. This is why multi-layer deletion is not a security measure; it is purely cosmetic.
Congratulations! You’ve completed the Docker Crash Course.