# Docker Best Practices

> Security, optimization, and production readiness

Build secure, optimized, and production-ready Docker images and containers.

***

## 1. Security Best Practices

### Don't Run as Root

By default, Docker containers run as root. This is a security risk because if an attacker escapes the container (through a kernel vulnerability, for instance), they land on the host with root privileges. Think of it like leaving the master key under the doormat -- the lock on the door is irrelevant.

**Fix**: Create a non-root user in your Dockerfile.

```dockerfile
FROM node:18-alpine
WORKDIR /app

# Create a dedicated system group and user for the app.
# -S creates a "system" account: no password and no login shell
# (add -H to also skip the home directory).
# This follows the principle of least privilege: the process
# only gets the permissions it actually needs.
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Set ownership DURING copy so files are readable by the non-root user.
# If you COPY first and chown later, you create an extra layer.
COPY --chown=appuser:appgroup . .

# From this point forward, every RUN, CMD, and ENTRYPOINT
# executes as appuser -- not root.
USER appuser
CMD ["node", "server.js"]
```

<Tip>
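  **Production gotcha**: If your app needs to bind to a port below 1024 (like port 80), a non-root user may not be allowed to, depending on the runtime. Docker Engine 20.10 and later permit it inside containers by default, but older engines and some other runtimes do not. The most portable fix is to listen on a higher port (like 3000 or 8080) and let your load balancer or a port mapping such as `-p 80:8080` handle the rest. Adding `--cap-add=NET_BIND_SERVICE` on its own usually does not help a non-root process; the capability also has to be granted to the binary (for example with `setcap`).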
</Tip>

### Keep Images Minimal

Smaller images have a smaller attack surface -- fewer binaries means fewer things an attacker can exploit. Think of it like packing for a trip: every item you leave behind is one less thing that can get lost or stolen.

* Use **Alpine** (\~7MB) or **Distroless** (\~2MB) images instead of full Ubuntu (\~77MB).
* Remove build tools (compilers, debuggers) after use via multi-stage builds. A Go compiler in your production image is a gift to any attacker who gets shell access.
* Avoid installing packages "just in case." Every `apt-get install` you add is a potential CVE waiting to happen.

### Scan for Vulnerabilities

Use tools like Docker Scout (replacing the older `docker scan`) or Trivy. Run these in CI/CD -- not just locally -- because new CVEs are published daily.

```bash
# Docker Scout (built into Docker Desktop)
docker scout cves myapp:latest

# Trivy -- filter to only HIGH and CRITICAL to reduce noise
trivy image --severity HIGH,CRITICAL myapp:latest
```

<Tip>
  **Common mistake**: Teams scan images once during build but never re-scan running images. A base image that was clean last week might have a critical CVE today. Set up scheduled scans in your registry (ECR, GCR, and Harbor all support this).
</Tip>

***

## 2. Optimization & Performance

### Leverage Build Cache

Docker caches each layer and reuses it if nothing has changed. But once a layer is invalidated, **every subsequent layer is rebuilt from scratch**. This is like a row of dominoes -- knock one over and everything after it falls.

Order instructions from **least frequently changed** to **most frequently changed**:

1. Install OS dependencies (rarely changes).
2. Copy dependency manifests (`package.json`, `go.mod`).
3. Install language dependencies (changes when you add/remove packages).
4. Copy source code (changes on every commit -- put this last).
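
The ordering above can be sketched for a Node.js service as follows. This is a minimal example rather than a definitive layout: it assumes the project has a `package-lock.json` and a `server.js` entry point, and `tzdata` stands in for whatever OS packages your app actually needs.

```dockerfile
FROM node:18-alpine
WORKDIR /app

# 1. OS dependencies: rarely change, so this layer is almost always cached.
#    (tzdata is only an illustrative package.)
RUN apk add --no-cache tzdata

# 2. Dependency manifests only. Editing source code does not touch these
#    files, so the layers below stay cached across ordinary commits.
COPY package.json package-lock.json ./

# 3. Language dependencies: re-run only when a manifest above changes.
RUN npm ci --omit=dev

# 4. Source code: changes on every commit, so it goes last.
COPY . .

CMD ["node", "server.js"]
```

With this layout, a commit that only touches source files reuses the cached layers for steps 1 to 3 and rebuilds only the final `COPY`.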

### Use `.dockerignore`

Before a build starts, Docker sends the build context (by default, your entire project directory) to the daemon. Without a `.dockerignore`, this can mean shipping hundreds of megabytes of irrelevant files -- slowing builds and risking secret leakage. At a minimum, exclude:

* `node_modules` -- install fresh in the image for reproducibility; shipping host node\_modules often causes platform-specific binary issues anyway
* `.git` -- the full repo history can be tens of megabytes and is never needed at runtime
* `.env`, `secrets.txt` -- **never bake secrets into an image**; anyone who can pull the image can extract them from its layers (for example with `docker save`)

### Multi-Stage Builds

Separate build environment from runtime environment. This is like cooking in a commercial kitchen (build stage) but serving from a clean plate (runtime stage) -- the customer never sees the mess.

```dockerfile
# Build Stage -- has the full Go toolchain (~800MB)
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
# CGO_ENABLED=0 creates a statically linked binary
# so it can run on distroless/scratch with zero dependencies.
RUN CGO_ENABLED=0 go build -o myapp

# Runtime Stage -- only ~2MB, no shell, no package manager
FROM gcr.io/distroless/static-debian11
# Copy ONLY the compiled binary from the build stage.
# The Go compiler, source code, and dependencies stay behind.
COPY --from=builder /app/myapp /
CMD ["/myapp"]
```

The result: your production image drops from \~800MB to \~15MB, and you eliminate an entire class of vulnerabilities (no shell for attackers to exec into).

***

## 3. Operational Best Practices

### Healthchecks

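A container can be "running" but completely broken -- maybe the app hit an unrecoverable error but the process did not exit. Healthchecks let Docker detect this and mark the container as `unhealthy`, so that an orchestrator such as Docker Swarm can replace it. Plain `docker run` only reports the status and does not restart an unhealthy container on its own, and Kubernetes uses its own probes instead of the Dockerfile `HEALTHCHECK`.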

```dockerfile
# Check every 30s, allow 3s for response, fail after 3 consecutive failures.
# curl -f returns a non-zero exit code on HTTP errors (4xx, 5xx).
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost:3000/health || exit 1
```

<Tip>
  **Production gotcha**: Using `curl` for healthchecks means `curl` must be installed in your image. On Alpine, use the built-in `wget -q --spider` instead. Distroless images have no shell or HTTP client at all, so consider a tiny statically linked binary (written in Go, for example) that performs the check. In Kubernetes, use liveness/readiness probes: they offer more control, and Kubernetes ignores the Dockerfile `HEALTHCHECK` anyway.
</Tip>
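
A curl-free variant for an Alpine-based image might look like the sketch below. It assumes the app serves a `/health` endpoint on port 3000, as in the example above, and relies on the BusyBox `wget` that ships with Alpine. The `--start-period` option is an optional addition that gives the app time to boot before failures count.

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .

# BusyBox wget is part of the Alpine base image, so nothing extra is installed.
# --spider requests the URL without saving the body; wget exits non-zero on
# connection failures and HTTP error responses, which fails the check.
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1

CMD ["node", "server.js"]
```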

### Logging

Log to `stdout` and `stderr`. Do not log to files inside the container. This is counterintuitive if you come from a traditional ops background where `/var/log` was sacred -- but containers are ephemeral, and files inside them vanish when the container dies.

* Docker captures `stdout` and `stderr` automatically; view them with `docker logs`.
* Log drivers (Fluentd, Splunk, CloudWatch) can ship these logs to a central system without changing your application code.
* If you must write to files (legacy apps), use a sidecar pattern or volume-mount a log directory.
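
For a legacy app that can only log to files, one alternative to a sidecar is to symlink the log files to the container's standard streams. This is a swapped-in technique (the same one the official `nginx` image uses), not something the list above prescribes. The sketch assumes a hypothetical `legacy-app` binary that writes to `/var/log/legacy-app/access.log` and `error.log`.

```dockerfile
FROM alpine:3.19

# Hypothetical paths: adjust to wherever your app writes its logs.
# Writes to these files now land on stdout/stderr, so `docker logs`
# and any configured log driver pick them up.
RUN mkdir -p /var/log/legacy-app \
 && ln -sf /dev/stdout /var/log/legacy-app/access.log \
 && ln -sf /dev/stderr /var/log/legacy-app/error.log

# Hypothetical binary, copied from the build context.
COPY legacy-app /usr/local/bin/legacy-app
CMD ["/usr/local/bin/legacy-app"]
```

This only works when the app simply appends to its log files; apps that rotate or seek within them still need a volume or a sidecar.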

### Graceful Shutdown

Handle `SIGTERM` signals in your application to shut down cleanly -- close database connections, finish in-flight requests, flush caches. Without this, users experience dropped connections and you risk data corruption.

* Docker sends `SIGTERM` first, waits 10 seconds (the default stop timeout), then sends `SIGKILL`.
* The 10-second window is often too short for apps draining long-running requests. Override it with `docker run --stop-timeout=30`, `docker stop -t 30`, or `stop_grace_period` in Compose.
* **Common mistake**: If your Dockerfile uses shell form (`CMD node server.js`), the shell process (PID 1) receives SIGTERM but does not forward it to your Node process. Use exec form (`CMD ["node", "server.js"]`) so your app is PID 1 and receives signals directly.

***

## 4. The "Golden Rules"

<CardGroup cols={2}>
  <Card title="One Process Per Container" icon="1">
    Don't run a database and a web server in the same container. Use Compose.
  </Card>

  <Card title="Immutable Infrastructure" icon="lock">
    Never SSH into a container to patch it. Rebuild the image and redeploy.
  </Card>

  <Card title="Statelessness" icon="cloud">
    Containers should be ephemeral. Store state in Volumes or external DBs.
  </Card>

  <Card title="Environment Config" icon="gear">
    Use Environment Variables for config, not hardcoded files.
  </Card>
</CardGroup>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="Your team just inherited a production Dockerfile that runs as root, has no healthcheck, and uses the latest tag. Walk me through how you would remediate this, and in what order." icon="circle-question">
    **Strong Answer:**

    * The first priority is pinning the base image tag. Running `latest` means any upstream change can break your build overnight without any code change on your side. I would check the current resolved digest with `docker inspect`, find the exact version, and pin to that (e.g., `node:18.19.0-alpine3.19`). This restores reproducibility immediately.
    * Second, I would tackle the root user issue because it is the highest security risk. I would add a dedicated non-root user with `addgroup` and `adduser`, set file ownership during COPY with `--chown`, and switch with the `USER` directive. The gotcha here is that some legacy apps write to directories they no longer own after this change, so I would grep the codebase for file writes and ensure those paths are writable.
    * Third, I would add a HEALTHCHECK. The specific implementation depends on the runtime -- for a Node.js app, I would use a lightweight HTTP check against a `/health` endpoint rather than installing curl in a minimal image. If the team uses Kubernetes, I would skip the Dockerfile HEALTHCHECK entirely and use liveness/readiness probes instead, since Kubernetes ignores the Dockerfile directive.
    * Throughout, I would run `docker scout cves` or Trivy after each change to confirm the image's vulnerability profile is actually improving.

    **Follow-up: You added a non-root user, but now the container crashes on startup with "EACCES: permission denied." What is happening and how do you fix it?**

    The application is trying to write to a directory owned by root. Common culprits are the working directory itself (`WORKDIR` creates it as root, and `COPY --chown` only changes the files it copies), log directories, or the npm cache (`/root/.npm`). The fix depends on the situation: either `chown` the specific directory to the app user during build, or redirect writes to a directory you control (e.g., set `npm_config_cache` to a path under `/app`). The deeper lesson is that switching to non-root is not just one line -- it requires auditing every filesystem write your application performs.
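
    A minimal sketch of the `chown` approach, assuming the app writes to `/app/data` and npm should cache under `/app/.npm` (both paths are hypothetical):

    ```dockerfile
    FROM node:18-alpine
    WORKDIR /app
    RUN addgroup -S appgroup && adduser -S appuser -G appgroup

    # Still running as root here: create the writable paths and hand them
    # to the app user before switching. Both paths are illustrative.
    RUN mkdir -p /app/data /app/.npm \
     && chown -R appuser:appgroup /app/data /app/.npm

    # Point npm's cache at a directory the app user owns.
    ENV npm_config_cache=/app/.npm

    COPY --chown=appuser:appgroup . .
    USER appuser
    CMD ["node", "server.js"]
    ```

    The directories are created before `USER appuser` because, after the switch, later build steps can no longer `chown` root-owned paths.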
  </Accordion>

  <Accordion title="Explain the difference between shell form and exec form for CMD and ENTRYPOINT, and describe a production incident that could result from using the wrong one." icon="circle-question">
    **Strong Answer:**

    * Shell form (`CMD node server.js`) wraps the command in `/bin/sh -c`, meaning the shell process becomes PID 1 inside the container. Exec form (`CMD ["node", "server.js"]`) runs the binary directly as PID 1.
    * The critical difference is signal handling. When Docker stops a container, it sends SIGTERM to PID 1. If PID 1 is a shell, the shell receives SIGTERM but does not forward it to child processes by default. The Node.js process never gets the signal, never runs its graceful shutdown handler, and after the 10-second stop timeout, Docker sends SIGKILL -- killing the process immediately.
    * In production, this manifests as dropped in-flight HTTP requests during deployments, incomplete database transactions, or corrupted file writes. I saw this exact issue at a company where rolling deployments caused 502 errors on every deploy because containers took the full 10 seconds to die instead of shutting down gracefully in under a second.
    * The fix is straightforward: always use exec form for CMD and ENTRYPOINT. If you need shell features (variable expansion, piping), use exec form with an explicit shell: `CMD ["sh", "-c", "exec node server.js"]` -- the `exec` replaces the shell process with Node, so Node becomes PID 1 and receives signals directly.

    **Follow-up: How does this interact with the `--stop-timeout` flag, and when would you increase it?**

    The default is 10 seconds. For applications that drain long-running requests (e.g., WebSocket connections, batch jobs), 10 seconds is often too short. I would increase it to 30-60 seconds for APIs behind a load balancer that is already draining connections, and up to 300 seconds for background workers finishing in-progress tasks. The key insight is that the stop timeout should match your application's drain time, not be an arbitrary number. In Kubernetes, this is controlled by `terminationGracePeriodSeconds`, which defaults to 30 seconds.
  </Accordion>

  <Accordion title="A developer on your team argues that multi-stage builds are unnecessary overhead and they should just use one big Dockerfile. How do you respond?" icon="circle-question">
    **Strong Answer:**

    * I would reframe the conversation around concrete numbers. A single-stage Go image with the full toolchain is roughly 800MB. A multi-stage build producing a distroless image is around 15MB. That is a 50x reduction. In a CI/CD pipeline pushing images on every commit, this translates to faster pushes, faster pulls across regions, and faster pod startup in Kubernetes. At scale (say, 50 microservices deployed 10 times a day), those savings compound into hours of pipeline time recovered per week.
    * More importantly, the security argument is decisive. Every binary in your production image is a potential attack vector. A Go compiler, a package manager, source code, test files -- none of these belong in production. If an attacker gets shell access (via a vulnerability in your app), the first thing they look for is tools to escalate. A distroless image has no shell, no package manager, nothing to work with.
    * The overhead of multi-stage builds is essentially zero in terms of Dockerfile complexity -- you add a second `FROM` line and a `COPY --from=builder` directive. End-to-end pipeline time is often lower too, because the final image has far less data to push and pull.
    * The one legitimate exception is rapid local development iteration where you want to `docker exec` into the container for debugging. In that case, I would use a multi-stage build with a `dev` target that includes debugging tools, and a `prod` target that strips them out. Same Dockerfile, different targets.
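
    A sketch of that dev/prod split, reusing the Go build from the multi-stage example earlier on this page. The stage names `dev` and `prod` and the `alpine:3.19` debug base are assumptions chosen for illustration:

    ```dockerfile
    # Shared build stage: compiles the static binary used by both targets.
    FROM golang:1.21 AS builder
    WORKDIR /app
    COPY . .
    RUN CGO_ENABLED=0 go build -o myapp

    # dev target: same binary on a base that still has a shell,
    # so `docker exec -it <container> sh` works while debugging.
    FROM alpine:3.19 AS dev
    COPY --from=builder /app/myapp /myapp
    CMD ["/myapp"]

    # prod target: distroless, no shell and no package manager.
    FROM gcr.io/distroless/static-debian11 AS prod
    COPY --from=builder /app/myapp /
    CMD ["/myapp"]
    ```

    Build the one you need with `docker build --target dev -t myapp:dev .` or `docker build --target prod -t myapp:prod .`; both run the same binary.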

    **Follow-up: What is the trade-off of using a scratch base image versus distroless?**

    Scratch is literally empty -- 0 bytes, no libc, no CA certificates, nothing. Your binary must be fully statically linked. Distroless `static` (\~2MB) adds CA certs, timezone data, and a non-root user entry in `/etc/passwd`, but still no libc; the larger `base` variant adds glibc for dynamically linked binaries. In practice, the difference matters when your app makes HTTPS calls (needs CA certs) or handles time zones (needs tzdata). Most teams find distroless hits the sweet spot of minimal size with just enough runtime support to avoid obscure failures.
  </Accordion>

  <Accordion title="You discover that a secret (database password) was accidentally baked into a Docker image layer and pushed to your registry two weeks ago. What do you do?" icon="circle-question">
    **Strong Answer:**

    * First, rotate the credential immediately. The image has been pullable for two weeks, so you have to assume the secret is compromised regardless of whether anyone actually extracted it. This is non-negotiable and should happen within minutes.
    * Second, audit access logs on the registry (ECR, GCR, Docker Hub) to determine who has pulled that image tag. This tells you the blast radius.
    * Third, rebuild the image using BuildKit secret mounts (`RUN --mount=type=secret`) so the secret is never written to any layer. Push the new clean image.
    * Fourth, delete the compromised image tags from the registry. However, this is not sufficient on its own -- anyone who already pulled the image still has the secret locally, and Docker layer caches on CI runners may still contain it.
    * Fifth, add a CI step that scans images for secrets before pushing (tools like `ggshield` or `trufflehog` can do this). Also add a `.dockerignore` entry for `.env` files and a pre-commit hook that blocks adding secrets to the build context.
    * The deeper systemic fix is to never let a secret end up in an image layer. Production secrets should come from runtime injection (environment variables from a secrets manager like Vault, AWS Secrets Manager, or Kubernetes Secrets). Build-time secrets (like private npm tokens) should use `--mount=type=secret`, which exposes the secret only for the duration of a single `RUN` step and never writes it to a layer.
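
    A minimal sketch of a build-time secret mount, assuming a private npm registry token stored in the developer's `~/.npmrc` (the secret id `npmrc` is a hypothetical name):

    ```dockerfile
    # syntax=docker/dockerfile:1
    FROM node:18-alpine
    WORKDIR /app
    COPY package.json package-lock.json ./

    # The secret is mounted at the target path only while this RUN step
    # executes; it is not written to the resulting layer.
    # `npmrc` is a hypothetical id and must match the id passed to `docker build`.
    RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
        npm ci --omit=dev

    COPY . .
    CMD ["node", "server.js"]
    ```

    Build it with `docker build --secret id=npmrc,src=$HOME/.npmrc .`; the file is available to `npm ci` during that step only and is not stored in any image layer.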

    **Follow-up: A teammate says "I deleted the file in a later layer with RUN rm .env, so it is fine." Why is this wrong?**

    Docker images are composed of immutable layers. The `.env` file exists in the layer where it was COPY'd. The `RUN rm .env` creates a new layer that adds a "whiteout" marker hiding the file, but the original layer with the file contents still exists in the image. Anyone who exports the image with `docker save` can unpack that layer and recover the file, and `docker history` shows them which layer to look at. This is why deleting a file in a later layer is not a security measure -- it is purely cosmetic.
  </Accordion>
</AccordionGroup>

***

**Congratulations!** You've completed the Docker Crash Course.

Next: [RabbitMQ Crash Course →](/courses/devops-tools/rabbitmq-overview)
