Building Docker Images

Learn to create efficient, secure, and production-ready Docker images using Dockerfiles.

The Dockerfile

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image.

Basic Structure

# 1. Base Image -- Alpine variant is ~50MB vs ~350MB for full node image.
#    Always pin a specific version; "node:latest" can break your build overnight.
FROM node:18-alpine

# 2. Working Directory -- all subsequent commands run from here.
#    Created automatically if it does not exist.
WORKDIR /app

# 3. Copy Dependencies FIRST -- this is a caching strategy.
#    package*.json changes rarely, so this layer stays cached across builds.
COPY package*.json ./

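# 4. Install Dependencies -- npm ci installs exactly what package-lock.json pins,
#    so it is faster and more deterministic than npm install.
#    --omit=dev skips devDependencies (test frameworks, linters) to keep the image lean.
#    (--only=production is the older, deprecated spelling of the same option.)
RUN npm ci --omit=dev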

# 5. Copy Source Code LAST -- this layer is invalidated on every code change,
#    but since dependencies are cached above, rebuilds are fast.
COPY . .

# 6. Expose Port -- documentation only! This does NOT publish the port.
#    You still need -p 3000:3000 at runtime. Think of it as a hint for operators.
EXPOSE 3000

# 7. Define User (Security) -- "node" user is built into the node base image.
#    Running as root in production is a security risk.
USER node

# 8. Startup Command -- exec form (JSON array) is preferred over shell form.
#    Exec form runs the process directly as PID 1, so it receives SIGTERM for graceful shutdown.
CMD ["node", "server.js"]

Key Instructions

| Instruction | Description | Example |
| --- | --- | --- |
| FROM | Base image to start from | FROM ubuntu:22.04 |
| WORKDIR | Sets working directory | WORKDIR /app |
| COPY | Copies files from host to image | COPY . . |
| RUN | Executes command during build | RUN apt-get update |
| ENV | Sets environment variables | ENV NODE_ENV=production |
| EXPOSE | Documents listening ports | EXPOSE 80 |
| CMD | Default command to run | CMD ["npm", "start"] |
| ENTRYPOINT | Main executable | ENTRYPOINT ["python"] |

Image Layers & Caching

Docker images are built from layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; the other instructions only record metadata. Docker caches the result of every instruction. When it encounters an instruction that has not changed (and all preceding layers are also cached), it reuses the cached version instead of re-executing the instruction. For COPY and ADD, the contents of the copied files are part of that comparison. Order matters! Once a layer is invalidated, every layer after it must be rebuilt. Think of it like a stack of pancakes: you cannot swap one from the middle without removing everything above it.
# BAD: Re-installs ALL dependencies every time ANY source file changes.
# Changing a comment in app.js triggers a full npm install (~30-90 seconds).
COPY . .
RUN npm install

# GOOD: Copy dependency manifest first, install, then copy source code.
# Now npm install is only re-run when package.json actually changes.
# Day-to-day code changes rebuild in seconds, not minutes.
COPY package*.json ./
RUN npm install
COPY . .
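
The ordering rule above can be checked mechanically. The following is a rough sketch of such a check, not a real linter: the function name check_copy_order and the list of install commands it looks for are assumptions made for illustration, and it only recognizes the literal form COPY . . on its own line.

```shell
#!/bin/sh
# Heuristic check: flag a Dockerfile where "COPY . ." appears before a
# dependency-install step. Function name and patterns are illustrative only.
check_copy_order() {
    awk '
        /^COPY[ \t]+\.[ \t]+\.[ \t]*$/ { copied_all = 1 }
        /^RUN[ \t].*(npm (ci|install)|pip install|go mod download)/ {
            if (copied_all) bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Write the BAD and GOOD orderings from the text into a scratch directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf 'FROM node:18-alpine\nCOPY . .\nRUN npm install\n' \
    > "$tmp/Dockerfile.bad"
printf 'FROM node:18-alpine\nCOPY package*.json ./\nRUN npm install\nCOPY . .\n' \
    > "$tmp/Dockerfile.good"

for f in "$tmp/Dockerfile.bad" "$tmp/Dockerfile.good"; do
    if check_copy_order "$f"; then
        echo "$(basename "$f"): ok"
    else
        echo "$(basename "$f"): COPY . . precedes dependency install"
    fi
done
```

A check like this could run in CI before the build, so an accidental reordering is caught before it slows every subsequent build.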

Multi-Stage Builds

Drastically reduce image size by separating build tools from runtime artifacts.

Example: Go Application

# Stage 1: Builder -- full Go toolchain for compilation
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
# Download dependencies first (cached unless go.mod/go.sum change)
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary -- no libc dependency,
# so it can run on scratch or distroless without any shared libraries.
RUN CGO_ENABLED=0 go build -o myapp main.go

# Stage 2: Runtime -- only the compiled binary, nothing else
# Pin the runtime base as well; alpine:latest would make builds non-reproducible.
FROM alpine:3.19
WORKDIR /root/
# Copy ONLY the binary from the builder stage.
# The Go compiler, source code, test files, and build cache stay behind.
COPY --from=builder /app/myapp .
CMD ["./myapp"]
Result:
  • Builder image: ~800MB (contains Go compiler, source code, all modules)
  • Runtime image: ~15MB (contains only the binary and minimal Alpine OS)
Going further: Replace the Alpine base with gcr.io/distroless/static or even scratch for the smallest possible image (~2MB). The trade-off is that there is no shell for debugging, so docker exec cannot open one inside the container. In production this is often acceptable (and actually desirable from a security standpoint).

Building & Tagging

# Build with default tag (latest)
docker build -t myapp .

# Build with specific tag
docker build -t myapp:1.0 .

# Build with multiple tags
docker build -t myapp:1.0 -t myapp:latest .

# Build from specific file
docker build -f Dockerfile.prod -t myapp:prod .

# Build without cache (if needed)
docker build --no-cache -t myapp:clean .

Managing Images

# List images
docker images

# Remove image
docker rmi myapp:1.0

# Remove dangling images (untagged, <none>)
docker image prune

# Save image to tarball
docker save -o myapp.tar myapp:1.0

# Load image from tarball
docker load -i myapp.tar

Best Practices

Start with Alpine-based images pinned to a version (e.g., node:18-alpine, python:3.12-alpine) to keep images small and reduce the attack surface.
Create a non-root user and switch to it with the USER instruction.
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
Exclude files like node_modules, .git, and secrets from the build context.
# .dockerignore
node_modules
.git
.env
Dockerfile

Advanced Dockerfile Techniques

BuildKit Features

BuildKit is the default builder in Docker Engine 23.0 and later. On older versions, enable it explicitly to get the features below:
DOCKER_BUILDKIT=1 docker build -t myapp .

Cache Mounts (Speed Up Builds)

Cache mounts persist data between builds without baking it into the image layer. This is like having a shared tool shed between builds — packages downloaded once are reused on the next build, but the cache never ships in the final image.
# Node.js -- mount npm's cache directory so previously downloaded packages
# are reused across builds, which can cut install time noticeably on repeated builds.
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

# Python -- pip downloads wheels to ~/.cache/pip. Mounting this directory
# means "pip install" skips the download for packages already cached.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Go -- module cache lives at /go/pkg/mod. Without this mount, every time this
# layer is rebuilt (for example after a source change) all modules are downloaded again.
RUN --mount=type=cache,target=/go/pkg/mod \
    go build -o app .

Secret Mounts (Don’t Bake Secrets!)

Access secrets during build without storing them in any image layer. This is critical because anyone with docker history or access to your registry can inspect layer contents. A secret mount is like a temporary sticky note that disappears after the build step.
# syntax=docker/dockerfile:1.4
# The .npmrc file (containing your private registry token) is mounted
# only during this RUN step. It is NOT written into the image layer.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc \
    npm ci --omit=dev

# Pass the secret at build time -- it is never stored in the image
docker build --secret id=npmrc,src=.npmrc .
Common mistake: Using COPY .npmrc . or ENV NPM_TOKEN=xxx instead of secret mounts. Both approaches bake the secret into the image history permanently, even if you delete the file in a later layer. Once pushed to a registry, anyone who can pull the image can extract the secret.
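
When the token lives in an environment variable rather than a file, the same mechanism can read it from the environment. The sketch below assumes a hypothetical secret id npm_token, a variable named NPM_TOKEN, and a project .npmrc that references ${NPM_TOKEN}; none of these names come from the original example. It writes a Dockerfile.secret and only attempts the build when Docker, the token, and a lockfile are present.

```shell
#!/bin/sh
set -eu

# Hypothetical Dockerfile that reads an npm token from a BuildKit secret.
# The secret id "npm_token" and the variable NPM_TOKEN are illustrative names.
cat > Dockerfile.secret <<'EOF'
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# The secret is readable at /run/secrets/npm_token only during this RUN step.
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN="$(cat /run/secrets/npm_token)" npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
EOF

# Build only when a Docker daemon, the token, and a lockfile are available.
if docker info >/dev/null 2>&1 && [ -n "${NPM_TOKEN:-}" ] && [ -f package-lock.json ]; then
    docker build --secret id=npm_token,env=NPM_TOKEN \
        -f Dockerfile.secret -t myapp:secret .
    # The token value should not show up in the recorded layer commands.
    docker history --no-trunc myapp:secret | grep -F "$NPM_TOKEN" \
        || echo "token not found in image history"
else
    echo "skipping build: needs docker, NPM_TOKEN, and package-lock.json"
fi
```

Because the value is set only inside the RUN command's environment, it is gone when that step finishes, unlike an ENV instruction, which is stored in the image configuration.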

SSH Mounts (Clone Private Repos)

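# Forward the host's SSH agent for this RUN step only; no key is written to a layer.
# The image needs git and an SSH client, and github.com must be in known_hosts.
RUN --mount=type=ssh \
    git clone git@github.com:private/repo.git

# Build with the default SSH agent socket forwarded
docker build --ssh default .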

Image Security Scanning

Docker Scout

# Analyze local image
docker scout cves myapp:latest

# View recommendations
docker scout recommendations myapp:latest

Trivy

# Install trivy
brew install trivy  # macOS

# Scan image
trivy image myapp:latest

# Scan with severity filter
trivy image --severity HIGH,CRITICAL myapp:latest

Best Practices for Secure Images

# 1. Use specific versions
FROM node:18.19.0-alpine3.19

# 2. Run as non-root
RUN addgroup -S app && adduser -S app -G app
USER app

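# 3. Minimize attack surface - use distroless for the final stage
FROM gcr.io/distroless/nodejs18-debian12
COPY --from=builder /app /app
# The distroless Node.js entrypoint is node itself, so CMD is just the script;
# WORKDIR makes the relative path resolve to /app/server.js.
WORKDIR /app
CMD ["server.js"]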

# 4. Don't install unnecessary packages
RUN apk add --no-cache curl  # --no-cache reduces size

# 5. Use COPY instead of ADD (ADD has extra features you rarely need)
COPY package*.json ./

Distroless Images

Minimal images containing only your app and runtime dependencies. No shell, no package manager.
# Build stage
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -o server

# Runtime stage - distroless
FROM gcr.io/distroless/static-debian12
COPY --from=builder /app/server /server
CMD ["/server"]

| Base Image | Size | Attack Surface |
| --- | --- | --- |
| ubuntu:22.04 | ~77MB | High |
| alpine:3.19 | ~7MB | Medium |
| distroless/static | ~2MB | Minimal |
| scratch | 0MB | None (just your binary) |

Image Optimization Checklist

Combine RUN commands to reduce layers:
# Bad: 3 layers
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# Good: 1 layer
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
Put least-changing commands first:
# Package files change less often than code
COPY package*.json ./
RUN npm ci

# Code changes frequently - cache invalidates here
COPY . .
Keep build tools out of final image:
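FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
# The sources must be in the stage before the build script can run
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html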

Interview Questions & Answers

Q: What are multi-stage builds, and why use them?

Multi-stage builds use multiple FROM statements:
  • Build stage: Has compilers, dev dependencies
  • Runtime stage: Has only the built artifact
Benefits:
  • Smaller final image (MBs vs GBs)
  • Fewer vulnerabilities (no build tools)
  • Single Dockerfile for build and runtime

Q: How does Docker layer caching work?

Each instruction creates a layer. Docker caches layers and reuses them if:
  • The instruction hasn’t changed
  • All previous layers are cached
Cache busting: If a layer changes, all subsequent layers are rebuilt. Optimize by:
  • Putting changing content (COPY . .) last
  • Copying dependency files separately before code

Q: What is the difference between COPY and ADD?

| Feature | COPY | ADD |
| --- | --- | --- |
| Copy local files | Yes | Yes |
| Auto-extract tar | No | Yes |
| Download URLs | No | Yes |
| Preferred | Yes | No |

Best Practice: Always use COPY unless you need tar extraction.

Q: How do you reduce image size?

  1. Use Alpine/distroless base images
  2. Multi-stage builds to exclude build tools
  3. Combine RUN commands to reduce layers
  4. Clean up in the same layer (rm -rf /var/cache/*)
  5. Use .dockerignore to exclude unnecessary files
  6. Don’t install debugging tools in production

Q: What is a dangling image?

An image with no tag (shows as <none>:<none>). Causes:
  • Rebuilding with same tag (old image becomes dangling)
  • Intermediate build stages
Clean up:
docker image prune        # Remove dangling only
docker image prune -a     # Remove all unused

Q: How should you handle secrets in Docker images?

Never do this:
ENV API_KEY=secret123  # Stored in image history!
COPY .env .            # Baked into layer!
Do this instead:
  • Use BuildKit secret mounts
  • Pass at runtime: docker run -e API_KEY=secret
  • Use Docker secrets (Swarm) or Kubernetes secrets

Common Pitfalls

  1. COPY . . Before Dependencies: Invalidates cache on every code change. Copy package.json first.
  2. Not Cleaning Up in Same Layer: RUN apt-get install && rm cache saves space; separate RUN commands don’t.
  3. Secrets in Build Args: Build args are visible in image history. Use secret mounts.
  4. Using latest Base Image: Builds aren’t reproducible. Pin specific versions.
  5. Large Build Context: .dockerignore should exclude node_modules, .git, etc.
  6. Root User in Production: Security risk. Always create and use a non-root user.

Interview Deep-Dive

Strong Answer:
  • First, I would examine the Dockerfile instruction order. The most common cause of slow builds is cache invalidation too early in the layer chain. If COPY . . comes before RUN npm install, then every source code change triggers a full npm install (~2-4 minutes). Moving COPY package*.json ./ and RUN npm ci above COPY . . means npm only re-runs when dependencies actually change.
  • Second, I would check the .dockerignore file. Without one, Docker sends the entire build context to the daemon, including node_modules (potentially 300MB+), .git history, test fixtures, and IDE files. Adding these to .dockerignore can cut context transfer from minutes to seconds.
  • Third, I would make sure BuildKit is in use (it is the default since Docker Engine 23.0; older versions need DOCKER_BUILDKIT=1) and use cache mounts for npm: RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev. This persists npm’s download cache between builds, so packages that were already downloaded are not re-fetched.
  • Fourth, I would check if the CI runner is using Docker layer caching. Many CI systems (GitHub Actions, GitLab CI) discard the layer cache between runs by default. Using --cache-from with a registry-based cache (e.g., docker build --cache-from myapp:cache --build-arg BUILDKIT_INLINE_CACHE=1) can recover cached layers across CI runs.
  • Fifth, if this is a multi-stage build, I would check if independent stages are building in parallel. BuildKit builds independent stages concurrently, which can cut wall-clock time significantly for builds with separate frontend and backend stages.
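
The fourth point can be sketched as a CI step. This is a minimal sketch under assumptions: the image reference registry.example.com/myapp, the buildcache tag, and the DRY_RUN variable are hypothetical, and the script prints the command by default rather than running it.

```shell
#!/bin/sh
set -eu

# Hypothetical image reference; replace with your own registry and repository.
IMAGE="${IMAGE:-registry.example.com/myapp}"

# Assemble the command as positional parameters so quoting stays intact.
# Exporting a registry cache needs a builder that supports it
# (for example one created with `docker buildx create --driver docker-container`).
set -- docker buildx build \
    --cache-from "type=registry,ref=$IMAGE:buildcache" \
    --cache-to "type=registry,ref=$IMAGE:buildcache,mode=max" \
    --tag "$IMAGE:latest" \
    --push .

# DRY_RUN (an illustrative variable) defaults to printing the command only.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
else
    "$@"
fi
```

With mode=max the exported cache also covers intermediate stages, so a fresh CI runner can reuse the builder stage's layers and not only those of the final image.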
Follow-up: After optimizing, builds are fast when dependencies have not changed, but a new developer on the team reports builds always take 8 minutes on their machine. Why?
The new developer has no local layer cache. Their first build will always be slow. The fix is to pre-populate their cache: either by pulling a cache image from the registry (docker pull myapp:cache) before building, or by using BuildKit’s registry-based cache (--cache-from type=registry,ref=myapp:cache). Some teams also distribute a “warm cache” image in onboarding docs. Once the first build completes, subsequent builds will be fast because the cache is populated locally.
Strong Answer:
  • Docker’s layer model is a content-addressable stack. Each layer’s identity (its hash) is computed from both its own content and the identity of all layers below it. This means a layer’s hash is determined by its instruction, its inputs, and its parent layer’s hash.
  • When a layer changes (e.g., you modify a file that is COPY’d), its hash changes. Since the next layer’s hash depends on the previous layer’s hash, the next layer’s identity also changes — even if its own instruction and inputs are identical. This cascades through every subsequent layer like a chain reaction.
  • Concretely: if you have COPY . . followed by RUN npm install, and you change a source file, the COPY layer gets a new hash. The npm install layer’s cache key includes the parent layer hash, so it no longer matches the cached version, and npm install runs again from scratch — even though package.json did not change.
  • This is an intentional design choice for correctness. If Docker reused a cached layer whose parent changed, you could get inconsistent builds where the upper layer was built against different content than what is currently below it. The trade-off is build speed, which you recover by ordering instructions from least-frequently-changed to most-frequently-changed.
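
The cascade described above can be imitated with a toy model. The sketch below is not Docker's actual cache-key algorithm; it only assumes that each key is a hash of the parent key, the instruction text, and the instruction's input, which is enough to show why an unchanged RUN step still misses the cache. It relies on sha256sum being available, and the function names are made up for the illustration.

```shell
#!/bin/sh
set -eu

# cache_key PARENT INSTRUCTION INPUT -> prints a hex digest.
# Toy model only: the key depends on the parent key, so changes cascade.
cache_key() {
    printf '%s|%s|%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# chain SOURCE_CONTENT -> prints the key of the final (npm install) layer.
chain() {
    base=$(cache_key "" "FROM node:18-alpine" "")
    copy=$(cache_key "$base" "COPY . ." "$1")
    cache_key "$copy" "RUN npm install" ""
}

before=$(chain "console.log('v1')")
after=$(chain "console.log('v2')")

# The RUN instruction is identical in both builds, yet its key differs
# because the key of the parent COPY layer changed.
if [ "$before" != "$after" ]; then
    echo "npm install layer invalidated"
else
    echo "npm install layer reused"
fi
```

Feeding the same source content through chain twice yields the same final key, which corresponds to a cache hit on every layer.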
Follow-up: How does BuildKit’s cache mount differ from regular layer caching?
A cache mount (--mount=type=cache,target=/path) is orthogonal to the layer cache. It persists a directory across builds without baking it into any layer. Even when a layer is invalidated and re-executes, the cache mount still contains data from the previous run. For package managers (npm, pip, go mod), this means downloaded packages survive layer cache invalidation. The key insight is that the layer cache answers “did this instruction change?” while the cache mount answers “do I still have the downloaded artifacts?” They solve different problems and work best together.
Strong Answer:
  • The trade-off is security surface versus debuggability. scratch is a zero-byte base image — no shell, no package manager, no libc, no CA certificates, nothing. The attack surface is effectively zero because there are no tools for an attacker to use even if they achieve code execution. Alpine adds a shell (/bin/sh), a package manager (apk), musl libc, and base utilities — roughly 7MB of additional attack surface.
  • For production, I would push back on switching the runtime image and instead propose alternatives for debugging. First, ephemeral debug containers in Kubernetes (kubectl debug -it pod-name --image=nicolaka/netshoot) let you attach a debug container to a running pod’s network and process namespace without modifying the production image. Second, a multi-stage Dockerfile with a debug target: FROM alpine AS debug that includes tools, and FROM scratch AS prod that does not. CI builds the prod target; developers can build the debug target locally.
  • Third, for Go specifically, you can compile with debug symbols and use Delve remotely — the debugger runs on the developer’s machine and connects to the Go process over a port. No shell needed in the container.
  • If the team absolutely needs Alpine in production, I would add --cap-drop=ALL, run as a non-root user, and use a read-only filesystem to mitigate the increased surface area. But my strong preference is to keep production images minimal and debug through external tooling.
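
The debug and prod targets mentioned above might look like the following sketch. The file name Dockerfile.targets, the curl package in the debug stage, and the myapp image tags are assumptions for illustration; the build is attempted only when a Docker daemon and the Go sources are present.

```shell
#!/bin/sh
set -eu

# Hypothetical Dockerfile with two final targets sharing one builder stage.
cat > Dockerfile.targets <<'EOF'
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /myapp main.go

# Debug target: Alpine adds a shell and a few tools for local troubleshooting.
FROM alpine:3.19 AS debug
RUN apk add --no-cache curl
COPY --from=builder /myapp /myapp
ENTRYPOINT ["/myapp"]

# Prod target: nothing but the static binary.
FROM scratch AS prod
COPY --from=builder /myapp /myapp
ENTRYPOINT ["/myapp"]
EOF

# Build only when a Docker daemon and the Go sources are available.
if docker info >/dev/null 2>&1 && [ -f go.mod ] && [ -f go.sum ] && [ -f main.go ]; then
    docker build -f Dockerfile.targets --target prod  -t myapp:prod .
    docker build -f Dockerfile.targets --target debug -t myapp:debug .
else
    echo "skipping build: needs docker, go.mod, go.sum, and main.go"
fi
```

Both targets copy the same binary from the builder stage, so the debug image exercises exactly the artifact that ships in prod.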
Follow-up: The Go binary compiled with CGO_ENABLED=0 runs fine on scratch, but a new dependency requires CGO. What changes?
With CGO enabled, the binary normally links dynamically against libc. On scratch there is no libc and no dynamic loader, so the binary fails at startup with a cryptic error (typically a bare "no such file or directory"). The options are: build and run on Alpine (which includes musl libc), or build against glibc and run on gcr.io/distroless/base (which includes glibc). The build and runtime stages must use the same libc. Alternatively, you can statically link with musl using CC=musl-gcc CGO_ENABLED=1 go build -ldflags '-linkmode external -extldflags "-static"', which produces a static binary that still runs on scratch. The musl approach requires the musl toolchain in the build stage but keeps the runtime minimal.
