Docker Interview Questions (70+ Detailed Q&A)

Senior vs Staff: What This Section Tests at Each Level

Senior Engineer: Builds and ships production containers daily. Writes efficient multi-stage Dockerfiles, debugs networking and OOM issues, sets up CI image builds, and configures health checks. Understands layer caching, security basics (non-root, minimal images), and can troubleshoot a failing container at 3 AM.

Staff Engineer: Defines container standards and image governance for the entire organization. Owns the base image strategy (approved bases, scanning policies, digest pinning), registry architecture (pull-through caches, replication, RBAC), build infrastructure (BuildKit remote builders, cross-platform strategy), and runtime security posture (seccomp profiles, rootless enforcement, image signing with cosign). Makes decisions that affect 50+ engineers and 100+ services.

1. Fundamentals & Architecture

Answer: This is one of the most fundamental questions in Docker interviews, and the interviewer wants to see that you understand the difference in isolation boundary, not just “containers are lighter.”

| Feature | Virtual Machine | Container |
| --- | --- | --- |
| Virtualization level | Hardware (hypervisor emulates CPU, memory, I/O) | OS-level (shares host kernel directly) |
| Isolation mechanism | Hypervisor (Type 1: bare-metal like ESXi, KVM; Type 2: hosted like VirtualBox) | Linux namespaces (PID, NET, MNT, UTS, IPC, USER) + cgroups (resource limits) |
| Guest OS | Full OS per VM (kernel + userland); each VM boots its own kernel | No guest OS; containers share the host kernel and only package userland (libs + app) |
| Size | Typically 1-40 GB per VM image | Typically 5-500 MB per container image |
| Boot time | 30 seconds to several minutes | Milliseconds to a few seconds |
| Density | ~10-50 VMs per host (limited by memory for each guest OS) | ~100-1000+ containers per host |
| Security isolation | Stronger: a separate kernel per VM means kernel exploits are contained | Weaker by default: a kernel vulnerability affects all containers on the host |
The deeper story: Containers achieve their lightweight nature by leveraging two Linux kernel features. Namespaces give each container its own view of the system: its own process tree (PID namespace), network stack (NET namespace), filesystem mounts (MNT namespace), hostname (UTS namespace), and user IDs (USER namespace). Cgroups (control groups) limit and account for resource usage per container: CPU, memory, block I/O, and process count. Network bandwidth is not limited by a cgroup controller; it is typically shaped with traffic control (tc).

When VMs are still the right call: Multi-tenant environments where you cannot trust workloads (different customers on the same hardware), running different OS kernels (Windows containers on a Linux host need a VM), or regulatory requirements that mandate hardware-level isolation (some financial and healthcare compliance standards).

Real-world hybrid: In production, many companies run containers inside VMs. AWS ECS runs your Docker containers on EC2 instances (VMs). GKE runs Kubernetes pods on Compute Engine VMs. The VM provides the security boundary between tenants; containers provide the density and speed within that boundary.

What interviewers are really testing: Whether you understand that the isolation trade-off is the fundamental difference: containers sacrifice some isolation for speed and density. Senior candidates mention namespaces and cgroups by name.

Red flag answer: “Containers are lightweight VMs.” They are not. They share the host kernel, which is a fundamentally different architecture with different security implications.

Follow-up chain:
  1. “You said containers share the host kernel. What specific attack vector does that create that VMs do not have?” (A kernel exploit like Dirty Pipe (CVE-2022-0847) in the shared kernel affects every container on the host. In a VM, the kernel is per-VM so a guest kernel exploit is contained. This is why multi-tenant SaaS platforms run containers inside VMs — the VM is the trust boundary, containers are the density mechanism.)
  2. “If containers share the kernel, how can you run a Linux container on macOS or Windows?” (Not natively. Docker Desktop runs a lightweight Linux VM (using Apple’s Virtualization.framework on macOS or WSL2/Hyper-V on Windows) and runs containers inside that VM. The container still uses a Linux kernel, just one provided by the hidden VM.)
  3. “When would you choose Kata Containers or gVisor over standard runc-based containers?” (When you need stronger isolation than namespaces but lighter weight than full VMs. gVisor intercepts syscalls in userspace via a custom kernel. Kata Containers spawn a lightweight VM per pod. Both are used in multi-tenant environments — GKE Sandbox uses gVisor, AWS Fargate uses Firecracker microVMs.)
  4. “Your security team says containers are not secure enough for PCI-DSS workloads. How do you respond?” (Defense-in-depth: non-root user + read-only root filesystem + seccomp profiles + AppArmor/SELinux + USER namespace remapping + running containers inside VMs. Many PCI-DSS compliant systems run containers — the requirement is demonstrating equivalent isolation, not using VMs specifically. Reference the CIS Docker Benchmark.)
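The namespace and cgroup membership described above is visible for any Linux process under /proc. The following is a minimal sketch, assuming a Linux host with /proc mounted; the function name ns_report is made up for illustration.

```shell
# ns_report: print the namespace object this shell belongs to, one line per type.
# Each entry is a symlink such as "pid:[4026531836]"; the number identifies
# the namespace. Processes in different containers show different numbers.
ns_report() {
  for ns in pid net mnt uts ipc user; do
    printf '%-5s %s\n' "$ns" "$(readlink "/proc/self/ns/$ns" 2>/dev/null || echo unavailable)"
  done
}

ns_report

# The cgroup hierarchy this shell is accounted under (where limits are applied).
cat /proc/self/cgroup 2>/dev/null || echo "cgroup info unavailable"
```

Running the same readlink inside a container (for example `docker run --rm alpine readlink /proc/self/ns/pid`) should print a different identifier than on the host, which is the isolation boundary the answer describes.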
Answer: Docker uses a client-server architecture with several distinct components, each with a specific role in the container lifecycle. Understanding this layering is critical for debugging and for understanding why Docker can be replaced piece by piece (e.g., swapping runc for gVisor, or Docker CLI for nerdctl).
  • Daemon (dockerd): The long-running background process that manages all Docker objects (containers, images, networks, volumes). It exposes a REST API on a Unix socket (/var/run/docker.sock) or optionally over TCP. Every docker CLI command translates to an API call to this daemon.
  • Client (docker): The CLI binary. It serializes your command into an HTTP request and sends it to the daemon. The client and daemon can run on different machines — this is the basis for Docker contexts and remote management.
  • Registry: A stateless server that stores and distributes Docker images. Docker Hub is the default public registry; private options include AWS ECR, GCP Artifact Registry, Azure ACR, and self-hosted Harbor. Images are stored as layers (blobs) plus a manifest that describes how to assemble them.
  • Containerd: The high-level container runtime. It manages the complete container lifecycle — image pull/push, container creation, storage, and networking setup. Containerd is a graduated CNCF project and is used directly by Kubernetes (bypassing the Docker daemon entirely since Kubernetes 1.24).
  • Runc: The low-level OCI runtime. It is a small binary that takes a container configuration (OCI spec) and uses Linux kernel APIs (namespaces, cgroups) to actually create the isolated process. Alternatives to runc include gVisor (Google’s sandboxed runtime) and Kata Containers (lightweight VMs).
Image Pull Flow:
  1. docker pull nginx — CLI sends REST request to Daemon
  2. Daemon queries Registry for the image manifest (JSON doc listing layers and their SHA256 digests)
  3. Downloads layers in parallel, skipping any that already exist in local cache (content-addressable storage)
  4. Stores layers in /var/lib/docker/overlay2/ (on overlay2 storage driver)
Container Creation Flow:
  1. docker run — Daemon creates a container config (OCI spec JSON)
  2. Containerd prepares the filesystem — assembles the union mount from image layers + writable layer
  3. Runc creates namespaces (PID, NET, MNT, UTS, IPC, USER) and cgroup for resource limits
  4. Runc executes the entrypoint process as PID 1 inside the new namespaces
Why this layering matters: Kubernetes 1.24 removed the “dockershim.” Kubernetes now talks directly to containerd (or CRI-O), bypassing dockerd entirely. This means your production containers in K8s are not running “Docker”; they are running containerd + runc. Docker is a development tool on top of the same runtime Kubernetes uses.

What interviewers are really testing: Whether you understand that Docker is not a monolith. It is a stack of components, each replaceable. Senior candidates explain the containerd/runc split and mention that Kubernetes dropped Docker as a runtime (but still uses the same underlying tech).

Red flag answer: Describing Docker as a single program that “runs containers.” Not knowing that containerd and runc exist, or confusing the Docker daemon with the container runtime.

Follow-up:
  1. “Kubernetes removed dockershim in 1.24. Does that mean Docker images stop working in Kubernetes?” (No. Docker images are OCI-compliant. Kubernetes removed the Docker daemon dependency, not the image format. containerd pulls and runs the same images.)
  2. “If the Docker daemon crashes, do running containers die?” (By default, containers are shut down when the daemon terminates. With the live-restore option enabled ("live-restore": true in daemon.json, or dockerd --live-restore), containers keep running while dockerd is stopped or restarted. This is critical for production upgrades.)
  3. “When would you replace runc with an alternative runtime like gVisor?” (Multi-tenant environments where you need stronger isolation than Linux namespaces — gVisor intercepts syscalls in userspace, adding a security boundary without the overhead of full VMs.)
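Because the CLI is only an HTTP client for the daemon, the same API can be called directly. The helper below is a sketch, assuming the default socket path /var/run/docker.sock, curl with Unix-socket support, and a user allowed to read the socket; docker_api is a hypothetical name.

```shell
# docker_api PATH: send a GET request to the local Docker Engine API
# over the Unix socket, the same channel the docker CLI uses.
docker_api() {
  curl --silent --show-error \
       --unix-socket /var/run/docker.sock \
       "http://localhost$1"
}
```

With a daemon running, `docker_api /version` returns roughly what `docker version` reports, and `docker_api /containers/json` corresponds to `docker ps`, both as JSON.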
Answer: Docker images are built as a stack of read-only layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer; other instructions such as ENV, CMD, or LABEL only add image metadata. These layers are stored and managed using a union filesystem (OverlayFS on modern Linux, formerly AUFS).

How layers work internally:
  1. Each layer is a filesystem diff — it contains only the files that changed from the previous layer.
  2. When you docker pull, layers are downloaded independently and cached by their content hash (SHA256). If two images share the same base layer (e.g., both use node:18-alpine), that layer is stored only once on disk.
  3. When a container starts, Docker stacks all image layers (read-only) and adds a thin writable layer (also called the container layer) on top.
Copy-on-Write (CoW): If a running container modifies a file that exists in a read-only image layer, the file is first copied up to the writable layer, then modified there. The original image layer is untouched. This means 100 containers from the same image share all read-only layers, and only their individual writes consume additional disk space.

Practical implications:
  • Layer ordering matters for cache: Docker caches layers by instruction. If you change line 5 of a Dockerfile, the layers built from lines 1-4 come from cache, but line 5 and everything after it are rebuilt. This is why you COPY package.json and RUN npm install before COPY . . so that dependency installation stays cached when only source code changes.
  • Layer count affects pull time: Each layer is a separate download. Combining RUN commands with && reduces layer count and total image size (intermediate files created and deleted in the same RUN are never stored).
  • Size debugging: docker history <image> shows each layer’s size, which helps identify bloated layers.
What interviewers are really testing: Whether you understand that images are not monolithic blobs. They are composable, cacheable layer stacks. This understanding directly impacts how you write efficient Dockerfiles.

Red flag answer: Not knowing that Dockerfile instructions create layers, or not understanding why layer ordering affects build speed.

Follow-up chain:
  1. “You mentioned Copy-on-Write. A container writes 1 byte to a 500MB file that exists in the base image layer. How much additional disk space does this consume?” (The entire 500MB file is copied to the writable layer, then the 1 byte is modified. CoW operates at the file level on overlay2, not at the block level. This is why writing to large files inside containers is expensive — and why databases should always use volumes, not the container’s writable layer.)
  2. “How does docker history differ from docker image inspect for debugging layer sizes?” (docker history shows the instruction that created each layer and that layer’s uncompressed size on disk. docker image inspect shows the full config and the layer digests, but no per-layer sizes. For a file-level breakdown, use dive, an open-source tool that shows file-level changes per layer and detects wasted space.)
  3. “Two teams both use FROM node:18-alpine. Team A’s image is 400MB, Team B’s is 180MB. Same base, same app framework. What is the most likely cause?” (Team A is probably installing dev dependencies (npm install instead of npm ci --omit=dev, formerly --production), copying test fixtures or documentation into the image, or installing packages with apk add and leaving the package cache behind (no --no-cache). The base image is shared but everything above it differs.)
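The cache-ordering rule from the practical implications can be sketched as a Dockerfile. The snippet writes one to a temporary directory; it assumes a hypothetical Node.js app with package.json, package-lock.json, and server.js, none of which come from the original text.

```shell
demo_dir="$(mktemp -d)"

cat > "$demo_dir/Dockerfile" <<'EOF'
# Assumes a Node.js app (package.json, package-lock.json, server.js).
FROM node:18-alpine
WORKDIR /app
# 1. Dependency manifests change rarely: copy them first.
COPY package.json package-lock.json ./
# 2. This layer is reused from cache until a manifest changes.
RUN npm ci --omit=dev
# 3. Source changes often: copy it last so edits rebuild only from here.
COPY . .
CMD ["node", "server.js"]
EOF

echo "Wrote $demo_dir/Dockerfile"
# To build (needs the app files in that directory and a Docker daemon):
#   docker build -t cache-demo "$demo_dir"
# To see per-layer sizes afterwards:
#   docker history cache-demo
```

Swapping steps 1-2 with step 3 would still build, but every source edit would then invalidate the dependency layer and rerun the install.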
Answer: Both instructions copy files into the image, but they have different capabilities and the best practice is to default to COPY unless you specifically need ADD’s extra features.
  • COPY: Copies files or directories from the build context (your local filesystem) into the image. Does exactly what it says — nothing more. Predictable and transparent.
    COPY package.json /app/
    COPY src/ /app/src/
    
  • ADD: Does everything COPY does, plus two extra behaviors:
    1. Auto-extracts compressed archives: ADD app.tar.gz /app/ automatically extracts the tarball into /app/. Supports tar, gzip, bzip2, and xz.
    2. Downloads from URLs: ADD https://example.com/file.txt /app/ fetches the file. However, this is discouraged because the resulting layer cannot be cached reliably, and plain ADD does not verify a checksum (newer Dockerfile syntax versions add an ADD --checksum flag).

Why COPY is preferred: Docker’s own best practices documentation recommends COPY because its behavior is explicit and predictable. ADD’s implicit extraction can cause surprises: if you ADD archive.tar.gz /data/ intending to place the archive file itself, you get extracted contents instead. For URL downloads, RUN curl -O gives you more control (you can verify checksums, set permissions, and clean up in the same layer).

When ADD is appropriate: When you specifically want auto-extraction of a local tarball into the image. This is the one legitimate use case.

What interviewers are really testing: Whether you follow Docker best practices and understand the principle of least surprise in Dockerfile instructions.

Follow-up chain:
  1. “A developer uses ADD https://example.com/config.json /app/config.json in a Dockerfile. Beyond the caching issue, what security concern does this create?” (No checksum verification: if the URL is compromised or MITM’d, you bake a malicious file into the image with no audit trail. With RUN curl, you can verify a SHA256 checksum inline: curl -o file.tar.gz URL && echo "expected_sha256  file.tar.gz" | sha256sum -c -. Note the two spaces between the digest and the filename, which is the line format sha256sum -c expects.)
  2. “Does COPY --from=builder /app/dist /html work for copying between multi-stage build stages? Is it COPY or ADD?” (Only COPY supports --from. This is another reason to default to COPY — it has capabilities that ADD does not in the multi-stage context.)
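The inline checksum check mentioned in the first follow-up can be tried without Docker. This sketch simulates a downloaded file locally; the file name and the way the pinned digest is obtained are illustrative assumptions (in a real Dockerfile the digest would be hard-coded next to the URL).

```shell
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Stand-in for a file fetched with curl.
printf 'release contents\n' > artifact.tar.gz

# Simulates the digest recorded when the artifact was published.
pinned_sha256="$(sha256sum artifact.tar.gz | cut -d' ' -f1)"

verify_artifact() {
  # sha256sum -c expects "<digest><two spaces><filename>".
  echo "$pinned_sha256  artifact.tar.gz" | sha256sum -c - >/dev/null 2>&1
}

if verify_artifact; then first_check=ok; else first_check=mismatch; fi

# Simulate a compromised download.
printf 'tampered\n' >> artifact.tar.gz
if verify_artifact; then second_check=ok; else second_check=mismatch; fi

echo "before tampering: $first_check, after tampering: $second_check"
```

In a RUN instruction the same check is chained with &&, so a mismatch fails the build instead of producing an image with an unverified file.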
Answer: These two instructions work together to define what runs when a container starts, but they serve different roles and interact in specific ways.
  • ENTRYPOINT: Defines the executable that always runs. It is the fixed part of the command. To override it, you must use docker run --entrypoint. In practice, this means ENTRYPOINT defines what program runs.
  • CMD: Provides default arguments to the ENTRYPOINT. These are easily overridden by appending arguments to docker run. CMD defines how the program runs by default.
The interaction pattern:
ENTRYPOINT ["python"]
CMD ["app.py"]
  • docker run myimage executes python app.py
  • docker run myimage test.py executes python test.py (CMD overridden)
  • docker run --entrypoint bash myimage executes bash (ENTRYPOINT overridden)
Shell form vs Exec form: Always use the exec form (JSON array syntax) for both. The shell form (CMD npm start) wraps your command in /bin/sh -c, which means your process runs as a child of the shell. This causes PID 1 signal-handling issues — SIGTERM from docker stop goes to the shell, not your app, leading to a 10-second hard kill instead of graceful shutdown.
# BAD: Shell form -- signals don't reach node process
CMD npm start

# GOOD: Exec form -- node process is PID 1 and receives signals
CMD ["node", "server.js"]
Common production patterns:
  • Web server: ENTRYPOINT ["node"] + CMD ["server.js"] — lets you run docker run myimage --inspect server.js for debugging
  • CLI tool: ENTRYPOINT ["aws"] + CMD ["help"] — container acts like the AWS CLI itself
  • Only CMD: Many images skip ENTRYPOINT entirely and use only CMD. This is fine for simple cases where you want the entire command to be easily overridden.
What interviewers are really testing: Whether you understand the exec vs shell form distinction and can explain signal handling implications. The PID 1 issue is a real production concern that separates candidates who have actually debugged container shutdown behavior.

Red flag answer: “ENTRYPOINT cannot be overridden.” It can, with --entrypoint. Also, not knowing the shell form vs exec form distinction.

Follow-up chain:
  1. “Your container takes exactly 10 seconds to stop with docker stop. What is happening?” (The process is not handling SIGTERM. Docker sends SIGTERM, waits the grace period (default 10s), then sends SIGKILL. This is the shell form vs exec form issue — if CMD uses shell form, SIGTERM goes to /bin/sh, not your app. Or the app simply has no SIGTERM handler.)
  2. “Can you have both ENTRYPOINT and CMD specified, where ENTRYPOINT comes from the base image and CMD from your Dockerfile?” (Yes. This is a common pattern — the base image sets the ENTRYPOINT and your Dockerfile overrides CMD to provide different default arguments. docker inspect shows both, and you can trace which layer set each with docker history.)
  3. “What happens if you specify both CMD and ENTRYPOINT in shell form?” (This is a trap. When ENTRYPOINT is written in shell form, CMD is ignored entirely, as are any arguments passed to docker run. Only the exec form (JSON array) of ENTRYPOINT allows CMD to append arguments. A shell-form ENTRYPOINT runs as /bin/sh -c <entrypoint>, and CMD never gets invoked.)
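The graceful-shutdown behavior discussed in the follow-ups depends on the main process handling SIGTERM. The sketch below imitates that outside Docker: a background shell function stands in for the application, and kill -TERM stands in for the first signal docker stop sends. The function and variable names are illustrative.

```shell
log_file="$(mktemp)"

# Stand-in for an application that installs its own SIGTERM handler.
graceful_app() {
  trap 'echo "SIGTERM received, cleaning up"; exit 0' TERM
  while :; do sleep 1; done
}

graceful_app > "$log_file" &
app_pid=$!

sleep 1                    # give the handler time to install
kill -TERM "$app_pid"      # what docker stop sends before the grace period
wait "$app_pid"
app_status=$?

echo "exit status: $app_status"
cat "$log_file"
```

A process without such a handler running as PID 1 in a container ignores SIGTERM by default, which is why the stop only completes when SIGKILL arrives after the grace period.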
Answer: When you type docker run nginx, a surprisingly complex chain of events fires in rapid succession. Understanding this sequence is essential for debugging containers that fail to start or behave unexpectedly.

The complete sequence:
  1. CLI parses the command — The Docker client validates flags, serializes the request, and sends it to the Docker daemon via the REST API (POST /containers/create followed by POST /containers/{id}/start).
  2. Image resolution — The daemon checks if nginx:latest (or whatever tag) exists in the local image cache (/var/lib/docker/overlay2/). If not, it pulls from the configured registry (Docker Hub by default), downloading the manifest and then each layer in parallel.
  3. Container config creation — The daemon generates an OCI runtime specification: a JSON document defining the root filesystem, environment variables, namespace configuration, cgroup limits, mount points, and the entrypoint command.
  4. Filesystem assembly — Containerd creates a union mount: all image layers stacked read-only, with a thin writable (copy-on-write) layer on top. This is the container’s root filesystem.
  5. Network setup — Docker creates a veth pair (virtual ethernet cable). One end goes inside the container’s network namespace as eth0, the other attaches to the bridge network (docker0 or a user-defined bridge). The embedded IPAM assigns an IP address. If -p was specified, iptables DNAT rules are added for port forwarding.
  6. Namespace creation — Runc creates Linux namespaces: PID (isolated process tree), NET (isolated network stack), MNT (isolated filesystem), UTS (isolated hostname), IPC (isolated shared memory), and optionally USER (UID remapping).
  7. Cgroup setup — Runc creates a cgroup for the container and applies resource limits (--memory, --cpus, --pids-limit).
  8. Process execution — Runc executes the entrypoint/CMD as PID 1 inside the new namespaces. The container is now running.
Time breakdown: Steps 1-8 take milliseconds to a few seconds (excluding image pull). This is why containers start orders of magnitude faster than VMs: there is no kernel to boot and no init system to start.

What interviewers are really testing: Whether you can trace the full lifecycle from CLI to kernel, not just parrot “it starts a container.” Strong candidates mention namespaces, cgroups, the veth pair, and the union filesystem. This question also tests whether you can debug startup failures, because knowing the sequence tells you where to look.

Red flag answer: Only listing 3-4 high-level steps without any mention of namespaces, the writable layer, or network setup. Saying “Docker creates a VM” is an immediate red flag.

Follow-up:
  1. “Your docker run hangs for 45 seconds before starting. The image is already cached locally. What could cause this?” (With the image cached, docker run does not contact the registry unless --pull=always is set, so first check whether a forced pull is timing out on DNS or an unreachable registry. Otherwise look at the host: a slow storage driver, an exhausted IP pool on the bridge network, or iptables lock contention.)
  2. “What is the difference between docker create and docker run?” (docker create does steps 1-4 only — creates the container but does not start it. docker start then does steps 5-8. docker run is create + start combined.)
  3. “If you run docker run --rm, at what point is the container removed?” (After the main process exits and the container stops. The --rm flag registers a cleanup hook that removes the container and its writable layer upon exit.)
Answer:
  • -d (Detached): Runs the container in the background. The container starts and your terminal is immediately returned. Use this for long-running services (web servers, databases). You interact with it via docker logs, docker exec, or docker attach.
  • -it (Interactive + TTY): Two flags combined. -i keeps STDIN open (you can type input), -t allocates a pseudo-TTY (gives you a formatted terminal). Together, they give you an interactive shell session inside the container. Use this for debugging, running one-off commands, or exploring an image’s filesystem.
Practical usage:
# Run a web server in the background
docker run -d -p 8080:80 nginx

# Get a shell inside an Alpine image for debugging
docker run -it alpine /bin/sh

# Open a shell in an already-running container for debugging
docker exec -it <container_id> /bin/bash
The subtlety: docker run -d followed by docker exec -it is the standard production debugging workflow. You never run production containers in interactive mode; they always run detached. Interactive mode is a development and troubleshooting tool.

What interviewers are really testing: Whether you know the practical debugging workflow and understand that -i and -t are separate flags with distinct purposes. Bonus points for explaining when you would use -i without -t (piping data into a container) or -t without -i (getting formatted output without needing input).

Red flag answer: Confusing -d with backgrounding a process inside the container (like CMD node server.js &). Detached mode runs the container in the background; the process inside still runs in the foreground of its namespace.

Follow-up:
  1. “You run docker run -d myapp but the container exits immediately. docker ps shows nothing. How do you debug?” (Use docker ps -a to see stopped containers, then docker logs <id> to see what the process printed before exiting. Check docker inspect -f '{{.State.ExitCode}}' <id> for the exit code.)
  2. “What is the difference between docker attach and docker exec -it bash?” (attach connects to PID 1’s STDIN/STDOUT — if you press Ctrl+C, you send SIGINT to the main process and may stop the container. exec spawns a new process — exiting the exec shell does not affect the main container process.)
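The debugging steps from the first follow-up can be bundled into a small helper. This is a sketch; the function name debug_exited is made up, and it assumes the container is identified by name.

```shell
# debug_exited NAME: summarize why a stopped container exited.
debug_exited() {
  # Stopped containers only appear with -a.
  docker ps -a --filter "name=$1" --format '{{.Names}}\t{{.Status}}'
  # Exit code plus whether the kernel OOM killer was involved.
  docker inspect -f 'exit code: {{.State.ExitCode}}, OOM-killed: {{.State.OOMKilled}}' "$1"
  # Last output of the main process before it exited.
  docker logs --tail 20 "$1"
}
```

For example, `debug_exited myapp`. An exit code of 137 means the process was killed with SIGKILL (128 + 9), which together with OOM-killed: true points at a memory limit.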
Answer: A Docker context is a named configuration that tells the Docker CLI which Docker daemon to communicate with. By default, the CLI talks to the local daemon via a Unix socket (/var/run/docker.sock), but contexts let you switch targets seamlessly.

Use cases:
  • Remote server management: docker context create prod --docker "host=ssh://user@prod-server" lets you run docker ps against your production host without SSH-ing in.
  • Minikube/Kind: Switch between local Kubernetes clusters and the default Docker daemon.
  • Docker Desktop vs Colima: On macOS, switch between different container runtimes.
docker context ls                          # List all contexts
docker context create staging --docker "host=ssh://deploy@staging.example.com"
docker context use staging                 # All subsequent commands target staging
docker ps                                  # Shows containers on staging server
docker context use default                 # Switch back to local
Production relevance: Contexts replace the older pattern of setting DOCKER_HOST environment variables, which was error-prone (forgetting you had it set could lead to accidentally running commands against production).

What interviewers are really testing: Whether you have managed Docker across multiple environments and understand the operational tooling beyond basic docker run. This question separates developers who only run Docker locally from those who manage remote Docker hosts.

Red flag answer: Not knowing Docker contexts exist, or confusing Docker contexts with Kubernetes contexts (kubectl config use-context). They are similar concepts but for different tools.

Follow-up:
  1. “You accidentally ran docker rm -f $(docker ps -q) while your context was set to production. How do you prevent this in the future?” (Use context naming conventions with color-coded terminal prompts, require confirmation for destructive operations on production contexts, or use read-only socket proxies for production.)
  2. “How do Docker contexts differ from Kubernetes contexts, and can you use both simultaneously?” (Docker contexts switch the Docker daemon target; K8s contexts switch the cluster/namespace. They are independent — you can have Docker pointing at production while kubectl points at staging.)
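One way to implement the "require confirmation on production contexts" idea from the first follow-up is a thin wrapper around the CLI. This is only a sketch: the convention that production context names contain "prod", the ALLOW_PROD variable, and the short list of blocked subcommands are all assumptions.

```shell
# safe_docker ARGS...: run docker, but refuse a few destructive subcommands
# when the active context name contains "prod", unless ALLOW_PROD=1 is set.
safe_docker() {
  active_context="$(docker context show 2>/dev/null || echo unknown)"
  case "$active_context" in
    *prod*)
      case "$1" in
        rm|rmi|kill|stop)   # non-exhaustive list, for illustration
          if [ "${ALLOW_PROD:-0}" != "1" ]; then
            echo "refusing 'docker $1' on context '$active_context' (set ALLOW_PROD=1 to override)" >&2
            return 1
          fi
          ;;
      esac
      ;;
  esac
  docker "$@"
}
```

Aliasing docker to such a wrapper in an interactive shell turns a silent mistake into an explicit opt-in, for example `ALLOW_PROD=1 safe_docker rm -f old-container`.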
Answer: The class/object analogy is the simplest way to think about it, but the reality is more nuanced:
  • Image: A read-only, layered filesystem template. It contains everything needed to run an application — OS libraries, runtime, application code, and configuration. Images are immutable and identified by a content-addressable hash (SHA256). You can think of it as a compiled binary that captures an entire runtime environment.
  • Container: A running (or stopped) instance of an image. It adds a thin writable layer on top of the image layers (using Copy-on-Write), plus an isolated process space with its own PID namespace, network stack, and filesystem view. Multiple containers can share the same image layers, each with their own writable layer.
Key distinction: An image is a build artifact; a container is a runtime entity. You docker build an image, docker push it to a registry, and docker run it to create a container. Images are portable and reproducible; containers are ephemeral and disposable.

Practical implication: This is why you should never store important data in a container’s writable layer. When the container is removed, that layer is gone. Use volumes for persistent data. The container should be cattle (replaceable), not a pet (irreplaceable).

What interviewers are really testing: Whether you understand the immutability model: images are immutable artifacts, containers are ephemeral processes. This mental model drives correct decisions about data persistence, deployment strategies, and debugging approaches.

Red flag answer: “An image is a stopped container” or “You save a container to create an image.” While docker commit exists, using it is an anti-pattern. Images should be built from Dockerfiles for reproducibility, not by capturing container state.

Follow-up:
  1. “Can you modify a running container’s filesystem and then create a new image from it? Should you?” (Yes, docker commit does this. No, you should not — it creates unreproducible images with no audit trail. Always use Dockerfiles.)
  2. “Two containers are running from the same image. Container A writes a file. Can Container B see it?” (No. Each container has its own writable layer. The image layers are shared and read-only. To share data between containers, use a shared volume.)
Answer: With the rise of ARM-based hardware (AWS Graviton servers, Apple M-series developer machines), building images for multiple CPU architectures is now a production requirement, not a niche concern.

docker buildx is the tool that makes this possible. It extends the standard docker build with multi-platform support.

How it works:
  1. You create a buildx builder instance: docker buildx create --use
  2. Build for multiple platforms in one command: docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .
  3. Docker uses QEMU emulation to build for architectures different from your host. An amd64 machine can build arm64 images (slowly) via QEMU user-mode emulation.
  4. The result is a manifest list (also called a multi-arch manifest) — a single tag (myapp:latest) that points to multiple platform-specific images. When a user pulls the image, Docker automatically selects the correct variant for their architecture.
Performance note: QEMU emulation is 5-10x slower than native builds. For CI pipelines building large images, use native ARM runners (GitHub Actions and GitLab both offer hosted arm64 Linux runners) or cross-compilation in your build stage (with Go, setting GOARCH=arm64 cross-compiles on the build host without emulation).

Why this matters: AWS positions Graviton instances as offering up to 40% better price-performance than comparable x86 instances. If your images only support amd64, you cannot take advantage of this cost saving. Multi-arch builds are table stakes for modern cloud deployments.
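The automatic variant selection in step 4 comes down to matching the host's architecture against the os/arch entries in the manifest list. The mapping below is a simplified sketch covering only a few common architectures; platform_for_machine is a hypothetical helper, not part of Docker.

```shell
# platform_for_machine MACHINE: translate `uname -m` output into the
# os/arch string used with --platform and listed in a manifest list.
platform_for_machine() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    armv7l)        echo "linux/arm/v7" ;;
    *)             echo "unknown" ;;
  esac
}

echo "This host would pull the $(platform_for_machine "$(uname -m)") variant"
```

To see which platforms a pushed tag actually contains, `docker buildx imagetools inspect myapp:latest` lists the entries of its manifest list.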

2. Networking & Storage

Answer: Docker ships several network drivers. The five you are most likely to be asked about each serve a different isolation and connectivity model:

| Driver | Scope | Use Case | Performance | Isolation |
| --- | --- | --- | --- | --- |
| Bridge (default) | Single host | Standard container-to-container communication | Good (slight NAT overhead) | Containers are on a private subnet |
| Host | Single host | Performance-critical apps (eliminate NAT overhead) | Best (no network translation) | None; container shares the host’s network stack entirely |
| None | Single host | Containers that need no network (batch jobs, security-sensitive compute) | N/A | Complete network isolation |
| Overlay | Multi-host | Swarm services or multi-host container communication | Moderate (VXLAN encapsulation overhead) | Private cross-host network; traffic encryption is optional (--opt encrypted) |
| Macvlan | Single host | Containers that need to appear as physical devices on the LAN (legacy app integration) | Excellent (no NAT, no bridge) | Each container gets a real MAC and IP on the physical network |
When to use each in practice: Bridge is the default for 90% of single-host workloads. Host mode is for performance-critical services where NAT overhead matters. Overlay is for multi-node Swarm deployments. Macvlan is rare but useful for legacy apps that expect real network interfaces.

What interviewers are really testing: Whether you can recommend the right network driver for a given scenario, not just list them.

Follow-up chain:
  1. “Your application needs the absolute lowest possible network latency. You are currently using bridge networking. What do you change and what do you lose?” (Switch to --network host. You eliminate NAT/bridge overhead, but lose port isolation — two containers cannot both bind to port 80. You also lose the embedded DNS service for container name resolution. Measure the actual improvement; for most apps, the bridge overhead is <0.1ms.)
  2. “You need containers across 3 hosts to communicate without Kubernetes. What network driver do you use and how?” (Overlay network with Docker Swarm. docker swarm init on one host, join the others, create an overlay: docker network create -d overlay mynet. Under the hood, VXLAN encapsulates L2 frames in UDP for cross-host communication. Alternative: a third-party network plugin such as Weave Net, which does not require Swarm.)
  3. “A legacy application expects to have its own MAC address and appear directly on the physical LAN. Can Docker do this?” (Yes — Macvlan driver. Each container gets a unique MAC and IP on the physical network. The trade-off: the host cannot communicate with its own macvlan containers without a separate macvlan sub-interface, and you need the physical switch to accept multiple MACs per port.)
What weak candidates say: Memorize the five drivers as a list without understanding when or why to pick each one. Cannot explain what VXLAN is or why overlay has performance overhead.
What strong candidates say: “The way I think about it is: bridge is your default for single-host, overlay for multi-host Swarm, host for latency-sensitive workloads where you trade isolation for speed, and macvlan for the rare legacy case. In practice, 95% of my containers run on user-defined bridges.”
Answer:
Understanding bridge networking internals is what separates someone who uses Docker from someone who can debug Docker networking issues at 2 AM.
Step by step, what Docker creates:
  1. docker0 bridge: When Docker starts, it creates a virtual Ethernet bridge called docker0 (or a custom name for user-defined networks). Think of this as a virtual network switch that all containers on this network plug into.
  2. veth pair: For each container, Docker creates a virtual Ethernet pair — a “virtual cable” with two ends. One end (eth0) is placed inside the container’s network namespace. The other end (vethXXXXX) is attached to the docker0 bridge on the host.
  3. IP allocation: Docker’s built-in IPAM (IP Address Management) assigns each container an IP from the bridge’s subnet (default: 172.17.0.0/16).
  4. iptables NAT: For outbound traffic, Docker adds masquerade rules so container traffic appears to come from the host’s IP. For inbound traffic with port mapping (-p 8080:80), Docker adds DNAT rules that forward packets from the host port to the container IP.
Why user-defined bridges are better than the default: The default docker0 bridge does not provide DNS resolution between containers, so you must use IP addresses or the deprecated --link flag. User-defined bridges (docker network create mynet) include an embedded DNS server that resolves container names to IPs automatically, which is essential for service discovery.
Follow-up chain:
  1. “You run iptables -L -t nat on the host and see dozens of DNAT rules from Docker port mappings. A new team member is confused about where these come from. Explain.” (Every -p host:container flag creates two iptables rules: a DNAT rule in the nat table that rewrites incoming packets’ destination IP/port to the container’s IP/port, and a masquerade rule for outbound traffic. docker-proxy also listens on the host port as a userspace fallback for hairpin NAT. This is why Docker requires root or NET_ADMIN capability.)
  2. “Two containers on the same bridge can communicate. Can a container on bridge A reach a container on bridge B?” (No, not by default. Each bridge is an isolated L2 domain. To allow it: either connect a container to both networks (docker network connect bridgeB mycontainer) or route between bridges at the host level. The isolation is the entire point of separate networks.)
  3. “How does Docker handle DNS for containers on a user-defined bridge?” (Docker runs an embedded DNS server at 127.0.0.11 inside each container. It resolves container names and network aliases to their internal IPs. For external DNS, it forwards to the host’s configured DNS servers. The 127.0.0.11 address is intercepted by iptables rules in the container’s network namespace.)
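The IP allocation step above can be sketched with Python's standard ipaddress module. This is a simplified illustration, not Docker's IPAM code: the function name allocate_addresses is hypothetical, and it assumes the bridge takes the first usable address as its gateway and containers receive the following addresses in order.

```python
import ipaddress


def allocate_addresses(subnet: str, count: int) -> dict:
    """Hand out addresses sequentially, reserving the first host for the bridge gateway."""
    network = ipaddress.ip_network(subnet)
    hosts = iter(network.hosts())   # skips the network and broadcast addresses
    gateway = next(hosts)           # assumption: the bridge owns the first usable address
    containers = [next(hosts) for _ in range(count)]
    return {
        "gateway": str(gateway),
        "containers": [str(ip) for ip in containers],
        "usable_hosts": network.num_addresses - 2,
    }


plan = allocate_addresses("172.17.0.0/16", 3)
print(plan)
```

On a real host, docker network inspect bridge reports the subnet and gateway actually in use, which may differ from this sketch if the default address pool has been reconfigured.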
Answer:
  • Same user-defined bridge network: Containers can reach each other by container name (DNS resolution is automatic) or by IP address. This is the recommended approach. Example: a Node.js app connects to mongodb://mongo:27017 where mongo is the container name on the same network.
  • Same default bridge: Containers can communicate by IP address only. No DNS resolution. You would need the deprecated --link flag for name resolution — avoid this.
  • Different bridge networks: Containers cannot communicate by default. Network isolation is the whole point. To allow it, connect a container to multiple networks: docker network connect network2 mycontainer.
  • Host network: Containers on the host network communicate via localhost like regular processes.
Production pattern: Create a dedicated network per application stack. Your API, database, and cache share one network. A separate monitoring stack uses its own network. This provides network-level isolation between unrelated services without any firewall rules.
docker network create app-net
docker run -d --name api --network app-net myapi
docker run -d --name db --network app-net postgres
# api can reach db at hostname "db" -- no IP addresses needed
Answer:
Port mapping connects the outside world to your containerized application. The syntax is -p [host_ip:]host_port:container_port[/protocol].
Common patterns:
-p 8080:80            # Map host 8080 to container 80 (all interfaces)
-p 127.0.0.1:8080:80  # Bind to localhost only (not accessible from network)
-p 8080:80/udp        # UDP instead of TCP
-p 8080-8090:80-90    # Port range mapping
EXPOSE vs -p: The EXPOSE instruction in a Dockerfile is documentation only and does not actually publish the port. The -p flag at runtime is what actually creates the port mapping. This distinction confuses many beginners.
Security note: -p 8080:80 binds to 0.0.0.0 by default, meaning the port is accessible from any network interface, including public IPs. In production, either use -p 127.0.0.1:8080:80 to bind to localhost only, or place containers behind a reverse proxy and do not publish ports directly.
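To make the -p syntax concrete, here is a small Python sketch that splits a publish spec into its parts. It is a simplified illustration, not Docker's parser: parse_publish_spec and PortMapping are hypothetical names, and the sketch ignores IPv6 host addresses and the container-port-only form.

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_ip: str
    host_port: str
    container_port: str
    protocol: str


def parse_publish_spec(spec: str) -> PortMapping:
    """Parse a simplified '-p' value: [host_ip:]host_port:container_port[/protocol]."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 2:
        host_ip = "0.0.0.0"   # no host IP given: bind on all interfaces
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    return PortMapping(host_ip, host_port, container_port, protocol or "tcp")


for example in ("8080:80", "127.0.0.1:8080:80", "8080:80/udp", "8080-8090:80-90"):
    print(example, "->", parse_publish_spec(example))
```

The first example makes the security note visible: omitting the host IP yields 0.0.0.0, so the port is reachable on every interface.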
Answer:Both allow containers to persist data beyond the container lifecycle, but they differ in management and use case:
| Feature | Volume | Bind Mount |
| --- | --- | --- |
| Managed by | Docker engine (/var/lib/docker/volumes/) | Host filesystem (any path) |
| Portability | Can be shared across containers, backed up with docker volume commands | Tied to host directory structure |
| Performance | Optimized by Docker (especially on macOS/Windows where Docker uses a VM) | Native filesystem speed on Linux; can be slow on Docker Desktop |
| Best for | Production data persistence (databases, uploads) | Development workflows (live code reloading) |
Production recommendation: Always use named volumes for databases and persistent state. Bind mounts are a development tool.
# Named volume (production)
docker run -d -v pgdata:/var/lib/postgresql/data postgres

# Bind mount (development -- hot reload)
docker run -d -v $(pwd)/src:/app/src myapp-dev
Docker Desktop performance gotcha: On macOS and Windows, bind mounts go through a filesystem-sharing layer that adds latency. A Node.js project with 50,000 files in node_modules can see operations take 5-10x longer in a bind-mounted container vs. a volume. The fix: mount node_modules as a separate anonymous volume so it stays in the Linux VM.
Follow-up chain:
  1. “Your Postgres container’s data volume is 200GB. You need to migrate it to a new host. What is your approach?” (Use pg_dump/pg_restore for a logical backup (portable, can change schema), not a filesystem copy. For large volumes, pg_basebackup with streaming replication is faster. Never docker cp or tar a running database’s volume — you risk inconsistent state.)
  2. “A developer says ‘I deleted a file in my container but the image is still the same size.’ Explain why.” (Deleting a file in the writable layer does not affect the read-only image layers. The deletion is recorded as a “whiteout” file in the writable layer. The original file still exists in its image layer. This is Union FS semantics — layers are additive, never modified.)
  3. “When would you use a volume driver other than the default local driver?” (Remote storage backends: rexray/ebs for AWS EBS volumes, netapp for NFS, portworx for distributed storage across nodes. In Swarm or when containers move between hosts, volumes need to follow them. Kubernetes handles this with PersistentVolumes and CSI drivers instead.)
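The whiteout behaviour from the second follow-up can be modelled in a few lines of Python. This is a toy model of union filesystem semantics, assuming each layer is just a mapping of path to size; the names WHITEOUT, merged_view, and stored_size are hypothetical and do not correspond to any Docker API.

```python
WHITEOUT = object()  # marker standing in for an overlay "whiteout" entry


def merged_view(layers):
    """Apply layers bottom-to-top and return the files visible in the container."""
    view = {}
    for layer in layers:
        for path, size in layer.items():
            if size is WHITEOUT:
                view.pop(path, None)  # hide the file; the lower layer is untouched
            else:
                view[path] = size
    return view


def stored_size(layers):
    """Bytes kept on disk: every layer still stores the files it added."""
    return sum(size for layer in layers for size in layer.values() if size is not WHITEOUT)


layers = [
    {"/app/server": 20_000_000, "/var/cache/big.tar": 500_000_000},  # layer 1 adds both files
    {"/var/cache/big.tar": WHITEOUT},                                 # layer 2 "deletes" one
]

print(merged_view(layers))   # only /app/server is visible
print(stored_size(layers))   # the deleted file still counts toward stored bytes
```

The visible filesystem shrinks while the stored bytes do not, which is why the developer's image stays the same size.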
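Answer:
A tmpfs mount stores data in the host machine’s memory instead of the container’s writable layer or a volume, so nothing is written to persistent storage (on a host with swap enabled, pages can still be swapped out). When the container stops, the tmpfs mount is removed and the data is gone.
When to use it: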
  • Secrets and sensitive data: Temporary credentials or encryption keys that should never touch persistent storage.
  • High-speed scratch space: Intermediate computation results or caches that benefit from memory-speed I/O and do not need to survive a restart.
  • Security compliance: Some compliance frameworks require that certain data never be written to a persistent filesystem.
docker run -d --tmpfs /tmp:rw,noexec,nosuid,size=100m myapp
Answer:Dangling resources are leftovers that consume disk space without serving any purpose.
  • Dangling Image (<none>:<none>): Created when you rebuild an image with the same tag. The old layers lose their tag and become “dangling.” Also created by interrupted multi-stage builds.
  • Dangling Volume: A volume that is not referenced by any container (including stopped containers). This happens when you docker rm a container without the -v flag.
Cleanup commands (in order of aggressiveness):
docker image prune          # Remove dangling images only
docker image prune -a       # Remove ALL unused images (not just dangling)
docker volume prune         # Remove unused volumes (anonymous only on Docker 23+; add -a for named ones)
docker system prune         # Remove stopped containers + unused networks + dangling images + build cache
docker system prune -a      # Also remove ALL unused images (volumes are kept unless you add --volumes)
Production tip: Schedule docker system prune -f --filter "until=168h" as a weekly cron job on build servers. Without this, disk usage grows unbounded and eventually causes builds to fail with “no space left on device.”
Answer:
Docker runs an embedded DNS server at 127.0.0.11 inside every container connected to a user-defined network. This server resolves container names and network aliases to their internal IP addresses, enabling service discovery without hardcoding IPs.
Important limitation: The default bridge network (docker0) does not provide DNS resolution. Containers on the default bridge can only communicate by IP address. This is one of the most common “why can’t my containers talk to each other” debugging issues. The fix is always to use a user-defined bridge: docker network create mynet.
Network aliases: You can give a container multiple DNS names using --network-alias. Multiple containers with the same alias create round-robin service discovery.
Answer:
IPv6 is disabled by default in Docker. To enable it, configure daemon.json with "ipv6": true and a "fixed-cidr-v6" subnet, then restart the Docker daemon. User-defined networks also need --ipv6 and --subnet flags.
Why this matters: As IPv4 addresses become scarcer and cloud providers charge for public IPv4 (AWS began charging $0.005/hour per public IPv4 address in 2024), running dual-stack or IPv6-only container networks is increasingly relevant for cost optimization.
Answer:
Since Docker volumes live inside Docker-managed directories, you cannot simply cp them. The standard approach is to use a temporary container as a bridge:
Backup:
docker run --rm -v myvolume:/data -v $(pwd):/backup alpine \
  tar czf /backup/myvolume-backup.tar.gz -C /data .
Restore:
docker run --rm -v myvolume:/data -v $(pwd):/backup alpine \
  tar xzf /backup/myvolume-backup.tar.gz -C /data
Production consideration: For database volumes, always use the database’s native backup tool (pg_dump, mongodump, mysqldump) rather than filesystem-level copies. Filesystem backups of a running database can capture inconsistent state. Native tools ensure transactional consistency.

3. Best Practices & Optimization

What interviewers are really testing: Whether you have actually optimized Docker images in production and understand the cascading effects of image size on pull times, storage costs, cold start latency, and attack surface.
Answer:
The optimization ladder (from easiest to most aggressive):
  1. Use minimal base images: node:18 is ~900MB. node:18-slim is ~200MB. node:18-alpine is ~180MB. gcr.io/distroless/nodejs18 is ~120MB. For Go/Rust, you can use scratch (literally 0 bytes) since the binary is statically compiled.
  2. Multi-stage builds: Build in a full image (compilers, dev tools), copy only the artifact to a minimal runtime image. A Go service goes from ~800MB to ~15MB with alpine or ~5MB with scratch.
  3. Combine RUN commands: Files deleted in a later RUN still exist in earlier layers. This is the most common mistake:
    # BAD: apt cache exists in layer 1 even though layer 2 deletes it
    RUN apt-get update && apt-get install -y curl
    RUN rm -rf /var/lib/apt/lists/*
    
    # GOOD: Created and deleted in same layer -- never stored
    RUN apt-get update && apt-get install -y --no-install-recommends curl \
        && rm -rf /var/lib/apt/lists/*
    
  4. .dockerignore: Exclude .git (can be 100MB+), node_modules, dist, *.log, .env. These are never needed in the image.
  5. Install only production deps: npm ci --omit=dev (the older --production flag is deprecated) instead of npm install. pip install --no-cache-dir.
  6. --no-install-recommends with apt-get skips recommended (optional) packages, which often cuts 50-100MB from Debian-based images.
Production impact: At 100 Mbps, transferring 1.2GB takes roughly 96s (1,200 MB x 8 / 100 Mbps), while 50MB takes about 4s. Layer compression and already-cached base layers shorten real pulls, but the ratio holds. In Kubernetes autoscaling, this directly impacts how fast new pods can serve traffic during a spike.
Red flag answer: Only saying “use alpine” without discussing multi-stage builds or the layer deletion trap. Not understanding that RUN rm in a separate layer does not reduce image size.
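As a back-of-the-envelope check on pull times, the raw transfer time is simply size in megabits divided by link speed. The sketch below assumes the network link is the only bottleneck and uses uncompressed image sizes; registries ship compressed layers and nodes often have base layers cached, so real pulls usually differ. The function name transfer_seconds is illustrative.

```python
def transfer_seconds(image_megabytes: float, link_megabits_per_s: float) -> float:
    """Raw transfer time: megabytes -> megabits (x8), divided by link speed."""
    return image_megabytes * 8 / link_megabits_per_s


for size_mb in (1200, 50):
    seconds = transfer_seconds(size_mb, 100)
    print(f"{size_mb} MB at 100 Mbps: {seconds:.0f} s")
```

Whatever the absolute numbers on a given network, the 24x size difference carries straight through to pull time, which is what matters during a scale-out.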
Senior vs Staff perspective
  • Senior: Writes multi-stage Dockerfiles, uses .dockerignore, picks slim/alpine base images, and knows about the same-layer delete trick.
  • Staff: Designs the org-wide image strategy — curated golden base images with security patches, automated size regression checks in CI that fail builds over a threshold, image scanning gates (Trivy/Snyk), signed base images (cosign), and a policy that sets image size SLOs per runtime (<100MB for Go, <300MB for Node, <500MB for Python). Also thinks about registry cost: 10K images x 500MB = 5TB storage billed monthly.
Follow-up chain:
  1. “You have a Python ML service with a 3GB image due to PyTorch. How do you reduce it?” — Multi-stage: copy only needed .so files. Use pytorch/pytorch:*-runtime. Store model weights in a volume or S3 instead of baking into image. Switch to python:3.11-slim base. Use BuildKit cache mounts for pip to avoid re-downloading wheels.
  2. “What is the difference between docker image ls reported size and actual disk usage?” — Shared layers are counted once on disk but shown fully per image. docker system df shows true usage. Registries (ECR/GCR) also deduplicate layers, so pushing 10 images that share a 500MB base layer only costs 500MB once.
  3. “Alpine images broke your production app because of musl vs glibc. How do you handle this?” — (a) Switch to debian:bookworm-slim or gcr.io/distroless/base-debian12 — similar size, glibc-compatible. (b) Rebuild the problematic dependency for musl (often impractical for ML libs). (c) Use distroless which keeps glibc but strips the shell/package manager. Common culprits: requests+certifi, numpy/scipy wheels, DNS resolution differences.
  4. “At 10x scale (10,000 services), what operational problems emerge from image size?” (Registry storage cost, node disk pressure (by default the kubelet starts garbage-collecting images at 85% disk usage), pull bandwidth during a stampede (e.g., a Kubernetes scale-out where 500 nodes each pull a 10GB image at once can overload the registry), and image GC latency. Mitigations: image pre-pulling via DaemonSet, registry mirrors per region, aggressive image garbage collection, and signed base image policies.)
Work-sample scenario: Your team’s Node.js service Docker image is 2GB. Walk through reducing it to under 200MB.
  • Step 1: docker history <image> — identify the fat layers. Usually COPY . . pulling in node_modules (1GB+) or base image (node:18 = 900MB).
  • Step 2: Switch base to node:18-alpine (180MB) or gcr.io/distroless/nodejs18 (120MB).
  • Step 3: Multi-stage build — first stage installs devDependencies and builds; second stage copies only dist/ and runs npm ci --omit=dev.
  • Step 4: Add .dockerignore excluding node_modules, .git, tests, docs, .env*.
  • Step 5: Use BuildKit cache mount for npm: RUN --mount=type=cache,target=/root/.npm npm ci.
  • Step 6: Verify with docker image inspect that the final image is <200MB and contains only runtime artifacts.
What weak candidates say: “Just use alpine.” This ignores multi-stage, .dockerignore, layer ordering, and the trap of deleting in a later layer.
What strong candidates say: “Image size is a symptom of Dockerfile discipline. I treat every COPY and RUN as a layer that lives forever; there is no ‘undoing’ in a later layer. I start with distroless or alpine, multi-stage everything, and measure image size in CI as a first-class metric. Small images are cheaper, faster to pull, and have a smaller attack surface: three wins from one discipline.”
What interviewers are really testing: Whether you can write Dockerfiles that build fast in CI. Layer caching is the single biggest lever for CI build speed, and getting it wrong means your team waits 10 minutes for every build instead of 30 seconds.
Answer:
Docker caches each layer by the instruction’s hash and the filesystem state from previous layers. If anything changes in a layer, that layer AND every subsequent layer is rebuilt from scratch.
The golden rule: Order your Dockerfile instructions from least frequently changing to most frequently changing.
# GOOD: Dependencies cached when only source code changes
# (Dockerfile comments must be on their own line; a trailing "# ..." after COPY is parsed as arguments)
FROM node:18-alpine
WORKDIR /app
# Changes rarely (only when deps update)
COPY package.json package-lock.json ./
# Cached when package files unchanged
RUN npm ci
# Changes on every commit -- but npm ci is cached
COPY . .
RUN npm run build

# BAD: npm ci reruns on every code change
FROM node:18-alpine
WORKDIR /app
# Changes on every commit
COPY . .
# Cache busted EVERY time
RUN npm ci
RUN npm run build
Cache busters to watch for:
  • COPY . . before dependency install is the most common mistake
  • ARG or ENV changes invalidate all subsequent layers
  • RUN apt-get update without pinning creates a layer that Docker considers “unchanged” even when the package index is stale — combine it with apt-get install in one layer
  • Build arguments like --build-arg BUILD_DATE=$(date) bust the cache by design
CI-specific optimization: Use --cache-from with a registry-cached image so CI runners can pull previous layers instead of rebuilding from scratch. GitHub Actions and GitLab CI support this natively with BuildKit cache backends.
Red flag answer: Not knowing why COPY . . should come after dependency installation, or thinking that Docker caches based on file timestamps (it uses content hashes).
Follow-up chain:
  1. “Your CI builds take 12 minutes. Locally they take 30 seconds. Same Dockerfile. What is wrong?” (CI runners start with an empty Docker cache on every run (ephemeral runners). Locally, you have the cache from previous builds. Fix: use --cache-from type=registry,ref=myapp:cache to pull the previous build’s layers from the registry. BuildKit also supports --cache-to to push the cache after building.)
  2. “You change an ENV value in the middle of your Dockerfile. What happens to caching?” (All layers after the ENV change are invalidated. ENV changes modify the layer’s metadata hash. This is why ENV instructions should be near the top (rarely changing) or near the bottom (application-specific config). Same applies to ARG.)
  3. “How does BuildKit’s cache differ from the legacy builder’s cache?” (BuildKit can: export/import cache from registries, use content-aware caching for COPY (checks file content hashes, not just filenames), build independent stages in parallel, and use cache mount types that persist between builds without being stored in layers. The legacy builder was linear and local-only.)
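The "one change invalidates everything after it" rule can be illustrated with a chained hash. This is a deliberately simplified model, not BuildKit's actual cache key computation: each key hashes the parent key, the instruction text, and a made-up digest of the files the instruction reads. The names layer_keys and rebuilt_steps and the digest strings are assumptions for illustration.

```python
import hashlib


def layer_keys(steps):
    """Each key hashes the parent key plus the instruction and its input digest."""
    keys, parent = [], ""
    for instruction, input_digest in steps:
        parent = hashlib.sha256(f"{parent}|{instruction}|{input_digest}".encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Return the instructions whose cache key changed between two builds."""
    old, new = layer_keys(old_steps), layer_keys(new_steps)
    return [step[0] for step, a, b in zip(new_steps, old, new) if a != b]


def good(src):  # dependencies are copied and installed before the source tree
    return [("COPY package*.json ./", "deps-v1"), ("RUN npm ci", ""),
            ("COPY . .", src), ("RUN npm run build", "")]


def bad(src):   # the whole source tree is copied before installing dependencies
    return [("COPY . .", src), ("RUN npm ci", ""), ("RUN npm run build", "")]


print(rebuilt_steps(good("src-v1"), good("src-v2")))  # npm ci stays cached
print(rebuilt_steps(bad("src-v1"), bad("src-v2")))    # npm ci is rebuilt too
```

With the good ordering, a source-only change leaves the npm ci key untouched; with the bad ordering, the changed COPY . . key propagates into every later step.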
Answer:
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM nginx
COPY --from=builder /app/dist /usr/share/nginx/html
Result: Tiny generic Nginx image with just static files.
Real-World Example with Size Comparison:
# Single-stage (BAD): 1.2GB
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install  # Includes dev dependencies!
COPY . .
RUN npm run build
CMD ["npm", "start"]

# Multi-stage (GOOD): 150MB
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

# Production stage (smaller base image)
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Only prod dependencies
RUN npm install --omit=dev
COPY --from=builder /app/dist ./dist
# Security: non-root
USER node
CMD ["node", "dist/server.js"]
Benefits:
  1. Size: 87% smaller (1.2GB → 150MB)
  2. Security: No build tools in production image
  3. Speed: Faster pulls and deployments
  4. Secrets: Build-time secrets don’t leak to final image
Advanced Pattern (Multiple Outputs):
# Build stage
FROM golang:1.20 AS builder
WORKDIR /app
COPY . .
# CGO_ENABLED=0 produces a static binary that runs on Alpine (musl) without glibc
RUN CGO_ENABLED=0 go build -o server .

# Test stage (not included in final)
FROM builder AS tester
RUN go test ./...

# Production
FROM alpine:3.18
COPY --from=builder /app/server /server
ENTRYPOINT ["/server"]

# Build with: docker build --target=tester .  (runs tests)
# Build with: docker build .  (production image)
What interviewers are really testing: Whether you can write Dockerfiles that produce lean, secure production images while keeping the build process efficient. Multi-stage is the single most impactful Dockerfile technique.
Red flag answer: “I just use one stage and delete the build tools with RUN rm -rf.” This does not work, because deleted files still exist in earlier layers. The only way to exclude build tools from the final image is to never include them in the final stage.
Follow-up chain:
  1. “Your multi-stage build copies node_modules from the builder but the final image is still 600MB. Where is the size coming from?” (Likely copying dev dependencies. The builder ran npm install (all deps). The final stage should run npm ci --omit=dev or use npm prune --omit=dev before copying. Alternatively, copy only dist/ and run a fresh npm ci --omit=dev in the final stage.)
  2. “Can you use more than two stages? When would you?” (Yes. Common pattern: Stage 1 (deps) — install dependencies. Stage 2 (builder) — compile/build. Stage 3 (tester) — run tests. Stage 4 (production) — final image. Each stage can be built independently with --target. CI runs --target=tester for tests, production deploys build without --target to get the last stage.)
  3. “How do you handle build-time secrets (like an NPM token for private packages) in a multi-stage build without leaking them?” (Use BuildKit’s --mount=type=secret. The secret is mounted as a tmpfs file during the RUN instruction and never written to any layer. Even if the builder stage is leaked, the secret is not in any layer. Never use ARG for secrets — they appear in docker history.)
What interviewers are really testing: Whether you understand Unix signal handling in containers and have debugged slow shutdowns in production. This is a subtle but impactful issue: it causes Kubernetes pod terminations to time out at 30 seconds instead of shutting down gracefully in milliseconds.
Answer:
In a container, your application process is PID 1, the init process. On a normal Linux system PID 1 is systemd or init, and the kernel treats PID 1 specially: a signal whose disposition is still the default is not delivered to it, so PID 1 only reacts to signals it has explicitly installed handlers for. Most application processes are not designed to be PID 1 and do not set up a SIGTERM handler.
What goes wrong: When Kubernetes or Docker sends SIGTERM to stop a container, the signal goes to PID 1. If your app does not handle SIGTERM, the signal is ignored. The runtime waits for the grace period (10 seconds by default for docker stop, 30 seconds in Kubernetes), then sends SIGKILL (force kill). Every container shutdown therefore takes the full grace period instead of being near-instant, and your app does not get a chance to close database connections, flush logs, or finish in-flight requests.
The zombie process problem: PID 1 is also responsible for reaping zombie processes, meaning children that have exited but whose exit status nobody has collected, including orphans that were re-parented to PID 1. If your app forks child processes (common in Python, Ruby, or shell scripts), dead children accumulate as zombies because your app does not call wait().
Solutions:
# Option 1: tini as the entrypoint (install it in the image first, e.g. apk add --no-cache tini)
# Tini becomes PID 1, forwards signals, reaps zombies
ENTRYPOINT ["tini", "--"]
CMD ["node", "server.js"]

# Option 2: Use --init flag at runtime (Docker injects its bundled tini; no Dockerfile change)
docker run --init myimage

// Option 3: Handle signals in your application code (best for graceful shutdown)
process.on('SIGTERM', () => {
  console.log('Graceful shutdown initiated');
  // http.Server#close takes a callback that fires once open connections have finished
  server.close(async () => {
    await db.disconnect();
    process.exit(0);
  });
});
Red flag answer: Not knowing what PID 1 means in a container, or saying “just use docker stop -t 0” (which sends SIGKILL immediately, which is the opposite of graceful).
Follow-up questions:
  • “Your Kubernetes pods take exactly 30 seconds to terminate (the terminationGracePeriodSeconds default). What is likely happening?” (The app is not handling SIGTERM. Kubernetes sends SIGTERM, waits 30s, then sends SIGKILL. Fix: add a signal handler or use tini.)
  • “What is the difference between shell form CMD and exec form CMD in terms of signal handling?” (Shell form wraps in /bin/sh -c, so SIGTERM goes to the shell, not your app. Exec form runs your app directly as PID 1.)
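For services written in Python, the same graceful-shutdown idea can be sketched with the standard signal module. This is a minimal illustration under stated assumptions: serve is a stand-in for a real server loop, the cleanup steps are left as a comment, and the handler is installed from the main thread (a requirement of signal.signal).

```python
import os
import signal
import threading

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop performs the actual cleanup."""
    shutdown_requested.set()


# Installing a handler is what makes a PID 1 process react to SIGTERM at all.
signal.signal(signal.SIGTERM, handle_sigterm)


def serve(poll_seconds: float = 0.1, max_polls: int = 50) -> str:
    """Stand-in for a server loop: work until SIGTERM arrives, then clean up."""
    for _ in range(max_polls):
        if shutdown_requested.wait(poll_seconds):
            # close connections, flush logs, finish in-flight work here
            return "clean shutdown"
    return "still running"


if __name__ == "__main__":
    os.kill(os.getpid(), signal.SIGTERM)  # simulate what `docker stop` sends
    print(serve())
```

Keeping the handler tiny and doing cleanup in the main loop avoids running complex logic inside a signal handler; pair this with exec-form CMD so the signal reaches the Python process rather than a wrapping shell.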
What interviewers are really testing: Whether you understand container security beyond the application level. Running as root in a container is one of the most common security misconfigurations in production, and it is a direct vector for container escape vulnerabilities.
Answer:
By default, containers run as root (UID 0). This means if an attacker exploits your application, they have root privileges inside the container. Combined with a kernel vulnerability, this can lead to container escape: gaining root on the host machine.
Best practice: create and use a non-root user:
FROM node:18-alpine

# Create a system group and user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

WORKDIR /app
COPY --chown=appuser:appgroup . .
RUN npm ci --omit=dev

# Switch to non-root user BEFORE CMD
USER appuser

CMD ["node", "server.js"]
Key details:
  • Place USER after RUN commands that need root (installing packages, creating directories) but before CMD.
  • Use COPY --chown=appuser:appgroup to set ownership during copy, avoiding a separate RUN chown layer.
  • In Kubernetes, enforce non-root with securityContext: { runAsNonRoot: true } at the pod level. This will reject any container that tries to run as root.
  • Some base images (like node:18) include a built-in node user (UID 1000) — use it instead of creating your own.
Production enforcement: Use OPA Gatekeeper or Kyverno policies in Kubernetes to block any pod that runs as root or has privileged: true. This is a standard security baseline at companies with SOC 2 or ISO 27001 compliance.
Red flag answer: “Containers are already isolated, so root inside a container is fine.” This ignores container escape vulnerabilities and shows no awareness of defense-in-depth security principles.
What interviewers are really testing: Whether you understand the difference between liveness and readiness semantics, and how health checks integrate with orchestrators like Docker Swarm and Kubernetes to enable self-healing infrastructure.
Answer:
Dockerfile HEALTHCHECK:
HEALTHCHECK --interval=30s --timeout=5s --retries=3 --start-period=60s \
  CMD curl -f http://localhost:3000/health || exit 1
Parameters explained:
  • --interval=30s: Check every 30 seconds
  • --timeout=5s: If the check takes longer than 5s, it is a failure
  • --retries=3: After 3 consecutive failures, container status becomes unhealthy
  • --start-period=60s: Grace period after container start (for slow-starting apps like JVM services) during which failures do not count
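How the four parameters above interact is easiest to see by replaying a sequence of probe results. The sketch below is a simplified model of the documented behaviour, not Docker's implementation: it assumes the first probe runs one interval after start, that failures inside the start period are ignored until the first success, and it does not model --timeout. The function name health_status is hypothetical.

```python
def health_status(results, interval=30, retries=3, start_period=60):
    """Replay probe results (True = pass) taken every `interval` seconds."""
    status, streak, started = "starting", 0, False
    for i, passed in enumerate(results):
        elapsed = (i + 1) * interval       # assumption: first probe runs after one interval
        if passed:
            status, streak, started = "healthy", 0, True
        elif started or elapsed > start_period:
            streak += 1                    # this failure counts toward --retries
            if streak >= retries:
                status = "unhealthy"
        # otherwise: a failure inside the start period, before any success, is ignored
    return status


print(health_status([False, False, True]))          # slow start, then healthy
print(health_status([True, False, False]))          # two failures: still healthy
print(health_status([True, False, False, False]))   # three in a row: unhealthy
```

The second and third calls show why --retries matters: a single slow response does not flip the status, but a sustained outage does.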
How orchestrators use health status:
  • Docker Swarm: Unhealthy containers are killed and replaced automatically.
  • Docker standalone: The unhealthy status is visible in docker ps but Docker does NOT automatically restart the container. You need restart policies (--restart=unless-stopped) combined with health checks for self-healing.
  • Kubernetes: Does NOT use Dockerfile HEALTHCHECK. It uses its own probe system: livenessProbe (restart if failed), readinessProbe (remove from load balancer if failed), and startupProbe (disable other probes until app has started).
Best practice for the health endpoint itself:
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1');  // Check database connectivity
    // Optionally check Redis, external APIs
    res.status(200).json({ status: 'healthy', uptime: process.uptime() });
  } catch (err) {
    res.status(503).json({ status: 'unhealthy', error: err.message });
  }
});
Common mistake: Using curl in HEALTHCHECK requires curl to be installed in the image. Alpine images do not include it by default. Use wget -q --spider instead, or better yet, write a small health-check binary or use the application’s own health endpoint.
Red flag answer: “Just use HEALTHCHECK CMD curl localhost” without understanding the parameters, or not knowing that Kubernetes ignores Dockerfile HEALTHCHECK entirely.
Answer:
.dockerignore works like .gitignore but for the Docker build context, the directory tree sent to the Docker daemon before building. Without it, everything in your project directory is uploaded.
What to exclude and why:
  • .git/ — can be 100MB+ and is never needed inside an image. One team I worked with shaved 600MB from their build context just by adding this.
  • node_modules/ — your Dockerfile should RUN npm ci inside the image. Copying the host’s node_modules causes platform-mismatch bugs (Linux container, macOS host).
  • .env files — secrets should never be baked into images. Use runtime env vars or secrets management.
  • Dockerfile and docker-compose.yml — meta files, not application code.
  • Test directories, documentation, IDE configs (.vscode/, .idea/).
  • Build artifacts (dist/, build/, *.log).
# .dockerignore
.git
node_modules
.env*
Dockerfile*
docker-compose*
*.md
.vscode
.idea
coverage
dist
The security angle: Without .dockerignore, COPY . . copies .env files containing API keys and database passwords into the image layer. That layer is stored permanently in the image, even if you RUN rm .env in a later layer. Anyone with access to the registry can extract it with docker save | tar.
What interviewers are really testing: Whether you understand that the build context is a security boundary, not just a performance optimization.
Red flag answer: Not knowing .dockerignore exists, or saying “I just copy the files I need with multiple COPY commands” (which is fragile and misses the point).
Answer:
Image tagging directly impacts deployment reliability, rollback speed, and auditability. The wrong tagging strategy causes “works on my machine but not in production” at the registry level.
Why latest is dangerous:
  • latest is not a special tag — it is just the default when no tag is specified. It does not mean “most recent.” If you push v2.0 without also pushing latest, latest still points to whatever it pointed to before.
  • Two developers pull latest an hour apart and get different images. Debugging becomes impossible.
  • Kubernetes imagePullPolicy: Always with latest tags causes non-deterministic deployments.
Recommended strategies:
  1. Semantic versioning: myapp:1.2.3, myapp:1.2, myapp:1. The more specific tag is immutable; the less specific floats. Users who pin to 1.2.3 get determinism. Users who pin to 1.2 get patch updates.
  2. Git commit SHA: myapp:a1b2c3d. Every image is traceable to exact source code. Combined with CI, this creates a full audit trail. This is the strategy used by most mature CI/CD pipelines.
  3. Hybrid: Tag with both semver and SHA: myapp:1.2.3 and myapp:a1b2c3d pointing to the same manifest. Semver for human readability, SHA for automation.
Production enforcement: In Kubernetes, use admission policies (Kyverno/OPA) to reject any pod using :latest or untagged images. Require digest pinning (@sha256:...) for the highest security.
What interviewers are really testing: Whether you think about image identity as a deployment concern, not just a naming convention.
Red flag answer: “I always use latest and just rebuild.” No awareness of immutable tags or deployment traceability.
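The admission rule above can be sketched as a small classifier. This is an illustrative Python helper, not how Kyverno or OPA implement it; the function name and the category labels are hypothetical. It only assumes the usual reference shape: an optional registry (which may contain a port), a repository path, then either :tag or @sha256:digest.

```python
def classify_image_ref(ref):
    """Classify an image reference for a hypothetical admission check."""
    if "@sha256:" in ref:
        return "digest-pinned"
    # Only look for a tag in the last path component, so a registry
    # port such as localhost:5000 is not mistaken for a tag.
    last_component = ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return "untagged"  # Docker resolves this to :latest
    tag = last_component.rsplit(":", 1)[1]
    return "latest" if tag == "latest" else "tagged"


def is_allowed(ref):
    """Reject :latest and untagged references, allow everything else."""
    return classify_image_ref(ref) in {"tagged", "digest-pinned"}


for ref in ["myapp", "myapp:latest", "localhost:5000/myapp", "myapp:1.2.3"]:
    print(ref, "->", classify_image_ref(ref), "allowed" if is_allowed(ref) else "rejected")
```

A stricter policy would accept only the digest-pinned category, since a plain tag can still be overwritten in the registry.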
Answer:Both set variables in a Dockerfile, but their scope and lifecycle differ fundamentally:
| Feature | ARG | ENV |
| --- | --- | --- |
| Available during | Build time only | Build time AND run time |
| Persists in image | No (gone after build completes) | Yes (baked into the image metadata) |
| Override at build | --build-arg KEY=value | Cannot override at build time |
| Override at run | N/A (does not exist at runtime) | -e KEY=value or --env-file |
| Layer caching | Changing an ARG value busts cache for all subsequent layers | Changing ENV busts cache similarly |
Common pattern — combining both:
ARG NODE_VERSION=18
FROM node:${NODE_VERSION}

ARG BUILD_DATE
ENV APP_BUILD_DATE=${BUILD_DATE}
The ARG controls which base image to use (build-time decision). The ENV bakes metadata into the image that the running container can read.
Security gotcha: Never use ARG for secrets. ARG values are visible in docker history even though they do not persist at runtime. Use BuildKit’s --mount=type=secret instead:
# syntax=docker/dockerfile:1
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci
What interviewers are really testing: Whether you understand the build-time vs run-time boundary and the security implications of each.
Red flag answer: “ARG and ENV are the same thing.” Or using ARG to pass API keys into the build.
Answer:Flattening merges all layers into a single layer using docker export (which exports a container’s filesystem) and docker import (which creates a new image from that filesystem).
docker export mycontainer | docker import - myimage:flat
Trade-offs: Flattening reduces image size by eliminating intermediate layer overhead and removes duplicate files that exist in multiple layers. However, you lose all layer caching (future builds start from scratch), Dockerfile history (docker history shows nothing useful), and the ability to share base layers with other images. In practice, multi-stage builds are a better optimization than flattening.

4. Troubleshooting & Operations

Answer:docker exec spawns a new process inside an already-running container. This is the primary tool for debugging containers in real time.
docker exec -it <container_id> /bin/sh      # Interactive shell
docker exec <container_id> cat /etc/hosts   # Run a single command
docker exec -u root <container_id> bash     # Exec as root even if container runs as non-root
Important: exec does not restart the container or affect the main process. It creates an additional process that shares the container’s namespaces (filesystem, network, PID space). This means you can inspect files, run curl localhost:3000/health, check environment variables, or install debugging tools, all without disrupting the running application.
Production tip: If your image is distroless (no shell), docker exec ... /bin/sh fails because there is no shell binary to run. Use ephemeral debug containers in Kubernetes (kubectl debug -it pod/myapp --image=busybox) or build a debug variant of your image for staging environments.
Answer: Docker captures everything the container’s main process (PID 1) writes to STDOUT and STDERR, along with output from child processes that inherit those streams. This is the foundation of the “12-Factor App” logging principle: applications should not manage their own log files; they write to stdout and the platform handles routing.
docker logs <container_id>            # All logs
docker logs -f <container_id>         # Follow (tail -f equivalent)
docker logs --since 10m <container_id> # Last 10 minutes
docker logs --tail 100 <container_id>  # Last 100 lines
Logging drivers determine where logs are stored and forwarded:
  • json-file (default): Logs stored as JSON in /var/lib/docker/containers/<id>/. Supports docker logs. Can grow unbounded — set max-size and max-file to prevent disk exhaustion.
  • syslog: Forward to a syslog server.
  • awslogs: Forward directly to CloudWatch (no agent needed).
  • fluentd: Forward to Fluentd/Fluent Bit for aggregation.
Production gotcha: Before Docker 20.10, docker logs only worked with local drivers (json-file, local, journald). With a remote driver such as fluentd it returned an error, which tripped up many operators during incident debugging. Docker 20.10+ adds dual logging, which keeps a local cache so docker logs keeps working regardless of the configured driver. On older engines, the workaround is to stay on json-file and ship logs with a host-level agent.
Answer:docker inspect returns a comprehensive JSON blob with every detail about a container, image, network, or volume. It is the single most useful debugging command.
docker inspect <container_id>

# Extract specific fields with Go templates:
docker inspect -f '{{.NetworkSettings.IPAddress}}' <container_id>
docker inspect -f '{{.State.OOMKilled}}' <container_id>
docker inspect -f '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{end}}' <container_id>
Key fields to look for during debugging:
  • State.OOMKilled: Was the container killed for exceeding memory limits?
  • State.ExitCode: Non-zero means the process crashed. 137 = OOMKilled/SIGKILL, 143 = SIGTERM.
  • NetworkSettings.IPAddress: The container’s IP on the default bridge network. For user-defined networks, look under NetworkSettings.Networks instead.
  • Config.Env: All environment variables (check for misconfiguration).
  • Mounts: Volume and bind mount mappings.
  • HostConfig.RestartPolicy: Current restart policy.
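When the same fields are checked repeatedly, it can help to parse the JSON instead of writing one Go template per field. The following is a minimal Python sketch. The sample document is a hypothetical, heavily trimmed stand-in for real docker inspect output, and the summarize helper is illustrative only.

```python
import json

# Hypothetical, trimmed docker inspect output (the real one is much larger).
SAMPLE_INSPECT = """
[{"State": {"Status": "exited", "OOMKilled": true, "ExitCode": 137},
  "HostConfig": {"RestartPolicy": {"Name": "on-failure", "MaximumRetryCount": 5}},
  "Mounts": [{"Source": "/srv/data", "Destination": "/data"}]}]
"""


def summarize(inspect_json):
    """Pull the debugging fields listed above out of docker inspect JSON."""
    container = json.loads(inspect_json)[0]  # docker inspect returns a JSON array
    state = container["State"]
    return {
        "exit_code": state["ExitCode"],
        "oom_killed": state["OOMKilled"],
        "restart_policy": container["HostConfig"]["RestartPolicy"]["Name"],
        "mounts": [
            f'{m["Source"]} -> {m["Destination"]}'
            for m in container.get("Mounts", [])
        ],
    }


summary = summarize(SAMPLE_INSPECT)
print(summary)
```

In practice the input would come from running docker inspect <container_id> and capturing its standard output.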
Answer:A container runs only as long as its main process (PID 1) is running. When that process exits, the container stops. This is the most common confusion for Docker beginners.Common causes:
  1. CMD runs a short-lived command: CMD ["echo", "hello"] prints “hello” and exits. The container stops immediately.
  2. Application crashes on startup: Missing env vars, bad config, port conflict. Check docker logs.
  3. Shell form without foreground process: CMD node server.js & backgrounds the process, so the shell has nothing to wait for and exits.
Fix: Ensure your CMD/ENTRYPOINT runs a foreground process that stays alive. Use docker logs <id> to see what the process printed before exiting, and docker inspect -f '{{.State.ExitCode}}' to check the exit code.
Answer:This is the most common networking mistake in Docker. Your application inside the container listens on 127.0.0.1:3000, but when you try to reach it from the host via -p 3000:3000, you get “connection refused.”Why it happens: 127.0.0.1 means “only accept connections from this network namespace.” Since the host is in a different namespace than the container, connections from the host are rejected. Inside the container, localhost is the container’s own loopback, not the host’s.Fix: Configure your application to listen on 0.0.0.0 (all interfaces), which accepts connections from any network namespace — including the host via the bridge network.
# Node.js
server.listen(3000, '0.0.0.0');

# Python Flask
app.run(host='0.0.0.0', port=3000)

# Rails
rails server -b 0.0.0.0
This catches experienced developers too — frameworks like Rails and Django often default to 127.0.0.1 in development mode.
What interviewers are really testing: Can you distinguish between the container-level OOM (cgroup hit limit) and the node-level OOM (host out of memory), and do you know how to debug a real memory leak through layered observability?
Answer: Exit code 137 = 128 + 9 (SIGKILL), meaning the process was terminated with SIGKILL, most often by the kernel OOM killer. Inside Docker, this almost always means the container hit its memory cgroup limit and the cgroup-level OOM killer fired; occasionally it means the whole host is out of memory and the kernel picked your container as a victim based on its OOM score (which oom_score_adj influences).
Diagnosis workflow:
  1. docker inspect -f '{{.State.OOMKilled}}' <id>: returns true if the cgroup killed it. If false but exit code is 137, it was host-level OOM or an external kill -9.
  2. docker stats --no-stream <id> (while running) or docker events --filter container=<id> — real-time memory and OOM events.
  3. Inside Kubernetes: kubectl describe pod <pod> -> Last State: Terminated, Reason: OOMKilled, Exit Code: 137.
  4. dmesg | grep -i "killed process" on the node — kernel log of which PID was killed and why.
  5. Check cgroup memory stats: cat /sys/fs/cgroup/memory/docker/<id>/memory.max_usage_in_bytes (cgroup v1) or memory.peak (v2).
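The 128 + N convention behind exit code 137 can be decoded mechanically. This small Python sketch assumes a POSIX host (signal numbers such as SIGKILL = 9 are looked up through the standard signal module); the function name and the wording of the returned strings are illustrative.

```python
import signal


def describe_exit_code(code):
    """Translate a container exit code into a human-readable cause."""
    if code == 0:
        return "clean exit"
    if code > 128:
        number = code - 128  # shells report "killed by signal N" as 128 + N
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"killed by {name}"
    return f"application error (exit {code})"


for code in (0, 1, 137, 143):
    print(code, "->", describe_exit_code(code))
```

Note that the exit code alone cannot say who sent SIGKILL. That is why the workflow above cross-checks State.OOMKilled and the kernel log.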
Common causes, in order of frequency:
  1. Memory leak in app code — heap grows unboundedly. Fix: heap profiling (pprof, jemalloc, memray for Python).
  2. JVM heap not sized for the container: -Xmx was set without considering off-heap memory (metaspace, thread stacks, direct buffers, JIT code cache). Rule of thumb: -Xmx should be ~70-75% of the container limit. Alternatively, use -XX:MaxRAMPercentage=75.0 so the JVM sizes the heap from the detected container limit.
  3. Loading large files into memory — reading a 2GB CSV with pandas, not streaming. Fix: chunked iteration.
  4. Unbounded caches — in-process LRU with no size cap, or Redis without maxmemory.
  5. Page cache pressure — a file-heavy workload on a tight container can cause the working set to exceed the limit. memory.oom.group may kill the whole cgroup.
Senior vs Staff perspective
  • Senior: Identifies the cause through docker stats and logs, fixes the leak or raises the limit.
  • Staff: Designs the memory management strategy: mandatory memory requests/limits in CI, JVM Kubernetes-aware flags templated into base images, Prometheus alerts on container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.85, a runbook linking exit 137 -> heap dump collection, and a post-incident process where “OOMKilled in prod” auto-creates a Jira with the heap dump attached.
Follow-up chain:
  1. “Your container was killed with exit 137 but docker stats shows memory was well under the limit at the time of death. What is happening?” — docker stats samples every few seconds; a momentary spike can kill the container between samples. Or the kill came from host-level OOM (check /var/log/kern.log). Or a subprocess forked and its RSS counted against the cgroup briefly.
  2. “How is Java’s ‘Container-aware JVM’ different in Java 10+ vs older versions?” — Pre-Java 10, the JVM read /proc/meminfo directly, seeing the host’s memory, and would set heap to a fraction of host RAM — blowing past the container limit. Java 10+ respects cgroup limits by default; flags like -XX:MaxRAMPercentage target container RAM. Upgrading old JVMs is often the fastest OOM fix.
  3. “What is the difference between memory and memory.swap limits in Docker?” --memory=512m caps RAM. --memory-swap=1g caps RAM+swap total (so 512MB of swap is allowed). Setting --memory-swap equal to --memory gives the container no swap at all; --memory-swappiness=0 only tells the kernel to avoid swapping anonymous pages and is not a hard off switch. Most container platforms disable swap entirely because swap defeats the point of memory limits: you degrade into thrashing before OOM.
  4. “How do you capture a heap dump from a container that is OOMKilling in a loop?” — Add a preStop or automatic dump on OOM: JVM -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap with a PVC mounted at /tmp. For Python, use faulthandler + tracemalloc snapshots on SIGUSR1. For Go, pprof.WriteHeapProfile on SIGUSR1. The key is dumping before the kill, since SIGKILL is non-catchable.
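The --memory / --memory-swap arithmetic from the follow-up above is easy to get wrong, so here is a minimal Python sketch of it. The helper names are hypothetical, and the parser only handles the simple b/k/m/g suffixes with integer values.

```python
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text):
    """Convert a size such as '512m' or '1g' to bytes."""
    text = text.strip().lower()
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)  # a bare number is taken as bytes


def swap_allowance(memory, memory_swap):
    """Bytes of swap available, given that --memory-swap is RAM + swap combined."""
    ram = parse_size(memory)
    total = parse_size(memory_swap)
    if total < ram:
        raise ValueError("--memory-swap must be at least --memory")
    return total - ram


print(swap_allowance("512m", "1g") // 1024**2, "MiB of swap")
print(swap_allowance("512m", "512m"), "bytes of swap (swap disabled)")
```

The second call shows the configuration that removes swap for the container: both flags set to the same value.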
Work-sample scenario: Your Node.js API is OOMKilled every ~3 hours in production. Memory limit is 1GB. Walk through your diagnosis.
  • Step 1: kubectl top pod over time — is memory growing linearly (leak) or spiking (load-related)?
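  • Step 2: If linear, it is a classic leak. Take a heap snapshot: start Node with --heapsnapshot-signal=SIGUSR2, send kill -SIGUSR2 <pid>, then load the resulting .heapsnapshot file in the Chrome DevTools Memory tab (reachable via chrome://inspect) to analyze it.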
  • Step 3: Common Node leaks: unbounded event listeners (emitter.on in a hot path without off), closures capturing large scope, global Map or Set used as cache, promise chains retaining references.
  • Step 4: Short-term: raise the limit to 2GB and add autoscaling to reduce per-pod pressure. Medium-term: fix the leak identified in heap diff.
  • Step 5: Guardrail: add --max-old-space-size=768 (75% of 1GB) to Node so V8 garbage-collects aggressively as the heap approaches that cap, before the cgroup limit triggers a kill.
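The 75% guardrail from Step 5 (and the matching JVM rule of thumb) can be derived from the container limit instead of being hard-coded. A minimal Python sketch follows; the helper names and the idea of templating flags this way are assumptions for illustration, while the flags themselves are the ones named in the text.

```python
def heap_budget_mib(limit_mib, fraction=0.75):
    """Heap size to configure, leaving headroom for off-heap memory."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(limit_mib * fraction)


def runtime_flags(limit_mib):
    """Runtime flags for a container with the given memory limit in MiB."""
    return {
        "node": f"--max-old-space-size={heap_budget_mib(limit_mib)}",
        # The JVM flag is already relative to the detected container limit.
        "jvm": "-XX:MaxRAMPercentage=75.0",
    }


print(runtime_flags(1024))
```

Deriving the value keeps the heap cap in step with the limit when someone raises the limit to 2GB as a short-term fix.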
What weak candidates say: “Increase the memory limit.” — Treats the symptom. If it is a leak, the container will just take longer to die.What strong candidates say: “Exit 137 is the container saying ‘I hit my memory limit.’ My first question is: was this a leak, a spike, or a mis-sized limit? Each has a different fix. I use metrics to distinguish — linear growth = leak, sawtooth pattern = GC doing its job, sudden spike = load issue. I never just raise the limit without knowing which category it falls in.”
Answer:Docker does not automatically clean up stopped containers, dangling images, or unused networks. Over time, this accumulates significant disk usage.
| Command | What it removes |
| --- | --- |
| docker container prune | Stopped containers |
| docker image prune | Dangling images (untagged) |
| docker image prune -a | All unused images |
| docker volume prune | Unused volumes (since Docker 23.0, only anonymous ones unless -a is given) |
| docker network prune | Unused networks |
| docker system prune | Stopped containers + dangling images + unused networks + build cache (not volumes) |
| docker system prune -a --volumes | Everything unused, including volumes |
Safety note: docker volume prune is the most dangerous: it deletes data volumes with no confirmation beyond the initial prompt. Always verify with docker volume ls first. In production, schedule automated pruning of containers, images, and networks with the --filter "until=168h" flag to preserve recent resources (volume pruning does not support the until filter).
Answer:docker events streams a real-time feed of actions happening on the Docker daemon — container lifecycle events (create, start, die, destroy), image events (pull, push, tag), volume and network events.
docker events                                           # Stream all events
docker events --filter 'event=die'                      # Only container death events
docker events --filter 'container=myapp' --since 1h     # Events for specific container
Use case: Integration with monitoring systems. You can pipe docker events to a log aggregator to track container restarts, OOM kills, and image pulls. This is how some teams detect crashlooping containers before Kubernetes restarts mask the problem.
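As a rough illustration of the crashloop detection idea, this Python sketch counts die events per container. It assumes the events were captured as one JSON object per line (for example with docker events --format '{{json .}}'). The sample lines are hypothetical and trimmed to the fields the sketch reads.

```python
import json
from collections import Counter

# Hypothetical, trimmed event lines; real events carry more fields.
SAMPLE_EVENTS = [
    '{"Type": "container", "Action": "die", "Actor": {"Attributes": {"name": "api", "exitCode": "137"}}}',
    '{"Type": "container", "Action": "start", "Actor": {"Attributes": {"name": "api"}}}',
    '{"Type": "container", "Action": "die", "Actor": {"Attributes": {"name": "api", "exitCode": "137"}}}',
    '{"Type": "image", "Action": "pull", "Actor": {"Attributes": {"name": "postgres"}}}',
]


def count_deaths(lines):
    """Count container die events, keyed by (container name, exit code)."""
    deaths = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("Type") == "container" and event.get("Action") == "die":
            attrs = event["Actor"]["Attributes"]
            deaths[(attrs["name"], attrs.get("exitCode"))] += 1
    return deaths


deaths = count_deaths(SAMPLE_EVENTS)
print(deaths)
```

An alerting rule could then fire when one key exceeds a threshold within a time window; the window handling is left out here.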
Answer:docker stats provides a live-updating view of resource consumption per container, similar to top for Docker.
docker stats                     # All running containers
docker stats myapp mydb          # Specific containers
docker stats --no-stream         # Single snapshot (useful for scripting)
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"  # Custom columns
Columns: Container name/ID, CPU %, memory usage/limit, network I/O, disk I/O, PIDs. This is your first stop when investigating performance issues. If a container shows 95%+ CPU, you know to profile the application. If memory usage steadily climbs toward the limit, you likely have a leak.
Answer:Restart policies determine what happens when a container’s main process exits.
| Policy | Behavior |
| --- | --- |
| no (default) | Never restart. Container stays stopped. |
| on-failure[:max-retries] | Restart only if exit code is non-zero. on-failure:5 retries up to 5 times. |
| always | Always restart when the process exits. A container stopped manually with docker stop stays stopped, but it is started again when the daemon restarts. |
| unless-stopped | Like always, but a container that was manually stopped stays stopped even after the daemon restarts. |
Production recommendation: Use unless-stopped for services you want to survive host reboots (the Docker daemon restarts automatically via systemd). Use on-failure:5 for batch jobs where infinite restarts would be harmful. Docker restart policies do not apply in Kubernetes; there the kubelet handles restarts via the pod spec’s restartPolicy.
Backoff behavior: Docker applies an exponential backoff delay between restarts, starting at 100ms and doubling up to a cap of 1 minute. This prevents a crashlooping container from consuming all system resources.
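The backoff schedule described above (start at 100ms, double each time, cap at 1 minute) can be written out as a short Python sketch. The function is illustrative only and models just the delay sequence, not when Docker resets it.

```python
def restart_delays_ms(attempts, start_ms=100, cap_ms=60_000):
    """Delay before each restart attempt, doubling from start_ms up to cap_ms."""
    delays = []
    delay = start_ms
    for _ in range(attempts):
        delays.append(min(delay, cap_ms))
        delay *= 2
    return delays


schedule = restart_delays_ms(12)
print(schedule)
print("total wait over 12 restarts:", sum(schedule) / 1000, "seconds")
```

Under these assumptions the cap is reached on the eleventh restart, so a crashlooping container settles at roughly one restart per minute.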

5. Security & Ecosystem

Answer:Namespaces are the Linux kernel feature that provides isolation for containers. Each namespace type gives the container its own isolated view of a specific system resource:
| Namespace | Isolates | Effect |
| --- | --- | --- |
| PID | Process IDs | Container sees its own process tree starting at PID 1. Cannot see or signal host processes. |
| NET | Network stack | Container gets its own IP address, routing table, and port space. Two containers can both listen on port 80. |
| MNT | Filesystem mounts | Container has its own root filesystem. Cannot see host files unless explicitly mounted. |
| UTS | Hostname | Container can have its own hostname distinct from the host. |
| IPC | Inter-process communication | Shared memory and semaphores are isolated per container. |
| USER | User/Group IDs | Root (UID 0) inside the container can map to a non-root UID on the host. This is the foundation of rootless containers. |
The security story: Namespaces create the illusion of a dedicated machine, but containers still share the host kernel. A kernel vulnerability (like Dirty Pipe, CVE-2022-0847) can potentially escape namespace isolation. This is why defense-in-depth (non-root user + seccomp + AppArmor/SELinux + read-only filesystem) matters.Follow-up chain:
  1. “What is the USER namespace and why is it not enabled by default in Docker?” (USER namespace remaps UIDs: root (UID 0) inside the container maps to an unprivileged UID (e.g., 100000) on the host. Even if an attacker escapes the container as “root,” they are nobody on the host. Docker does not enable it by default because it breaks volume permissions — files created by the remapped UID are not owned by the expected host user. Enable with "userns-remap": "default" in daemon.json.)
  2. “Can you create a container that shares the host’s PID namespace? When would you want this?” (Yes, --pid=host. The container can see all host processes. Useful for monitoring/debugging tools like strace, process managers, and sidecar containers that need to send signals to host processes. Security risk: the container can kill -9 any host process.)
  3. “How do namespaces interact with capabilities? If a container has NET_ADMIN capability, does it affect the host network?” (Only if the container shares the host’s NET namespace. With a separate NET namespace (the default), NET_ADMIN only allows modifying the container’s own network stack. Capabilities are scoped to the namespace they operate in — this is the layered security model.)
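On Linux, each process exposes its namespace memberships as symlinks under /proc/<pid>/ns, and two processes share a namespace exactly when the link targets match. The Python sketch below reads them; it assumes a Linux host and returns an empty result elsewhere. The helper names are hypothetical.

```python
import os


def namespace_ids(pid="self"):
    """Map namespace name (pid, net, mnt, ...) to its identifier for a process."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        for name in sorted(os.listdir(ns_dir)):
            ids[name] = os.readlink(os.path.join(ns_dir, name))
    except OSError:
        return {}  # not Linux, or no permission to inspect that process
    return ids


def shared_namespaces(a, b):
    """Names of the namespaces that two processes have in common."""
    return sorted(name for name in a if name in b and a[name] == b[name])


mine = namespace_ids()
for name, ident in mine.items():
    print(name, ident)
```

Comparing the output for a host process with that of a containerized process (looked up by its host PID) shows which namespaces the container actually has of its own. With --pid=host, for example, the pid entry would match.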
Answer:While namespaces provide isolation (what a container can see), cgroups provide resource limiting (how much a container can use). They are the other half of the container security and performance model.What cgroups control:
  • CPU: Limit to N cores or a percentage of host CPU. Throttling (not killing) when exceeded.
  • Memory: Hard limit — kernel OOM-kills the container’s process if exceeded. Soft limit — triggers reclaim but does not kill.
  • Disk I/O: Limit read/write bandwidth to storage devices (BPS and IOPS).
  • PIDs: Limit the number of processes a container can create (prevents fork bombs).
docker run --cpus="1.5" --memory="512m" --pids-limit=100 myapp
cgroup v1 vs v2: Linux distributions are migrating to cgroup v2 (unified hierarchy), which provides better resource accounting and the ability to limit resources on a per-thread basis. Docker and Kubernetes both support cgroup v2 on modern kernels. If you are on an older kernel with cgroup v1, be aware that memory accounting can be slightly inaccurate.Follow-up chain:
  1. “You set --cpus=2 on a container. The host has 8 cores. Does the container see 2 cores or 8?” (It sees 8 cores — cgroups limit time, not visibility. /proc/cpuinfo inside the container shows all host CPUs. The container gets 200% of a single core’s CPU time, distributed across any available cores. This confuses JVM and Go runtime auto-detection — they may spawn 8 threads thinking they have 8 cores, then contend for 2 cores’ worth of CPU time. Use GOMAXPROCS or -XX:ActiveProcessorCount to override.)
  2. “What is the difference between CPU --cpus (quota) and --cpuset-cpus (pinning)?” (--cpus=2 gives you 2 cores’ worth of time on any core (scheduler decides). --cpuset-cpus="0,1" pins the container to physical cores 0 and 1 only. Pinning is useful for latency-sensitive workloads (avoids cache misses from core migration) but reduces scheduling flexibility. In practice, combine them: --cpuset-cpus="0,1" --cpus=2.)
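  3. “A container is set to --memory=1g but free -m inside the container shows 32GB (the host’s RAM). Why?” (/proc/meminfo is not cgroup-aware, so it shows the host’s memory under both cgroup v1 and v2. This breaks applications that auto-tune based on available memory (older JVMs, Node.js, Python ML libraries). Mounting lxcfs over /proc/meminfo can present a cgroup-aware view. Setting MALLOC_ARENA_MAX does not fix the view, but it can reduce glibc arena overhead under a tight limit. Container-aware runtimes such as modern JVMs skip /proc/meminfo and read the cgroup limit file instead: memory.limit_in_bytes on v1, memory.max on v2.)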
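Because the visible core count and the usable CPU time differ, a runtime that wants the real budget has to read the quota. The Python sketch below shows the arithmetic: --cpus is a quota over a scheduling period (100000 microseconds by default), and on cgroup v2 that pair appears in the cpu.max file as "quota period". The helper names are hypothetical.

```python
def cpus_to_quota(cpus, period_us=100_000):
    """Translate --cpus into the (quota, period) pair in microseconds."""
    return int(cpus * period_us), period_us


def parse_cpu_max(text):
    """Effective CPU count from cgroup v2 cpu.max content, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


print(cpus_to_quota(1.5))              # quota and period for --cpus=1.5
print(parse_cpu_max("200000 100000"))  # content written for --cpus=2
print(parse_cpu_max("max 100000"))     # no limit configured
```

A thread pool could then be sized from parse_cpu_max instead of the host core count, which is the same correction that GOMAXPROCS or -XX:ActiveProcessorCount apply by hand.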
Answer:The Docker socket (/var/run/docker.sock) is the API endpoint for the Docker daemon. Mounting it inside a container gives that container full control over Docker on the host — it can create, delete, and inspect any container, pull images, mount host filesystems, and effectively gain root access to the host machine.Why teams mount it: CI/CD agents (Jenkins, GitLab Runner), monitoring tools (Portainer, cAdvisor), and log collectors sometimes need Docker API access.The risk: A compromised container with socket access can run docker run -v /:/host --privileged alpine to mount the entire host filesystem with root access. This is a complete host compromise.Mitigations:
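  • Use Docker-in-Docker (DinD) instead of socket mounting for CI/CD. DinD runs a separate Docker daemon inside the container, so builds cannot touch the host daemon. Note that DinD usually needs --privileged itself, so it trades one risk for another.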
  • Use Podman or Kaniko for building images without a daemon.
  • If you must mount the socket, use a TCP proxy (like tecnativa/docker-socket-proxy) that restricts which API endpoints the container can access.
  • In Kubernetes, avoid mounting the socket entirely — use Kaniko for in-cluster image builds.
Follow-up chain:
  1. “A monitoring tool requires Docker API access to list containers. The security team vetoes socket mounting. What alternatives exist?” (1. Use tecnativa/docker-socket-proxy — a HAProxy-based proxy that exposes only safe read-only endpoints. 2. Use the Docker REST API over TLS instead of the socket. 3. Use cAdvisor or the Prometheus node exporter which read from cgroups and /proc directly, no Docker socket needed. 4. In Kubernetes, use the Kubelet API or metrics-server instead.)
  2. “Can you detect if a container has the Docker socket mounted?” (Yes. docker inspect -f '{{range .Mounts}}{{.Source}}{{end}}' <id> shows all mounts. In Kubernetes, OPA Gatekeeper or Kyverno policies can block any pod mounting /var/run/docker.sock. At the host level, audit inotify watches on the socket or use Falco for runtime detection.)
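The docker inspect check can be automated across a fleet. This minimal Python sketch works on the Mounts array from docker inspect output; the sample data is hypothetical, and treating /run/docker.sock as an equivalent path is an assumption based on /var/run commonly being a symlink to /run.

```python
# Paths treated as the Docker socket (the second one is an assumption).
SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def mounts_docker_socket(mounts):
    """True if any mount in a docker inspect Mounts array exposes the socket."""
    return any(mount.get("Source") in SOCKET_PATHS for mount in mounts)


# Hypothetical Mounts arrays for two containers.
ci_agent = [
    {"Source": "/var/run/docker.sock", "Destination": "/var/run/docker.sock"},
    {"Source": "/srv/cache", "Destination": "/cache"},
]
web = [{"Source": "/srv/static", "Destination": "/usr/share/nginx/html"}]

print("ci_agent:", mounts_docker_socket(ci_agent))
print("web:", mounts_docker_socket(web))
```

A check like this only audits what is already running; the admission policies mentioned above are what prevent the mount in the first place.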
Answer: docker run --privileged disables virtually all security features: it grants the container all Linux capabilities, access to all host devices (/dev), and disables seccomp, AppArmor, and SELinux confinement. The container can do anything the host root can do, including loading kernel modules, modifying iptables rules, and mounting filesystems.
When it is legitimately needed: Running Docker-in-Docker, running certain system monitoring tools, or managing host networking. These cases are rare.
Better alternatives: Instead of --privileged, grant only the specific capabilities needed with --cap-add. For example, a container that needs to modify network settings only needs --cap-add=NET_ADMIN, not full privileged mode. This follows the principle of least privilege.
Red flag in production: Any container running with --privileged in production is a security audit failure. Use Kubernetes Pod Security Standards (enforced by Pod Security Admission) or OPA Gatekeeper to block privileged containers at the admission level.
Answer:Docker Content Trust (DCT) provides image signing and verification using Notary. When enabled, Docker only pulls and runs images that have been signed by a trusted publisher.
export DOCKER_CONTENT_TRUST=1
docker pull myregistry/myimage:latest   # Fails if image is not signed
docker push myregistry/myimage:latest   # Automatically signs the image
Why this matters: Without DCT, a compromised registry or a man-in-the-middle attack could serve a tampered image. DCT ensures cryptographic verification that the image you pull is exactly what the publisher pushed. In regulated industries (finance, healthcare), image signing is often a compliance requirement.Alternative: Cosign (from Sigstore) is increasingly preferred over Notary for image signing because it integrates with OCI registries, supports keyless signing via OIDC, and has better Kubernetes integration via policy controllers.Follow-up chain:
  1. “What is the difference between Docker Content Trust (Notary v1) and cosign? Which would you recommend for a new project?” (Cosign. Notary v1 requires running your own Notary server, has a complex key management model, and is tightly coupled to Docker. Cosign stores signatures as OCI artifacts in the same registry as the image, supports keyless signing via GitHub Actions OIDC (no long-lived keys to manage), and integrates with Kyverno/OPA for Kubernetes admission control. Notary v2 (now called Notation) is a newer standard but cosign has broader adoption.)
  2. “How does keyless signing with cosign work? Where is the private key?” (There is no persistent private key. Cosign uses the Sigstore transparency log (Rekor) and a short-lived certificate from Fulcio. The CI job authenticates via OIDC (e.g., GitHub Actions identity token), Fulcio issues an ephemeral signing certificate, the image is signed, and the signature is recorded in the Rekor transparency log. Verification checks the Rekor log entry and the OIDC identity. The private key is generated in memory for that one signing operation and then discarded.)
  3. “An attacker pushes a malicious image with the same tag to your registry. How does signing prevent this from reaching production?” (Your Kubernetes admission controller (Kyverno or OPA) verifies the cosign signature before allowing a pod to run. The attacker’s image is not signed by your CI pipeline’s OIDC identity, so the signature check fails and the pod is rejected. Without signing, tag-based pulls are vulnerable to tag overwriting attacks.)
Answer:Docker Compose is a tool for defining and running multi-container applications using a declarative YAML file. Instead of running multiple docker run commands with complex flags, you describe your entire application stack in docker-compose.yml and bring it up with one command.
services:
  api:
    build: ./api
    ports: ["3000:3000"]
    environment:
      DATABASE_URL: postgres://db:5432/myapp
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:15
    volumes: ["pgdata:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
volumes:
  pgdata:
Compose v2: The docker-compose Python-based CLI has been replaced by docker compose (a Go plugin built into Docker CLI). Compose v2 is faster and supports profiles, service dependencies with health checks, and watch mode for development (docker compose watch).Production note: Compose is excellent for development, testing, and single-host deployments. For multi-host production orchestration, use Kubernetes. Some teams use Compose for local dev and generate Kubernetes manifests from the same service definitions using tools like Kompose.Follow-up chain:
  1. “What is the difference between depends_on and health-check-based ordering in Compose v2?” (depends_on without conditions only waits for the container to start, not for the service to be ready. Your database container might start in 100ms but take 5 seconds to accept connections. Compose v2 supports depends_on: { db: { condition: service_healthy } } which waits until the healthcheck passes. This eliminates the need for hacky wait-for-it.sh scripts.)
  2. “When would you use Compose profiles?” (Profiles let you define optional services that only start when explicitly requested. Example: a debug profile that includes a pgAdmin container and a Redis Commander UI. docker compose --profile debug up starts everything including debug tools. Without the flag, only the core services start. This keeps your default startup fast.)
  3. “Your team uses Compose for local dev. How do you keep the Compose file and Kubernetes manifests in sync?” (Three approaches: 1. Use kompose convert to generate K8s manifests from Compose files (good for simple cases, breaks on Compose-specific features). 2. Maintain both separately with CI checks that compare exposed ports, env vars, and volume mounts. 3. Use a shared values layer — Helm chart values and Compose .env files sourced from the same config. In practice, most teams maintain both separately because the environments have fundamentally different needs.)
What weak candidates say: “Compose is for production.” It is not designed for multi-host production. Or confuse Compose v1 (docker-compose) with v2 (docker compose) syntax differences.What strong candidates say: “I use Compose as my local development contract — it defines the services, networks, and volumes my app needs. The same service topology is replicated in Kubernetes manifests for staging and production, but with different resource limits, health checks, and scaling policies.”
Answer:Docker Swarm is Docker’s built-in orchestration tool. It turns a pool of Docker hosts into a single virtual host with service discovery, load balancing, rolling updates, and encrypted overlay networking.Why it lost to Kubernetes: Swarm was simpler to set up (a single docker swarm init) but lacked the extensibility, ecosystem, and community momentum that Kubernetes built. By 2020, most cloud providers had dropped or deprioritized Swarm support in favor of managed Kubernetes. Swarm is still maintained and included in Docker, but new projects overwhelmingly choose Kubernetes.When Swarm still makes sense: Very small teams (1-3 people) running fewer than 10 services who want orchestration without the operational complexity of Kubernetes. The learning curve is hours, not weeks.
Answer:
| Feature | Docker | Podman |
| --- | --- | --- |
| Architecture | Client-server (requires a daemon running as root) | Daemonless (each container is a child process of Podman) |
| Root requirement | Daemon runs as root by default (rootless mode available since 20.10) | Rootless by default (no daemon = no root process) |
| OCI compliance | OCI-compatible | Fully OCI-compliant |
| CLI compatibility | N/A | Drop-in replacement (alias docker=podman works for most commands) |
| Systemd integration | Requires separate configuration | Generates systemd unit files natively (podman generate systemd) |
| Compose | Native support | Supports docker-compose.yml via podman-compose or podman compose |
Why Podman is gaining traction: The daemonless architecture eliminates the security risk of a privileged daemon process. Red Hat, SUSE, and other enterprise Linux distributions ship Podman instead of Docker by default. In RHEL 8+, Docker is not even available in the default repositories.Practical impact: For most developers, the choice between Docker and Podman is transparent — the CLI and Dockerfile format are identical. The difference matters most for security-conscious environments and enterprise Linux deployments.
Answer:Distroless images (from Google, gcr.io/distroless/) contain only your application and its runtime dependencies — no shell, no package manager, no ls, no curl, no bash. The image has the bare minimum to run your application.Available base images: distroless/static (for statically compiled binaries like Go/Rust), distroless/base (with glibc), distroless/java, distroless/nodejs, distroless/python3.Why use them:
  • Security: No shell means an attacker who gains code execution inside the container cannot easily pivot — no wget to download tools, no bash to run scripts. CVE scanners typically find 80-90% fewer vulnerabilities in distroless vs. Debian-based images.
  • Size: distroless/static is ~2 MB vs. alpine at ~5 MB vs. debian at ~120 MB.
Trade-off: Debugging is significantly harder. You cannot docker exec -it ... /bin/sh. The workaround is Kubernetes debug containers (kubectl debug) or building a debug variant image for staging environments that includes a shell.
Answer:
Seccomp (Secure Computing Mode) is a Linux kernel feature that filters which system calls a process can make. Docker applies a default seccomp profile that blocks ~44 of the ~300+ available syscalls, including dangerous ones like reboot(), swapon(), mount(), and kexec_load(). (ptrace() is blocked only on kernels older than 4.8.)
How it works: The default profile is an allowlist whose default action is SCMP_ACT_ERRNO. When a container process makes a syscall that is not on the list, the kernel does not execute it and the call fails with an error (typically EPERM, "Operation not permitted"). The process is not killed unless a custom profile chooses a kill action such as SCMP_ACT_KILL.
Custom profiles: You can create stricter profiles for security-sensitive containers. For example, a stateless API server that only needs network I/O and file reads can be restricted to a very narrow set of syscalls. The Docker default profile is a good baseline, but production-grade security uses custom profiles generated by tools like strace (to capture which syscalls your application actually uses) or OCI runtime spec generators.
docker run --security-opt seccomp=custom-profile.json myapp
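Kubernetes integration: Kubernetes supports seccomp profiles via the securityContext.seccompProfile field. The restricted Pod Security Standard requires every pod to set the profile to RuntimeDefault or Localhost. Since Kubernetes 1.27 the kubelet's --seccomp-default flag is GA; when it is enabled, RuntimeDefault is applied to pods that do not specify a profile.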
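The custom-profile idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production generator: the syscall list is a hypothetical stand-in for what you would capture with strace, and the function name build_seccomp_profile is made up for this example. The JSON keys (defaultAction, architectures, syscalls) follow the profile format that Docker accepts with --security-opt seccomp=.

```python
import json


def build_seccomp_profile(allowed_syscalls):
    """Return a Docker-style seccomp profile that allows only the given syscalls."""
    names = sorted(set(allowed_syscalls))  # de-duplicate and keep output stable
    return {
        # SCMP_ACT_ERRNO: any syscall not listed below returns an error to the caller.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_AARCH64"],
        "syscalls": [{"names": names, "action": "SCMP_ACT_ALLOW"}],
    }


# Hypothetical syscall list; a real one would come from tracing your application.
observed = ["read", "write", "close", "epoll_wait", "accept4", "read"]
profile = build_seccomp_profile(observed)
print(json.dumps(profile, indent=2))
```

A real service needs far more syscalls than this toy list (process startup alone uses dozens), so treat the output as a starting point and test it under realistic load before enforcing it.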

5. Docker Medium Level Questions

Answer:
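# The top-level "version" key is obsolete in Compose v2 and is ignored
version: '3.8'
services:
  web:
    image: nginx
    ports:
      - "8080:80"
    # depends_on controls start order only; it does not wait for Postgres to be ready
    depends_on:
      - db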
  db:
    image: postgres:14
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - db-data:/var/lib/postgresql/data

volumes:
  db-data:
Answer:
services:
  frontend:
    networks:
      - frontend-net
  backend:
    networks:
      - frontend-net
      - backend-net
  database:
    networks:
      - backend-net

networks:
  frontend-net:
  backend-net:
Answer:
# docker-compose.yml
services:
  app:
    environment:
      - NODE_ENV=production
      - API_KEY=${API_KEY}
    env_file:
      - .env
# .env file
API_KEY=secret123
DATABASE_URL=postgres://localhost/db
Answer:
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -f http://localhost/ || exit 1
# docker-compose.yml
services:
  web:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 3s
      retries: 3
Answer:
# ARG: build-time only
ARG NODE_VERSION=18
FROM node:${NODE_VERSION}

# ENV: runtime available
ENV NODE_ENV=production
ENV PORT=3000
Answer:
# Tag image
docker tag myapp:latest registry.example.com/myapp:v1.0

# Push
docker push registry.example.com/myapp:v1.0

# Pull
docker pull registry.example.com/myapp:v1.0
Answer:
# Remove unused images
docker image prune -a

# Remove stopped containers
docker container prune

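# Remove unused anonymous volumes (on Docker 23.0+ add --all to include named volumes)
docker volume prune

# Remove all unused containers, networks, images, build cache, and volumes
docker system prune -a --volumes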
Answer:
# View logs
docker logs container-name

# Follow logs
docker logs -f container-name

# Last 100 lines
docker logs --tail 100 container-name

# With timestamps
docker logs -t container-name
Answer:
# Real-time stats
docker stats

# Specific container
docker stats container-name

# Format output
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Answer:
# Full details
docker inspect container-name

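# Specific field (IP address on the default bridge network)
docker inspect -f '{{.NetworkSettings.IPAddress}}' container-name

# IP address on user-defined networks (for example Compose networks)
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{end}}' container-name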

# Multiple containers
docker inspect $(docker ps -q)

6. Docker Advanced Level Questions

Answer:
# Enable buildx
docker buildx create --use

# Build for multiple platforms
docker buildx build --platform linux/amd64,linux/arm64 \
  -t myapp:latest --push .
Answer:
BuildKit is Docker’s next-generation build engine, enabled by default since Docker 23.0. It replaces the legacy builder with parallel build execution, better caching, and new Dockerfile features that are impossible with the old builder.
Cache mounts are persistent build caches that survive between builds but are never stored in the final image:
# syntax=docker/dockerfile:1
FROM node:18

# Cache npm packages across builds -- /root/.npm persists between builds
RUN --mount=type=cache,target=/root/.npm \
    npm ci

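# Cache apt packages -- avoids re-downloading on every build.
# sharing=locked serializes concurrent builds; removing docker-clean stops
# Debian-based images from deleting the downloaded packages after install.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    rm -f /etc/apt/apt.conf.d/docker-clean && \
    apt-get update && apt-get install -y python3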
Secret mounts — inject secrets at build time without leaking them into image layers:
# syntax=docker/dockerfile:1
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm ci --registry=https://npm.example.com

# Build with the token read from the NPM_TOKEN environment variable:
# docker build --secret id=npm_token,env=NPM_TOKEN .
The secret is mounted as a tmpfs file. It never appears in docker history or any layer.
SSH mounts forward the SSH agent for private git repo access during build:
RUN --mount=type=ssh git clone git@github.com:private/repo.git
# Build with: docker build --ssh default .
Parallel builds: BuildKit analyzes the Dockerfile DAG and builds independent stages in parallel. In a multi-stage build with 3 independent builder stages, BuildKit runs all 3 simultaneously. The legacy builder ran them sequentially.
What interviewers are really testing: Whether you know BuildKit exists, use its features (especially secrets and cache mounts), and understand why it replaced the legacy builder. This separates engineers who wrote Dockerfiles in 2019 from those who write them today.
Follow-up chain:
  1. “How do you enable BuildKit in CI if the runner has an older Docker version?” (Set DOCKER_BUILDKIT=1 environment variable before Docker 23.0. On 23.0+, it is the default. In GitHub Actions, the docker/build-push-action uses BuildKit by default.)
  2. “What is the difference between --mount=type=cache and --cache-from?” (Cache mounts persist build-time dependencies (npm cache, apt cache) across builds locally. --cache-from imports layer cache from a remote registry image, enabling cache sharing across CI runners. They solve different problems — use both together for maximum speed.)
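  3. “A developer adds RUN --mount=type=secret but forgets to pass --secret at build time. What happens?” (By default the secret is optional, so BuildKit does not fail the instruction; the file under /run/secrets/ is simply absent. Whether the build fails then depends on the command: cat exits non-zero, but in NPM_TOKEN=$(cat ...) npm ci the install still runs with an empty token. To make the build fail closed, declare the mount as RUN --mount=type=secret,id=npm_token,required=true.)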
  4. “Your CI builds still take 8 minutes after enabling BuildKit. What are the likely remaining issues?” — Cache mounts not configured for npm/pip/go modules. --cache-from/--cache-to not pointing at a registry mirror. Sequential stages that could be parallel (missing independent FROM lines). Large COPY . . before dependency install. Fix each: docker buildx bake with a BuildKit registry cache backend usually shaves 60-80% off CI build time.
Senior vs Staff perspective
  • Senior: Uses BuildKit cache mounts, secret mounts, and --cache-from appropriately. Knows BuildKit is the default on modern Docker.
  • Staff: Designs the build cache topology for the org — regional registry mirrors for cache, CI-wide cache-key conventions (git SHA vs semver vs branch), build-time attestations (SBOM, provenance via --provenance=mode=max), and a multi-arch build strategy (QEMU vs native ARM runners). Also enforces: no secrets in Dockerfile (pre-commit + PR checks), SBOM attached to every pushed image, and parallel builds with buildx bake for monorepos. Tracks build-time SLO as a team metric.
Work-sample scenario: Your monorepo has 30 services. Each CI build takes 10-15 minutes. Walk through the optimization plan.
  • Measure: identify the bottleneck. Usually: (a) no layer cache reuse across CI runs, (b) sequential builds where parallelism is possible, (c) repeated downloads of the same dependencies.
  • Fix 1: BuildKit + --cache-from type=registry,ref=ghcr.io/org/service:buildcache, paired with a matching --cache-to so every run also exports its layers. CI pulls the previous layers before build. First-time cold: 12 min; warm: 90 seconds.
  • Fix 2: Cache mounts for package managers (npm, pip, go mod). Saves 1-3 minutes per service.
  • Fix 3: docker buildx bake with a docker-bake.hcl file, which builds multiple targets in parallel on one BuildKit builder. To spread the work across CI runners, shard the targets over a CI job matrix.
  • Fix 4: only build services whose code changed (git diff --name-only -> map to service directories). Unchanged services skip entirely.
  • Expected outcome: per-service build time ~60s warm, and only rebuild affected services. CI wall-clock for typical PR: 2-3 minutes.
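Fix 4 above (only build services whose code changed) can be sketched as follows. This is a minimal illustration under an assumed repository layout: every service lives under a hypothetical services/<name>/ directory, and the input is the list of paths printed by git diff --name-only. The function name and layout are assumptions, not part of the original plan.

```python
from pathlib import PurePosixPath


def services_to_rebuild(changed_files, services_root="services"):
    """Map changed file paths to the set of service directories they belong to."""
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Expect at least services/<name>/<file>; ignore files directly under the root.
        if len(parts) >= 3 and parts[0] == services_root:
            affected.add(parts[1])
    return sorted(affected)


# Hypothetical output of: git diff --name-only origin/main...HEAD
changed = [
    "services/api/src/main.go",
    "services/api/Dockerfile",
    "services/worker/go.mod",
    "docs/readme.md",
]
print(services_to_rebuild(changed))
```

Changes to shared code (a common library directory or a root lockfile) are not handled here; a real pipeline would usually treat those as "rebuild everything that depends on it".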
Answer:
Image signing ensures that the image you deploy is the exact image your CI pipeline built: no tampering, no substitution.
Docker Content Trust (Notary v1):
export DOCKER_CONTENT_TRUST=1
docker push myapp:latest   # Signs automatically
docker pull myapp:latest   # Fails if not signed
Limited adoption because it requires running your own Notary server and managing long-lived signing keys.
Cosign (modern standard):
# Sign after CI build
cosign sign --key cosign.key registry.example.com/myapp:v1.0

# Verify before deploy
cosign verify --key cosign.pub registry.example.com/myapp:v1.0

# Keyless signing with GitHub Actions OIDC (recommended)
cosign sign --yes registry.example.com/myapp@sha256:abc123
Notation (Notary v2 / OCI standard):
notation sign registry.example.com/myapp:v1.0
notation verify registry.example.com/myapp:v1.0
Notation stores signatures as OCI reference artifacts. It is backed by AWS (used in ECR) and Microsoft (used in ACR).
Enforcement in Kubernetes:
# Kyverno policy -- reject unsigned images
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signature
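spec:
  # Enforce blocks non-compliant pods; the default (Audit) only reports them
  validationFailureAction: Enforce
  rules:
  - name: check-cosign
    match:
      any:
      - resources:
          kinds: [Pod]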
    verifyImages:
    - imageReferences: ["registry.example.com/*"]
      attestors:
      - entries:
        - keyless:
            issuer: "https://token.actions.githubusercontent.com"
            subject: "https://github.com/myorg/*"
What interviewers are really testing: Whether you understand supply chain security beyond “use official images.” Staff-level candidates explain the full chain: CI builds image, signs it with cosign (keyless via OIDC), attaches an SBOM, and Kubernetes admission control rejects unsigned images.
Red flag answer: “We trust Docker Hub because they have official images.” Official images have had CVEs and supply chain issues. Trust but verify.
Answer:
Rootless containers eliminate the single biggest Docker security risk: the Docker daemon running as root on the host. There are two distinct approaches, and strong candidates explain both.
Approach 1: User Namespace Remapping (Docker daemon still runs as root):
// /etc/docker/daemon.json
{
  "userns-remap": "default"
}
Root (UID 0) inside the container maps to an unprivileged UID (e.g., 100000) on the host. If an attacker escapes the container as “root,” they land as UID 100000 on the host with no privileges. The daemon still runs as root, so host compromise through the daemon is still possible.
Approach 2: Rootless Docker (daemon runs as non-root user):
# Install rootless Docker
curl -fsSL https://get.docker.com/rootless | sh

# Run Docker as your unprivileged user
export DOCKER_HOST=unix:///run/user/1000/docker.sock
docker run -d nginx
The entire Docker daemon, containerd, and runc all run as your unprivileged user. No root process anywhere. This is the strongest Docker security posture available.
Rootless limitations:
  • Cannot bind to privileged ports (<1024) without sysctl net.ipv4.ip_unprivileged_port_start=0
  • Overlay2 storage driver requires kernel 5.11+ for rootless. Older kernels fall back to fuse-overlayfs (slower).
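  • --network host does not give access to the real host network: the daemon lives in RootlessKit's own network namespace, so ports opened this way are not reachable on the host's interfaces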
  • AppArmor and some cgroup features may not work in rootless mode
Podman’s advantage: Podman is rootless by default with no daemon, making it the simplest path to rootless containers. Run alias docker=podman and you get rootless containers with most workflows unchanged.
What interviewers are really testing: Whether you understand that “running as non-root inside the container” (USER instruction) is different from “running the Docker daemon as non-root” (rootless Docker). Both are important; they address different threat vectors.
Follow-up chain:
  1. “You enable userns-remap and your volume mounts break. Files are owned by UID 100000 instead of UID 0. How do you fix it?” (The remapped UID does not match the host UID. Fix: chown the volume directory to the remapped UID range, or use named volumes instead of bind mounts. In Kubernetes, fsGroup in the security context handles this.)
  2. “What is the difference between rootless Docker, rootless Podman, and rootless Kubernetes (usernetes)?” (Rootless Docker: daemon as non-root. Rootless Podman: no daemon at all, each container is a direct child process. Usernetes: Kubernetes components (kubelet, containerd) run as non-root. All aim to eliminate root processes from the container stack, but at different layers.)
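The UID mapping behind userns-remap is simple arithmetic, sketched below. The defaults (a subordinate range starting at 100000 with 65536 IDs) are assumptions that mirror the example values in this answer; the real range on a host comes from its /etc/subuid entry and may differ. The function name host_uid is made up for this illustration.

```python
def host_uid(container_uid, subuid_start=100000, subuid_count=65536):
    """Translate a UID inside a remapped container to the UID seen on the host."""
    if not 0 <= container_uid < subuid_count:
        raise ValueError(f"UID {container_uid} is outside the mapped range")
    return subuid_start + container_uid


# Container root lands on the first UID of the subordinate range.
print(host_uid(0))
# An application user with UID 1000 inside the container.
print(host_uid(1000))
```

This is why a bind-mounted directory owned by host UID 0 or 1000 looks unwritable from inside the container: the process accessing it is really running as the shifted host UID.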
Answer:
These are Linux kernel security mechanisms that add layers of defense beyond namespaces and cgroups. They are the “what the container is allowed to do” controls, complementing the “what the container can see” (namespaces) and “how much it can use” (cgroups).
Seccomp (System Call Filtering): Docker applies a default seccomp profile that blocks ~44 dangerous syscalls out of 300+. This prevents a compromised container from calling reboot(), mount(), kexec_load(), and (on kernels older than 4.8) ptrace().
# Use Docker's default profile (applied automatically)
docker run nginx

# Use a custom stricter profile
docker run --security-opt seccomp=custom.json nginx

# Disable seccomp entirely (dangerous, needed for some debugging tools)
docker run --security-opt seccomp=unconfined nginx
Generating a custom seccomp profile: Use strace or seccomp-profiler to record which syscalls your app actually uses, then whitelist only those. A Node.js HTTP server needs ~60 syscalls. A Go static binary needs ~30. Everything else can be blocked.
AppArmor (Mandatory Access Control): AppArmor restricts which files, capabilities, and network operations a process can use. Docker applies the docker-default AppArmor profile automatically.
# Run with Docker's default AppArmor profile
docker run --security-opt apparmor=docker-default nginx

# Run with a custom profile
docker run --security-opt apparmor=my-custom-profile nginx

# Disable AppArmor (dangerous)
docker run --security-opt apparmor=unconfined nginx
SELinux (alternative to AppArmor): Used on RHEL/CentOS/Fedora instead of AppArmor. Example: docker run --security-opt label=type:my_container_t nginx. SELinux uses labels and type enforcement; AppArmor uses path-based rules. Same goal, different mechanism.
Defense-in-depth stack for production containers:
  1. Non-root user (USER appuser)
  2. Read-only root filesystem (--read-only)
  3. Drop all capabilities, add back only what is needed (--cap-drop=ALL --cap-add=NET_BIND_SERVICE)
  4. Custom seccomp profile (whitelist only needed syscalls)
  5. AppArmor/SELinux profile (restrict file and network access)
  6. No new privileges (--security-opt no-new-privileges:true)
What interviewers are really testing: Whether you understand that container security is layered and each mechanism addresses a different attack vector. Senior candidates mention at least 3 of these layers. Staff candidates can design the enforcement policy for an organization.
Red flag answer: “Namespaces isolate containers, so we do not need seccomp or AppArmor.” Namespaces isolate visibility, not capability. A container with access to dangerous syscalls can still exploit kernel vulnerabilities.
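The defense-in-depth list can be turned into a small helper that assembles the corresponding docker run arguments, which is one way an organization might standardize them. This is a sketch only: the function name, the default UID:GID 10001:10001, and the profile file name are assumptions for illustration. The flags themselves are the ones listed in this answer, with --user standing in for the USER instruction at run time.

```python
import shlex


def hardened_run_command(image, user="10001:10001", add_caps=(), seccomp_profile=None):
    """Build a docker run argument list that applies the hardening layers."""
    args = ["docker", "run", "--user", user, "--read-only", "--cap-drop=ALL"]
    # Add back only the capabilities the workload actually needs.
    args += [f"--cap-add={cap}" for cap in add_caps]
    if seccomp_profile:
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    args += ["--security-opt", "no-new-privileges:true", image]
    return args


cmd = hardened_run_command(
    "nginx", add_caps=["NET_BIND_SERVICE"], seccomp_profile="custom.json"
)
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to pass to subprocess.run without shell quoting issues; shlex.join is used only for display.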
Answer:
docker run \
  --cpus="1.5" \
  --memory="512m" \
  --memory-swap="1g" \
  --pids-limit=100 \
  nginx
Answer:
# Initialize swarm
docker swarm init

# Deploy stack
docker stack deploy -c docker-compose.yml myapp

# Scale service
docker service scale myapp_web=5

# Update service
docker service update --image nginx:latest myapp_web
Answer:
# Create secret (printf avoids the trailing newline that echo would store in the secret)
printf '%s' "my-secret" | docker secret create db_password -

# Use in service
docker service create \
  --secret db_password \
  --env DB_PASSWORD_FILE=/run/secrets/db_password \
  myapp
Answer:
Distroless images are the logical conclusion of “minimize your attack surface.” They contain your application runtime and nothing else: no shell, no package manager, no coreutils.
Available bases from Google (gcr.io/distroless/):
| Image | Contents | Size | Use case |
| --- | --- | --- | --- |
| distroless/static | CA certs, timezone data | ~2 MB | Statically compiled Go, Rust binaries |
| distroless/base | glibc, libssl, CA certs | ~20 MB | Dynamically linked C/C++ binaries |
| distroless/cc | libstdc++ on top of base | ~25 MB | C++ applications |
| distroless/nodejs18 | Node.js runtime | ~120 MB | Node.js applications |
| distroless/java17 | OpenJDK 17 runtime | ~220 MB | Java applications |
| distroless/python3 | Python 3 runtime | ~50 MB | Python applications |
Multi-stage build with distroless:
# Build stage -- full toolchain
FROM golang:1.21 AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o app .

# Production stage -- distroless, no shell
FROM gcr.io/distroless/static:nonroot
COPY --from=builder /app/app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
Why :nonroot tag matters: Distroless images come in :latest (runs as root) and :nonroot (runs as UID 65532) variants. Always use :nonroot unless your app specifically needs root.
The debugging trade-off and how to work around it:
# You CANNOT do this with distroless:
docker exec -it myapp /bin/sh   # Error: no shell

# Kubernetes debug container workaround:
kubectl debug -it pod/myapp --image=busybox --target=myapp

# Build a debug variant for staging (Dockerfile line; the :debug tag includes a busybox shell):
FROM gcr.io/distroless/static:debug
Chainguard Images, the next generation of distroless: Chainguard provides cgr.dev/chainguard/ images with similar philosophy but better supply chain guarantees (signed with cosign, daily CVE updates, SBOM attached). Many teams are migrating from Google distroless to Chainguard for better security posture.
CVE comparison (approximate counts; exact numbers change with the scan date and image digest):
  • node:18 (Debian): ~280 known vulnerabilities (Trivy scan)
  • node:18-alpine: ~15 known vulnerabilities
  • cgr.dev/chainguard/node:18: 0-3 known vulnerabilities
What interviewers are really testing: Whether you understand the full trade-off spectrum from Debian to Alpine to distroless, and can make the right choice for each service type. Staff candidates also explain how they enforce distroless as an org-wide standard via base image policies.
Red flag answer: “Distroless is always better.” It is not: some applications need a shell for health checks (curl), runtime configuration (sed/envsubst), or debugging. The right answer is “distroless for production, debug variant for staging, and Alpine as a pragmatic middle ground.”
Senior vs Staff perspective
  • Senior: Uses distroless for Go/Rust services, understands the debug container pattern, picks :nonroot variants.
  • Staff: Mandates distroless (or Chainguard) as the org default via base image policy, builds a curated internal registry of approved base images with CVE SLAs, wires cosign signature verification into admission control, tracks SBOM diff between image versions for supply-chain auditing, and owns the migration plan from Debian-based base images to distroless fleet-wide. Also makes the build-vs-buy call on Chainguard vs maintaining internal distroless images.
Follow-up chain:
  1. “Your app needs to run curl for health checks. Can you still use distroless?” — Replace curl-based health checks with in-app HTTP endpoints (/health). Or use Kubernetes httpGet probes which call from outside the container. Distroless forces you to do this correctly; “I need curl inside the container” is usually a sign of a fragile health check pattern.
  2. “How do you scan distroless images for vulnerabilities when they have no package manager?” — Trivy and Grype scan based on the SBOM and binary manifests, not the package manager. They still find CVEs in Go/Rust binaries (by analyzing the Go module versions compiled in) and in any OS-level libraries. Chainguard images ship SBOMs via cosign attestations — even better.
  3. “What breaks when you go from Alpine to distroless for a Node.js service?” — (a) Entrypoints that rely on shell features ($VAR expansion) — rewrite to use Node-native arg parsing. (b) npm not available at runtime — ensure npm install happens in the build stage. (c) Native modules may need distroless/nodejs18 (glibc) not -alpine variants because glibc vs musl differ.
Work-sample scenario: Mandate: every production image in the org must be vulnerability-free (no high/critical CVEs) and signed. Walk through the rollout plan.
  • Phase 1: Pick the base image library. Chainguard for paid, distroless + hardened internal variants for free.
  • Phase 2: Build CI template (Docker, Buildx, cosign sign) that any team can adopt. Provide golden Dockerfile examples per language.
  • Phase 3: Admission control — Kyverno or Gatekeeper policies in Kubernetes that reject unsigned images or images with critical CVEs (via Trivy operator scan results).
  • Phase 4: Migration — start with non-prod clusters, allow images to remain Debian with warnings for 30 days, then fail CI builds that don’t use the approved bases.
  • Phase 5: SLO — “time to patch a critical CVE in all production images” as a team KPI. Use Trivy daily scans feeding into a ticketing pipeline.
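The "no high/critical CVEs" gate from the rollout plan can be sketched as a small check over a scanner report. The sketch assumes the JSON layout produced by trivy image --format json (a top-level Results list whose entries carry a Vulnerabilities list with VulnerabilityID, PkgName, and Severity); the function name and the sample report with placeholder CVE IDs are made up for illustration.

```python
def blocking_vulnerabilities(report, blocked=("HIGH", "CRITICAL")):
    """Return (id, package, severity) for every finding that should fail the gate."""
    found = []
    for result in report.get("Results", []):
        # Targets without findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocked:
                found.append(
                    (vuln.get("VulnerabilityID"), vuln.get("PkgName"), vuln.get("Severity"))
                )
    return found


# Hypothetical report with placeholder IDs, shaped like scanner output.
sample = {
    "Results": [
        {
            "Target": "app",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "PkgName": "openssl", "Severity": "CRITICAL"},
                {"VulnerabilityID": "CVE-0000-0002", "PkgName": "zlib", "Severity": "LOW"},
            ],
        },
        {"Target": "bin/server", "Vulnerabilities": None},
    ]
}
print(blocking_vulnerabilities(sample))
```

In CI, a non-empty result would exit non-zero to fail the build; in the admission-control phase the same rule is expressed as a cluster policy rather than a script.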
Answer:
# Avoid mounting Docker socket
# BAD:
docker run -v /var/run/docker.sock:/var/run/docker.sock app

# GOOD: Use Docker API with TLS
docker run --env DOCKER_HOST=tcp://docker:2376 \
  --env DOCKER_TLS_VERIFY=1 app
See question 43 for the full deep dive on Docker socket risks and mitigations including socket proxies, Kaniko, and authorization plugins.
Answer:
The Open Container Initiative (OCI) defines three specifications that make containers portable across runtimes, registries, and orchestrators. Understanding OCI is what separates “I use Docker” from “I understand containers.”
The three OCI specs:
  1. OCI Image Spec: Defines the format for container images — the manifest (JSON metadata), config (runtime settings), and layer blobs (filesystem diffs). This is why a Docker-built image works in Podman, containerd, CRI-O, or any OCI-compliant runtime.
  2. OCI Runtime Spec: Defines how to configure and run a container from an image. It specifies the JSON config file format that runc (or any OCI runtime) reads: root filesystem path, environment variables, namespaces, cgroups, mounts, capabilities. This is the config.json you see when you runc spec.
  3. OCI Distribution Spec: Defines the HTTP API for pushing and pulling images to/from registries. This is why Docker images work in ECR, GCR, ACR, GHCR, Harbor, and any OCI-compliant registry.
Why OCI matters practically:
  • Docker images work in Kubernetes because both follow OCI Image Spec. Kubernetes does not need Docker — it needs an OCI-compliant runtime (containerd, CRI-O).
  • You can switch runtimes without changing images. Run the same image with runc (default), gVisor (sandboxed), or Kata Containers (micro-VM).
  • Registry portability: Push to Docker Hub, pull from a Harbor mirror. The wire protocol is the same.
  • OCI Artifacts: The Distribution Spec now supports storing arbitrary artifacts (Helm charts, Wasm modules, SBOMs, cosign signatures) alongside images. oras push lets you push anything to an OCI registry.
What interviewers are really testing: Whether you understand that Docker is an implementation of a standard, not the standard itself. Staff-level candidates explain OCI when discussing vendor portability, runtime selection, and registry architecture.
Follow-up chain:
  1. “If OCI standardizes images, why do some images work with Docker but not Podman, or vice versa?” (They almost always work. The rare exceptions involve Docker-specific extensions not in the OCI spec, like Docker Compose labels or Docker’s legacy v1 manifest format. OCI v2 manifests are universal.)
  2. “What is an OCI manifest list (index), and how does it enable multi-architecture images?” (A manifest list is a JSON document that points to multiple platform-specific manifests under a single tag. When a client pulls, the registry returns the index; the client compares each entry's platform field (os, architecture, variant) with its own platform and then fetches the matching manifest by digest. The Accept header only negotiates media types, not the platform. This is how docker pull nginx gets arm64 on Graviton and amd64 on Intel: same tag, different layers.)
  3. “How do OCI Artifacts differ from container images, and what would you store as an artifact?” (OCI Artifacts use the same registry API but with different media types. Store: Helm charts, Wasm modules, SBOMs, cosign signatures, policy bundles. Benefit: single registry for all your supply chain artifacts, with the same auth, replication, and scanning infrastructure.)
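Platform selection from a manifest list happens on the client side, and the logic is small enough to sketch. The field names (manifests, platform, os, architecture, digest) follow the OCI image index format; the function name and the sample index with shortened placeholder digests are assumptions for illustration, and variant matching is left out for brevity.

```python
def select_manifest(index, os_name, architecture):
    """Return the digest of the manifest matching the requested platform, or None."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry["digest"]
    return None


# Hypothetical index with placeholder digests.
index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:aaa", "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:bbb", "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}
print(select_manifest(index, "linux", "arm64"))
```

After picking the digest, the client issues a second request for that specific manifest, which is why the same tag resolves to different layers on different machines.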
Answer:
Most applications are not designed to be PID 1. Understanding why, and knowing the solutions, is a critical production skill.
Why PID 1 is special in Linux:
  • PID 1 does not receive default signal disposition. If your app does not explicitly handle SIGTERM, the signal is silently ignored (unlike any other PID, which would be terminated).
  • PID 1 is responsible for reaping orphaned child processes. If it does not call wait(), zombies accumulate.
  • Orphaned processes are re-parented to PID 1, so it receives SIGCHLD when they exit, regardless of who originally spawned them.
The spectrum of solutions (from simplest to most capable):
| Solution | Size | Zombie reaping | Signal forwarding | Process supervision | Use case |
| --- | --- | --- | --- | --- | --- |
| tini | 30 KB | Yes | Yes | No | 90% of containers (recommended default) |
| dumb-init (Yelp) | 50 KB | Yes | Yes (rewrites signals) | No | Similar to tini, slightly different signal behavior |
| s6-overlay | ~2 MB | Yes | Yes | Yes (full supervisor) | Multi-process containers (app + cron + sidecar) |
| supervisord | ~50 MB (Python) | Yes | Limited | Yes | Legacy, not recommended for new containers |
When you need more than tini:
# Single-process container (most common): tini is sufficient
ENTRYPOINT ["tini", "--"]
CMD ["node", "server.js"]

# Multi-process container (app + nginx + cron): use s6-overlay
FROM alpine
# s6-overlay v3 needs both the noarch tarball and the one for the target architecture
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-noarch.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz
ADD https://github.com/just-containers/s6-overlay/releases/download/v3.1.6.2/s6-overlay-x86_64.tar.xz /tmp
RUN tar -C / -Jxpf /tmp/s6-overlay-x86_64.tar.xz
ENTRYPOINT ["/init"]
# Define services in /etc/s6-overlay/s6-rc.d/
The Docker --init flag: Injects tini at runtime without modifying the Dockerfile. Kubernetes has no direct equivalent: either set shareProcessNamespace: true, which makes the pod's pause container PID 1 so that it reaps orphaned zombies, or bake tini into the image and handle SIGTERM in your application code.
What interviewers are really testing: Whether you have debugged slow container shutdowns or zombie process accumulation in production. This is a practical problem, not theoretical: it manifests as Kubernetes pods stuck in “Terminating” for 30 seconds (the default grace period) on every deploy.
Follow-up chain:
  1. “Your container needs to run both nginx and a background worker process. Is a multi-process container the right approach, or should you use separate containers?” (Separate containers is the Docker/Kubernetes best practice — one process per container. But sometimes a tightly-coupled pair (like envoy sidecar + app) benefits from sharing a container for performance. If you must go multi-process, use s6-overlay, never bare supervisord or shell scripts with &.)
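  2. “What is the difference between tini’s signal forwarding and dumb-init’s signal rewriting?” (Tini forwards each signal to its direct child (your app) only, unless started with -g to signal the child's process group. Dumb-init by default runs the child in its own session and sends signals to the whole process group, so all descendants receive them even if your app does not propagate them; it can additionally rewrite signals, for example --rewrite 15:3 turns SIGTERM into SIGQUIT. For forking servers, group delivery is usually the more correct behavior.)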
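The two PID 1 duties described in this answer (handle SIGTERM explicitly, reap exited children) can be illustrated with a minimal Python sketch. It assumes a Linux container and is not a replacement for tini: the names shutdown, handle_sigterm, and reap_children are made up for this example, and a real service would hook the shutdown flag into its own server loop.

```python
import os
import signal
import threading

# Set when SIGTERM arrives so the main loop can finish in-flight work and exit.
shutdown = threading.Event()


def handle_sigterm(signum, frame):
    """Explicit handler: without one, a PID 1 process ignores SIGTERM."""
    shutdown.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def reap_children():
    """Collect exit statuses of finished children without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A main loop would call reap_children() periodically (or from a SIGCHLD handler) and stop accepting work once shutdown.is_set() returns True, which lets the container exit well inside the stop timeout instead of waiting for SIGKILL.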

Advanced Scenario-Based Questions

Scenario: Your team’s Node.js API image ballooned from 180MB to 2.1GB over six months. Deploys now take 14 minutes instead of 3. The Dockerfile has 22 RUN instructions, installs build-essential for native modules, copies the entire repo, and nobody has touched it since the original author left. You are asked to fix it this sprint. Walk me through your approach.
What weak candidates say:
  • “Just switch to Alpine.” (Breaks native modules like bcrypt and sharp that depend on glibc.)
  • “Delete the node_modules folder.” (Shows no understanding of layers vs runtime filesystem.)
  • Cannot explain why layers accumulate size or how docker history works.
What strong candidates say:
  • Step 1 — Diagnose before cutting. Run docker history --no-trunc myapp:latest to see per-layer sizes. In my experience, 80% of the bloat comes from 2-3 layers: the apt-get install build-essential layer (400MB+), a COPY . . that drags in .git (sometimes 500MB+), and leftover npm cache inside a RUN npm install layer.
  • Step 2 — Add .dockerignore immediately. Excluding .git, node_modules, dist, *.log, and test fixtures often shaves 30-50% of build context size. I once cut build context from 1.8GB to 90MB just with this file.
  • Step 3 — Multi-stage build. Stage 1 (node:18) installs build-essential, runs npm ci (not npm install — deterministic, respects lockfile), and compiles native addons. Stage 2 (node:18-slim or node:18-alpine if musl-compatible) copies only node_modules and dist from the builder. This eliminates gcc, make, python, and all build artifacts from the final image.
  • Step 4 — Collapse and clean in the same layer. RUN apt-get update && apt-get install -y build-essential && npm ci && apt-get purge -y build-essential && rm -rf /var/lib/apt/lists/* — if you split these across layers, the deleted files still exist in earlier layers due to Union FS semantics.
  • Step 5 — Use BuildKit cache mounts. RUN --mount=type=cache,target=/root/.npm npm ci avoids re-downloading the npm cache on every build while keeping it out of the final layer entirely.
  • Metrics from real cleanup: Took a 2.1GB image down to 220MB. Deploy time went from 14 minutes to 2.5 minutes. ECR storage costs dropped by ~$180/month across 40 image tags.
Follow-up:
  1. After switching to Alpine, bcrypt throws Error: /lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.28 not found. What happened and how do you fix it without abandoning Alpine?
  2. You have a monorepo with 12 services sharing a root package.json. How do you structure the Dockerfile to avoid invalidating layer cache for Service A when Service B’s code changes?
  3. Your CI is rebuilding the entire image from scratch on every push despite no dependency changes. What is wrong with your layer caching strategy?
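Step 1 of the strong answer (find the two or three layers that carry most of the bloat) can be automated with a short script. The sketch below assumes input lines produced by docker history --format '{{.Size}}\t{{.CreatedBy}}' myapp:latest, with Docker's human-readable decimal sizes such as 412MB or 1.2GB; the function names and the sample lines are made up for illustration.

```python
import re

# Docker prints sizes in decimal units.
UNITS = {"B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}


def parse_size(text):
    """Convert a human-readable size such as '412MB' to bytes."""
    match = re.fullmatch(r"([\d.]+)\s*([kMGT]?B)", text.strip())
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def largest_layers(history_lines, top=3):
    """Return the top layers as (size_in_bytes, created_by), largest first."""
    layers = []
    for line in history_lines:
        size_text, _, created_by = line.partition("\t")
        layers.append((parse_size(size_text), created_by.strip()))
    return sorted(layers, key=lambda layer: layer[0], reverse=True)[:top]


# Hypothetical history output for a bloated image.
lines = [
    "412MB\tRUN apt-get install -y build-essential",
    "0B\tCMD [\"node\" \"server.js\"]",
    "1.2GB\tCOPY . .",
    "5.61kB\tCOPY package.json .",
]
for size, created_by in largest_layers(lines, top=2):
    print(f"{size / 1000**2:.0f} MB  {created_by}")
```

Running this in CI and failing the build when any single layer exceeds a budget is one way to stop the image from growing back after the cleanup.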
Scenario: A Go microservice builds fine locally on your M1 Mac. In CI (GitHub Actions, ubuntu-latest), the multi-stage build succeeds but the final container crashes on startup with exec format error or not found (for a statically linked binary that clearly exists in the image). The Dockerfile looks correct. What is happening?
What weak candidates say:
  • “The binary must not have been copied correctly.” (Does not investigate architecture or linking.)
  • “Try rebuilding with --no-cache.” (Cargo cult debugging.)
  • Cannot explain the difference between static and dynamic linking in the context of containers.
What strong candidates say:
  • exec format error is almost always an architecture mismatch. Building on an M1 Mac produces linux/arm64 binaries. If CI runs on linux/amd64 or the final image’s platform is amd64, you get this error. The fix: explicitly set --platform=linux/amd64 in the FROM line of your builder stage, or use docker buildx build --platform linux/amd64.
  • not found on a binary that exists is a dynamic linking issue. The Go binary was built with cgo enabled (CGO_ENABLED=1 is the default for native builds when a C compiler is present, and packages such as net and os/user then use their cgo implementations). It linked against glibc in the builder stage (golang:1.20 is Debian-based). The final stage uses alpine (musl libc) or scratch (no libc), so the dynamic linker the binary asks for (/lib64/ld-linux-x86-64.so.2) does not exist and the kernel reports not found.
  • Fix for the linking issue:
    # Force static compilation
    FROM golang:1.20 AS builder
    WORKDIR /app
    COPY . .
    RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o server .
    
    FROM scratch
    COPY --from=builder /app/server /server
    ENTRYPOINT ["/server"]
    
    The -ldflags="-s -w" strips debug info and DWARF symbols, reducing binary size by 30-40%.
  • If you genuinely need CGO (e.g., SQLite via mattn/go-sqlite3), your final stage must have a compatible libc. Use alpine with RUN apk add --no-cache libc6-compat, or use gcr.io/distroless/base which includes glibc.
  • War story: We had a service that built fine for 8 months. Then a developer added import "os/user", which silently turned the binary from static into dynamically linked because cgo was still enabled by default. Builds still passed in CI but the container crashed in staging at 2 AM. We added CGO_ENABLED=0 as a mandatory check in CI after that.
Follow-up:
  1. Your Go binary needs to make HTTPS calls but runs in a scratch container. It fails with x509: certificate signed by unknown authority. Why, and what is the minimal fix?
  2. How would you set up a CI pipeline that builds and pushes both linux/amd64 and linux/arm64 images from a single Dockerfile, and what does the manifest list look like in the registry?
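The two failure modes in this scenario can be told apart by reading the binary's ELF header. The machine field gives the target architecture, and a PT_INTERP program header means the binary is dynamically linked and needs that interpreter path to exist in the final image. The following is a minimal sketch, assuming a 64-bit little-endian ELF file; the name inspect_elf and the two-entry architecture table are illustrative, not part of any tool.

```python
import struct

# e_machine values from the ELF specification (only the two relevant here)
EM_NAMES = {0x3E: "amd64", 0xB7: "arm64"}
PT_INTERP = 3  # program header type that names the dynamic linker


def inspect_elf(data: bytes) -> dict:
    """Return the architecture and requested interpreter of an ELF64 binary."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    e_machine = struct.unpack_from("<H", data, 18)[0]
    e_phoff = struct.unpack_from("<Q", data, 32)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    interpreter = None
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        p_type = struct.unpack_from("<I", data, off)[0]
        if p_type == PT_INTERP:
            p_offset = struct.unpack_from("<Q", data, off + 8)[0]
            p_filesz = struct.unpack_from("<Q", data, off + 32)[0]
            raw = data[p_offset:p_offset + p_filesz]
            interpreter = raw.rstrip(b"\0").decode()

    return {
        "arch": EM_NAMES.get(e_machine, hex(e_machine)),
        "interpreter": interpreter,  # None means statically linked
    }
```

Run against the binary copied into the final stage, an arch that differs from the image platform explains exec format error, and a non-None interpreter that is absent from the image explains not found.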
Scenario: You have three containers in a docker-compose.yml: api, worker, and redis. The api container tries to connect to redis:6379 and gets Connection refused. You have verified Redis is running inside its container. docker ps shows all three containers are up. What do you check?
What weak candidates say:
  • “Expose port 6379 to the host with -p 6379:6379.” (Misunderstands container-to-container networking entirely — you do not need host port mapping for inter-container communication on the same Docker network.)
  • “Use the container’s IP address instead of hostname.” (Fragile, misses the point of Docker DNS.)
  • Cannot explain how Docker DNS resolution works.
What strong candidates say:
  • Step 1 — Verify they are on the same network. docker network inspect <network_name> and confirm all three containers appear in the Containers section. Compose creates a default network named <project>_default, but if someone defined custom networks and forgot to attach a service, that container is isolated.
  • Step 2: Check whether Redis is binding to 127.0.0.1 or 0.0.0.0. This is one of the most common causes of “Connection refused” between containers. The redis.conf shipped with Redis 6.2+ sets bind 127.0.0.1 -::1 and protected-mode yes; if that file is mounted into the container unchanged, Redis only accepts connections on its own loopback interface. Fix: set bind 0.0.0.0 in redis.conf or pass --bind 0.0.0.0 as a command argument, and configure authentication (requirepass or ACLs), since protected mode still rejects unauthenticated non-loopback clients. This is the same problem as Q35 in this doc, but people forget it applies to every service, not just web servers.
  • Step 3 — DNS resolution. Exec into the api container: docker exec -it api sh -c "getent hosts redis". If it does not resolve, the containers may be on the default bridge network (which does not support DNS — only user-defined bridges do). Compose normally creates a user-defined bridge, but if someone used network_mode: bridge explicitly, they bypassed this.
  • Step 4 — Check depends_on race condition. depends_on only waits for the container to start, not for the service to be ready. Redis might not be accepting connections yet when api tries to connect. Fix: use depends_on with condition: service_healthy and define a healthcheck for Redis: test: ["CMD", "redis-cli", "ping"].
  • Step 5 — Firewall / iptables. On production Linux hosts, iptables rules or firewalld can silently block Docker bridge traffic. Run iptables -L -n and check for DROP rules on the docker0 or br-* interfaces.
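  • Debugging toolkit: docker compose exec api ping redis, docker compose exec api nc -zv redis 6379, docker network inspect, docker compose logs redis. (With plain docker exec, use the real container name from docker ps; Compose v2 generates names like <project>-api-1 unless container_name is set.)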
Follow-up:
  1. You now need api to communicate with a container running in a different Compose project on the same host. How do you set this up without host networking?
  2. Your containers can talk to each other by IP but not by hostname. What specific Docker network type causes this, and why?
  3. In production, you switch from Compose to Kubernetes. How does service discovery change, and what breaks if you hardcode container hostnames?
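The checks above separate two different failures: the name does not resolve (wrong network or default bridge), or it resolves but nothing accepts the connection (bind address, service not ready). A small probe can make that distinction explicit. This is a sketch using only the standard library; the function name diagnose and its return labels are made up for illustration.

```python
import socket


def diagnose(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify the outcome of a TCP connection attempt to host:port."""
    try:
        # Name resolution: on user-defined networks this hits Docker's DNS
        addrinfo = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failure"  # e.g. services are not on a shared network

    family, socktype, proto, _, sockaddr = addrinfo[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
        except ConnectionRefusedError:
            return "refused"  # host reachable, nothing listening on that address
        except (socket.timeout, TimeoutError):
            return "timeout"  # packets dropped, e.g. by a firewall rule
        except OSError:
            return "unreachable"
    return "ok"
```

Running diagnose("redis", 6379) from inside the api container narrows the search: dns_failure points at Steps 1 and 3, refused at Steps 2 and 4, and timeout at Step 5.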
Scenario: Your alerting fires at 3 AM: a Java Spring Boot container running in production (ECS Fargate, 2GB memory limit) is in a restart loop. docker inspect shows "OOMKilled": true and exit code 137 on every restart. The application was running fine for weeks. No recent code deployments. Diagnose and fix.
What weak candidates say:
  • “Just increase the memory limit to 4GB.” (Treats the symptom, not the cause. The leak will consume 4GB too, just slower.)
  • “Java has garbage collection so it cannot have memory leaks.” (Fundamentally wrong — GC handles heap, but off-heap memory, thread stacks, metaspace, and native allocations can all leak.)
  • Cannot distinguish between container memory limit and JVM heap.
What strong candidates say:
  • Understand the memory stack. Container memory limit (2GB via --memory or ECS task definition) caps the total RSS of the process, which for Java includes: JVM Heap (-Xmx), Metaspace, Thread Stacks (1MB per thread by default), Code Cache, Direct ByteBuffers (NIO), Native memory (JNI, gzip, TLS), and the OS overhead. A common mistake: setting -Xmx2g in a 2GB container — the JVM needs 2g for heap plus 300-500MB for everything else, guaranteeing OOMKill.
  • Step 1 — Check if it is a JVM heap issue. Look at the container’s memory usage over time with docker stats or CloudWatch/Prometheus metrics. If memory grows linearly, it is likely a leak. If it spikes suddenly, it could be a burst of traffic creating threads or loading data.
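  • Step 2: Capture a heap dump before it dies. Add -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap.hprof to the JVM args and mount /tmp to a volume so the dump survives the container restart. Note that this flag only fires on a Java OutOfMemoryError; a cgroup OOM kill is a SIGKILL and gives the JVM no chance to write anything. If the heap is not what exhausts the limit, take a dump proactively with jcmd <pid> GC.heap_dump /tmp/heap.hprof while memory is climbing. Analyze with Eclipse MAT or VisualVM (jhat was removed in JDK 9).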
  • Step 3 — Check for common non-heap culprits. Native Memory Tracking: add -XX:NativeMemoryTracking=summary and query with jcmd <pid> VM.native_memory summary. I once found a service leaking 50MB/hour in Direct ByteBuffers because a library was allocating NIO buffers in a loop without closing them.
  • Step 4: Right-size the JVM for the container. Modern JVMs (10+, and 8u191+) detect cgroup limits via -XX:+UseContainerSupport (on by default). Set -XX:MaxRAMPercentage=75.0 instead of a fixed -Xmx. This caps the heap at 75% of the container limit (1.5GB of 2GB) and leaves about 500MB for non-heap memory.
  • Step 5 — Investigate “no recent deployment” claim. Check if a dependency was auto-updated (Dependabot, Renovate), if a feature flag changed (enabling a new code path), or if data volume increased (more users, bigger payloads, cache not evicting).
  • War story: A Spring Boot app OOMKilled every 3 days. Heap was fine. Turned out logback was configured with an AsyncAppender that created a new thread per log destination, and someone added a dynamic logger that created a new appender per tenant. 2000 tenants = 2000 threads = 2GB in thread stacks alone.
Follow-up:
  1. Your Node.js container (not Java) is OOMKilled with exit code 137, but process.memoryUsage() shows only 200MB heap. Where is the missing memory?
  2. How does the Linux kernel’s OOM killer decide which process to kill when the cgroup limit is hit? Can you influence this?
  3. You set --memory=2g --memory-swap=2g. What does this mean, and how does it differ from --memory=2g --memory-swap=4g?
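The memory-stack argument above is simple arithmetic, and writing it down shows why -Xmx2g in a 2GB container cannot work. The sketch below adds heap and non-heap components and reports the headroom under the container limit. The default sizes for metaspace, code cache, and direct buffers are illustrative assumptions, not JVM defaults; only the 1MB thread stack comes from the text.

```python
def heap_from_percentage(container_limit_mb: float, max_ram_percentage: float) -> float:
    """Heap cap implied by -XX:MaxRAMPercentage for a given container limit."""
    return container_limit_mb * max_ram_percentage / 100


def jvm_memory_budget_mb(container_limit_mb, heap_mb, threads,
                         thread_stack_mb=1,      # 1MB per thread, as in the text
                         metaspace_mb=128,       # assumed for illustration
                         code_cache_mb=64,       # assumed for illustration
                         direct_buffers_mb=64):  # assumed for illustration
    """Sum heap and non-heap memory and compare it with the container limit."""
    non_heap = (threads * thread_stack_mb + metaspace_mb
                + code_cache_mb + direct_buffers_mb)
    total = heap_mb + non_heap
    return {
        "heap_mb": heap_mb,
        "non_heap_mb": non_heap,
        "total_mb": total,
        "headroom_mb": container_limit_mb - total,  # negative means OOMKill risk
    }
```

With 200 threads, a 2048MB heap in a 2048MB container leaves negative headroom, while the 75% setting (1536MB heap) fits. Plugging in the 2000 threads from the war story shows the thread stacks alone exceeding the container limit.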
Scenario: Your CI pipeline runs inside a Docker container (e.g., GitLab CI runner or Jenkins agent). The pipeline needs to build Docker images and push them to a registry. A junior engineer mounted /var/run/docker.sock into the CI container. It works, but your security team flagged it. Another engineer suggests using docker:dind (Docker-in-Docker). You need to advise the team on the correct approach.
What weak candidates say:
  • “Just mount the socket, it is fine for CI.” (Ignores the security implications entirely.)
  • “Use DinD, it is designed for this.” (Does not understand the operational complexity of DinD.)
  • Cannot articulate the difference between socket mounting and true DinD.
What strong candidates say:
  • Socket mounting (/var/run/docker.sock) — fast but dangerous.
    • The CI container gets full root access to the host’s Docker daemon. It can docker run --privileged to escape to the host, docker rm -f any container, or docker exec into production containers sharing that host.
    • Builds share the host’s layer cache (fast), but also share the host’s image namespace (CI builds can accidentally overwrite production images by tag).
    • Build artifacts (layers, volumes) accumulate on the host and are never cleaned up by CI.
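    • Mitigation if you must use it: Do not rely on mounting the socket read-only (:ro), because the Docker API remains fully usable through a read-only bind mount of the socket. Instead, put a filtering proxy in front of the socket that only allows the API calls CI needs, scope daemon access with authorization plugins, and run CI on dedicated build hosts that carry no production workloads.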
  • True DinD (docker:dind) — isolated but operationally painful.
    • Runs a full Docker daemon inside a container. Requires --privileged (which the security team will also flag).
    • Layer cache is inside the DinD container. When the CI job finishes and the container dies, cache is gone. Every build starts cold. This makes builds 3-5x slower.
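    • Storage driver pitfalls: overlay2 is generally not supported on top of another overlay filesystem, so if the inner daemon’s /var/lib/docker lives on the outer container’s overlay2 layer, it has to fall back to the extremely slow vfs driver. The official docker:dind image declares /var/lib/docker as a volume to avoid this, so make sure that volume is backed by a real filesystem.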
    • Mitigation: Use a persistent volume for /var/lib/docker inside DinD to retain cache across jobs.
  • The modern answer: rootless build tools.
    • Kaniko (Google): Builds images in userspace with no Docker daemon and no --privileged flag (it still runs as root inside its own container). A good fit for Kubernetes-based CI. Downside: it does not support RUN --mount and several other BuildKit features.
    • Buildah: Daemonless, rootless image builder. OCI-compliant. Works well in Podman-based CI.
    • BuildKit (docker buildx) with remote builder: Run a BuildKit daemon as a separate service, connect to it over TCP/TLS. CI container does not need Docker at all — it only needs the buildctl client.
  • War story: A team using socket mounting in their GitLab runner had a rogue CI job (from a fork PR) that ran docker run -it --pid=host --privileged ubuntu and gained root on the build server. They moved to Kaniko within a week.
Follow-up:
  1. Your Kaniko-based CI build is 4x slower than the old socket-mounted approach. How do you optimize Kaniko build caching?
  2. Explain the security implications of --privileged in the context of Linux capabilities. What specific capabilities does it grant that make it dangerous?
  3. How would you design a CI/CD pipeline that can build Docker images inside Kubernetes pods without any privileged access?
Scenario: A security audit reveals that developers on your team have been pulling base images directly from Docker Hub (e.g., FROM python:3.11) in production Dockerfiles. The security team demands you lock this down within two weeks. What is your plan?
What weak candidates say:
  • “Just tell developers to use official images, they are safe.” (Official images have had CVEs. “Official” does not mean “vulnerability-free”.)
  • “Enable Docker Content Trust.” (Partial solution — only verifies image was signed, not that it is free of vulnerabilities.)
  • No awareness of supply chain attacks or image provenance.
What strong candidates say:
  • The threat model is real. In 2023-2024, multiple attacks targeted Docker Hub: typosquatting (e.g., pythonn instead of python), compromised maintainer accounts uploading backdoored images, and cryptominer payloads in popular images. One study found 51% of Docker Hub images had critical CVEs.
  • Step 1 — Stand up a private registry (or use a managed one). Harbor (open-source, supports vulnerability scanning, RBAC, replication), AWS ECR, GCP Artifact Registry, or Azure ACR. Configure it as a pull-through cache for Docker Hub so developers still get upstream images but through your gateway.
  • Step 2 — Image scanning in the registry. Integrate Trivy, Grype, or the registry’s built-in scanner (Harbor has built-in Trivy). Set a policy: images with Critical or High CVEs cannot be pulled/deployed. In Harbor, this is a “Prevent vulnerable images from running” policy at the project level.
  • Step 3 — Pin image digests, not tags. Tags are mutable — python:3.11 can point to a different image tomorrow. Pin to digest:
    FROM python:3.11@sha256:a1b2c3d4e5f6...
    
    Use docker pull python:3.11 then docker inspect --format='{{.RepoDigests}}' python:3.11 to get the digest. Automate digest updates with Dependabot or Renovate.
  • Step 4 — Enforce via admission control. In Kubernetes: OPA Gatekeeper or Kyverno policy that rejects any pod with an image not from your approved registry. In Docker directly: use the --registry-mirror daemon config and block Docker Hub at the network level (firewall/proxy).
  • Step 5 — Sign your own images. Use cosign (from Sigstore) to sign images after CI builds them. Verify signatures in your admission controller. This creates a full chain of trust: you built it, you scanned it, you signed it.
  • Step 6 — SBOM (Software Bill of Materials). Generate SBOMs with syft or docker sbom for every image. Attach them to images via OCI artifacts. When a new CVE drops (like Log4Shell), you can query your SBOM database to find every affected image in minutes instead of days.
  • Metrics: After implementing this at a previous company (200 engineers, ~80 microservices), we went from 340 critical CVEs across production images to 12 within 6 weeks.
Follow-up:
  1. A developer argues that pinning digests makes it impossible to get security patches automatically. How do you balance pinning with staying up to date?
  2. Your pull-through cache goes down. All CI builds fail because they cannot pull base images. How do you design for this failure mode?
  3. Explain how a tag-based image substitution attack works and how digest pinning prevents it.
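Digest pinning (Step 3) is easy to check mechanically in CI. The sketch below scans Dockerfile text and reports every external base image that is not pinned to a sha256 digest, skipping scratch and references to earlier build stages. The function name unpinned_base_images is hypothetical, and the parsing is deliberately simple (it does not expand ARG variables in FROM lines).

```python
import re

# A pinned reference ends with @sha256:<64 hex characters>
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text: str) -> list[str]:
    """Return base images in FROM lines that are not pinned by digest."""
    stage_names = set()
    findings = []
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line.upper().startswith("FROM "):
            continue
        # Drop flags such as --platform=... and keep image [AS name]
        tokens = [t for t in line.split()[1:] if not t.startswith("--")]
        if not tokens:
            continue
        image = tokens[0]
        is_stage_ref = image in stage_names
        if image != "scratch" and not is_stage_ref and not DIGEST_RE.search(image):
            findings.append(image)
        if len(tokens) >= 3 and tokens[1].upper() == "AS":
            stage_names.add(tokens[2])
    return findings
```

A CI job could fail the build whenever this list is non-empty, which turns the pinning rule into an enforced policy rather than a convention.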
Scenario: Your Python web application container (Flask + Gunicorn with 4 workers) has been running for two weeks. Monitoring shows that the container’s process count has grown from 5 to 847. docker top reveals hundreds of defunct (zombie) processes. The application itself is still responding but getting slower. Explain what is happening and how to fix it.
What weak candidates say:
  • “Restart the container.” (Fixes the symptom temporarily. Zombies will come back.)
  • “Increase the PID limit.” (Delays the crash, does not fix the cause.)
  • Cannot explain what a zombie process is or why PID 1 matters in containers.
What strong candidates say:
  • What is happening: In a container, the ENTRYPOINT process becomes PID 1. In a normal Linux system, PID 1 is init/systemd, which has a special responsibility: reaping orphaned child processes. When a child process exits, it becomes a zombie (it has exited but its entry remains in the process table) until its parent calls wait() on it. If the parent dies before calling wait(), the orphan is re-parented to PID 1, which must reap it. Gunicorn spawns worker processes. Those workers may spawn subprocesses (e.g., via subprocess.run(), health check scripts, shell commands). If a worker dies or those subprocesses are orphaned, they get re-parented to PID 1 (Gunicorn master). But Gunicorn is not designed to be an init system — it only reaps its own workers, not arbitrary orphans.
  • Why 847 processes: The Flask app likely has a code path that spawns subprocesses (maybe calling an external tool, running a shell command, or forking for background tasks). Those subprocesses finish but are never reaped because Gunicorn does not call wait() for processes it did not create. Each zombie consumes a PID and a small amount of kernel memory. Eventually you hit the PID limit (--pids-limit or kernel default of 32768) and the container cannot spawn new processes at all.
  • Fix 1 — Use tini as the init process.
    # Option A: Docker's built-in (recommended)
    # Run with: docker run --init myapp
    
    # Option B: Install tini in the image
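    FROM python:3.11-slim
    RUN apt-get update && apt-get install -y --no-install-recommends tini && rm -rf /var/lib/apt/lists/*
    # (install the application and gunicorn here)
    # tini runs as PID 1 and starts gunicorn as its child
    ENTRYPOINT ["tini", "--"]
    # Bind to 0.0.0.0 so the server is reachable from outside the container
    CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:8000", "app:app"]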
    
    tini is a tiny (~30KB) init process that does exactly two things: forward signals to child processes and reap zombies. It calls waitpid(-1, ...) in a loop.
  • Fix 2 — Fix the root cause. Find the code spawning subprocesses and ensure they are properly waited on. In Python: subprocess.run() automatically waits, but subprocess.Popen() without .wait() or .communicate() will create zombies. Check for os.fork() without os.waitpid().
  • Fix 3 — Signal handling. Without a proper init, SIGTERM sent to the container (docker stop) goes to PID 1. If PID 1 does not have a SIGTERM handler (common in Python scripts), the signal is ignored (PID 1 is special — it does not get default signal handling). Docker waits 10 seconds, then sends SIGKILL. This means your app never gets a graceful shutdown. tini fixes this by forwarding SIGTERM to the child process group.
  • War story: A data pipeline container ran Python scripts that shelled out to ffmpeg for video processing. Each Popen() call without .wait() left a zombie. After 3 days, the container hit its PID limit (4096), new ffmpeg calls failed with OSError: [Errno 11] Resource temporarily unavailable, and the pipeline silently dropped videos. Took 2 hours to diagnose, 5 minutes to fix (added tini + fixed the missing .wait() calls).
Follow-up:
  1. Your application needs to handle SIGTERM for graceful shutdown (drain connections, finish in-flight requests). With tini as PID 1, how does the signal reach your application? What is signal forwarding vs signal rewriting?
  2. Why does PID 1 not get default signal handling in Linux? What kernel behavior makes this different from any other PID?
  3. You are using a scratch base image (no package manager). How do you add tini without apt-get?
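The reaping behavior described above can be reproduced in a few lines. In this sketch, spawn_unwaited starts children with Popen and never waits on them (the bug), and reap_children collects every exited child with waitpid(-1, ...), which is the core of what an init process does. Both function names are made up for illustration, and the code assumes a POSIX system. A real init such as tini does this in C and blocks on signals rather than polling.

```python
import os
import subprocess
import sys
import time


def spawn_unwaited(n: int) -> list:
    """Start n short-lived children and never call .wait() on them."""
    return [subprocess.Popen([sys.executable, "-c", "pass"]) for _ in range(n)]


def reap_children(timeout: float = 5.0) -> list:
    """Reap any exited child of this process, as an init process would."""
    reaped = []
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # -1 means "any child"; WNOHANG returns immediately if none exited
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            time.sleep(0.01)  # children exist but none has exited yet
            continue
        reaped.append(pid)
    return reaped
```

Between the two calls, the exited children sit in the process table as zombies; reap_children is what removes them. Fix 2 corresponds to calling .wait() or .communicate() on each Popen object so that no separate reaper is needed.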
Scenario: Your company is migrating production from x86 EC2 instances to Graviton (ARM64) for 40% cost savings. You have 30 microservices, all with x86-only Docker images. Your task is to make all images build for both linux/amd64 and linux/arm64, keep CI build times under 15 minutes, and ensure developers on both Intel Macs and M1/M2 Macs can build locally. Lay out your strategy.
What weak candidates say:
  • “Just use docker buildx build --platform linux/amd64,linux/arm64.” (Technically correct but ignores the 15 real-world problems that come with it.)
  • “QEMU handles everything.” (Does not understand the 10-20x performance penalty of emulation.)
  • No awareness of native compilation vs emulation trade-offs.
What strong candidates say:
  • Understand the build strategies:
    • QEMU emulation: docker buildx uses QEMU to emulate the target architecture. Simple to set up (docker run --privileged --rm tonistiigi/binfmt --install all). But RUN steps for the non-native architecture are 10-20x slower. A Go compile that takes 30 seconds natively takes 5-8 minutes under QEMU. For 30 services, this blows past your 15-minute CI budget.
    • Cross-compilation (preferred for compiled languages): Build the binary for the target platform on the native platform. Go supports this natively: GOOS=linux GOARCH=arm64 go build. Rust uses cross. This avoids QEMU entirely for the expensive compile step. Only the final FROM stage needs the target platform.
    • Native remote builders: Set up ARM64 build nodes (Graviton spot instances at ~$0.02/hr) and register them as buildx remote builders. docker buildx create --name multiarch --driver docker-container --platform linux/arm64 ssh://build@arm-builder. BuildKit will dispatch the arm64 build to the native node and amd64 to the local CI runner. Both run at full native speed.
  • Dockerfile pattern for cross-compilation (Go example):
    FROM --platform=$BUILDPLATFORM golang:1.21 AS builder
    ARG TARGETPLATFORM TARGETOS TARGETARCH
    WORKDIR /app
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -o server .
    
    FROM alpine:3.18
    COPY --from=builder /app/server /server
    ENTRYPOINT ["/server"]
    
    Key detail: FROM --platform=$BUILDPLATFORM makes the builder stage run on the CI runner’s native architecture, while TARGETOS/TARGETARCH cross-compile for the target. No QEMU needed for the compile step.
  • For interpreted languages (Node.js, Python): Cross-compilation does not apply. The code is the same, but native dependencies (e.g., sharp, bcrypt, grpc) ship platform-specific binaries. Strategy: run the install stage on the target platform (FROM --platform=$TARGETPLATFORM) so that RUN npm ci fetches or compiles the right binaries, accept QEMU for that step (slower but correct), and cache aggressively with BuildKit cache mounts. Flags such as --platform=linux --arch=arm64 are not generic npm features; they only help with packages whose install scripts read them, so do not depend on them for every native module.
  • Registry and manifest list: docker buildx build --platform linux/amd64,linux/arm64 -t registry/app:v1 --push . creates a manifest list (also called a fat manifest or OCI image index). When a Graviton node pulls registry/app:v1, the client fetches the manifest list, picks the entry whose platform matches linux/arm64, and then pulls that manifest and its layers by digest. An x86 node pulling the same tag selects the amd64 entry. The Accept header tells the registry that the client understands manifest lists; the platform choice itself is made by the client.
  • CI optimization for 30 services:
    1. Use BuildKit remote cache: --cache-from type=registry,ref=registry/app:cache --cache-to type=registry,ref=registry/app:cache,mode=max. Shares layer cache across CI runs.
    2. Only rebuild services whose code changed (monorepo path filtering in CI).
    3. Parallelize: build all 30 services concurrently across multiple CI runners.
    4. For the migration, build amd64+arm64 simultaneously so you can canary-deploy arm64 pods alongside amd64 pods and compare behavior.
  • Metrics from a real migration: 30 services, all dual-arch, CI builds averaging 8 minutes (down from 22 minutes with naive QEMU approach). Graviton3 instances gave 38% cost reduction and 15% latency improvement for compute-heavy services.
Follow-up:
  1. One of your 30 services depends on a C library (librdkafka for Kafka) that does not provide pre-built ARM64 binaries. How do you handle this in your multi-arch build?
  2. A developer on an M1 Mac runs docker build locally and gets an amd64 image. Why, and how do you configure their environment so docker build produces the correct architecture by default?
  3. You push a multi-arch image and notice the arm64 variant is 20% larger than the amd64 variant of the same code. What could cause this and does it matter?
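The manifest-list lookup described above amounts to a small client-side selection over the image index. The sketch below shows it against a hand-written index in the OCI image index layout. The digests are placeholders and the function name select_manifest is illustrative; a real client also handles variants (such as arm/v7) and fallbacks.

```python
def select_manifest(index: dict, os_name: str, arch: str) -> str:
    """Pick the manifest digest matching the requested platform."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


# Hypothetical index for registry/app:v1 (digests are placeholders)
EXAMPLE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "1" * 64,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "2" * 64,
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}
```

A Graviton node would resolve select_manifest(EXAMPLE_INDEX, "linux", "arm64") and pull that digest, while an x86 node resolves the amd64 entry from the same tag.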

Work-Sample Patterns

These are open-ended prompts designed to test real-world problem-solving. Give the candidate 5-10 minutes to think through each one.
How to present this: “Here is a Dockerfile for a Node.js API. The resulting image is 3GB. You have 10 minutes to analyze it and propose changes. Talk me through your process.”
FROM ubuntu:22.04
RUN apt-get update
RUN apt-get install -y nodejs npm python3 build-essential git curl wget
RUN npm install -g yarn
COPY . /app
WORKDIR /app
RUN yarn install
RUN yarn build
RUN apt-get install -y nginx
COPY nginx.conf /etc/nginx/nginx.conf
EXPOSE 3000 80
CMD ["sh", "-c", "nginx && node /app/dist/server.js"]
What weak candidates do:
  • Jump straight to “use Alpine” without analyzing the Dockerfile first
  • Miss the .git directory being copied in via COPY . /app
  • Do not mention multi-stage builds
  • Do not notice the sh -c wrapped CMD (node does not run as PID 1, so the SIGTERM from docker stop does not reach it) or the multi-process anti-pattern
What strong candidates do:
  1. Diagnose first: “I would run docker history to see which layers are biggest. But just reading this, I can already see several problems.”
  2. Identify the big wins in order of impact:
    • COPY . /app copies everything including .git, node_modules, test files. Add .dockerignore.
    • ubuntu:22.04 as base is ~77MB, but after apt-get install, this will be 500MB+. Switch to node:18-slim or multi-stage.
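    • Every RUN is a separate layer. Because apt-get update and apt-get install are split, the package index is cached in its own layer (and can go stale on rebuilds), and the apt lists are never cleaned up. Combine them in one RUN and remove /var/lib/apt/lists/* in the same step.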
    • build-essential, git, curl, wget are build-time tools that should not be in the production image.
    • yarn install includes dev dependencies unless --production is specified.
  3. Propose a multi-stage rewrite:
    • Stage 1: node:18 with build-essential for native modules, yarn install, yarn build
    • Stage 2: node:18-alpine with only production node_modules and dist/
    • Separate nginx into its own container (one process per container principle)
  4. Estimate the result: “I would expect this to go from 3GB to ~200-250MB, maybe lower with aggressive pruning.”
Staff-level bonus: Mentions BuildKit cache mounts for yarn install, proposes a shared base image for the org’s Node.js services, and suggests CI-side image size checks (fail the build if image exceeds 500MB).
How to present this: “PagerDuty fires at 2 AM. Your production container on ECS Fargate is restart-looping. Exit code 137. No recent deploys. Walk me through your investigation.”
What weak candidates do:
  • “Increase the memory and go back to sleep.” (Band-aid, leak will consume the new limit too.)
  • Do not know what exit code 137 means.
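  • Cannot distinguish between a cgroup OOM kill (the container exceeded its own memory limit) and a host-level OOM kill (the whole machine ran out of memory); both are carried out by the Linux kernel’s OOM killer.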
What strong candidates do:
  1. Decode the exit code: “137 = 128 + 9 = SIGKILL. The process was killed with signal 9. This is almost always an OOM kill, but could also be docker kill or Kubernetes eviction.”
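  2. Confirm OOMKill: docker inspect -f '{{.State.OOMKilled}}' <id>. If true, it is a memory limit issue. On ECS Fargate there is no host on which to run docker inspect, so check the stopped task’s stoppedReason and container exit details, and look at CloudWatch MemoryUtilization for spikes.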
  3. Investigate why memory spiked:
    • Was there a traffic spike? Check request rate metrics.
    • Is it a gradual leak? Memory usage graph over the last 24 hours will show a sawtooth pattern (restart-recover-leak-kill).
    • Did a dependency change? Check if Dependabot or Renovate merged something.
    • Language-specific: JVM heap vs off-heap, Node.js --max-old-space-size, Python process count.
  4. Immediate mitigation: Increase memory limit temporarily while investigating. Set up CloudWatch alarm on MemoryUtilization > 85% to catch it before OOMKill.
  5. Root cause: “In my experience, the top 3 causes are: unbounded caches (in-memory LRU without eviction), loading entire datasets into memory (should stream), and connection pool leaks (each connection holds buffers).”
Staff-level bonus: Sets up container memory metrics dashboards across all services, proposes memory budgets per service tier, and establishes runbooks for OOMKill investigation.
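The exit-code decoding in step 1 follows the shell convention that a process killed by signal N exits with 128 + N. A minimal sketch of that rule, using the standard signal module; the function name decode_exit_code is illustrative.

```python
import signal


def decode_exit_code(code: int):
    """Return the signal name for exit codes above 128, else None."""
    if code <= 128:
        return None  # ordinary application exit code
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        return None  # above 128 but not a known signal number


# 137 - 128 = 9  -> SIGKILL (OOM kill, docker kill, or eviction)
# 143 - 128 = 15 -> SIGTERM (graceful stop requested)
```

The signal name alone does not prove an OOM kill, which is why step 2 confirms it through the OOMKilled flag or the task's stop reason.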
How to present this: “You just joined as the platform engineer for a company with 50 microservices, 6 teams, and no container standards. Images range from 2GB to 50MB. Some use latest, some use commit SHAs. Three different base images. No scanning. Design the container image strategy.”
What weak candidates do:
  • Propose a single standard without considering team diversity (Java team, Python team, Go team have different needs)
  • Focus only on Dockerfiles, ignore registry, scanning, signing, and governance
  • No migration plan — just “everyone should switch”
What strong candidates do:
  1. Approved base images: Create 4-5 “golden” base images maintained by the platform team: company/node:18, company/python:3.11, company/java:17, company/go-builder:1.21, company/static:latest (distroless). Auto-rebuild weekly with latest security patches. Scan with Trivy, sign with cosign.
  2. Registry architecture: Harbor or ECR with pull-through cache for Docker Hub. RBAC per team. Vulnerability scanning on push. Policy: block images with Critical CVEs from being pulled in production.
  3. Tagging standard: <service>:<semver>-<git-sha>. No latest in production (enforce via admission controller). Pin base images by digest in Dockerfiles.
  4. Image scanning pipeline: Scan on push (gate deployments) + nightly full-catalog scan (catch newly discovered CVEs in already-deployed images). Alert owners via Slack when their running image has a new Critical CVE.
  5. Build standardization: Shared CI templates (GitHub Actions reusable workflow or GitLab CI includes) that enforce multi-stage builds, non-root user, health checks, and SBOM generation.
  6. Migration plan: Do not mandate everything at once. Phase 1: approved base images + scanning (2 weeks). Phase 2: tagging standard + cosign signing (4 weeks). Phase 3: admission controller enforcement (8 weeks). Support teams with office hours and migration PRs.
  7. Metrics: Track median image size, CVE count per image, build time p95, percentage of services on approved bases. Report monthly.
Staff-level bonus: Builds the platform as an internal product with documentation, Slack support channel, and self-service onboarding. Considers cost: ECR storage costs at scale, build minute costs, Graviton cost savings from multi-arch images.
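The tagging standard in item 3 can be enforced with a simple check in a shared CI template. The sketch below validates the <service>:<semver>-<git-sha> shape and therefore rejects latest. The exact pattern (lowercase service names, three-part version, 7 to 40 hex characters for the SHA) is an assumption about how the standard would be written down, and the example service name is hypothetical.

```python
import re

# <service>:<semver>-<git-sha>, e.g. payments-api:1.4.2-3f9c2ab
TAG_RE = re.compile(
    r"^(?P<service>[a-z0-9][a-z0-9._/-]*)"
    r":(?P<semver>\d+\.\d+\.\d+)"
    r"-(?P<sha>[0-9a-f]{7,40})$"
)


def is_valid_tag(reference: str) -> bool:
    """True if the image reference follows the org tagging standard."""
    return TAG_RE.match(reference) is not None
```

The same rule would normally be duplicated in the admission controller so that images pushed outside CI are also rejected.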
How to present this: “Here is a docker-compose.yml. The api container logs say ‘Connection refused’ when connecting to redis:6379. Redis is running and healthy. You have 5 minutes to find the issue.”
services:
  api:
    build: ./api
    ports: ["3000:3000"]
    networks:
      - frontend
  redis:
    image: redis:7-alpine
    networks:
      - backend
  worker:
    build: ./worker
    networks:
      - frontend
      - backend

networks:
  frontend:
  backend:
The bug: api is on the frontend network only and redis is on the backend network only, so they cannot communicate. DNS resolution for redis fails inside the api container (in practice the client usually reports a name-resolution error rather than a literal “Connection refused”).
What weak candidates do:
  • “Expose port 6379 with -p” (host port mapping is irrelevant for container-to-container communication)
  • Do not read the network configuration carefully
  • Suggest hardcoding IP addresses
What strong candidates do:
  1. Read the compose file carefully and immediately spot: “api is on frontend, redis is on backend — they are isolated.”
  2. Fix: Either add api to the backend network, or add redis to the frontend network, or create a shared network.
  3. Verify with: docker compose exec api getent hosts redis, which should resolve once the network is fixed.
  4. Bonus: Note that worker can already reach both api and redis because it is on both networks. This is correct for a worker that processes jobs from Redis and calls back to the API.
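The reasoning behind this answer is a set intersection: two services can talk only if they share at least one network. The sketch below encodes the network attachments from the compose file above as a plain dictionary (no YAML parsing) and lists the isolated pairs. The function names are illustrative.

```python
def can_reach(services: dict, a: str, b: str) -> bool:
    """Two services can communicate if they share at least one network."""
    return bool(set(services[a]) & set(services[b]))


def isolated_pairs(services: dict) -> list:
    """List every pair of services with no network in common."""
    names = sorted(services)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if not can_reach(services, a, b)
    ]


# Network attachments copied from the compose file in this exercise
COMPOSE_NETWORKS = {
    "api": ["frontend"],
    "redis": ["backend"],
    "worker": ["frontend", "backend"],
}
```

For this file, the only isolated pair is api and redis; adding backend to the api entry makes the list empty, which matches the fix in item 2.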

Candidate Comparison Patterns

What separates Senior from Staff in Docker interviews
Senior engineers demonstrate: writing production-quality Dockerfiles, debugging container issues (networking, OOM, startup failures), understanding layer caching and multi-stage builds, configuring health checks, and managing images in CI/CD pipelines.
Staff engineers demonstrate everything above, plus: designing container standards for an organization, registry architecture and governance, base image supply chain security (scanning, signing, SBOM), build infrastructure strategy (BuildKit remote builders, multi-arch), runtime security posture (seccomp profiles, rootless enforcement, admission policies), and cost optimization (image size budgets, Graviton migration, registry storage).
The key difference: a senior engineer solves their team’s Docker problems. A staff engineer prevents Docker problems across 10 teams by building the right platform and policies.
What weak candidates say: “Use Alpine and you are done. Alpine is always smaller.”
What strong candidates say: “The way I approach image optimization is in layers of aggressiveness. First, .dockerignore: it is free and often cuts 30%+ from the build context. Second, multi-stage builds to separate build tools from runtime. Third, minimal base images, but Alpine is not always the answer. If your app has native dependencies that link against glibc, Alpine’s musl libc will break things silently. I have seen bcrypt and sharp fail on Alpine. For those cases, slim variants or distroless are better choices. Fourth, BuildKit cache mounts to keep dependency caches out of layers entirely. I always measure with docker history and dive before and after.”
What weak candidates say: “Containers are isolated, so security is built-in. We scan for CVEs and that is enough.”
What strong candidates say: “Container isolation is a floor, not a ceiling. I think about it in layers: the image (minimal base, no shell, scanned, signed), the build (BuildKit secrets, no ARG for credentials, pinned digests), the runtime (non-root, read-only root FS, dropped capabilities, seccomp profile), and the platform (admission control rejecting privileged containers, image pull policies, network policies). The gotcha most people miss is that scanning catches known CVEs but not supply chain attacks: a malicious base image maintainer can inject a backdoor that does not match any CVE signature. That is why image signing and provenance verification matter.”
What weak candidates say: “Everything runs on the default bridge. If containers cannot talk, just expose ports.”
What strong candidates say: “I always use user-defined bridge networks because the default bridge does not support DNS resolution between containers. For multi-service applications, I create dedicated networks per application boundary; this gives me network-level isolation without firewall rules. When debugging connectivity, my first three checks are: are they on the same network (docker network inspect), is the service binding to 0.0.0.0 not 127.0.0.1, and is DNS resolving (getent hosts <name> from inside the container). The number one issue I have seen in production is services binding to localhost inside the container: Redis, PostgreSQL, and many frameworks default to this.”
What weak candidates say: “Multi-stage is when you have two FROM statements. It makes images smaller.”
What strong candidates say: “Multi-stage builds solve three problems at once: image size (build tools do not ship to production), security (compilers and source code are not in the final image), and secrets hygiene (build-time credentials like NPM tokens in stage 1 never leak to the final stage). The pattern I use for Go services gets images down to 5-15MB: full golang base for building with CGO_ENABLED=0, then distroless/static for production. For Node.js, it is trickier because you still need the runtime; I use node:18-slim as the final stage with only production node_modules. The advanced technique is --target: I define a test stage that runs go test and a prod stage for deployment. CI runs docker build --target=test for testing and docker build --target=prod for the release image, all from the same Dockerfile.”