Docker Interview Questions (70+ Detailed Q&A)
1. Fundamentals & Architecture
1. VM vs Container
| Feature | Virtual Machine | Container |
|---|---|---|
| Virtualization level | Hardware (hypervisor emulates CPU, memory, I/O) | OS-level (shares host kernel directly) |
| Isolation mechanism | Hypervisor (Type 1: bare-metal like ESXi, KVM; Type 2: hosted like VirtualBox) | Linux namespaces (PID, NET, MNT, UTS, IPC, USER) + cgroups (resource limits) |
| Guest OS | Full OS per VM (kernel + userland) — each VM boots its own kernel | No guest OS — containers share the host kernel and only package userland (libs + app) |
| Size | Typically 1-40 GB per VM image | Typically 5-500 MB per container image |
| Boot time | 30 seconds to several minutes | Milliseconds to a few seconds |
| Density | ~10-50 VMs per host (limited by memory for each guest OS) | ~100-1000+ containers per host |
| Security isolation | Stronger — separate kernel per VM means kernel exploits are contained | Weaker by default — a kernel vulnerability affects all containers on the host |
- “You said containers share the host kernel. What specific attack vector does that create that VMs do not have?” (A kernel exploit like Dirty Pipe (CVE-2022-0847) in the shared kernel affects every container on the host. In a VM, the kernel is per-VM so a guest kernel exploit is contained. This is why multi-tenant SaaS platforms run containers inside VMs — the VM is the trust boundary, containers are the density mechanism.)
- “If containers share the kernel, how can you run a Linux container on macOS or Windows?” (You cannot — Docker Desktop runs a lightweight Linux VM (using Apple’s Virtualization.framework on macOS or WSL2/Hyper-V on Windows) and runs containers inside that VM. The container still uses a Linux kernel, just one provided by the hidden VM.)
- “When would you choose Kata Containers or gVisor over standard runc-based containers?” (When you need stronger isolation than namespaces but lighter weight than full VMs. gVisor intercepts syscalls in userspace via a custom kernel. Kata Containers spawn a lightweight VM per pod. Both are used in multi-tenant environments — GKE Sandbox uses gVisor, AWS Fargate uses Firecracker microVMs.)
- “Your security team says containers are not secure enough for PCI-DSS workloads. How do you respond?” (Defense-in-depth: non-root user + read-only root filesystem + seccomp profiles + AppArmor/SELinux + USER namespace remapping + running containers inside VMs. Many PCI-DSS compliant systems run containers — the requirement is demonstrating equivalent isolation, not using VMs specifically. Reference the CIS Docker Benchmark.)
2. Docker Architecture Components
- Daemon (`dockerd`): The long-running background process that manages all Docker objects (containers, images, networks, volumes). It exposes a REST API on a Unix socket (`/var/run/docker.sock`) or optionally over TCP. Every `docker` CLI command translates to an API call to this daemon.
- Client (`docker`): The CLI binary. It serializes your command into an HTTP request and sends it to the daemon. The client and daemon can run on different machines; this is the basis for Docker contexts and remote management.
- Registry: A stateless server that stores and distributes Docker images. Docker Hub is the default public registry; private options include AWS ECR, GCP Artifact Registry, Azure ACR, and self-hosted Harbor. Images are stored as layers (blobs) plus a manifest that describes how to assemble them.
- Containerd: The high-level container runtime. It manages the complete container lifecycle: image pull/push, container creation, storage, and networking setup. Containerd is a graduated CNCF project and is used directly by Kubernetes (bypassing the Docker daemon entirely since Kubernetes 1.24).
- Runc: The low-level OCI runtime. It is a small binary that takes a container configuration (OCI spec) and uses Linux kernel APIs (namespaces, cgroups) to actually create the isolated process. Alternatives to runc include `gVisor` (Google’s sandboxed runtime) and `Kata Containers` (lightweight VMs).
Image pull flow:
- `docker pull nginx`: the CLI sends a REST request to the daemon.
- The daemon queries the registry for the image manifest (a JSON document listing the layers and their SHA256 digests).
- It downloads layers in parallel, skipping any that already exist in the local cache (content-addressable storage).
- It stores layers in `/var/lib/docker/overlay2/` (on the overlay2 storage driver).
Container run flow:
- `docker run`: the daemon creates a container config (OCI spec JSON).
- Containerd prepares the filesystem, assembling the union mount from the image layers plus a writable layer.
- Runc creates the namespaces (PID, NET, MNT, UTS, IPC, USER) and a cgroup for resource limits.
- Runc executes the entrypoint process as PID 1 inside the new namespaces.
- “Kubernetes removed dockershim in 1.24. Does that mean Docker images stop working in Kubernetes?” (No. Docker images are OCI-compliant. Kubernetes removed the Docker daemon dependency, not the image format. containerd pulls and runs the same images.)
- “If the Docker daemon crashes, do running containers die?” (Not if live restore is enabled. With `dockerd --live-restore`, or `"live-restore": true` in `daemon.json`, containers keep running while dockerd is down or restarting. This is critical for production upgrades.)
- “When would you replace runc with an alternative runtime like gVisor?” (Multi-tenant environments where you need stronger isolation than Linux namespaces. gVisor intercepts syscalls in userspace, adding a security boundary without the overhead of full VMs.)
3. Image Layers (Union File System)
Each Dockerfile instruction that changes the filesystem (`RUN`, `COPY`, `ADD`) creates a new layer. These layers are stored and managed using a Union File System (OverlayFS on modern Linux, formerly AUFS).
How layers work internally:
- Each layer is a filesystem diff: it contains only the files that changed from the previous layer.
- When you `docker pull`, layers are downloaded independently and cached by their content hash (SHA256). If two images share the same base layer (e.g., both use `node:18-alpine`), that layer is stored only once on disk.
- When a container starts, Docker stacks all image layers (read-only) and adds a thin writable layer (also called the container layer) on top.
- Layer ordering matters for cache: Docker caches layers by instruction. If you change line 5 of a Dockerfile, the layers built from lines 1-4 are reused from cache, but every layer from line 5 onward is rebuilt. This is why you `COPY package.json` and `RUN npm install` before `COPY . .`, so that dependency installation stays cached when only source code changes.
- Layer count affects pull time: Each layer is a separate download. Combining `RUN` commands with `&&` reduces layer count and total image size (intermediate files created and deleted in the same `RUN` are never stored).
- Size debugging: `docker history <image>` shows each layer’s size, which helps identify bloated layers.
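The cache-ordering rule can be sketched with a small Python model. This is a simplified illustration, not how the Docker builder is implemented: each step is a hypothetical `(instruction, fingerprint)` pair, where the fingerprint stands in for the checksum of the files that step reads, and a step counts as a cache hit only if it and every earlier step are unchanged.

```python
from typing import List, Tuple

# (instruction text, fingerprint of the files the step reads); both are
# illustrative stand-ins, not real builder data structures.
Step = Tuple[str, str]


def first_cache_miss(previous: List[Step], current: List[Step]) -> int:
    """Index of the first step that differs from the previous build."""
    for i, step in enumerate(current):
        if i >= len(previous) or previous[i] != step:
            return i
    return len(current)


def rebuilt(previous: List[Step], current: List[Step]) -> List[str]:
    """Instructions that must run again: the first miss and everything after it."""
    miss = first_cache_miss(previous, current)
    return [instruction for instruction, _ in current[miss:]]


# Dependencies are copied and installed before the source code.
good_v1 = [
    ("FROM node:18-alpine", ""),
    ("COPY package.json .", "deps-v1"),
    ("RUN npm install", ""),
    ("COPY . .", "src-v1"),
]
good_v2 = good_v1[:3] + [("COPY . .", "src-v2")]  # only source changed

# Source code is copied before dependencies are installed.
bad_v1 = [
    ("FROM node:18-alpine", ""),
    ("COPY . .", "src-v1"),
    ("RUN npm install", ""),
]
bad_v2 = [bad_v1[0], ("COPY . .", "src-v2"), bad_v1[2]]

print(rebuilt(good_v1, good_v2))  # ['COPY . .']
print(rebuilt(bad_v1, bad_v2))    # ['COPY . .', 'RUN npm install']
```

In the second ordering a source-only change forces `npm install` to run again, which is the cost the recommended ordering avoids.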
- “You mentioned Copy-on-Write. A container writes 1 byte to a 500MB file that exists in the base image layer. How much additional disk space does this consume?” (The entire 500MB file is copied to the writable layer, then the 1 byte is modified. CoW operates at the file level on overlay2, not at the block level. This is why writing to large files inside containers is expensive — and why databases should always use volumes, not the container’s writable layer.)
- “How does `docker history` differ from `docker image inspect` for debugging layer sizes?” (`docker history` shows the instruction that created each layer and that layer’s uncompressed size, so the numbers will not match the compressed sizes a registry reports. `docker image inspect` shows the full config and layer digests. For a finer-grained breakdown, use `dive`, an open-source tool that shows file-level changes per layer and detects wasted space.)
- “Two teams both use `FROM node:18-alpine`. Team A’s image is 400MB, Team B’s is 180MB. Same base, same app framework. What is the most likely cause?” (Team A is probably installing dev dependencies (`npm install` instead of `npm ci --production`), copying test fixtures or documentation into the image, or running `apk` installs without cleanup. The base image is shared but everything above it differs.)
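The file-level copy-on-write behavior from the first follow-up can be modeled in a few lines of Python. This is a toy model under stated assumptions, not the overlay2 implementation: layers are plain dictionaries mapping a path to a size in bytes, and the class and method names (`OverlayModel`, `write`, `delete`) are hypothetical.

```python
from typing import Dict, List, Optional

MB = 1024 * 1024
WHITEOUT = None  # marker meaning "this path was deleted in the writable layer"


class OverlayModel:
    """Toy union filesystem: read-only image layers plus one writable layer."""

    def __init__(self, image_layers: List[Dict[str, int]]) -> None:
        self.image_layers = image_layers           # bottom to top, never modified
        self.upper: Dict[str, Optional[int]] = {}  # the container's writable layer

    def size(self, path: str) -> Optional[int]:
        """Size of the file as the container sees it, or None if absent."""
        if path in self.upper:
            return self.upper[path]
        for layer in reversed(self.image_layers):
            if path in layer:
                return layer[path]
        return None

    def write(self, path: str) -> None:
        """Modify a file in place: the whole file is copied up first."""
        self.upper[path] = self.size(path) or 0

    def delete(self, path: str) -> None:
        """Hide a file with a whiteout; lower layers are untouched."""
        self.upper[path] = WHITEOUT

    def upper_bytes(self) -> int:
        return sum(s for s in self.upper.values() if s is not None)

    def image_bytes(self) -> int:
        return sum(sum(layer.values()) for layer in self.image_layers)


fs = OverlayModel([{"/data/big.bin": 500 * MB, "/etc/app.conf": 1024}])
fs.write("/data/big.bin")        # a 1-byte change still copies the whole file
print(fs.upper_bytes() // MB)    # 500
fs.delete("/etc/app.conf")
print(fs.size("/etc/app.conf"))  # None: hidden from the container
print(fs.image_bytes())          # unchanged: the image layers are read-only
```

The model also shows why deleting a file inside a container never shrinks the image: the deletion only adds a marker to the writable layer.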
4. Dockerfile: COPY vs ADD
Use `COPY` unless you specifically need `ADD`’s extra features.
- `COPY`: Copies files or directories from the build context (your local filesystem) into the image. Does exactly what it says, nothing more. Predictable and transparent.
- `ADD`: Does everything `COPY` does, plus two extra behaviors:
  - Auto-extracts compressed archives: `ADD app.tar.gz /app/` automatically extracts the tarball into `/app/`. Supports tar, gzip, bzip2, and xz.
  - Downloads from URLs: `ADD https://example.com/file.txt /app/` fetches the file. However, this is discouraged because it creates a layer that cannot be cached reliably, and a plain `ADD <url>` performs no checksum verification.
Why `COPY` is preferred: Docker’s own best practices documentation recommends `COPY` because its behavior is explicit and predictable. `ADD`’s implicit extraction can cause surprises: if you `ADD archive.tar.gz /data/` intending to place the archive file itself, you get the extracted contents instead. For URL downloads, `RUN curl -O` gives you more control (you can verify checksums, set permissions, and clean up in the same layer).
When `ADD` is appropriate: When you specifically want auto-extraction of a local tarball into the image. This is the one legitimate use case.
What interviewers are really testing: Whether you follow Docker best practices and understand the principle of least surprise in Dockerfile instructions.
Follow-up chain:
- “A developer uses `ADD https://example.com/config.json /app/config.json` in a Dockerfile. Beyond the caching issue, what security concern does this create?” (No checksum verification: if the URL is compromised or MITM’d, you bake a malicious file into the image with no audit trail. With `RUN curl`, you can verify a SHA256 checksum inline: `curl -o file.tar.gz URL && echo "expected_sha256  file.tar.gz" | sha256sum -c -`.)
- “Does `COPY --from=builder /app/dist /html` work for copying between multi-stage build stages? Is it `COPY` or `ADD`?” (Only `COPY` supports `--from`. This is another reason to default to `COPY`: it has capabilities that `ADD` does not in the multi-stage context.)
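The checksum check that `sha256sum -c` performs can be sketched in Python. This is an illustrative sketch only: the payload and the function name `sha256_matches` are hypothetical, and in a real build the expected digest would be pinned ahead of time from a trusted source rather than computed from the download itself.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True only if the SHA256 of `data` equals the pinned digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking where the two strings first differ
    return hmac.compare_digest(actual, expected_hex.strip().lower())


# Hypothetical downloaded file contents, standing in for config.json.
payload = b'{"feature": "on"}'
pinned = hashlib.sha256(payload).hexdigest()  # stand-in for a digest recorded earlier

print(sha256_matches(payload, pinned))                 # True
print(sha256_matches(payload + b" tampered", pinned))  # False
```

A build step would abort when the function returns `False`, so a tampered download never reaches an image layer.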
5. ENTRYPOINT vs CMD
- `ENTRYPOINT`: Defines the executable that always runs. It is the fixed part of the command. To override it, you must use `docker run --entrypoint`. In practice, this means ENTRYPOINT defines what program runs.
- `CMD`: Provides default arguments to the ENTRYPOINT. These are easily overridden by appending arguments to `docker run`. CMD defines how the program runs by default.
With `ENTRYPOINT ["python"]` and `CMD ["app.py"]`:
- `docker run myimage` executes `python app.py`
- `docker run myimage test.py` executes `python test.py` (CMD overridden)
- `docker run --entrypoint bash myimage` executes `bash` (ENTRYPOINT overridden)
Shell form (`CMD npm start`) wraps your command in `/bin/sh -c`, which means your process runs as a child of the shell. This causes PID 1 signal-handling issues: SIGTERM from `docker stop` goes to the shell, not your app, so the container is hard-killed after the 10-second grace period instead of shutting down gracefully.
- Web server: `ENTRYPOINT ["node"]` + `CMD ["server.js"]` lets you run `docker run myimage --inspect server.js` for debugging
- CLI tool: `ENTRYPOINT ["aws"]` + `CMD ["help"]` makes the container act like the AWS CLI itself
- Only CMD: Many images skip ENTRYPOINT entirely and use only CMD. This is fine for simple cases where you want the entire command to be easily overridden.
`--entrypoint`. Also, not knowing the shell form vs exec form distinction.
Follow-up chain:
- “Your container takes exactly 10 seconds to stop with `docker stop`. What is happening?” (The process is not handling SIGTERM. Docker sends SIGTERM, waits the grace period (default 10s), then sends SIGKILL. This is the shell form vs exec form issue: if CMD uses shell form, SIGTERM goes to `/bin/sh`, not your app. Or the app simply has no SIGTERM handler.)
- “Can you have both ENTRYPOINT and CMD specified, where ENTRYPOINT comes from the base image and CMD from your Dockerfile?” (Yes. This is a common pattern: the base image sets the ENTRYPOINT and your Dockerfile overrides CMD to provide different default arguments. `docker inspect` shows both, and you can trace which layer set each with `docker history`.)
- “What happens if you specify both CMD and ENTRYPOINT in shell form?” (This is a trap. When ENTRYPOINT is in shell form, CMD is completely ignored. Only the exec form (JSON array) allows CMD to append arguments to ENTRYPOINT. Shell form ENTRYPOINT runs `/bin/sh -c <entrypoint>` and CMD never gets invoked.)
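The override rules for exec-form ENTRYPOINT and CMD can be summarized as a small Python function. This is a sketch of the rules described in this section, not Docker’s own code; the function name `effective_argv` and its parameters are hypothetical, and shell form is deliberately left out.

```python
from typing import List, Optional


def effective_argv(
    entrypoint: List[str],
    cmd: List[str],
    run_args: Optional[List[str]] = None,
    entrypoint_override: Optional[str] = None,
) -> List[str]:
    """Resolve the command a container runs (exec form only).

    entrypoint / cmd     : values baked into the image
    run_args             : arguments placed after the image name in `docker run`
    entrypoint_override  : value of `docker run --entrypoint`, if given
    """
    run_args = run_args or []
    if entrypoint_override is not None:
        # --entrypoint replaces ENTRYPOINT and discards the image's default CMD
        return [entrypoint_override] + run_args
    # Arguments given to `docker run` replace CMD; otherwise CMD supplies defaults
    return entrypoint + (run_args if run_args else cmd)


print(effective_argv(["python"], ["app.py"]))                              # ['python', 'app.py']
print(effective_argv(["python"], ["app.py"], run_args=["test.py"]))        # ['python', 'test.py']
print(effective_argv(["python"], ["app.py"], entrypoint_override="bash"))  # ['bash']
```

The same function reproduces the web-server pattern: with `["node"]` and `["server.js"]`, passing `["--inspect", "server.js"]` as run arguments yields `node --inspect server.js`.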
6. What happens when you run `docker run`?
When you run `docker run nginx`, a surprisingly complex chain of events fires in rapid succession. Understanding this sequence is essential for debugging containers that fail to start or behave unexpectedly.
The complete sequence:
1. CLI parses the command: The Docker client validates flags, serializes the request, and sends it to the Docker daemon via the REST API (`POST /containers/create` followed by `POST /containers/{id}/start`).
2. Image resolution: The daemon checks if `nginx:latest` (or whatever tag) exists in the local image cache (`/var/lib/docker/overlay2/`). If not, it pulls from the configured registry (Docker Hub by default), downloading the manifest and then each layer in parallel.
3. Container config creation: The daemon generates an OCI runtime specification, a JSON document defining the root filesystem, environment variables, namespace configuration, cgroup limits, mount points, and the entrypoint command.
4. Filesystem assembly: Containerd creates a union mount with all image layers stacked read-only and a thin writable (copy-on-write) layer on top. This is the container’s root filesystem.
5. Network setup: Docker creates a `veth` pair (virtual ethernet cable). One end goes inside the container’s network namespace as `eth0`, the other attaches to the bridge network (`docker0` or a user-defined bridge). The embedded IPAM assigns an IP address. If `-p` was specified, iptables DNAT rules are added for port forwarding.
6. Namespace creation: Runc creates Linux namespaces: PID (isolated process tree), NET (isolated network stack), MNT (isolated filesystem), UTS (isolated hostname), IPC (isolated shared memory), and optionally USER (UID remapping).
7. Cgroup setup: Runc creates a cgroup for the container and applies resource limits (`--memory`, `--cpus`, `--pids-limit`).
8. Process execution: Runc executes the entrypoint/CMD as PID 1 inside the new namespaces. The container is now running.
- “Your `docker run` hangs for 45 seconds before starting. The image is already cached locally. What could cause this?” (DNS resolution timeout if the daemon is trying to resolve the image tag against a slow/unreachable registry. Also possible: slow storage driver, exhausted IP pool on the bridge network, or iptables lock contention.)
- “What is the difference between `docker create` and `docker run`?” (`docker create` does steps 1-4 only: it creates the container but does not start it. `docker start` then does steps 5-8. `docker run` is `create` + `start` combined.)
- “If you run `docker run --rm`, at what point is the container removed?” (After the main process exits and the container stops. The `--rm` flag registers a cleanup hook that removes the container and its writable layer upon exit.)
7. Detached vs Interactive Mode
- `-d` (Detached): Runs the container in the background. The container starts and your terminal is immediately returned. Use this for long-running services (web servers, databases). You interact with it via `docker logs`, `docker exec`, or `docker attach`.
- `-it` (Interactive + TTY): Two flags combined. `-i` keeps STDIN open (you can type input), `-t` allocates a pseudo-TTY (gives you a formatted terminal). Together, they give you an interactive shell session inside the container. Use this for debugging, running one-off commands, or exploring an image’s filesystem.
`docker run -d` and then `docker exec -it` is the standard production debugging workflow. You never run production containers in interactive mode; they always run detached. Interactive mode is a development and troubleshooting tool.
What interviewers are really testing: Whether you know the practical debugging workflow and understand that `-i` and `-t` are separate flags with distinct purposes. Bonus points for explaining when you would use `-i` without `-t` (piping data into a container) or `-t` without `-i` (getting formatted output without needing input).
Red flag answer: Confusing `-d` with backgrounding a process inside the container (like `CMD node server.js &`). Detached mode runs the container in the background; the process inside still runs in the foreground of its namespace.
Follow-up:
- “You run `docker run -d myapp` but the container exits immediately. `docker ps` shows nothing. How do you debug?” (Use `docker ps -a` to see stopped containers, then `docker logs <id>` to see what the process printed before exiting. Check `docker inspect -f '{{.State.ExitCode}}' <id>` for the exit code.)
- “What is the difference between `docker attach` and `docker exec -it <id> bash`?” (`attach` connects to PID 1’s STDIN/STDOUT: if you press Ctrl+C, you send SIGINT to the main process and may stop the container. `exec` spawns a new process, so exiting the exec shell does not affect the main container process.)
8. Docker Context
By default, the `docker` CLI talks to the local daemon (`/var/run/docker.sock`), but contexts let you switch targets seamlessly.
Use cases:
- Remote server management: `docker context create prod --docker "host=ssh://user@prod-server"` lets you run `docker ps` against your production host without SSH-ing in.
- Minikube/Kind: Switch between local Kubernetes clusters and the default Docker daemon.
- Docker Desktop vs Colima: On macOS, switch between different container runtimes.
Before contexts, switching targets meant setting `DOCKER_HOST` environment variables, which was error-prone (forgetting you had it set could lead to accidentally running commands against production).
What interviewers are really testing: Whether you have managed Docker across multiple environments and understand the operational tooling beyond basic `docker run`. This question separates developers who only run Docker locally from those who manage remote Docker hosts.
Red flag answer: Not knowing Docker contexts exist, or confusing Docker contexts with Kubernetes contexts (`kubectl config use-context`). They are similar concepts but for different tools.
Follow-up:
- “You accidentally ran `docker rm -f $(docker ps -q)` while your context was set to production. How do you prevent this in the future?” (Use context naming conventions with color-coded terminal prompts, require confirmation for destructive operations on production contexts, or use read-only socket proxies for production.)
- “How do Docker contexts differ from Kubernetes contexts, and can you use both simultaneously?” (Docker contexts switch the Docker daemon target; K8s contexts switch the cluster/namespace. They are independent: you can have Docker pointing at production while kubectl points at staging.)
9. Image vs Container
- Image: A read-only, layered filesystem template. It contains everything needed to run an application — OS libraries, runtime, application code, and configuration. Images are immutable and identified by a content-addressable hash (SHA256). You can think of it as a compiled binary that captures an entire runtime environment.
- Container: A running (or stopped) instance of an image. It adds a thin writable layer on top of the image layers (using Copy-on-Write), plus an isolated process space with its own PID namespace, network stack, and filesystem view. Multiple containers can share the same image layers, each with their own writable layer.
You `docker build` an image, `docker push` it to a registry, and `docker run` it to create a container. Images are portable and reproducible; containers are ephemeral and disposable.
Practical implication: This is why you should never store important data in a container’s writable layer. When the container is removed, that layer is gone. Use volumes for persistent data. The container should be cattle (replaceable), not a pet (irreplaceable).
What interviewers are really testing: Whether you understand the immutability model: images are immutable artifacts, containers are ephemeral processes. This mental model drives correct decisions about data persistence, deployment strategies, and debugging approaches.
Red flag answer: “An image is a stopped container” or “You save a container to create an image.” While `docker commit` exists, using it is an anti-pattern. Images should be built from Dockerfiles for reproducibility, not by capturing container state.
Follow-up:
- “Can you modify a running container’s filesystem and then create a new image from it? Should you?” (Yes, `docker commit` does this. No, you should not: it creates unreproducible images with no audit trail. Always use Dockerfiles.)
- “Two containers are running from the same image. Container A writes a file. Can Container B see it?” (No. Each container has its own writable layer. The image layers are shared and read-only. To share data between containers, use a shared volume.)
10. Multi-Architecture Builds
`docker buildx` is the tool that makes this possible. It extends the standard `docker build` with multi-platform support.
How it works:
- You create a buildx builder instance: `docker buildx create --use`
- Build for multiple platforms in one command: `docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest --push .`
- Docker uses QEMU emulation to build for architectures different from your host. An amd64 machine can build arm64 images (slowly) via QEMU user-mode emulation.
- The result is a manifest list (also called a multi-arch manifest): a single tag (`myapp:latest`) that points to multiple platform-specific images. When a user pulls the image, Docker automatically selects the correct variant for their architecture.
To avoid slow emulated builds, use native ARM runners (GitHub Actions has `ubuntu-latest` on ARM, GitLab offers ARM runners) or cross-compilation in your build stage (Go’s `GOARCH=arm64` compiles natively without emulation).
Why this matters: AWS Graviton instances can be roughly 20-40% cheaper than equivalent x86 instances. If your images only support amd64, you cannot take advantage of this cost saving. Multi-arch builds are table stakes for modern cloud deployments.
2. Networking & Storage
11. Docker Network Drivers
| Driver | Scope | Use Case | Performance | Isolation |
|---|---|---|---|---|
| Bridge (default) | Single host | Standard container-to-container communication | Good (slight NAT overhead) | Containers are on a private subnet |
| Host | Single host | Performance-critical apps (eliminate NAT overhead) | Best (no network translation) | None — container shares host’s network stack entirely |
| None | Single host | Containers that need no network (batch jobs, security-sensitive compute) | N/A | Complete network isolation |
| Overlay | Multi-host | Swarm services or multi-host container communication | Moderate (VXLAN encapsulation overhead) | Private cross-host network; data-plane encryption is optional (`--opt encrypted`) |
| Macvlan | Single host | Containers that need to appear as physical devices on the LAN (legacy app integration) | Excellent (no NAT, no bridge) | Each container gets a real MAC and IP on the physical network |
- “Your application needs the absolute lowest possible network latency. You are currently using bridge networking. What do you change and what do you lose?” (Switch to `--network host`. You eliminate NAT/bridge overhead, but lose port isolation: two containers cannot both bind to port 80. You also lose the embedded DNS service for container name resolution. Measure the actual improvement; for most apps, the bridge overhead is <0.1ms.)
- “You need containers across 3 hosts to communicate without Kubernetes. What network driver do you use and how?” (Overlay network with Docker Swarm. `docker swarm init` on one host, join the others, create an overlay: `docker network create -d overlay mynet`. Under the hood, VXLAN encapsulates L2 frames in UDP for cross-host communication. Alternative: use a third-party CNI like Weave or Flannel without Swarm.)
- “A legacy application expects to have its own MAC address and appear directly on the physical LAN. Can Docker do this?” (Yes, with the Macvlan driver. Each container gets a unique MAC and IP on the physical network. The trade-off: the host cannot communicate with its own macvlan containers without a separate macvlan sub-interface, and you need the physical switch to accept multiple MACs per port.)
12. How Bridge Network works (Internals)
- `docker0` bridge: When Docker starts, it creates a virtual Ethernet bridge called `docker0` (or a custom name for user-defined networks). Think of this as a virtual network switch that all containers on this network plug into.
- `veth` pair: For each container, Docker creates a virtual Ethernet pair, a “virtual cable” with two ends. One end (`eth0`) is placed inside the container’s network namespace. The other end (`vethXXXXX`) is attached to the `docker0` bridge on the host.
- IP allocation: Docker’s built-in IPAM (IP Address Management) assigns each container an IP from the bridge’s subnet (default: `172.17.0.0/16`).
- iptables NAT: For outbound traffic, Docker adds masquerade rules so container traffic appears to come from the host’s IP. For inbound traffic with port mapping (`-p 8080:80`), Docker adds DNAT rules that forward packets from the host port to the container IP.
The default `docker0` bridge does not provide DNS resolution between containers; you must use IP addresses or the deprecated `--link` flag. User-defined bridges (`docker network create mynet`) include an embedded DNS server that resolves container names to IPs automatically, which is essential for service discovery.
Follow-up chain:
- “You run `iptables -L -t nat` on the host and see dozens of DNAT rules from Docker port mappings. A new team member is confused about where these come from. Explain.” (Every `-p host:container` flag creates two iptables rules: a DNAT rule in the nat table that rewrites incoming packets’ destination IP/port to the container’s IP/port, and a masquerade rule for outbound traffic. `docker-proxy` also listens on the host port as a userspace fallback for hairpin NAT. This is why Docker requires root or the `NET_ADMIN` capability.)
- “Two containers on the same bridge can communicate. Can a container on bridge A reach a container on bridge B?” (No, not by default. Each bridge is an isolated L2 domain. To allow it: either connect a container to both networks (`docker network connect bridgeB mycontainer`) or route between bridges at the host level. The isolation is the entire point of separate networks.)
- “How does Docker handle DNS for containers on a user-defined bridge?” (Docker runs an embedded DNS server at `127.0.0.11` inside each container. It resolves container names and network aliases to their internal IPs. For external DNS, it forwards to the host’s configured DNS servers. The `127.0.0.11` address is intercepted by iptables rules in the container’s network namespace.)
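The IP allocation step for the default bridge subnet can be explored with Python’s standard `ipaddress` module. This is only a sketch: it assumes the bridge takes the first usable address as the gateway and that container addresses are then handed out in order, which is the typical observed behavior but is not guaranteed by anything in this section.

```python
import ipaddress
from itertools import islice

# Default bridge subnet named in this section
subnet = ipaddress.ip_network("172.17.0.0/16")

# Usable addresses exclude the network and broadcast addresses
usable = subnet.num_addresses - 2

hosts = iter(subnet.hosts())
gateway = next(hosts)                        # assumption: the bridge takes the first address
first_containers = list(islice(hosts, 3))    # assumption: sequential allocation

print(usable)                                # 65534
print(gateway)                               # 172.17.0.1
print([str(ip) for ip in first_containers])  # ['172.17.0.2', '172.17.0.3', '172.17.0.4']
```

The same arithmetic explains the "exhausted IP pool" failure mode: a smaller custom subnet such as a /28 leaves only a handful of usable addresses.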
13. Container to Container Communication
- Same user-defined bridge network: Containers can reach each other by container name (DNS resolution is automatic) or by IP address. This is the recommended approach. Example: a Node.js app connects to `mongodb://mongo:27017` where `mongo` is the container name on the same network.
- Same default bridge: Containers can communicate by IP address only. No DNS resolution. You would need the deprecated `--link` flag for name resolution, which you should avoid.
- Different bridge networks: Containers cannot communicate by default. Network isolation is the whole point. To allow it, connect a container to multiple networks: `docker network connect network2 mycontainer`.
- Host network: Containers on the host network communicate via `localhost` like regular processes.
14. Exposing Ports (`-p`)
The full syntax is `-p [host_ip:]host_port:container_port[/protocol]`.
EXPOSE vs -p: The `EXPOSE` instruction in a Dockerfile is documentation only; it does not actually publish the port. The `-p` flag at runtime is what actually creates the port mapping. This distinction confuses many beginners.
Security note: `-p 8080:80` binds to `0.0.0.0` by default, meaning the port is accessible from any network interface, including public IPs. In production, either use `-p 127.0.0.1:8080:80` to bind to localhost only, or place containers behind a reverse proxy and do not publish ports directly.
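The `-p` syntax can be made concrete with a small Python parser. This is an illustrative sketch, not Docker’s parser: the names `PortMapping` and `parse_publish` are hypothetical, and it handles only the form shown in this section with an IPv4 `host_ip`, leaving out port ranges, IPv6 addresses, and specs that omit the host port.

```python
from typing import NamedTuple


class PortMapping(NamedTuple):
    host_ip: str
    host_port: int
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse `[host_ip:]host_port:container_port[/protocol]`."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 2:
        host_ip = "0.0.0.0"  # default: listen on every host interface
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    # TCP is the protocol used when none is given
    return PortMapping(host_ip, int(host_port), int(container_port), protocol or "tcp")


print(parse_publish("8080:80"))            # host_ip='0.0.0.0': reachable on all interfaces
print(parse_publish("127.0.0.1:8080:80"))  # host_ip='127.0.0.1': localhost only
print(parse_publish("53:53/udp"))          # protocol='udp'
```

The first two results show the security note in data form: omitting `host_ip` yields `0.0.0.0`, while the explicit localhost binding restricts the listener.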
15. Volumes vs Bind Mounts
| Feature | Volume | Bind Mount |
|---|---|---|
| Managed by | Docker engine (/var/lib/docker/volumes/) | Host filesystem (any path) |
| Portability | Can be shared across containers, backed up with docker volume commands | Tied to host directory structure |
| Performance | Optimized by Docker (especially on macOS/Windows where Docker uses a VM) | Native filesystem speed on Linux; can be slow on Docker Desktop |
| Best for | Production data persistence (databases, uploads) | Development workflows (live code reloading) |
On Docker Desktop, a project with a large `node_modules` directory can see file operations take 5-10x longer in a bind-mounted container vs. a volume. The fix: mount `node_modules` as a separate anonymous volume so it stays in the Linux VM.
Follow-up chain:
- “Your Postgres container’s data volume is 200GB. You need to migrate it to a new host. What is your approach?” (Use `pg_dump`/`pg_restore` for a logical backup (portable, can change schema), not a filesystem copy. For large volumes, `pg_basebackup` with streaming replication is faster. Never `docker cp` or `tar` a running database’s volume, because you risk capturing inconsistent state.)
- “A developer says ‘I deleted a file in my container but the image is still the same size.’ Explain why.” (Deleting a file in the writable layer does not affect the read-only image layers. The deletion is recorded as a “whiteout” file in the writable layer. The original file still exists in its image layer. This is Union FS semantics: layers are additive, never modified.)
- “When would you use a volume driver other than the default `local` driver?” (Remote storage backends: `rexray/ebs` for AWS EBS volumes, `netapp` for NFS, `portworx` for distributed storage across nodes. In Swarm or when containers move between hosts, volumes need to follow them. Kubernetes handles this with PersistentVolumes and CSI drivers instead.)
16. Tmpfs Mount
- Secrets and sensitive data: Temporary credentials or encryption keys that should never touch persistent storage.
- High-speed scratch space: Intermediate computation results or caches that benefit from memory-speed I/O and do not need to survive a restart.
- Security compliance: Some compliance frameworks require that certain data never be written to a persistent filesystem.
17. Dangling Images/Volumes
- Dangling Image (`<none>:<none>`): Created when you rebuild an image with the same tag. The old layers lose their tag and become “dangling.” Also created by interrupted multi-stage builds.
- Dangling Volume: A volume that is not referenced by any container (including stopped containers). This happens when you `docker rm` a container without the `-v` flag.
Run `docker system prune -f --filter "until=168h"` as a weekly cron job on build servers. Without this, disk usage grows unbounded and eventually causes builds to fail with “no space left on device.”
18. DNS in Docker
Docker runs an embedded DNS server at `127.0.0.11` inside every container connected to a user-defined network. This server resolves container names and network aliases to their internal IP addresses, enabling service discovery without hardcoding IPs.
Important limitation: The default bridge network (`docker0`) does not provide DNS resolution. Containers on the default bridge can only communicate by IP address. This is one of the most common “why can’t my containers talk to each other” debugging issues. The fix is always to use a user-defined bridge: `docker network create mynet`.
Network aliases: You can give a container multiple DNS names using `--network-alias`. Multiple containers with the same alias create round-robin service discovery.
19. IPv6 Support
Enable IPv6 in `daemon.json` with `"ipv6": true` and a `"fixed-cidr-v6"` subnet, then restart the Docker daemon. User-defined networks also need the `--ipv6` and `--subnet` flags.
Why this matters: As IPv4 addresses become scarcer and cloud providers charge for public IPv4 (AWS began charging $0.005/hour per public IPv4 address in 2024), running dual-stack or IPv6-only container networks is increasingly relevant for cost optimization.
20. Backup/Restore Volume
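Volumes are managed by Docker, so you cannot simply `cp` them. The standard approach is to use a temporary container as a bridge: it mounts the volume alongside a host directory and archives one into the other.
For databases, use native dump tools (`pg_dump`, `mongodump`, `mysqldump`) rather than filesystem-level copies. Filesystem backups of a running database can capture inconsistent state. Native tools ensure transactional consistency.
3. Best Practices & Optimization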
21. Minimize Image Size
- Use minimal base images: `node:18` is ~900MB. `node:18-slim` is ~200MB. `node:18-alpine` is ~180MB. `gcr.io/distroless/nodejs18` is ~120MB. For Go/Rust, you can use `scratch` (literally 0 bytes) when the binary is statically compiled.
- Multi-stage builds: Build in a full image (compilers, dev tools), copy only the artifact to a minimal runtime image. A Go service goes from ~800MB to ~15MB with `alpine` or ~5MB with `scratch`.
- Combine RUN commands: Files deleted in a later `RUN` still exist in earlier layers. This is the most common mistake, so install and clean up in the same `RUN`.
- `.dockerignore`: Exclude `.git` (can be 100MB+), `node_modules`, `dist`, `*.log`, `.env`. These are never needed in the image.
- Install only production deps: `npm ci --production` (`npm ci --omit=dev` on current npm) instead of `npm install`. `pip install --no-cache-dir`.
- `--no-install-recommends` with `apt-get` skips recommended-but-optional packages, cutting 50-100MB from Debian images.
`RUN rm` in a separate layer does not reduce image size.
- Senior: Writes multi-stage Dockerfiles, uses `.dockerignore`, picks slim/alpine base images, and knows about the same-layer delete trick.
- Staff: Designs the org-wide image strategy: curated golden base images with security patches, automated size regression checks in CI that fail builds over a threshold, image scanning gates (Trivy/Snyk), signed base images (cosign), and a policy that sets image size SLOs per runtime (<100MB for Go, <300MB for Node, <500MB for Python). Also thinks about registry cost: 10K images x 500MB = 5TB storage billed monthly.
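A CI size-regression gate of the kind described for the Staff level could look roughly like the sketch below. The budget table reuses the SLO figures quoted above; the function names and the idea of feeding in a byte count (for example from `docker image inspect --format '{{.Size}}'`) are illustrative assumptions, not a prescribed implementation.

```python
# Hypothetical per-runtime budgets in MB, copied from the SLO figures in the text.
SIZE_BUDGET_MB = {"go": 100, "node": 300, "python": 500}


def check_image_size(runtime: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, message) for an image of size_bytes built for the given runtime."""
    budget_mb = SIZE_BUDGET_MB[runtime]
    size_mb = size_bytes / (1000 * 1000)  # decimal megabytes
    ok = size_mb < budget_mb
    verdict = "within" if ok else "over"
    return ok, f"{runtime} image is {size_mb:.0f}MB, {verdict} the {budget_mb}MB budget"


def registry_storage_tb(image_count: int, avg_image_mb: int) -> float:
    """Worst-case registry storage in TB, ignoring layer deduplication."""
    return image_count * avg_image_mb / 1_000_000


if __name__ == "__main__":
    print(check_image_size("go", 15_000_000))
    print(check_image_size("node", 600_000_000))
    print(registry_storage_tb(10_000, 500))  # the 10K x 500MB example from the text
```

A CI job would fail the build when the first element of the returned tuple is `False`.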
- “You have a Python ML service with a 3GB image due to PyTorch. How do you reduce it?” Multi-stage: copy only needed `.so` files. Use `pytorch/pytorch:*-runtime`. Store model weights in a volume or S3 instead of baking into image. Switch to `python:3.11-slim` base. Use BuildKit cache mounts for pip to avoid re-downloading wheels.
- “What is the difference between `docker image ls` reported size and actual disk usage?” Shared layers are counted once on disk but shown fully per image. `docker system df` shows true usage. Registries (ECR/GCR) also deduplicate layers, so pushing 10 images that share a 500MB base layer only costs 500MB once.
- “Alpine images broke your production app because of `musl` vs `glibc`. How do you handle this?” (a) Switch to `debian:bookworm-slim` or `gcr.io/distroless/base-debian12`: similar size, glibc-compatible. (b) Rebuild the problematic dependency for musl (often impractical for ML libs). (c) Use distroless, which keeps glibc but strips the shell/package manager. Common culprits: `requests`+`certifi`, `numpy`/`scipy` wheels, DNS resolution differences.
- “At 10x scale (10,000 services), what operational problems emerge from image size?” Registry storage cost, node disk pressure (the kubelet starts image garbage collection at 85% disk usage by default), pull bandwidth during a stampede (e.g., a Kubernetes scale-out where 500 nodes each pull a 10GB image can overload the registry), and image GC latency. Mitigations: image pre-pulling via DaemonSet, registry mirrors per region, aggressive image garbage collection, and signed base image policies.
- Step 1: `docker history <image>` to identify the fat layers. Usually `COPY . .` pulling in `node_modules` (1GB+) or the base image (`node:18` = 900MB).
- Step 2: Switch base to `node:18-alpine` (180MB) or `gcr.io/distroless/nodejs18` (120MB).
- Step 3: Multi-stage build: first stage installs `devDependencies` and builds; second stage copies only `dist/` and runs `npm ci --omit=dev`.
- Step 4: Add `.dockerignore` excluding `node_modules`, `.git`, tests, docs, `.env*`.
- Step 5: Use BuildKit cache mount for `npm`: `RUN --mount=type=cache,target=/root/.npm npm ci`.
- Step 6: Verify with `docker image inspect` that the final image is <200MB and contains only runtime artifacts.
`.dockerignore`, layer ordering, and the trap of deleting in a later layer.
What strong candidates say: “Image size is a symptom of Dockerfile discipline. I treat every COPY and RUN as a layer that lives forever; there is no ‘undoing’ in a later layer. I start with distroless or alpine, multi-stage everything, and measure image size in CI as a first-class metric. Small images are cheaper, faster to pull, and have a smaller attack surface: three wins from one discipline.”
22. Layer Caching
- `COPY . .` before dependency install is the most common mistake
- `ARG` or `ENV` changes invalidate all subsequent layers
- `RUN apt-get update` without pinning creates a layer that Docker considers “unchanged” even when the package index is stale; combine it with `apt-get install` in one layer
- Build arguments like `--build-arg BUILD_DATE=$(date)` bust the cache by design
Use `--cache-from` with a registry-cached image so CI runners can pull previous layers instead of rebuilding from scratch. GitHub Actions and GitLab CI support this natively with BuildKit cache backends.
Red flag answer: Not knowing why `COPY . .` should come after dependency installation, or thinking that Docker caches based on file timestamps (it uses content hashes).
Follow-up chain:
- “Your CI builds take 12 minutes. Locally they take 30 seconds. Same Dockerfile. What is wrong?” (CI runners start with an empty Docker cache on every run (ephemeral runners). Locally, you have the cache from previous builds. Fix: use `--cache-from type=registry,ref=myapp:cache` to pull the previous build’s layers from the registry. BuildKit also supports `--cache-to` to push the cache after building.)
- “You change an `ENV` value in the middle of your Dockerfile. What happens to caching?” (All layers after the `ENV` change are invalidated. ENV changes modify the layer’s metadata hash. This is why ENV instructions should be near the top (rarely changing) or near the bottom (application-specific config). Same applies to `ARG`.)
- “How does BuildKit’s cache differ from the legacy builder’s cache?” (BuildKit can: export/import cache from registries, use content-aware caching for `COPY` (checks file content hashes, not just filenames), build independent stages in parallel, and use cache mount types that persist between builds without being stored in layers. The legacy builder was linear and local-only.)
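The invalidation rule discussed above (a changed layer invalidates every layer after it) can be illustrated with a toy model. This is a simplified sketch, not Docker's or BuildKit's actual algorithm: each layer key is assumed to be a hash of the parent key, the instruction text and, for `COPY`, the copied file contents. The function names and the sample Dockerfile lines are hypothetical.

```python
import hashlib


def layer_keys(instructions: list[str], files: dict[str, bytes]) -> list[str]:
    """Compute a simplified cache key for each instruction, chained to its parent."""
    keys = []
    parent = ""
    for instruction in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        if instruction.startswith("COPY "):
            # Toy rule: "COPY <src> <dst>", where src is one file name or "." for all files.
            src = instruction.split()[1]
            names = sorted(files) if src == "." else [src]
            for name in names:
                h.update(name.encode())
                h.update(files[name])
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def cache_hits(old: list[str], new: list[str]) -> list[bool]:
    """A layer is a cache hit only if its key equals the previous build's key."""
    return [a == b for a, b in zip(old, new)]


# Dependency manifest copied first, source code last.
good = ["FROM node:18-alpine", "COPY package.json .", "RUN npm ci", "COPY . ."]
# Everything copied before the dependency install.
bad = ["FROM node:18-alpine", "COPY . .", "RUN npm ci"]

before = {"package.json": b"{}", "server.js": b"v1"}
after = {"package.json": b"{}", "server.js": b"v2"}  # only application code changed

good_hits = cache_hits(layer_keys(good, before), layer_keys(good, after))
bad_hits = cache_hits(layer_keys(bad, before), layer_keys(bad, after))
print(good_hits)  # the npm ci layer stays cached
print(bad_hits)   # the npm ci layer is rebuilt
```

In the well-ordered Dockerfile only the final `COPY . .` misses the cache; in the badly ordered one the source change also forces `npm ci` to rerun.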
23. Multi-Stage Builds
- Size: 87% smaller (1.2GB → 150MB)
- Security: No build tools in production image
- Speed: Faster pulls and deployments
- Secrets: Build-time secrets don’t leak to final image
Cleaning up build tools with a later `RUN rm -rf` does not work: deleted files still exist in earlier layers. The only way to exclude build tools from the final image is to never include them in the final stage.
Follow-up chain:
- “Your multi-stage build copies `node_modules` from the builder but the final image is still 600MB. Where is the size coming from?” (Likely copying dev dependencies. The builder ran `npm install` (all deps). The final stage should run `npm ci --production` or use `npm prune --production` before copying. Alternatively, copy only `dist/` and run a fresh `npm ci --production` in the final stage.)
- “Can you use more than two stages? When would you?” (Yes. Common pattern: Stage 1 (deps): install dependencies. Stage 2 (builder): compile/build. Stage 3 (tester): run tests. Stage 4 (production): final image. Each stage can be built independently with `--target`. CI runs `--target=tester` for tests, production deploys build without `--target` to get the last stage.)
- “How do you handle build-time secrets (like an NPM token for private packages) in a multi-stage build without leaking them?” (Use BuildKit’s `--mount=type=secret`. The secret is mounted as a tmpfs file during the `RUN` instruction and never written to any layer. Even if the builder stage is leaked, the secret is not in any layer. Never use `ARG` for secrets, because they appear in `docker history`.)
24. Handling PID 1 (Init Process)
In a container, your application runs as PID 1, a role normally held by `systemd` or `init`, which has special signal-handling behavior: it only responds to signals it explicitly handles. Most application processes are not designed to be PID 1 and do not set up signal handlers for SIGTERM.
What goes wrong: When Kubernetes or Docker sends SIGTERM to stop a container, the signal goes to PID 1. If your app does not handle SIGTERM, the signal is ignored. Docker waits for the grace period (default 10 seconds), then sends SIGKILL (force kill). This means every container shutdown takes a full 10 seconds instead of being instant, and your app does not get a chance to close database connections, flush logs, or finish in-flight requests.
The zombie process problem: PID 1 is also responsible for reaping orphaned child processes once they exit. If your app forks child processes (common in Python, Ruby, or shell scripts), dead children accumulate as zombies because your app does not call `wait()`.
Solutions: handle SIGTERM in the application, use exec-form `CMD` so the app receives signals directly, or run a minimal init such as tini as PID 1.
Red flag answer: suggesting `docker stop -t 0` (which sends SIGKILL immediately, the opposite of graceful).
Follow-up questions:
- “Your Kubernetes pods take exactly 30 seconds to terminate (the terminationGracePeriodSeconds default). What is likely happening?” (The app is not handling SIGTERM. Kubernetes sends SIGTERM, waits 30s, then sends SIGKILL. Fix: add a signal handler or use tini.)
- “What is the difference between shell form CMD and exec form CMD in terms of signal handling?” (Shell form wraps in `/bin/sh -c`, so SIGTERM goes to the shell, not your app. Exec form runs your app directly as PID 1.)
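The "add a signal handler" fix mentioned above might look like this in a Python service. It is a minimal sketch: `serve` is a hypothetical stand-in for a real server loop, and the cleanup step is only indicated by a comment.

```python
import os
import signal
import time

shutting_down = False


def handle_sigterm(signum, frame):
    """Record the shutdown request; the main loop performs the actual cleanup."""
    global shutting_down
    shutting_down = True


# Explicitly registering a handler is what lets a PID 1 process react to SIGTERM.
signal.signal(signal.SIGTERM, handle_sigterm)


def serve(max_iterations: int = 50) -> str:
    """Stand-in for a server loop that exits cleanly once SIGTERM has arrived."""
    for _ in range(max_iterations):
        if shutting_down:
            # Close database connections, flush logs, finish in-flight requests here.
            return "graceful shutdown"
        time.sleep(0.01)
    return "still running"
```

With a handler like this in place, `docker stop` should return as soon as the cleanup finishes rather than waiting out the full grace period.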
25. Non-Root User
By default, containers run as `root` (UID 0). This means if an attacker exploits your application, they have root privileges inside the container. Combined with a kernel vulnerability, this can lead to container escape: gaining root on the host machine.
Best practice: create and use a non-root user.
- Place `USER` after `RUN` commands that need root (installing packages, creating directories) but before `CMD`.
- Use `COPY --chown=appuser:appgroup` to set ownership during copy, avoiding a separate `RUN chown` layer.
- In Kubernetes, enforce non-root with `securityContext: { runAsNonRoot: true }` at the pod level. This will reject any container that tries to run as root.
- Some base images (like `node:18`) include a built-in `node` user (UID 1000); use it instead of creating your own.
`privileged: true`. This is a standard security baseline at companies with SOC 2 or ISO 27001 compliance.
Red flag answer: “Containers are already isolated, so root inside a container is fine.” This ignores container escape vulnerabilities and shows no awareness of defense-in-depth security principles.
26. Health Checks
- `--interval=30s`: Check every 30 seconds
- `--timeout=5s`: If the check takes longer than 5s, it is a failure
- `--retries=3`: After 3 consecutive failures, container status becomes `unhealthy`
- `--start-period=60s`: Grace period after container start (for slow-starting apps like JVM services) during which failures do not count
- Docker Swarm: Unhealthy containers are killed and replaced automatically.
- Docker standalone: The `unhealthy` status is visible in `docker ps` but Docker does NOT automatically restart the container. Restart policies (`--restart=unless-stopped`) only react when the container exits, so self-healing needs the failing process to exit or an external watcher that restarts unhealthy containers.
- Kubernetes: Does NOT use Dockerfile HEALTHCHECK. It uses its own probe system: `livenessProbe` (restart if failed), `readinessProbe` (remove from load balancer if failed), and `startupProbe` (disable other probes until app has started).
Using `curl` in HEALTHCHECK requires `curl` to be installed in the image. Alpine images do not include it by default. Use `wget -q --spider` instead, or better yet, write a small health-check binary or use the application’s own health endpoint.
Red flag answer: “Just use HEALTHCHECK CMD curl localhost” without understanding the parameters, or not knowing that Kubernetes ignores Dockerfile HEALTHCHECK entirely.
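How the four HEALTHCHECK parameters interact can be sketched as a small state machine. This is a simplified model under stated assumptions (probes fire at fixed multiples of the interval, timeouts are treated as ordinary failures, and a first success ends the start period); `health_status` is a hypothetical helper, not part of any Docker API.

```python
def health_status(results: list[bool], interval: int = 30, retries: int = 3,
                  start_period: int = 60) -> list[str]:
    """Replay probe results (True = probe passed) and return the status after each probe."""
    status = "starting"
    consecutive_failures = 0
    started = False  # becomes True after the first successful probe
    history = []
    for i, passed in enumerate(results, start=1):
        elapsed = i * interval  # assumed: probes run at interval, 2*interval, ... seconds
        if passed:
            started = True
            consecutive_failures = 0
            status = "healthy"
        elif started or elapsed > start_period:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
        # Failures inside the start period, before any success, are not counted.
        history.append(status)
    return history


if __name__ == "__main__":
    print(health_status([False, False, True, False, False, False]))
```

The model makes the role of `--start-period` visible: early failures leave the container in `starting` instead of consuming retries.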
27. .dockerignore
`.dockerignore` works like `.gitignore` but for the Docker build context: the directory tree sent to the Docker daemon before building. Without it, everything in your project directory is uploaded.
What to exclude and why:
- `.git/`: can be 100MB+ and is never needed inside an image. One team I worked with shaved 600MB from their build context just by adding this.
- `node_modules/`: your Dockerfile should `RUN npm ci` inside the image. Copying the host’s `node_modules` causes platform-mismatch bugs (Linux container, macOS host).
- `.env` files: secrets should never be baked into images. Use runtime env vars or secrets management.
- `Dockerfile` and `docker-compose.yml`: meta files, not application code.
- Test directories, documentation, IDE configs (`.vscode/`, `.idea/`).
- Build artifacts (`dist/`, `build/`, `*.log`).
Without `.dockerignore`, `COPY . .` copies `.env` files containing API keys and database passwords into the image layer. That layer is stored permanently in the image, even if you `RUN rm .env` in a later layer. Anyone with access to the registry can extract it with `docker save | tar`.
What interviewers are really testing: Whether you understand that the build context is a security boundary, not just a performance optimization.
Red flag answer: Not knowing `.dockerignore` exists, or saying “I just copy the files I need with multiple COPY commands” (which is fragile and misses the point).
28. Tagging Strategy
Why `latest` is dangerous:
- `latest` is not a special tag; it is just the default when no tag is specified. It does not mean “most recent.” If you push `v2.0` without also pushing `latest`, `latest` still points to whatever it pointed to before.
- Two developers pull `latest` an hour apart and get different images. Debugging becomes impossible.
- Kubernetes `imagePullPolicy: Always` with `latest` tags causes non-deterministic deployments.
- Semantic versioning: `myapp:1.2.3`, `myapp:1.2`, `myapp:1`. The more specific tag is immutable; the less specific floats. Users who pin to `1.2.3` get determinism. Users who pin to `1.2` get patch updates.
- Git commit SHA: `myapp:a1b2c3d`. Every image is traceable to exact source code. Combined with CI, this creates a full audit trail. This is the strategy used by most mature CI/CD pipelines.
- Hybrid: Tag with both semver and SHA: `myapp:1.2.3` and `myapp:a1b2c3d` pointing to the same manifest. Semver for human readability, SHA for automation.
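The hybrid strategy above can be sketched as a small helper that a CI pipeline might call before pushing. The function name and the 7-character short SHA are assumptions for illustration.

```python
def image_tags(repo: str, version: str, commit_sha: str) -> list[str]:
    """Build the hybrid tag set: semver tags plus a short-SHA tag for one image."""
    major, minor, patch = version.split(".")
    return [
        f"{repo}:{major}.{minor}.{patch}",  # exact version, treated as immutable
        f"{repo}:{major}.{minor}",          # floats across patch releases
        f"{repo}:{major}",                  # floats across minor releases
        f"{repo}:{commit_sha[:7]}",         # traceable to the exact source commit
    ]


if __name__ == "__main__":
    for tag in image_tags("myapp", "1.2.3", "a1b2c3d4e5f6"):
        print(tag)
```

All four tags would point at the same manifest, so humans can read the semver while automation pins the SHA.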
Block `:latest` or untagged images. Require digest pinning (`@sha256:...`) for the highest security.
What interviewers are really testing: Whether you think about image identity as a deployment concern, not just a naming convention.
Red flag answer: “I always use latest and just rebuild.” No awareness of immutable tags or deployment traceability.
29. ARG vs ENV
| Feature | ARG | ENV |
|---|---|---|
| Available during | Build time only | Build time AND run time |
| Persists in image | No — gone after build completes | Yes — baked into the image metadata |
| Override at build | --build-arg KEY=value | Cannot override at build time |
| Override at run | N/A (does not exist at runtime) | -e KEY=value or --env-file |
| Layer caching | Changing an ARG value busts cache for all subsequent layers | Changing ENV busts cache similarly |
A typical split: an `ARG` controls which base image to use (a build-time decision), while an `ENV` bakes metadata into the image that the running container can read.
Security gotcha: Never use `ARG` for secrets. `ARG` values are visible in `docker history` even though they do not persist at runtime. Use BuildKit’s `--mount=type=secret` instead.
Red flag answer: using `ARG` to pass API keys into the build.
30. Flattening Images
Flattening uses `docker export` (which exports a container’s filesystem) and `docker import` (which creates a new image from that filesystem) to collapse an image into a single layer.
The trade-off: you lose image history (`docker history` shows nothing useful) and the ability to share base layers with other images. In practice, multi-stage builds are a better optimization than flattening.
4. Troubleshooting & Operations
31. Docker Exec
`docker exec` spawns a new process inside an already-running container. This is the primary tool for debugging containers in real time.
`exec` does not restart the container or affect the main process. It creates an additional process that shares the container’s namespaces (filesystem, network, PID space). This means you can inspect files, run `curl localhost:3000/health`, check environment variables, or install debugging tools, all without disrupting the running application.
Production tip: If your image is distroless (no shell), you cannot exec into it. Use ephemeral debug containers in Kubernetes (`kubectl debug -it pod/myapp --image=busybox`) or build a debug variant of your image for staging environments.
32. Logs
- `json-file` (default): Logs stored as JSON in `/var/lib/docker/containers/<id>/`. Supports `docker logs`. Can grow unbounded, so set `max-size` and `max-file` to prevent disk exhaustion.
- `syslog`: Forward to a syslog server.
- `awslogs`: Forward directly to CloudWatch (no agent needed).
- `fluentd`: Forward to Fluentd/Fluent Bit for aggregation.
With a remote logging driver (like `fluentd`), `docker logs` no longer works for that container. This trips up many operators during incident debugging. The workaround is to use the dual-logging feature (Docker 20.10+) or keep a local json-file driver alongside the remote driver.
33. Inspect
`docker inspect` returns a comprehensive JSON blob with every detail about a container, image, network, or volume. It is the single most useful debugging command.
- `State.OOMKilled`: Was the container killed for exceeding memory limits?
- `State.ExitCode`: Non-zero means the process crashed. 137 = OOMKilled/SIGKILL, 143 = SIGTERM.
- `NetworkSettings.IPAddress`: The container’s IP on the bridge network.
- `Config.Env`: All environment variables (check for misconfiguration).
- `Mounts`: Volume and bind mount mappings.
- `HostConfig.RestartPolicy`: Current restart policy.
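The exit codes listed above follow the convention that values above 128 encode a signal number (137 = 128 + 9, 143 = 128 + 15). A small decoder, sketched here with a hypothetical function name and assuming Linux signal numbering, makes that explicit:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable hint."""
    if code == 0:
        return "exited normally"
    if code > 128:
        # By convention, 128 + N means the process was terminated by signal N.
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"terminated by {name}"
    return f"application error (exit code {code})"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, describe_exit_code(code))
```

The value passed in would typically come from `State.ExitCode` in the inspect output.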
34. Container Exits Immediately
- CMD runs a short-lived command: `CMD ["echo", "hello"]` prints “hello” and exits. The container stops immediately.
- Application crashes on startup: Missing env vars, bad config, port conflict. Check `docker logs`.
- Shell form without foreground process: `CMD node server.js &` backgrounds the process, so the shell has nothing to wait for and exits.
Use `docker logs <id>` to see what the process printed before exiting, and `docker inspect -f '{{.State.ExitCode}}'` to check the exit code.
35. Connection Refused (Localhost)
The symptom: your application inside the container listens on `127.0.0.1:3000`, but when you try to reach it from the host via `-p 3000:3000`, you get “connection refused.”
Why it happens: `127.0.0.1` means “only accept connections from this network namespace.” Since the host is in a different namespace than the container, connections from the host are rejected. Inside the container, `localhost` is the container’s own loopback, not the host’s.
Fix: Configure your application to listen on `0.0.0.0` (all interfaces), which accepts connections arriving on any of the container’s interfaces, including traffic from the host via the bridge network.
Many frameworks bind to `127.0.0.1` in development mode.
36. OOMKilled (Exit Code 137)
Exit code 137 means the process received SIGKILL (128 + 9); in the OOM case, the kernel OOM killer (configurable per cgroup via `memory.oom_control` in cgroup v1) terminated the process. Inside Docker, this almost always means the container hit its memory cgroup limit; occasionally it means the whole host is OOM and the kernel picked your container as a victim based on `oom_score_adj`.
Diagnosis workflow:
- `docker inspect -f '{{.State.OOMKilled}}' <id>`: returns `true` if the cgroup killed it. If `false` but exit code is 137, it was host-level OOM or an external `kill -9`.
- `docker stats --no-stream <id>` (while running) or `docker events --filter container=<id>`: real-time memory and OOM events.
- Inside Kubernetes: `kubectl describe pod <pod>` -> `Last State: Terminated, Reason: OOMKilled, Exit Code: 137`.
- `dmesg | grep -i "killed process"` on the node: kernel log of which PID was killed and why.
- Check cgroup memory stats: `cat /sys/fs/cgroup/memory/docker/<id>/memory.max_usage_in_bytes` (cgroup v1) or `memory.peak` (v2).
- Memory leak in app code — heap grows unboundedly. Fix: heap profiling (pprof, jemalloc, memray for Python).
- JVM heap not sized for container: `-Xmx` was set without considering off-heap (metaspace, threads, direct buffers, JIT code cache). Rule of thumb: `-Xmx` should be ~70-75% of container limit. Use `-XX:MaxRAMPercentage=75.0` so the JVM auto-detects container limits.
- Loading large files into memory: reading a 2GB CSV with pandas, not streaming. Fix: chunked iteration.
- Unbounded caches: in-process LRU with no size cap, or Redis without `maxmemory`.
- Page cache pressure: a file-heavy workload on a tight container can cause the working set to exceed the limit.
`memory.oom.group` (cgroup v2) may kill the whole cgroup.
- Senior: Identifies the cause through `docker stats` and logs, fixes the leak or raises the limit.
- Staff: Designs the memory management strategy: mandatory memory requests/limits in CI, JVM Kubernetes-aware flags templated into base images, Prometheus alerts on `container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.85`, a runbook linking exit 137 -> heap dump collection, and a post-incident process where “OOMKilled in prod” auto-creates a Jira with the heap dump attached.
- “Your container was killed with exit 137 but `docker stats` shows memory was well under the limit at the time of death. What is happening?” `docker stats` samples periodically; a momentary spike can kill the container between samples. Or the kill came from host-level OOM (check `/var/log/kern.log`). Or a subprocess forked and its RSS counted against the cgroup briefly.
- “How is Java’s ‘Container-aware JVM’ different in Java 10+ vs older versions?” Pre-Java 10, the JVM sized itself from the host’s memory and would set heap to a fraction of host RAM, blowing past the container limit. Java 10+ (backported to 8u191) respects cgroup limits by default; flags like `-XX:MaxRAMPercentage` target container RAM. Upgrading old JVMs is often the fastest OOM fix.
- “What is the difference between `memory` and `memory.swap` limits in Docker?” `--memory=512m` caps RAM. `--memory-swap=1g` caps RAM+swap total (so 512MB swap allowed). `--memory-swappiness=0` minimizes swapping of anonymous pages; setting `--memory-swap` equal to `--memory` disables swap for the container. Most container platforms disable swap entirely because swap defeats the point of memory limits: you just degrade into thrashing before OOM.
- “How do you capture a heap dump from a container that is OOMKilling in a loop?” Add a preStop or automatic dump on OOM: JVM `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap` with a PVC mounted at `/tmp`. For Python, use `faulthandler` + `tracemalloc` snapshots on SIGUSR1. For Go, `pprof.WriteHeapProfile` on SIGUSR1. The key is dumping before the kill, since SIGKILL is non-catchable.
- Step 1: `kubectl top pod` over time. Is memory growing linearly (leak) or spiking (load-related)?
- Step 2: If linear, classic leak. Take a heap snapshot: `kill -SIGUSR2 <pid>` with Node `--heapsnapshot-signal=SIGUSR2`, then `chrome://inspect` to analyze.
- Step 3: Common Node leaks: unbounded event listeners (`emitter.on` in a hot path without `off`), closures capturing large scope, global `Map` or `Set` used as cache, promise chains retaining references.
- Step 4: Short-term: raise the limit to 2GB and add autoscaling to reduce per-pod pressure. Medium-term: fix the leak identified in heap diff.
- Step 5: Guardrail: add `--max-old-space-size=768` (75% of 1GB) to Node so V8 garbage-collects aggressively before the cgroup limit is hit.
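The 75% rule of thumb used for both the JVM and Node guardrails above is simple arithmetic; the sketch below shows one way to derive the flag from the container limit. The helper names are hypothetical and the 0.75 default is the rule of thumb from the text, not a universal constant.

```python
def heap_cap_mb(container_limit_mb: int, heap_fraction: float = 0.75) -> int:
    """Heap ceiling in MB that leaves headroom for off-heap memory inside the limit."""
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return int(container_limit_mb * heap_fraction)


def node_flag(container_limit_mb: int) -> str:
    """Render the V8 old-space flag for a given container memory limit."""
    return f"--max-old-space-size={heap_cap_mb(container_limit_mb)}"


if __name__ == "__main__":
    print(node_flag(1024))  # 1GB limit, as in Step 5
    print(node_flag(2048))  # after raising the limit to 2GB in Step 4
```

If the limit is raised as in Step 4, the flag should be recomputed so the heap cap tracks the new limit.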
37. Pruning
| Command | What it removes |
|---|---|
| `docker container prune` | Stopped containers |
| `docker image prune` | Dangling images (untagged) |
| `docker image prune -a` | All unused images |
| `docker volume prune` | Volumes not attached to any container |
| `docker network prune` | Unused networks |
| `docker system prune` | Stopped containers + dangling images + unused networks (not volumes) |
| `docker system prune -a --volumes` | Everything unused |
`docker volume prune` is the most dangerous: it deletes data volumes with no confirmation beyond the initial prompt. Always verify with `docker volume ls` first. In production, schedule automated pruning with the `--filter "until=168h"` flag to preserve recent resources.
38. Docker Events
`docker events` streams a real-time feed of actions happening on the Docker daemon: container lifecycle events (create, start, die, destroy), image events (pull, push, tag), volume and network events.
Forward `docker events` to a log aggregator to track container restarts, OOM kills, and image pulls. This is how some teams detect crashlooping containers before Kubernetes restarts mask the problem.
39. Stats
`docker stats` provides a live-updating view of resource consumption per container, similar to `top` for Docker.
40. Restart Policies
| Policy | Behavior |
|---|---|
| `no` (default) | Never restart. Container stays stopped. |
| `on-failure[:max-retries]` | Restart only if exit code is non-zero. `on-failure:5` retries up to 5 times. |
| `always` | Always restart, including after daemon restart. A container stopped manually with `docker stop` stays stopped until the daemon restarts, then starts again. |
| `unless-stopped` | Like `always`, but does not restart if it was manually stopped before the daemon restarted. |
Use `unless-stopped` for services you want to survive host reboots (the Docker daemon restarts automatically via systemd). Use `on-failure:5` for batch jobs where infinite restarts would be harmful. Do not rely on Docker’s `always` policy in Kubernetes; the kubelet handles restarts via the pod spec’s `restartPolicy`.
Backoff behavior: Docker applies an exponential backoff delay between restarts, starting at 100ms and doubling up to a cap of 1 minute. This prevents a crashlooping container from consuming all system resources.
5. Security & Ecosystem
41. Namespaces
| Namespace | Isolates | Effect |
|---|---|---|
| PID | Process IDs | Container sees its own process tree starting at PID 1. Cannot see or signal host processes. |
| NET | Network stack | Container gets its own IP address, routing table, and port space. Two containers can both listen on port 80. |
| MNT | Filesystem mounts | Container has its own root filesystem. Cannot see host files unless explicitly mounted. |
| UTS | Hostname | Container can have its own hostname distinct from the host. |
| IPC | Inter-process communication | Shared memory and semaphores are isolated per container. |
| USER | User/Group IDs | Root (UID 0) inside the container can map to a non-root UID on the host. This is the foundation of rootless containers. |
- “What is the USER namespace and why is it not enabled by default in Docker?” (USER namespace remaps UIDs: root (UID 0) inside the container maps to an unprivileged UID (e.g., 100000) on the host. Even if an attacker escapes the container as “root,” they are nobody on the host. Docker does not enable it by default because it breaks volume permissions: files created by the remapped UID are not owned by the expected host user. Enable with `"userns-remap": "default"` in `daemon.json`.)
- “Can you create a container that shares the host’s PID namespace? When would you want this?” (Yes, `--pid=host`. The container can see all host processes. Useful for monitoring/debugging tools like `strace`, process managers, and sidecar containers that need to send signals to host processes. Security risk: the container can `kill -9` any host process.)
- “How do namespaces interact with capabilities? If a container has `NET_ADMIN` capability, does it affect the host network?” (Only if the container shares the host’s NET namespace. With a separate NET namespace (the default), `NET_ADMIN` only allows modifying the container’s own network stack. Capabilities are scoped to the namespace they operate in; this is the layered security model.)
42. Cgroups (Control Groups)
- CPU: Limit to N cores or a percentage of host CPU. Throttling (not killing) when exceeded.
- Memory: Hard limit — kernel OOM-kills the container’s process if exceeded. Soft limit — triggers reclaim but does not kill.
- Disk I/O: Limit read/write bandwidth to storage devices (BPS and IOPS).
- PIDs: Limit the number of processes a container can create (prevents fork bombs).
- “You set `--cpus=2` on a container. The host has 8 cores. Does the container see 2 cores or 8?” (It sees 8 cores, because cgroups limit time, not visibility. `/proc/cpuinfo` inside the container shows all host CPUs. The container gets 200% of a single core’s CPU time, distributed across any available cores. This confuses JVM and Go runtime auto-detection: they may spawn 8 threads thinking they have 8 cores, then contend for 2 cores’ worth of CPU time. Use `GOMAXPROCS` or `-XX:ActiveProcessorCount` to override.)
- “What is the difference between CPU `--cpus` (quota) and `--cpuset-cpus` (pinning)?” (`--cpus=2` gives you 2 cores’ worth of time on any core (scheduler decides). `--cpuset-cpus="0,1"` pins the container to physical cores 0 and 1 only. Pinning is useful for latency-sensitive workloads (avoids cache misses from core migration) but reduces scheduling flexibility. In practice, combine them: `--cpuset-cpus="0,1" --cpus=2`.)
- “A container is set to `--memory=1g` but `free -m` inside the container shows 32GB (the host’s RAM). Why?” (`/proc/meminfo` is not cgroup-aware, so it shows the host’s memory. This breaks applications that auto-tune based on available memory (JVM, Node.js, Python ML libraries). Overlaying a cgroup-aware `/proc` with `lxcfs` or setting `MALLOC_ARENA_MAX` can mitigate this. Modern JVMs (11+) read from `/sys/fs/cgroup/memory.max` instead.)
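The relationship between `--cpus` and the CFS quota behind it, and the kind of calculation a runtime needs in order to size its thread pool from that quota instead of from the visible core count, can be sketched as follows. The helper names are hypothetical; the 100ms period is Docker's default CFS period.

```python
import math

CFS_PERIOD_US = 100_000  # default CFS scheduling period: 100ms


def cpus_to_cfs_quota(cpus: float, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time in microseconds the container may consume per period for a --cpus value."""
    return int(cpus * period_us)


def effective_cpu_count(quota_us: int, period_us: int, host_cpus: int) -> int:
    """Worker count a runtime should assume: the quota rounded up, capped at host CPUs.

    A quota of -1 (or 0) is treated as 'no limit'.
    """
    if quota_us <= 0:
        return host_cpus
    return min(host_cpus, max(1, math.ceil(quota_us / period_us)))


if __name__ == "__main__":
    quota = cpus_to_cfs_quota(2)
    print(quota)
    print(effective_cpu_count(quota, CFS_PERIOD_US, host_cpus=8))
```

A value computed this way is what one would feed into `GOMAXPROCS` or `-XX:ActiveProcessorCount` on an 8-core host limited to `--cpus=2`.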
43. Docker Socket Security
The Docker socket (`/var/run/docker.sock`) is the API endpoint for the Docker daemon. Mounting it inside a container gives that container full control over Docker on the host: it can create, delete, and inspect any container, pull images, mount host filesystems, and effectively gain root access to the host machine.
Why teams mount it: CI/CD agents (Jenkins, GitLab Runner), monitoring tools (Portainer, cAdvisor), and log collectors sometimes need Docker API access.
The risk: A compromised container with socket access can run `docker run -v /:/host --privileged alpine` to mount the entire host filesystem with root access. This is a complete host compromise.
Mitigations:
- Use Docker-in-Docker (DinD) instead of socket mounting for CI/CD. DinD runs a separate Docker daemon inside the container.
- Use Podman or Kaniko for building images without a daemon.
- If you must mount the socket, use a TCP proxy (like `tecnativa/docker-socket-proxy`) that restricts which API endpoints the container can access.
- In Kubernetes, avoid mounting the socket entirely and use Kaniko for in-cluster image builds.
- “A monitoring tool requires Docker API access to list containers. The security team vetoes socket mounting. What alternatives exist?” (1. Use `tecnativa/docker-socket-proxy`, a HAProxy-based proxy that exposes only safe read-only endpoints. 2. Use the Docker REST API over TLS instead of the socket. 3. Use cAdvisor or the Prometheus node exporter, which read from cgroups and `/proc` directly, no Docker socket needed. 4. In Kubernetes, use the Kubelet API or metrics-server instead.)
- “Can you detect if a container has the Docker socket mounted?” (Yes. `docker inspect -f '{{range .Mounts}}{{.Source}}{{end}}' <id>` shows all mounts. In Kubernetes, OPA Gatekeeper or Kyverno policies can block any pod mounting `/var/run/docker.sock`. At the host level, audit `inotify` watches on the socket or use Falco for runtime detection.)
44. Privileged Mode
`docker run --privileged` disables virtually all security features: it grants the container all Linux capabilities, access to all host devices (`/dev`), and disables seccomp, AppArmor, and SELinux profiles. The container can do anything the host root can do, including loading kernel modules, modifying iptables rules, and mounting filesystems.
When it is legitimately needed: Running Docker-in-Docker, running certain system monitoring tools, or managing host networking. These cases are rare.
Better alternatives: Instead of `--privileged`, grant only the specific capabilities needed with `--cap-add`. For example, a container that needs to modify network settings only needs `--cap-add=NET_ADMIN`, not full privileged mode. This follows the principle of least privilege.
Red flag in production: Any container running with `--privileged` in production is a security audit failure. Use Kubernetes Pod Security Standards or OPA Gatekeeper to block privileged containers at the admission level.
45. Content Trust (Notary)
- “What is the difference between Docker Content Trust (Notary v1) and cosign? Which would you recommend for a new project?” (Cosign. Notary v1 requires running your own Notary server, has a complex key management model, and is tightly coupled to Docker. Cosign stores signatures as OCI artifacts in the same registry as the image, supports keyless signing via GitHub Actions OIDC (no long-lived keys to manage), and integrates with Kyverno/OPA for Kubernetes admission control. Notary v2 (now called Notation) is a newer standard but cosign has broader adoption.)
- “How does keyless signing with cosign work? Where is the private key?” (There is no persistent private key. Cosign generates an ephemeral key pair in memory. The CI job authenticates via OIDC (e.g., a GitHub Actions identity token), Fulcio issues a short-lived signing certificate bound to that identity, the image is signed, and the signature is recorded in the Rekor transparency log. Verification checks the Rekor log entry and the OIDC identity in the certificate. The private key is discarded as soon as signing completes.)
- “An attacker pushes a malicious image with the same tag to your registry. How does signing prevent this from reaching production?” (Your Kubernetes admission controller (Kyverno or OPA) verifies the cosign signature before allowing a pod to run. The attacker’s image is not signed by your CI pipeline’s OIDC identity, so the signature check fails and the pod is rejected. Without signing, tag-based pulls are vulnerable to tag overwriting attacks.)
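The keyless flow described above can be sketched as two small wrappers. This assumes cosign 2.x and a GitHub Actions OIDC issuer; the function names, the COSIGN override variable, and any image or identity values passed in are hypothetical placeholders.

```shell
#!/bin/sh
# Keyless signing and verification sketch (assumes cosign 2.x).
# COSIGN is a hypothetical override so the CLI can be swapped out or stubbed.

sign_image() {
  # $1: image reference, ideally pinned by digest.
  # In CI with an OIDC token available, cosign obtains a short-lived Fulcio cert.
  "${COSIGN:-cosign}" sign --yes "$1"
}

verify_image() {
  # $1: image reference, $2: expected signer identity (e.g. a workflow URL).
  "${COSIGN:-cosign}" verify \
    --certificate-identity "$2" \
    --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
    "$1"
}
```

An admission controller performs the equivalent of verify_image before a pod is admitted; an image pushed by an attacker fails because its certificate identity does not match the expected workflow.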
46. Docker Compose
docker run commands with complex flags, you describe your entire application stack in docker-compose.yml and bring it up with one command.
The docker-compose Python-based CLI has been replaced by docker compose (a Go plugin built into Docker CLI). Compose v2 is faster and supports profiles, service dependencies with health checks, and watch mode for development (docker compose watch).
Production note: Compose is excellent for development, testing, and single-host deployments. For multi-host production orchestration, use Kubernetes. Some teams use Compose for local dev and generate Kubernetes manifests from the same service definitions using tools like Kompose.
Follow-up chain:
- “What is the difference between depends_on and health-check-based ordering in Compose v2?” (depends_on without conditions only waits for the container to start, not for the service to be ready. Your database container might start in 100ms but take 5 seconds to accept connections. Compose v2 supports depends_on: { db: { condition: service_healthy } }, which waits until the healthcheck passes. This eliminates the need for hacky wait-for-it.sh scripts.)
- “When would you use Compose profiles?” (Profiles let you define optional services that only start when explicitly requested. Example: a debug profile that includes a pgAdmin container and a Redis Commander UI. docker compose --profile debug up starts everything including debug tools. Without the flag, only the core services start. This keeps your default startup fast.)
- “Your team uses Compose for local dev. How do you keep the Compose file and Kubernetes manifests in sync?” (Three approaches: 1. Use kompose convert to generate K8s manifests from Compose files (good for simple cases, breaks on Compose-specific features). 2. Maintain both separately with CI checks that compare exposed ports, env vars, and volume mounts. 3. Use a shared values layer: Helm chart values and Compose .env files sourced from the same config. In practice, most teams maintain both separately because the environments have fundamentally different needs.)
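To make the health-check ordering and profile answers concrete, here is a sketch that writes a small Compose file to a temporary directory. The api image name, the credentials, and the choice of Postgres and pgAdmin are illustrative assumptions, not taken from any particular project.

```shell
#!/bin/sh
# Write a Compose file that waits for a healthy database and keeps
# debug tooling behind a profile. Images and credentials are examples only.
compose_file="$(mktemp -d)/compose.yaml"
cat > "$compose_file" <<'EOF'
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: example
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 10
  api:
    image: example/api:latest        # hypothetical application image
    depends_on:
      db:
        condition: service_healthy   # wait for the healthcheck, not just container start
  pgadmin:
    image: dpage/pgadmin4
    environment:
      PGADMIN_DEFAULT_EMAIL: dev@example.com
      PGADMIN_DEFAULT_PASSWORD: example
    profiles: ["debug"]              # only started with --profile debug
EOF
echo "$compose_file"
```

With this file, docker compose -f "$compose_file" up starts only db and api, while docker compose -f "$compose_file" --profile debug up also starts pgadmin.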
docker-compose) with v2 (docker compose) syntax differences.
What strong candidates say: “I use Compose as my local development contract. It defines the services, networks, and volumes my app needs. The same service topology is replicated in Kubernetes manifests for staging and production, but with different resource limits, health checks, and scaling policies.”
47. Docker Swarm
docker swarm init) but lacked the extensibility, ecosystem, and community momentum that Kubernetes built. By 2020, most cloud providers had dropped or deprioritized Swarm support in favor of managed Kubernetes. Swarm is still maintained and included in Docker, but new projects overwhelmingly choose Kubernetes.
When Swarm still makes sense: Very small teams (1-3 people) running fewer than 10 services who want orchestration without the operational complexity of Kubernetes. The learning curve is hours, not weeks.
48. Podman vs Docker
| Feature | Docker | Podman |
|---|---|---|
| Architecture | Client-server (requires a daemon running as root) | Daemonless (each container is a child process of Podman) |
| Root requirement | Daemon runs as root by default (rootless mode available since 20.10) | Rootless by default (no daemon = no root process) |
| OCI compliance | OCI-compatible | Fully OCI-compliant |
| CLI compatibility | N/A | Drop-in replacement (alias docker=podman works for most commands) |
| Systemd integration | Requires separate configuration | Generates systemd unit files natively (podman generate systemd, now deprecated in favor of Quadlet) |
| Compose | Native support | Supports docker-compose.yml via podman-compose or podman compose |
49. Distroless Images
gcr.io/distroless/) contain only your application and its runtime dependencies: no shell, no package manager, no ls, no curl, no bash. The image has the bare minimum to run your application.
Available base images: distroless/static (for statically compiled binaries like Go/Rust), distroless/base (with glibc), distroless/java, distroless/nodejs, distroless/python3.
Why use them:
- Security: No shell means an attacker who gains code execution inside the container cannot easily pivot: no wget to download tools, no bash to run scripts. CVE scanners typically find 80-90% fewer vulnerabilities in distroless vs. Debian-based images.
- Size: distroless/static is ~2 MB vs. alpine at ~5 MB vs. debian at ~120 MB.
docker exec -it ... /bin/sh. The workaround is Kubernetes debug containers (kubectl debug) or building a debug variant image for staging environments that includes a shell.
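The kubectl debug workaround can be sketched as a one-line wrapper. It assumes a cluster with ephemeral containers available; the function name, the KUBECTL override variable, and the default busybox image are illustrative choices.

```shell
#!/bin/sh
# Attach an ephemeral debug container that targets a running container
# in a pod, giving you a shell next to a shell-less distroless app.
debug_pod() {
  # $1: pod name, $2: container to target, $3: debug image (default busybox)
  "${KUBECTL:-kubectl}" debug -it "$1" --image="${3:-busybox:1.36}" --target="$2"
}
```

For example, debug_pod api-7d9f app opens a busybox shell that shares the process namespace of the app container, so its processes are visible even though the application image has no shell.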
50. Seccomp Profiles
reboot(), swapon(), mount(), kexec_load(), and ptrace().
How it works: When a container process makes a blocked syscall, the kernel rejects the call. Under Docker's default profile the syscall fails with an error (typically EPERM) rather than killing the process; a custom profile can choose a stricter action such as killing the process. Either way, the dangerous operation never executes.
Custom profiles: You can create stricter profiles for security-sensitive containers. For example, a stateless API server that only needs network I/O and file reads can be restricted to a very narrow set of syscalls. The Docker default profile is a good baseline, but production-grade security uses custom profiles generated by tools like strace (to capture which syscalls your application actually uses) or OCI runtime spec generators.
securityContext.seccompProfile field. Since Kubernetes 1.27, the kubelet's seccomp-default option is generally available; when it is enabled, the RuntimeDefault profile is applied to pods that do not set one. The restricted Pod Security Standard requires pods to set RuntimeDefault or Localhost explicitly.
5. Docker Medium Level Questions
41. Docker Compose Services
42. Docker Compose Networks
43. Environment Variables
44. Health Checks
45. Build Args vs ENV
46. Docker Registry
47. Docker Prune
48. Container Logs
49. Docker Stats
50. Docker Inspect
6. Docker Advanced Level Questions
51. Multi-Architecture Builds
52. BuildKit Deep Dive — Cache Mounts, Secrets, and SSH
docker history or any layer.
SSH mounts: forward SSH agent for private git repo access during build.
- “How do you enable BuildKit in CI if the runner has an older Docker version?” (Set the DOCKER_BUILDKIT=1 environment variable before Docker 23.0. On 23.0+, it is the default. In GitHub Actions, the docker/build-push-action uses BuildKit by default.)
- “What is the difference between --mount=type=cache and --cache-from?” (Cache mounts persist build-time dependencies (npm cache, apt cache) across builds locally. --cache-from imports layer cache from a remote registry image, enabling cache sharing across CI runners. They solve different problems; use both together for maximum speed.)
- “A developer adds RUN --mount=type=secret but forgets to pass --secret at build time. What happens?” (By default secret mounts are optional (required=false), so the secret file is simply absent and the build only fails if the RUN command itself errors when it cannot read the file. To make the build fail closed, declare the mount as RUN --mount=type=secret,id=token,required=true.)
- “Your CI builds still take 8 minutes after enabling BuildKit. What are the likely remaining issues?” (Cache mounts not configured for npm/pip/go modules. --cache-from/--cache-to not pointing at a registry mirror. Sequential stages that could be parallel (missing independent FROM lines). Large COPY . . before dependency install. Fix each: docker buildx bake with a BuildKit registry cache backend usually shaves 60-80% off CI build time.)
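The registry cache and secret flags discussed above can be combined in one build invocation. This is a sketch under assumptions: the secret id npm_token, the NPM_TOKEN environment variable, and the function name are hypothetical, and exporting a registry cache generally requires a builder that supports it (for example the docker-container driver).

```shell
#!/bin/sh
# Build and push with a registry-backed layer cache and a build secret.
# DOCKER is a hypothetical override so the CLI can be swapped out or stubbed.
build_with_cache() {
  # $1: image tag to build, $2: registry ref that stores the build cache
  "${DOCKER:-docker}" buildx build \
    --cache-from "type=registry,ref=$2" \
    --cache-to "type=registry,ref=$2,mode=max" \
    --secret "id=npm_token,env=NPM_TOKEN" \
    --tag "$1" \
    --push \
    .
}
```

A matching Dockerfile step would read the secret with RUN --mount=type=secret,id=npm_token, so the token is available during that step only and never lands in a layer.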
- Senior: Uses BuildKit cache mounts, secret mounts, and --cache-from appropriately. Knows BuildKit is the default on modern Docker.
- Staff: Designs the build cache topology for the org: regional registry mirrors for cache, CI-wide cache-key conventions (git SHA vs semver vs branch), build-time attestations (SBOM, provenance via --provenance=mode=max), and a multi-arch build strategy (QEMU vs native ARM runners). Also enforces: no secrets in Dockerfile (pre-commit + PR checks), SBOM attached to every pushed image, and parallel builds with buildx bake for monorepos. Tracks build-time SLO as a team metric.
- Measure: identify the bottleneck. Usually: (a) no layer cache reuse across CI runs, (b) sequential builds where parallelism is possible, (c) repeated downloads of the same dependencies.
- Fix 1: BuildKit + --cache-from type=registry,ref=ghcr.io/org/service:buildcache. CI pulls the previous layers before build. First-time cold: 12 min; warm: 90 seconds.
- Fix 2: Cache mounts for package managers (npm, pip, go mod). Saves 1-3 minutes per service.
- Fix 3: docker buildx bake with a docker-bake.hcl file, which builds up to N targets in parallel in a single invocation.
- Fix 4: only build services whose code changed (git diff --name-only -> map to service directories). Unchanged services skip entirely.
- Expected outcome: per-service build time ~60s warm, and only rebuild affected services. CI wall-clock for typical PR: 2-3 minutes.
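Fix 4 above (map changed files to service directories) can be sketched with a small filter. It assumes a hypothetical monorepo layout of services/<name>/...; the function name changed_services is illustrative.

```shell
#!/bin/sh
# Read changed file paths (one per line) on stdin and print the unique
# service names that need a rebuild. Assumes paths look like services/<name>/...
changed_services() {
  awk -F/ '$1 == "services" && NF >= 3 { print $2 }' | sort -u
}
```

In CI this would typically be fed from git, for example git diff --name-only origin/main...HEAD | changed_services, and the resulting names passed to the build step. Files outside services/ (such as a root README) produce no output, so a docs-only change triggers no builds.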
53. Image Signing — Content Trust, Cosign, and Notation
54. Rootless Containers and User Namespaces
- Cannot bind to privileged ports (<1024) without sysctl net.ipv4.ip_unprivileged_port_start=0
- Overlay2 storage driver requires kernel 5.11+ for rootless. Older kernels fall back to fuse-overlayfs (slower).
- --network host does not expose the real host network stack: in rootless Docker the host network is itself namespaced inside RootlessKit
- AppArmor and some cgroup features may not work in rootless mode
alias docker=podman and you get rootless containers without changing any workflows.
What interviewers are really testing: Whether you understand that “running as non-root inside the container” (USER instruction) is different from “running the Docker daemon as non-root” (rootless Docker). Both are important; they address different threat vectors.
Follow-up chain:
- “You enable userns-remap and your volume mounts break. Files are owned by UID 100000 instead of UID 0. How do you fix it?” (The remapped UID does not match the host UID. Fix: chown the volume directory to the remapped UID range, or use named volumes instead of bind mounts. In Kubernetes, fsGroup in the security context handles this.)
- “What is the difference between rootless Docker, rootless Podman, and rootless Kubernetes (usernetes)?” (Rootless Docker: daemon as non-root. Rootless Podman: no daemon at all, each container is a direct child process. Usernetes: Kubernetes components (kubelet, containerd) run as non-root. All aim to eliminate root processes from the container stack, but at different layers.)
55. Seccomp and AppArmor — Runtime Security Profiles
reboot(), mount(), kexec_load(), ptrace(), etc.
strace or seccomp-profiler to record which syscalls your app actually uses, then whitelist only those. A Node.js HTTP server needs ~60 syscalls. A Go static binary needs ~30. Everything else can be blocked.
AppArmor (Mandatory Access Control):
AppArmor restricts which files, capabilities, and network operations a process can use. Docker applies the docker-default AppArmor profile automatically.
docker run --security-opt label=type:my_container_t nginx. SELinux uses labels and type enforcement; AppArmor uses path-based rules. Same goal, different mechanism.
Defense-in-depth stack for production containers:
- Non-root user (USER appuser)
- Read-only root filesystem (--read-only)
- Drop all capabilities, add back only what is needed (--cap-drop=ALL --cap-add=NET_BIND_SERVICE)
- Custom seccomp profile (whitelist only needed syscalls)
- AppArmor/SELinux profile (restrict file and network access)
- No new privileges (--security-opt no-new-privileges:true)
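The defense-in-depth stack above maps fairly directly onto docker run flags. The sketch below is one possible combination, not a definitive baseline: the UID 10001, the /tmp tmpfs, the function name, and the profile path passed in are assumptions, and the AppArmor layer is left to Docker's default profile.

```shell
#!/bin/sh
# Run a container with the hardening layers listed above.
# DOCKER is a hypothetical override so the CLI can be swapped out or stubbed.
run_hardened() {
  # $1: path to a custom seccomp profile (JSON); remaining args: image and command
  profile="$1"
  shift
  "${DOCKER:-docker}" run --rm \
    --user 10001:10001 \
    --read-only --tmpfs /tmp \
    --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
    --security-opt "seccomp=$profile" \
    --security-opt no-new-privileges:true \
    "$@"
}
```

For example, run_hardened ./api-seccomp.json example/api:1.0 starts the (hypothetical) image as a non-root user on a read-only root filesystem, with a writable /tmp only.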
56. Resource Constraints
57. Docker Swarm Mode
58. Docker Secrets
59. Distroless Images — Deep Dive
gcr.io/distroless/):
| Image | Contents | Size | Use case |
|---|---|---|---|
| distroless/static | CA certs, timezone data | ~2 MB | Statically compiled Go, Rust binaries |
| distroless/base | glibc, libssl, CA certs | ~20 MB | Dynamically linked C/C++ binaries |
| distroless/cc | libstdc++ on top of base | ~25 MB | C++ applications |
| distroless/nodejs18 | Node.js runtime | ~120 MB | Node.js applications |
| distroless/java17 | OpenJDK 17 runtime | ~220 MB | Java applications |
| distroless/python3 | Python 3 runtime | ~50 MB | Python applications |
:nonroot tag matters: Distroless images come in :latest (runs as root) and :nonroot (runs as UID 65532) variants. Always use :nonroot unless your app specifically needs root.
The debugging trade-off and how to work around it:
cgr.dev/chainguard/ images with similar philosophy but better supply chain guarantees (signed with cosign, daily CVE updates, SBOM attached). Many teams are migrating from Google distroless to Chainguard for better security posture.
CVE comparison (real numbers):
- node:18 (Debian): ~280 known vulnerabilities (Trivy scan)
- node:18-alpine: ~15 known vulnerabilities
- cgr.dev/chainguard/node:18: 0-3 known vulnerabilities
curl), runtime configuration (sed/envsubst), or debugging. The right answer is “distroless for production, debug variant for staging, and Alpine as a pragmatic middle ground.”
- Senior: Uses distroless for Go/Rust services, understands the debug container pattern, picks :nonroot variants.
- Staff: Mandates distroless (or Chainguard) as the org default via base image policy, builds a curated internal registry of approved base images with CVE SLAs, wires cosign signature verification into admission control, tracks SBOM diff between image versions for supply-chain auditing, and owns the migration plan from Debian-based base images to distroless fleet-wide. Also makes the build-vs-buy call on Chainguard vs maintaining internal distroless images.
- “Your app needs to run curl for health checks. Can you still use distroless?” Replace curl-based health checks with in-app HTTP endpoints (/health). Or use Kubernetes httpGet probes, which call from outside the container. Distroless forces you to do this correctly; “I need curl inside the container” is usually a sign of a fragile health check pattern.
- “How do you scan distroless images for vulnerabilities when they have no package manager?” Trivy and Grype scan based on the SBOM and binary manifests, not the package manager. They still find CVEs in Go/Rust binaries (by analyzing the Go module versions compiled in) and in any OS-level libraries. Chainguard images ship SBOMs via cosign attestations, which is even better.
- “What breaks when you go from Alpine to distroless for a Node.js service?” (a) Entrypoints that rely on shell features ($VAR expansion): rewrite to use Node-native arg parsing. (b) npm not available at runtime: ensure npm install happens in the build stage. (c) Native modules may need distroless/nodejs18 (glibc) not -alpine variants because glibc vs musl differ.
- Phase 1: Pick the base image library. Chainguard for paid, distroless + hardened internal variants for free.
- Phase 2: Build CI template (Docker, Buildx, cosign sign) that any team can adopt. Provide golden Dockerfile examples per language.
- Phase 3: Admission control — Kyverno or Gatekeeper policies in Kubernetes that reject unsigned images or images with critical CVEs (via Trivy operator scan results).
- Phase 4: Migration — start with non-prod clusters, allow images to remain Debian with warnings for 30 days, then fail CI builds that don’t use the approved bases.
- Phase 5: SLO — “time to patch a critical CVE in all production images” as a team KPI. Use Trivy daily scans feeding into a ticketing pipeline.
60. Docker Socket Security
61. OCI Specification — The Standard Behind Containers
- OCI Image Spec: Defines the format for container images — the manifest (JSON metadata), config (runtime settings), and layer blobs (filesystem diffs). This is why a Docker-built image works in Podman, containerd, CRI-O, or any OCI-compliant runtime.
- OCI Runtime Spec: Defines how to configure and run a container from an image. It specifies the JSON config file format that runc (or any OCI runtime) reads: root filesystem path, environment variables, namespaces, cgroups, mounts, capabilities. This is the config.json you see when you run runc spec.
- OCI Distribution Spec: Defines the HTTP API for pushing and pulling images to/from registries. This is why Docker images work in ECR, GCR, ACR, GHCR, Harbor, and any OCI-compliant registry.
- Docker images work in Kubernetes because both follow OCI Image Spec. Kubernetes does not need Docker — it needs an OCI-compliant runtime (containerd, CRI-O).
- You can switch runtimes without changing images. Run the same image with runc (default), gVisor (sandboxed), or Kata Containers (micro-VM).
- Registry portability: Push to Docker Hub, pull from a Harbor mirror. The wire protocol is the same.
- OCI Artifacts: The Distribution Spec now supports storing arbitrary artifacts (Helm charts, Wasm modules, SBOMs, cosign signatures) alongside images. oras push lets you push anything to an OCI registry.
- “If OCI standardizes images, why do some images work with Docker but not Podman, or vice versa?” (They almost always work. The rare exceptions involve Docker-specific extensions not in the OCI spec, like Docker Compose labels or Docker’s legacy v1 manifest format. OCI image manifests and Docker v2 schema 2 manifests are supported almost everywhere.)
- “What is an OCI manifest list (index), and how does it enable multi-architecture images?” (A manifest list is a JSON document that points to multiple platform-specific manifests under a single tag. When a client pulls, it first fetches the index (its Accept header only advertises the manifest media types it understands), then picks the entry whose platform matches its own OS and architecture and fetches that manifest by digest. This is how docker pull nginx gets arm64 on Graviton and amd64 on Intel: same tag, different layers.)
- “How do OCI Artifacts differ from container images, and what would you store as an artifact?” (OCI Artifacts use the same registry API but with different media types. Store: Helm charts, Wasm modules, SBOMs, cosign signatures, policy bundles. Benefit: single registry for all your supply chain artifacts, with the same auth, replication, and scanning infrastructure.)
62. Init Systems in Containers — Beyond Tini
- PID 1 does not receive default signal disposition. If your app does not explicitly handle SIGTERM, the signal is silently ignored (unlike any other PID, which would be terminated).
- PID 1 is responsible for reaping orphaned child processes. If it does not call wait(), zombies accumulate.
- PID 1 receives SIGCHLD for every orphan, regardless of who spawned the child.
| Solution | Size | Zombie reaping | Signal forwarding | Process supervision | Use case |
|---|---|---|---|---|---|
| tini | 30 KB | Yes | Yes | No | 90% of containers (recommended default) |
| dumb-init (Yelp) | 50 KB | Yes | Yes (rewrites signals) | No | Similar to tini, slightly different signal behavior |
| s6-overlay | ~2 MB | Yes | Yes | Yes (full supervisor) | Multi-process containers (app + cron + sidecar) |
| supervisord | ~50 MB (Python) | Yes | Limited | Yes | Legacy, not recommended for new containers |
--init flag: Injects tini at runtime without modifying the Dockerfile. Kubernetes equivalent: shareProcessNamespace: true, which makes the pod's pause container PID 1 so that it reaps zombies, or simply handle SIGTERM in your application code.
What interviewers are really testing: Whether you have debugged slow container shutdowns or zombie process accumulation in production. This is a practical problem, not theoretical: it manifests as Kubernetes pods stuck in “Terminating” for 30 seconds on every deploy.
Follow-up chain:
- “Your container needs to run both nginx and a background worker process. Is a multi-process container the right approach, or should you use separate containers?” (Separate containers is the Docker/Kubernetes best practice: one process per container. But sometimes a tightly-coupled pair (like envoy sidecar + app) benefits from sharing a container for performance. If you must go multi-process, use s6-overlay, never bare supervisord or shell scripts with &.)
- “What is the difference between tini’s signal forwarding and dumb-init’s signal rewriting?” (Tini forwards signals to its direct child only (your app) unless it is started with -g. Dumb-init by default runs the child in its own session and forwards signals to the whole process group, which ensures all descendants receive the signal even if your app does not propagate it; it can additionally rewrite one signal into another with --rewrite. For process groups (forking servers), dumb-init’s default behavior is more correct.)
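To illustrate what signal forwarding means in practice, here is a minimal shell entrypoint sketch. It is not a replacement for tini: it forwards SIGTERM and SIGINT to one child and returns that child's exit status, but it does not reap orphaned grandchildren. The function name run_with_forwarding is hypothetical.

```shell
#!/bin/sh
# Run the real command as a child, forward SIGTERM/SIGINT to it,
# and return the child's exit status.
run_with_forwarding() {
  "$@" &
  child=$!
  # As PID 1 the shell would otherwise ignore SIGTERM; forward it explicitly.
  trap 'kill -TERM "$child" 2>/dev/null' TERM INT
  # wait returns early (status > 128) when a trapped signal arrives,
  # so keep waiting while the child is still alive.
  while :; do
    wait "$child"
    status=$?
    kill -0 "$child" 2>/dev/null || break
  done
  trap - TERM INT
  return "$status"
}
```

An entrypoint script would end with something like run_with_forwarding node server.js followed by exit $?. In most cases docker run --init (or tini in the image) is the simpler choice, since it also handles zombie reaping.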
Advanced Scenario-Based Questions
Scenario 1: Image Layer Bloat — The 2GB Node.js Image
RUN instructions, installs build-essential for native modules, copies the entire repo, and nobody has touched it since the original author left. You are asked to fix it this sprint. Walk me through your approach.
What weak candidates say:
- “Just switch to Alpine.” (Breaks native modules like bcrypt and sharp that depend on glibc.)
- “Delete the node_modules folder.” (Shows no understanding of layers vs runtime filesystem.)
- Cannot explain why layers accumulate size or how docker history works.
- Step 1: Diagnose before cutting. Run docker history --no-trunc myapp:latest to see per-layer sizes. In my experience, 80% of the bloat comes from 2-3 layers: the apt-get install build-essential layer (400MB+), a COPY . . that drags in .git (sometimes 500MB+), and leftover npm cache inside a RUN npm install layer.
- Step 2: Add .dockerignore immediately. Excluding .git, node_modules, dist, *.log, and test fixtures often shaves 30-50% of build context size. I once cut build context from 1.8GB to 90MB just with this file.
- Step 3: Multi-stage build. Stage 1 (node:18) installs build-essential, runs npm ci (not npm install; npm ci is deterministic and respects the lockfile), and compiles native addons. Stage 2 (node:18-slim, or node:18-alpine if musl-compatible) copies only node_modules and dist from the builder. This eliminates gcc, make, python, and all build artifacts from the final image.
- Step 4: Collapse and clean in the same layer. RUN apt-get update && apt-get install -y build-essential && npm ci && apt-get purge -y build-essential && rm -rf /var/lib/apt/lists/*. If you split these across layers, the deleted files still exist in earlier layers due to Union FS semantics.
- Step 5: Use BuildKit cache mounts. RUN --mount=type=cache,target=/root/.npm npm ci avoids re-downloading packages on every build while keeping the npm cache out of the final layer entirely.
- Metrics from real cleanup: Took a 2.1GB image down to 220MB. Deploy time went from 14 minutes to 2.5 minutes. ECR storage costs dropped by ~$180/month across 40 image tags.
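Steps 2, 3, and 5 above can be sketched together as a .dockerignore plus a two-stage Dockerfile, written here to a scratch directory. The npm run build script, the dist/ output directory, and the dist/index.js entry point are assumptions about the project, not details from the scenario.

```shell
#!/bin/sh
# Write a .dockerignore and a two-stage Dockerfile into a scratch directory.
ctx="$(mktemp -d)"

cat > "$ctx/.dockerignore" <<'EOF'
.git
node_modules
dist
*.log
EOF

cat > "$ctx/Dockerfile" <<'EOF'
# syntax=docker/dockerfile:1
# Stage 1: full image with compilers for native addons.
FROM node:18 AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
# Assumes a "build" script that emits dist/; then drop dev dependencies.
RUN npm run build && npm prune --omit=dev

# Stage 2: slim runtime image without gcc, make, or python.
FROM node:18-slim
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
USER node
CMD ["node", "dist/index.js"]
EOF

echo "$ctx"
```

After copying both files into the repository root, docker build -t myapp:slim . followed by docker history myapp:slim shows whether the large layers are gone. Copying package.json and the lockfile before the rest of the source keeps the npm ci layer cached until dependencies change.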
- After switching to Alpine, bcrypt throws Error: /lib/x86_64-linux-gnu/libc.so.6: version GLIBC_2.28 not found. What happened and how do you fix it without abandoning Alpine?
- You have a monorepo with 12 services sharing a root package.json. How do you structure the Dockerfile to avoid invalidating layer cache for Service A when Service B’s code changes?
- Your CI is rebuilding the entire image from scratch on every push despite no dependency changes. What is wrong with your layer caching strategy?
Scenario 2: Multi-Stage Build Failures — The Missing Binary
ubuntu-latest), the multi-stage build succeeds but the final container crashes on startup with exec format error or not found (for a statically linked binary that clearly exists in the image). The Dockerfile looks correct. What is happening?
What weak candidates say:
- “The binary must not have been copied correctly.” (Does not investigate architecture or linking.)
- “Try rebuilding with --no-cache.” (Cargo cult debugging.)
- Cannot explain the difference between static and dynamic linking in the context of containers.
- exec format error is almost always an architecture mismatch. Building on an M1 Mac produces linux/arm64 binaries. If CI runs on linux/amd64 or the final image’s platform is amd64, you get this error. The fix: explicitly set --platform=linux/amd64 in the FROM line of your builder stage, or use docker buildx build --platform linux/amd64.
- not found on a binary that exists is a dynamic linking issue. The Go binary was compiled with CGO_ENABLED=1 (the default for native builds when a C compiler is available; packages like net and os/user then use cgo). It linked against glibc in the builder stage (golang:1.20 is Debian-based). The final stage uses alpine (musl libc) or scratch (no libc). The dynamic linker the binary asks for (/lib64/ld-linux-x86-64.so.2) does not exist there, and the kernel reports the missing interpreter as not found.
- Fix for the linking issue: build with CGO_ENABLED=0 so the binary is statically linked. The -ldflags="-s -w" flag strips the symbol table and DWARF debug info, reducing binary size by up to 30-40%.
- If you genuinely need CGO (e.g., SQLite via mattn/go-sqlite3), your final stage must have a compatible libc. Use gcr.io/distroless/base, which includes glibc, or build in an Alpine-based builder so the binary links against musl. alpine with RUN apk add --no-cache libc6-compat can work for simple cases, but it is a thin compatibility shim, not a full glibc.
- War story: We had a service that built fine for 8 months. Then a developer added import "os/user", which silently pulled in cgo and made the binary dynamically linked. Builds still passed in CI but the container crashed in staging at 2 AM. We added CGO_ENABLED=0 as a mandatory linter check in CI after that.
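The static-linking fix described above can be sketched as a build wrapper. The function name, the default output path, and the GO override variable are illustrative; the flags themselves are standard go build options.

```shell
#!/bin/sh
# Build a statically linked Linux binary suitable for scratch or distroless/static.
# GO is a hypothetical override so the toolchain can be swapped out or stubbed.
build_static() {
  # $1: target architecture (default amd64), $2: output path (default ./app)
  CGO_ENABLED=0 GOOS=linux GOARCH="${1:-amd64}" \
    "${GO:-go}" build -trimpath -ldflags="-s -w" -o "${2:-./app}" .
}
```

Running build_static arm64 ./app-arm64 from the module root cross-compiles for ARM without needing a C toolchain, because cgo is disabled. Setting GOARCH explicitly also guards against the architecture mismatch behind exec format error.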
- Your Go binary needs to make HTTPS calls but runs in a scratch container. It fails with x509: certificate signed by unknown authority. Why, and what is the minimal fix?
- How would you set up a CI pipeline that builds and pushes both linux/amd64 and linux/arm64 images from a single Dockerfile, and what does the manifest list look like in the registry?
Scenario 3: Container Networking Mysteries — Containers Cannot Talk
docker-compose.yml: api, worker, and redis. The api container tries to connect to redis:6379 and gets Connection refused. You have verified Redis is running inside its container. docker ps shows all three containers are up. What do you check?
What weak candidates say:
- “Expose port 6379 to the host with -p 6379:6379.” (Misunderstands container-to-container networking entirely: you do not need host port mapping for inter-container communication on the same Docker network.)
- “Use the container’s IP address instead of hostname.” (Fragile, misses the point of Docker DNS.)
- Cannot explain how Docker DNS resolution works.
- Step 1: Verify they are on the same network. Run docker network inspect <network_name> and confirm all three containers appear in the Containers section. Compose creates a default network named <project>_default, but if someone defined custom networks and forgot to attach a service, that container is isolated.
- Step 2: Check if Redis is binding to 127.0.0.1 vs 0.0.0.0. This is the number one cause of “Connection refused” between containers. The redis.conf shipped with Redis 7+ defaults to bind 127.0.0.1 -::1 with protected-mode yes, so a Redis started with that config only accepts connections from its own loopback. (The official redis image started without a config file listens on all interfaces, so check whether a custom redis.conf is mounted.) Fix: set bind 0.0.0.0 in redis.conf or pass --bind 0.0.0.0 as a command argument. This is the exact same problem as Q35 in this doc but people forget it applies to every service, not just web servers.
- Step 3: DNS resolution. Exec into the api container: docker exec -it api sh -c "getent hosts redis". If it does not resolve, the containers may be on the default bridge network (which does not support DNS; only user-defined bridges do). Compose normally creates a user-defined bridge, but if someone used network_mode: bridge explicitly, they bypassed this.
- Step 4: Check depends_on race condition. depends_on only waits for the container to start, not for the service to be ready. Redis might not be accepting connections yet when api tries to connect. Fix: use depends_on with condition: service_healthy and define a healthcheck for Redis: test: ["CMD", "redis-cli", "ping"].
- Step 5: Firewall / iptables. On production Linux hosts, iptables rules or firewalld can silently block Docker bridge traffic. Run iptables -L -n and check for DROP rules on the docker0 or br-* interfaces.
- Debugging toolkit: docker exec api ping redis, docker exec api nc -zv redis 6379, docker network inspect, docker logs redis.
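The checks above can be strung together into one diagnostic pass. This is a sketch that assumes the containers are literally named api and redis (Compose usually adds a project prefix unless container_name is set) and that getent and nc exist in the client image; the function name is hypothetical.

```shell
#!/bin/sh
# Run the network, DNS, TCP, and log checks in sequence.
# DOCKER is a hypothetical override so the CLI can be swapped out or stubbed.
diagnose_redis() {
  # $1: client container name, $2: network name
  d="${DOCKER:-docker}"
  # Step 1: which containers are attached to the network?
  "$d" network inspect -f '{{range .Containers}}{{.Name}} {{end}}' "$2"
  # Step 3: does the redis hostname resolve from the client?
  "$d" exec "$1" getent hosts redis
  # Is the port reachable over TCP?
  "$d" exec "$1" nc -zv redis 6379
  # What did Redis log at startup (bind address, protected mode)?
  "$d" logs --tail 20 redis
}
```

Calling diagnose_redis api myproject_default prints the results in order, so the first failing line points at the layer that is broken: network attachment, DNS, TCP reachability, or Redis itself.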
- You now need api to communicate with a container running in a different Compose project on the same host. How do you set this up without host networking?
- Your containers can talk to each other by IP but not by hostname. What specific Docker network type causes this, and why?
- In production, you switch from Compose to Kubernetes. How does service discovery change, and what breaks if you hardcode container hostnames?
Scenario 4: OOMKill Debugging — The Memory Leak at 3 AM
docker inspect shows "OOMKilled": true and exit code 137 on every restart. The application was running fine for weeks. No recent code deployments. Diagnose and fix.
What weak candidates say:
- “Just increase the memory limit to 4GB.” (Treats the symptom, not the cause. The leak will consume 4GB too, just slower.)
- “Java has garbage collection so it cannot have memory leaks.” (Fundamentally wrong: GC handles heap, but off-heap memory, thread stacks, metaspace, and native allocations can all leak.)
- Cannot distinguish between container memory limit and JVM heap.
What strong candidates say:
- Understand the memory stack. The container memory limit (2GB via `--memory` or the ECS task definition) caps everything the process uses, not just the heap. For Java that includes: JVM heap (`-Xmx`), Metaspace, thread stacks (1MB per thread by default), code cache, direct ByteBuffers (NIO), native memory (JNI, gzip, TLS), and OS overhead. A common mistake is setting `-Xmx2g` in a 2GB container: the JVM needs 2GB for heap plus 300-500MB for everything else, so an OOMKill is all but guaranteed once the heap fills.
- Step 1: Check if it is a JVM heap issue. Look at the container’s memory usage over time with `docker stats` or CloudWatch/Prometheus metrics. If memory grows linearly, it is likely a leak. If it spikes suddenly, it could be a burst of traffic creating threads or loading data.
- Step 2: Capture a heap dump before it dies. Add `-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap.hprof` to the JVM args and mount `/tmp` to a volume so the dump survives the container restart. This flag only fires when the JVM itself throws `OutOfMemoryError`; a cgroup OOMKill is a SIGKILL and leaves no dump, so also take an on-demand dump with `jcmd <pid> GC.heap_dump <file>` as usage approaches the limit. Analyze with Eclipse MAT or VisualVM (`jhat` was removed in JDK 9).
- Step 3: Check for common non-heap culprits. Enable Native Memory Tracking with `-XX:NativeMemoryTracking=summary` and query it with `jcmd <pid> VM.native_memory summary`. I once found a service leaking 50MB/hour in direct ByteBuffers because a library was allocating NIO buffers in a loop without ever releasing them.
- Step 4: Right-size the JVM for the container. Container-aware JVMs (JDK 10+, backported to 8u191) respect cgroup limits via `-XX:+UseContainerSupport` (on by default). Set `-XX:MaxRAMPercentage=75.0` instead of a fixed `-Xmx`. This gives the heap 75% of the container limit (1.5GB) and leaves about 500MB for non-heap.
- Step 5: Investigate the “no recent deployment” claim. Check if a dependency was auto-updated (Dependabot, Renovate), if a feature flag changed (enabling a new code path), or if data volume increased (more users, bigger payloads, cache not evicting).
- War story: A Spring Boot app OOMKilled every 3 days. Heap was fine. It turned out `logback` was configured with an `AsyncAppender` that created a new thread per log destination, and someone added a dynamic logger that created a new appender per tenant. 2000 tenants = 2000 threads = 2GB in thread stacks alone.
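The sizing arithmetic from Step 4 and the war story can be sketched in a few lines of Python. This is a rough illustration, not a JVM sizing tool: the function name `jvm_memory_budget` is hypothetical, and it assumes a flat 1MB stack per thread and ignores Metaspace, code cache, and other native memory.

```python
def jvm_memory_budget(container_limit_mb, max_ram_percentage, thread_count,
                      stack_mb_per_thread=1.0):
    """Rough split of a container memory limit between JVM heap and everything else.

    Assumption: thread stacks are the only non-heap consumer modeled here.
    """
    heap_mb = container_limit_mb * max_ram_percentage / 100.0
    non_heap_mb = container_limit_mb - heap_mb
    thread_stacks_mb = thread_count * stack_mb_per_thread
    return {
        "heap_mb": heap_mb,
        "non_heap_mb": non_heap_mb,
        "thread_stacks_mb": thread_stacks_mb,
        # True when thread stacks alone still fit in the non-heap headroom.
        "fits": thread_stacks_mb <= non_heap_mb,
    }


# 2GB container with -XX:MaxRAMPercentage=75.0 and a modest thread count.
budget = jvm_memory_budget(2048, 75.0, thread_count=200)
print(budget)

# The war story: 2000 threads no longer fit in the non-heap headroom.
leaky = jvm_memory_budget(2048, 75.0, thread_count=2000)
print(leaky["fits"])
```

The point of the sketch is that the heap percentage fixes the non-heap headroom, so anything that grows outside the heap (here, thread count) decides whether the container survives.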
Follow-up questions:
- Your Node.js container (not Java) is OOMKilled with exit code 137, but `process.memoryUsage()` shows only 200MB heap. Where is the missing memory?
- How does the Linux kernel’s OOM killer decide which process to kill when the cgroup limit is hit? Can you influence this?
- You set `--memory=2g --memory-swap=2g`. What does this mean, and how does it differ from `--memory=2g --memory-swap=4g`?
Scenario 5: Docker-in-Docker Pitfalls — CI Pipeline Madness
`/var/run/docker.sock` into the CI container. It works, but your security team flagged it. Another engineer suggests using `docker:dind` (Docker-in-Docker). You need to advise the team on the correct approach.

What weak candidates say:
- “Just mount the socket, it is fine for CI.” (Ignores the security implications entirely.)
- “Use DinD, it is designed for this.” (Does not understand the operational complexity of DinD.)
- Cannot articulate the difference between socket mounting and true DinD.
What strong candidates say:
- Socket mounting (`/var/run/docker.sock`): fast but dangerous.
  - The CI container gets full root access to the host’s Docker daemon. It can `docker run --privileged` to escape to the host, `docker rm -f` any container, or `docker exec` into production containers sharing that host.
  - Builds share the host’s layer cache (fast), but also share the host’s image namespace (CI builds can accidentally overwrite production images by tag).
  - Build artifacts (layers, volumes) accumulate on the host and are never cleaned up by CI.
  - Mitigation if you must use it: scope daemon access with authorization plugins and use `--userns-remap` to limit what root inside a container maps to on the host. Mounting the socket read-only (`:ro`) is often suggested but does not restrict API calls, because the client can still connect and issue any request.
- True DinD (`docker:dind`): isolated but operationally painful.
  - Runs a full Docker daemon inside a container. Requires `--privileged` (which the security team will also flag).
  - Layer cache is inside the DinD container. When the CI job finishes and the container dies, cache is gone. Every build starts cold. This makes builds 3-5x slower.
  - Storage driver conflicts: the inner Docker’s overlay2 running on top of the outer Docker’s overlay2 is not supported and can cause data corruption. Either back `/var/lib/docker` with a volume (the official `docker:dind` image declares one) or fall back to `--storage-driver=vfs`, which is safe but extremely slow.
  - Mitigation: Use a persistent volume for `/var/lib/docker` inside DinD to retain cache across jobs.
- The modern answer: rootless build tools.
  - Kaniko (Google): Builds images in userspace, no Docker daemon needed, runs as an unprivileged container. Perfect for Kubernetes-based CI. Downside: limited or no support for BuildKit-specific features such as `RUN --mount`.
  - Buildah: Daemonless, rootless image builder. OCI-compliant. Works well in Podman-based CI.
  - BuildKit (`docker buildx`) with remote builder: Run a BuildKit daemon as a separate service and connect to it over TCP/TLS. The CI container does not need Docker at all; it only needs the `buildctl` client.
- War story: A team using socket mounting in their GitLab runner had a rogue CI job (from a fork PR) that ran `docker run -it --pid=host --privileged ubuntu` and gained root on the build server. They moved to Kaniko within a week.
Follow-up questions:
- Your Kaniko-based CI build is 4x slower than the old socket-mounted approach. How do you optimize Kaniko build caching?
- Explain the security implications of `--privileged` in the context of Linux capabilities. What specific capabilities does it grant that make it dangerous?
- How would you design a CI/CD pipeline that can build Docker images inside Kubernetes pods without any privileged access?
Scenario 6: Registry Security — Pulling Malicious Images
`FROM python:3.11`) in production Dockerfiles. The security team demands you lock this down within two weeks. What is your plan?

What weak candidates say:
- “Just tell developers to use official images, they are safe.” (Official images have had CVEs. “Official” does not mean “vulnerability-free”.)
- “Enable Docker Content Trust.” (Partial solution: it only verifies that the image was signed, not that it is free of vulnerabilities.)
- No awareness of supply chain attacks or image provenance.
What strong candidates say:
- The threat model is real. In 2023-2024, multiple attacks targeted Docker Hub: typosquatting (e.g., `pythonn` instead of `python`), compromised maintainer accounts uploading backdoored images, and cryptominer payloads in popular images. One study found 51% of Docker Hub images had critical CVEs.
- Step 1: Stand up a private registry (or use a managed one). Harbor (open-source, supports vulnerability scanning, RBAC, replication), AWS ECR, GCP Artifact Registry, or Azure ACR. Configure it as a pull-through cache for Docker Hub so developers still get upstream images but through your gateway.
- Step 2: Image scanning in the registry. Integrate Trivy, Grype, or the registry’s built-in scanner (Harbor has built-in Trivy). Set a policy: images with Critical or High CVEs cannot be pulled/deployed. In Harbor, this is a “Prevent vulnerable images from running” policy at the project level.
- Step 3: Pin image digests, not tags. Tags are mutable: `python:3.11` can point to a different image tomorrow. Pin to the digest instead, in the form `python:3.11@sha256:<digest>`. Use `docker pull python:3.11` then `docker inspect --format='{{.RepoDigests}}' python:3.11` to get the digest. Automate digest updates with Dependabot or Renovate.
- Step 4: Enforce via admission control. In Kubernetes: an OPA Gatekeeper or Kyverno policy that rejects any pod with an image not from your approved registry. In Docker directly: use the `--registry-mirror` daemon config and block Docker Hub at the network level (firewall/proxy).
- Step 5: Sign your own images. Use `cosign` (from Sigstore) to sign images after CI builds them. Verify signatures in your admission controller. This creates a full chain of trust: you built it, you scanned it, you signed it.
- Step 6: SBOM (Software Bill of Materials). Generate SBOMs with `syft` or `docker sbom` for every image. Attach them to images via OCI artifacts. When a new CVE drops (like Log4Shell), you can query your SBOM database to find every affected image in minutes instead of days.
- Metrics: After implementing this at a previous company (200 engineers, ~80 microservices), we went from 340 critical CVEs across production images to 12 within 6 weeks.
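The digest-pinning rule from Step 3 is easy to check mechanically in CI. Below is a minimal Python sketch of such a check; the function name `unpinned_base_images` is hypothetical, the digest in the sample is a placeholder of 64 zeros rather than a real image digest, and the parser handles only simple `FROM` lines (no `ARG` substitution).

```python
import re
from typing import List

# Matches "FROM [--platform=...] <image> [AS <name>]" and captures the image reference.
FROM_RE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)
ALIAS_RE = re.compile(r"\s+AS\s+(\S+)\s*$", re.IGNORECASE)
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return base images referenced by tag only (no @sha256 digest)."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        # "scratch" and references to earlier build stages are not registry pulls.
        is_external = image.lower() != "scratch" and image not in stage_names
        if is_external and not DIGEST_RE.search(image):
            unpinned.append(image)
        alias = ALIAS_RE.search(line)
        if alias:
            stage_names.add(alias.group(1))
    return unpinned


PLACEHOLDER_DIGEST = "0" * 64  # illustrative only, not a real digest
SAMPLE = (
    "FROM python:3.11 AS build\n"
    "RUN pip install --no-cache-dir build\n"
    f"FROM python:3.11-slim@sha256:{PLACEHOLDER_DIGEST}\n"
    "COPY --from=build /app /app\n"
)

print(unpinned_base_images(SAMPLE))
```

A check like this only enforces the form of the reference. It complements, rather than replaces, the registry-side scanning and admission control described above.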
Follow-up questions:
- A developer argues that pinning digests makes it impossible to get security patches automatically. How do you balance pinning with staying up to date?
- Your pull-through cache goes down. All CI builds fail because they cannot pull base images. How do you design for this failure mode?
- Explain how a tag-based image substitution attack works and how digest pinning prevents it.
Scenario 7: PID 1 and Init Process — Zombie Apocalypse
`docker top` reveals hundreds of defunct (zombie) processes. The application itself is still responding but getting slower. Explain what is happening and how to fix it.

What weak candidates say:
- “Restart the container.” (Fixes the symptom temporarily. Zombies will come back.)
- “Increase the PID limit.” (Delays the crash, does not fix the cause.)
- Cannot explain what a zombie process is or why PID 1 matters in containers.
What strong candidates say:
- What is happening: In a container, the ENTRYPOINT process becomes PID 1. In a normal Linux system, PID 1 is `init`/`systemd`, which has a special responsibility: reaping orphaned child processes. When a child process exits, it becomes a zombie (it has exited but its entry remains in the process table) until its parent calls `wait()` on it. If the parent dies before calling `wait()`, the orphan is re-parented to PID 1, which must reap it. Gunicorn spawns worker processes. Those workers may spawn subprocesses (e.g., via `subprocess.run()`, health check scripts, shell commands). If a worker dies or those subprocesses are orphaned, they get re-parented to PID 1 (the Gunicorn master). But Gunicorn is not designed to be an init system: it only reaps its own workers, not arbitrary orphans.
- Why 847 processes: The Flask app likely has a code path that spawns subprocesses (maybe calling an external tool, running a shell command, or forking for background tasks). Those subprocesses finish but are never reaped because Gunicorn does not call `wait()` for processes it did not create. Each zombie consumes a PID and a small amount of kernel memory. Eventually you hit the PID limit (`--pids-limit` or the kernel default of 32768) and the container cannot spawn new processes at all.
- Fix 1: Use `tini` as the init process. `tini` is a tiny (~30KB) init process that does exactly two things: forward signals to child processes and reap zombies. It calls `waitpid(-1, ...)` in a loop. The simplest way to get it is `docker run --init`, which injects a bundled `tini` as PID 1.
- Fix 2: Fix the root cause. Find the code spawning subprocesses and ensure they are properly waited on. In Python: `subprocess.run()` automatically waits, but `subprocess.Popen()` without `.wait()` or `.communicate()` will create zombies. Check for `os.fork()` without `os.waitpid()`.
- Fix 3: Signal handling. Without a proper init, SIGTERM sent to the container (`docker stop`) goes to PID 1. If PID 1 has not installed a SIGTERM handler (common in Python scripts), the signal is ignored, because the kernel does not apply default signal actions to PID 1 of a PID namespace. Docker waits 10 seconds, then sends SIGKILL. This means your app never gets a graceful shutdown. `tini` fixes this by forwarding SIGTERM to its child (or to the child’s whole process group with `-g`).
- War story: A data pipeline container ran Python scripts that shelled out to `ffmpeg` for video processing. Each `Popen()` call without `.wait()` left a zombie. After 3 days, the container hit its PID limit (4096), new `ffmpeg` calls failed with `OSError: [Errno 11] Resource temporarily unavailable`, and the pipeline silently dropped videos. Took 2 hours to diagnose, 5 minutes to fix (added `tini` + fixed the missing `.wait()` calls).
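The reaping behaviour described in Fix 1 and Fix 2 can be reproduced on any Unix-like system with a short Python sketch. This is a simplified model of a `waitpid(-1, ...)` loop, not `tini`'s actual implementation: the helper name `reap_children` is hypothetical, and it polls with a timeout where a real init would block.

```python
import os
import subprocess
import sys
import time


def reap_children(expected: int, timeout_s: float = 10.0) -> set:
    """Reap exited children of this process, as an init's waitpid(-1, ...) loop would."""
    reaped = set()
    deadline = time.monotonic() + timeout_s
    while len(reaped) < expected and time.monotonic() < deadline:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)  # -1 means "any child"
        except ChildProcessError:
            break  # no children left to wait for
        if pid == 0:
            time.sleep(0.01)  # children exist, but none has exited yet
        else:
            reaped.add(pid)
    return reaped


# Popen without wait()/communicate(): each child stays a zombie after it exits.
# The Popen objects are kept in a list so Python does not reap them on garbage collection.
children = [subprocess.Popen([sys.executable, "-c", "pass"]) for _ in range(3)]
spawned = {child.pid for child in children}

reaped = reap_children(expected=len(children))
print(f"spawned={len(spawned)} reaped={len(reaped)}")
```

Between the `Popen` calls and `reap_children`, the exited children would show up as defunct in `ps`. Application code should still call `.wait()` or `.communicate()` on its own subprocesses; the catch-all loop is what an init process provides for orphans nobody else will reap.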
Follow-up questions:
- Your application needs to handle SIGTERM for graceful shutdown (drain connections, finish in-flight requests). With `tini` as PID 1, how does the signal reach your application? What is signal forwarding vs signal rewriting?
- Why does PID 1 not get default signal handling in Linux? What kernel behavior makes this different from any other PID?
- You are using a `scratch` base image (no package manager). How do you add `tini` without apt-get?
Scenario 8: Buildx Cross-Platform — The ARM64 Production Migration
`linux/amd64` and `linux/arm64`, keep CI build times under 15 minutes, and ensure developers on both Intel Macs and M1/M2 Macs can build locally. Lay out your strategy.

What weak candidates say:
- “Just use `docker buildx build --platform linux/amd64,linux/arm64`.” (Technically correct but ignores the 15 real-world problems that come with it.)
- “QEMU handles everything.” (Does not understand the 10-20x performance penalty of emulation.)
- No awareness of native compilation vs emulation trade-offs.
What strong candidates say:
- Understand the build strategies:
  - QEMU emulation: `docker buildx` uses QEMU to emulate the target architecture. Simple to set up (`docker run --privileged --rm tonistiigi/binfmt --install all`), but `RUN` steps for the non-native architecture are 10-20x slower. A Go compile that takes 30 seconds natively takes 5-8 minutes under QEMU. For 30 services, this blows past your 15-minute CI budget.
  - Cross-compilation (preferred for compiled languages): Build the binary for the target platform on the native platform. Go supports this natively: `GOOS=linux GOARCH=arm64 go build`. Rust uses `cross`. This avoids QEMU entirely for the expensive compile step. Only the final `FROM` stage needs the target platform.
  - Native remote builders: Set up ARM64 build nodes (Graviton spot instances at ~$0.02/hr) and register them as `buildx` remote builders: `docker buildx create --name multiarch --driver docker-container --platform linux/arm64 ssh://build@arm-builder`. BuildKit will dispatch the arm64 build to the native node and amd64 to the local CI runner. Both run at full native speed.
- Dockerfile pattern for cross-compilation (Go example). Key detail: `FROM --platform=$BUILDPLATFORM` makes the builder stage run on the CI runner’s native architecture, while the `TARGETOS`/`TARGETARCH` build args, passed to the compiler as `GOOS`/`GOARCH`, cross-compile for the target. No QEMU needed for the compile step.
- For interpreted languages (Node.js, Python): Cross-compilation does not apply. The code is the same, but native dependencies (e.g., `sharp`, `bcrypt`, `grpc`) have platform-specific binaries. Strategy: use `npm ci --platform=linux --arch=arm64` in the build stage (this helps only for packages whose install scripts honor those npm config values), or rely on QEMU for the `RUN npm ci` step (slower but correct). Alternatively, use multi-stage with `--platform=$TARGETPLATFORM` on the install stage and cache aggressively with BuildKit cache mounts.
- Registry and manifest list: `docker buildx build --platform linux/amd64,linux/arm64 -t registry/app:v1 --push .` creates a manifest list (also called a fat manifest). When a Graviton node pulls `registry/app:v1`, the client fetches the manifest list, selects the entry whose `platform` is `linux/arm64`, and pulls that manifest and its layers by digest. An x86 node pulling the same tag ends up with the amd64 layers. The `Accept` header tells the registry that the client understands manifest-list media types, and each entry’s `mediaType` identifies what it points to.
- CI optimization for 30 services:
  - Use BuildKit remote cache: `--cache-from type=registry,ref=registry/app:cache --cache-to type=registry,ref=registry/app:cache,mode=max`. Shares layer cache across CI runs.
  - Only rebuild services whose code changed (monorepo path filtering in CI).
  - Parallelize: build all 30 services concurrently across multiple CI runners.
  - For the migration, build amd64+arm64 simultaneously so you can canary-deploy arm64 pods alongside amd64 pods and compare behavior.
- Metrics from a real migration: 30 services, all dual-arch, CI builds averaging 8 minutes (down from 22 minutes with naive QEMU approach). Graviton3 instances gave 38% cost reduction and 15% latency improvement for compute-heavy services.
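To illustrate how a single tag can serve both architectures, here is a small Python sketch of platform selection over a manifest list. The field names follow the OCI image index layout; the function name `select_manifest` and the digests (repeated `a` and `b` characters) are made-up placeholders, and real clients also consider `variant` and fallback rules that are omitted here.

```python
from typing import Optional


def select_manifest(index: dict, os_name: str, architecture: str) -> Optional[str]:
    """Return the digest of the first manifest matching the requested platform."""
    for entry in index.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return entry["digest"]
    return None


# Shape of the manifest list behind a multi-arch tag (digests are placeholders).
INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ],
}

arm_digest = select_manifest(INDEX, "linux", "arm64")
print(arm_digest)
```

Because the selection happens on the client, a node whose platform is missing from the list gets no match, which is why every target architecture has to be built and pushed under the same tag.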
Follow-up questions:
- One of your 30 services depends on a C library (`librdkafka` for Kafka) that does not provide pre-built ARM64 binaries. How do you handle this in your multi-arch build?
- A developer on an M1 Mac runs `docker build` locally and gets an amd64 image. Why, and how do you configure their environment so `docker build` produces the correct architecture by default?
- You push a multi-arch image and notice the arm64 variant is 20% larger than the amd64 variant of the same code. What could cause this and does it matter?
Work-Sample Patterns
These are open-ended prompts designed to test real-world problem-solving. Give the candidate 5-10 minutes to think through each one.
Work Sample 1: Your Docker image is 3GB — Walk through how you would reduce it
What weak candidates do:
- Jump straight to “use Alpine” without analyzing the Dockerfile first
- Miss the `.git` directory being copied in via `COPY . /app`
- Do not mention multi-stage builds
- Do not notice the shell-form CMD or the multi-process anti-pattern
What strong candidates do:
- Diagnose first: “I would run `docker history` to see which layers are biggest. But just reading this, I can already see several problems.”
- Identify the big wins in order of impact:
  - `COPY . /app` copies everything including `.git`, `node_modules`, test files. Add `.dockerignore`.
  - `ubuntu:22.04` as base is ~77MB, but after `apt-get install`, this will be 500MB+. Switch to `node:18-slim` or multi-stage.
  - Every `RUN` is a separate layer. The `apt-get update` and `apt-get install` are split, so intermediate state persists.
  - `build-essential`, `git`, `curl`, `wget` are build-time tools that should not be in the production image.
  - `yarn install` includes dev dependencies unless `--production` is specified.
- Propose a multi-stage rewrite:
  - Stage 1: `node:18` with build-essential for native modules, `yarn install`, `yarn build`
  - Stage 2: `node:18-alpine` with only production `node_modules` and `dist/` (if any native modules were compiled in Stage 1, build on Alpine as well or use `node:18-slim` here, since glibc-linked binaries will not load under musl)
  - Separate nginx into its own container (one process per container principle)
- Estimate the result: “I would expect this to go from 3GB to ~200-250MB, maybe lower with aggressive pruning.”
`yarn install`, proposes a shared base image for the org’s Node.js services, and suggests CI-side image size checks (fail the build if image exceeds 500MB).
Work Sample 2: A container is exiting with code 137 in production — What do you check?
What weak candidates do:
- “Increase the memory and go back to sleep.” (Band-aid, leak will consume the new limit too.)
- Do not know what exit code 137 means.
- Cannot distinguish between a cgroup-limit OOM kill (the container hit its own memory limit) and a host-wide OOM kill (the node itself ran out of memory).
What strong candidates do:
- Decode the exit code: “137 = 128 + 9 = SIGKILL. The process was killed with signal 9. This is almost always an OOM kill, but could also be `docker kill` or Kubernetes eviction.”
- Confirm OOMKill: `docker inspect -f '{{.State.OOMKilled}}' <id>`. If true, it is a memory limit issue. On ECS, check CloudWatch for `MemoryUtilization` spikes.
- Investigate why memory spiked:
  - Was there a traffic spike? Check request rate metrics.
  - Is it a gradual leak? Memory usage graph over the last 24 hours will show a sawtooth pattern (restart-recover-leak-kill).
  - Did a dependency change? Check if Dependabot or Renovate merged something.
  - Language-specific: JVM heap vs off-heap, Node.js `--max-old-space-size`, Python process count.
- Immediate mitigation: Increase memory limit temporarily while investigating. Set up a CloudWatch alarm on `MemoryUtilization > 85%` to catch it before OOMKill.
- Root cause: “In my experience, the top 3 causes are: unbounded caches (in-memory LRU without eviction), loading entire datasets into memory (should stream), and connection pool leaks (each connection holds buffers).”
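The exit-code arithmetic in the first step generalizes to any signal. A minimal Python sketch follows; the function name `decode_exit_code` is hypothetical, and signal names come from the standard `signal` module on the platform running the script.

```python
import signal


def decode_exit_code(code: int) -> str:
    """Explain a container exit code using the shell convention 128 + signal number."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"  # number not known on this platform
        return f"killed by {name} (128 + {signum})"
    return f"application exited with status {code}"


for code in (0, 1, 137, 143):
    print(code, "->", decode_exit_code(code))
```

The decoded signal says how the process died, not why. A result of SIGKILL still needs the `OOMKilled` check above to separate a memory kill from a manual `docker kill`.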
Work Sample 3: Design the container image strategy for a 50-service organization
`latest`, some use commit SHAs. Three different base images. No scanning. Design the container image strategy.”

What weak candidates do:
- Propose a single standard without considering team diversity (Java team, Python team, Go team have different needs)
- Focus only on Dockerfiles, ignore registry, scanning, signing, and governance
- No migration plan, just “everyone should switch”
What strong candidates do:
- Approved base images: Create 4-5 “golden” base images maintained by the platform team: `company/node:18`, `company/python:3.11`, `company/java:17`, `company/go-builder:1.21`, `company/static:latest` (distroless). Auto-rebuild weekly with latest security patches. Scan with Trivy, sign with cosign.
- Registry architecture: Harbor or ECR with pull-through cache for Docker Hub. RBAC per team. Vulnerability scanning on push. Policy: block images with Critical CVEs from being pulled in production.
- Tagging standard: `<service>:<semver>-<git-sha>`. No `latest` in production (enforce via admission controller). Pin base images by digest in Dockerfiles.
- Image scanning pipeline: Scan on push (gate deployments) + nightly full-catalog scan (catch newly discovered CVEs in already-deployed images). Alert owners via Slack when their running image has a new Critical CVE.
- Build standardization: Shared CI templates (GitHub Actions reusable workflow or GitLab CI includes) that enforce multi-stage builds, non-root user, health checks, and SBOM generation.
- Migration plan: Do not mandate everything at once. Phase 1: approved base images + scanning (2 weeks). Phase 2: tagging standard + cosign signing (4 weeks). Phase 3: admission controller enforcement (8 weeks). Support teams with office hours and migration PRs.
- Metrics: Track median image size, CVE count per image, build time p95, percentage of services on approved bases. Report monthly.
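A tagging standard is only useful if it is checked. As a rough sketch, the `<service>:<semver>-<git-sha>` convention above could be validated in CI like this. The regular expression is one possible reading of the standard (plain `MAJOR.MINOR.PATCH`, a 7 to 40 character hex SHA, no registry prefix), and the name `parse_image_tag` and the sample reference `billing-api:1.4.2-9f3c2ab` are hypothetical.

```python
import re
from typing import Optional

# Assumed reading of "<service>:<semver>-<git-sha>"; adjust to the org's actual rules.
TAG_RE = re.compile(
    r"^(?P<service>[a-z0-9]+(?:[._-][a-z0-9]+)*)"
    r":(?P<semver>\d+\.\d+\.\d+)"
    r"-(?P<sha>[0-9a-f]{7,40})$"
)


def parse_image_tag(reference: str) -> Optional[dict]:
    """Return the parts of a conforming image reference, or None if it does not conform."""
    match = TAG_RE.match(reference)
    return match.groupdict() if match else None


print(parse_image_tag("billing-api:1.4.2-9f3c2ab"))
print(parse_image_tag("billing-api:latest"))
```

Running the same check in the shared CI templates and in the admission controller policy keeps the two enforcement points from drifting apart.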
Work Sample 4: Debug a Docker networking mystery in 5 minutes
`docker-compose.yml`. The `api` container logs say ‘Connection refused’ when connecting to `redis:6379`. Redis is running and healthy. You have 5 minutes to find the issue.”

`api` is on the `frontend` network only. `redis` is on the `backend` network only. They cannot communicate because they are on different Docker networks. DNS resolution for `redis` fails inside the `api` container.

What weak candidates do:
- “Expose port 6379 with `-p`” (host port mapping is irrelevant for container-to-container communication)
- Do not read the network configuration carefully
- Suggest hardcoding IP addresses
What strong candidates do:
- Read the compose file carefully and immediately spot: “api is on `frontend`, redis is on `backend`, so they are isolated.”
- Fix: Either add `api` to the `backend` network, or add `redis` to the `frontend` network, or create a shared network.
- Verify with: `docker exec api getent hosts redis`, which should resolve once the network is fixed.
- Bonus: Note that `worker` can already reach both `api` and `redis` because it is on both networks. This is correct for a worker that processes jobs from Redis and calls back to the API.
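The reasoning in this work sample reduces to one rule: two services can talk only if they share at least one network. A small Python sketch of that rule, using the network layout described above (the helper name `can_reach` is hypothetical, and the mapping is written by hand rather than parsed from a compose file):

```python
from typing import Dict, Set


def can_reach(networks: Dict[str, Set[str]], src: str, dst: str) -> bool:
    """Two services can communicate only if they share at least one network."""
    return bool(networks[src] & networks[dst])


# Network membership as described in the scenario.
BROKEN = {
    "api": {"frontend"},
    "redis": {"backend"},
    "worker": {"frontend", "backend"},
}

# One of the proposed fixes: also attach api to the backend network.
FIXED = {**BROKEN, "api": {"frontend", "backend"}}

print(can_reach(BROKEN, "api", "redis"))
print(can_reach(FIXED, "api", "redis"))
```

The same check explains the bonus observation: `worker` intersects with both `api` and `redis`, so it was never affected.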
Candidate Comparison Patterns
Weak vs Strong: Image Optimization
`.dockerignore`: it is free and often cuts 30%+ from the build context. Second, multi-stage builds to separate build tools from runtime. Third, minimal base images, but Alpine is not always the answer. If your app has native dependencies that link against glibc, Alpine’s musl libc can break things silently. I have seen `bcrypt` and `sharp` fail on Alpine. For those cases, slim variants or distroless are better choices. Fourth, BuildKit cache mounts to keep dependency caches out of layers entirely. I always measure with `docker history` and `dive` before and after.”
Weak vs Strong: Container Security
Weak vs Strong: Docker Networking
`docker network inspect`), is the service binding to `0.0.0.0` not `127.0.0.1`, and is DNS resolving (`getent hosts <name>` from inside the container). The number one issue I have seen in production is services binding to localhost inside the container: Redis, PostgreSQL, and many frameworks default to this.”
Weak vs Strong: Multi-Stage Builds
`golang` base for building with `CGO_ENABLED=0`, then `distroless/static` for production. For Node.js, it is trickier because you still need the runtime, so I use `node:18-slim` as the final stage with only production `node_modules`. The advanced technique is `--target`: I define a `test` stage that runs `go test` and a `prod` stage for deployment. CI runs `docker build --target=test` for testing and `docker build --target=prod` for the release image, all from the same Dockerfile.”