Chapter 3: Filesystem Isolation
Containers need their own filesystem, but copying gigabytes for each container would be wasteful. Enter overlay filesystems and copy-on-write! This is one of the cleverest ideas in container technology.
Imagine a library with a single reference copy of a textbook. Instead of printing a full copy for each student, you give each student a stack of transparent overlays. They can write notes on their overlay, and when they look through it, they see the original book plus their own annotations. If they “delete” a paragraph, they just put a sticky note over it. The original book is never touched.
That is exactly how OverlayFS works: a shared read-only base layer (the image) plus a thin read-write layer per container where changes are captured. This is why you can spin up 100 containers from the same image and consume only marginally more disk space than a single container.
Prerequisites: Chapter 2: Cgroups
Further Reading: Operating Systems: File Systems
Time: 3-4 hours
Outcome: Efficient layered filesystem for containers
The Storage Challenge
How OverlayFS Works
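The shared read-only base plus per-container writable layer described in the introduction maps directly onto the options of an overlay mount: one or more lowerdir entries, one upperdir, a workdir, and a merged mount point. The following is a minimal sketch that only builds the mount command; it does not run it. The class name, method names, and example paths are hypothetical and are not taken from this chapter's FilesystemManager.

```java
import java.util.List;

final class OverlayMountSketch {

    /**
     * Builds the -o option string for an overlay mount.
     * lowerDirs must be ordered top-most layer first: the kernel treats the
     * left-most lowerdir entry as the highest-priority read-only layer.
     */
    static String overlayOptions(List<String> lowerDirs, String upperDir, String workDir) {
        if (lowerDirs.isEmpty()) {
            throw new IllegalArgumentException("at least one lower directory is required");
        }
        return "lowerdir=" + String.join(":", lowerDirs)
                + ",upperdir=" + upperDir
                + ",workdir=" + workDir;
    }

    /** Returns the argv for: mount -t overlay overlay -o <options> <mergedDir>. */
    static List<String> mountCommand(List<String> lowerDirs, String upperDir,
                                     String workDir, String mergedDir) {
        return List.of("mount", "-t", "overlay", "overlay",
                "-o", overlayOptions(lowerDirs, upperDir, workDir), mergedDir);
    }

    public static void main(String[] args) {
        // Hypothetical layout: two image layers shared by every container,
        // plus upper/work/merged directories private to container c1.
        List<String> cmd = mountCommand(
                List.of("/var/lib/minidocker/layers/app", "/var/lib/minidocker/layers/base"),
                "/var/lib/minidocker/containers/c1/upper",
                "/var/lib/minidocker/containers/c1/work",
                "/var/lib/minidocker/containers/c1/merged");
        System.out.println(String.join(" ", cmd));
    }
}
```

Running the resulting command requires root on Linux. The workdir has to be an empty directory on the same filesystem as the upperdir, which is why both live under the same per-container directory in this sketch.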
Part 1: Filesystem Manager
src/main/java/com/minidocker/fs/FilesystemManager.java
Part 2: Image Layer Manager
src/main/java/com/minidocker/image/ImageManager.java
Part 3: Copy-on-Write Demonstration
Part 4: Integrated Container
src/main/java/com/minidocker/Container.java
Exercises
Exercise 1: Implement Layer Caching
Implement efficient layer caching:
Exercise 2: Add Volume Mounts
Support mounting host directories:
Exercise 3: Implement Image Building
Create a simple Dockerfile-like builder:
Key Takeaways
OverlayFS
Combines multiple directories into a single unified view
Copy-on-Write
Changes are written to the upper layer; lower layers stay unchanged
Layer Sharing
Base layers shared between containers save disk space
Whiteouts
Special files mark deletions without modifying lower layers
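These four ideas can be seen together in a small in-memory model. This is only an illustrative sketch, not how the kernel implements OverlayFS: files are plain strings in maps, whiteouts are a set of hidden paths, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class OverlaySim {
    private final Map<String, String> lower;                   // shared, read-only image layer
    private final Map<String, String> upper = new HashMap<>(); // per-container writable layer
    private final Set<String> whiteouts = new HashSet<>();     // paths hidden by a deletion

    OverlaySim(Map<String, String> lower) {
        this.lower = lower;
    }

    /** Unified view: whiteouts hide the path, then upper wins, then lower. */
    Optional<String> read(String path) {
        if (whiteouts.contains(path)) return Optional.empty();
        if (upper.containsKey(path)) return Optional.of(upper.get(path));
        return Optional.ofNullable(lower.get(path));
    }

    /** Creates or replaces a file; only the upper layer is touched. */
    void write(String path, String content) {
        whiteouts.remove(path);
        upper.put(path, content);
    }

    /** Copy-on-write: the whole current content is copied up, then modified. */
    void append(String path, String extra) {
        String current = read(path).orElse("");
        write(path, current + extra);
    }

    /** Deleting a lower-layer file records a whiteout instead of changing lower. */
    void delete(String path) {
        upper.remove(path);
        if (lower.containsKey(path)) whiteouts.add(path);
    }

    public static void main(String[] args) {
        // Map.of returns an immutable map, so any attempt to modify the
        // shared image layer would fail loudly.
        Map<String, String> image = Map.of("/etc/motd", "hello", "/bin/sh", "<binary>");
        OverlaySim a = new OverlaySim(image);
        OverlaySim b = new OverlaySim(image);

        a.append("/etc/motd", " from a");
        a.delete("/bin/sh");

        System.out.println("a sees motd: " + a.read("/etc/motd").orElse("<missing>"));
        System.out.println("a sees sh:   " + a.read("/bin/sh").orElse("<missing>"));
        System.out.println("b sees motd: " + b.read("/etc/motd").orElse("<missing>"));
        System.out.println("b sees sh:   " + b.read("/bin/sh").orElse("<missing>"));
    }
}
```

Both containers share one image map, yet container b never observes the changes made by container a, which is the layer-sharing property in miniature.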
What’s Next?
In Chapter 4: Networking, we’ll implement:
- Virtual ethernet pairs (veth)
- Bridge networking
- Port forwarding
- Container-to-container communication
Next: Networking
Connect your containers to the network
Interview Deep-Dive
How does OverlayFS copy-on-write actually work at the filesystem level, and what are the performance implications?
Strong Answer:
- When a container reads a file, OverlayFS checks the upper (writable) layer first. If the file is not there, it transparently reads from the lower layers. When a container modifies a file that exists only in a lower layer, OverlayFS performs a “copy-up”: it copies the entire file to the upper layer, then applies the modification. The first write to a large file incurs the full copy cost: modifying one byte of a 500MB file triggers a 500MB copy. Later writes to the same file go straight to the upper copy.
- Deletions are handled with “whiteouts”: markers in the upper layer that hide the corresponding lower-layer file. In an OverlayFS upper directory a whiteout is a character device with device number 0/0; the .wh.<filename> naming is how image layer archives record the same deletion. The lower-layer file still exists on disk, consuming space that cannot be reclaimed without rebuilding the image.
- For production: keep container writes to volumes (not the overlay), minimize modification of base image files, and use multi-stage builds to keep layers small. Database containers that write to overlay instead of a mounted volume suffer from copy-up overhead and I/O amplification.
If a Dockerfile runs RUN apt-get install and then RUN apt-get clean in a separate instruction, the cleanup does not reduce image size because the installed files persist in the earlier layer. The clean layer only adds whiteouts. The best practice is chaining commands in a single RUN instruction so intermediate files never persist in a committed layer. Multi-stage builds solve this more fundamentally by copying only final artifacts into a fresh image.
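The image-size argument above comes down to simple accounting: every committed layer keeps the bytes it added, and a later whiteout only hides them. The sketch below models that with hypothetical names and made-up file sizes. It simplifies whiteouts to exact paths, whereas a real whiteout on a directory hides everything beneath it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LayerSizeSketch {

    /** One committed layer: files it adds or replaces (path -> bytes) and paths it whites out. */
    static final class Layer {
        final Map<String, Long> added;
        final Set<String> whiteouts;

        Layer(Map<String, Long> added, Set<String> whiteouts) {
            this.added = added;
            this.whiteouts = whiteouts;
        }
    }

    /** Bytes stored on disk: each layer keeps everything it added, forever. */
    static long storedBytes(List<Layer> layers) {
        long total = 0;
        for (Layer layer : layers) {
            for (long size : layer.added.values()) total += size;
        }
        return total;
    }

    /** Bytes visible in the merged view: apply layers from bottom to top. */
    static long visibleBytes(List<Layer> layers) {
        Map<String, Long> merged = new HashMap<>();
        for (Layer layer : layers) {
            merged.keySet().removeAll(layer.whiteouts); // whiteouts hide lower files
            merged.putAll(layer.added);                 // upper files shadow lower ones
        }
        long total = 0;
        for (long size : merged.values()) total += size;
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Two separate RUN instructions: install, then clean.
        Layer install = new Layer(
                Map.of("/usr/bin/tool", 20 * mb, "/var/cache/apt/pkg.deb", 100 * mb), Set.of());
        Layer clean = new Layer(Map.of(), Set.of("/var/cache/apt/pkg.deb"));
        List<Layer> separate = List.of(install, clean);

        // One chained RUN instruction: the cache never reaches a committed layer.
        List<Layer> chained = List.of(new Layer(Map.of("/usr/bin/tool", 20 * mb), Set.of()));

        System.out.println("separate: stored=" + storedBytes(separate) / mb
                + "MB visible=" + visibleBytes(separate) / mb + "MB");
        System.out.println("chained:  stored=" + storedBytes(chained) / mb
                + "MB visible=" + visibleBytes(chained) / mb + "MB");
    }
}
```

Both variants expose the same files to the container, but the two-instruction variant still stores the package cache in its first layer.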
What is the difference between pivot_root and chroot, and why do container runtimes prefer pivot_root?
Strong Answer:
- chroot() changes the apparent root for pathname lookups, but the process retains its original root via open file descriptors and can escape with root privileges. pivot_root() swaps the current root mount with a new one and moves the old root to a directory under the new root, which the runtime then unmounts. After that unmount, there is no accessible path back to the host filesystem.
- The security difference is meaningful: chroot is a pathname-level illusion; pivot_root is a mount-namespace-level operation that actually detaches the old filesystem.
- A practical detail: pivot_root requires the new root to be a mount point, which is why runtimes bind-mount the new root to itself first. Skipping this causes confusing “invalid argument” errors.
- Combined with dropping CAP_SYS_ADMIN, seccomp profiles, and user namespaces, pivot_root is one layer in a defense-in-depth filesystem isolation strategy.
A container process holding CAP_SYS_ADMIN could access the host via /proc/1/root. Defense in depth includes: dropping capabilities, mounting /proc with hidepid=2, seccomp profiles blocking mount and pivot_root from within the container, and user namespaces so container root maps to an unprivileged host user. No single mechanism is sufficient.
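The bind-mount-then-pivot sequence described above can be sketched as a generated shell script. This is an illustration under stated assumptions, not how any particular runtime does it: real runtimes call the pivot_root system call directly. The class name and the .old_root directory name are hypothetical, and the script is assumed to run as root inside a fresh mount namespace whose new root contains the umount and rmdir binaries.

```java
import java.util.List;

final class PivotRootScriptSketch {

    /** Wraps a path in single quotes so the shell treats it as one literal word. */
    static String shellQuote(String s) {
        return "'" + s.replace("'", "'\\''") + "'";
    }

    /** Ordered shell steps that switch the root filesystem to newRoot. */
    static List<String> pivotRootSteps(String newRoot) {
        String root = shellQuote(newRoot);
        return List.of(
                // pivot_root fails with EINVAL if the relevant parent mounts are shared.
                "mount --make-rprivate /",
                // pivot_root requires the new root to be a mount point.
                "mount --bind " + root + " " + root,
                "cd " + root,
                // Hypothetical directory name that will receive the old root.
                "mkdir -p .old_root",
                "pivot_root . .old_root",
                "cd /",
                // Detach the old root so no path leads back to the host filesystem.
                "umount -l /.old_root",
                "rmdir /.old_root");
    }

    /** Joins the steps into a script that aborts on the first failing command. */
    static String script(String newRoot) {
        return "set -e\n" + String.join("\n", pivotRootSteps(newRoot)) + "\n";
    }

    public static void main(String[] args) {
        // Hypothetical rootfs path; printing only, nothing is executed here.
        System.out.print(script("/var/lib/minidocker/containers/c1/merged"));
    }
}
```

The order matters: the bind mount has to precede pivot_root, and the lazy unmount has to follow it, since the old root only becomes unreachable once it is detached.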
A developer reports 10GB container disk usage despite only 200MB of application data. How would you investigate?
Strong Answer:
- Check image layers with docker history <image>: a Dockerfile that installs build tools then cleans up in a later layer still stores them in earlier layers. Check the writable layer with docker diff <container>, which lists changed paths, and look for large log files or temp files among them. Check whether the application writes to overlay instead of mounted volumes.
- The fix depends on root cause: for image bloat, use multi-stage builds or combine RUN instructions; for runtime bloat, mount writable paths as volumes; for copy-up issues, avoid modifying large base image files.
- You cannot compact the overlay upper layer of a running container other than by deleting files from it. Options are: delete files within the container, remove and recreate the container (this discards the upper layer, whereas a plain restart keeps it), or use docker cp to move data out to a volume. For prevention, use tmpfs mounts for scratch data and configure log rotation.
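Once the writable layer is the suspect, the next question is which files in it are large. The following is a minimal sketch that walks a directory tree and returns its biggest regular files. The class name is hypothetical, and the directory to scan (for example a container's overlay upper directory on the host) is an assumption supplied by the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UpperLayerUsageSketch {

    /** Size of one file in bytes, with the checked exception wrapped for use in streams. */
    static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns up to limit regular files under root, largest first. */
    static List<Path> largestFiles(Path root, int limit) {
        Comparator<Path> bySize = Comparator.comparingLong(UpperLayerUsageSketch::sizeOf);
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .sorted(bySize.reversed())
                    .limit(limit)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage: java UpperLayerUsageSketch.java <directory>
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : largestFiles(root, 10)) {
            System.out.println(sizeOf(file) + "\t" + file);
        }
    }
}
```

Reading another container's upper directory on the host normally requires root, and files that change size during the walk can make the ordering approximate.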