Chapter 5: Container Images
Container images are the portable, versioned packages that contain everything needed to run an application. Let’s understand the OCI format and implement image pulling!
An image is to a container what a class is to an object in OOP: the image is the blueprint, the container is the running instance. But unlike a class, an image is designed for distribution: it can be pushed to a registry, pulled by any machine in the world, and run identically everywhere. The OCI (Open Container Initiative) specification standardizes the image format so that images built by Docker can be run by Podman, containerd, or any compliant runtime. Understanding this format teaches you how content-addressable layers (the same idea as Git’s object store!) enable efficient storage, incremental transfers, and cryptographic verification of every byte.
Prerequisites: Chapter 3: Filesystem
Further Reading: System Design: Distributed Storage
Time: 3-4 hours
Outcome: Pull and run images from Docker Hub
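The content-addressable idea from the introduction can be sketched in a few lines of Java. This is only an illustration, not the chapter's RegistryClient or ImagePuller code: the class name DigestSketch and both of its methods are hypothetical. It assumes the usual digest notation of "sha256:" followed by lowercase hex.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: names are illustrative, not part of the chapter's code.
final class DigestSketch {

    /** Returns "sha256:" plus the lowercase hex SHA-256 hash of the blob. */
    static String sha256Digest(byte[] blob) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] hash = md.digest(blob);
            StringBuilder out = new StringBuilder("sha256:");
            for (byte b : hash) {
                out.append(String.format("%02x", b));
            }
            return out.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True only if the blob hashes to the digest that was promised for it. */
    static boolean verify(byte[] blob, String expectedDigest) {
        return sha256Digest(blob).equals(expectedDigest);
    }

    public static void main(String[] args) {
        byte[] layer = "abc".getBytes(StandardCharsets.UTF_8);
        String digest = sha256Digest(layer);
        System.out.println(digest);
        System.out.println(verify(layer, digest)); // true
        // A single changed byte produces a different digest, so verification fails.
        System.out.println(verify("abd".getBytes(StandardCharsets.UTF_8), digest)); // false
    }
}
```

Because the name of a blob is derived from its bytes, two identical layers always get the same name, and a modified layer can never pass as the original.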
OCI Image Specification
Image Registry Protocol
Part 1: Registry Client
src/main/java/com/minidocker/image/RegistryClient.java
Part 2: Image Manifest
src/main/java/com/minidocker/image/ImageManifest.java
Part 3: Image Config
src/main/java/com/minidocker/image/ImageConfig.java
Part 4: Image Puller
src/main/java/com/minidocker/image/ImagePuller.java
Part 5: Using Pulled Images
Image Storage Structure
Exercises
Exercise 1: Implement Image Listing
Add a command to list local images:
Exercise 2: Implement Image Removal
Add a command to remove unused images:
Exercise 3: Implement Basic Image Building
Build images from a Dockerfile:
Key Takeaways
Content Addressable
Layers identified by SHA256 hash of contents
Layer Sharing
Common base layers shared between images
Manifest + Config
Manifest lists layers; Config has runtime settings
Incremental Pull
Only download layers not already cached
Congratulations! 🎉
You’ve built a working container runtime with:
- ✅ Linux namespaces for isolation
- ✅ Cgroups for resource limits
- ✅ Overlay filesystem with copy-on-write
- ✅ Bridge networking with NAT
- ✅ OCI-compatible image pulling
Docker Project Complete!
You now understand how containers work at the kernel level!
What’s Next?
Continue learning with other Build Your Own X projects:
Build Your Own Git
Understand version control internals
Build Your Own Redis
Build an in-memory data store
Interview Deep-Dive
How does content-addressable storage work in Docker images, and why is it important for a container registry serving millions of pulls?
Strong Answer:
- Every layer in a Docker image is identified by the SHA256 hash of its contents. When you push an image, the registry stores each layer by its digest (hash). When another image shares the same base layer (same Ubuntu version, same apt packages), that layer already exists in the registry and does not need to be uploaded or stored again.
- For a registry like Docker Hub serving millions of pulls, this deduplication is enormous. Consider that millions of images use the same ubuntu:22.04 base layer. Without content-addressable storage, that layer would be stored millions of times. With it, it is stored once, and every image that references it just points to the same blob.
- The pull protocol exploits this directly: the client fetches the manifest (which lists layer digests), checks which layers it already has locally, and only downloads the missing ones. On a build server that frequently pulls similar images, most layers are cached, and pulls complete in seconds instead of minutes.
- The integrity guarantee is also critical: if a layer’s content does not match its digest, the client rejects it. This makes it impossible for a compromised registry to silently serve tampered layers (assuming the manifest itself is verified via Docker Content Trust or cosign signatures).
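The incremental pull described in the answer above comes down to a set difference between the manifest's layer digests and the local cache. The sketch below assumes the manifest has already been fetched and parsed. The class PullPlanSketch, the method layersToDownload, and the shortened digests are all hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: names and digests are illustrative only.
final class PullPlanSketch {

    /**
     * Returns the layer digests from the manifest that are not in the local
     * cache, preserving manifest order and skipping repeated digests.
     */
    static List<String> layersToDownload(List<String> manifestLayers, Set<String> localCache) {
        List<String> missing = new ArrayList<>();
        for (String digest : manifestLayers) {
            if (!localCache.contains(digest) && !missing.contains(digest)) {
                missing.add(digest);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Placeholder digests; real ones are 64 hex characters long.
        List<String> manifest = List.of("sha256:base", "sha256:deps", "sha256:app");
        Set<String> cache = Set.of("sha256:base");
        System.out.println(layersToDownload(manifest, cache)); // [sha256:deps, sha256:app]
    }
}
```

Only the two missing layers would be requested from the registry. The shared base layer is reused from the cache without any network transfer.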
When a new image is pushed as myapp:latest, the registry updates the tag to point to the new manifest digest. The old manifest and its layers are not deleted; they become “untagged” and eligible for garbage collection. This is why relying on :latest in production is dangerous: it is a moving pointer, and two pulls of the same tag can return different images if a push happened in between. The fix is to pin images by digest (myapp@sha256:abc123...), which is fully immutable. This is also why registry garbage collection is a non-trivial operation: the registry must walk all manifests to determine which layers are still referenced before deleting orphaned blobs.
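The garbage-collection walk mentioned above can be sketched as a mark-and-sweep over in-memory maps. This is a simplified model, not how any particular registry implements it. It assumes a manifest counts as live only while a tag points at it, and the class RegistryGcSketch, its method, and the short digests are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a registry's bookkeeping; names are illustrative only.
final class RegistryGcSketch {

    /**
     * tags:         tag name -> manifest digest (the mutable pointers)
     * manifests:    manifest digest -> layer digests it references
     * storedLayers: every layer blob currently on disk
     * Returns the stored layers that no tagged manifest references.
     */
    static Set<String> orphanedLayers(Map<String, String> tags,
                                      Map<String, List<String>> manifests,
                                      Set<String> storedLayers) {
        // Mark: collect every layer reachable from a tagged manifest.
        Set<String> live = new HashSet<>();
        for (String manifestDigest : tags.values()) {
            live.addAll(manifests.getOrDefault(manifestDigest, List.of()));
        }
        // Sweep: whatever is stored but not marked is safe to delete.
        Set<String> orphaned = new TreeSet<>(storedLayers);
        orphaned.removeAll(live);
        return orphaned;
    }

    public static void main(String[] args) {
        Map<String, List<String>> manifests = Map.of(
            "sha256:m1", List.of("sha256:base", "sha256:app-v1"),
            "sha256:m2", List.of("sha256:base", "sha256:app-v2"));
        // After a new push, "latest" has moved from m1 to m2.
        Map<String, String> tags = Map.of("myapp:latest", "sha256:m2");
        Set<String> stored = Set.of("sha256:base", "sha256:app-v1", "sha256:app-v2");
        System.out.println(orphanedLayers(tags, manifests, stored)); // [sha256:app-v1]
    }
}
```

The shared base layer survives because the new manifest still references it. Only the layer unique to the untagged manifest becomes an orphan.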
Explain the relationship between a Dockerfile, an image manifest, and the OCI image specification. Why does the format matter?
Strong Answer:
- A Dockerfile is a build recipe — it describes how to construct an image through a sequence of instructions. The Docker build engine (BuildKit) executes each instruction in a temporary container, captures the filesystem diff as a new layer, and stacks the layers to produce a final image.
- The image manifest is the metadata that describes the finished image: which layers it contains (by digest) and which configuration blob it uses (also by digest). The configuration holds the environment variables, entrypoint, exposed ports, and the architecture/OS. The manifest is what a registry stores and what a runtime fetches first when pulling.
- The OCI (Open Container Initiative) image specification standardizes the manifest and layer formats so that images built by Docker can run on Podman, containerd, CRI-O, or any OCI-compliant runtime. Before OCI, Docker’s image format was proprietary, and competing runtimes had to reverse-engineer it.
- The format matters because it determines portability, security scanning, and storage efficiency. OCI manifests support multi-architecture images (a single tag that resolves to different platform-specific manifests via a “manifest list” or “index”), which is essential for organizations running mixed amd64/arm64 clusters.
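Multi-architecture resolution, as described in the last point above, amounts to a lookup in the index by platform. The sketch below models an index as a plain list. The class IndexResolveSketch, the PlatformEntry type, and the digests are hypothetical, and real indexes carry extra fields (such as a CPU variant) that are omitted here.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of an image index; names and digests are illustrative only.
final class IndexResolveSketch {

    /** One index entry: a platform and the manifest digest that serves it. */
    static final class PlatformEntry {
        final String os;
        final String architecture;
        final String manifestDigest;

        PlatformEntry(String os, String architecture, String manifestDigest) {
            this.os = os;
            this.architecture = architecture;
            this.manifestDigest = manifestDigest;
        }
    }

    /** Returns the manifest digest for the first entry matching the platform, if any. */
    static Optional<String> resolve(List<PlatformEntry> index, String os, String architecture) {
        for (PlatformEntry entry : index) {
            if (entry.os.equals(os) && entry.architecture.equals(architecture)) {
                return Optional.of(entry.manifestDigest);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<PlatformEntry> index = List.of(
            new PlatformEntry("linux", "amd64", "sha256:amd64manifest"),
            new PlatformEntry("linux", "arm64", "sha256:arm64manifest"));
        System.out.println(resolve(index, "linux", "arm64"));   // Optional[sha256:arm64manifest]
        System.out.println(resolve(index, "windows", "amd64")); // Optional.empty
    }
}
```

The same tag therefore yields a different platform-specific manifest on an arm64 node than on an amd64 node, without the user changing the image reference.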
In a multi-stage build, the final stage starts from a minimal base image (alpine or scratch) and copies only the compiled binary from the earlier stage. The build-time layers (compilers, package managers, source code) are discarded entirely, never appearing in the final image. This can reduce image sizes from gigabytes to megabytes. Beyond size, it dramatically reduces the attack surface: a Go binary in a scratch image has no installed packages, no shell, and no utilities for an attacker to exploit.
A docker pull is taking 10 minutes in your CI pipeline. Walk me through how you would diagnose and fix the slowness.
Strong Answer:
- First, check whether layers are being cached. Run docker pull and look for layers reported as "Already exists", or check docker system df to see how much image data is stored locally. If the CI environment uses ephemeral runners (like GitHub Actions default runners), the Docker cache is empty on every run, forcing a full pull every time.
- Second, check layer sizes: docker manifest inspect <image> -v shows each layer’s size. A single 2GB layer (common with unoptimized images that install everything in one RUN instruction) bottlenecks the pull, because Docker downloads layers in parallel but cannot split one layer across connections.
- Third, check registry proximity. Pulling from Docker Hub in a CI environment running in a different region adds round-trip latency for each layer. Use a registry mirror or artifact proxy (like AWS ECR pull-through cache, or Harbor as a mirror) close to your CI runners.
- The fixes, in order of impact: (1) Use smaller base images (alpine instead of ubuntu saves tens of megabytes). (2) Order Dockerfile instructions from least to most frequently changing so cached layers are reused. (3) Enable BuildKit layer caching in CI (--cache-from and --cache-to flags with registry-backed caches). (4) Use a regional registry mirror. (5) If ephemeral runners are the bottleneck, pre-bake a runner AMI with common base images already pulled.
BuildKit can import and export its build cache through a registry (--cache-from=type=registry,ref=...), enabling cache sharing across machines. BuildKit also caches at a finer granularity: it uses a content-addressable build graph rather than instruction-level matching, so reordering unrelated instructions does not invalidate the entire cache. This alone can cut CI build times by 60-80% for projects with stable dependencies.