Chapter 5: Container Images

Container images are the portable, versioned packages that contain everything needed to run an application. Let’s understand the OCI format and implement image pulling! An image is to a container what a class is to an object in OOP: the image is the blueprint, the container is the running instance. But unlike a class, an image is designed for distribution: it can be pushed to a registry, pulled by any machine in the world, and run the same way on any host with a compatible OS and CPU architecture. The OCI (Open Container Initiative) specification standardizes the image format so that images built by Docker can be run by Podman, containerd, or any compliant runtime. Understanding this format teaches you how content-addressable layers (the same idea as Git’s object store!) enable efficient storage, incremental transfers, and cryptographic verification of every byte.
Prerequisites: Chapter 3: Filesystem
Further Reading: System Design: Distributed Storage
Time: 3-4 hours
Outcome: Pull and run images from Docker Hub

OCI Image Specification

┌─────────────────────────────────────────────────────────────────────────────┐
│                        OCI IMAGE STRUCTURE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   IMAGE MANIFEST (application/vnd.oci.image.manifest.v1+json)               │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │  {                                                                   │  │
│   │    "schemaVersion": 2,                                               │  │
│   │    "mediaType": "application/vnd.oci.image.manifest.v1+json",       │  │
│   │    "config": {                         ← Image configuration        │  │
│   │      "digest": "sha256:abc123...",                                   │  │
│   │      "size": 7023                                                    │  │
│   │    },                                                                │  │
│   │    "layers": [                         ← Filesystem layers          │  │
│   │      { "digest": "sha256:layer1...", "size": 32654848 },            │  │
│   │      { "digest": "sha256:layer2...", "size": 16724992 },            │  │
│   │      { "digest": "sha256:layer3...", "size": 73109 }                │  │
│   │    ]                                                                 │  │
│   │  }                                                                   │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│   CONFIG BLOB (application/vnd.oci.image.config.v1+json)                   │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │  {                                                                   │  │
│   │    "architecture": "amd64",                                          │  │
│   │    "os": "linux",                                                    │  │
│   │    "config": {                                                       │  │
│   │      "Env": ["PATH=/usr/local/bin:/usr/bin"],                       │  │
│   │      "Cmd": ["/bin/sh"],                                             │  │
│   │      "WorkingDir": "/"                                               │  │
│   │    },                                                                │  │
│   │    "rootfs": {                                                       │  │
│   │      "type": "layers",                                               │  │
│   │      "diff_ids": ["sha256:...", "sha256:...", "sha256:..."]         │  │
│   │    },                                                                │  │
│   │    "history": [...]                                                  │  │
│   │  }                                                                   │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│   LAYER BLOBS (application/vnd.oci.image.layer.v1.tar+gzip)                │
│   ┌─────────────────────────────────────────────────────────────────────┐  │
│   │  Layer 1: Base OS files (compressed tar)                            │  │
│   │  Layer 2: Runtime/Dependencies (compressed tar)                     │  │
│   │  Layer 3: Application code (compressed tar)                         │  │
│   └─────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
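
The manifest's layer digests and the config's diff_ids both identify layers, but they hash different bytes: a layer digest covers the blob as stored in the registry (usually gzip-compressed), while a diff_id covers the uncompressed tar. The following is a minimal sketch of that distinction, using a short string as a stand-in for a real layer tarball. The class name LayerDigests and its helpers are hypothetical and not part of the chapter's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class LayerDigests {

    /** Returns an OCI-style digest string: "sha256:" followed by lowercase hex. */
    public static String sha256(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder out = new StringBuilder("sha256:");
            for (byte b : hash) {
                out.append(String.format("%02x", b));
            }
            return out.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real layer tarball (assumption for illustration)
        byte[] tar = "pretend this is a layer tarball".getBytes(StandardCharsets.UTF_8);
        byte[] blob = gzip(tar);

        // What the manifest lists: digest of the compressed blob
        System.out.println("layer digest: " + sha256(blob));
        // What the config lists under rootfs.diff_ids: digest of the uncompressed tar
        System.out.println("diff_id:      " + sha256(gunzip(blob)));
    }
}
```

A runtime that wants to cross-check a pulled layer against the config would compare the digest of the decompressed stream with the diff_id at the same index.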

Image Registry Protocol

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DOCKER REGISTRY V2 API                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   STEP 1: Get Authentication Token                                          │
│   ───────────────────────────────                                           │
│   GET https://auth.docker.io/token?                                         │
│       service=registry.docker.io&                                           │
│       scope=repository:library/alpine:pull                                  │
│                                                                              │
│   Response: { "token": "eyJ0eXAiOi..." }                                    │
│                                                                              │
│   ─────────────────────────────────────────────────────────────────────     │
│                                                                              │
│   STEP 2: Fetch Image Manifest                                              │
│   ────────────────────────────                                              │
│   GET https://registry-1.docker.io/v2/library/alpine/manifests/latest      │
│   Authorization: Bearer eyJ0eXAiOi...                                       │
│   Accept: application/vnd.oci.image.manifest.v1+json                        │
│                                                                              │
│   Response: (the manifest JSON)                                             │
│                                                                              │
│   ─────────────────────────────────────────────────────────────────────     │
│                                                                              │
│   STEP 3: Fetch Config Blob                                                 │
│   ─────────────────────────                                                 │
│   GET https://registry-1.docker.io/v2/library/alpine/blobs/sha256:abc...   │
│                                                                              │
│   ─────────────────────────────────────────────────────────────────────     │
│                                                                              │
│   STEP 4: Fetch Layer Blobs (for each layer)                               │
│   ─────────────────────────────────────────                                 │
│   GET https://registry-1.docker.io/v2/library/alpine/blobs/sha256:layer... │
│                                                                              │
│   Response: (gzipped tar file)                                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
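
Step 1 above hardcodes Docker Hub's token endpoint. Registries that use token authentication advertise their endpoint instead: an unauthenticated request is answered with 401 and a WWW-Authenticate header of the form Bearer realm="...",service="...". The sketch below parses such a header and rebuilds the token URL. The class name AuthChallenge is hypothetical, and the regex assumes quoted values without escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthChallenge {

    // Matches key="value" pairs; assumes values contain no escaped quotes
    private static final Pattern PARAM = Pattern.compile("(\\w+)=\"([^\"]*)\"");
    private static final String SCHEME = "Bearer ";

    /** Parses the parameters of a Bearer challenge into a map. */
    public static Map<String, String> parse(String header) {
        if (header == null
                || !header.regionMatches(true, 0, SCHEME, 0, SCHEME.length())) {
            throw new IllegalArgumentException("Not a Bearer challenge: " + header);
        }
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = PARAM.matcher(header.substring(SCHEME.length()));
        while (m.find()) {
            params.put(m.group(1), m.group(2));
        }
        return params;
    }

    /** Builds the token URL for pulling the given repository. */
    public static String tokenUrl(String header, String repository) {
        Map<String, String> params = parse(header);
        String realm = params.get("realm");
        if (realm == null) {
            throw new IllegalArgumentException("Challenge has no realm: " + header);
        }
        return realm + "?service=" + params.get("service")
            + "&scope=repository:" + repository + ":pull";
    }

    public static void main(String[] args) {
        String header = "Bearer realm=\"https://auth.docker.io/token\","
            + "service=\"registry.docker.io\"";
        System.out.println(tokenUrl(header, "library/alpine"));
    }
}
```

For the header shown in main, the result is the same URL as in Step 1, which is why hardcoding works for Docker Hub but not for other registries.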

Part 1: Registry Client

src/main/java/com/minidocker/image/RegistryClient.java
package com.minidocker.image;

import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

/**
 * Client for Docker Registry V2 API.
 * 
 * Handles:
 * - Authentication (Bearer tokens)
 * - Manifest fetching
 * - Blob (layer) downloading
 */
public class RegistryClient {
    
    private static final String DOCKER_HUB = "https://registry-1.docker.io";
    private static final String AUTH_URL = "https://auth.docker.io/token";
    
    private final HttpClient httpClient;
    private final ObjectMapper objectMapper;
    
    public RegistryClient() {
        this.httpClient = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.ALWAYS)
            .build();
        this.objectMapper = new ObjectMapper();
    }
    
    /**
     * Parses image reference (e.g., "alpine:3.18" or "nginx:latest").
     */
    public ImageReference parseReference(String image) {
        String registry = DOCKER_HUB;
        String repository;
        String tag = "latest";
        
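        // Handle explicit registry: the first path component is treated as a
        // registry host if it contains a dot or a port, or is "localhost"
        int slash = image.indexOf('/');
        if (slash > 0) {
            String first = image.substring(0, slash);
            if (first.contains(".") || first.contains(":") || first.equals("localhost")) {
                registry = "https://" + first;
                image = image.substring(slash + 1);
            }
        }
        
        // Handle tag (any registry host and port were stripped above)
        int colon = image.lastIndexOf(':');
        if (colon >= 0) {
            repository = image.substring(0, colon);
            tag = image.substring(colon + 1);
        } else {
            repository = image;
        }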
        
        // Add "library/" prefix for Docker Hub official images
        if (registry.equals(DOCKER_HUB) && !repository.contains("/")) {
            repository = "library/" + repository;
        }
        
        return new ImageReference(registry, repository, tag);
    }
    
    /**
     * Gets authentication token for a repository.
     */
    public String getToken(ImageReference ref) throws IOException, InterruptedException {
        String scope = "repository:" + ref.repository() + ":pull";
        String url = AUTH_URL + "?service=registry.docker.io&scope=" + scope;
        
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .GET()
            .build();
        
        HttpResponse<String> response = httpClient.send(request, 
            HttpResponse.BodyHandlers.ofString());
        
        if (response.statusCode() != 200) {
            throw new IOException("Failed to get token: " + response.statusCode());
        }
        
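        JsonNode json = objectMapper.readTree(response.body());
        // Docker Hub returns "token"; some token servers use "access_token" instead
        JsonNode tokenNode = json.has("token") ? json.get("token") : json.get("access_token");
        if (tokenNode == null) {
            throw new IOException("Token response contained no token");
        }
        return tokenNode.asText();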
    }
    
    /**
     * Fetches the image manifest.
     */
    public ImageManifest getManifest(ImageReference ref, String token) 
            throws IOException, InterruptedException {
        
        String url = ref.registry() + "/v2/" + ref.repository() + 
                    "/manifests/" + ref.tag();
        
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Authorization", "Bearer " + token)
            .header("Accept", "application/vnd.docker.distribution.manifest.v2+json")
            .header("Accept", "application/vnd.oci.image.manifest.v1+json")
            .GET()
            .build();
        
        HttpResponse<String> response = httpClient.send(request,
            HttpResponse.BodyHandlers.ofString());
        
        if (response.statusCode() != 200) {
            throw new IOException("Failed to get manifest: " + response.statusCode());
        }
        
        return ImageManifest.parse(response.body(), objectMapper);
    }
    
    /**
     * Fetches a blob (config or layer) and saves to disk.
     */
    public Path downloadBlob(ImageReference ref, String token, String digest, Path destDir)
            throws IOException, InterruptedException {
        
        Path destPath = destDir.resolve(digest.replace(":", "_"));
        
        // Check if already downloaded
        if (Files.exists(destPath)) {
            System.out.println("Layer cached: " + digest.substring(0, 19) + "...");
            return destPath;
        }
        
        String url = ref.registry() + "/v2/" + ref.repository() + "/blobs/" + digest;
        
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(url))
            .header("Authorization", "Bearer " + token)
            .GET()
            .build();
        
        System.out.println("Downloading: " + digest.substring(0, 19) + "...");
        
        HttpResponse<InputStream> response = httpClient.send(request,
            HttpResponse.BodyHandlers.ofInputStream());
        
        if (response.statusCode() != 200) {
            throw new IOException("Failed to download blob: " + response.statusCode());
        }
        
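        Files.createDirectories(destDir);
        
        // Stream to a temporary file and rename it when complete, so an
        // interrupted download is never mistaken for a cached blob
        Path tmpPath = destDir.resolve(destPath.getFileName() + ".partial");
        Files.deleteIfExists(tmpPath);
        try (InputStream in = response.body()) {
            Files.copy(in, tmpPath);
        }
        Files.move(tmpPath, destPath);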
        
        return destPath;
    }
    
    /**
     * Image reference components.
     */
    public record ImageReference(String registry, String repository, String tag) {
        @Override
        public String toString() {
            return repository + ":" + tag;
        }
    }
}
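
The downloadBlob method above stores whatever bytes the registry returns. Because every blob is named by its digest, a client can check the content after download. The following is a minimal sketch of such a check, assuming only sha256 digests. BlobVerifier and its methods are hypothetical names, not part of RegistryClient.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class BlobVerifier {

    /** Streams the input through SHA-256 and returns "sha256:<hex>". */
    public static String digestOf(InputStream source) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (InputStream in = new DigestInputStream(source, md)) {
                byte[] buffer = new byte[8192];
                while (in.read(buffer) != -1) {
                    // Reading is enough: DigestInputStream updates the hash as bytes pass
                }
            }
            StringBuilder out = new StringBuilder("sha256:");
            for (byte b : md.digest()) {
                out.append(String.format("%02x", b));
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Throws if the file's content does not hash to the expected digest. */
    public static void verify(Path file, String expectedDigest) {
        String actual;
        try {
            actual = digestOf(Files.newInputStream(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (!actual.equals(expectedDigest)) {
            throw new IllegalStateException(
                "Digest mismatch: expected " + expectedDigest + " but got " + actual);
        }
    }
}
```

One place to call verify would be right after the download completes and before the blob is treated as cached, deleting the file on mismatch.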

Part 2: Image Manifest

src/main/java/com/minidocker/image/ImageManifest.java
package com.minidocker.image;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Represents an OCI/Docker image manifest.
 */
public class ImageManifest {
    
    private final int schemaVersion;
    private final String configDigest;
    private final long configSize;
    private final List<Layer> layers;
    
    public ImageManifest(int schemaVersion, String configDigest, long configSize, 
                        List<Layer> layers) {
        this.schemaVersion = schemaVersion;
        this.configDigest = configDigest;
        this.configSize = configSize;
        this.layers = layers;
    }
    
    // Declares IOException (not Exception) so RegistryClient.getManifest,
    // which only throws IOException and InterruptedException, can call it
    public static ImageManifest parse(String json, ObjectMapper mapper) throws IOException {
        JsonNode root = mapper.readTree(json);
        
        // Multi-arch tags may return an image index (manifest list) instead of
        // a single-image manifest; this parser only handles the latter
        if (!root.has("config") || !root.has("layers")) {
            throw new IOException(
                "Not a single-image manifest (is this an image index / manifest list?)");
        }
        
        int schemaVersion = root.get("schemaVersion").asInt();
        
        JsonNode config = root.get("config");
        String configDigest = config.get("digest").asText();
        long configSize = config.get("size").asLong();
        
        List<Layer> layers = new ArrayList<>();
        for (JsonNode layer : root.get("layers")) {
            layers.add(new Layer(
                layer.get("mediaType").asText(),
                layer.get("digest").asText(),
                layer.get("size").asLong()
            ));
        }
        
        return new ImageManifest(schemaVersion, configDigest, configSize, layers);
    }
    
    public int getSchemaVersion() { return schemaVersion; }
    public String getConfigDigest() { return configDigest; }
    public long getConfigSize() { return configSize; }
    public List<Layer> getLayers() { return layers; }
    
    public long getTotalSize() {
        return configSize + layers.stream().mapToLong(Layer::size).sum();
    }
    
    /**
     * Represents a filesystem layer.
     */
    public record Layer(String mediaType, String digest, long size) {
        public boolean isGzipped() {
            return mediaType.contains("gzip");
        }
    }
}
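
ImageManifest models a single-image manifest. Tags for multi-architecture images usually resolve to an image index (Docker calls it a manifest list) whose manifests array carries one entry per platform, each with a digest plus platform.os and platform.architecture. The client picks the matching entry and then fetches that manifest by digest. Here is a minimal sketch of the selection step, assuming the entries were already parsed into a hypothetical Entry holder. PlatformSelector is not part of the chapter's code, and the sketch ignores the optional platform variant field.

```java
import java.util.List;
import java.util.Optional;

final class PlatformSelector {

    /** One element of the index's "manifests" array (hypothetical holder type). */
    static final class Entry {
        final String digest;
        final String os;
        final String architecture;

        Entry(String digest, String os, String architecture) {
            this.digest = digest;
            this.os = os;
            this.architecture = architecture;
        }
    }

    /** Returns the digest of the first manifest matching the requested platform. */
    public static Optional<String> select(List<Entry> entries, String os, String arch) {
        return entries.stream()
            .filter(e -> e.os.equals(os) && e.architecture.equals(arch))
            .map(e -> e.digest)
            .findFirst();
    }

    /** Maps the JVM's os.arch property to the names used in image indexes. */
    public static String hostArchitecture() {
        String arch = System.getProperty("os.arch");
        switch (arch) {
            case "x86_64":
            case "amd64":
                return "amd64";
            case "aarch64":
            case "arm64":
                return "arm64";
            default:
                return arch;
        }
    }

    public static void main(String[] args) {
        // Placeholder digests for illustration only
        List<Entry> index = List.of(
            new Entry("sha256:amd64-manifest", "linux", "amd64"),
            new Entry("sha256:arm64-manifest", "linux", "arm64"));
        System.out.println(select(index, "linux", hostArchitecture()));
    }
}
```

The selected digest would then be used in place of the tag in a second manifests request, which returns a manifest that ImageManifest.parse can handle.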

Part 3: Image Config

src/main/java/com/minidocker/image/ImageConfig.java
package com.minidocker.image;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Represents image configuration (CMD, ENV, etc.).
 */
public class ImageConfig {
    
    private final String architecture;
    private final String os;
    private final List<String> env;
    private final List<String> cmd;
    private final List<String> entrypoint;
    private final String workingDir;
    private final String user;
    private final List<String> exposedPorts;
    
    public ImageConfig(String architecture, String os, List<String> env,
                       List<String> cmd, List<String> entrypoint, String workingDir,
                       String user, List<String> exposedPorts) {
        this.architecture = architecture;
        this.os = os;
        this.env = env;
        this.cmd = cmd;
        this.entrypoint = entrypoint;
        this.workingDir = workingDir;
        this.user = user;
        this.exposedPorts = exposedPorts;
    }
    
    public static ImageConfig load(Path configPath) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        String json = Files.readString(configPath);
        JsonNode root = mapper.readTree(json);
        
        String architecture = root.get("architecture").asText();
        String os = root.get("os").asText();
        
        JsonNode config = root.get("config");
        
        List<String> env = new ArrayList<>();
        if (config.has("Env")) {
            for (JsonNode e : config.get("Env")) {
                env.add(e.asText());
            }
        }
        
        List<String> cmd = new ArrayList<>();
        if (config.has("Cmd")) {
            for (JsonNode c : config.get("Cmd")) {
                cmd.add(c.asText());
            }
        }
        
        List<String> entrypoint = new ArrayList<>();
        if (config.has("Entrypoint")) {
            for (JsonNode e : config.get("Entrypoint")) {
                entrypoint.add(e.asText());
            }
        }
        
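        // Image configs often contain "WorkingDir": "" to mean the default,
        // so treat an empty value the same as a missing one
        String workingDir = config.has("WorkingDir") ? 
            config.get("WorkingDir").asText() : "";
        if (workingDir.isEmpty()) {
            workingDir = "/";
        }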
            
        String user = config.has("User") ? 
            config.get("User").asText() : "";
            
        List<String> exposedPorts = new ArrayList<>();
        if (config.has("ExposedPorts")) {
            config.get("ExposedPorts").fieldNames().forEachRemaining(exposedPorts::add);
        }
        
        return new ImageConfig(architecture, os, env, cmd, entrypoint, 
                              workingDir, user, exposedPorts);
    }
    
    public String getArchitecture() { return architecture; }
    public String getOs() { return os; }
    public List<String> getEnv() { return env; }
    public List<String> getCmd() { return cmd; }
    public List<String> getEntrypoint() { return entrypoint; }
    public String getWorkingDir() { return workingDir; }
    public String getUser() { return user; }
    public List<String> getExposedPorts() { return exposedPorts; }
    
    /**
     * Gets the effective command to run.
     */
    public String[] getCommand() {
        List<String> command = new ArrayList<>();
        command.addAll(entrypoint);
        command.addAll(cmd);
        return command.toArray(new String[0]);
    }
}
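
getCommand() concatenates Entrypoint and Cmd, which is the default when the user supplies nothing. In Docker, arguments given on the command line replace Cmd while Entrypoint stays in place. The following sketch shows that rule in isolation. CommandResolver is a hypothetical helper, and it does not model overriding the entrypoint itself.

```java
import java.util.ArrayList;
import java.util.List;

final class CommandResolver {

    /**
     * Combines the image's Entrypoint with either the user's arguments
     * or, if none were given, the image's default Cmd.
     */
    public static List<String> resolve(List<String> entrypoint,
                                       List<String> cmd,
                                       List<String> userArgs) {
        List<String> command = new ArrayList<>(entrypoint);
        command.addAll(userArgs.isEmpty() ? cmd : userArgs);
        return command;
    }

    public static void main(String[] args) {
        List<String> entrypoint = List.of("nginx");
        List<String> cmd = List.of("-g", "daemon off;");

        // No user arguments: Entrypoint + Cmd
        System.out.println(resolve(entrypoint, cmd, List.of()));
        // User arguments replace Cmd but keep Entrypoint
        System.out.println(resolve(entrypoint, cmd, List.of("-t")));
    }
}
```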

Part 4: Image Puller

src/main/java/com/minidocker/image/ImagePuller.java
package com.minidocker.image;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Pulls container images from registries.
 */
public class ImagePuller {
    
    private final RegistryClient client;
    private final Path storageDir;
    
    public ImagePuller(Path storageDir) {
        this.client = new RegistryClient();
        this.storageDir = storageDir;
    }
    
    /**
     * Pulls an image from a registry.
     * 
     * @param image Image name (e.g., "alpine:3.18")
     * @return The pulled image: metadata directory, extracted layer directories (base layer first), and config
     */
    public PulledImage pull(String image) throws Exception {
        System.out.println("Pulling image: " + image);
        
        // Parse image reference
        RegistryClient.ImageReference ref = client.parseReference(image);
        System.out.println("Repository: " + ref.repository());
        System.out.println("Tag: " + ref.tag());
        
        // Get authentication token
        String token = client.getToken(ref);
        System.out.println("✓ Authenticated");
        
        // Get manifest
        ImageManifest manifest = client.getManifest(ref, token);
        System.out.println("✓ Fetched manifest (" + manifest.getLayers().size() + " layers)");
        System.out.println("  Total size: " + formatBytes(manifest.getTotalSize()));
        
        // Create directories
        Path blobsDir = storageDir.resolve("blobs");
        Path layersDir = storageDir.resolve("layers");
        Path imageDir = storageDir.resolve("images").resolve(
            ref.repository().replace("/", "_") + "_" + ref.tag());
        
        Files.createDirectories(blobsDir);
        Files.createDirectories(layersDir);
        Files.createDirectories(imageDir);
        
        // Download config
        Path configPath = client.downloadBlob(ref, token, 
            manifest.getConfigDigest(), blobsDir);
        
        ImageConfig config = ImageConfig.load(configPath);
        System.out.println("✓ Config loaded (arch: " + config.getArchitecture() + 
                          ", os: " + config.getOs() + ")");
        
        // Download and extract layers
        List<Path> extractedLayers = new ArrayList<>();
        
        for (int i = 0; i < manifest.getLayers().size(); i++) {
            ImageManifest.Layer layer = manifest.getLayers().get(i);
            
            System.out.println("Layer " + (i + 1) + "/" + manifest.getLayers().size() + 
                              " (" + formatBytes(layer.size()) + ")");
            
            // Download layer blob
            Path layerBlob = client.downloadBlob(ref, token, layer.digest(), blobsDir);
            
            // Extract layer
            Path extractedLayer = layersDir.resolve(layer.digest().replace(":", "_"));
            
            if (!Files.exists(extractedLayer)) {
                extractLayer(layerBlob, extractedLayer, layer.isGzipped());
                System.out.println("  ✓ Extracted");
            } else {
                System.out.println("  ✓ Cached");
            }
            
            extractedLayers.add(extractedLayer);
        }
        
        // Write image metadata
        writeImageMetadata(imageDir, ref, manifest, config);
        
        System.out.println("✓ Image pulled successfully: " + image);
        
        return new PulledImage(imageDir, extractedLayers, config);
    }
    
    private void extractLayer(Path tarball, Path destDir, boolean gzipped) 
            throws IOException, InterruptedException {
        
        Files.createDirectories(destDir);
        
        ProcessBuilder pb;
        if (gzipped) {
            pb = new ProcessBuilder("tar", "-xzf", tarball.toString(),
                                   "-C", destDir.toString());
        } else {
            pb = new ProcessBuilder("tar", "-xf", tarball.toString(),
                                   "-C", destDir.toString());
        }
        pb.inheritIO();
        
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IOException("tar extraction failed");
        }
    }
    
    private void writeImageMetadata(Path imageDir, RegistryClient.ImageReference ref,
                                   ImageManifest manifest, ImageConfig config) 
            throws IOException {
        
        // Write simple metadata file
        StringBuilder meta = new StringBuilder();
        meta.append("repository=").append(ref.repository()).append("\n");
        meta.append("tag=").append(ref.tag()).append("\n");
        meta.append("architecture=").append(config.getArchitecture()).append("\n");
        meta.append("os=").append(config.getOs()).append("\n");
        
        if (!config.getCmd().isEmpty()) {
            meta.append("cmd=").append(String.join(" ", config.getCmd())).append("\n");
        }
        if (!config.getEntrypoint().isEmpty()) {
            meta.append("entrypoint=").append(String.join(" ", config.getEntrypoint()))
                .append("\n");
        }
        
        Files.writeString(imageDir.resolve("metadata"), meta.toString());
    }
    
    private String formatBytes(long bytes) {
        if (bytes < 1024) return bytes + " B";
        if (bytes < 1024 * 1024) return String.format("%.1f KB", bytes / 1024.0);
        if (bytes < 1024 * 1024 * 1024) return String.format("%.1f MB", bytes / (1024.0 * 1024));
        return String.format("%.2f GB", bytes / (1024.0 * 1024 * 1024));
    }
    
    /**
     * Result of pulling an image.
     */
    public record PulledImage(Path imageDir, List<Path> layers, ImageConfig config) {}
}
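
One detail extractLayer does not handle is deletion. OCI layers record a file removed by an upper layer as an empty "whiteout" entry named .wh.<name>, and mark a directory whose lower-layer contents should be hidden with an entry named .wh..wh..opq. Plain tar extraction leaves these on disk as ordinary files, whereas overlayfs expects a character device with device number 0/0 (and an xattr for opaque directories), so a runtime has to translate them. The sketch below covers only the detection step. Whiteouts is a hypothetical helper, and creating the overlayfs equivalents (which needs root) is left out.

```java
import java.nio.file.Path;
import java.util.Optional;

final class Whiteouts {

    private static final String PREFIX = ".wh.";
    private static final String OPAQUE_MARKER = ".wh..wh..opq";

    /** True if the entry marks its parent directory as opaque. */
    public static boolean isOpaqueMarker(Path entry) {
        return entry.getFileName().toString().equals(OPAQUE_MARKER);
    }

    /**
     * If the entry is a plain whiteout, returns the path it deletes
     * from the layers below; otherwise returns empty.
     */
    public static Optional<Path> deletedPath(Path entry) {
        String name = entry.getFileName().toString();
        if (!name.startsWith(PREFIX) || name.equals(OPAQUE_MARKER)) {
            return Optional.empty();
        }
        return Optional.of(entry.resolveSibling(name.substring(PREFIX.length())));
    }

    public static void main(String[] args) {
        System.out.println(deletedPath(Path.of("etc/.wh.passwd")));
        System.out.println(isOpaqueMarker(Path.of("var/cache/.wh..wh..opq")));
    }
}
```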

Part 5: Using Pulled Images

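import com.minidocker.image.ImagePuller;
import com.minidocker.image.ImagePuller.PulledImage;

import java.nio.file.Path;

public class Container {
    
    // Fields, constructor, and run() come from the earlier chapters
    
    public static void main(String[] args) throws Exception {
        Path storageDir = Path.of("/var/lib/minidocker");
        ImagePuller puller = new ImagePuller(storageDir);
        
        // Pull the image
        PulledImage image = puller.pull("alpine:3.18");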
        
        // Get the command to run
        String[] command = image.config().getCommand();
        if (command.length == 0) {
            command = new String[]{"/bin/sh"};
        }
        
        // Create and run container
        Container container = new Container(
            image.layers(),
            "alpine-container",
            command,
            ResourceLimits.defaults(),
            storageDir
        );
        
        container.run();
    }
}

Image Storage Structure

/var/lib/minidocker/
├── blobs/                              # Downloaded blobs (shared)
│   ├── sha256_abc123...                # Config blob
│   ├── sha256_layer1...                # Layer blob (compressed)
│   ├── sha256_layer2...                # Layer blob (compressed)
│   └── sha256_layer3...                # Layer blob (compressed)
│
├── layers/                             # Extracted layers (shared)
│   ├── sha256_layer1.../               # Extracted layer 1
│   │   ├── bin/
│   │   ├── etc/
│   │   └── lib/
│   ├── sha256_layer2.../               # Extracted layer 2
│   └── sha256_layer3.../               # Extracted layer 3
│
├── images/                             # Image metadata
│   ├── library_alpine_3.18/
│   │   └── metadata
│   └── library_nginx_latest/
│       └── metadata
│
└── containers/                         # Running containers
    └── abc123.../
        ├── upper/                      # Container writes
        ├── work/                       # Overlay work dir
        └── merged/                     # Merged view (rootfs)
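
The blob and layer paths in this tree are derived from digests that arrive in a manifest, which is remote input. A minimal sketch of validating a digest before turning it into a file name follows, so that a malformed value cannot point outside the storage directory. BlobPaths is a hypothetical helper, and it assumes sha256 is the only algorithm in use.

```java
import java.nio.file.Path;
import java.util.regex.Pattern;

final class BlobPaths {

    // "sha256:" followed by exactly 64 lowercase hex characters
    private static final Pattern SHA256 = Pattern.compile("sha256:[a-f0-9]{64}");

    /** Maps a digest to its file under blobsDir, rejecting malformed digests. */
    public static Path blobPath(Path blobsDir, String digest) {
        if (digest == null || !SHA256.matcher(digest).matches()) {
            throw new IllegalArgumentException("Unsupported or malformed digest: " + digest);
        }
        // Same naming scheme as the tree above: ':' becomes '_'
        return blobsDir.resolve(digest.replace(':', '_'));
    }

    public static void main(String[] args) {
        Path blobs = Path.of("/var/lib/minidocker/blobs");
        String digest =
            "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";
        System.out.println(blobPath(blobs, digest));
    }
}
```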

Exercises

Add command to list local images:
// minidocker images
// REPOSITORY          TAG       SIZE
// library/alpine      3.18      7.2 MB
// library/nginx       latest    142 MB

Add command to remove unused images:
// minidocker rmi alpine:3.18
// 1. Check if any containers use this image
// 2. Remove image metadata
// 3. Garbage collect unused layers

Build images from Dockerfile:
// FROM alpine:3.18
// RUN apk add --no-cache python3
// COPY app.py /app/
// CMD ["python3", "/app/app.py"]

// 1. Pull base image
// 2. Run each instruction in container
// 3. Commit changes as new layer
// 4. Save manifest and config
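
As a starting point for the garbage-collection step of the second exercise, here is a sketch of the mark-and-sweep idea: collect every layer still referenced by some image, then treat the rest as removable. It assumes a map from image name to layer digests is available. The metadata file written by writeImageMetadata does not record layer digests, so that mapping would have to be added. LayerGc is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class LayerGc {

    /**
     * Returns the stored layers that no remaining image references.
     *
     * @param imageLayers  image name -> ordered layer digests (assumed input)
     * @param storedLayers digests of all layers currently on disk
     */
    public static Set<String> unreferenced(Map<String, List<String>> imageLayers,
                                           Set<String> storedLayers) {
        // Mark: every layer reachable from some image is live
        Set<String> live = new HashSet<>();
        for (List<String> layers : imageLayers.values()) {
            live.addAll(layers);
        }
        // Sweep: whatever is stored but not live can be deleted
        Set<String> garbage = new TreeSet<>(storedLayers);
        garbage.removeAll(live);
        return garbage;
    }

    public static void main(String[] args) {
        // Placeholder digests for illustration only
        Map<String, List<String>> images = Map.of(
            "library/nginx:latest", List.of("sha256:base", "sha256:nginx"));
        Set<String> stored = Set.of("sha256:base", "sha256:nginx", "sha256:old");
        System.out.println(unreferenced(images, stored));
    }
}
```

A full implementation would also need to treat layers used by existing containers as live, per step 1 of the exercise.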

Key Takeaways

Content Addressable

Layers identified by SHA256 hash of contents

Layer Sharing

Common base layers shared between images

Manifest + Config

Manifest lists layers; Config has runtime settings

Incremental Pull

Only download layers not already cached

Congratulations! 🎉

You’ve built a working container runtime with:
  • ✅ Linux namespaces for isolation
  • ✅ Cgroups for resource limits
  • ✅ Overlay filesystem with copy-on-write
  • ✅ Bridge networking with NAT
  • ✅ OCI-compatible image pulling

Docker Project Complete!

You now understand how containers work at the kernel level!

What’s Next?

Continue learning with other Build Your Own X projects:

Build Your Own Git

Understand version control internals

Build Your Own Redis

Build an in-memory data store

Interview Deep-Dive

Strong Answer:
  • Every layer in a Docker image is identified by the SHA256 hash of its contents. When you push an image, the registry stores each layer by its digest (hash). When another image shares the same base layer (same Ubuntu version, same apt packages), that layer already exists in the registry and does not need to be uploaded or stored again.
  • For a registry like Docker Hub serving millions of pulls, this deduplication is enormous. Consider that millions of images use the same ubuntu:22.04 base layer. Without content-addressable storage, that layer would be stored millions of times. With it, it is stored once, and every image that references it just points to the same blob.
  • The pull protocol exploits this directly: the client fetches the manifest (which lists layer digests), checks which layers it already has locally, and only downloads the missing ones. On a build server that frequently pulls similar images, most layers are cached, and pulls complete in seconds instead of minutes.
  • The integrity guarantee is also critical: if a layer’s content does not match its digest, the client rejects it. This makes it impossible for a compromised registry to silently serve tampered layers (assuming the manifest itself is verified via Docker Content Trust or cosign signatures).
Follow-up: What happens when you docker push an image with a tag that already exists? How does the registry handle mutable tags pointing to immutable content?
Tags are mutable pointers to immutable manifests. When you push myapp:latest, the registry updates the tag to point to the new manifest digest. The old manifest and its layers are not deleted; they become “untagged” and eligible for garbage collection. This is why relying on :latest in production is dangerous: it is a moving pointer, and two pulls of the same tag can return different images if a push happened in between. The fix is to pin images by digest (myapp@sha256:abc123...), which is fully immutable. This is also why registry garbage collection is a non-trivial operation: the registry must walk all manifests to determine which layers are still referenced before deleting orphaned blobs.
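
To make the tag-versus-digest distinction concrete, here is a sketch of parsing a reference of the form name[:tag][@digest] and choosing what to request from the registry's manifests endpoint. PinnedReference is a hypothetical class, separate from the chapter's parseReference, and it performs no validation of the digest.

```java
final class PinnedReference {

    final String name;
    final String tag;     // null when only a digest was given
    final String digest;  // null when the reference is not pinned

    private PinnedReference(String name, String tag, String digest) {
        this.name = name;
        this.tag = tag;
        this.digest = digest;
    }

    public static PinnedReference parse(String ref) {
        String digest = null;
        int at = ref.indexOf('@');
        if (at >= 0) {
            digest = ref.substring(at + 1);
            ref = ref.substring(0, at);
        }
        String tag = null;
        int colon = ref.lastIndexOf(':');
        // A colon before the last '/' belongs to a registry port, not a tag
        if (colon > ref.lastIndexOf('/')) {
            tag = ref.substring(colon + 1);
            ref = ref.substring(0, colon);
        }
        if (tag == null && digest == null) {
            tag = "latest";
        }
        return new PinnedReference(ref, tag, digest);
    }

    /** The value to place after /manifests/ in the request: digest wins over tag. */
    public String manifestSelector() {
        return digest != null ? digest : tag;
    }

    public static void main(String[] args) {
        System.out.println(parse("myapp:1.2").manifestSelector());
        System.out.println(parse("myapp@sha256:abc123").manifestSelector());
    }
}
```

Requesting by digest means the registry cannot return a different manifest later, which is what makes a pinned reference immutable.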
Strong Answer:
  • A Dockerfile is a build recipe: it describes how to construct an image through a sequence of instructions. The build engine (BuildKit in current Docker releases) processes the instructions in order. Instructions that change the filesystem (RUN, COPY, ADD) produce a new layer from the resulting diff, while metadata instructions such as ENV or CMD only update the image config. The layers are stacked to produce the final image.
  • The image manifest is the metadata that describes the finished image: which layers it contains (by digest), the image configuration (environment variables, entrypoint, exposed ports), and the architecture/OS. The manifest is what a registry stores and what a runtime fetches when pulling.
  • The OCI (Open Container Initiative) image specification standardizes the manifest and layer formats so that images built by Docker can run on Podman, containerd, CRI-O, or any OCI-compliant runtime. Before OCI, the de facto format was defined by Docker's own tooling, so competing runtimes had to track Docker's implementation rather than a vendor-neutral standard.
  • The format matters because it determines portability, security scanning, and storage efficiency. OCI manifests support multi-architecture images (a single tag that resolves to different platform-specific manifests via a “manifest list” or “index”), which is essential for organizations running mixed amd64/arm64 clusters.
Follow-up: What is a multi-stage build, and why is it the single most impactful optimization for production Docker images?
A multi-stage build uses multiple FROM instructions in a single Dockerfile. The early stages install build tools, compile code, and run tests. The final stage starts from a minimal base (like alpine or scratch) and copies only the compiled binary from the earlier stage. The build-time layers (compilers, package managers, source code) are discarded entirely, never appearing in the final image. This can reduce image sizes from gigabytes to megabytes. Beyond size, it dramatically reduces the attack surface: a statically linked Go binary in a scratch image has zero installed packages, zero shell, and zero utilities for an attacker to exploit.
Strong Answer:
  • First, check if layers are being cached. Watch the docker pull output, which reports layers that are already present locally as "Already exists", or check docker system df to see how much image data is stored on the host. If the CI environment uses ephemeral runners (like GitHub Actions default runners), the Docker cache is empty on every run, forcing a full pull every time.
  • Second, check layer sizes: docker manifest inspect <image> -v shows each layer’s size. A single 2GB layer (common with unoptimized images that install everything in one RUN instruction) bottlenecks the pull regardless of network speed.
  • Third, check registry proximity. Pulling from Docker Hub in a CI environment running in a different region adds round-trip latency for each layer. Use a registry mirror or artifact proxy (like AWS ECR pull-through cache, or Harbor as a mirror) close to your CI runners.
  • The fixes, in order of impact: (1) Use smaller base images (alpine instead of ubuntu saves tens of megabytes per pull). (2) Order Dockerfile instructions from least to most frequently changing so cached layers are reused. (3) Enable BuildKit layer caching in CI (--cache-from and --cache-to flags with registry-backed caches). (4) Use a regional registry mirror. (5) If ephemeral runners are the bottleneck, pre-bake a runner AMI with common base images already pulled.
Follow-up: How does BuildKit’s cache differ from Docker’s legacy builder cache?
The legacy builder caches layers locally on the daemon, keyed by the Dockerfile instruction. If the instruction text or any input file changes, the cache is invalidated for that layer and all subsequent layers. BuildKit introduces remote cache backends: you can push cache to a registry and pull it in CI (--cache-from=type=registry,ref=...), enabling cache sharing across machines. BuildKit also caches at a finer granularity: it uses a content-addressable build graph rather than a single linear chain of instructions, so independent stages are cached, and can be built, separately. For projects with stable dependencies this can shorten CI builds substantially, though the gain depends on how much of the build is cacheable.