Chapter 1: Linux Namespaces
Containers aren’t magic - they’re built on Linux kernel primitives. The first and most fundamental of these is namespaces. In this chapter, we’ll build our own container runtime in Java, starting with namespace isolation.
Prerequisites: Linux Internals: Processes
Further Reading: Operating Systems: Process Management
Time: 3-4 hours
Outcome: Understanding of namespace isolation
What Are Namespaces?
┌─────────────────────────────────────────────────────────────────────────────┐
│                             WITHOUT NAMESPACES                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   HOST SYSTEM                                                               │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                                                                     │   │
│   │   Process A (PID 1234)         Process B (PID 1235)                 │   │
│   │   ┌─────────────────┐          ┌─────────────────┐                  │   │
│   │   │ Can see all     │          │ Can see all     │                  │   │
│   │   │ processes       │          │ processes       │                  │   │
│   │   │ Same network    │          │ Same network    │                  │   │
│   │   │ Same filesystem │          │ Same filesystem │                  │   │
│   │   │ Same users      │          │ Same users      │                  │   │
│   │   └─────────────────┘          └─────────────────┘                  │   │
│   │                                                                     │   │
│   │   Both processes share the same view of the system                  │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                               WITH NAMESPACES                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   HOST SYSTEM                                                               │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                                                                     │   │
│   │   Container A                    Container B                        │   │
│   │   ┌───────────────────┐          ┌───────────────────┐              │   │
│   │   │ PID Namespace     │          │ PID Namespace     │              │   │
│   │   │ Sees only own PIDs│          │ Sees only own PIDs│              │   │
│   │   │ PID 1 = container │          │ PID 1 = container │              │   │
│   │   │ init process      │          │ init process      │              │   │
│   │   ├───────────────────┤          ├───────────────────┤              │   │
│   │   │ NET Namespace     │          │ NET Namespace     │              │   │
│   │   │ Own eth0, ports   │          │ Own eth0, ports   │              │   │
│   │   ├───────────────────┤          ├───────────────────┤              │   │
│   │   │ MNT Namespace     │          │ MNT Namespace     │              │   │
│   │   │ Own root fs       │          │ Own root fs       │              │   │
│   │   └───────────────────┘          └───────────────────┘              │   │
│   │                                                                     │   │
│   │   Each container has an ISOLATED view of system resources           │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
Linux Namespace Types
| Namespace | Flag | Isolates |
|---|---|---|
| PID | CLONE_NEWPID | Process IDs - container sees own PID 1 |
| NET | CLONE_NEWNET | Network stack - own interfaces, IPs, ports |
| MNT | CLONE_NEWNS | Mount points - own filesystem view |
| UTS | CLONE_NEWUTS | Hostname and domain name |
| IPC | CLONE_NEWIPC | System V IPC objects and POSIX message queues |
| USER | CLONE_NEWUSER | User and group IDs |
| CGROUP | CLONE_NEWCGROUP | Cgroup root directory |
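These namespace types are visible on a running system: every process has one symbolic link per namespace under /proc/<pid>/ns. The sketch below lists those links for the current JVM. It assumes a Linux host with /proc mounted, and the class name NamespaceInspector is invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class NamespaceInspector {

    /**
     * Returns namespace type -> link target (for example "pid" -> "pid:[4026531836]")
     * for the current process. Returns an empty map when /proc/self/ns is missing,
     * for example on a non-Linux system.
     */
    static Map<String, String> currentNamespaces() {
        Map<String, String> result = new TreeMap<>();
        Path nsDir = Path.of("/proc/self/ns");
        if (!Files.isDirectory(nsDir)) {
            return result;
        }
        try (Stream<Path> entries = Files.list(nsDir)) {
            List<Path> links = entries.collect(Collectors.toList());
            for (Path link : links) {
                // Each entry is a symlink whose target names the type and an inode number
                if (Files.isSymbolicLink(link)) {
                    result.put(link.getFileName().toString(),
                               Files.readSymbolicLink(link).toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        currentNamespaces().forEach((type, id) -> System.out.println(type + " -> " + id));
    }
}
```

Two processes that print the same target for a given type share that namespace.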
Part 1: Project Setup
We’ll use Java with JNA (Java Native Access) to call Linux system calls.
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.minidocker</groupId>
    <artifactId>minidocker</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>
    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
    </properties>
    <dependencies>
        <!-- JNA for native Linux calls -->
        <dependency>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna</artifactId>
            <version>5.13.0</version>
        </dependency>
        <dependency>
            <groupId>net.java.dev.jna</groupId>
            <artifactId>jna-platform</artifactId>
            <version>5.13.0</version>
        </dependency>
        <!-- CLI parsing -->
        <dependency>
            <groupId>info.picocli</groupId>
            <artifactId>picocli</artifactId>
            <version>4.7.4</version>
        </dependency>
    </dependencies>
</project>
Part 2: Linux System Call Bindings
First, we need to call Linux system calls from Java:
src/main/java/com/minidocker/linux/LibC.java
package com.minidocker.linux;

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Pointer;

/**
 * JNA bindings to Linux libc functions.
 *
 * These are the low-level system calls that container runtimes such as
 * Docker rely on.
 */
public interface LibC extends Library {
    LibC INSTANCE = Native.load("c", LibC.class);

    // Namespace flags
    int CLONE_NEWNS     = 0x00020000; // Mount namespace
    int CLONE_NEWUTS    = 0x04000000; // UTS namespace (hostname)
    int CLONE_NEWIPC    = 0x08000000; // IPC namespace
    int CLONE_NEWUSER   = 0x10000000; // User namespace
    int CLONE_NEWPID    = 0x20000000; // PID namespace
    int CLONE_NEWNET    = 0x40000000; // Network namespace
    int CLONE_NEWCGROUP = 0x02000000; // Cgroup namespace

    // Mount flags
    int MS_BIND    = 4096;
    int MS_REC     = 16384;
    int MS_PRIVATE = 1 << 18;
    int MS_NOSUID  = 2;
    int MS_NOEXEC  = 8;
    int MS_NODEV   = 4;

    /**
     * Create new namespaces and move the calling process into them.
     * Exception: with CLONE_NEWPID the caller keeps its PID and only its
     * children enter the new PID namespace.
     *
     * @param flags Combination of CLONE_NEW* flags
     * @return 0 on success, -1 on error
     */
    int unshare(int flags);

    /**
     * Change the root filesystem.
     *
     * @param path New root directory
     * @return 0 on success, -1 on error
     */
    int chroot(String path);

    /**
     * Change working directory.
     */
    int chdir(String path);

    /**
     * Mount a filesystem.
     *
     * @param source Source device/path
     * @param target Mount point
     * @param filesystemtype Type (e.g., "proc", "sysfs")
     * @param mountflags Mount flags
     * @param data Additional data
     * @return 0 on success, -1 on error
     */
    int mount(String source, String target, String filesystemtype,
              long mountflags, Pointer data);

    /**
     * Unmount a filesystem.
     */
    int umount2(String target, int flags);

    /**
     * Set hostname.
     */
    int sethostname(String name, int len);

    /**
     * Get process ID.
     */
    int getpid();

    /**
     * Get parent process ID.
     */
    int getppid();

    /**
     * Get user ID.
     */
    int getuid();

    /**
     * Get group ID.
     */
    int getgid();

    /**
     * Execute a program, replacing the current process image.
     */
    int execv(String path, String[] argv);

    /**
     * Fork the process.
     */
    int fork();

    /**
     * Wait for child process.
     */
    int waitpid(int pid, int[] status, int options);

    /**
     * Set the user ID / group ID of the calling process.
     * (ID mappings for user namespaces are written to /proc, not set here.)
     */
    int setuid(int uid);
    int setgid(int gid);

    /**
     * Pivot root - atomically swap root filesystems.
     */
    int pivot_root(String new_root, String put_old);
}
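When one of these bindings returns -1, the reason is stored in errno, which JNA exposes through Native.getLastError(). The sketch below turns the error codes documented in unshare(2) into readable hints. The class name UnshareErrors is hypothetical, the numbers are the Linux errno constants, and the messages are paraphrases rather than official text.

```java
import java.util.Map;

class UnshareErrors {
    // Linux errno values for failures documented in unshare(2)
    private static final Map<Integer, String> HINTS = Map.of(
        1,  "EPERM: missing privilege (most namespaces need CAP_SYS_ADMIN)",
        12, "ENOMEM: the kernel could not allocate memory for the namespace",
        22, "EINVAL: invalid flags, or a flag that requires a single-threaded caller",
        28, "ENOSPC: a namespace count or nesting limit was reached",
        87, "EUSERS: too many nested user namespaces"
    );

    /** Returns a hint for known codes and a generic "errno N" string otherwise. */
    static String describe(int errno) {
        return HINTS.getOrDefault(errno, "errno " + errno);
    }
}
```

A caller could pass Native.getLastError() straight to describe() when building an exception message.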
Part 3: Namespace Manager
src/main/java/com/minidocker/namespace/NamespaceManager.java
package com.minidocker.namespace;

import com.minidocker.linux.LibC;
import com.sun.jna.Native;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Manages Linux namespace creation and configuration.
 *
 * Namespaces provide isolation for various system resources:
 * - PID: Process sees its own process tree, with PID 1
 * - NET: Separate network stack (interfaces, routing, firewall)
 * - MNT: Separate mount points
 * - UTS: Separate hostname
 * - IPC: Separate System V IPC and POSIX message queues
 * - USER: Separate user/group ID mappings
 */
public class NamespaceManager {
    private final LibC libc = LibC.INSTANCE;

    /**
     * Creates all namespaces needed for container isolation.
     *
     * @param options Configuration options
     * @throws NamespaceException if namespace creation fails
     */
    public void createNamespaces(NamespaceOptions options) throws NamespaceException {
        int flags = 0;
        if (options.newPidNamespace()) {
            flags |= LibC.CLONE_NEWPID;
        }
        if (options.newNetNamespace()) {
            flags |= LibC.CLONE_NEWNET;
        }
        if (options.newMountNamespace()) {
            flags |= LibC.CLONE_NEWNS;
        }
        if (options.newUtsNamespace()) {
            flags |= LibC.CLONE_NEWUTS;
        }
        if (options.newIpcNamespace()) {
            flags |= LibC.CLONE_NEWIPC;
        }
        if (options.newUserNamespace()) {
            flags |= LibC.CLONE_NEWUSER;
        }
        System.out.println("Creating namespaces with flags: 0x" + Integer.toHexString(flags));
        // unshare() creates new namespaces and moves the calling process into
        // them, except for CLONE_NEWPID, where only children enter the new one.
        // Caveat: per unshare(2), CLONE_NEWPID and CLONE_NEWUSER imply
        // CLONE_THREAD, which the kernel rejects with EINVAL (errno 22) for a
        // multithreaded caller such as a JVM. If that happens, one workaround
        // is to start the JVM under the unshare(1) command instead.
        int result = libc.unshare(flags);
        if (result != 0) {
            throw new NamespaceException("Failed to create namespaces: errno " +
                    Native.getLastError());
        }
        System.out.println("✓ Namespaces created successfully");
    }

    /**
     * Sets up user namespace mappings.
     *
     * Maps container user 0 (root) to the given host user.
     *
     * @param hostUid host UID, read with getuid() BEFORE unshare(CLONE_NEWUSER);
     *                afterwards getuid() returns the overflow ID (usually 65534)
     * @param hostGid host GID, read with getgid() before unshare() for the same reason
     */
    public void setupUserNamespace(int hostUid, int hostGid) throws IOException {
        // Write UID mapping: container_uid host_uid count
        // Maps container root (0) to the host user
        Files.writeString(Path.of("/proc/self/uid_map"), "0 " + hostUid + " 1\n");
        // Disable setgroups (required before an unprivileged gid_map write)
        Files.writeString(Path.of("/proc/self/setgroups"), "deny\n");
        // Write GID mapping
        Files.writeString(Path.of("/proc/self/gid_map"), "0 " + hostGid + " 1\n");
        System.out.println("✓ User namespace mapped: container root -> host uid " + hostUid);
    }

    /**
     * Sets the hostname within the UTS namespace.
     */
    public void setHostname(String hostname) throws NamespaceException {
        int result = libc.sethostname(hostname, hostname.length());
        if (result != 0) {
            throw new NamespaceException("Failed to set hostname: errno " +
                    Native.getLastError());
        }
        System.out.println("✓ Hostname set to: " + hostname);
    }

    /**
     * Demonstrates PID namespace isolation.
     */
    public void showPidNamespaceInfo() {
        int pid = libc.getpid();
        int ppid = libc.getppid();
        System.out.println("Inside container:");
        System.out.println(" PID: " + pid);
        System.out.println(" PPID: " + ppid);
        // The first process created inside a new PID namespace gets PID 1.
        // The process that called unshare() itself keeps its original PID.
        if (pid == 1) {
            System.out.println(" → We are PID 1 (init process) in this namespace!");
        }
    }
}
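NamespaceManager throws NamespaceException, which the chapter does not list. A minimal definition might look like the following, assuming it is meant to be a checked exception in the same com.minidocker.namespace package. The package line and the public modifier are left out so the snippet compiles on its own; in the project it would be a public class in its own file.

```java
/**
 * Signals that a namespace could not be created or configured.
 * Checked, so callers of NamespaceManager must handle or declare it.
 */
class NamespaceException extends Exception {

    NamespaceException(String message) {
        super(message);
    }

    NamespaceException(String message, Throwable cause) {
        super(message, cause);
    }
}
```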
Part 4: Namespace Options
src/main/java/com/minidocker/namespace/NamespaceOptions.java
package com.minidocker.namespace;

/**
 * Configuration for which namespaces to create.
 */
public record NamespaceOptions(
        boolean newPidNamespace,
        boolean newNetNamespace,
        boolean newMountNamespace,
        boolean newUtsNamespace,
        boolean newIpcNamespace,
        boolean newUserNamespace
) {
    /**
     * Creates options with all namespaces enabled (typical container).
     */
    public static NamespaceOptions all() {
        return new NamespaceOptions(true, true, true, true, true, true);
    }

    /**
     * Creates options with no namespaces (for testing).
     */
    public static NamespaceOptions none() {
        return new NamespaceOptions(false, false, false, false, false, false);
    }

    /**
     * Builder for custom namespace configurations.
     */
    public static Builder builder() {
        return new Builder();
    }

    public static class Builder {
        private boolean pid = false;
        private boolean net = false;
        private boolean mnt = false;
        private boolean uts = false;
        private boolean ipc = false;
        private boolean user = false;

        public Builder withPid() { this.pid = true; return this; }
        public Builder withNet() { this.net = true; return this; }
        public Builder withMount() { this.mnt = true; return this; }
        public Builder withUts() { this.uts = true; return this; }
        public Builder withIpc() { this.ipc = true; return this; }
        public Builder withUser() { this.user = true; return this; }

        public NamespaceOptions build() {
            return new NamespaceOptions(pid, net, mnt, uts, ipc, user);
        }
    }
}
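Six positional booleans are easy to mix up. One alternative, shown here only as a design sketch and not as part of the chapter's code, models each namespace as an enum constant that carries its CLONE_NEW* flag, so a selection becomes an EnumSet and the bitmask falls out of a loop. The names Ns and NsFlags are invented for this example; the flag values are the ones declared in LibC.

```java
import java.util.EnumSet;
import java.util.Set;

enum Ns {
    MNT(0x00020000), UTS(0x04000000), IPC(0x08000000),
    USER(0x10000000), PID(0x20000000), NET(0x40000000);

    final int flag;

    Ns(int flag) {
        this.flag = flag;
    }
}

class NsFlags {
    /** ORs together the clone flags of every selected namespace. */
    static int toFlags(Set<Ns> selected) {
        int flags = 0;
        for (Ns ns : selected) {
            flags |= ns.flag;
        }
        return flags;
    }

    public static void main(String[] args) {
        int flags = toFlags(EnumSet.of(Ns.PID, Ns.UTS));
        System.out.println("0x" + Integer.toHexString(flags)); // prints 0x24000000
    }
}
```

The record with a builder keeps call sites explicit about each choice; the enum version trades that for less boilerplate when a new namespace type is added.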
Part 5: Understanding Each Namespace
PID Namespace
┌─────────────────────────────────────────────────────────────────────────────┐
│                                PID NAMESPACE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   HOST VIEW:                           CONTAINER VIEW:                      │
│                                                                             │
│   PID   COMMAND                        PID   COMMAND                        │
│   1     systemd                        1     /bin/sh   ← Thinks it's PID 1! │
│   100   sshd                           2     nginx                          │
│   200   dockerd                        3     worker                         │
│   300   /bin/sh (container init)                                            │
│   301   nginx                                                               │
│   302   worker                                                              │
│                                                                             │
│   Host PID 300 = Container PID 1                                            │
│   The container cannot see host processes!                                  │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
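The host-to-container PID correspondence in the diagram can be observed directly. Since Linux 4.1, /proc/<pid>/status contains an NSpid line that lists the process's PID in each nested PID namespace, outermost first. The following is a small parsing sketch; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class NsPidParser {
    /**
     * Extracts the NSpid values from the text of /proc/<pid>/status.
     * Returns an empty list if the line is absent (kernels before 4.1).
     */
    static List<Integer> parseNsPids(String statusText) {
        List<Integer> pids = new ArrayList<>();
        for (String line : statusText.split("\n")) {
            if (line.startsWith("NSpid:")) {
                String values = line.substring("NSpid:".length()).trim();
                for (String field : values.split("\\s+")) {
                    if (!field.isEmpty()) {
                        pids.add(Integer.parseInt(field));
                    }
                }
            }
        }
        return pids;
    }

    public static void main(String[] args) {
        // Sample excerpt for the container's init process as read from the host
        String sample = "Name:\tsh\nPid:\t300\nNSpid:\t300\t1\n";
        System.out.println(parseNsPids(sample)); // prints [300, 1]
    }
}
```

For the example in the diagram, reading /proc/300/status on the host would be expected to yield 300 followed by 1.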
UTS Namespace
public void demonstrateUtsNamespace() throws Exception {
    // Before: we still see the host's hostname
    System.out.println("Host hostname: " + getHostname());
    // Create a new UTS namespace (needs CAP_SYS_ADMIN, e.g. run as root)
    if (libc.unshare(LibC.CLONE_NEWUTS) != 0) {
        throw new IllegalStateException("unshare(CLONE_NEWUTS) failed");
    }
    // Now we can change the hostname without affecting the host
    String name = "mycontainer";
    libc.sethostname(name, name.length());
    System.out.println("Container hostname: " + getHostname());
    // Prints "mycontainer"; the host's hostname is unaffected
}
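The snippet above calls a getHostname() helper that the chapter does not define. One way to write it, assuming a Linux /proc filesystem, is to read /proc/sys/kernel/hostname, whose content follows the UTS namespace of the reader. The class name Hostnames and the normalize() helper are additions for this sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class Hostnames {
    /** Reads the hostname as seen from the caller's UTS namespace. */
    static String getHostname() throws IOException {
        return normalize(Files.readString(Path.of("/proc/sys/kernel/hostname")));
    }

    /** The kernel terminates the value with a newline; strip surrounding whitespace. */
    static String normalize(String raw) {
        return raw.trim();
    }
}
```

Because unshare() affects only the calling thread of a multithreaded process, the read should happen on the same thread that created the namespace.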
Mount Namespace
public void demonstrateMountNamespace() throws Exception {
    // Create mount namespace
    libc.unshare(LibC.CLONE_NEWNS);
    // Make all mounts private (changes don't propagate to host)
    libc.mount(null, "/", null, LibC.MS_REC | LibC.MS_PRIVATE, null);
    // Now we can mount things that only this container sees
    libc.mount("tmpfs", "/tmp", "tmpfs", 0, null);
    // This /tmp is completely separate from host's /tmp
}
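Whether the MS_PRIVATE remount took effect can be checked in /proc/self/mountinfo. Per proc(5), each line carries zero or more optional fields such as shared:N or master:N between the mount options and a lone "-" separator, and a private mount has none of these propagation tags. The following is a parsing sketch with hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class MountInfo {
    /**
     * Returns the optional fields (e.g. "shared:1", "master:2") of one
     * /proc/self/mountinfo line. No fields at all indicates a private mount.
     */
    static List<String> propagationTags(String mountinfoLine) {
        String[] fields = mountinfoLine.trim().split(" ");
        List<String> tags = new ArrayList<>();
        // Indexes 0-5 are fixed fields; optional fields follow until the "-" separator
        for (int i = 6; i < fields.length && !fields[i].equals("-"); i++) {
            tags.add(fields[i]);
        }
        return tags;
    }

    /** True if the mount propagates events to a peer group. */
    static boolean isShared(String mountinfoLine) {
        return propagationTags(mountinfoLine).stream()
                .anyMatch(tag -> tag.startsWith("shared:"));
    }
}
```

Running isShared() over every line after the remount should return false throughout if the call succeeded.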
Part 6: Container Runner
src/main/java/com/minidocker/Container.java
package com.minidocker;

import com.minidocker.linux.LibC;
import com.minidocker.namespace.NamespaceManager;
import com.minidocker.namespace.NamespaceOptions;

/**
 * Main container class that orchestrates isolation.
 */
public class Container {
    private final LibC libc = LibC.INSTANCE;
    private final NamespaceManager namespaces = new NamespaceManager();
    private final String hostname;
    private final String[] command;

    public Container(String hostname, String[] command) {
        this.hostname = hostname;
        this.command = command;
    }

    /**
     * Runs the container.
     *
     * This is a simplified version: production runtimes typically create the
     * namespaces with clone() while spawning the container process. We use
     * unshare() followed by fork() for simplicity.
     */
    public void run() throws Exception {
        System.out.println("=== Starting Container ===");
        System.out.println("Hostname: " + hostname);
        System.out.println("Command: " + String.join(" ", command));
        System.out.println();
        // Step 1: Create namespaces
        NamespaceOptions options = NamespaceOptions.builder()
                .withPid()
                .withMount()
                .withUts()
                .withIpc()
                .build();
        namespaces.createNamespaces(options);
        // Step 2: Set hostname
        namespaces.setHostname(hostname);
        // Step 3: Show namespace info. This process called unshare(), so it
        // still reports its original PID; only its children get new PIDs.
        namespaces.showPidNamespaceInfo();
        // Step 4: Fork. The first child created after unshare(CLONE_NEWPID)
        // becomes PID 1 in the new PID namespace. Only the calling thread
        // exists in the child, so the child should exec right away.
        int pid = libc.fork();
        if (pid == 0) {
            // Child process - this is PID 1 in new PID namespace
            runContainerInit();
        } else if (pid > 0) {
            // Parent - wait for child
            int[] status = new int[1];
            libc.waitpid(pid, status, 0);
            // waitpid() fills in a raw wait status; for a normal exit the
            // exit code is stored in bits 8-15
            int exitCode = (status[0] >> 8) & 0xff;
            System.out.println("Container exited with status: " + exitCode);
        } else {
            throw new RuntimeException("Fork failed");
        }
    }

    private void runContainerInit() {
        try {
            System.out.println("\n=== Container Init (PID " + libc.getpid() + ") ===");
            // Execute the command
            if (command.length > 0) {
                String[] argv = new String[command.length + 1];
                System.arraycopy(command, 0, argv, 0, command.length);
                argv[command.length] = null; // null-terminated
                libc.execv(command[0], argv);
                // If we get here, exec failed
                System.err.println("Failed to execute: " + command[0]);
                System.exit(1);
            }
        } catch (Exception e) {
            System.err.println("Container init failed: " + e.getMessage());
            System.exit(1);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("Usage: java Container <hostname> <command...>");
            System.out.println("Example: java Container mycontainer /bin/sh");
            System.exit(1);
        }
        String hostname = args[0];
        String[] command = new String[args.length - 1];
        System.arraycopy(args, 1, command, 0, command.length);
        try {
            Container container = new Container(hostname, command);
            container.run();
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
            e.printStackTrace();
            System.exit(1);
        }
    }
}
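The value waitpid() stores is a packed status word rather than a plain exit code. In C, the macros WIFEXITED, WEXITSTATUS, WIFSIGNALED and WTERMSIG unpack it. The sketch below reproduces those bit operations in Java as Linux defines them; the class name WaitStatus is made up for this example.

```java
class WaitStatus {
    /** True if the child terminated normally (exit() or return from main). */
    static boolean exited(int status) {
        return (status & 0x7f) == 0;
    }

    /** Exit code of a normally terminated child (bits 8-15). */
    static int exitCode(int status) {
        return (status >> 8) & 0xff;
    }

    /** True if the child was killed by a signal. */
    static boolean signaled(int status) {
        int low = status & 0x7f;
        return low != 0 && low != 0x7f;
    }

    /** Number of the signal that killed the child (bits 0-6). */
    static int termSignal(int status) {
        return status & 0x7f;
    }

    static String describe(int status) {
        if (exited(status)) {
            return "exited with code " + exitCode(status);
        }
        if (signaled(status)) {
            return "killed by signal " + termSignal(status);
        }
        return "stopped or continued (raw status " + status + ")";
    }
}
```

A shell that ends with `exit 1` produces the raw status 256, which describe() reports as "exited with code 1"; a child killed by SIGKILL produces 9.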
Exercises
Exercise 1: Add Network Namespace
Extend the namespace manager to create network namespaces:
// 1. Create network namespace
libc.unshare(LibC.CLONE_NEWNET);
// 2. Bring up loopback interface
// Use: ip link set lo up
// (Requires additional native calls or ProcessBuilder)
// 3. Verify isolation
// The container should have its own network stack
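As a starting point for steps 2 and 3, the sketch below shells out to the ip tool through ProcessBuilder and lists interfaces with java.net.NetworkInterface. It assumes the iproute2 ip binary is on the PATH and that the child process inherits the network namespace of the thread that starts it. NetSetup and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class NetSetup {
    /** Command for step 2, returned as a list so it can be logged or inspected. */
    static List<String> loopbackUpCommand() {
        return List.of("ip", "link", "set", "lo", "up");
    }

    /** Runs a command, forwarding its output, and returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    /** Names of the network interfaces the JDK can see from this process (step 3). */
    static List<String> interfaceNames() {
        List<String> names = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
            if (interfaces != null) {
                for (NetworkInterface nic : Collections.list(interfaces)) {
                    names.add(nic.getName());
                }
            }
        } catch (SocketException e) {
            // Treat an unreadable interface list as "nothing visible"
        }
        return names;
    }
}
```

After run(loopbackUpCommand()) succeeds inside a fresh network namespace, interfaceNames() would be expected to show lo and none of the host's interfaces.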
Exercise 2: Implement Namespace Joining
Allow joining an existing container’s namespaces:
// Use setns() syscall to join existing namespace
// int setns(int fd, int nstype);
// fd = open("/proc/<pid>/ns/<type>")
// This is how "docker exec" works!
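Before wiring up setns(), it helps to be able to tell whether two processes already share a namespace. The links under /proc/<pid>/ns resolve to strings such as pid:[4026531836], and equal inode numbers mean the same namespace. The following sketch covers the path and parsing parts only; the names are illustrative, and the setns() binding itself is left to the exercise.

```java
import java.nio.file.Path;

class NsLinks {
    /** Path that would be opened and handed to setns(), e.g. /proc/1234/ns/net. */
    static Path nsPath(long pid, String type) {
        return Path.of("/proc", Long.toString(pid), "ns", type);
    }

    /** Extracts the inode number from a link target such as "pid:[4026531836]". */
    static long inodeOf(String linkTarget) {
        int open = linkTarget.indexOf('[');
        int close = linkTarget.indexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a namespace link: " + linkTarget);
        }
        return Long.parseLong(linkTarget.substring(open + 1, close));
    }
}
```

Comparing inodeOf(Files.readSymbolicLink(nsPath(pid, "pid")).toString()) for two PIDs is one way to confirm that a join worked.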
Exercise 3: User Namespace Mapping
Implement user namespace with UID/GID mapping:
// 1. Create user namespace FIRST (before other namespaces)
// 2. Write to /proc/self/uid_map
// 3. Write "deny" to /proc/self/setgroups
// 4. Write to /proc/self/gid_map
// This enables unprivileged containers!
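Each line written to uid_map or gid_map has three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. A tiny formatting helper, with hypothetical names, makes those fields explicit:

```java
class IdMap {
    /**
     * Formats one mapping line for /proc/self/uid_map or /proc/self/gid_map.
     *
     * @param insideId  first ID as seen inside the user namespace
     * @param outsideId first ID as seen in the parent namespace
     * @param count     number of consecutive IDs to map
     */
    static String line(int insideId, int outsideId, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        return insideId + " " + outsideId + " " + count + "\n";
    }

    public static void main(String[] args) {
        // Container root (0) mapped to host UID 1000, a single ID
        System.out.print(line(0, 1000, 1)); // prints "0 1000 1"
    }
}
```

The host UID and GID passed in should be read before unshare(CLONE_NEWUSER), because afterwards getuid() and getgid() report the overflow ID until a mapping exists.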
Key Takeaways
Isolation Not Virtualization
Namespaces isolate views of resources, not the resources themselves
Kernel Primitives
unshare(), clone(), setns() are the syscalls that power containers
Layered Isolation
Each namespace type isolates a different resource
Low Overhead
Namespaces add negligible overhead - just kernel data structures
Further Reading
Linux Namespaces Manual
Official documentation for Linux namespaces
Linux Internals Course
Deep dive into Linux process management
What’s Next?
In Chapter 2: Control Groups (cgroups), we’ll implement:
- CPU limits
- Memory limits
- Process count limits
- Resource accounting