Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Operating System Interfaces - Complete Study Guide

Learn operating system fundamentals by understanding the xv6 teaching operating system - a simple, Unix-like OS that demonstrates core OS concepts in ~9,000 lines of readable code.

Core Concept: What is an Operating System?

An operating system has three primary jobs:

Abstract Hardware

Programs don’t need to know specific hardware details (e.g., which disk type, GPU model)

Share Resources

Multiple programs run simultaneously (or appear to) by multiplexing CPU, memory, and I/O

Enable Interaction

Programs can communicate and share data safely through controlled mechanisms

Key Design Tension

[!IMPORTANT] The Interface Dilemma
  • Simple/narrow interface = easier to implement correctly, but limited features
  • Complex interface = more features but harder to maintain and secure
  • Solution: Few powerful mechanisms that combine for generality

The xv6 Operating System

What is xv6?

xv6 is a teaching OS based on Unix design by Ken Thompson & Dennis Ritchie. It provides basic Unix interfaces and helps you understand modern operating systems like BSD, Linux, macOS, Solaris, and even Windows. Key Features:
  • Small enough to understand completely (~9,000 lines)
  • Real enough to be useful (actual working OS)
  • Provides essential Unix interfaces
  • Runs on RISC-V architecture

Architecture Overview

User Space vs Kernel Space Architecture Key Terms:
  • Process: Running program with memory (instructions, data, stack)
  • Kernel: Special program providing services to processes
  • System Call: Process requests kernel service (e.g., fork(), open(), read())
  • User Space: Normal programs run here (limited privileges)
  • Kernel Space: Kernel runs here (full hardware access)
Process alternates: User space → System call → Kernel space → Back to user space

System Calls Reference

Process Management

System CallDescriptionReturn Value
fork()Create new processChild PID to parent, 0 to child
exit(status)Terminate processNo return (0=success, 1=failure)
wait(*status)Wait for child exitChild PID or -1
kill(pid)Terminate process0 or -1
getpid()Get current process PIDPID
sleep(n)Pause for n clock ticks0
exec(file, argv[])Replace process with new programNo return on success, -1 on error
sbrk(n)Grow memory by n bytesAddress of new memory

File Operations

System CallDescriptionReturn Value
open(file, flags)Open fileFile descriptor (fd) or -1
read(fd, buf, n)Read n bytesBytes read or 0 at EOF
write(fd, buf, n)Write n bytesBytes written
close(fd)Release file descriptor0 or -1
dup(fd)Duplicate fdNew fd to same file
pipe(p[])Create pipe0 (p[0]=read, p[1]=write)

File System

System CallDescriptionReturn Value
chdir(dir)Change current directory0 or -1
mkdir(dir)Create directory0 or -1
mknod(file, major, minor)Create device file0 or -1
fstat(fd, *st)Get file info from fd0 or -1
stat(file, *st)Get file info from path0 or -1
link(file1, file2)Create new name for file0 or -1
unlink(file)Remove file name0 or -1
[!NOTE] Convention: Unless stated otherwise, system calls return 0 for success and -1 for error.

Processes and Memory

Process Structure

Each process has:
  • User-space memory: instructions + data + stack
  • Kernel-private state: CPU registers, PID, etc.
  • PID: Process identifier (unique number)

fork() - Creating Processes

The fork() system call creates an exact copy of the parent process. Behavior:
  1. Creates exact copy of parent process memory
  2. Both processes continue execution after fork()
  3. Only difference: return value
Return values:
  • Parent gets: child’s PID (positive number)
  • Child gets: 0
  • Error: -1
Fork and Exec Process Flow Example:
int pid = fork();
if(pid > 0) {
    // Parent branch: fork() returned the child's PID (a positive number).
    // The parent continues executing from this point with its own copy of memory.
    printf("parent: child=%d\n", pid);
    pid = wait((int *) 0);  // Block until the child calls exit()
    printf("child %d is done\n", pid);
} else if(pid == 0) {
    // Child branch: fork() returned 0, which is how the child knows it IS the child.
    // This code runs in a completely separate process with its own address space.
    printf("child: exiting\n");
    exit(0);
} else {
    // fork() returned -1: the kernel could not allocate a new process
    // (e.g., out of memory or process table full).
    printf("fork error\n");
}
Output order unpredictable:
parent: child=1234
child: exiting
OR
child: exiting
parent: child=1234
Then:
parent: child 1234 is done
[!IMPORTANT] Critical Points:
  • Parent and child have SEPARATE memory
  • Changing variable in one doesn’t affect other
  • Both start with identical memory contents
  • Each has separate registers

wait() - Synchronizing Processes

Purpose: Parent waits for child to finish Behavior:
  • Returns PID of exited child
  • Copies child’s exit status to provided address
  • If no children exited yet, blocks until one does
  • If no children exist, returns -1 immediately
Usage:
int status;
int pid = wait(&status);  // Get exit status
// OR
int pid = wait(0);        // Don't care about status

exec() - Running Programs

Purpose: Replace current process with new program
[!WARNING] Critical characteristic: Does NOT create new process, replaces current one!
Arguments:
  1. Path to executable file
  2. Array of string arguments (NULL-terminated)
File format: ELF (Executable and Linkable Format) Behavior:
  • Loads new program from file
  • Replaces process memory completely
  • Preserves: PID, file descriptors, open files
  • If successful: NEVER returns (new program runs)
  • If error: returns to caller
Example:
char *argv[3];
argv[0] = "echo";        // Program name (convention: argv[0] is the program name)
argv[1] = "hello";       // First real argument passed to the program
argv[2] = 0;             // NULL terminator -- exec() needs this to know where the array ends
exec("/bin/echo", argv); // Replaces this process's memory with /bin/echo's code
printf("exec error\n");  // This line only executes if exec() FAILED (returned -1).
                         // On success, the process is now running /bin/echo --
                         // the old code, stack, and heap are gone.
[!NOTE] argv[0] convention: First argument is program name (mostly ignored by program)

How Shell Uses These Calls

Shell main loop:
// Simplified shell structure
while(getcmd(buf, sizeof(buf)) >= 0) {
    if(fork() == 0) {
        // Child: run command
        runcmd(parsecmd(buf));
    }
    // Parent: wait for child
    wait(0);
}
For command echo hello:
  1. Shell forks
  2. Parent waits
  3. Child calls exec("/bin/echo", ["echo", "hello", 0])
  4. echo runs and calls exit()
  5. Parent’s wait() returns
  6. Shell ready for next command
Why fork + exec are separate:
  • Allows I/O redirection between fork() and exec(). The child can close, open, and dup file descriptors in the gap between fork and exec, customizing its I/O environment without affecting the parent.
  • Shell can modify child’s file descriptors before exec()
  • Parent’s I/O remains unchanged
This two-step design is one of the most influential decisions in Unix’s history. It means the shell — not the program — controls I/O routing. A program like cat does not need any code for handling redirection, pipes, or output capture. All of that is the shell’s job, using fork+exec with fd manipulation in between. Alternative (worse) designs:
  • Combined forkexec() call — awkward for I/O redirection because you would need to pass the full fd configuration as parameters
  • Shell modifies own I/O, then undoes — error-prone and not thread-safe
  • Every program handles own redirection — duplicated work across every utility

Memory Management

Implicit allocation:
  • fork() - Allocates memory for child’s copy
  • exec() - Allocates memory for new executable
Explicit allocation:
  • sbrk(n) - Grow data memory by n bytes
  • Returns location of new memory
  • Used by malloc() implementation
Optimization:
  • fork() uses copy-on-write (COW)
  • Doesn’t actually copy memory until modified
  • Avoids waste when exec() immediately follows fork()

I/O and File Descriptors

File Descriptor Concept

Definition: Small integer representing kernel-managed I/O object Can refer to:
  • Regular files
  • Directories
  • Devices (keyboard, screen, etc.)
  • Pipes
Key abstraction: Everything looks like stream of bytes How to obtain:
  • Open file/directory/device
  • Create pipe
  • Duplicate existing descriptor

File Descriptor Table

File Descriptor Table Per-process table:
  • Each process has private fd space
  • FDs start at 0
  • Kernel uses fd as index into table
Standard convention:
  • FD 0 = standard input (stdin)
  • FD 1 = standard output (stdout)
  • FD 2 = standard error (stderr)
Shell ensures: Always has FDs 0, 1, 2 open (default to console)

read() and write()

read(fd, buf, n):
  • Reads UP TO n bytes from fd
  • Copies into buf
  • Returns number of bytes actually read
  • Returns 0 at end of file
  • Each fd has file offset that advances automatically
write(fd, buf, n):
  • Writes n bytes from buf to fd
  • Returns number of bytes written
  • Less than n only on error
  • Offset advances automatically
Sequential access:
// First read gets bytes 0-99
read(fd, buf, 100);
// Next read gets bytes 100-199
read(fd, buf, 100);

cat Example - The Power of Abstraction

char buf[512];
int n;
for(;;) {
    n = read(0, buf, sizeof buf);  // Read from stdin
    if(n == 0)
        break;                      // EOF
    if(n < 0) {
        fprintf(2, "read error\n"); // Error to stderr
        exit(1);
    }
    if(write(1, buf, n) != n) {     // Write to stdout
        fprintf(2, "write error\n");
        exit(1);
    }
}
Why this is powerful:
  • cat does not know if it is reading from a file, the console, or a pipe
  • cat does not know if it is writing to a file, the console, or a pipe
  • Same code works for all cases — this is the Unix “everything is a file” philosophy in action
  • Shell controls actual I/O sources/destinations via fd manipulation
This is the reason the same cat binary works in all of these contexts without modification:
cat file.txt           # read from file, write to console
cat < input.txt        # read from redirected stdin
echo hello | cat       # read from pipe
cat > output.txt       # write to file instead of console
cat < in.txt | sort > out.txt  # pipe chain with redirection

File Descriptor Allocation

close(fd):
  • Releases file descriptor
  • Makes fd available for reuse
New fd allocation:
  • Always uses LOWEST unused number
  • This is critical for I/O redirection

I/O Redirection Mechanism

Example: cat < input.txt
char *argv[2];
argv[0] = "cat";
argv[1] = 0;

if(fork() == 0) {
    close(0);                      // Free fd 0
    open("input.txt", O_RDONLY);   // Gets fd 0 (lowest available)
    exec("cat", argv);             // cat reads from fd 0
}
Why this works:
  1. Child closes stdin (fd 0)
  2. open() gets fd 0 (lowest available)
  3. exec() preserves file descriptor table
  4. cat reads from fd 0, which now refers to input.txt
Parent unaffected: Only child’s fds are modified

open() Flags

Defined in fcntl.h:
  • O_RDONLY - Read only
  • O_WRONLY - Write only
  • O_RDWR - Read and write
  • O_CREATE - Create if doesn’t exist
  • O_TRUNC - Truncate to zero length
Combining flags: Use bitwise OR
fd = open("file", O_CREATE|O_WRONLY);

fork() and File Descriptors

Behavior: fork() copies file descriptor table Shared file offset example:
if(fork() == 0) {
    write(1, "hello ", 6);
    exit(0);
} else {
    wait(0);
    write(1, "world\n", 6);
}
Result: File contains "hello world" Why: Parent and child share same underlying file offset
  • Child writes "hello " at position 0
  • Offset advances to 6
  • Parent writes "world\n" at position 6
Use case: Sequential output from shell commands
(echo hello; echo world) > output.txt

dup() - Duplicating Descriptors

Purpose: Create new fd referring to same file Behavior:
  • Returns new fd (lowest available)
  • Both fds share offset (like fork())
Example:
fd = dup(1);
write(1, "hello ", 6);
write(fd, "world\n", 6);
Result: "hello world" (sequential writes) When offsets are shared:
  • Derived from same original fd by fork()/dup()
When offsets are NOT shared:
  • Separate open() calls, even for same file

Error Redirection

Command: ls existing non-existing > tmp1 2>&1 Meaning:
  • > tmp1 - Redirect stdout (fd 1) to tmp1
  • 2>&1 - Redirect stderr (fd 2) to duplicate of fd 1
Implementation:
close(1);
open("tmp1", O_CREATE|O_WRONLY);  // Gets fd 1
close(2);
dup(1);                            // Gets fd 2, refers to tmp1
Result: Both stdout and stderr go to tmp1

Pipes

Pipe Concept

Definition: Kernel buffer with two file descriptors
  • One fd for reading
  • One fd for writing
Purpose: Inter-process communication Pipe Communication Creation:
int p[2];
pipe(p);
// p[0] = read end
// p[1] = write end

Pipe Example - Running wc

int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = 0;

pipe(p);              // Create the pipe: p[0]=read end, p[1]=write end
if(fork() == 0) {
    // Child -- will become `wc`, reading from the pipe
    close(0);          // Free fd 0 (stdin) so the next open/dup gets it
    dup(p[0]);         // Duplicate pipe read end -> gets fd 0 (stdin)
    close(p[0]);       // Close the original pipe read fd (now redundant)
    close(p[1]);       // CRITICAL: close write end so wc will see EOF
    exec("/bin/wc", argv);  // wc reads from fd 0, which is now the pipe
} else {
    // Parent -- writes data into the pipe
    close(p[0]);       // Close read end (parent won't read from pipe)
    write(p[1], "hello world\n", 12);  // Send data to child
    close(p[1]);       // Close write end -> triggers EOF on read end
}
Flow:
  1. Parent creates pipe
  2. Fork creates child with copy of pipe fds
  3. Child redirects stdin to pipe read end
  4. Child closes both pipe fds (already has copy at fd 0)
  5. Parent closes read end (won’t read)
  6. Parent writes data to pipe
  7. Parent closes write end (signals EOF)
  8. Child reads from stdin (pipe), processes, exits
[!IMPORTANT] Critical: Why close pipe fds?
  • Child must close write end before exec()
  • Otherwise wc would never see EOF (it has write end open)
  • If ANY process has write end open, read won’t return EOF

Pipe Blocking Behavior

When reading from pipe:
  • If data available: returns data
  • If no data AND write end open: blocks (waits for data)
  • If no data AND all write ends closed: returns 0 (EOF)
This is why closing is important!

Shell Pipeline Implementation

Command: grep fork sh.c | wc -l Process tree:
        shell
          |
          v
      pipe process
       /     \
      /       \
   grep      wc
Steps:
  1. Create pipe
  2. Fork for grep
    • Redirect stdout to pipe write end
    • exec grep
  3. Fork for wc
    • Redirect stdin to pipe read end
    • exec wc
  4. Wait for both children
Complex pipeline: a | b | c
         shell
           |
       process1
        /    \
       a   process2
            /    \
           b      c
Tree structure:
  • Leaves = commands
  • Interior nodes = processes managing pipes

Pipes vs Temporary Files

Command comparison:
# With pipe
echo hello world | wc

# With temp file
echo hello world > /tmp/xyz; wc < /tmp/xyz
Pipe advantages:
  1. Auto cleanup — No temp files to delete; the pipe buffer is reclaimed when both ends close
  2. Streaming — The writer does not need to finish before the reader starts. Data flows through the pipe in a FIFO stream, bounded by a kernel buffer (typically 64 KB on Linux). If the buffer is full, the writer blocks until the reader drains some data.
  3. Parallel execution — Both programs run simultaneously, overlapping CPU and I/O work. grep can start processing output from find while find is still traversing directories.
  4. Backpressure — If the consumer is slow, the pipe buffer fills up and the producer naturally pauses. This prevents unbounded memory growth without any explicit coordination.
Temp file disadvantages:
  1. Must manually clean up (and handle cleanup on errors/crashes)
  2. Need disk space for all data — a pipeline processing 10 TB would require 10 TB of temp storage
  3. Sequential execution (first program must finish before second starts)
  4. Security risk: temp files can be read by other processes unless permissions are carefully managed

File System

Structure

Hierarchical organization:
  • Tree of directories
  • Root directory: /
  • Data files: uninterpreted byte arrays
  • Directories: named references to files/directories
Path examples:
  • /a/b/c - Absolute path from root
  • a/b/c - Relative to current directory

Directory Navigation

Both open same file:
// Method 1
chdir("/a");
chdir("b");
open("c", O_RDONLY);

// Method 2
open("/a/b/c", O_RDONLY);
First method:
  • Changes process current directory to /a, then /a/b
  • Opens c relative to /a/b
Second method:
  • No directory change
  • Direct absolute path

Creating Files and Directories

mkdir("/dir");                        // Create directory
fd = open("/dir/file", O_CREATE|O_WRONLY);  // Create file
close(fd);
mknod("/console", 1, 1);             // Create device file
mknod creates device file:
  • First number = major device number
  • Second number = minor device number
  • Together uniquely identify kernel device
  • read()/write() calls diverted to device driver
Inode and Link Relationships Inode = actual file data structure Contains:
  • File type (file, directory, device)
  • File length
  • Location of content on disk
  • Number of links to file
Link = name for inode
[!IMPORTANT] Key insight: File name ≠ file itself
  • One inode can have multiple names (links)
  • Each link is directory entry: name + inode reference
Example:
open("a", O_CREATE|O_WRONLY);
link("a", "b");
Result:
  • One inode
  • Two names: "a" and "b"
  • Both refer to same content
  • Reading/writing "a" = reading/writing "b"

stat Structure

struct stat {
    int dev;        // File system's disk device
    uint ino;       // Inode number (unique)
    short type;     // T_DIR, T_FILE, or T_DEVICE
    short nlink;    // Number of links to file
    uint64 size;    // Size in bytes
};
Using fstat:
int fd = open("a", O_RDONLY);
struct stat st;
fstat(fd, &st);
printf("Inode: %d, Links: %d\n", st.ino, st.nlink);
After linking:
  • Both "a" and "b" have same ino
  • nlink = 2
Behavior:
  • Removes name from file system
  • Decrements link count
  • Inode and disk space freed ONLY when:
    • Link count = 0 AND
    • No file descriptors refer to it
Example sequence:
open("a", O_CREATE|O_WRONLY);
link("a", "b");
unlink("a");
Result:
  • Name "a" removed
  • File still accessible as "b"
  • Inode unchanged
Temporary file idiom:
fd = open("/tmp/xyz", O_CREATE|O_RDWR);
unlink("/tmp/xyz");
// File has no name now
// Will be deleted when process closes fd or exits

Built-in vs External Commands

External (user-level programs):
  • mkdir
  • ln
  • rm
  • ls
  • cat
  • Most commands
Why external:
  • Anyone can add new commands
  • No need to modify kernel/shell
  • Easier to extend system
Built-in (part of shell):
  • cd
Why cd is built-in:
// If cd was external:
if(fork() == 0) {
    // Child changes its own directory
    chdir("/new/dir");
    exit(0);
}
// Parent (shell) directory unchanged!
[!WARNING] Problem: cd must change SHELL’s directory, not child’s

System Call Flow

Understanding how system calls work is crucial for OS internals. System Call Flow

Tracing fork() from User to Kernel

Step 1: User Program Calls fork()
// user/sh.c
int pid = fork();
Step 2: User Library Wrapper (user/usys.S)
.global fork
fork:
    li a7, SYS_fork    # a7 = system call number (1)
    ecall              # Trap into kernel!
    ret                # Return with result in a0
Step 3: Trap Entry (kernel/trampoline.S)
uservec:
    # Save user registers
    csrw sscratch, a0
    sd ra, 40(a0)
    sd sp, 48(a0)
    # ... save all 32 registers ...
    
    # Load kernel page table
    ld t1, 0(a0)
    csrw satp, t1
    sfence.vma zero, zero
    
    # Jump to usertrap() in C
    jr t0
Step 4: Trap Handler (kernel/trap.c)
void usertrap(void)
{
  struct proc *p = myproc();
  p->trapframe->epc = r_sepc();

  if(r_scause() == 8){
    // System call
    p->trapframe->epc += 4;   // Advance PC past ecall
    intr_on();
    syscall();                 // Handle the system call!
  }

  usertrapret();
}
Step 5: System Call Dispatcher (kernel/syscall.c)
void syscall(void)
{
  int num;
  struct proc *p = myproc();

  num = p->trapframe->a7;      // Get syscall number
  
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    p->trapframe->a0 = syscalls[num]();  // Call handler
  } else {
    printf("%d %s: unknown sys call %d\n", p->pid, p->name, num);
    p->trapframe->a0 = -1;
  }
}
Step 6: Fork Implementation (kernel/proc.c)
int fork(void)
{
  struct proc *np;
  struct proc *p = myproc();

  // Allocate process
  if((np = allocproc()) == 0)
    return -1;

  // Copy user memory from parent to child
  if(uvmcopy(p->pagetable, np->pagetable, p->sz) < 0){
    freeproc(np);
    return -1;
  }
  np->sz = p->sz;

  // Copy saved user registers
  *(np->trapframe) = *(p->trapframe);
  np->trapframe->a0 = 0;  // Child returns 0

  // Copy open file descriptors
  for(int i = 0; i < NOFILE; i++)
    if(p->ofile[i])
      np->ofile[i] = filedup(p->ofile[i]);

  np->state = RUNNABLE;
  return np->pid;  // Parent returns child's PID
}

Real World Context

Unix Philosophy

“Software tools” culture:
  • Small programs doing one thing well
  • Combine via pipes
  • Shell as “scripting language”
Key innovations:
  • Standard file descriptors (0, 1, 2)
  • Pipes for composition
  • Simple but powerful shell syntax

POSIX Standard

Purpose: Standardize Unix system call interface xv6 vs POSIX:
  • xv6 NOT POSIX compliant
  • Missing: lseek(), many other calls
  • Different implementations of existing calls
  • Goal: Simplicity for teaching
Modern systems:
  • Many more system calls
  • Networking, windowing, threads
  • Many device drivers
  • Continuous evolution

Alternative Designs

Plan 9:
  • Extended “everything is a file” concept
  • Applied to networks, graphics
  • Most Unix systems didn’t follow
Multics:
  • Predecessor of Unix
  • Files looked like memory
  • Very different interface
  • Complexity influenced Unix designers to simplify
Other OS models:
  • Different abstractions possible
  • File descriptors not only solution
  • Unix model proven very successful

xv6 Limitations

No user protection:
  • All processes run as root
  • No isolation between users
  • Teaching-focused tradeoff
Limited features:
  • Minimal system calls
  • No networking
  • No windowing
  • Basic functionality only

Key Takeaways

Design Principles

Simple Interfaces

Few powerful mechanisms that combine for generality

Hardware Abstraction

Hide hardware details behind clean interfaces

Process Isolation

Processes isolated from each other for safety

Controlled Communication

Explicit mechanisms for safe interaction

Core Abstractions

  1. Processes - Unit of execution with isolated memory
  2. File descriptors - Unified I/O interface for files, pipes, devices
  3. Pipes - Communication channels between processes
  4. File system - Persistent storage hierarchy with inodes and links

Why These Abstractions Work

File descriptors:
  • Hide differences (files, pipes, devices)
  • Enable I/O redirection
  • Simple but powerful
fork/exec separation:
  • Enables I/O redirection
  • Shell controls child environment
  • Parent unaffected
Pipes:
  • Clean inter-process communication
  • Better than temp files
  • Enable parallel execution
Inodes/links:
  • Separate names from content
  • Multiple names for same file
  • Automatic cleanup when unused

Getting Started with xv6

Installation (Ubuntu/Debian)

# Install dependencies
sudo apt-get install git build-essential gdb-multiarch \
  qemu-system-misc gcc-riscv64-linux-gnu binutils-riscv64-linux-gnu 

# Clone xv6
git clone git://github.com/mit-pdos/xv6-riscv.git
cd xv6-riscv

# Build and run
make qemu

# You'll see xv6 boot and get a shell prompt:
$ 

First Commands to Try

$ ls              # List files
$ cat README      # Show file contents  
$ echo hello xv6  # Print text
$ grep the README # Search in file
$ wc README       # Word count
$ mkdir test      # Create directory
$ sh              # Start new shell

Caveats and Common Pitfalls

xv6 is one of the best ways to learn how an operating system actually works. It is also one of the easiest ways to develop misleading mental models of how real operating systems work. The simplifications that make xv6 readable in 9,000 lines also make it dangerously different from production code. Treat xv6 as a teaching diagram, not a blueprint.
Traps when applying xv6 lessons to real systems:
  1. xv6 is teaching code, not production code. It deliberately omits everything that makes real OS code complex: SMP-aware schedulers (xv6’s scheduler is essentially per-CPU round robin, no load balancing), modern filesystems (xv6’s FS is a simple log-structured layout, no journaling at the level of ext4 or btrfs, no extents, no compression), interrupt coalescing, NUMA awareness, security hardening, or any of the kernel hardening that consumes significant fractions of Linux. If you are studying xv6 and a Linux kernel bug at the same time, do not assume the xv6 mechanism applies.
  2. xv6 is different from real Linux in critical ways — do not transfer assumptions blindly. The system call table, virtual memory layout, scheduler, file system, and interrupt handling are all simplified. xv6 has a single page table per process; Linux has KPTI page tables (split user/kernel) for Meltdown mitigation. xv6 fork is a literal page-by-page copy; Linux fork is COW. xv6 has no preemptive in-kernel preemption; Linux has CONFIG_PREEMPT options that radically change scheduling behavior. Reading xv6 source and assuming “this is how Linux does it” is a recipe for confidently wrong answers in interviews.
  3. RISC-V vs x86 versions of xv6 diverge — pick the one your course aligns with. The original xv6 was x86; the modern MIT version is RISC-V. The two have different boot flows, different trap mechanisms, different privilege models (RISC-V has M/S/U modes, x86 has rings), and different conventions. If your course or job interview is on RISC-V (current MIT 6.S081/6.1810), study the RISC-V version. If it is on x86 (older xv6, many third-party forks), study that one. Mixing them creates confusion that hurts you in interviews.
  4. xv6’s locking is subtler than it looks. xv6 uses spinlocks and disables interrupts inside critical sections, which is correct but creates surprising properties: you cannot sleep with a lock held (would deadlock), and lock acquisition order matters (xv6 has a documented lock ordering). Real systems have RCU, lockdep, sleeping locks, and many more primitives. Do not assume “if xv6 does it this way, it is the right way” — it is the simple way that fits in a teaching kernel.
How to use xv6 effectively:
  • Use xv6 to build mechanism intuition, then read Linux source for production patterns. xv6 teaches you what a page table looks like, how a context switch works, and what fork actually does at the lowest level. Once that is internalized, read arch/x86/mm/fault.c or kernel/sched/fair.c in Linux to see what production code looks like. The contrast is the lesson.
  • Match your xv6 version to your target course or interview. MIT 6.S081 is RISC-V; many older textbooks use x86. Decide which platform matters for your goals and stick with it. If both, learn one fully before touching the other.
  • Treat xv6 modifications as the real exercise. Reading xv6 teaches some; modifying it (adding COW fork, lazy allocation, mmap, multithreading inside a process) teaches everything. The MIT 6.S081 lab assignments are public and excellent. Do them; do not just read solutions.
  • When asked about Linux in an interview, do not reflexively answer with xv6 details. “In xv6, fork copies every page” is true and irrelevant if the interviewer asked about Linux. State “in real Linux this uses copy-on-write; in the simpler xv6 teaching kernel, it is a literal copy” to show you know the difference.
  • Read the xv6 book alongside the source. The xv6 book is one of the best OS texts ever written — ~150 pages of clear prose explaining every line. Reading source without the book is much harder than reading both together.

Interview Deep-Dive

Strong Answer Framework:
  1. xv6’s scheduler is per-CPU round robin. Each CPU independently runs scheduler() in proc.c. It walks the process table looking for a RUNNABLE process and runs it. There is no priority, no fairness accounting, no virtual runtime, no load balancing across CPUs, no I/O priority boost, no sleep-time accounting. A long-running CPU-bound process gets exactly the same share as an interactive shell.
  2. Linux CFS tracks vruntime (virtual runtime). Each task has a vruntime that increments based on actual CPU time consumed, weighted by nice value. The scheduler picks the task with the lowest vruntime, stored in a red-black tree for O(log n) selection. Tasks that sleep accumulate vruntime slowly while sleeping (capped to prevent infinite catch-up), so they get a “wakeup boost” that makes them latency-friendly.
  3. What is missing in xv6. Fairness across nice values, load balancing across CPUs (xv6 actively wastes CPU time when one CPU is idle and another has multiple runnable tasks), CPU affinity hints, group scheduling (cgroups), real-time scheduling classes (SCHED_FIFO / SCHED_RR / SCHED_DEADLINE), and any I/O latency awareness. The scheduler is also not preemptive in the kernel — once a task is running in kernel mode in xv6, it runs until it voluntarily yields.
  4. What Linux gains. Linux CFS handles workloads from a 4-thread embedded device to a 256-CPU server with thousands of tasks. The complexity buys: predictable interactive latency under load, fairness guarantees that hold over time (not just instantaneously), the ability to express priority via cgroups for containers, and the ability to mix latency-sensitive and batch workloads on the same machine.
  5. What Linux trades. Complexity (CFS is several thousand lines vs xv6’s hundred), debugging difficulty (vruntime quirks like the “wakeup buddy” heuristic took years to tune), and edge cases (CFS bandwidth control’s period-boundary throttle problem in containers). Simpler is sometimes a feature — which is why ULE (FreeBSD), CFS, and EEVDF have all evolved over decades.
Real-World Example: In 2023, Linux 6.6 replaced CFS with EEVDF (Earliest Eligible Virtual Deadline First, by Peter Zijlstra). EEVDF was actually proposed in academic literature in 1995 — it took 28 years for the kernel to adopt it because CFS was “good enough” for most workloads, and the community was conservative about scheduler changes. The trigger was that CFS had accumulated too many heuristics for tail-latency tuning, and EEVDF’s deadline model unified them more cleanly. The lesson: scheduler design is hard, and even good ideas take decades to ship.
Senior follow-up 1: “What is the ‘thundering herd’ problem in scheduling, and does xv6 have it?” When many tasks block on one event and all wake at once, they all become RUNNABLE simultaneously. xv6 has it (sleep/wakeup wakes all waiters); Linux mitigates it via wait queue exclusivity flags and EPOLLEXCLUSIVE.
Senior follow-up 2: “Why does Linux have separate scheduling classes (CFS, RT, deadline, idle) instead of one unified scheduler?” Different workloads have fundamentally incompatible requirements. Real-time tasks need bounded worst-case latency; CFS provides fairness; idle tasks should never run when anything else is runnable. A unified scheduler tuned for one class compromises the others. The class hierarchy lets each class implement what it needs.
Senior follow-up 3: “What is preemption disabling in the kernel and why does it matter?” When kernel code holds a spinlock or runs in interrupt context, it cannot be preempted — the scheduler will not switch tasks. Long-held preemption-disabled sections cause latency spikes. CONFIG_PREEMPT_RT (recently merged) replaces most spinlocks with sleeping mutexes to allow preemption almost everywhere, dramatically reducing worst-case latency at some throughput cost.
Common Wrong Answers:
  1. “xv6’s scheduler is the same as Linux, just smaller.” Wrong — the algorithm is fundamentally different (round-robin vs vruntime-based). Saying “same but smaller” misses the point that fairness is an algorithmic property, not a code-size property.
  2. “Linux CFS is fair to the millisecond.” Misleading — CFS targets long-run fairness, not instantaneous fairness. Over a few milliseconds, one task can dominate due to wakeup heuristics; over seconds, weights are honored.
Further Reading:
  • “Operating Systems: Three Easy Pieces”, chapter on scheduling — best free scheduling textbook.
  • Documentation/scheduler/sched-design-CFS.rst in the Linux kernel source.
  • LWN article “An EEVDF CPU scheduler for Linux” (2023) — explains the CFS-to-EEVDF transition.
Strong Answer Framework:
  1. The disk layout. xv6 divides the disk into named regions: boot block (block 0), super block (block 1, contains FS metadata), log blocks (for crash recovery), inode blocks (each block holds several inodes), bitmap blocks (track which data blocks are free), and data blocks (the actual file contents). The super block tells you where each region starts.
  2. Inodes. Every file or directory is an inode — a fixed-size structure containing type (file/dir/device), size, and a list of block pointers. xv6 has 12 direct pointers and 1 indirect pointer (which points to a block of more pointers), giving max file size ~70KB. Real Linux ext4 uses extents and triple-indirect blocks for files in the terabyte range, but the inode concept is identical.
  3. Directory entries. A directory in xv6 is just a file whose contents are an array of struct dirent { ushort inum; char name[14]; }. Looking up a name means scanning the directory’s data blocks for a matching name and returning the inode number.
  4. Resolving /a/b.txt. Start at inode 1 (root directory, hardcoded). Read its data, scan for “a”, find inum N. Read inode N (must be a directory), read its data, scan for “b.txt”, find inum M. Read inode M — this is the file. Each inode read may require a disk I/O (cached in the inode cache after first read).
  5. The block cache and log. Every disk block goes through a buffer cache (buf.c) for performance. Writes go through a write-ahead log: changes are written to the log first, then applied to actual locations after commit. On crash, replay the log to recover. This gives crash consistency similar to ext4 with journaling.
  6. The path lookup function. In xv6, namei() and nameiparent() walk paths. Their pseudocode: split path on /, start at root (or cwd), repeat: get current inode, lock it, look up next component, follow to next inode. Locking matters because two threads renaming files concurrently could deadlock without ordered acquisition.
Real-World Example: In 2018, a CVE in ext4 (CVE-2018-1093) was found in path traversal: a crafted directory entry with specific bytes could cause the kernel to read past the end of a directory block. The fix was a few lines of bounds checking in ext4_dx_find_entry(). The lesson: path resolution is one of the most performance-critical and security-critical paths in any kernel, and the simple xv6 implementation glosses over many edge cases (Unicode normalization, race conditions, symlinks, mount points). The MIT 6.S081 fs lab walks through exactly these issues at a teaching scale.
Senior follow-up 1: “What happens if the system crashes in the middle of writing a directory entry?” Without journaling, you could see a half-written entry — corrupt filesystem. xv6’s log handles this by writing all changes to the log first, then replaying on boot. ext4 does the same with its journal. ZFS/btrfs use copy-on-write so old data is never overwritten until new data is fully durable.
Senior follow-up 2: “How does xv6 handle the inode cache and reference counting?” Each in-memory inode has a reference count. iget increments it; iput decrements and may free if zero. Locking is separate from reference counting — you can hold a reference without holding the lock. Real Linux dcache/icache works similarly but is much more complex due to RCU.
Senior follow-up 3: “What is missing from xv6’s filesystem compared to a real production FS?” No journaling at the metadata level (xv6’s log is per-syscall, not full FS journaling), no extents (extents pack large contiguous ranges into one descriptor), no soft updates or delayed allocation, no copy-on-write snapshots, no compression, no encryption, no extended attributes, no quotas. Each of these features represents a decade of FS research.
Common Wrong Answers:
  1. “xv6 uses ext2-style inodes.” Sort of — ext2 inspired the design, but xv6’s inode is much simpler (12 direct + 1 indirect, no triple indirect, no extended attributes). Saying “ext2-style” without acknowledging the simplifications shows shallow understanding.
  2. “Path resolution is just a series of lookups.” Strictly true but misses the locking, caching, and crash-recovery layers that make real path resolution work. A senior engineer should mention all three.
Further Reading:
  • The xv6 book, chapter 8 on the file system — one of the best filesystem tutorials in print.
  • “Operating Systems: Three Easy Pieces”, chapters on file systems and FFS/LFS.
  • ext4 Wikipedia article and Documentation/filesystems/ext4.rst in Linux source for comparison.
Strong Answer Framework:
  1. What COW fork does. Instead of copying every page on fork (xv6 default), mark all pages read-only in both parent and child page tables. When either writes, take a page fault, copy that page, mark it writable, restore the writer. Most pages are never written before exec(), so most pages are never copied — huge speedup for fork+exec.
  2. Files to modify. kernel/vm.c (page table walk and protection bits), kernel/proc.c (the fork() syscall implementation), kernel/trap.c (the page fault handler — need to detect write-to-RO and handle it), and kernel/kalloc.c (need a reference count per physical page so we know when to free).
  3. The reference counting problem. When parent and child share a page, we cannot free it until both are done. Add an array indexed by physical frame number, with a reference count. Increment on share, decrement on unshare. Free when count hits zero. The reference count must be lock-protected (xv6 is multi-CPU).
  4. The page-fault handler. On a write fault to a RO page that is shared (refcount > 1), allocate a new page, copy contents, decrement old page’s refcount, install new page in the faulting process’s page table as writable. If refcount is exactly 1 (we are the only user), just flip RO to RW — no copy needed. This is the optimization that makes COW efficient.
  5. Gotchas. (a) The kernel itself sometimes touches user pages (copyin, copyout) — these need COW handling too, or they will silently write to a shared page. (b) fork() of a fork()ed child means a page might have refcount 3 — the chain of COW must work correctly. (c) Stacks grow via faults — the COW path must not confuse stack growth faults with COW faults. (d) Race condition: two threads in the parent could both write to the same shared page simultaneously and both take a fault — locking is essential.
Real-World Example: Linux’s COW implementation has had several major CVEs — most famously Dirty COW (CVE-2016-5195), discovered by Phil Oester in 2016. The bug was a race between madvise(MADV_DONTNEED) and the COW page-fault handler that allowed a non-privileged user to write to read-only memory mappings. Even root files like /etc/passwd became writable. The fix was a synchronization change in mm/gup.c. Every modern kernel has been fixed, but the lesson is that COW logic is subtle even after 30 years of refinement. If you implement COW in xv6, expect your first version to have at least one race.
Senior follow-up 1: “How does the MIT 6.S081 ‘cowgc’ (COW garbage collection) lab differ from a real Linux implementation?” The lab uses a per-physical-page refcount array, similar to Linux. The real Linux uses struct page (one per physical page) with a refcount field, plus a separate mapcount for “mapped into how many user page tables.” The two-counter approach handles file-mapped pages (where many processes map the same page-cache page) correctly.
Senior follow-up 2: “Can COW pages cause OOM in unexpected ways?” Yes — if many children fork from a parent that allocates a large array, then each child writes to different parts of it, every write triggers a copy. The system can run out of memory even though logically nothing was allocated. The fix is madvise(MADV_DONTFORK) to mark large mappings as not inherited, or vfork() for the case where the child immediately execs.
Senior follow-up 3: “How does COW interact with the TLB?” When you flip a page from RW to RO, you must invalidate that TLB entry on every CPU that might have it cached. This is a TLB shootdown — expensive on multi-CPU systems. xv6’s sfence.vma handles this for RISC-V; Linux uses inter-processor interrupts for x86 TLB shootdowns. High fork rates can become TLB-shootdown bound.
Common Wrong Answers:
  1. “Just mark pages read-only and copy on fault — done.” Skips reference counting, which means you cannot tell when to free. A naive implementation leaks every COW-shared page.
  2. “COW always saves memory.” Not always — if the child writes to most pages (which is common in some long-lived child processes), you pay the fault overhead and end up copying anyway. The savings are biggest when the child immediately execs.
Further Reading:
  • MIT 6.S081 “cowgc” lab handout (free at pdos.csail.mit.edu) — step-by-step COW implementation.
  • Linux kernel source mm/memory.c, function do_wp_page() — the production COW handler.
  • “Dirty COW” CVE write-ups — excellent for understanding the subtlety of COW races.

Interview Relevance

Understanding OS interfaces helps you:
  • Design better APIs and abstractions
  • Understand performance implications
  • Make informed technology choices
  • Debug production issues
  • How does fork() work? What’s copy-on-write?
  • Explain file descriptors and I/O redirection
  • How do pipes work? When to use them?
  • What’s the difference between hard links and symbolic links?
  • How does the shell implement pipelines?
  • Explain the system call mechanism
  • Understand trade-offs in OS design
  • Explain why Unix chose these abstractions
  • Compare with alternative designs
  • Discuss performance implications
  • Debug issues at the system call level

Resources

Official Materials

Next Steps

Process Management

Deep dive into process lifecycle, scheduling, and context switching

Virtual Memory

Understand paging, page tables, and memory management

File Systems

Learn about inodes, directories, and file system implementation

Linux Internals

Explore how these concepts scale in production systems

[!TIP] Learning Path: Start by reading the xv6 book alongside the source code. Try modifying xv6 to add features or change behavior. This hands-on experience is invaluable for truly understanding operating systems.