Skip to main content

Operating System Interfaces - Complete Study Guide

Learn operating system fundamentals by understanding the xv6 teaching operating system - a simple, Unix-like OS that demonstrates core OS concepts in ~9,000 lines of readable code.

Core Concept: What is an Operating System?

An operating system has three primary jobs:

Abstract Hardware

Programs don’t need to know specific hardware details (e.g., which disk type, GPU model)

Share Resources

Multiple programs run simultaneously (or appear to) by multiplexing CPU, memory, and I/O

Enable Interaction

Programs can communicate and share data safely through controlled mechanisms

Key Design Tension

[!IMPORTANT] The Interface Dilemma
  • Simple/narrow interface = easier to implement correctly, but limited features
  • Complex interface = more features but harder to maintain and secure
  • Solution: Few powerful mechanisms that combine for generality

The xv6 Operating System

What is xv6?

xv6 is a teaching OS based on Unix design by Ken Thompson & Dennis Ritchie. It provides basic Unix interfaces and helps you understand modern operating systems like BSD, Linux, macOS, Solaris, and even Windows. Key Features:
  • Small enough to understand completely (~9,000 lines)
  • Real enough to be useful (actual working OS)
  • Provides essential Unix interfaces
  • Runs on RISC-V architecture

Architecture Overview

User Space vs Kernel Space Architecture Key Terms:
  • Process: Running program with memory (instructions, data, stack)
  • Kernel: Special program providing services to processes
  • System Call: Process requests kernel service (e.g., fork(), open(), read())
  • User Space: Normal programs run here (limited privileges)
  • Kernel Space: Kernel runs here (full hardware access)
Process alternates: User space → System call → Kernel space → Back to user space

System Calls Reference

Process Management

System CallDescriptionReturn Value
fork()Create new processChild PID to parent, 0 to child
exit(status)Terminate processNo return (0=success, 1=failure)
wait(*status)Wait for child exitChild PID or -1
kill(pid)Terminate process0 or -1
getpid()Get current process PIDPID
sleep(n)Pause for n clock ticks0
exec(file, argv[])Replace process with new programNo return on success, -1 on error
sbrk(n)Grow memory by n bytesAddress of new memory

File Operations

System CallDescriptionReturn Value
open(file, flags)Open fileFile descriptor (fd) or -1
read(fd, buf, n)Read n bytesBytes read or 0 at EOF
write(fd, buf, n)Write n bytesBytes written
close(fd)Release file descriptor0 or -1
dup(fd)Duplicate fdNew fd to same file
pipe(p[])Create pipe0 (p[0]=read, p[1]=write)

File System

System CallDescriptionReturn Value
chdir(dir)Change current directory0 or -1
mkdir(dir)Create directory0 or -1
mknod(file, major, minor)Create device file0 or -1
fstat(fd, *st)Get file info from fd0 or -1
stat(file, *st)Get file info from path0 or -1
link(file1, file2)Create new name for file0 or -1
unlink(file)Remove file name0 or -1
[!NOTE] Convention: Unless stated otherwise, system calls return 0 for success and -1 for error.

Processes and Memory

Process Structure

Each process has:
  • User-space memory: instructions + data + stack
  • Kernel-private state: CPU registers, PID, etc.
  • PID: Process identifier (unique number)

fork() - Creating Processes

The fork() system call creates an exact copy of the parent process. Behavior:
  1. Creates exact copy of parent process memory
  2. Both processes continue execution after fork()
  3. Only difference: return value
Return values:
  • Parent gets: child’s PID (positive number)
  • Child gets: 0
  • Error: -1
Fork and Exec Process Flow Example:
int pid = fork();
if(pid > 0) {
    // Parent code
    printf("parent: child=%d\n", pid);
    pid = wait((int *) 0);
    printf("child %d is done\n", pid);
} else if(pid == 0) {
    // Child code
    printf("child: exiting\n");
    exit(0);
} else {
    // Error
    printf("fork error\n");
}
Output order unpredictable:
parent: child=1234
child: exiting
OR
child: exiting
parent: child=1234
Then:
parent: child 1234 is done
[!IMPORTANT] Critical Points:
  • Parent and child have SEPARATE memory
  • Changing variable in one doesn’t affect other
  • Both start with identical memory contents
  • Each has separate registers

wait() - Synchronizing Processes

Purpose: Parent waits for child to finish Behavior:
  • Returns PID of exited child
  • Copies child’s exit status to provided address
  • If no children exited yet, blocks until one does
  • If no children exist, returns -1 immediately
Usage:
int status;
int pid = wait(&status);  // Get exit status
// OR
int pid = wait(0);        // Don't care about status

exec() - Running Programs

Purpose: Replace current process with new program
[!WARNING] Critical characteristic: Does NOT create new process, replaces current one!
Arguments:
  1. Path to executable file
  2. Array of string arguments (NULL-terminated)
File format: ELF (Executable and Linkable Format) Behavior:
  • Loads new program from file
  • Replaces process memory completely
  • Preserves: PID, file descriptors, open files
  • If successful: NEVER returns (new program runs)
  • If error: returns to caller
Example:
char *argv[3];
argv[0] = "echo";        // Program name (convention)
argv[1] = "hello";       // Argument
argv[2] = 0;             // NULL terminator
exec("/bin/echo", argv);
printf("exec error\n");  // Only prints if exec fails
[!NOTE] argv[0] convention: First argument is program name (mostly ignored by program)

How Shell Uses These Calls

Shell main loop:
// Simplified shell structure
while(getcmd(buf, sizeof(buf)) >= 0) {
    if(fork() == 0) {
        // Child: run command
        runcmd(parsecmd(buf));
    }
    // Parent: wait for child
    wait(0);
}
For command echo hello:
  1. Shell forks
  2. Parent waits
  3. Child calls exec("/bin/echo", ["echo", "hello", 0])
  4. echo runs and calls exit()
  5. Parent’s wait() returns
  6. Shell ready for next command
Why fork + exec are separate:
  • Allows I/O redirection between fork() and exec()
  • Shell can modify child’s file descriptors before exec()
  • Parent’s I/O remains unchanged
Alternative (worse) designs:
  • Combined forkexec() call - awkward for I/O redirection
  • Shell modifies own I/O, then undoes - error-prone
  • Every program handles own redirection - duplicated work

Memory Management

Implicit allocation:
  • fork() - Allocates memory for child’s copy
  • exec() - Allocates memory for new executable
Explicit allocation:
  • sbrk(n) - Grow data memory by n bytes
  • Returns location of new memory
  • Used by malloc() implementation
Optimization:
  • fork() uses copy-on-write (COW)
  • Doesn’t actually copy memory until modified
  • Avoids waste when exec() immediately follows fork()

I/O and File Descriptors

File Descriptor Concept

Definition: Small integer representing kernel-managed I/O object Can refer to:
  • Regular files
  • Directories
  • Devices (keyboard, screen, etc.)
  • Pipes
Key abstraction: Everything looks like stream of bytes How to obtain:
  • Open file/directory/device
  • Create pipe
  • Duplicate existing descriptor

File Descriptor Table

File Descriptor Table Per-process table:
  • Each process has private fd space
  • FDs start at 0
  • Kernel uses fd as index into table
Standard convention:
  • FD 0 = standard input (stdin)
  • FD 1 = standard output (stdout)
  • FD 2 = standard error (stderr)
Shell ensures: Always has FDs 0, 1, 2 open (default to console)

read() and write()

read(fd, buf, n):
  • Reads UP TO n bytes from fd
  • Copies into buf
  • Returns number of bytes actually read
  • Returns 0 at end of file
  • Each fd has file offset that advances automatically
write(fd, buf, n):
  • Writes n bytes from buf to fd
  • Returns number of bytes written
  • Less than n only on error
  • Offset advances automatically
Sequential access:
// First read gets bytes 0-99
read(fd, buf, 100);
// Next read gets bytes 100-199
read(fd, buf, 100);

cat Example - The Power of Abstraction

char buf[512];
int n;
for(;;) {
    n = read(0, buf, sizeof buf);  // Read from stdin
    if(n == 0)
        break;                      // EOF
    if(n < 0) {
        fprintf(2, "read error\n"); // Error to stderr
        exit(1);
    }
    if(write(1, buf, n) != n) {     // Write to stdout
        fprintf(2, "write error\n");
        exit(1);
    }
}
Why this is powerful:
  • cat doesn’t know if reading from file, console, or pipe
  • cat doesn’t know if writing to file, console, or pipe
  • Same code works for all cases
  • Shell controls actual I/O sources/destinations

File Descriptor Allocation

close(fd):
  • Releases file descriptor
  • Makes fd available for reuse
New fd allocation:
  • Always uses LOWEST unused number
  • This is critical for I/O redirection

I/O Redirection Mechanism

Example: cat < input.txt
char *argv[2];
argv[0] = "cat";
argv[1] = 0;

if(fork() == 0) {
    close(0);                      // Free fd 0
    open("input.txt", O_RDONLY);   // Gets fd 0 (lowest available)
    exec("cat", argv);             // cat reads from fd 0
}
Why this works:
  1. Child closes stdin (fd 0)
  2. open() gets fd 0 (lowest available)
  3. exec() preserves file descriptor table
  4. cat reads from fd 0, which now refers to input.txt
Parent unaffected: Only child’s fds are modified

open() Flags

Defined in fcntl.h:
  • O_RDONLY - Read only
  • O_WRONLY - Write only
  • O_RDWR - Read and write
  • O_CREATE - Create if doesn’t exist
  • O_TRUNC - Truncate to zero length
Combining flags: Use bitwise OR
fd = open("file", O_CREATE|O_WRONLY);

fork() and File Descriptors

Behavior: fork() copies file descriptor table Shared file offset example:
if(fork() == 0) {
    write(1, "hello ", 6);
    exit(0);
} else {
    wait(0);
    write(1, "world\n", 6);
}
Result: File contains "hello world" Why: Parent and child share same underlying file offset
  • Child writes "hello " at position 0
  • Offset advances to 6
  • Parent writes "world\n" at position 6
Use case: Sequential output from shell commands
(echo hello; echo world) > output.txt

dup() - Duplicating Descriptors

Purpose: Create new fd referring to same file Behavior:
  • Returns new fd (lowest available)
  • Both fds share offset (like fork())
Example:
fd = dup(1);
write(1, "hello ", 6);
write(fd, "world\n", 6);
Result: "hello world" (sequential writes) When offsets are shared:
  • Derived from same original fd by fork()/dup()
When offsets are NOT shared:
  • Separate open() calls, even for same file

Error Redirection

Command: ls existing non-existing > tmp1 2>&1 Meaning:
  • > tmp1 - Redirect stdout (fd 1) to tmp1
  • 2>&1 - Redirect stderr (fd 2) to duplicate of fd 1
Implementation:
close(1);
open("tmp1", O_CREATE|O_WRONLY);  // Gets fd 1
close(2);
dup(1);                            // Gets fd 2, refers to tmp1
Result: Both stdout and stderr go to tmp1

Pipes

Pipe Concept

Definition: Kernel buffer with two file descriptors
  • One fd for reading
  • One fd for writing
Purpose: Inter-process communication Pipe Communication Creation:
int p[2];
pipe(p);
// p[0] = read end
// p[1] = write end

Pipe Example - Running wc

int p[2];
char *argv[2];
argv[0] = "wc";
argv[1] = 0;

pipe(p);
if(fork() == 0) {
    // Child
    close(0);          // Close stdin
    dup(p[0]);         // stdin now reads from pipe
    close(p[0]);       // Close original pipe read fd
    close(p[1]);       // Close pipe write fd
    exec("/bin/wc", argv);
} else {
    // Parent
    close(p[0]);       // Close pipe read end
    write(p[1], "hello world\n", 12);
    close(p[1]);       // Signal EOF to child
}
Flow:
  1. Parent creates pipe
  2. Fork creates child with copy of pipe fds
  3. Child redirects stdin to pipe read end
  4. Child closes both pipe fds (already has copy at fd 0)
  5. Parent closes read end (won’t read)
  6. Parent writes data to pipe
  7. Parent closes write end (signals EOF)
  8. Child reads from stdin (pipe), processes, exits
[!IMPORTANT] Critical: Why close pipe fds?
  • Child must close write end before exec()
  • Otherwise wc would never see EOF (it has write end open)
  • If ANY process has write end open, read won’t return EOF

Pipe Blocking Behavior

When reading from pipe:
  • If data available: returns data
  • If no data AND write end open: blocks (waits for data)
  • If no data AND all write ends closed: returns 0 (EOF)
This is why closing is important!

Shell Pipeline Implementation

Command: grep fork sh.c | wc -l Process tree:
        shell
          |
          v
      pipe process
       /     \
      /       \
   grep      wc
Steps:
  1. Create pipe
  2. Fork for grep
    • Redirect stdout to pipe write end
    • exec grep
  3. Fork for wc
    • Redirect stdin to pipe read end
    • exec wc
  4. Wait for both children
Complex pipeline: a | b | c
         shell
           |
       process1
        /    \
       a   process2
            /    \
           b      c
Tree structure:
  • Leaves = commands
  • Interior nodes = processes managing pipes

Pipes vs Temporary Files

Command comparison:
# With pipe
echo hello world | wc

# With temp file
echo hello world > /tmp/xyz; wc < /tmp/xyz
Pipe advantages:
  1. Auto cleanup - No temp files to delete
  2. Unlimited size - Not limited by disk space
  3. Parallel execution - Both programs run simultaneously
Temp file disadvantages:
  1. Must manually clean up
  2. Need disk space for all data
  3. Sequential execution (first program must finish)

File System

Structure

Hierarchical organization:
  • Tree of directories
  • Root directory: /
  • Data files: uninterpreted byte arrays
  • Directories: named references to files/directories
Path examples:
  • /a/b/c - Absolute path from root
  • a/b/c - Relative to current directory

Directory Navigation

Both open same file:
// Method 1
chdir("/a");
chdir("b");
open("c", O_RDONLY);

// Method 2
open("/a/b/c", O_RDONLY);
First method:
  • Changes process current directory to /a, then /a/b
  • Opens c relative to /a/b
Second method:
  • No directory change
  • Direct absolute path

Creating Files and Directories

mkdir("/dir");                        // Create directory
fd = open("/dir/file", O_CREATE|O_WRONLY);  // Create file
close(fd);
mknod("/console", 1, 1);             // Create device file
mknod creates device file:
  • First number = major device number
  • Second number = minor device number
  • Together uniquely identify kernel device
  • read()/write() calls diverted to device driver
Inode and Link Relationships Inode = actual file data structure Contains:
  • File type (file, directory, device)
  • File length
  • Location of content on disk
  • Number of links to file
Link = name for inode
[!IMPORTANT] Key insight: File name ≠ file itself
  • One inode can have multiple names (links)
  • Each link is directory entry: name + inode reference
Example:
open("a", O_CREATE|O_WRONLY);
link("a", "b");
Result:
  • One inode
  • Two names: "a" and "b"
  • Both refer to same content
  • Reading/writing "a" = reading/writing "b"

stat Structure

struct stat {
    int dev;        // File system's disk device
    uint ino;       // Inode number (unique)
    short type;     // T_DIR, T_FILE, or T_DEVICE
    short nlink;    // Number of links to file
    uint64 size;    // Size in bytes
};
Using fstat:
int fd = open("a", O_RDONLY);
struct stat st;
fstat(fd, &st);
printf("Inode: %d, Links: %d\n", st.ino, st.nlink);
After linking:
  • Both "a" and "b" have same ino
  • nlink = 2
Behavior:
  • Removes name from file system
  • Decrements link count
  • Inode and disk space freed ONLY when:
    • Link count = 0 AND
    • No file descriptors refer to it
Example sequence:
open("a", O_CREATE|O_WRONLY);
link("a", "b");
unlink("a");
Result:
  • Name "a" removed
  • File still accessible as "b"
  • Inode unchanged
Temporary file idiom:
fd = open("/tmp/xyz", O_CREATE|O_RDWR);
unlink("/tmp/xyz");
// File has no name now
// Will be deleted when process closes fd or exits

Built-in vs External Commands

External (user-level programs):
  • mkdir
  • ln
  • rm
  • ls
  • cat
  • Most commands
Why external:
  • Anyone can add new commands
  • No need to modify kernel/shell
  • Easier to extend system
Built-in (part of shell):
  • cd
Why cd is built-in:
// If cd was external:
if(fork() == 0) {
    // Child changes its own directory
    chdir("/new/dir");
    exit(0);
}
// Parent (shell) directory unchanged!
[!WARNING] Problem: cd must change SHELL’s directory, not child’s

System Call Flow

Understanding how system calls work is crucial for OS internals. System Call Flow

Tracing fork() from User to Kernel

Step 1: User Program Calls fork()
// user/sh.c
int pid = fork();
Step 2: User Library Wrapper (user/usys.S)
.global fork
fork:
    li a7, SYS_fork    # a7 = system call number (1)
    ecall              # Trap into kernel!
    ret                # Return with result in a0
Step 3: Trap Entry (kernel/trampoline.S)
uservec:
    # Save user registers
    csrw sscratch, a0
    sd ra, 40(a0)
    sd sp, 48(a0)
    # ... save all 32 registers ...
    
    # Load kernel page table
    ld t1, 0(a0)
    csrw satp, t1
    sfence.vma zero, zero
    
    # Jump to usertrap() in C
    jr t0
Step 4: Trap Handler (kernel/trap.c)
void usertrap(void)
{
  struct proc *p = myproc();
  p->trapframe->epc = r_sepc();

  if(r_scause() == 8){
    // System call
    p->trapframe->epc += 4;   // Advance PC past ecall
    intr_on();
    syscall();                 // Handle the system call!
  }

  usertrapret();
}
Step 5: System Call Dispatcher (kernel/syscall.c)
void syscall(void)
{
  int num;
  struct proc *p = myproc();

  num = p->trapframe->a7;      // Get syscall number
  
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    p->trapframe->a0 = syscalls[num]();  // Call handler
  } else {
    printf("%d %s: unknown sys call %d\n", p->pid, p->name, num);
    p->trapframe->a0 = -1;
  }
}
Step 6: Fork Implementation (kernel/proc.c)
int fork(void)
{
  struct proc *np;
  struct proc *p = myproc();

  // Allocate process
  if((np = allocproc()) == 0)
    return -1;

  // Copy user memory from parent to child
  if(uvmcopy(p->pagetable, np->pagetable, p->sz) < 0){
    freeproc(np);
    return -1;
  }
  np->sz = p->sz;

  // Copy saved user registers
  *(np->trapframe) = *(p->trapframe);
  np->trapframe->a0 = 0;  // Child returns 0

  // Copy open file descriptors
  for(int i = 0; i < NOFILE; i++)
    if(p->ofile[i])
      np->ofile[i] = filedup(p->ofile[i]);

  np->state = RUNNABLE;
  return np->pid;  // Parent returns child's PID
}

Real World Context

Unix Philosophy

“Software tools” culture:
  • Small programs doing one thing well
  • Combine via pipes
  • Shell as “scripting language”
Key innovations:
  • Standard file descriptors (0, 1, 2)
  • Pipes for composition
  • Simple but powerful shell syntax

POSIX Standard

Purpose: Standardize Unix system call interface xv6 vs POSIX:
  • xv6 NOT POSIX compliant
  • Missing: lseek(), many other calls
  • Different implementations of existing calls
  • Goal: Simplicity for teaching
Modern systems:
  • Many more system calls
  • Networking, windowing, threads
  • Many device drivers
  • Continuous evolution

Alternative Designs

Plan 9:
  • Extended “everything is a file” concept
  • Applied to networks, graphics
  • Most Unix systems didn’t follow
Multics:
  • Predecessor of Unix
  • Files looked like memory
  • Very different interface
  • Complexity influenced Unix designers to simplify
Other OS models:
  • Different abstractions possible
  • File descriptors not only solution
  • Unix model proven very successful

xv6 Limitations

No user protection:
  • All processes run as root
  • No isolation between users
  • Teaching-focused tradeoff
Limited features:
  • Minimal system calls
  • No networking
  • No windowing
  • Basic functionality only

Key Takeaways

Design Principles

Simple Interfaces

Few powerful mechanisms that combine for generality

Hardware Abstraction

Hide hardware details behind clean interfaces

Process Isolation

Processes isolated from each other for safety

Controlled Communication

Explicit mechanisms for safe interaction

Core Abstractions

  1. Processes - Unit of execution with isolated memory
  2. File descriptors - Unified I/O interface for files, pipes, devices
  3. Pipes - Communication channels between processes
  4. File system - Persistent storage hierarchy with inodes and links

Why These Abstractions Work

File descriptors:
  • Hide differences (files, pipes, devices)
  • Enable I/O redirection
  • Simple but powerful
fork/exec separation:
  • Enables I/O redirection
  • Shell controls child environment
  • Parent unaffected
Pipes:
  • Clean inter-process communication
  • Better than temp files
  • Enable parallel execution
Inodes/links:
  • Separate names from content
  • Multiple names for same file
  • Automatic cleanup when unused

Getting Started with xv6

Installation (Ubuntu/Debian)

# Install dependencies
sudo apt-get install git build-essential gdb-multiarch \
  qemu-system-misc gcc-riscv64-linux-gnu binutils-riscv64-linux-gnu 

# Clone xv6
git clone git://github.com/mit-pdos/xv6-riscv.git
cd xv6-riscv

# Build and run
make qemu

# You'll see xv6 boot and get a shell prompt:
$ 

First Commands to Try

$ ls              # List files
$ cat README      # Show file contents  
$ echo hello xv6  # Print text
$ grep the README # Search in file
$ wc README       # Word count
$ mkdir test      # Create directory
$ sh              # Start new shell

Interview Relevance

Understanding OS interfaces helps you:
  • Design better APIs and abstractions
  • Understand performance implications
  • Make informed technology choices
  • Debug production issues
  • How does fork() work? What’s copy-on-write?
  • Explain file descriptors and I/O redirection
  • How do pipes work? When to use them?
  • What’s the difference between hard links and symbolic links?
  • How does the shell implement pipelines?
  • Explain the system call mechanism
  • Understand trade-offs in OS design
  • Explain why Unix chose these abstractions
  • Compare with alternative designs
  • Discuss performance implications
  • Debug issues at the system call level

Resources

Official Materials

Next Steps


[!TIP] Learning Path: Start by reading the xv6 book alongside the source code. Try modifying xv6 to add features or change behavior. This hands-on experience is invaluable for truly understanding operating systems.