# System Calls & POSIX

> Understanding the interface between user space and kernel

System calls are the gateway between your program and the operating system kernel. Think of them as the reception desk at a secure government building: your program (a visitor) cannot walk into the vault and grab files directly. Instead, you fill out a request form (set up registers), ring a bell (execute the `syscall` instruction), and a trusted employee (the kernel) fetches what you need and hands it back through the window.

Every `printf`, every file open, every network packet your C program sends eventually passes through this gateway. Understanding system calls is the difference between knowing C syntax and understanding how programs actually interact with hardware.

<img src="https://mintcdn.com/devweeekends/CHfRzoAmD5TGW2ch/images/courses/system-calls.svg?fit=max&auto=format&n=CHfRzoAmD5TGW2ch&q=85&s=e167de501b729fa0077c9be636bc59af" alt="System call path from user space to kernel" width="1080" height="1080" data-path="images/courses/system-calls.svg" />

***

## User Space vs Kernel Space

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           USER SPACE                                         │
│                                                                              │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐                                      │
│  │  Your   │  │  libc   │  │ Other   │                                      │
│  │ Program │  │ (glibc) │  │Libraries│                                      │
│  └────┬────┘  └────┬────┘  └─────────┘                                      │
│       │           │                                                          │
│       │   printf()│ → write() wrapper                                       │
│       └───────────┼──────────────────────────────────────────────────────┐  │
│                   │                                                      │  │
├───────────────────┴──────────────────────────────────────────────────────┼──┤
│                          SYSTEM CALL INTERFACE                            │  │
│                                                                          │  │
│         syscall(SYS_write, fd, buf, count)                              │  │
│                              │                                           │  │
├──────────────────────────────┼───────────────────────────────────────────┴──┤
│                              ▼                                              │
│                         KERNEL SPACE                                        │
│                                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                      │
│  │   Scheduler  │  │  Filesystem  │  │   Memory     │                      │
│  │              │  │   (VFS)      │  │  Management  │                      │
│  └──────────────┘  └──────────────┘  └──────────────┘                      │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │                        Hardware Drivers                               │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

<img src="https://mintcdn.com/devweeekends/AEOaWh79Ur7CdHHv/images/courses/c-user-kernel-transition.svg?fit=max&auto=format&n=AEOaWh79Ur7CdHHv&q=85&s=0d3d3b83114d8de37476384eb68b1667" alt="User Mode vs Kernel Mode Transition" width="1080" height="1080" data-path="images/courses/c-user-kernel-transition.svg" />

### The Transition Steps

1. **User Mode (Ring 3)**: Your program runs with limited privileges. It cannot access hardware directly.
2. **Library Call**: You call `printf()`. The C library (libc) formats the string into its output buffer and, when that buffer is flushed, calls `write()`.
3. **System Call**: The `write()` wrapper puts arguments in CPU registers (e.g., `rax`=1 for write) and executes a special instruction (`syscall` on x86-64).
4. **Mode Switch**: The CPU switches to **Kernel Mode (Ring 0)** and jumps to a predefined kernel entry point.
5. **Kernel Execution**: The kernel validates arguments, checks permissions, and performs the operation (e.g., writing to the terminal buffer).
6. **Return**: The kernel switches the CPU back to User Mode (typically with `sysret` on x86-64) and returns the result in `rax`: the number of bytes written, or a negative error code that the libc wrapper converts to `-1` plus `errno`.
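
To make steps 3 to 6 concrete, here is a minimal sketch that reaches the same kernel service without the usual wrapper: it asks for the process ID through libc's generic `syscall()` function instead of `getpid()`. The helper name `raw_getpid` is hypothetical, and `SYS_getpid` assumes Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        // Exposes syscall() in <unistd.h> on glibc
#endif
#include <sys/syscall.h>   // SYS_getpid
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: ask the kernel for our PID without the getpid() wrapper.
// syscall() loads the call number into the right register, executes the trap
// instruction, and hands back whatever the kernel returned.
pid_t raw_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}
```

Calling `raw_getpid()` and `getpid()` in the same process should give the same number, since both end up at the same kernel entry point.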

***

## Making System Calls

### Via libc Wrappers

In practice, you almost never call system calls directly. Instead, you call libc wrapper functions that set up the arguments, execute the `syscall` instruction, check for errors, and set `errno`. This is like having an assistant who fills out the government forms for you.

```c theme={null}
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    // These are libc wrapper functions, not raw syscalls.
    // open() asks the kernel (through the open or openat system call) to check
    // permissions, allocate a file descriptor, and return it as an integer handle.
    int fd = open("file.txt", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    
    char buffer[1024];
    // read() asks the kernel to copy bytes from the file into your buffer.
    // The kernel does the actual disk I/O (or returns cached data from the page cache).
    ssize_t bytes = read(fd, buffer, sizeof(buffer));
    if (bytes == -1) {
        perror("read");
        close(fd);
        return 1;
    }
    
    // write() to STDOUT_FILENO (fd 1) asks the kernel to send bytes to the terminal.
    if (write(STDOUT_FILENO, buffer, (size_t)bytes) == -1) {
        perror("write");
    }
    
    // close() tells the kernel we are done with this file descriptor.
    // The kernel frees the fd slot. It does not guarantee that written data
    // has reached the disk; call fsync() first if you need that.
    close(fd);
    return 0;
}
```
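
As a rough illustration of the "check for errors and set `errno`" step, the sketch below mimics what a wrapper does with the kernel's raw return value. On Linux the kernel signals failure by returning a negative error number, and values from -4095 to -1 are reserved for that purpose. `wrap_result` is a hypothetical name used here for illustration, not a glibc function.

```c
#include <errno.h>

// Hypothetical helper: convert a raw Linux kernel return value into the libc
// convention (-1 with errno set on failure, the value itself on success).
long wrap_result(long raw) {
    if (raw < 0 && raw >= -4095) {  // Linux reserves -4095..-1 for error codes
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

This is why application code tests for `-1` and then reads `errno`, rather than inspecting negative return values directly.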

### Direct System Calls

```c theme={null}
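#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void) {
    // Direct syscall through libc's generic syscall() function
    // (skips the write() wrapper, but still sets errno on failure)
    const char *msg = "Hello, kernel!\n";
    size_t len = strlen(msg);
    
    // SYS_write = 1 on x86-64 Linux
    syscall(SYS_write, STDOUT_FILENO, msg, len);
    
    // Inline assembly (x86-64 Linux only; bypasses libc entirely)
    // Register convention: rax=syscall#, rdi=arg1, rsi=arg2, rdx=arg3
    // The syscall instruction clobbers rcx and r11. On failure, ret holds a
    // negative error code; errno is not touched.
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a" (ret)
        : "a" ((long)SYS_write), "D" ((long)STDOUT_FILENO), "S" (msg), "d" (len)
        : "rcx", "r11", "memory"
    );
    (void)ret;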
    
    return 0;
}
```

***

## Error Handling

```c theme={null}
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    int fd = open("/nonexistent", O_RDONLY);
    
    if (fd == -1) {
        // errno is set by the failing call. Save it right away: later library
        // calls (even fprintf) are allowed to overwrite it.
        int err = errno;
        fprintf(stderr, "Error code: %d\n", err);
        fprintf(stderr, "Error message: %s\n", strerror(err));
        errno = err;
        perror("open");  // Prints: "open: No such file or directory"
        
        // Common errno values
        switch (err) {
            case ENOENT: printf("File not found\n"); break;
            case EACCES: printf("Permission denied\n"); break;
            case EEXIST: printf("File exists\n"); break;
            case EINTR:  printf("Interrupted\n"); break;
            case EINVAL: printf("Invalid argument\n"); break;
            case ENOMEM: printf("Out of memory\n"); break;
            case ENOSPC: printf("No space left\n"); break;
        }
        
        return 1;
    }
    
    close(fd);
    return 0;
}

// Robust error-handling wrapper
// Why EINTR handling matters: if a signal handler runs while your program is
// blocked in a system call, the call can fail with -1 and errno=EINTR (unless
// the handler was installed with SA_RESTART and the call is restartable).
// This is not a real error -- it just means "try again." Without this retry loop,
// your program would falsely report errors whenever it receives a signal (which
// happens more often than you think: SIGCHLD when a child process exits,
// SIGALRM from timers, etc.).
int safe_open(const char *path, int flags) {
    int fd;
    
    do {
        fd = open(path, flags);
    } while (fd == -1 && errno == EINTR);  // Retry on signal interruption
    
    return fd;
}

// Robust read (handles partial reads and interrupts)
// A common beginner mistake: assuming read() always returns the full count.
// In reality, read() can return fewer bytes than requested for many reasons:
// - Signal interrupted the call
// - Reading from a pipe or socket (data arrives in chunks)
// - Reading near end of file
// - Kernel decided to return a partial buffer
// Production code MUST loop until it gets all requested bytes or hits EOF/error.
ssize_t safe_read(int fd, void *buf, size_t count) {
    ssize_t total = 0;
    char *ptr = buf;
    
    while (count > 0) {
        ssize_t n = read(fd, ptr, count);
        
        if (n == -1) {
            if (errno == EINTR) continue;  // Signal interrupted us, just retry
            return -1;  // Actual error (EBADF, EIO, etc.)
        }
        
        if (n == 0) break;  // EOF -- no more data available
        
        total += n;
        ptr += n;
        count -= n;
    }
    
    return total;
}
```
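
The same reasoning applies in the other direction: `write()` may accept fewer bytes than requested, especially on pipes and sockets. Below is a minimal sketch of a write-side companion to `safe_read`; the name `write_all` and its 0/-1 return convention are choices made for this example, not part of POSIX.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical companion to safe_read(): keep calling write() until every
// byte has been accepted, retrying when a signal interrupts the call.
// Returns 0 on success, -1 on error (errno is left as set by write()).
int write_all(int fd, const void *buf, size_t count) {
    const char *ptr = buf;

    while (count > 0) {
        ssize_t n = write(fd, ptr, count);

        if (n == -1) {
            if (errno == EINTR) continue;  // Interrupted before any byte was written
            return -1;                     // Real error (EBADF, EPIPE, ENOSPC, ...)
        }

        ptr += n;                // Skip past the bytes the kernel accepted
        count -= (size_t)n;
    }

    return 0;
}
```

A call such as `write_all(STDOUT_FILENO, msg, strlen(msg))` either delivers the whole message or reports the failure, so callers never have to handle a short write themselves.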

***

## File Descriptors

File descriptors are the kernel's universal handle for "anything you can read from or write to." Files, pipes, sockets, terminals, even special devices like `/dev/null` -- they all look the same to your program: just an integer you pass to `read()` and `write()`. This uniform interface is one of Unix's most powerful design decisions, and it is why shell pipes like `cat file.txt | grep pattern | wc -l` just work.
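
A small sketch of what that uniformity buys you: the function below counts bytes until end of input and never asks what kind of object the descriptor refers to. `count_bytes` is a hypothetical helper written for this page.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: count the bytes readable from any descriptor until EOF.
// Nothing here depends on whether fd is a file, a pipe, a socket, or a terminal.
// Returns the byte count, or -1 on error.
long long count_bytes(int fd) {
    char buf[4096];
    long long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == -1) {
            if (errno == EINTR) continue;  // Interrupted by a signal, retry
            return -1;
        }
        if (n == 0) return total;          // EOF
        total += n;
    }
}
```

Calling `count_bytes(STDIN_FILENO)` behaves much like `wc -c`, whether standard input is a terminal, a redirected file, or the read end of a shell pipe.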

```c theme={null}
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(void) {
    // Standard file descriptors -- every process starts with these three open:
    // 0 = stdin  (where input comes from, usually the keyboard)
    // 1 = stdout (where output goes, usually the terminal)
    // 2 = stderr (where error messages go, also the terminal but separate)
    
    // Open returns the lowest available fd number.
    // Since 0, 1, 2 are already taken, the first open() usually returns 3.
    int fd = open("file.txt", O_RDWR | O_CREAT, 0644);
    if (fd == -1) {
        perror("open");
        return 1;
    }
    printf("Opened fd: %d\n", fd);  // Usually 3
    
    // Duplicate file descriptor
    int fd2 = dup(fd);  // fd2 refers to the same open file (shared offset and status flags)
    
    // Duplicate to specific number
    int fd3 = dup2(fd, 10);  // fd 10 now refers to the same open file
    
    // Get/set file status flags
    // (descriptor flags such as FD_CLOEXEC use F_GETFD/F_SETFD instead)
    int flags = fcntl(fd, F_GETFL);
    fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    
    // Get file info
    struct stat st;
    fstat(fd, &st);
    printf("Size: %lld bytes\n", (long long)st.st_size);
    printf("Mode: %o\n", (unsigned)st.st_mode);
    printf("Is regular file: %d\n", S_ISREG(st.st_mode));
    printf("Is directory: %d\n", S_ISDIR(st.st_mode));
    
    close(fd);
    close(fd2);
    close(fd3);
    
    return 0;
}
```
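
One consequence of `dup()` worth seeing in code: the duplicate is not an independent handle, because both descriptors share a single file offset. The sketch below checks this on a scratch file. The function name `dup_shares_offset` and the `/tmp` location are assumptions made for the example.

```c
#include <stdlib.h>      // mkstemp
#include <sys/types.h>
#include <unistd.h>

// Hypothetical check: returns 1 if moving the offset through a duplicate is
// visible through the original descriptor, 0 if not, -1 on error.
int dup_shares_offset(void) {
    char path[] = "/tmp/dup-demo-XXXXXX";   // Assumes /tmp is writable
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);                           // File disappears once all descriptors close

    int copy = dup(fd);
    if (copy == -1) {
        close(fd);
        return -1;
    }

    off_t moved = lseek(copy, 7, SEEK_SET); // Move the offset through the duplicate
    off_t seen  = lseek(fd, 0, SEEK_CUR);   // Read it back through the original

    close(copy);
    close(fd);

    if (moved == (off_t)-1 || seen == (off_t)-1) return -1;
    return seen == moved;
}
```

This shared offset is what lets a shell redirect `stdout` and `stderr` to the same file without the two streams overwriting each other.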

***

## Process Information

```c theme={null}
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>

int main(void) {
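    // pid_t, uid_t, and gid_t have no printf specifier of their own,
    // so cast them to a known type before printing.
    printf("PID:  %ld\n", (long)getpid());
    printf("PPID: %ld\n", (long)getppid());
    printf("UID:  %lu\n", (unsigned long)getuid());
    printf("EUID: %lu\n", (unsigned long)geteuid());
    printf("GID:  %lu\n", (unsigned long)getgid());
    
    // Get username. getpwuid() returns NULL when the UID has no passwd entry
    // (common in containers), so check before dereferencing.
    struct passwd *pw = getpwuid(getuid());
    if (pw) {
        printf("User: %s\n", pw->pw_name);
        printf("Home: %s\n", pw->pw_dir);
    }
    
    // Current working directory
    char cwd[1024];
    if (getcwd(cwd, sizeof(cwd))) {
        printf("CWD:  %s\n", cwd);
    }
    
    // Hostname
    char hostname[256];
    if (gethostname(hostname, sizeof(hostname)) == 0) {
        hostname[sizeof(hostname) - 1] = '\0';  // A truncated name may lack a terminator
        printf("Host: %s\n", hostname);
    }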
    
    return 0;
}
```

***

## Environment Variables

```c theme={null}
#include <stdio.h>
#include <stdlib.h>

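extern char **environ;  // Global environment

// The third parameter, envp, is a common Unix extension. It is not part of
// standard C or POSIX, so portable code should use environ instead.
int main(int argc, char *argv[], char *envp[]) {
    (void)argc; (void)argv; (void)envp;
    
    // Get environment variable
    const char *path = getenv("PATH");
    if (path) {
        printf("PATH: %s\n", path);
    }
    
    // Set environment variable
    setenv("MY_VAR", "my_value", 1);  // 1 = overwrite; the strings are copied
    
    // Alternative: putenv() does not copy. The string itself becomes part of
    // the environment, so it must stay valid for the rest of the program.
    static char another[] = "ANOTHER_VAR=another_value";
    putenv(another);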
    
    // Remove environment variable
    unsetenv("MY_VAR");
    
    // Iterate all environment variables
    printf("\n=== All Environment Variables ===\n");
    for (char **env = environ; *env; env++) {
        printf("%s\n", *env);
    }
    
    // Or using the envp parameter
    // for (int i = 0; envp[i]; i++) { ... }
    
    return 0;
}
```
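
Because `getenv()` returns `NULL` for an unset variable, configuration lookups usually need a default. Here is a minimal sketch of that pattern; `getenv_or` is a hypothetical helper, and treating an empty value the same as an unset one is a design choice of this example.

```c
#include <stdlib.h>

// Hypothetical helper: return the value of an environment variable, or a
// caller-supplied fallback when the variable is unset or empty.
const char *getenv_or(const char *name, const char *fallback) {
    const char *value = getenv(name);
    return (value && *value) ? value : fallback;
}
```

For example, `getenv_or("EDITOR", "vi")` yields the user's preferred editor when one is configured and `"vi"` otherwise.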

***

## Time and Date

```c theme={null}
#include <stdio.h>
#include <time.h>
#include <unistd.h>    // sleep, usleep
#include <sys/time.h>  // gettimeofday

int main(void) {
    // Current time (seconds since epoch)
    time_t now = time(NULL);
    printf("Epoch time: %lld\n", (long long)now);
    
    // Human-readable
    struct tm *local = localtime(&now);
    printf("Local: %04d-%02d-%02d %02d:%02d:%02d\n",
           local->tm_year + 1900, local->tm_mon + 1, local->tm_mday,
           local->tm_hour, local->tm_min, local->tm_sec);
    
    // UTC
    struct tm *utc = gmtime(&now);
    printf("UTC:   %04d-%02d-%02d %02d:%02d:%02d\n",
           utc->tm_year + 1900, utc->tm_mon + 1, utc->tm_mday,
           utc->tm_hour, utc->tm_min, utc->tm_sec);
    
    // Formatted string
    char buf[100];
    strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S %Z", local);
    printf("Formatted: %s\n", buf);
    
    // Microsecond-resolution wall-clock time (POSIX marks gettimeofday as
    // obsolescent; new code can use clock_gettime(CLOCK_REALTIME, ...) instead)
    struct timeval tv;
    gettimeofday(&tv, NULL);
    printf("Microseconds: %lld.%06ld\n", (long long)tv.tv_sec, (long)tv.tv_usec);
    
    // Monotonic clock (for measuring intervals)
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    printf("Monotonic: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
    
    // Sleep
    sleep(1);           // Seconds
    usleep(100000);     // Microseconds (removed from POSIX.1-2008; prefer nanosleep)
    nanosleep(&(struct timespec){0, 100000000}, NULL);  // 100ms
    
    return 0;
}

// Measuring execution time
void measure_time(void (*func)(void)) {
    struct timespec start, end;
    
    clock_gettime(CLOCK_MONOTONIC, &start);
    func();
    clock_gettime(CLOCK_MONOTONIC, &end);
    
    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("Elapsed: %.6f seconds\n", elapsed);
}
```

***

## Resource Limits

```c theme={null}
#include <stdio.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    
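    // Get limits. rlim_t is unsigned, so cast before printing; an unlimited
    // resource is reported as RLIM_INFINITY (a very large number).
    getrlimit(RLIMIT_NOFILE, &rl);
    printf("Max open files: %llu (soft), %llu (hard)\n",
           (unsigned long long)rl.rlim_cur, (unsigned long long)rl.rlim_max);
    
    getrlimit(RLIMIT_STACK, &rl);
    printf("Stack size: %llu KB (soft), %llu KB (hard)\n",
           (unsigned long long)(rl.rlim_cur / 1024),
           (unsigned long long)(rl.rlim_max / 1024));
    
    getrlimit(RLIMIT_AS, &rl);
    printf("Address space: %llu MB (soft)\n",
           (unsigned long long)(rl.rlim_cur / (1024*1024)));
    
    // Set limits: rl still holds the RLIMIT_AS values, so only the soft limit changes
    rl.rlim_cur = 1024 * 1024 * 100;  // 100 MB
    if (setrlimit(RLIMIT_AS, &rl) == -1) {
        perror("setrlimit");
    }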
    
    // Resource usage
    struct rusage usage;
    getrusage(RUSAGE_SELF, &usage);
    printf("User time: %ld.%06ld s\n",
           (long)usage.ru_utime.tv_sec, (long)usage.ru_utime.tv_usec);
    printf("System time: %ld.%06ld s\n",
           (long)usage.ru_stime.tv_sec, (long)usage.ru_stime.tv_usec);
    printf("Max RSS: %ld KB\n", usage.ru_maxrss);  // Kilobytes on Linux; macOS reports bytes
    
    return 0;
}
```
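
Changing a limit safely means reading the current pair first, so the hard limit is preserved and the new soft value can be validated. A minimal sketch of that pattern follows; `set_soft_limit` is a hypothetical helper, and rejecting out-of-range values with `EINVAL` before calling the kernel is a choice made for this example.

```c
#include <errno.h>
#include <sys/resource.h>

// Hypothetical helper: change only the soft limit of a resource, leaving the
// hard limit untouched. Returns 0 on success, -1 on error with errno set.
int set_soft_limit(int resource, rlim_t new_soft) {
    struct rlimit rl;

    if (getrlimit(resource, &rl) == -1) return -1;

    // The soft limit may never exceed the hard limit.
    if (rl.rlim_max != RLIM_INFINITY && new_soft > rl.rlim_max) {
        errno = EINVAL;
        return -1;
    }

    rl.rlim_cur = new_soft;
    return setrlimit(resource, &rl);
}
```

For instance, `set_soft_limit(RLIMIT_CORE, 0)` disables core dumps for the current process while leaving the hard limit available for a later increase.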

***

## POSIX Portability

```c theme={null}
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // Check POSIX features
    #ifdef _POSIX_VERSION
        printf("POSIX version: %ld\n", _POSIX_VERSION);
    #endif
    
    // Runtime capability checks
    long val;
    
    val = sysconf(_SC_PAGESIZE);
    printf("Page size: %ld\n", val);
    
    val = sysconf(_SC_NPROCESSORS_ONLN);
    printf("CPUs online: %ld\n", val);
    
    val = sysconf(_SC_OPEN_MAX);
    printf("Max open files: %ld\n", val);
    
    val = sysconf(_SC_CLK_TCK);
    printf("Clock ticks/sec: %ld\n", val);
    
    // Path configuration
    val = pathconf("/", _PC_NAME_MAX);
    printf("Max filename length: %ld\n", val);
    
    val = pathconf("/", _PC_PATH_MAX);
    printf("Max path length: %ld\n", val);
    
    return 0;
}
```
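
One portability trap in the calls above: `sysconf()` returns `-1` both when the name is invalid (with `errno` set) and when the limit is simply indeterminate (with `errno` unchanged). The sketch below shows one way to handle that; `sysconf_or` is a hypothetical helper name.

```c
#include <errno.h>
#include <unistd.h>

// Hypothetical helper: query sysconf() and fall back to a default when no
// usable value is available. errno is cleared first, so after the call a
// non-zero errno means the name was rejected, while zero with the fallback
// returned means the system reports no definite limit.
long sysconf_or(int name, long fallback) {
    errno = 0;
    long val = sysconf(name);
    return (val == -1) ? fallback : val;
}
```

A call such as `sysconf_or(_SC_OPEN_MAX, 256)` then always yields a usable number, and the caller can still inspect `errno` to learn why the fallback was needed.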

***

## Common System Call Reference

| Category        | System Calls                                      |
| --------------- | ------------------------------------------------- |
| **File I/O**    | open, close, read, write, lseek, pread, pwrite    |
| **File Info**   | stat, fstat, lstat, access, chmod, chown          |
| **Directories** | mkdir, rmdir, chdir, getcwd, opendir, readdir     |
| **Processes**   | fork, exec\*, wait, waitpid, exit, \_exit         |
| **Signals**     | kill, sigaction, sigprocmask, pause, sigsuspend   |
| **Memory**      | mmap, munmap, mprotect, brk, sbrk                 |
| **IPC**         | pipe, socketpair, shmget, semget, msgget          |
| **Network**     | socket, bind, listen, accept, connect, send, recv |
| **Time**        | time, gettimeofday, clock\_gettime, nanosleep     |
| **Misc**        | ioctl, fcntl, dup, dup2, select, poll, epoll      |

Not every name in this table is a raw system call on Linux. `opendir`, `readdir`, `sbrk`, and most of the `exec*` family are libc functions layered on top of system calls, and `epoll` is a Linux-specific family of calls rather than part of POSIX.

***

## Exercises

<Steps>
  <Step title="System Info Tool">
    Build a tool that prints comprehensive system information (CPU, memory, disk, network).
  </Step>

  <Step title="Safe Wrapper Library">
    Create a library of safe wrappers for common system calls with proper error handling and EINTR retry.
  </Step>

  <Step title="Syscall Tracer">
    Use `ptrace` to build a simple strace-like tool.
  </Step>

  <Step title="Resource Monitor">
    Build a tool that monitors a process's resource usage over time.
  </Step>
</Steps>

***

## Next Up

<Card title="Concurrency" icon="arrow-right" href="/courses/c-programming/concurrency">
  Process and thread programming
</Card>
