Inter-Process Communication (IPC)

In an operating system, processes are isolated by the Virtual Memory Manager to prevent one process from corrupting another. However, complex systems (like Chrome, Nginx, or a Database) require these isolated units to cooperate. IPC is the set of mechanisms provided by the kernel to bridge this isolation.
Caveat 1: Pipes are unidirectional, and this trips up almost every newcomer. A pipe() call gives you two file descriptors: fd[0] for reading and fd[1] for writing. Data flows in one direction only. If you want a parent and child to talk back and forth, you need TWO pipes (one parent-to-child, one child-to-parent), or you can use socketpair(AF_UNIX, SOCK_STREAM, 0, fds), which gives you a single bidirectional channel. This is why much modern code uses socketpair for parent-child IPC: half the file descriptors, no risk of crossing the streams. The pitfall: once every write end of a pipe is closed, the reader gets EOF; once every read end is closed, a write raises SIGPIPE, which by default kills the writing process. The usual defenses are to ignore or handle SIGPIPE and check for EPIPE from write(), or, on sockets, to pass MSG_NOSIGNAL to send() (that flag does not apply to plain pipes).
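To make the SIGPIPE pitfall concrete, here is a minimal sketch that writes to a pipe whose read end is already closed. The helper name write_after_reader_closed is hypothetical, and the sketch assumes it is acceptable to ignore SIGPIPE for the whole process.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte into a pipe whose read end is
 * already closed and returns the errno reported by write() (0 on success). */
int write_after_reader_closed(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return errno;

    /* Process-wide setting: with SIGPIPE ignored, the failed write returns
     * -1 with errno == EPIPE instead of terminating the process. */
    signal(SIGPIPE, SIG_IGN);

    close(fds[0]);                       /* no reader is left */
    ssize_t n = write(fds[1], "x", 1);
    int err = (n == -1) ? errno : 0;

    close(fds[1]);
    return err;
}
```

With the signal ignored, the caller sees an ordinary error code and can close the descriptor and clean up.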
Pattern: prefer socketpair for new bidirectional IPC. It avoids the four-FD bookkeeping of dual pipes, supports recvmsg/sendmsg (so you can pass file descriptors via SCM_RIGHTS), and gives you MSG_PEEK and MSG_NOSIGNAL. Reserve raw pipe() for the unidirectional cases where it is the obvious primitive: shell pipelines, popen-style streaming, log forwarding.
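A minimal sketch of the socketpair pattern, assuming a single short request and reply (under 256 bytes) so that one read() on each side is enough. The function name uppercase_via_child is made up for illustration.

```c
#include <ctype.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the parent sends `msg` to a forked child over one
 * socketpair; the child upper-cases it and replies on the same descriptor.
 * Returns the reply length, or -1 on error. */
ssize_t uppercase_via_child(const char *msg, char *reply, size_t cap) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                       /* child talks on sv[1] */
        close(sv[0]);
        char buf[256];
        ssize_t n = read(sv[1], buf, sizeof(buf));
        for (ssize_t i = 0; i < n; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        if (n > 0)
            send(sv[1], buf, (size_t)n, MSG_NOSIGNAL);  /* EPIPE, not SIGPIPE */
        _exit(0);
    }

    close(sv[1]);                         /* parent talks on sv[0] */
    size_t len = strlen(msg);
    ssize_t got = -1;
    if (send(sv[0], msg, len, MSG_NOSIGNAL) == (ssize_t)len)
        got = read(sv[0], reply, cap);    /* same fd, opposite direction */
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return got;
}
```

Each side holds exactly one descriptor and uses it in both directions; the dual-pipe version of the same exchange would need four descriptors and two close() calls per process.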
Caveat 2: Shared memory races are among the most subtle bug classes in systems programming. The kernel does ZERO synchronization for shared memory: it just maps the same physical pages into two virtual address spaces. If both processes write to the same location without coordination, the result is a data race (torn writes, lost updates, silent corruption). Worse, on weakly-ordered architectures (ARM, POWER), a reader can see a “ready” flag before the data it guards, or a half-written multi-word value, if the writer and reader did not insert appropriate barriers. “It works on x86” is not a correctness proof; x86’s TSO model accidentally papers over many missing barriers.
Pattern: pair shared memory with explicit synchronization. The standard recipe is shm_open + mmap + a POSIX semaphore (sem_open, or sem_init with pshared=1) or a process-shared mutex (pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED)). For producer/consumer, use a shared-memory ring buffer with atomic head and tail indices using memory_order_release/memory_order_acquire. Test on ARM (a Raspberry Pi works): a weakly-ordered machine exposes missing barriers that x86 hides, although passing there is evidence, not proof. Crash safety: if a process can die holding a shared mutex, use PTHREAD_MUTEX_ROBUST so the next acquirer gets EOWNERDEAD and can call pthread_mutex_consistent() to recover instead of deadlocking forever.
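The ring-buffer recipe can be sketched as follows. This is an illustrative single-producer/single-consumer layout, not a production queue: the struct name, the capacity of 8, and the int32_t payload are assumptions, and it relies on 32-bit atomics being lock-free (and therefore usable across processes) on the target platform.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* assumed capacity; must be a power of two */

/* Hypothetical layout intended to live inside a MAP_SHARED mapping.
 * Exactly one process calls ring_push and exactly one calls ring_pop. */
struct spsc_ring {
    _Atomic uint32_t head;          /* next slot to write (producer-owned) */
    _Atomic uint32_t tail;          /* next slot to read (consumer-owned)  */
    int32_t slots[RING_SLOTS];
};

bool ring_push(struct spsc_ring *r, int32_t value) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS)
        return false;                               /* full */
    r->slots[head & (RING_SLOTS - 1)] = value;
    /* release: the slot write above is visible before the new head is */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

bool ring_pop(struct spsc_ring *r, int32_t *out) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                               /* empty */
    *out = r->slots[tail & (RING_SLOTS - 1)];
    /* release: the slot read above finishes before the slot is recycled */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The acquire load of the other side's index pairs with that side's release store, which is what keeps a consumer from reading a slot before its contents have landed. A zero-filled mapping is a valid empty ring.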
Caveat 3: Unix domain sockets vs. TCP loopback: UDS is typically faster but local-only. Engineers often default to 127.0.0.1:PORT because it is portable. On the same host, this still routes through the TCP stack: socket buffers, sequence numbers, congestion control state (Linux skips checksum computation on loopback, but the data is still copied into and out of kernel buffers). A Unix domain socket bypasses the protocol machinery; the kernel just hands the bytes from one process’s socket buffer to another’s. Published benchmarks commonly report UDS at roughly 2x-4x the throughput of TCP loopback for the same workload, with lower CPU use, though the exact ratio depends on message size and kernel version. The tradeoff: UDS only works for processes on the same kernel; you cannot transparently move them to different hosts.
Pattern: use UDS for sidecar / colocated IPC. When you have a service mesh sidecar (Envoy, Linkerd) on the same host as your application, talk to it over UDS, not loopback TCP. Same for daemon-style architectures (docker.sock, systemd socket activation, redis-cli to a local Redis). For cross-host fallback, abstract the connection behind an interface that picks UDS for unix:// URIs and TCP for tcp:// URIs — changing one config flag should be the only difference.
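One way to sketch the URI-based abstraction is a small parser that turns a configured endpoint string into a transport choice. The struct, enum, and function names here are hypothetical, and the 108-byte target buffer is an assumption that mirrors the size of sun_path on Linux.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

enum transport { TRANSPORT_UNIX, TRANSPORT_TCP };

/* Hypothetical parsed form of an endpoint taken from configuration. */
struct endpoint {
    enum transport kind;
    char target[108];   /* socket path, or host name for TCP */
    int port;           /* TCP only; 0 for Unix sockets */
};

/* Accepts "unix:///path/to.sock" or "tcp://host:port". */
bool parse_endpoint(const char *uri, struct endpoint *ep) {
    memset(ep, 0, sizeof(*ep));
    if (strncmp(uri, "unix://", 7) == 0) {
        const char *path = uri + 7;
        if (*path == '\0' || strlen(path) >= sizeof(ep->target))
            return false;
        ep->kind = TRANSPORT_UNIX;
        strcpy(ep->target, path);
        return true;
    }
    if (strncmp(uri, "tcp://", 6) == 0) {
        const char *host = uri + 6;
        const char *colon = strrchr(host, ':');
        if (colon == NULL || colon == host)
            return false;                       /* missing host or port */
        size_t host_len = (size_t)(colon - host);
        if (host_len >= sizeof(ep->target))
            return false;
        char *end = NULL;
        long port = strtol(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || port < 1 || port > 65535)
            return false;
        ep->kind = TRANSPORT_TCP;
        memcpy(ep->target, host, host_len);     /* target is already zeroed */
        ep->port = (int)port;
        return true;
    }
    return false;                               /* unknown scheme */
}
```

The connect path can then switch on ep.kind to build either a sockaddr_un or a TCP address, so moving a dependency off-host is a one-line configuration change.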
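Caveat 4: POSIX message queues have hard size limits and can block silently. On Linux the defaults are mq_maxmsg = 10 messages and mq_msgsize = 8192 bytes (see msg_default and msgsize_default under /proc/sys/fs/mqueue/). A mq_send on a full queue blocks the sender by default, and if the receiver has crashed, the sender hangs until something else drains the queue (use mq_timedsend or O_NONBLOCK to bound the wait). The limits are also low: /proc/sys/fs/mqueue/queues_max defaults to 256 queues system-wide, and RLIMIT_MSGQUEUE caps each user at 819200 bytes of queue space by default, so you cannot scale message queues like you would scale a Kafka topic.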
Pattern: open mqueues O_NONBLOCK and handle EAGAIN explicitly. In non-blocking mode, mq_send and mq_receive return -1 with errno set to EAGAIN when the queue is full or empty; treat this as backpressure, not as an error. To wait for queue events in your event loop, use mq_notify, or on Linux pass the queue descriptor to poll/epoll, since there it is an ordinary file descriptor. If you genuinely need durable, large-volume messaging across processes on one host, prefer a real broker (Redis Streams, NATS, Kafka) over POSIX mqueues; mqueues are best for low-volume control plane messages, not data plane.
Mastery Level: Senior Systems Engineer
Key Internals: Kernel Ring Buffers, Page Table Aliasing, Signal Frames, rt_sigreturn
Prerequisites: Virtual Memory, Process Internals

0. The Big Picture: Why So Many IPC Mechanisms?

Before diving into each mechanism, understand how they compare:
| Mechanism | Data Model | Copy Overhead | Sync Built-in? | Best For |
| --- | --- | --- | --- | --- |
| Pipe | Byte stream | 2 copies (via kernel buffer) | Yes (blocking) | Parent–child, simple streaming |
| Named Pipe (FIFO) | Byte stream | 2 copies (via kernel buffer) | Yes | Unrelated processes, simple streaming |
| Shared Memory | Raw bytes | Zero copy | No (bring your own) | High-throughput, low-latency bulk data |
| Message Queue | Discrete messages | 2 copies (via kernel buffer) | Yes (blocking, priority) | Structured messages, priority ordering |
| Unix Domain Socket | Stream or datagram | 2 copies (via kernel buffer) | Yes | High-perf local networking, FD passing |
| Signal | Integer only | N/A | N/A | Async events, not data |
| TCP/UDP Socket | Stream/datagram | 2 copies (via kernel buffer) | Yes | Cross-machine, network-transparent |

Decision Flowchart

  1. Need to pass file descriptors? → Unix Domain Socket (SCM_RIGHTS).
  2. Need zero-copy bulk transfer? → Shared Memory + your own synchronization.
  3. Need structured messages with priority? → POSIX Message Queue.
  4. Parent–child streaming? → Pipe.
  5. Unrelated processes, streaming? → Named Pipe or Unix Socket.
  6. Cross-machine? → TCP/UDP Socket.
  7. Async notification only? → Signal.
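The flowchart can be read as a first-match rule list. The sketch below encodes it that way; the type and field names are invented for illustration, and treating "streaming" as a separate flag (and checking cross_machine before choosing a pipe) is an interpretation rather than something the list spells out.

```c
#include <stdbool.h>

enum mechanism {
    MECH_UNIX_SOCKET, MECH_SHARED_MEMORY, MECH_MESSAGE_QUEUE, MECH_PIPE,
    MECH_FIFO_OR_UNIX_SOCKET, MECH_NETWORK_SOCKET, MECH_SIGNAL, MECH_UNDECIDED
};

/* Hypothetical description of what a design needs from its IPC channel. */
struct ipc_needs {
    bool pass_fds, zero_copy_bulk, prioritized_messages;
    bool streaming, related_processes, cross_machine, notify_only;
};

/* Applies the decision list top to bottom; the first match wins. */
enum mechanism choose_mechanism(const struct ipc_needs *n) {
    if (n->pass_fds)             return MECH_UNIX_SOCKET;
    if (n->zero_copy_bulk)       return MECH_SHARED_MEMORY;
    if (n->prioritized_messages) return MECH_MESSAGE_QUEUE;
    if (n->streaming && !n->cross_machine)
        return n->related_processes ? MECH_PIPE : MECH_FIFO_OR_UNIX_SOCKET;
    if (n->cross_machine)        return MECH_NETWORK_SOCKET;
    if (n->notify_only)          return MECH_SIGNAL;
    return MECH_UNDECIDED;       /* none of the listed cases applies */
}
```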

Understanding Process Isolation

Before diving into IPC mechanisms, let’s understand why processes need isolation and how the kernel enforces it.

Virtual Memory Isolation

Each process operates in its own virtual address space:
Process A Virtual Memory          Process B Virtual Memory
┌─────────────────────┐          ┌─────────────────────┐
│   Stack    0xFFFF   │          │   Stack    0xFFFF   │
├─────────────────────┤          ├─────────────────────┤
│                     │          │                     │
│   Heap              │          │   Heap              │
│                     │          │                     │
├─────────────────────┤          ├─────────────────────┤
│   Data              │          │   Data              │
├─────────────────────┤          ├─────────────────────┤
│   Text     0x0000   │          │   Text     0x0000   │
└─────────────────────┘          └─────────────────────┘
        ↓                                 ↓
    Page Table A                     Page Table B
        ↓                                 ↓
    ┌───────────────────────────────────────┐
    │   Physical Memory (RAM)               │
    │  Different physical frames mapped     │
    └───────────────────────────────────────┘
Isolation Benefits:
  • Memory safety: Process A cannot corrupt Process B
  • Security: Privileged data stays protected
  • Stability: Crash in one process doesn’t affect others
  • Predictability: Each process sees a consistent address space
The Problem: This isolation prevents direct communication. The kernel must provide controlled mechanisms for processes to exchange data.

1. Pipes: The Kernel Ring Buffer

Pipes are the oldest and most fundamental IPC mechanism in Unix. While they appear as simple file descriptors to user space, their internal implementation reveals sophisticated kernel buffer management.

1.1 Pipe Fundamentals

A pipe is a unidirectional communication channel with:
  • Write end: One process writes data
  • Read end: Another process reads data
  • FIFO ordering: First In, First Out
  • Byte stream: No message boundaries
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main() {
    int pipefd[2];  // pipefd[0] = read end, pipefd[1] = write end
    char buffer[128];

    // Create pipe
    if (pipe(pipefd) == -1) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return 1;
    }

    if (pid == 0) {
        // Child process - writer
        close(pipefd[0]);  // Close unused read end

        const char *msg = "Hello from child!";
        write(pipefd[1], msg, strlen(msg) + 1);  // +1 sends the terminating '\0'
        close(pipefd[1]);
    } else {
        // Parent process - reader
        close(pipefd[1]);  // Close unused write end

        ssize_t n = read(pipefd[0], buffer, sizeof(buffer));
        if (n > 0)  // Only print when data actually arrived
            printf("Parent received: %s (%zd bytes)\n", buffer, n);
        close(pipefd[0]);
    }

    return 0;
}
Critical Design Pattern: Always close unused pipe ends. If the parent keeps pipefd[1] open, the read() call will never return 0 (EOF) because the kernel sees there’s still a potential writer.

1.2 Kernel Implementation Deep Dive

The Pipe Buffer Structure

In the Linux kernel, a pipe is implemented using a circular buffer structure (struct pipe_inode_info):
// Simplified kernel structures (from include/linux/pipe_fs_i.h)
struct pipe_buffer {
    struct page *page;      // Points to a physical page
    unsigned int offset;    // Offset within the page
    unsigned int len;       // Length of data in this buffer
    const struct pipe_buf_operations *ops;
    unsigned int flags;
};

struct pipe_inode_info {
    struct mutex mutex;               // Protects the pipe
    wait_queue_head_t rd_wait;       // Reader wait queue
    wait_queue_head_t wr_wait;       // Writer wait queue
    unsigned int head;               // Write position
    unsigned int tail;               // Read position
    unsigned int max_usage;          // Max buffers (usually 16)
    unsigned int ring_size;          // Number of buffer slots
    struct pipe_buffer *bufs;        // Array of pipe buffers
    struct user_struct *user;        // Owner
};
Memory Layout:
Pipe Ring Buffer (16 slots × one 4KB page each = 64KB default capacity)

            tail = 2          head = 5
            (reader)          (writer)
               ↓                 ↓
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│  0  │  1  │  2  │  3  │  4  │  5  │  6  │  7  │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│  8  │  9  │ 10  │ 11  │ 12  │ 13  │ 14  │ 15  │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘

The reader consumes at tail; the writer produces at head.
Current data: buffers 2, 3, 4 (3 pages = 12KB of data)
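The capacity shown above can be checked from user space. The following sketch is Linux-specific and uses the F_GETPIPE_SZ fcntl command; the helper name is hypothetical, and the fallback #define (the Linux numeric value) is only there for builds that do not define _GNU_SOURCE.

```c
#include <fcntl.h>
#include <unistd.h>

#ifndef F_GETPIPE_SZ
#define F_GETPIPE_SZ 1032   /* Linux value; normally exposed via _GNU_SOURCE */
#endif

/* Hypothetical helper: returns the capacity in bytes of a freshly created
 * pipe, or -1 on error. */
long fresh_pipe_capacity(void) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    long cap = fcntl(fds[1], F_GETPIPE_SZ);
    close(fds[0]);
    close(fds[1]);
    return cap;
}
```

On a default configuration the result should correspond to the 16-slot ring described here, but the kernel may hand out smaller pipes when a user is over the pipe memory limits, so treat the value as something to measure rather than assume.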

Write Operation Flow

When a process calls write(pipefd[1], data, size):
1. System Call Entry
   ├─> User space → Kernel space transition
   └─> syscall handler: sys_write() → vfs_write() → pipe_write()

2. Acquire Pipe Mutex
   ├─> mutex_lock(&pipe->mutex)
   └─> Prevents concurrent access

3. Check Capacity
   ├─> if ((head - tail) >= ring_size)  // Pipe is full
   │   ├─> if (O_NONBLOCK) return -EAGAIN
   │   └─> else: wait_event_interruptible(pipe->wr_wait)
   └─> Process sleeps in TASK_INTERRUPTIBLE state

4. Write Data to Buffer
   ├─> Allocate new pipe_buffer if needed
   ├─> Get page: alloc_page(GFP_HIGHUSER)
   ├─> Copy from user space: copy_from_user(page, data, size)
   └─> Update head pointer: pipe->head++

5. Wake Readers
   ├─> wake_up_interruptible(&pipe->rd_wait)
   └─> Reader process moves to run queue

6. Release Mutex & Return
   ├─> mutex_unlock(&pipe->mutex)
   └─> Return bytes written
Key Kernel Functions:
// Simplified from fs/pipe.c
static ssize_t pipe_write(struct kiocb *iocb, struct iov_iter *from) {
    struct file *filp = iocb->ki_filp;
    struct pipe_inode_info *pipe = filp->private_data;
    ssize_t ret = 0;
    size_t total_len = iov_iter_count(from);

    mutex_lock(&pipe->mutex);

    for (;;) {
        unsigned int head = pipe->head;
        unsigned int tail = pipe->tail;
        unsigned int mask = pipe->ring_size - 1;

        if (!pipe_full(head, tail, pipe->max_usage)) {
            struct pipe_buffer *buf = &pipe->bufs[head & mask];
            struct page *page = alloc_page(GFP_HIGHUSER);

            if (!page) {
                ret = -ENOMEM;
                break;
            }

            // Copy data from user space (at most one page of what is still unwritten)
            size_t chunk = min_t(size_t, iov_iter_count(from), PAGE_SIZE);
            if (copy_from_iter(page_address(page), chunk, from) != chunk) {
                __free_page(page);
                ret = -EFAULT;
                break;
            }

            buf->page = page;
            buf->offset = 0;
            buf->len = chunk;

            pipe->head = head + 1;
            ret += chunk;

            // Wake up readers
            wake_up_interruptible_sync_poll(&pipe->rd_wait, EPOLLIN);

            if (ret >= total_len)
                break;
        } else {
            // Pipe is full
            if (filp->f_flags & O_NONBLOCK) {
                ret = -EAGAIN;
                break;
            }

            // Sleep until space available
            mutex_unlock(&pipe->mutex);
            wait_event_interruptible(pipe->wr_wait,
                                     !pipe_full(pipe->head, pipe->tail, pipe->max_usage));
            mutex_lock(&pipe->mutex);
        }
    }

    mutex_unlock(&pipe->mutex);
    return ret;
}

1.3 Atomicity and PIPE_BUF

Critical Guarantee: Writes of size ≤ PIPE_BUF (4096 bytes on Linux) are atomic. What does atomic mean?
// Two processes writing simultaneously

// Process A
write(pipe_fd, "AAAA...AAAA", 3000);  // 3000 A's

// Process B
write(pipe_fd, "BBBB...BBBB", 3000);  // 3000 B's

// Result in pipe (GUARANTEED):
// Either: AAAA...AAAA (3000 A's) BBBB...BBBB (3000 B's)
// Or:     BBBB...BBBB (3000 B's) AAAA...AAAA (3000 A's)

// NEVER: AABBABBA... (interleaved)
What happens with writes > PIPE_BUF?
// Process A
write(pipe_fd, "AAAA...AAAA", 8000);  // 8000 A's

// Process B
write(pipe_fd, "BBBB...BBBB", 8000);  // 8000 B's

// Result (POSSIBLE):
// AAAA...AAAA (4096 A's) BBBB...BBBB (4096 B's) AAAA...AAAA (remaining A's) ...
// Data is interleaved!
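Kernel Implementation: Conceptually, the guarantee holds because a write of size <= PIPE_BUF is completed under a single hold of the pipe mutex, whereas a larger write may sleep and give up the mutex between chunks. The following illustrates that idea; it is not literal kernel source: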
if (total_len <= PIPE_BUF) {
    // Atomic write: hold mutex for entire operation
    mutex_lock(&pipe->mutex);
    // ... perform entire write ...
    mutex_unlock(&pipe->mutex);
} else {
    // Non-atomic: may release mutex between chunks
    while (bytes_remaining > 0) {
        mutex_lock(&pipe->mutex);
        // ... write chunk ...
        mutex_unlock(&pipe->mutex);
        // Another process can write here!
    }
}

1.4 Named Pipes (FIFOs)

Regular pipes only work between processes that inherit the descriptors, typically a parent and child via fork(). Named pipes (FIFOs) have a name in the filesystem, so unrelated processes can open them by path.
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// Writer process
int main() {
    const char *fifo_path = "/tmp/my_fifo";

    // Create FIFO (like a file in the filesystem)
    if (mkfifo(fifo_path, 0666) == -1) {
        perror("mkfifo");
        return 1;
    }

    int fd = open(fifo_path, O_WRONLY);  // Blocks until reader opens

    const char *msg = "Hello via FIFO!";
    write(fd, msg, strlen(msg) + 1);

    close(fd);
    unlink(fifo_path);  // Remove FIFO file
    return 0;
}

// Reader process (separate program)
int main() {
    const char *fifo_path = "/tmp/my_fifo";
    char buffer[128];

    int fd = open(fifo_path, O_RDONLY);  // Blocks until writer opens
    if (fd == -1) {
        perror("open");
        return 1;
    }

    ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n > 0)  // Only print when data actually arrived
        printf("Received: %s\n", buffer);

    close(fd);
    return 0;
}
Kernel Implementation: A FIFO is represented by an inode with type S_IFIFO. The inode’s i_pipe field points to a struct pipe_inode_info, the same structure that backs anonymous pipes.
Filesystem View              Kernel Internal View
─────────────────            ────────────────────
/tmp/my_fifo                 struct inode
(special file)               ├─> i_mode = S_IFIFO
                             └─> i_pipe ──> struct pipe_inode_info
                                                  (ring buffer)
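Because a blocking open() on a FIFO waits for the other side, it is sometimes useful to probe first. The sketch below relies on documented behavior: opening a FIFO with O_WRONLY | O_NONBLOCK fails with ENXIO when no process has it open for reading. The helper name is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if some process has `path` open for
 * reading, 0 if nobody does, and -1 on any other error. */
int fifo_has_reader(const char *path) {
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd == -1)
        return (errno == ENXIO) ? 0 : -1;
    close(fd);   /* we only wanted to know whether the open would succeed */
    return 1;
}
```

One caveat of this approach: the probe briefly counts as a writer, so a reader that sees it open and close may observe an end-of-file condition. A writer that intends to send data would normally keep the descriptor instead of closing it.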

1.5 Pipe Performance Characteristics

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <sys/wait.h>  // wait()

#define DATA_SIZE (1024 * 1024 * 100)  // 100 MB

int main() {
    int pipefd[2];
    pipe(pipefd);

    pid_t pid = fork();

    if (pid == 0) {
        // Child - writer
        close(pipefd[0]);

        char *data = malloc(4096);
        memset(data, 'A', 4096);

        size_t written = 0;
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        while (written < DATA_SIZE) {
            ssize_t n = write(pipefd[1], data, 4096);
            if (n > 0) written += n;
        }

        clock_gettime(CLOCK_MONOTONIC, &end);
        double elapsed = (end.tv_sec - start.tv_sec) +
                        (end.tv_nsec - start.tv_nsec) / 1e9;

        printf("Wrote %zu bytes in %.3f seconds\n", written, elapsed);
        printf("Throughput: %.2f MB/s\n", (DATA_SIZE / (1024.0 * 1024.0)) / elapsed);

        close(pipefd[1]);
        free(data);
        exit(0);
    } else {
        // Parent - reader
        close(pipefd[1]);

        char buffer[4096];
        size_t total_read = 0;

        while (1) {
            ssize_t n = read(pipefd[0], buffer, sizeof(buffer));
            if (n <= 0) break;
            total_read += n;
        }

        printf("Read %zu bytes\n", total_read);
        close(pipefd[0]);
        wait(NULL);
    }

    return 0;
}
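Typical Results: on the order of a few GB/s on modern hardware. With 4 KB writes, per-call syscall overhead matters as much as memory copy speed, so treat this as a rough figure and measure on your own system.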

1.6 The Self-Pipe Trick (Advanced Pattern)

Problem: Signal handlers are asynchronous and severely limited in what they can do (async-signal-safe functions only). How do you integrate signals with an event loop? Solution: The self-pipe trick.
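#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <signal.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

static int signal_pipe[2];

void signal_handler(int sig) {
    // Only async-signal-safe operations allowed here
    int saved_errno = errno;          // write() may clobber errno
    char byte = (char)sig;
    write(signal_pipe[1], &byte, 1);  // write() is async-signal-safe
    errno = saved_errno;
}

int main() {
    if (pipe(signal_pipe) == -1) {
        perror("pipe");
        return 1;
    }
    // Non-blocking write end: the handler must never block on a full pipe
    fcntl(signal_pipe[1], F_SETFL, O_NONBLOCK);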

    // Set up signal handler
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = signal_handler;
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGTERM, &sa, NULL);

    // Event loop using poll()
    struct pollfd fds[2];
    fds[0].fd = STDIN_FILENO;
    fds[0].events = POLLIN;
    fds[1].fd = signal_pipe[0];
    fds[1].events = POLLIN;

    printf("Event loop running. Press Ctrl+C to test...\n");

    while (1) {
        int ret = poll(fds, 2, -1);

        if (ret > 0) {
            if (fds[0].revents & POLLIN) {
                // Handle stdin
                char buf[128];
                ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
                printf("Got input: %.*s", (int)n, buf);
            }

            if (fds[1].revents & POLLIN) {
                // Handle signal
                char sig;
                read(signal_pipe[0], &sig, 1);
                printf("Received signal %d in main loop!\n", sig);

                if (sig == SIGINT || sig == SIGTERM) {
                    printf("Cleaning up and exiting...\n");
                    break;
                }
            }
        }
    }

    close(signal_pipe[0]);
    close(signal_pipe[1]);
    return 0;
}
Why it works:
  1. Signal handler executes in async context (can’t safely do much)
  2. Handler writes 1 byte to pipe (write is async-signal-safe)
  3. Main event loop wakes up from poll()
  4. Main loop reads signal number and handles it safely
  5. Signal handling is now integrated with other I/O events
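Used in: libuv (the event loop under Node.js) and many other event-driven programs. On Linux, signalfd() offers the same integration without a signal handler.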

2. Shared Memory: The Zero-Copy Holy Grail

Shared Memory is the fastest IPC mechanism because it completely eliminates kernel involvement in data transfer. Once set up, processes communicate at memory speed.

2.1 The Fundamental Concept

Traditional IPC (pipe/socket) data flow:
Process A                 Kernel                Process B
┌─────────┐              ┌──────┐              ┌─────────┐
│  User   │  write()     │      │  read()      │  User   │
│ Buffer  │ ──────────>  │ Pipe │ ──────────>  │ Buffer  │
│ [DATA]  │   copy 1     │Buffer│   copy 2     │ [DATA]  │
└─────────┘              │[DATA]│              └─────────┘
                         └──────┘
Total: 2 memory copies + 2 syscalls + 2 context switches
Shared Memory data flow:
Process A                                      Process B
┌─────────┐                                    ┌─────────┐
│  User   │    No kernel involvement!         │  User   │
│ Buffer  │ ──────────────────────────────>   │ Buffer  │
│ [DATA]  │    Direct memory access           │ [DATA]  │
└─────────┘                                    └─────────┘
     ↓                                              ↓
     └────────────> Same Physical Memory <──────────┘

Total: 0 copies (after initial setup)

2.2 POSIX Shared Memory Implementation

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <semaphore.h>

#define SHM_NAME "/my_shm"
#define SHM_SIZE 4096

// Shared data structure
struct shared_data {
    sem_t mutex;      // Synchronization primitive
    int counter;
    char message[256];
};

// Writer process
int main() {
    // Create shared memory object
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    if (shm_fd == -1) {
        perror("shm_open");
        return 1;
    }

    // Set size
    if (ftruncate(shm_fd, SHM_SIZE) == -1) {
        perror("ftruncate");
        return 1;
    }

    // Map shared memory into address space
    struct shared_data *shm = mmap(NULL, SHM_SIZE,
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED, shm_fd, 0);
    if (shm == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Initialize semaphore (process-shared)
    sem_init(&shm->mutex, 1, 1);  // pshared=1 means process-shared

    // Write data
    sem_wait(&shm->mutex);  // Acquire lock
    shm->counter = 42;
    strcpy(shm->message, "Hello from writer!");
    sem_post(&shm->mutex);  // Release lock

    printf("Data written to shared memory\n");

    // Keep memory mapped for reader
    sleep(5);

    munmap(shm, SHM_SIZE);
    close(shm_fd);
    shm_unlink(SHM_NAME);  // Remove shared memory object

    return 0;
}

// Reader process (separate program)
int main() {
    // Open existing shared memory. O_RDWR is required even for a "reader":
    // sem_wait()/sem_post() modify the semaphore stored in the mapping.
    int shm_fd = shm_open(SHM_NAME, O_RDWR, 0);
    if (shm_fd == -1) {
        perror("shm_open");
        return 1;
    }

    // Map shared memory read-write (a PROT_READ mapping would fault in sem_wait)
    struct shared_data *shm = mmap(NULL, SHM_SIZE,
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED, shm_fd, 0);
    if (shm == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Read data (assumes the writer has already run sem_init)
    sem_wait(&shm->mutex);  // Acquire lock
    printf("Counter: %d\n", shm->counter);
    printf("Message: %s\n", shm->message);
    sem_post(&shm->mutex);  // Release lock

    munmap(shm, SHM_SIZE);
    close(shm_fd);

    return 0;
}

2.3 The MMU Magic: Page Table Aliasing

How the kernel makes shared memory work:
Process A Virtual Memory          Process B Virtual Memory
┌────────────────────┐           ┌────────────────────┐
│  0x7000: [shm map] │           │  0x9000: [shm map] │
└────────────────────┘           └────────────────────┘
         ↓                                ↓
    Page Table A                     Page Table B
         ↓                                ↓
    VPN: 0x7    →  PFN: 0x5000      VPN: 0x9    →  PFN: 0x5000
                         ↓                               ↓
                         └───────────────────────────────┘

                         Physical Frame 0x5000
                         ┌────────────────────┐
                         │  Actual data here  │
                         │  counter = 42      │
                         │  message = "..."   │
                         └────────────────────┘
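Kernel Implementation: When Process A calls mmap(MAP_SHARED), the kernel conceptually does the following (in practice Linux fills in page table entries lazily, on the first page fault for each page):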
// Simplified from mm/mmap.c and mm/shmem.c

// 1. Create VMA (Virtual Memory Area)
struct vm_area_struct *vma = vm_area_alloc(mm);
vma->vm_start = 0x7000;  // Virtual address (example)
vma->vm_end = 0x8000;
vma->vm_flags = VM_SHARED | VM_READ | VM_WRITE;
vma->vm_file = shm_file;  // Points to shared memory file

// 2. Install page table entries
for (unsigned long addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
    struct page *page = shmem_get_page(shm_file, offset);  // Get shared page
    unsigned long pfn = page_to_pfn(page);  // Physical frame number

    // Install PTE: Virtual Page → Physical Frame
    pte_t *pte = get_pte(mm, addr);
    set_pte(pte, pfn_pte(pfn, vma->vm_page_prot));
}
When Process B calls mmap(MAP_SHARED) on the same shared memory object:
// Process B gets DIFFERENT virtual address (0x9000)
// But kernel maps it to SAME physical frames (0x5000)

struct vm_area_struct *vma_b = vm_area_alloc(mm_b);
vma_b->vm_start = 0x9000;
vma_b->vm_file = shm_file;  // SAME file object!

// Map to SAME physical pages
for (unsigned long addr = vma_b->vm_start; addr < vma_b->vm_end; addr += PAGE_SIZE) {
    struct page *page = shmem_get_page(shm_file, offset);  // SAME pages!
    unsigned long pfn = page_to_pfn(page);

    pte_t *pte = get_pte(mm_b, addr);
    set_pte(pte, pfn_pte(pfn, vma_b->vm_page_prot));
}
Result: Two different virtual addresses (0x7000 and 0x9000) both resolve to the same physical memory (0x5000). This is page table aliasing.
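Aliasing can be observed even inside a single process by mapping the same object twice. The sketch below uses an unlinked temporary file instead of shm_open so that it needs no extra libraries; that substitution, the /tmp path template, and the function name are choices made for this illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: maps the same file-backed page twice with MAP_SHARED,
 * writes through the first mapping and reads through the second.
 * Returns 1 if two distinct addresses see the same bytes, 0 if not,
 * and -1 on setup failure. */
int mappings_alias(void) {
    char path[] = "/tmp/alias-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                          /* the open fd keeps the file alive */

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)page) == -1) {
        close(fd);
        return -1;
    }

    char *a = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                             /* mappings outlive the descriptor */
    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, page);
        if (b != MAP_FAILED) munmap(b, page);
        return -1;
    }

    strcpy(a, "written via A");            /* store through the first address */
    int same = (a != b) && strcmp(b, "written via A") == 0;

    munmap(a, page);
    munmap(b, page);
    return same;
}
```

The two pointers differ, yet a store through one is immediately visible through the other, which is the same effect two processes get from their separate page tables.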

2.4 System V Shared Memory (Legacy API)

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <string.h>

#define SHM_KEY 1234
#define SHM_SIZE 4096

// Creator process
int main() {
    // Create shared memory segment
    int shmid = shmget(SHM_KEY, SHM_SIZE, IPC_CREAT | 0666);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    // Attach to address space
    char *shm = shmat(shmid, NULL, 0);
    if (shm == (char*)-1) {
        perror("shmat");
        return 1;
    }

    // Write data
    strcpy(shm, "System V shared memory!");

    printf("Data written, shmid: %d\n", shmid);

    // Detach
    shmdt(shm);

    // Don't delete yet - let reader access it
    sleep(5);

    // Mark for deletion (actual deletion happens when all detach)
    shmctl(shmid, IPC_RMID, NULL);

    return 0;
}

// Reader process
int main() {
    int shmid = shmget(SHM_KEY, SHM_SIZE, 0666);
    if (shmid == -1) {
        perror("shmget");
        return 1;
    }

    char *shm = shmat(shmid, NULL, SHM_RDONLY);
    printf("Read: %s\n", shm);

    shmdt(shm);
    return 0;
}
Persistence: System V shared memory persists until explicitly deleted with IPC_RMID or system reboot. Use ipcs -m to list and ipcrm -m <shmid> to delete orphaned segments.

2.5 Synchronization: The Critical Challenge

Problem: Shared memory provides NO synchronization. Multiple processes that modify the same memory concurrently without coordination can corrupt data.

Race Condition Example

// BROKEN CODE - Race condition
struct shared_data {
    int counter;  // Shared counter
};

// Both Process A and B execute this:
shm->counter++;  // NOT ATOMIC!

// Assembly (what actually happens):
// 1. mov eax, [counter]  ; Read
// 2. inc eax             ; Increment
// 3. mov [counter], eax  ; Write

// If both processes interleave:
// A: Read (0)
// B: Read (0)
// A: Inc (1)
// B: Inc (1)
// A: Write (1)
// B: Write (1)
// Result: 1 (should be 2!)

Correct Solution

// CORRECT - Using semaphore
struct shared_data {
    sem_t mutex;
    int counter;
};

// Initialize (once)
sem_init(&shm->mutex, 1, 1);

// Each process does:
sem_wait(&shm->mutex);  // Lock
shm->counter++;
sem_post(&shm->mutex);  // Unlock

// Or using atomic operations (GCC/Clang builtin, works on a plain int):
__sync_fetch_and_add(&shm->counter, 1);
// or C11 atomics (needs <stdatomic.h> and counter declared as atomic_int):
atomic_fetch_add(&shm->counter, 1);
Producer-Consumer with Shared Memory:
#include <sys/mman.h>
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SHM_NAME "/pc_shm"
#define BUFFER_SIZE 10

struct shared_buffer {
    sem_t mutex;      // Mutual exclusion
    sem_t empty;      // Count of empty slots
    sem_t full;       // Count of full slots
    int buffer[BUFFER_SIZE];
    int in;           // Producer index
    int out;          // Consumer index
};

// Producer
int main() {
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    ftruncate(shm_fd, sizeof(struct shared_buffer));

    struct shared_buffer *shm = mmap(NULL, sizeof(struct shared_buffer),
                                     PROT_READ | PROT_WRITE,
                                     MAP_SHARED, shm_fd, 0);

    // Initialize semaphores
    sem_init(&shm->mutex, 1, 1);
    sem_init(&shm->empty, 1, BUFFER_SIZE);  // Initially all empty
    sem_init(&shm->full, 1, 0);             // Initially none full
    shm->in = 0;
    shm->out = 0;

    // Produce items
    for (int item = 0; item < 20; item++) {
        sem_wait(&shm->empty);  // Wait for empty slot
        sem_wait(&shm->mutex);  // Lock

        shm->buffer[shm->in] = item;
        printf("Produced: %d\n", item);
        shm->in = (shm->in + 1) % BUFFER_SIZE;

        sem_post(&shm->mutex);  // Unlock
        sem_post(&shm->full);   // Signal new full slot

        usleep(100000);  // Simulate work
    }

    munmap(shm, sizeof(struct shared_buffer));
    return 0;
}

// Consumer
int main() {
    int shm_fd = shm_open(SHM_NAME, O_RDWR, 0666);
    struct shared_buffer *shm = mmap(NULL, sizeof(struct shared_buffer),
                                     PROT_READ | PROT_WRITE,
                                     MAP_SHARED, shm_fd, 0);

    // Consume items
    for (int i = 0; i < 20; i++) {
        sem_wait(&shm->full);   // Wait for full slot
        sem_wait(&shm->mutex);  // Lock

        int item = shm->buffer[shm->out];
        printf("Consumed: %d\n", item);
        shm->out = (shm->out + 1) % BUFFER_SIZE;

        sem_post(&shm->mutex);  // Unlock
        sem_post(&shm->empty);  // Signal new empty slot

        usleep(150000);  // Simulate work
    }

    munmap(shm, sizeof(struct shared_buffer));
    shm_unlink(SHM_NAME);
    return 0;
}

2.6 Huge Pages for Shared Memory

For large shared memory regions (GB+), using huge pages (2MB or 1GB instead of 4KB) reduces TLB pressure and improves performance.
#include <sys/mman.h>
#include <stdio.h>

#define SHM_SIZE (2 * 1024 * 1024)  // 2 MB

int main() {
    // Allocate using huge pages
    void *shm = mmap(NULL, SHM_SIZE,
                     PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB,
                     -1, 0);

    if (shm == MAP_FAILED) {
        perror("mmap huge pages");
        // Fall back to regular pages
        shm = mmap(NULL, SHM_SIZE,
                   PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS,
                   -1, 0);
    } else {
        printf("Allocated 2MB huge page\n");
    }

    // Use shared memory...

    munmap(shm, SHM_SIZE);
    return 0;
}
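Performance Impact:
  • Regular 4KB pages: 1 GB = 262,144 page table entries
  • 2MB huge pages: 1 GB = 512 page table entries
  • TLB reach: each TLB entry covers 512× more memory, so TLB misses usually drop sharply for large working sets (the exact reduction depends on the access pattern and the CPU's TLB sizes)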
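
The entry counts above are simply the region size divided by the page size, rounded up. A minimal sketch of that arithmetic follows; the helper name `pages_needed` is hypothetical and not part of any system API.

```c
#include <stdint.h>

/* Number of pages (and therefore leaf page-table entries) needed to map
 * region_bytes using pages of page_bytes, rounding up. */
static uint64_t pages_needed(uint64_t region_bytes, uint64_t page_bytes) {
    return (region_bytes + page_bytes - 1) / page_bytes;
}
```

Calling `pages_needed(1ULL << 30, 4096)` gives the 4KB figure and `pages_needed(1ULL << 30, 2ULL << 20)` gives the 2MB figure from the list above.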

3. Message Queues: Structured Communication

Message Queues provide message-oriented communication with built-in synchronization and priority handling.

3.1 POSIX Message Queues

#include <fcntl.h>      // O_CREAT, O_WRONLY, O_RDONLY, O_NONBLOCK
#include <mqueue.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define QUEUE_NAME "/my_queue"
#define MAX_MSG_SIZE 256
#define MAX_MESSAGES 10

// Sender
int main() {
    struct mq_attr attr;
    attr.mq_flags = 0;
    attr.mq_maxmsg = MAX_MESSAGES;
    attr.mq_msgsize = MAX_MSG_SIZE;
    attr.mq_curmsgs = 0;

    mqd_t mq = mq_open(QUEUE_NAME, O_CREAT | O_WRONLY, 0644, &attr);
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }

    // Send messages with different priorities
    const char *msg1 = "Low priority message";
    const char *msg2 = "High priority message";
    const char *msg3 = "Medium priority message";

    mq_send(mq, msg1, strlen(msg1) + 1, 1);   // Priority 1
    mq_send(mq, msg2, strlen(msg2) + 1, 10);  // Priority 10
    mq_send(mq, msg3, strlen(msg3) + 1, 5);   // Priority 5

    printf("Messages sent\n");

    mq_close(mq);
    return 0;
}

// Receiver
int main() {
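    // O_NONBLOCK: mq_receive() fails with EAGAIN once the queue is empty,
    // so the receive loop below terminates instead of blocking forever
    mqd_t mq = mq_open(QUEUE_NAME, O_RDONLY | O_NONBLOCK);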
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }

    struct mq_attr attr;
    mq_getattr(mq, &attr);

    char *buffer = malloc(attr.mq_msgsize);
    unsigned int prio;

    // Receive messages (highest priority first)
    while (mq_receive(mq, buffer, attr.mq_msgsize, &prio) >= 0) {
        printf("Received (priority %u): %s\n", prio, buffer);
    }

    free(buffer);
    mq_close(mq);
    mq_unlink(QUEUE_NAME);
    return 0;
}
Output:
Received (priority 10): High priority message
Received (priority 5): Medium priority message
Received (priority 1): Low priority message

3.2 Kernel Implementation

On Linux, POSIX message queues are implemented in the kernel as a priority-ordered collection of messages. The listing below is a conceptual sketch, not the actual source:
// Simplified from ipc/mqueue.c

struct mqueue_inode_info {
    spinlock_t lock;
    struct inode vfs_inode;
    wait_queue_head_t wait_q;

    struct msg_msg **messages;  // Array of messages
    struct mq_attr attr;

    struct list_head e_wait_q[2];  // 0=recv waiters, 1=send waiters
};

struct msg_msg {
    struct list_head m_list;
    long m_type;          // Priority
    size_t m_ts;          // Message size
    void *m_data;         // Message data
    // Followed by actual message data
};

// Send operation
static int do_mq_timedsend(mqd_t mqdes, const char *msg_ptr,
                           size_t msg_len, unsigned int msg_prio) {
    struct mqueue_inode_info *info = get_mqueue_info(mqdes);

    spin_lock(&info->lock);

    if (info->attr.mq_curmsgs >= info->attr.mq_maxmsg) {
        // Queue full - block or return error
        if (mqdes->f_flags & O_NONBLOCK) {
            spin_unlock(&info->lock);
            return -EAGAIN;
        }
        // Wait for space...
    }

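    // Allocate message and copy the payload from user space
    // (the real kernel does this BEFORE taking the spinlock,
    // because copy_from_user() may sleep)
    struct msg_msg *msg = alloc_msg(msg_len);
    copy_from_user(msg->m_data, msg_ptr, msg_len);
    msg->m_type = msg_prio;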

    // Insert in priority order
    insert_message_sorted(info->messages, msg, msg_prio);
    info->attr.mq_curmsgs++;

    // Wake up receivers
    wake_up(&info->wait_q);

    spin_unlock(&info->lock);
    return 0;
}
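Priority Queue Implementation: Messages are kept ordered by priority, with FIFO order among messages of equal priority, so the receiver always gets the oldest message at the highest priority. Older kernels used a sorted array; recent Linux kernels use a red-black tree keyed by priority, which keeps insertion at O(log N) in the number of distinct priorities.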
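
To make the ordering rule concrete, here is a small user-space sketch of a sorted-array queue. It is an illustration only, not kernel code: the names `prio_queue`, `pq_insert`, and `pq_pop` are hypothetical. The array is kept in ascending priority order so that the tail is always the oldest message at the highest priority, which makes the receive side O(1) and the send side O(N).

```c
#include <stddef.h>

#define PQ_CAPACITY 16

struct pq_msg {
    unsigned int prio;
    const char *text;
};

struct prio_queue {
    struct pq_msg items[PQ_CAPACITY];  /* ascending priority; tail = next to receive */
    size_t count;
};

/* Insert a message. A new message goes in front of every entry with the
 * same or higher priority, so older equal-priority messages stay closer
 * to the tail and are received first. Returns 0, or -1 if the queue is full. */
static int pq_insert(struct prio_queue *q, unsigned int prio, const char *text) {
    if (q->count == PQ_CAPACITY)
        return -1;
    size_t pos = q->count;
    while (pos > 0 && q->items[pos - 1].prio >= prio) {
        q->items[pos] = q->items[pos - 1];  /* shift toward the tail */
        pos--;
    }
    q->items[pos].prio = prio;
    q->items[pos].text = text;
    q->count++;
    return 0;
}

/* Remove the oldest message at the highest priority.
 * Returns 0, or -1 if the queue is empty. */
static int pq_pop(struct prio_queue *q, struct pq_msg *out) {
    if (q->count == 0)
        return -1;
    *out = q->items[--q->count];
    return 0;
}
```

Inserting priorities 1, 10, 5 and then popping three times yields 10, 5, 1, the same order as the receiver output in section 3.1.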

3.3 Asynchronous Notification

Message queues support asynchronous notification via signals:
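#include <fcntl.h>
#include <mqueue.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_MSG_SIZE 256

// Starts at 1 so the main loop arms the notification and drains once at startup
static volatile sig_atomic_t msg_pending = 1;

void message_handler(int sig, siginfo_t *info, void *context) {
    // Only set a flag here: printf() and mq_receive() are not
    // async-signal-safe (see section 4.4)
    (void)sig; (void)info; (void)context;
    msg_pending = 1;
}

static void register_notification(mqd_t mq) {
    struct sigevent sev;
    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;
    sev.sigev_signo = SIGUSR1;
    if (mq_notify(mq, &sev) == -1) {
        perror("mq_notify");
    }
}

int main() {
    // Set up signal handler
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = message_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    // Open queue (non-blocking so draining stops when it is empty)
    struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = MAX_MSG_SIZE };
    mqd_t mq = mq_open("/my_queue", O_RDONLY | O_CREAT | O_NONBLOCK, 0644, &attr);
    if (mq == (mqd_t)-1) {
        perror("mq_open");
        return 1;
    }

    printf("Waiting for messages...\n");

    // Main loop can do other work
    while (1) {
        if (msg_pending) {
            msg_pending = 0;
            // A notification fires only when a message arrives on an EMPTY
            // queue and is one-shot, so re-arm BEFORE draining
            register_notification(mq);

            char buffer[MAX_MSG_SIZE];
            unsigned int prio;
            while (mq_receive(mq, buffer, sizeof(buffer), &prio) >= 0) {
                printf("Received: %s\n", buffer);
            }
        }
        sleep(1);  // Returns early when the signal arrives
    }

    mq_close(mq);
    return 0;
}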

3.4 Performance Comparison

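// Benchmark: Send 100,000 messages
// (rough, order-of-magnitude figures; real numbers depend on
// message size, hardware, and kernel version)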

// Using Pipe (byte stream)
// - No message boundaries
// - Must implement framing protocol
// - Throughput: ~3 GB/s
// - Latency: ~2 µs

// Using Message Queue
// - Built-in message boundaries
// - Priority support
// - Throughput: ~500 MB/s (slower due to overhead)
// - Latency: ~5 µs
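
The "must implement framing protocol" point for pipes can be sketched as a length-prefixed frame: a 4-byte length header followed by the payload. This is one common approach under stated assumptions, not a standard API. The helper names (`write_all`, `read_all`, `send_frame`, `recv_frame`) are hypothetical, and the header uses host byte order, which is acceptable only because both ends run on the same machine.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes. Returns 0 on success, -1 on error or early EOF. */
static int read_all(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) return -1;  /* peer closed in the middle of a frame */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame = 4-byte length header + payload. */
static int send_frame(int fd, const void *payload, uint32_t len) {
    if (write_all(fd, &len, sizeof(len)) < 0) return -1;
    return write_all(fd, payload, len);
}

/* Returns the payload length, or -1 on error or if the frame exceeds cap. */
static ssize_t recv_frame(int fd, void *buf, size_t cap) {
    uint32_t len;
    if (read_all(fd, &len, sizeof(len)) < 0) return -1;
    if (len > cap) return -1;
    if (read_all(fd, buf, len) < 0) return -1;
    return (ssize_t)len;
}
```

With this in place, two `send_frame()` calls on the write end of a pipe come back as two distinct messages from `recv_frame()` on the read end, even though the pipe itself is a plain byte stream.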

4. Signals: Asynchronous Interrupts

Signals are “software interrupts” that allow asynchronous notification of events.

4.1 Signal Fundamentals

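Standard Signals:

| Signal  | Number | Default | Description               |
|---------|--------|---------|---------------------------|
| SIGHUP  | 1      | Term    | Hangup (terminal closed)  |
| SIGINT  | 2      | Term    | Interrupt (Ctrl+C)        |
| SIGQUIT | 3      | Core    | Quit (Ctrl+\)             |
| SIGILL  | 4      | Core    | Illegal instruction       |
| SIGTRAP | 5      | Core    | Trace/breakpoint trap     |
| SIGABRT | 6      | Core    | Abort signal              |
| SIGBUS  | 7      | Core    | Bus error                 |
| SIGFPE  | 8      | Core    | Floating-point exception  |
| SIGKILL | 9      | Term    | Cannot be caught          |
| SIGUSR1 | 10     | Term    | User-defined 1            |
| SIGSEGV | 11     | Core    | Segmentation fault        |
| SIGUSR2 | 12     | Term    | User-defined 2            |
| SIGPIPE | 13     | Term    | Broken pipe               |
| SIGALRM | 14     | Term    | Timer expired             |
| SIGTERM | 15     | Term    | Termination signal        |
| SIGCHLD | 17     | Ign     | Child stopped/terminated  |
| SIGCONT | 18     | Cont    | Continue if stopped       |
| SIGSTOP | 19     | Stop    | Cannot be caught          |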

4.2 Signal Handler Installation

#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>

void sigint_handler(int sig) {
    // Async-signal-safe: only use safe functions!
    const char msg[] = "Caught SIGINT!\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);

    // Re-install handler (for old signal() API)
    // Not needed with sigaction()
}

void sigterm_handler(int sig) {
    const char msg[] = "Caught SIGTERM, cleaning up...\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
    _exit(0);  // _exit is async-signal-safe (exit is NOT)
}

int main() {
    // Modern way: sigaction (preferred)
    struct sigaction sa_int;
    memset(&sa_int, 0, sizeof(sa_int));
    sa_int.sa_handler = sigint_handler;
    sigemptyset(&sa_int.sa_mask);  // Don't block other signals
    sa_int.sa_flags = 0;

    if (sigaction(SIGINT, &sa_int, NULL) == -1) {
        perror("sigaction SIGINT");
        return 1;
    }

    struct sigaction sa_term;
    memset(&sa_term, 0, sizeof(sa_term));
    sa_term.sa_handler = sigterm_handler;
    sigemptyset(&sa_term.sa_mask);
    sigaddset(&sa_term.sa_mask, SIGINT);  // Block SIGINT during SIGTERM handler
    sa_term.sa_flags = 0;

    if (sigaction(SIGTERM, &sa_term, NULL) == -1) {
        perror("sigaction SIGTERM");
        return 1;
    }

    printf("PID: %d\n", getpid());
    printf("Send signals: kill -INT %d  or  kill -TERM %d\n", getpid(), getpid());

    // Main loop
    while (1) {
        printf("Working...\n");
        sleep(2);
    }

    return 0;
}

4.3 The Signal Delivery Mechanism (Deep Dive)

What happens when Process B sends a signal to Process A?
1. Process B calls kill(pid_A, SIGINT)
   ├─> Syscall entry: sys_kill()
   └─> Kernel validates permission (same user or root)

2. Kernel sets pending signal bit
   ├─> task_struct *task_A = find_task_by_pid(pid_A)
   ├─> sigaddset(&task_A->pending.signal, SIGINT)
   └─> If task_A is sleeping, wake it up

3. Context switch to Process A
   ├─> Before returning to user space, kernel checks pending signals
   └─> do_signal() is called

4. Signal frame construction
   ├─> Save current context (registers, stack pointer, instruction pointer)
   ├─> Allocate signal frame on user stack (x86-64 layout; stack grows downward):
   │   ┌─────────────────┐  ← Original stack pointer (higher addresses)
   │   │  siginfo_t      │  (signal info)
   │   ├─────────────────┤
   │   │  ucontext_t     │  (saved registers, signal mask)
   │   │   - RIP (PC)    │
   │   │   - RSP (SP)    │
   │   │   - RAX, RBX... │
   │   ├─────────────────┤
   │   │  Return address │  (to __restore_rt)
   │   └─────────────────┘  ← New stack pointer
   │
   ├─> Pass handler arguments in registers (RDI = signal number,
   │   RSI = siginfo_t*, RDX = ucontext_t*)
   ├─> Modify saved user RIP to point to signal handler
   └─> Return to user space

5. Signal handler executes
   ├─> Process A "resumes" at handler address
   ├─> Handler runs: sigint_handler(2)
   └─> Handler returns

6. Signal return trampoline
   ├─> Return address points to __restore_rt (on x86-64 this stub is supplied by libc via sa_restorer)
   ├─> __restore_rt calls rt_sigreturn() syscall
   └─> Kernel restores original context from signal frame

7. Resume normal execution
   └─> Process A continues where it was interrupted
Kernel Code (simplified from kernel/signal.c):
// Step 2: Send signal
int kill_something_info(int sig, struct siginfo *info, pid_t pid) {
    struct task_struct *p = find_task_by_vpid(pid);

    // Check permission
    if (!kill_ok_by_cred(p))
        return -EPERM;

    // Add to pending signals
    sigaddset(&p->pending.signal, sig);

    // Wake up if sleeping
    signal_wake_up(p, sig == SIGKILL || sig == SIGSTOP);

    return 0;
}

// Step 3-4: Deliver signal
static void handle_signal(struct ksignal *ksig, struct pt_regs *regs) {
    struct task_struct *task = current;
    sigset_t *oldset = sigmask_to_save();

    // Build signal frame on user stack
    if (setup_rt_frame(ksig, oldset, regs) < 0) {
        // Failed - force SIGSEGV
        force_sig(SIGSEGV);
        return;
    }

    // Clear handled signal from pending
    sigdelset(&task->pending.signal, ksig->sig);
}

// Build signal frame
static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
                          struct pt_regs *regs) {
    struct rt_sigframe __user *frame;

    // Allocate frame on user stack
    frame = get_sigframe(&ksig->ka, regs, sizeof(*frame));

    // Fill in signal frame
    copy_siginfo_to_user(&frame->info, &ksig->info);

    // Save register context
    frame->uc.uc_mcontext.rip = regs->ip;
    frame->uc.uc_mcontext.rsp = regs->sp;
    frame->uc.uc_mcontext.rax = regs->ax;
    // ... save all registers ...

    // Set return address to the restorer trampoline (sa_restorer)
    put_user(ksig->ka.sa.sa_restorer, &frame->pretcode);

    // Modify user-space RIP to point to handler and pass the
    // signal number as its first argument (x86-64 calling convention)
    regs->ip = (unsigned long)ksig->ka.sa.sa_handler;
    regs->di = ksig->sig;
    regs->sp = (unsigned long)frame;

    return 0;
}

// Step 6: Return from signal handler
SYSCALL_DEFINE0(rt_sigreturn) {
    struct pt_regs *regs = current_pt_regs();
    struct rt_sigframe __user *frame;

    frame = (struct rt_sigframe __user *)(regs->sp - sizeof(long));

    // Restore register context
    regs->ip = frame->uc.uc_mcontext.rip;
    regs->sp = frame->uc.uc_mcontext.rsp;
    regs->ax = frame->uc.uc_mcontext.rax;
    // ... restore all registers ...

    return regs->ax;  // Return value of interrupted syscall
}

4.4 Async-Signal-Safety: The Critical Constraint

Problem: A signal can interrupt a process anywhere, including inside non-reentrant functions.
// DANGEROUS CODE
char *global_buffer = NULL;

void signal_handler(int sig) {
    // WRONG: malloc is NOT async-signal-safe
    global_buffer = malloc(100);
    sprintf(global_buffer, "Signal %d", sig);  // Also wrong
    printf("Handled: %s\n", global_buffer);    // Also wrong
    free(global_buffer);  // Also wrong
}

int main() {
    signal(SIGINT, signal_handler);

    // What if SIGINT arrives HERE?
    global_buffer = malloc(200);  // Holding malloc's internal lock
    // Signal handler interrupts, tries to call malloc()
    // → DEADLOCK (waiting for lock it already holds)

    free(global_buffer);
    return 0;
}
Why this deadlocks:
Thread execution timeline:

Time    Main Thread              Signal Handler
────────────────────────────────────────────────
  1     malloc(200)
  2       acquire(heap_lock)
  3       [allocating memory...]

  4     ← SIGINT arrives! →      signal_handler()
  5                               malloc(100)
  6                                 try acquire(heap_lock)
  7                                 ⏸️  BLOCKED (waiting for lock)
  8     ⏸️  Cannot continue        ⏸️  Still blocked
  9     (handler must finish)

DEADLOCK: Main thread waiting for handler to finish
          Handler waiting for main thread to release lock
Async-Signal-Safe Functions (partial list from POSIX):
// Safe to call from signal handlers:
_exit()    // NOT exit()
write()
read()
open()
close()
signal()   // Signal manipulation
sigaction()
kill()
getpid()
alarm()
pause()
// ... about 100 total
Correct Pattern 1: Use only write()
void signal_handler(int sig) {
    const char msg[] = "Signal received\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}
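
Pattern 1 becomes awkward as soon as the message needs a number in it, since printf() and sprintf() are off limits. A minimal sketch of one way around that is to format the integer by hand and then call write(). The helper names `format_int` and `handler_report` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the decimal form of value into buf (no terminating NUL).
 * Returns the number of characters stored, truncated to cap.
 * Uses only local variables, so it is safe inside a signal handler. */
static size_t format_int(int value, char *buf, size_t cap) {
    char tmp[12];  /* sign + 10 digits covers a 32-bit int */
    size_t n = 0;
    unsigned int u = (value < 0) ? 0u - (unsigned int)value
                                 : (unsigned int)value;
    do {
        tmp[n++] = (char)('0' + u % 10);  /* digits come out reversed */
        u /= 10;
    } while (u != 0);
    if (value < 0)
        tmp[n++] = '-';

    size_t len = (n < cap) ? n : cap;
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[n - 1 - i];
    return len;
}

/* Example handler: prints "Signal <n>\n" using only write(). */
static void handler_report(int sig) {
    int saved_errno = errno;  /* write() may clobber errno */
    char line[32] = "Signal ";
    size_t len = 7;
    len += format_int(sig, line + len, sizeof(line) - len - 1);
    line[len++] = '\n';
    ssize_t r = write(STDOUT_FILENO, line, len);
    (void)r;                  /* nothing safe to do on failure */
    errno = saved_errno;
}
```

Saving and restoring errno matters because the handler may interrupt code in the main program that is about to inspect errno.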
Correct Pattern 2: Set a flag, check in main loop
volatile sig_atomic_t got_signal = 0;

void signal_handler(int sig) {
    got_signal = sig;  // sig_atomic_t writes are atomic
}

int main() {
    signal(SIGINT, signal_handler);

    while (!got_signal) {
        // Do work...
    }

    printf("Received signal %d\n", got_signal);  // Safe: in main context
    return 0;
}

4.5 Realtime Signals (SIGRTMIN - SIGRTMAX)

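Realtime signals (numbered 32-64 in the Linux kernel; glibc reserves the first few for its threading implementation, so always write SIGRTMIN + n rather than a hard-coded number) have additional features: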
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

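void rt_signal_handler(int sig, siginfo_t *info, void *context) {
    // NOTE: printf() is not async-signal-safe (section 4.4). It is used
    // here only to keep the demo short; production code should use write().
    printf("Received signal %d\n", sig);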
    printf("  Sender PID: %d\n", info->si_pid);
    printf("  Sender UID: %d\n", info->si_uid);
    printf("  Value (int): %d\n", info->si_value.sival_int);
    printf("  Value (ptr): %p\n", info->si_value.sival_ptr);
}

int main() {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;  // Use extended handler
    sa.sa_sigaction = rt_signal_handler;

    sigaction(SIGRTMIN, &sa, NULL);
    sigaction(SIGRTMIN + 1, &sa, NULL);

    printf("PID: %d\n", getpid());
    printf("Waiting for realtime signals...\n");

    while (1) {
        pause();
    }

    return 0;
}

// Sending realtime signals with data:
// sigqueue(pid, SIGRTMIN, (union sigval){.sival_int = 42});
Realtime Signal Features:
  1. Queued: Multiple instances of the same signal are queued (standard signals are not)
  2. Ordered: Delivered in priority order (lower signal numbers first)
  3. Data: Can send an integer or pointer with the signal
// Sender
union sigval value;
value.sival_int = 12345;
sigqueue(target_pid, SIGRTMIN, value);

value.sival_int = 67890;
sigqueue(target_pid, SIGRTMIN, value);  // Both will be delivered

// Receiver gets both signals with their respective values

4.6 Signal Masking and Blocking

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

void handler(int sig) {
    // NOTE: printf() is not async-signal-safe (section 4.4); demo only
    printf("Handler started for signal %d\n", sig);
    sleep(3);  // Simulate long-running handler
    printf("Handler finished for signal %d\n", sig);
}

int main() {
    signal(SIGUSR1, handler);
    signal(SIGUSR2, handler);

    sigset_t mask, oldmask;

    // Block SIGUSR2
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR2);
    sigprocmask(SIG_BLOCK, &mask, &oldmask);

    printf("SIGUSR2 is now blocked\n");
    printf("PID: %d\n", getpid());

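    // SIGUSR1 will be handled, SIGUSR2 will be pending.
    // sleep() returns early when a handler runs, so loop on the time remaining.
    unsigned int remaining = 10;
    while ((remaining = sleep(remaining)) > 0)
        ;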

    printf("Unblocking SIGUSR2...\n");
    sigprocmask(SIG_SETMASK, &oldmask, NULL);

    // Pending SIGUSR2 signals now delivered
    sleep(5);

    return 0;
}
Test:
# Terminal 1
./signal_mask
# Note the PID

# Terminal 2
kill -USR1 <pid>  # Handled immediately
kill -USR2 <pid>  # Blocked, becomes pending
kill -USR2 <pid>  # Another one (but standard signals don't queue)
# After program unblocks, only ONE SIGUSR2 is delivered

5. Unix Domain Sockets: High-Speed Local IPC

Unix Domain Sockets (UDS) provide socket API semantics for local IPC with superior performance compared to network sockets.

5.1 Stream Sockets (SOCK_STREAM)

#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SOCKET_PATH "/tmp/my_socket"

// Server
int main() {
    int server_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    unlink(SOCKET_PATH);  // Remove if exists

    if (bind(server_fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        perror("bind");
        return 1;
    }

    listen(server_fd, 5);
    printf("Server listening on %s\n", SOCKET_PATH);

    int client_fd = accept(server_fd, NULL, NULL);
    printf("Client connected\n");

    char buffer[256];
    ssize_t n = recv(client_fd, buffer, sizeof(buffer) - 1, 0);
    if (n > 0) {
        buffer[n] = '\0';  // Never assume the peer sent a terminator
        printf("Received: %s\n", buffer);
    }

    const char *response = "Hello from server!";
    send(client_fd, response, strlen(response) + 1, 0);

    close(client_fd);
    close(server_fd);
    unlink(SOCKET_PATH);

    return 0;
}

// Client
int main() {
    int sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (connect(sock_fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        perror("connect");
        return 1;
    }

    const char *msg = "Hello from client!";
    send(sock_fd, msg, strlen(msg) + 1, 0);

    char buffer[256];
    ssize_t n = recv(sock_fd, buffer, sizeof(buffer) - 1, 0);
    if (n > 0) {
        buffer[n] = '\0';
        printf("Received: %s\n", buffer);
    }

    close(sock_fd);
    return 0;
}

5.2 Datagram Sockets (SOCK_DGRAM)

// Server
int main() {
    int sock_fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, "/tmp/server_socket", sizeof(addr.sun_path) - 1);

    unlink("/tmp/server_socket");
    bind(sock_fd, (struct sockaddr*)&addr, sizeof(addr));

    char buffer[256];
    struct sockaddr_un client_addr;
    socklen_t client_len = sizeof(client_addr);

    ssize_t n = recvfrom(sock_fd, buffer, sizeof(buffer) - 1, 0,
                         (struct sockaddr*)&client_addr, &client_len);
    if (n < 0) {
        perror("recvfrom");
        return 1;
    }
    buffer[n] = '\0';
    printf("Received: %s from %s\n", buffer, client_addr.sun_path);

    // Send response
    const char *response = "Server response";
    sendto(sock_fd, response, strlen(response) + 1, 0,
           (struct sockaddr*)&client_addr, client_len);

    close(sock_fd);
    unlink("/tmp/server_socket");
    return 0;
}

// Client
int main() {
    int sock_fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    // Client needs to bind too (for return address)
    struct sockaddr_un client_addr;
    memset(&client_addr, 0, sizeof(client_addr));
    client_addr.sun_family = AF_UNIX;
    snprintf(client_addr.sun_path, sizeof(client_addr.sun_path),
             "/tmp/client_%d", getpid());

    unlink(client_addr.sun_path);
    bind(sock_fd, (struct sockaddr*)&client_addr, sizeof(client_addr));

    struct sockaddr_un server_addr;
    memset(&server_addr, 0, sizeof(server_addr));
    server_addr.sun_family = AF_UNIX;
    strncpy(server_addr.sun_path, "/tmp/server_socket",
            sizeof(server_addr.sun_path) - 1);

    const char *msg = "Client message";
    sendto(sock_fd, msg, strlen(msg) + 1, 0,
           (struct sockaddr*)&server_addr, sizeof(server_addr));

    char buffer[256];
    ssize_t n = recvfrom(sock_fd, buffer, sizeof(buffer) - 1, 0, NULL, NULL);
    if (n >= 0) {
        buffer[n] = '\0';
        printf("Received: %s\n", buffer);
    }

    close(sock_fd);
    unlink(client_addr.sun_path);
    return 0;
}
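
The practical difference from SOCK_STREAM is that every recv() returns exactly one datagram. For related processes, socketpair() gives the same semantics without any filesystem paths or bind() calls. A minimal sketch, with the hypothetical function name `dgram_boundary_demo`:

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send two datagrams back to back and receive them one at a time.
 * lens[0] and lens[1] receive the size of each received message.
 * Returns 0 on success, -1 on error. */
static int dgram_boundary_demo(ssize_t lens[2]) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;

    const char a[] = "first";
    const char b[] = "second!";
    if (send(sv[0], a, sizeof(a) - 1, 0) == -1 ||
        send(sv[0], b, sizeof(b) - 1, 0) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* The buffer could hold both messages, yet each recv() stops at
     * the datagram boundary. */
    char buf[64];
    lens[0] = recv(sv[1], buf, sizeof(buf), 0);
    lens[1] = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return (lens[0] < 0 || lens[1] < 0) ? -1 : 0;
}
```

With SOCK_STREAM in place of SOCK_DGRAM, the first recv() would be free to return both messages concatenated.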

5.3 Passing File Descriptors (SCM_RIGHTS)

This is the “superpower” of Unix Domain Sockets - the ability to pass open file descriptors between processes.
#include <sys/socket.h>
#include <sys/un.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SOCKET_PATH "/tmp/fd_socket"

// Send FD
void send_fd(int sock_fd, int fd_to_send) {
    struct msghdr msg = {0};
    struct iovec iov[1];
    char buf[1] = {'X'};  // Dummy data

    iov[0].iov_base = buf;
    iov[0].iov_len = 1;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

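    // Control message for FD (the union keeps the buffer
    // correctly aligned for struct cmsghdr)
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } cmsg_buf;
    memset(&cmsg_buf, 0, sizeof(cmsg_buf));
    msg.msg_control = cmsg_buf.buf;
    msg.msg_controllen = sizeof(cmsg_buf.buf);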

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));

    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    if (sendmsg(sock_fd, &msg, 0) == -1) {
        perror("sendmsg");
    } else {
        printf("Sent FD %d\n", fd_to_send);
    }
}

// Receive FD
int recv_fd(int sock_fd) {
    struct msghdr msg = {0};
    struct iovec iov[1];
    char buf[1];

    iov[0].iov_base = buf;
    iov[0].iov_len = 1;
    msg.msg_iov = iov;
    msg.msg_iovlen = 1;

    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  // Forces correct alignment
    } cmsg_buf;
    msg.msg_control = cmsg_buf.buf;
    msg.msg_controllen = sizeof(cmsg_buf.buf);

    if (recvmsg(sock_fd, &msg, 0) == -1) {
        perror("recvmsg");
        return -1;
    }

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        int received_fd;
        memcpy(&received_fd, CMSG_DATA(cmsg), sizeof(int));
        printf("Received FD %d\n", received_fd);
        return received_fd;
    }

    return -1;
}

// Sender (privileged process)
int main() {
    int sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    unlink(SOCKET_PATH);
    bind(sock_fd, (struct sockaddr*)&addr, sizeof(addr));
    listen(sock_fd, 1);

    printf("Waiting for connection...\n");
    int client_fd = accept(sock_fd, NULL, NULL);

    // Open a file (this process has permission)
    int file_fd = open("/var/log/syslog", O_RDONLY);
    if (file_fd == -1) {
        perror("open");
        return 1;
    }

    printf("Opened /var/log/syslog with FD %d\n", file_fd);

    // Send FD to unprivileged process
    send_fd(client_fd, file_fd);

    close(file_fd);
    close(client_fd);
    close(sock_fd);
    unlink(SOCKET_PATH);

    return 0;
}

// Receiver (unprivileged process)
int main() {
    int sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (connect(sock_fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        perror("connect");
        return 1;
    }

    // Receive file descriptor
    int received_fd = recv_fd(sock_fd);

    if (received_fd >= 0) {
        // Now we can read the file, even though we don't have permission!
        char buffer[256];
        ssize_t n = read(received_fd, buffer, sizeof(buffer) - 1);
        if (n > 0) {
            buffer[n] = '\0';
            printf("Read from file:\n%s\n", buffer);
        }

        close(received_fd);
    }

    close(sock_fd);
    return 0;
}
Kernel Magic: When sending a FD via SCM_RIGHTS:
Sender Process                 Kernel                  Receiver Process
┌──────────────┐              ┌──────┐                ┌──────────────┐
│ FD Table     │              │      │                │ FD Table     │
│  3 → file_A  │──sendmsg()──>│      │                │              │
│  4 → file_B  │              │      │                │              │
└──────────────┘              │      │                └──────────────┘
                              │      │
                              │ 1. Get struct file*   │
                              │    from sender FD     │
                              │ 2. Increment refcount │
                              │ 3. Find free FD in    │
                              │    receiver's table   │
                              │ 4. Install struct file*│
                              │    in receiver's table│
                              │      │                │
                              │      │<─recvmsg()─────┤
                              └──────┘                │
                                                      │ FD Table     │
                                                      │  5 → file_A  │
                                                      └──────────────┘

Both FD 3 (sender) and FD 5 (receiver) now point to the SAME kernel file object!
Use Cases:
  • Privilege separation: Privileged broker passes FDs to sandboxed workers (Chrome, systemd)
  • Connection handoff: Pass an accepted socket FD to a worker process for load balancing (only the descriptor crosses the socket, not the connection's data)
  • Capability-based security: Grant access to specific resources without filesystem permissions
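
The claim that both descriptors refer to the same kernel file object can be checked in a single process. The sketch below is a condensed variant of the send/receive helpers above (the names `send_one_fd`, `recv_one_fd`, and `fd_pass_demo` are hypothetical). It passes the write end of a pipe through a socketpair, writes through the received descriptor, and reads the data back from the original pipe.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

union fd_ctrl {                 /* union keeps the buffer cmsghdr-aligned */
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

static int send_one_fd(int sock, int fd) {
    char byte = 'X';            /* at least one byte of real data is required */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_one_fd(int sock) {
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Returns the number of bytes read back through the pipe, or -1. */
static ssize_t fd_pass_demo(char *out, size_t cap) {
    int sv[2], pfd[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (pipe(pfd) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    ssize_t n = -1;
    if (send_one_fd(sv[0], pfd[1]) == 0) {
        int passed = recv_one_fd(sv[1]);  /* new number, same pipe */
        if (passed >= 0) {
            if (passed != pfd[1] && write(passed, "via passed fd", 13) == 13)
                n = read(pfd[0], out, cap);
            close(passed);
        }
    }
    close(pfd[0]); close(pfd[1]);
    close(sv[0]);  close(sv[1]);
    return n;
}
```

The received descriptor has a different number from the one that was sent, yet data written to it appears on the original pipe's read end.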

5.4 Credentials Passing (SO_PEERCRED)

#define _GNU_SOURCE  // Required for struct ucred; must precede all includes
#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SOCKET_PATH "/tmp/cred_socket"

// Server
int main() {
    int server_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    unlink(SOCKET_PATH);
    bind(server_fd, (struct sockaddr*)&addr, sizeof(addr));
    listen(server_fd, 1);

    printf("Server waiting for connection...\n");
    int client_fd = accept(server_fd, NULL, NULL);

    // Get peer credentials
    struct ucred cred;
    socklen_t len = sizeof(cred);

    if (getsockopt(client_fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) == 0) {
        printf("Client credentials:\n");
        printf("  PID: %d\n", cred.pid);
        printf("  UID: %d\n", cred.uid);
        printf("  GID: %d\n", cred.gid);

        // Authorization decision
        if (cred.uid == 0) {
            printf("Access granted (root user)\n");
        } else {
            printf("Access denied (non-root user)\n");
        }
    }

    close(client_fd);
    close(server_fd);
    unlink(SOCKET_PATH);

    return 0;
}

// Client (just connects)
int main() {
    int sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, SOCKET_PATH, sizeof(addr.sun_path) - 1);

    if (connect(sock_fd, (struct sockaddr*)&addr, sizeof(addr)) == 0) {
        printf("Connected to server\n");
        sleep(2);  // Let server check credentials
    }

    close(sock_fd);
    return 0;
}
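Security: The kernel fills in these credentials itself, so a client cannot forge them (unlike network protocols, where a client can claim any identity). They describe the peer as of its connect() call, so a recorded PID may later be reused by an unrelated process.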

5.5 Abstract Namespace Sockets

Linux supports “abstract” Unix sockets that don’t exist in the filesystem:
#include <sys/socket.h>
#include <sys/un.h>
#include <stdio.h>
#include <string.h>

int main() {
    int sock_fd = socket(AF_UNIX, SOCK_STREAM, 0);

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;

    // Abstract socket: first byte is '\0'
    addr.sun_path[0] = '\0';
    strncpy(addr.sun_path + 1, "my_abstract_socket", sizeof(addr.sun_path) - 2);

    // No unlink() needed - doesn't exist in filesystem!
    bind(sock_fd, (struct sockaddr*)&addr,
         sizeof(sa_family_t) + strlen(addr.sun_path + 1) + 1);

    listen(sock_fd, 5);
    printf("Abstract socket bound\n");

    // Check with: netstat -lnpx | grep my_abstract

    // ...

    close(sock_fd);
    // No unlink() needed!

    return 0;
}
Advantages:
  • No filesystem clutter
  • No permission/ownership issues
  • Automatically cleaned up on close
  • No race conditions with unlink()
Disadvantages:
  • Linux-specific (not portable)
  • Can’t use filesystem permissions for access control
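
The address length is the part of abstract sockets that most often goes wrong, because the name is not NUL-terminated and every byte counted in the length is part of the name. The sketch below wraps that computation in a helper and checks it with a bind/connect round trip inside one process. The names `abstract_addr` and `abstract_roundtrip` are hypothetical, and the code is Linux-only.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill addr for the abstract name `name` and return the exact length
 * to pass to bind()/connect(), or 0 if the name is empty or too long. */
static socklen_t abstract_addr(struct sockaddr_un *addr, const char *name) {
    size_t len = strlen(name);
    if (len == 0 || len > sizeof(addr->sun_path) - 1)
        return 0;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    /* sun_path[0] stays '\0': that marks the address as abstract */
    memcpy(addr->sun_path + 1, name, len);
    return (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* Bind a listener to the abstract name and connect to it from a second
 * socket in the same process. Returns 0 on success, -1 on failure. */
static int abstract_roundtrip(const char *name) {
    struct sockaddr_un addr;
    socklen_t alen = abstract_addr(&addr, name);
    if (alen == 0)
        return -1;

    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    int cli = socket(AF_UNIX, SOCK_STREAM, 0);
    int rc = -1;
    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&addr, alen) == 0 &&
        listen(srv, 1) == 0 &&
        connect(cli, (struct sockaddr *)&addr, alen) == 0) {
        int conn = accept(srv, NULL, NULL);
        if (conn >= 0) {
            rc = 0;
            close(conn);
        }
    }
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);  /* the name disappears with the last close */
    return rc;
}
```

Both sides must compute the length the same way. Passing sizeof(struct sockaddr_un) on one side and the exact length on the other produces two different names, and the connect() fails.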

6. Performance Comparison & Selection Guide

6.1 Throughput Benchmark

#include <sys/socket.h>
#include <sys/un.h>
#include <sys/mman.h>
#include <sys/wait.h>   // wait()
#include <semaphore.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define DATA_SIZE (100 * 1024 * 1024)  // 100 MB
#define CHUNK_SIZE 4096

double benchmark_pipe() {
    int pipefd[2];
    pipe(pipefd);

    if (fork() == 0) {
        close(pipefd[0]);
        char data[CHUNK_SIZE];
        for (size_t i = 0; i < DATA_SIZE; i += CHUNK_SIZE) {
            write(pipefd[1], data, CHUNK_SIZE);
        }
        close(pipefd[1]);
        _exit(0);
    } else {
        close(pipefd[1]);
        char buffer[CHUNK_SIZE];

        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        size_t total = 0;
        while (total < DATA_SIZE) {
            ssize_t n = read(pipefd[0], buffer, CHUNK_SIZE);
            if (n <= 0) break;
            total += n;
        }

        clock_gettime(CLOCK_MONOTONIC, &end);
        close(pipefd[0]);
        wait(NULL);

        double elapsed = (end.tv_sec - start.tv_sec) +
                        (end.tv_nsec - start.tv_nsec) / 1e9;
        return DATA_SIZE / (1024.0 * 1024.0) / elapsed;
    }
}

double benchmark_unix_socket() {
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    if (fork() == 0) {
        close(sv[0]);
        char data[CHUNK_SIZE];
        for (size_t i = 0; i < DATA_SIZE; i += CHUNK_SIZE) {
            send(sv[1], data, CHUNK_SIZE, 0);
        }
        close(sv[1]);
        _exit(0);
    } else {
        close(sv[1]);
        char buffer[CHUNK_SIZE];

        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        size_t total = 0;
        while (total < DATA_SIZE) {
            ssize_t n = recv(sv[0], buffer, CHUNK_SIZE, 0);
            if (n <= 0) break;
            total += n;
        }

        clock_gettime(CLOCK_MONOTONIC, &end);
        close(sv[0]);
        wait(NULL);

        double elapsed = (end.tv_sec - start.tv_sec) +
                        (end.tv_nsec - start.tv_nsec) / 1e9;
        return DATA_SIZE / (1024.0 * 1024.0) / elapsed;
    }
}

double benchmark_shared_memory() {
    struct shared_data {
        sem_t sem_empty;
        sem_t sem_full;
        char buffer[CHUNK_SIZE];
    };

    struct shared_data *shm = mmap(NULL, sizeof(struct shared_data),
                                   PROT_READ | PROT_WRITE,
                                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    sem_init(&shm->sem_empty, 1, 1);
    sem_init(&shm->sem_full, 1, 0);

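    if (fork() == 0) {
        for (size_t i = 0; i < DATA_SIZE; i += CHUNK_SIZE) {
            sem_wait(&shm->sem_empty);
            // Produce one chunk directly in shared memory
            memset(shm->buffer, (int)((i / CHUNK_SIZE) & 0xff), CHUNK_SIZE);
            sem_post(&shm->sem_full);
        }
        _exit(0);
    } else {
        volatile unsigned char sink = 0;
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);

        for (size_t i = 0; i < DATA_SIZE; i += CHUNK_SIZE) {
            sem_wait(&shm->sem_full);
            // Touch every byte so the comparison with pipe/socket is fair
            unsigned char acc = 0;
            for (size_t j = 0; j < CHUNK_SIZE; j++)
                acc ^= (unsigned char)shm->buffer[j];
            sink = acc;  // volatile store keeps the reads from being optimized away
            sem_post(&shm->sem_empty);
        }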

        clock_gettime(CLOCK_MONOTONIC, &end);
        wait(NULL);

        double elapsed = (end.tv_sec - start.tv_sec) +
                        (end.tv_nsec - start.tv_nsec) / 1e9;
        munmap(shm, sizeof(struct shared_data));
        return DATA_SIZE / (1024.0 * 1024.0) / elapsed;
    }
}

int main() {
    printf("Pipe:           %.2f MB/s\n", benchmark_pipe());
    printf("Unix Socket:    %.2f MB/s\n", benchmark_unix_socket());
    printf("Shared Memory:  %.2f MB/s\n", benchmark_shared_memory());
    return 0;
}

6.2 Selection Decision Tree

Need IPC?

  ├─ Is it local-only? (same machine)
  │  │
  │  ├─ YES: Continue below
  │  │
  │  └─ NO: Must use Network Sockets (TCP/UDP)

  ├─ Need maximum throughput? (>1 GB/s)
  │  │
  │  ├─ YES: Shared Memory
  │  │       ⚠️  Requires manual synchronization
  │  │       ⚠️  More complex programming model
  │  │
  │  └─ NO: Continue below

  ├─ Need to pass file descriptors?
  │  │
  │  ├─ YES: Unix Domain Sockets (SCM_RIGHTS)
  │  │       Use case: Privilege separation, sandboxing
  │  │
  │  └─ NO: Continue below

  ├─ Need message boundaries?
  │  │
  │  ├─ YES: Need priority support?
  │  │  │
  │  │  ├─ YES: Message Queues (POSIX or System V)
  │  │  │
  │  │  └─ NO: Unix Domain Sockets (SOCK_DGRAM)
  │  │
  │  └─ NO: Stream-based communication
  │         │
  │         ├─ Related processes (parent-child)?
  │         │  │
  │         │  ├─ YES: Pipe (simplest)
  │         │  │
  │         │  └─ NO: Unix Domain Sockets (SOCK_STREAM)
  │         │         or Named Pipe (FIFO)

  └─ Asynchronous notification only?

     └─ Signals (but consider self-pipe trick for event loops)

7. Advanced Topics

7.1 splice() and vmsplice() - Zero-Copy Pipes

Linux provides splice() and vmsplice() for zero-copy operations:
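#define _GNU_SOURCE   // Required for splice()
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

#define SPLICE_SIZE (64 * 1024)

int main() {
    int pipefd[2];
    if (pipe(pipefd) == -1) { perror("pipe"); return 1; }

    int file_fd = open("large_file.bin", O_RDONLY);
    int out_fd = open("copy.bin", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (file_fd == -1 || out_fd == -1) { perror("open"); return 1; }

    // Zero-copy: file → pipe → file (no user-space copy!)
    while (1) {
        ssize_t n = splice(file_fd, NULL, pipefd[1], NULL,
                          SPLICE_SIZE, SPLICE_F_MOVE);
        if (n < 0) { perror("splice (file to pipe)"); return 1; }
        if (n == 0) break;  // End of input file

        // The second splice may move fewer than n bytes, so drain the pipe fully
        while (n > 0) {
            ssize_t m = splice(pipefd[0], NULL, out_fd, NULL,
                               n, SPLICE_F_MOVE);
            if (m <= 0) { perror("splice (pipe to file)"); return 1; }
            n -= m;
        }
    }

    close(file_fd);
    close(out_fd);
    close(pipefd[0]);
    close(pipefd[1]);

    return 0;
}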
How it works:
Traditional copy:
File → [Kernel buffer] → [User buffer] → [Kernel buffer] → File
       (copy 1)          (copy 2)         (copy 3)

splice():
File → [Pipe buffer] → File
       (page remap)    (page remap)

No user-space copies! Where it can, the kernel passes page references through the pipe instead of copying the bytes.

7.2 memfd and File Sealing

memfd_create() creates anonymous file descriptors that can be shared and sealed:
#define _GNU_SOURCE   // Required for memfd_create() and F_ADD_SEALS
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    // Create memory-backed file descriptor (glibc 2.27+ provides the wrapper)
    int memfd = memfd_create("my_memfd", MFD_ALLOW_SEALING);
    if (memfd == -1) { perror("memfd_create"); return 1; }

    const char *data = "Shared data";
    write(memfd, data, strlen(data));

    // Seal to prevent further writes (security)
    if (fcntl(memfd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW) == -1) {
        perror("F_ADD_SEALS");
        return 1;
    }

    // Now share this FD via Unix socket
    // Receiver cannot modify the contents!

    // Map into memory. MAP_PRIVATE is used here because some older kernels
    // reject MAP_SHARED on a write-sealed memfd even for read-only mappings.
    void *addr = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, memfd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }
    printf("Data: %s\n", (char*)addr);

    // Try to write (will fail due to seal)
    if (write(memfd, "X", 1) == -1) {
        printf("Write failed (sealed)\n");
    }

    munmap(addr, 4096);
    close(memfd);
    return 0;
}
Use case: sharing pixel buffers between a Wayland compositor and its clients. Sealing prevents a malicious peer from modifying or resizing the buffer after it has been shared.

8. Real-World Architecture Patterns

8.1 Chrome Multi-Process Architecture

┌─────────────────────────────────────────────────┐
│             Browser Process (Privileged)         │
│  - Creates renderer processes                   │
│  - Opens files, sockets, devices                │
│  - Passes FDs via Unix sockets (SCM_RIGHTS)     │
└─────────┬───────────────────────────────────────┘
          │ Unix Sockets + SCM_RIGHTS
          ├──────────────┬──────────────┬─────────────┐
          │              │              │             │
┌─────────▼───────┐ ┌───▼──────┐ ┌────▼─────┐ ┌─────▼─────┐
│ Renderer (tab1) │ │ Renderer │ │ Renderer │ │    GPU    │
│   Sandboxed     │ │  (tab2)  │ │  (tab3)  │ │  Process  │
│ - No filesystem │ │Sandboxed │ │Sandboxed │ │           │
│   access        │ │          │ │          │ │           │
│ - Receives FDs  │ └──────────┘ └──────────┘ └───────────┘
│   from broker   │
└─────────────────┘

IPC Mechanisms:
1. Mojo (message passing framework over Unix sockets)
2. Shared memory for large data (bitmaps, etc.)
3. SCM_RIGHTS for capability transfer

8.2 systemd Socket Activation

// systemd passes activated socket FDs to service
// Service receives pre-bound socket (FD 3+)

#include <systemd/sd-daemon.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdio.h>

int main() {
    int n = sd_listen_fds(0);

    if (n > 0) {
        // systemd passed us activated sockets
        int listen_fd = SD_LISTEN_FDS_START + 0;  // First socket

        printf("Received socket FD %d from systemd\n", listen_fd);

        // No need to socket(), bind(), listen()!
        // Just accept() connections
        while (1) {
            int client_fd = accept(listen_fd, NULL, NULL);
            if (client_fd < 0) continue;  // e.g. EINTR; real code should inspect errno
            // Handle client...
            close(client_fd);
        }
    } else {
        printf("Not socket-activated, manual setup needed\n");
    }

    return 0;
}
Benefits:
  • Zero-downtime restarts (systemd holds socket)
  • On-demand activation
  • Privilege separation (systemd binds privileged port, passes FD to unprivileged service)
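The handoff behind sd_listen_fds() is a small environment-variable protocol: the service manager sets LISTEN_PID and LISTEN_FDS, and the passed descriptors start at FD 3. The following is a minimal sketch of that check without libsystemd; the function name `listen_fds_from_env` and the sanity cap of 1024 descriptors are assumptions made for illustration, and it skips details the real library handles (such as setting close-on-exec on the inherited descriptors).

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define LISTEN_FDS_START 3   /* first inherited descriptor, same value as SD_LISTEN_FDS_START */

/* Returns the number of sockets passed by the service manager,
 * or 0 if this process was not socket-activated. */
int listen_fds_from_env(void) {
    const char *pid_str = getenv("LISTEN_PID");
    const char *fds_str = getenv("LISTEN_FDS");
    if (pid_str == NULL || fds_str == NULL)
        return 0;

    char *end = NULL;
    long pid = strtol(pid_str, &end, 10);
    if (*pid_str == '\0' || *end != '\0' || pid != (long)getpid())
        return 0;            /* the variables were meant for another process */

    long n = strtol(fds_str, &end, 10);
    if (*fds_str == '\0' || *end != '\0' || n < 0 || n > 1024)
        return 0;            /* malformed or implausible count (1024 is an arbitrary cap) */
    return (int)n;
}
```

A service would call this once at startup and, if the result is positive, treat descriptors LISTEN_FDS_START through LISTEN_FDS_START + n - 1 as already-bound listening sockets.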

9. Interview Questions & Answers

Problem: Signal handlers can interrupt a process anywhere, including inside non-reentrant functions like malloc(). This severely limits what you can do in a signal handler.
Solution: The self-pipe trick:
  1. Create a pipe: pipe(signal_pipe)
  2. In signal handler: write(signal_pipe[1], &sig, 1) (write is async-signal-safe)
  3. Add signal_pipe[0] to your event loop (epoll, select, poll)
  4. When pipe becomes readable, main loop reads signal number and handles it safely
Why it works: The signal handler only does minimal work (one async-signal-safe write). The actual signal handling happens in the main event loop where all functions are safe to call.
Used in: event-driven servers and runtimes, for example libuv (the event loop underneath Node.js).
Concept: Two different processes map the same physical memory pages into their virtual address spaces.
Mechanism:
  1. Process A calls mmap(MAP_SHARED) on shared memory object
  2. Kernel creates VMA (Virtual Memory Area) in Process A’s address space
  3. Kernel allocates physical pages (or uses existing ones for the shared memory object)
  4. Kernel updates Process A’s page tables: Virtual Page → Physical Frame
  5. Process B calls mmap(MAP_SHARED) on the SAME shared memory object
  6. Kernel creates VMA in Process B’s address space (different virtual address)
  7. Kernel updates Process B’s page tables to point to the SAME physical frames
Result: Two different virtual addresses resolve to the same physical memory. This is page table aliasing.
Example:
Process A: Virtual 0x7000 → Physical Frame 0x5000
Process B: Virtual 0x9000 → Physical Frame 0x5000

Write by A to 0x7000 becomes visible to B at 0x9000 without any copy (ordering between the two still requires atomics or barriers).
Key insight: The kernel doesn’t copy data. It just manipulates page table entries to create multiple mappings to the same physical pages.
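A small experiment makes the aliasing concrete. The sketch below is a simplification of the scenario above: it uses an anonymous MAP_SHARED mapping inherited across fork() rather than a named object, so both processes happen to use the same virtual address, whereas unrelated processes mapping a shm_open() object would usually get different ones. The function name `shared_page_demo` is made up for this example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent reads after the child wrote 42, or -1 on error. */
int shared_page_demo(void) {
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid == -1) {
        munmap(shared, sizeof(int));
        return -1;
    }
    if (pid == 0) {
        *shared = 42;        /* child writes through its own page table entry */
        _exit(0);
    }

    /* waitpid() orders the child's write before the parent's read. */
    waitpid(pid, NULL, 0);
    int seen = *shared;      /* same physical frame, so the parent sees 42 */
    munmap(shared, sizeof(int));
    return seen;
}
```

With MAP_PRIVATE instead of MAP_SHARED the child would get a copy-on-write page and the parent would still read 0.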
SCM_RIGHTS: A Unix Domain Socket control message type that allows passing open file descriptors between processes.
Kernel Operation:
  1. Sender has FD 3 → struct file* (kernel object)
  2. Sender calls sendmsg() with SCM_RIGHTS control message containing FD 3
  3. Kernel increments reference count of the struct file object
  4. Kernel finds free FD slot in receiver’s FD table (e.g., FD 5)
  5. Kernel installs same struct file* pointer in receiver’s FD table at slot 5
  6. Receiver now has FD 5 pointing to the same kernel file object
Privilege Separation Pattern:
Privileged Broker Process:
- Runs as root or with capabilities
- Opens protected resources (files, sockets, devices)
- Validates requests from workers
- Passes FDs to workers via SCM_RIGHTS

Unprivileged Worker Process:
- Runs with minimal privileges (nobody user, chroot jail)
- Cannot open files directly
- Requests resources from broker
- Receives FDs and can use them (read/write)
Example: Chrome browser process (privileged) opens files and passes FDs to renderer processes (sandboxed, no filesystem access).
Security benefit: Capability-based security. Worker gets access to specific resource instances, not broad permissions.
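The sendmsg()/recvmsg() calls behind the broker pattern look roughly like this. It is a minimal sketch that passes exactly one descriptor per message; the helper names `send_fd` and `recv_fd` are hypothetical, and production code would also check MSG_CTRUNC and handle several descriptors per message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a Unix domain socket. Returns 0 on success. */
int send_fd(int sock, int fd_to_send) {
    char payload = 'F';                      /* at least one data byte must accompany the cmsg */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {                                  /* the union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    memset(&ctrl, 0, sizeof(ctrl));

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor. Returns the new FD number, or -1 on error. */
int recv_fd(int sock) {
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;

    struct msghdr msg = { 0 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;                                  /* slot the kernel installed in our FD table */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The number returned by recv_fd() is generally different from the sender's number; only the underlying kernel file object is shared.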
PIPE_BUF: Linux defines it as 4096 bytes (one page).
Atomicity Guarantee: If two processes write ≤ 4096 bytes simultaneously, the kernel guarantees the data won’t interleave.
Implementation:
// Kernel holds pipe mutex for entire write when size <= PIPE_BUF
if (len <= PIPE_BUF) {
    mutex_lock(&pipe->mutex);
    // Copy entire write to pipe buffer
    // No other process can write during this time
    mutex_unlock(&pipe->mutex);
}
For writes > PIPE_BUF:
// Kernel may release mutex between chunks
while (bytes_remaining > 0) {
    mutex_lock(&pipe->mutex);
    copy_chunk_to_pipe_buffer();
    mutex_unlock(&pipe->mutex);
    // Another process can write here!
}
Example:
Process A writes 8000 bytes "AAA..."
Process B writes 8000 bytes "BBB..."

Possible result in pipe:
AAAA (4096) BBBB (4096) AAAA (remainder) BBBB (remainder)

Data is interleaved!
Solution: If you need atomicity for large messages:
  1. Use message queues (message-oriented)
  2. Use Unix sockets with framing protocol
  3. Use shared memory with proper locking
  4. Break into ≤4096 byte messages with sequence numbers
System V IPC (shmget, semget, msgget):
Pros:
  • Kernel-persistent (survives process death until reboot or manual deletion)
  • Well-established, available on all Unix systems
  • Atomic operations (semop with multiple sem ops)
Cons:
  • Awkward API (ftok for key generation, numeric IDs)
  • No integration with file descriptors (can’t poll/select)
  • Requires manual cleanup (orphaned segments persist)
  • Global namespace (key collisions possible)
POSIX IPC (shm_open, sem_open, mq_open):
Pros:
  • Clean API (name-based, like files)
  • FD-based on Linux (message queue descriptors work with poll/select/epoll)
  • Better integration with modern APIs
  • Filesystem-like semantics (/dev/shm)
Cons:
  • Not truly kernel-persistent (typically backed by tmpfs)
  • Less widely available on old Unix systems
  • Platform differences (macOS limits)
Decision Matrix:
| Requirement | Choice |
|---|---|
| Need to survive process crashes | System V |
| Integration with event loops | POSIX (FD-based) |
| Modern codebase | POSIX |
| Legacy Unix compatibility | System V |
| Need poll/select on message queue | POSIX |
| Complex semaphore operations | System V (semop) |
Recommendation: Use POSIX IPC for new projects unless you specifically need System V features.
Risks:
  1. Race Conditions:
    • Problem: Multiple processes accessing shared memory without synchronization
    • Result: Data corruption, security-critical state corruption
    • Example: Authentication flag flipped by race condition
  2. Information Leakage:
    • Problem: Uninitialized memory in shared region
    • Result: Process A’s secrets leaked to Process B
    • Example: Crypto keys left in shared buffer
  3. Unauthorized Access:
    • Problem: Permissive shm permissions (0666)
    • Result: Any process can attach and read/write
  4. Memory Exhaustion:
    • Problem: Process allocates huge shared memory segments
    • Result: Denial of service (no memory for other processes)
  5. Persistence Issues (System V):
    • Problem: Orphaned shared memory segments
    • Result: Memory leak, potential reuse by attacker
Mitigations:
// 1. Proper synchronization
sem_t *sem = sem_open("/my_sem", O_CREAT, 0600, 1);
sem_wait(sem);
// Critical section
sem_post(sem);

// 2. Zero/initialize memory
memset(shm, 0, shm_size);

// 3. Restrict permissions
int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0600);  // Owner only

// 4. Check size limits
struct rlimit limit;
getrlimit(RLIMIT_MEMLOCK, &limit);

// 5. Explicit cleanup
shm_unlink("/my_shm");

// 6. Use memfd with sealing (Linux)
int memfd = memfd_create("sealed", MFD_ALLOW_SEALING);
fcntl(memfd, F_ADD_SEALS, F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW);
// Now share FD - receiver cannot modify

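// 7. Validate all shared data (trust nothing)
size_t size = shm->size;  // Read once into a local so a peer cannot change it after the check
if (size > MAX_ALLOWED_SIZE) {
    // Attacker may have corrupted size field
    fprintf(stderr, "Invalid size\n");
    abort();
}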
Best Practices:
  • Treat shared memory as untrusted input (even from “trusted” processes)
  • Use capabilities/SELinux to limit which processes can access shared memory
  • Monitor for orphaned segments (ipcs -m, /dev/shm)
  • Consider using Unix sockets instead (better isolation, kernel-mediated)
Signal Delivery Process:
1. Signal Generation:
   - kill(pid, SIGINT) syscall
   - Hardware exception (SIGSEGV)
   - Timer expiration (SIGALRM)

2. Kernel marks signal pending:
   sigaddset(&task->pending.signal, SIGINT)

3. Before returning to user space:
   - Kernel checks for pending signals
   - Calls do_signal()

4. Signal Frame Construction:
   ┌─── User Stack ──┐
   │   ...           │  ← Original SP
   ├─────────────────┤
   │  Return addr    │  → Points to __restore_rt
   ├─────────────────┤
   │  sig (int)      │  Signal number
   ├─────────────────┤
   │  siginfo_t      │  Signal info
   ├─────────────────┤
   │  ucontext_t     │  Saved context:
   │   uc_mcontext:  │
   │     rip = 0x... │  Instruction pointer
   │     rsp = 0x... │  Stack pointer
   │     rax = 0x... │  All registers
   │     rbx = 0x... │
   │     ...         │
   └─────────────────┘  ← New SP

5. Modify user context:
   regs->ip = handler_address;
   regs->sp = &signal_frame;

6. Return to user space:
   - Process "resumes" at handler
   - Handler executes
   - Handler returns

7. Trampoline (__restore_rt):
   - Automatically called when handler returns
   - Calls rt_sigreturn() syscall

8. rt_sigreturn syscall:
   - Restores registers from signal frame
   - Restores original RIP, RSP
   - Process continues where interrupted
Code:
// Kernel builds frame (simplified)
struct rt_sigframe *frame = (void*)(user_sp - sizeof(*frame));

frame->uc.uc_mcontext.rip = regs->ip;  // Save current PC
frame->uc.uc_mcontext.rsp = regs->sp;  // Save current SP
// ... save all registers ...

regs->ip = (unsigned long)handler;      // Jump to handler
regs->sp = (unsigned long)frame;        // New stack

// Handler finishes, returns to __restore_rt:
__asm__("mov $15, %rax");  // __NR_rt_sigreturn
__asm__("syscall");

// Kernel restores:
regs->ip = frame->uc.uc_mcontext.rip;  // Restore original PC
regs->sp = frame->uc.uc_mcontext.rsp;  // Restore original SP
Key Insight: The signal frame is a “snapshot” of the process state that allows the kernel to resume execution exactly where it was interrupted.
Benchmark Results (100 MB data transfer):
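| Mechanism | Throughput | Latency (RTT) | Copies | Syscalls |
|---|---|---|---|---|
| Shared Memory | 28,000 MB/s | 0.5 µs | 0 | 2 (sem) |
| Unix Socket | 3,200 MB/s | 2 µs | 1 | 2 |
| Pipe | 2,800 MB/s | 2 µs | 1 | 2 |
| TCP Localhost | 1,500 MB/s | 8 µs | 2 | 2 |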
Why Shared Memory is Fastest:
Pipe/Socket:
User A Buffer → [Kernel Copy] → User B Buffer
              ↑                ↑
           write()          read()

Shared Memory:
User A Buffer ← Same Physical Memory → User B Buffer
                (No copies!)
When NOT to use shared memory:
  • Small messages (less than 4KB): Synchronization overhead dominates
  • Infrequent communication: Setup cost not amortized
  • Simple protocols: Complexity not worth it
  • Need message boundaries: Datagram sockets and message queues provide these; shared memory needs your own framing
Optimization Tips:
  1. For Pipes/Sockets:
    // Use larger buffer sizes
    int sndbuf = 1024 * 1024;  // 1 MB
    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));
    
    // Batch writes
    struct iovec iov[10];
    writev(fd, iov, 10);  // One syscall for multiple buffers
    
  2. For Shared Memory:
    // Use lock-free data structures
    atomic_int head, tail;
    
    // Batch operations to amortize synchronization
    while (count < BATCH_SIZE) {
        shm->buffer[count++] = data;
    }
    sem_post(&full);  // One signal for batch
    
  3. Use splice() for zero-copy:
    splice(in_fd, NULL, pipe_fd[1], NULL, size, 0);
    splice(pipe_fd[0], NULL, out_fd, NULL, size, 0);
    // No user-space copy!
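The "lock-free data structures" tip usually means a single-producer/single-consumer ring with atomic head and tail indices. The following is a minimal sketch using C11 atomics; the struct and function names are illustrative, the slot count is arbitrary, and it is only correct with exactly one producer and one consumer. Placed in a MAP_SHARED region, the same layout works across processes on platforms where these atomics are lock-free.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* power of two, so wrapping is a cheap mask */

struct spsc_ring {
    _Atomic uint32_t head;      /* next slot to read; written only by the consumer */
    _Atomic uint32_t tail;      /* next slot to write; written only by the producer */
    int slots[RING_SLOTS];
};

/* Producer side. Returns false when the ring is full. */
bool ring_push(struct spsc_ring *r, int value) {
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SLOTS)
        return false;
    r->slots[tail & (RING_SLOTS - 1)] = value;
    /* release: the slot contents become visible before the new tail does */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
bool ring_pop(struct spsc_ring *r, int *out) {
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->slots[head & (RING_SLOTS - 1)];
    /* release: the slot is handed back to the producer only after it was read */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The acquire/release pairs are what make this safe on weakly ordered CPUs; a semaphore or futex is then needed only to sleep when the ring is empty or full.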
    

10. Debugging IPC

Listing Active IPC Objects

# System V IPC
ipcs -a           # All IPC objects
ipcs -m           # Shared memory segments
ipcs -q           # Message queues
ipcs -s           # Semaphore arrays

# POSIX IPC
ls -la /dev/shm   # POSIX shared memory
ls -la /dev/mqueue # POSIX message queues

# Unix sockets
netstat -lnpx     # Listening Unix sockets
ss -xlp           # Unix sockets (modern)

# Pipes
lsof -p <pid> | grep pipe

Cleaning Orphaned IPC

# Remove System V shared memory
ipcrm -m <shmid>

# Remove System V message queue
ipcrm -q <msqid>

# Remove POSIX shared memory
rm /dev/shm/my_shm

# Remove POSIX message queue
rm /dev/mqueue/my_queue

Tracing IPC with strace

# Trace all IPC-related syscalls
strace -e trace=ipc <command>

# Trace pipe operations
strace -e trace=pipe,pipe2,read,write <command>

# Trace shared memory
strace -e trace=shmat,shmdt,shmget,mmap,munmap <command>

# Trace Unix sockets
strace -e trace=socket,connect,bind,listen,accept,sendmsg,recvmsg <command>

# Trace signals
strace -e trace=signal,kill,rt_sigreturn <command>

Multi-Mechanism Lab: Producer–Consumer Three Ways

Implement the same producer–consumer pattern using three different IPC mechanisms to feel the differences firsthand.

Variant A: Pipe

// pipe_prodcons.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int fd[2];
    pipe(fd);
    if (fork() == 0) {
        close(fd[1]);
        char buf[64];
        ssize_t n;
        // Pipes are byte streams with no message boundaries, so both sides
        // agree on fixed-size 64-byte records
        while ((n = read(fd[0], buf, sizeof(buf))) > 0) {
            printf("Consumer got: %.*s\n", (int)n, buf);
        }
        exit(0);
    }
    close(fd[0]);
    for (int i = 0; i < 5; i++) {
        char msg[64] = {0};  // Zero-padded record, well under PIPE_BUF
        snprintf(msg, sizeof(msg), "Message %d", i);
        write(fd[1], msg, sizeof(msg));
    }
    close(fd[1]);
    wait(NULL);
    return 0;
}

Variant B: Shared Memory + Semaphore

// shm_prodcons.c (simplified skeleton)
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <semaphore.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    int fd = shm_open("/prodcons", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, 4096);
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    // "full" counts unread messages; "empty" says the buffer may be overwritten
    sem_t *full = sem_open("/prodcons_full", O_CREAT, 0600, 0);
    sem_t *empty = sem_open("/prodcons_empty", O_CREAT, 0600, 1);

    if (fork() == 0) {
        for (int i = 0; i < 5; i++) {
            sem_wait(full);
            printf("Consumer: %s\n", buf);
            sem_post(empty);
        }
        exit(0);
    }
    for (int i = 0; i < 5; i++) {
        sem_wait(empty);  // Wait until the consumer has read the previous message
        snprintf(buf, 4096, "Message %d", i);
        sem_post(full);
    }
    wait(NULL);
    shm_unlink("/prodcons");
    sem_unlink("/prodcons_full");
    sem_unlink("/prodcons_empty");
    return 0;
}
Key difference: Zero copy, but you must manage synchronization yourself.

Variant C: Unix Domain Socket

// uds_prodcons.c (producer side)
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

int main() {
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strcpy(addr.sun_path, "/tmp/prodcons.sock");
    unlink(addr.sun_path);
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    listen(sock, 1);

    int client = accept(sock, NULL, NULL);
    for (int i = 0; i < 5; i++) {
        char msg[64];
        snprintf(msg, sizeof(msg), "Message %d", i);
        send(client, msg, strlen(msg) + 1, 0);
    }
    close(client);
    close(sock);
    unlink(addr.sun_path);
    return 0;
}
Key difference: FD passing is possible, and you get stream/datagram semantics.

What to Observe

  • Latency: Shared memory is fastest (no kernel copy).
  • Complexity: Pipe is simplest; shared memory requires explicit sync.
  • Flexibility: Unix sockets support FD passing and can be converted to network sockets easily.

Summary

IPC Mechanisms:
| Mechanism | Speed | Use Case |
|---|---|---|
| Pipes | Fast | Parent-child, byte streams |
| FIFOs | Fast | Unrelated processes, byte streams |
| Shared Memory | Fastest | High throughput, zero-copy |
| Message Queues | Medium | Structured messages, priorities |
| Signals | Slow | Asynchronous notifications |
| Unix Sockets | Fast | FD passing, credentials, flexibility |
Key Takeaways:
  1. Pipes: Simple but powerful. Remember PIPE_BUF atomicity and closing unused ends.
  2. Shared Memory: Fastest IPC via page table aliasing. Requires manual synchronization. Security-critical applications must validate all shared data.
  3. Message Queues: Built-in priority and message boundaries. Overhead makes them slower than pipes for high throughput.
  4. Signals: Async-signal-safety is critical. Use self-pipe trick for event loops. Modern apps prefer signalfd (Linux).
  5. Unix Sockets: Versatile. SCM_RIGHTS enables capability-based security. SO_PEERCRED provides kernel-verified credentials.
  6. Performance: Shared memory > Unix sockets > Pipes > TCP. But the fastest option is also the hardest to get right: shared memory carries the most synchronization complexity.
Real-World Patterns:
  • Chrome: Unix sockets + SCM_RIGHTS for sandboxing
  • X11/Wayland: Shared memory for pixmap transfer
  • systemd: Socket activation with FD passing
  • Databases: Shared memory for buffer pools


Interview Deep-Dive

Strong Answer Framework:
  1. Define the workload first. Throughput target (msgs/sec, bytes/sec), message size distribution, latency budget (p99), durability requirement, number of producers/consumers, crash semantics.
  2. Eliminate clearly-wrong choices. If you need cross-host: UDS is out. If you need durability: shared memory and pipes are out (data is gone if either side crashes). If you need backpressure: signals are out (no flow control).
  3. Compare the survivors on the metric that dominates:
    • Latency-sensitive, small messages, single producer/consumer: shared memory ring buffer with futex notification. ~50-100 nanoseconds per message. Used by HFT systems, LMAX Disruptor.
    • High throughput, multiple producers, small/medium messages: Unix domain sockets, SOCK_DGRAM. Each sendmsg is one atomic message. Kernel manages buffering and backpressure. ~1-3 microseconds per message. Used by journald, rsyslog.
    • Bulk transfer, large payloads, infrequent: shared memory (mmap) with semaphore. Zero copy beats everything for big payloads. Used by databases (PostgreSQL shared_buffers), Wayland (pixel buffers).
  4. Address the specific question — a high-throughput producer/consumer on one machine usually wants Unix domain sockets unless you are above ~1M msgs/sec, at which point shared memory + ring buffer becomes worth the complexity.
  5. Sketch the chosen design: socket(AF_UNIX, SOCK_DGRAM, 0), bind to an abstract namespace path (\0prodcons — no filesystem cleanup needed), producers connect with connect(), consumer uses epoll to multiplex, set SO_RCVBUF to 4MB or higher to avoid drops under burst.
  6. Mention the failure modes you have to handle: consumer slow → EAGAIN on producer with non-blocking sockets → application-level backpressure. Producer crashes → kernel auto-closes its socket → no orphan resources; with SOCK_DGRAM the consumer gets no per-producer EOF and simply stops receiving from that producer.
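The backpressure signal mentioned in the last step is easy to observe. This sketch sends small datagrams without anyone receiving them until the kernel pushes back; the function name `fill_until_backpressure` and the 64-byte message size are arbitrary choices for illustration.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends 64-byte datagrams on `sock` without blocking. Returns how many the
 * kernel accepted before answering EAGAIN, or -1 on any other error. */
int fill_until_backpressure(int sock) {
    char msg[64];
    memset(msg, 'x', sizeof(msg));
    int accepted = 0;
    for (;;) {
        ssize_t n = send(sock, msg, sizeof(msg), MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n == (ssize_t)sizeof(msg)) {
            accepted++;
            continue;
        }
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return accepted;    /* buffers full: slow down, queue locally, or drop */
        return -1;
    }
}
```

How many messages fit before EAGAIN depends on the socket buffer settings, which is why the design above raises SO_RCVBUF for bursty producers.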
Real-World Example: Facebook’s LogDevice implements its inter-thread queue as a shared-memory ring buffer (their MPMCQueue) and then ships data to remote nodes over TCP, picking the right primitive at each layer. journald uses Unix domain sockets for log collection, with kernel buffering as the backpressure mechanism; shipping logs to other hosts is handled by a separate component (systemd-journal-remote) precisely because UDS does not cross hosts. The Aeron messaging framework (real-time finance) goes all-in on shared memory + busy-spin readers and hits ~10 million messages/second on a single core; the cost is dedicating that core to nothing else.
Senior Follow-up 1: When does shared memory beat UDS by enough to justify the complexity? As a rough guide, somewhere between ~500K and ~1M msgs/sec, or when p99 latency must stay below 1 µs. Below that, UDS is “fast enough” and the synchronization complexity of shared memory (correct memory ordering, robust mutex handling, head/tail wraparound) is not worth it. Always benchmark before choosing: measured numbers, not vibes.
Senior Follow-up 2: Why use SOCK_DGRAM over SOCK_STREAM for log shipping? SOCK_DGRAM gives message boundaries: one sendmsg on the producer is exactly one recvmsg on the consumer, so no framing protocol is needed. For SOCK_STREAM you would have to add a length prefix or delimiter, which adds CPU on both sides. The cost: on Linux, SOCK_DGRAM on UDS is reliable (unlike UDP), but a message larger than the socket send buffer is rejected with EMSGSIZE rather than fragmented. Raise SO_SNDBUF (capped by net.core.wmem_max) if you have large messages.
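The boundary behaviour can be checked with a socketpair. In this sketch (the function name `first_recv_size` is made up), two messages of 3 and 5 bytes are sent back to back and the size of the first recv() is reported: with SOCK_DGRAM it is exactly the first message, while with SOCK_STREAM the two are typically merged.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Sends "one" (3 bytes) then "three" (5 bytes) over a fresh AF_UNIX
 * socketpair of the given type and returns what the first recv() yields.
 * Returns -1 on error. */
ssize_t first_recv_size(int type) {
    int sv[2];
    if (socketpair(AF_UNIX, type, 0, sv) == -1)
        return -1;

    char buf[64];
    ssize_t n = -1;
    if (send(sv[0], "one", 3, 0) == 3 && send(sv[0], "three", 5, 0) == 5)
        n = recv(sv[1], buf, sizeof(buf), 0);   /* buffer is large enough for both */

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling it with SOCK_DGRAM returns 3; with SOCK_STREAM the result is at least 3 and usually 8, which is exactly why streams need explicit framing.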
Senior Follow-up 3: How does io_uring change this picture? io_uring (Linux 5.1+) lets you batch syscalls and avoid the syscall overhead per message. For UDS, the win is moderate (~20-30%) because the syscall itself is cheap. For TCP loopback or shared-memory + futex_wake patterns, io_uring + IORING_OP_SEND with multi-shot recv can dramatically reduce CPU. As of 2024, frameworks like ScyllaDB’s Seastar use io_uring extensively to cut IPC CPU overhead.
Common Wrong Answers:
  • “Always use shared memory because it is fastest” — ignores synchronization complexity and crash semantics.
  • “Use Kafka / Redis / RabbitMQ” — adds a network hop and a separate process to manage; usually overkill for same-host IPC.
  • “Pipes are the standard Unix way” — pipes do not work for many-to-one without per-producer FIFOs and have PIPE_BUF atomicity limits.
Further Reading:
  • LMAX Disruptor whitepaper — the canonical shared-memory ring buffer design.
  • Aeron architecture docs — how a high-performance messaging system layers shared memory and UDP.
  • Brendan Gregg, “Linux Performance” page — has benchmarks of UDS vs. loopback TCP.
Strong Answer Framework:
  1. Pick stream vs. datagram. SOCK_STREAM if requests/responses can exceed a single datagram or need ordering across messages. SOCK_DGRAM if every message is small (under ~64KB) and self-contained — you get message boundaries for free. For most RPC use cases, SOCK_STREAM with explicit framing is the standard.
  2. Server side setup:
    fd = socket(AF_UNIX, SOCK_STREAM, 0)
    bind(fd, sockaddr_un{sun_path = "/tmp/svc.sock"})  // or abstract: "\0svc"
    listen(fd, backlog=128)
    accept loop -> per-connection handler (thread, epoll, or io_uring)
    
  3. Client side:
    fd = socket(AF_UNIX, SOCK_STREAM, 0)
    connect(fd, sockaddr_un{...})
    send_request(); recv_response();
    
  4. Framing — the part candidates botch. Streams have no message boundaries, so you MUST frame. Two standard options:
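    • Length prefix: 4-byte big-endian length, then length bytes of payload. Read exactly 4 bytes, then exactly N bytes. Simple, standard. gRPC and Redis RESP use variations of this idea (gRPC puts a 1-byte flag before the 4-byte length; RESP sends lengths as decimal text).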
    • Delimiter: e.g., newline-terminated JSON. Easy to debug with socat, slow due to per-byte scanning. Used by HTTP/1.1.
  5. Request-response pairing. For single-threaded clients, the next response on the wire matches the next request — order is implicit. For pipelined clients (multiple in-flight requests), assign an integer request ID; server echoes it in the response.
  6. Timeouts and cancellation. Set SO_RCVTIMEO so reads do not hang forever. For clean cancellation across both endpoints, signal via a separate “cancel” message or close the socket (the peer gets EPIPE/EOF).
  7. FD passing if needed. Use sendmsg with SCM_RIGHTS cmsg to ship file descriptors — this is what makes UDS uniquely powerful for local IPC. The receiving process gets a brand new FD pointing to the same kernel object.
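The length-prefix framing from step 4 can be sketched in a few dozen lines. The helper names (`send_frame`, `recv_frame`, `read_exact`, `write_all`) are hypothetical, the code assumes blocking descriptors, and the caller-supplied `cap` is an assumed policy for rejecting oversized frames.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Loop until exactly len bytes are read. Returns 0, or -1 on error or EOF. */
static int read_exact(int fd, void *buf, size_t len) {
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return -1;                        /* peer closed mid-frame */
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Loop until exactly len bytes are written. Returns 0, or -1 on error. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Frame layout: 4-byte big-endian length, then the payload. */
int send_frame(int fd, const void *payload, uint32_t len) {
    uint32_t be_len = htonl(len);
    if (write_all(fd, &be_len, sizeof(be_len)) == -1)
        return -1;
    return write_all(fd, payload, len);
}

/* Returns the payload length, or -1 on error, EOF, or a frame larger than cap. */
ssize_t recv_frame(int fd, void *buf, uint32_t cap) {
    uint32_t be_len;
    if (read_exact(fd, &be_len, sizeof(be_len)) == -1)
        return -1;
    uint32_t len = ntohl(be_len);
    if (len > cap)
        return -1;                            /* never trust a length from the wire */
    if (read_exact(fd, buf, len) == -1)
        return -1;
    return (ssize_t)len;
}
```

For pipelined clients, the request ID from step 5 would simply be another fixed-width field placed after the length.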
Real-World Example: The Docker daemon serves its REST API on /var/run/docker.sock (UDS, SOCK_STREAM) using HTTP/1.1 framing. docker ps is just an HTTP GET /containers/json over UDS. systemd’s D-Bus broker also uses UDS with length-prefixed messages, and uses SCM_CREDENTIALS to authenticate the calling process by UID. Visual Studio Code’s language server protocol (LSP) uses UDS or pipes with JSON-RPC framing (Content-Length: N\r\n\r\n{...}).
Senior Follow-up 1: How do you handle partial reads on SOCK_STREAM? read(fd, buf, n) can return any value from 1 to n, 0 at EOF, or -1 on error. You must loop until you have all n bytes and check every return value: r = read(fd, buf+got, n-got); stop on r <= 0 (retrying on EINTR), otherwise got += r. Wrappers like recv_all() or read_exact() (Rust) encode this. On non-blocking sockets, EAGAIN means “no more data available right now; come back later”; combine with epoll edge-triggered mode for efficient event loops.
Senior Follow-up 2: How do you authenticate the peer on a Unix socket? getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) returns struct ucred with the peer’s PID, UID, GID. The kernel populated this at connect time, so it is unforgeable — the peer cannot lie about who they are. This is how Polkit, D-Bus, and systemd authenticate clients without passwords. Linux-specific; BSD has LOCAL_PEERCRED or getpeereid.
Senior Follow-up 3: Why do some systems use abstract namespace sockets (path starts with \0)? Abstract sockets (sun_path[0] = '\0', then a name) live in a kernel namespace, not the filesystem. Advantages: no unlink needed at startup or shutdown (no stale socket files), no filesystem permission issues, automatically cleaned up when all FDs close. Disadvantage: Linux-only, and visible only within a network namespace (so containers each see their own).
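Binding to an abstract name has one easy-to-miss detail: the address length must cover only the bytes actually used, because the name is not NUL-terminated. A minimal Linux-only sketch follows; the function name `listen_abstract` and the backlog of 16 are illustrative choices.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a stream socket to the abstract name `name` and start listening.
 * Returns the listening FD, or -1 on error. */
int listen_abstract(const char *name) {
    struct sockaddr_un addr;
    size_t name_len = strlen(name);
    if (name_len == 0 || name_len > sizeof(addr.sun_path) - 1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    /* sun_path[0] stays '\0'; the name follows without a terminator. */
    memcpy(addr.sun_path + 1, name, name_len);
    /* Pass only the bytes in use, otherwise trailing zeros become part of the name. */
    socklen_t addr_len =
        (socklen_t)(offsetof(struct sockaddr_un, sun_path) + 1 + name_len);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (bind(fd, (struct sockaddr *)&addr, addr_len) == -1 || listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A second bind to the same name fails while the first socket is open, and the name becomes free again as soon as that socket is closed, with no unlink() involved.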
Common Wrong Answers:
  • “Just read() once and process the buffer” — ignores partial reads on streams; will randomly fail under load.
  • “Use port 0 and let the kernel pick” — that is a TCP concept; UDS uses paths.
  • “TLS over Unix sockets for security” — normally unnecessary; UDS is local-only and SO_PEERCRED is more useful than TLS for local auth. TLS adds CPU for no benefit on UDS.
Further Reading:
  • Beej’s Guide to Unix IPC — the practical reference for socket programming with UDS.
  • unix(7) man page on Linux — definitive reference for UDS semantics.
  • gRPC source code — production-quality length-prefixed framing implementation.
Strong Answer Framework:
  1. State the core difference. Shared memory is “raw bytes you both can see, you handle synchronization.” Message queues are “kernel-managed mailbox: structured messages, built-in priority, kernel handles synchronization.”
  2. Shared memory wins on:
    • Throughput: zero copy. The kernel only does the page-table aliasing once at setup; all subsequent reads/writes are in-process.
    • Bulk transfers: a 1GB shared region costs the same as a 1KB region.
    • Tight latency: spin or futex_wait on a flag, ~50ns wakeup.
  3. Message queues win on:
    • Simplicity: kernel handles queueing, blocking, priority. No memory ordering bugs.
    • Discrete messages: mq_send is atomic at the message level, no framing protocol needed.
    • Priority delivery: messages are delivered in priority order (high priority first), useful for control-plane messages.
    • Crash safety: if a sender dies, queued messages survive until consumed or until queue is unlinked.
  4. The honest tradeoff matrix:
    | Concern | Shared Mem | Message Queue |
    |---|---|---|
    | Throughput | Best | Limited by per-message syscall overhead |
    | Programming complexity | High (manual sync, ordering) | Low (just mq_send/mq_receive) |
    | Crash recovery | Hard (dangling locks, robust mutexes) | Easy (queue persists) |
    | Size limits | Limited only by RAM | Tight kernel defaults (/proc/sys/fs/mqueue/) |
    | Cross-process? | Yes | Yes |
    | Cross-host? | No | No |
  5. Pick by workload: small structured messages, low frequency, need priority -> message queue. High-volume bulk data, single-digit microsecond latency required -> shared memory.
Real-World Example: PostgreSQL uses shared memory for its buffer pool (shared_buffers) — every backend process maps the same region, and access is coordinated by spinlocks and lwlocks in shared memory. This is why PostgreSQL can serve thousands of concurrent reads from cache without copying pages. Conversely, the Linux kernel’s audit subsystem uses a netlink socket (similar to a message queue) to send audit events to userspace — audit events are small, structured, and the kernel needs reliable delivery semantics that a shared ring buffer would not provide without complex coordination.
Senior Follow-up 1: Can you implement a “message queue” on top of shared memory? Yes — a shared-memory ring buffer with head/tail indices and a fixed message size is exactly that. Aeron and the LMAX Disruptor are essentially user-space message queues built on shared memory. They get higher throughput than POSIX mqueues because they avoid the per-message syscall, but require careful memory ordering and offer no priority semantics.
Senior Follow-up 2: What is mq_notify and when is it useful? mq_notify registers a one-shot notification: when a message arrives on a previously-empty queue, the kernel either sends a signal or spawns a thread (caller’s choice). Useful for waking an idle reader without polling. The catch: it is one-shot per registration, so you must re-arm after every wakeup. Modern Linux code often skips mq_notify and monitors the queue descriptor directly with select/poll/epoll; this works because a message queue descriptor is a file descriptor on Linux, but it is not portable.
Senior Follow-up 3: Why do most modern systems use neither, and prefer io_uring or eventfd-based protocols? POSIX mqueues have unfriendly limits, no batching, and no zero-copy. Shared memory has no kernel semantics for delivery. io_uring (Linux 5.1+) gives you submission queues + completion queues that ARE shared memory rings, but with kernel cooperation — syscalls only when you choose to enter the kernel. eventfd is a simpler primitive when you just need a counter to wake a waiter. The 2020s default for high-performance Linux IPC is “shared memory ring + eventfd or io_uring for notification.”
Common Wrong Answers:
  • “Shared memory is always better because it is faster” — ignores complexity cost and that mqueues handle priority and crash safety automatically.
  • “Message queues are deprecated” — they are legacy but still useful for simple structured IPC; many embedded and POSIX-compliant systems still rely on them.
  • “Use a database for IPC” — adds disk I/O, transactions, and a separate process; vastly slower for short-lived inter-process signaling.
Further Reading:
  • mq_overview(7) man page — definitive reference for POSIX message queues.
  • PostgreSQL src/backend/storage/lmgr/ — production-grade shared-memory locking.
  • Martin Thompson, “Mechanical Sympathy” blog — shared-memory design lessons from LMAX.
Strong Answer:
  • For a logging pipeline with multiple producers and one consumer, Unix domain sockets are the right choice. Here is why.
  • Pipes are limited: a regular pipe only works between related processes (parent-child), and named pipes (FIFOs) are unidirectional. With multiple producers writing to a single FIFO, you get interleaving problems — writes larger than PIPE_BUF (4096 bytes on Linux) are NOT atomic, so log lines can get mixed together. You would need one FIFO per producer, which complicates the aggregator.
  • Shared memory is the fastest (zero-copy), but you have to build your own synchronization. You need a ring buffer or bounded queue in shared memory, protected by futexes or semaphores. You also need to handle producer crashes gracefully (what if a producer dies while holding a lock on the shared buffer?). For a logging pipeline, this is over-engineering unless you need extreme throughput (millions of log lines per second).
  • Unix domain sockets (SOCK_STREAM or SOCK_DGRAM) are the sweet spot. Multiple producers can connect or send to a single socket path. SOCK_DGRAM gives you message boundaries (each sendmsg is one complete log line, no framing needed) and is atomic for messages up to the socket buffer size; the aggregator reads every producer’s lines from one bound socket. With SOCK_STREAM, the aggregator uses epoll to multiplex the per-producer connections and must frame log lines itself. Either way you get kernel-managed buffering, backpressure (a slow consumer causes producers to block on send), and clean handling of producer crashes (the kernel closes the socket on process exit).
  • In production, this is exactly what rsyslog and journald use — Unix domain sockets for local log collection. If throughput demands exceed what Unix sockets can handle, I would move to shared memory with a ring buffer (like the LMAX Disruptor pattern), but that is only justified at millions of messages per second.
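As a rough illustration of the datagram variant described above, the sketch below binds one `AF_UNIX`/`SOCK_DGRAM` socket and has two producers send to it. It runs in a single process for brevity; the socket path and the function name `demo_log_aggregator` are hypothetical, and a real aggregator would run the receive side in its own process.

```python
import os
import socket
import tempfile


def demo_log_aggregator():
    """Two producers send log lines as datagrams to one bound AF_UNIX socket."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.sock")  # hypothetical socket path
        aggregator = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        aggregator.bind(path)
        producers = [
            socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) for _ in range(2)
        ]
        try:
            for i, producer in enumerate(producers):
                producer.sendto(f"producer {i}: first line".encode(), path)
                producer.sendto(f"producer {i}: second line".encode(), path)
            # Each recv() returns exactly one datagram, so one call is one
            # complete log line and no framing protocol is needed.
            return [aggregator.recv(4096).decode() for _ in range(4)]
        finally:
            for sock in producers + [aggregator]:
                sock.close()
```

The producers never connect, so the aggregator needs only one descriptor regardless of how many producers exist. The cost is that it cannot tell when a given producer exits, which the SOCK_STREAM variant can.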
Follow-up: What is SCM_RIGHTS, and how does Chrome use it for sandboxing? SCM_RIGHTS is a mechanism for passing file descriptors between processes over a Unix domain socket using ancillary messages (sendmsg/recvmsg with cmsg). The sending process puts an fd number in the control message, and the kernel creates a new fd in the receiving process’s fd table pointing to the same underlying file/socket/device. Chrome uses this to implement its sandbox: the renderer process (whose filesystem access is blocked by the seccomp sandbox) cannot open files directly. Instead, the browser process opens the file and passes the fd to the renderer over a Unix socket. The renderer can read/write through the fd without ever having the capability to open arbitrary files. This is capability-based security at the OS level.
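A minimal sketch of the fd-passing mechanism, using Python’s `sendmsg`/`recvmsg` with an explicit `SCM_RIGHTS` control message. Both ends live in one process here to keep the example short; the helper names (`send_fd`, `recv_fd`, `demo_fd_passing`) and the file contents are hypothetical, and this is not Chrome’s actual code.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Send one descriptor as SCM_RIGHTS ancillary data with a 1-byte payload."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receive one descriptor; the kernel installs a new fd in this process."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
            return fds[0]
    raise RuntimeError("no descriptor received")


def demo_fd_passing():
    # "broker" plays the privileged process, "sandboxed" the restricted one.
    broker, sandboxed = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f:
        f.write(b"config: allowed")
        f.flush()
        f.seek(0)
        try:
            send_fd(broker, f.fileno())
            new_fd = recv_fd(sandboxed)
            try:
                # The receiver reads through the descriptor without calling open().
                return new_fd != f.fileno(), os.read(new_fd, 64)
            finally:
                os.close(new_fd)
        finally:
            broker.close()
            sandboxed.close()
```

The received descriptor is a new number that refers to the same open file, which is why the read starts at the offset the sender left it at.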
Strong Answer:
  • When a signal is sent to a process (via kill() or the kernel generating SIGSEGV), the kernel sets a bit in the target process’s pending signal mask. The signal is not delivered immediately — it is delivered when the process returns to user space (e.g., returning from a syscall, returning from an interrupt, or when the scheduler runs the process).
  • At the point of delivery, the kernel examines the process’s signal disposition. If the handler is SIG_DFL, the kernel performs the default action (terminate, ignore, stop). If a custom handler is installed, the kernel builds a “signal frame” on the process’s user-space stack: it saves the current registers (including the instruction pointer), pushes the signal number and siginfo, and modifies the process’s instruction pointer to point to the signal handler. When the handler returns (via sigreturn() syscall), the kernel restores the saved registers and the process resumes where it was interrupted.
  • The gotcha: signal handlers interrupt the process at arbitrary points. If the main code is in the middle of updating a data structure, the signal handler runs with that structure in an inconsistent state. This is why only “async-signal-safe” functions (a small subset — write, _exit, signal, etc.) can be safely called in a signal handler. Calling malloc, printf, or acquiring a mutex in a signal handler is undefined behavior.
  • The self-pipe trick: to integrate signal handling with an event loop (epoll/select), you create a pipe, and the signal handler writes a single byte to the pipe. The event loop monitors the pipe’s read end alongside other file descriptors. When a signal arrives, the pipe becomes readable, and the event loop handles it in its normal, non-interrupted context where it is safe to call any function. Modern Linux offers signalfd(), which achieves the same result without a pipe: the kernel delivers signal information as readable data on a file descriptor.
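The self-pipe trick from the last bullet can be sketched as follows. One caveat on the language choice: Python runs handlers in the interpreter between bytecodes rather than truly asynchronously, so this shows the shape of the technique and not the async-signal-safety constraints a C handler faces. The function name `demo_self_pipe` and the choice of SIGUSR1 are assumptions for illustration, and the code must run in the main thread.

```python
import os
import select
import signal


def demo_self_pipe():
    r, w = os.pipe()
    os.set_blocking(w, False)  # the handler must never block on a full pipe

    def on_signal(signum, frame):
        # The handler does one thing only: write a byte to the pipe.
        os.write(w, bytes([signum]))

    previous = signal.signal(signal.SIGUSR1, on_signal)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)  # simulate an external signal
        # The "event loop": the pipe's read end is just another descriptor.
        readable, _, _ = select.select([r], [], [], 5.0)
        if r not in readable:
            return None
        # Normal, non-interrupted context: any function is safe to call here.
        return os.read(r, 1)[0]
    finally:
        signal.signal(signal.SIGUSR1, previous or signal.SIG_DFL)
        os.close(r)
        os.close(w)
```

Python ships a built-in form of the same idea as `signal.set_wakeup_fd`, which makes the interpreter itself write the signal number to a descriptor.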
Follow-up: What happens if a signal arrives while the process is blocked in a syscall like read()? It depends on the SA_RESTART flag. By default (without SA_RESTART), the syscall is interrupted and returns -1 with errno set to EINTR. The application must check for EINTR and retry the syscall. With SA_RESTART set on the signal handler, the kernel automatically restarts the interrupted syscall after the signal handler returns. Not all syscalls are restartable: some (like select, poll, nanosleep) are never automatically restarted because their timeout semantics make restart ambiguous. This EINTR handling is one of the most common sources of bugs in systems programming.
Strong Answer:
  • When Process A creates a shared memory segment (via shmget + shmat, or mmap with MAP_SHARED on a file or memfd), the kernel allocates physical frames and creates a VMA (virtual memory area) in A’s address space. The page table entries for that VMA point to those physical frames.
  • When Process B attaches to the same shared memory segment, the kernel creates a VMA in B’s address space and sets up B’s page table entries to point to the SAME physical frames. This is “page table aliasing” — two different virtual addresses (in different processes) map to the same physical pages. The MMU translates both to the same location in DRAM.
  • Writes by A are immediately visible to B (and vice versa) because they are reading/writing the same physical memory. However, “immediately visible” has caveats: on x86 (TSO), stores by one core are visible to other cores in order after the store buffer drains (which happens relatively quickly). On ARM (weak model), you need explicit memory barriers (or atomic operations) to ensure visibility.
  • The kernel does NOT provide any synchronization for shared memory. If A and B both write to the same location without coordination, you get a data race. The processes must use their own synchronization: POSIX semaphores (sem_init with pshared=1), futexes (the kernel-assisted mutex primitive that underlies pthread_mutex when initialized with PTHREAD_PROCESS_SHARED), or atomic operations on shared variables.
  • Performance: shared memory is the fastest IPC because there is zero data copying — both processes access the same physical memory. The only overhead is the synchronization mechanism. This is why databases (PostgreSQL’s shared_buffers), display servers (Wayland’s buffer sharing), and high-frequency trading systems use shared memory for their hot paths.
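A small sketch of the aliasing described above: an anonymous shared mapping created before `fork()` is backed by the same physical pages in parent and child. The function name `demo_shared_counter` is hypothetical, and `waitpid` stands in for real synchronization, which works here only because the parent reads after the child has exited.

```python
import mmap
import os
import struct


def demo_shared_counter():
    # On Unix, mmap(-1, n) creates an anonymous MAP_SHARED mapping.
    shared = mmap.mmap(-1, mmap.PAGESIZE)
    struct.pack_into("<Q", shared, 0, 41)

    pid = os.fork()
    if pid == 0:
        # Child: same physical pages, reached through its own page tables.
        try:
            (value,) = struct.unpack_from("<Q", shared, 0)
            struct.pack_into("<Q", shared, 0, value + 1)
        finally:
            os._exit(0)

    # Crude synchronization: wait for the child to finish before reading.
    os.waitpid(pid, 0)
    (value,) = struct.unpack_from("<Q", shared, 0)
    shared.close()
    return value
```

If both processes incremented the counter concurrently, this read-modify-write would be exactly the data race the text warns about; a process-shared semaphore, futex, or atomic operation would be needed.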
Follow-up: What is memfd_create, and why was it added when we already had shm_open? memfd_create() creates an anonymous file in RAM (backed by tmpfs) that is not linked to any filesystem path. It returns a file descriptor that can be passed to other processes via SCM_RIGHTS over a Unix socket or inherited via fork. The advantage over shm_open is that it has no namespace collision risk (no path to conflict with), it can be sealed (using F_SEAL flags to prevent resizing or writing, providing security guarantees for zero-copy data sharing), and it works naturally with mmap and fd passing. Wayland compositors use memfd_create + fd passing + sealing to share pixel buffers between the client and compositor without any risk of the client resizing the buffer while the compositor reads it.