Signals & Inter-Process Communication

Understanding signals and IPC is essential for debugging, container orchestration, and building robust systems. This module covers the kernel implementation of these critical mechanisms.
Interview Frequency: High (especially signal handling)
Key Topics: Signal delivery, handlers, shared memory, pipes, Unix sockets
Time to Master: 10-12 hours

Signals Overview

Signals are software interrupts for processes:
┌─────────────────────────────────────────────────────────────────────────────┐
│                        SIGNAL DELIVERY                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Signal Sources                         Target Process                       │
│  ──────────────                         ──────────────                       │
│                                                                              │
│  ┌─────────────┐                        ┌─────────────────────────────────┐ │
│  │   Kernel    │                        │                                 │ │
│  │ (SIGSEGV,   │                        │  task_struct                    │ │
│  │  SIGKILL)   │─────┐                  │  ├── signal (signal_struct)     │ │
│  └─────────────┘     │                  │  │   └── pending signals       │ │
│                      │                  │  ├── sighand (sighand_struct)   │ │
│  ┌─────────────┐     │     signal       │  │   └── handlers[]            │ │
│  │   Another   │     └────────────────► │  ├── blocked (sigset_t)        │ │
│  │   Process   │─────────────────────►  │  │   └── blocked signals       │ │
│  │  (kill())   │                        │  └── TIF_SIGPENDING flag       │ │
│  └─────────────┘     ┌────────────────► │                                 │ │
│                      │                  └─────────────────────────────────┘ │
│  ┌─────────────┐     │                                                      │
│  │   Terminal  │     │                                                      │
│  │   (Ctrl+C,  │─────┘                                                      │
│  │    Ctrl+Z)  │                                                            │
│  └─────────────┘                                                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Signal Types

Standard Signals (1-31)

Signal    Number  Default Action  Description
SIGHUP    1       Terminate       Hangup detected
SIGINT    2       Terminate       Interrupt from keyboard (Ctrl+C)
SIGQUIT   3       Core dump       Quit from keyboard (Ctrl+\)
SIGILL    4       Core dump       Illegal instruction
SIGTRAP   5       Core dump       Trace/breakpoint trap
SIGABRT   6       Core dump       Abort signal from abort()
SIGBUS    7       Core dump       Bus error (bad memory access)
SIGFPE    8       Core dump       Floating-point exception
SIGKILL   9       Terminate       Kill signal (cannot be caught)
SIGSEGV   11      Core dump       Invalid memory reference
SIGPIPE   13      Terminate       Broken pipe
SIGALRM   14      Terminate       Timer signal from alarm()
SIGTERM   15      Terminate       Termination signal
SIGCHLD   17      Ignore          Child stopped or terminated
SIGCONT   18      Continue        Continue if stopped
SIGSTOP   19      Stop            Stop process (cannot be caught)
SIGTSTP   20      Stop            Stop from terminal (Ctrl+Z)

Real-Time Signals (32-64)

// Real-time signals are queued (unlike standard signals)
// and can carry data

union sigval {
    int sival_int;
    void *sival_ptr;
};

// Send with data
sigqueue(pid, SIGRTMIN + 1, (union sigval){ .sival_int = 42 });

Signal Kernel Implementation

Signal Data Structures

// include/linux/sched/signal.h
struct signal_struct {
    refcount_t sigcnt;
    atomic_t live;
    struct list_head thread_head;
    
    // Shared pending signals (for thread group)
    struct sigpending shared_pending;
    
    // Job control
    int group_exit_code;
    int group_stop_count;
    unsigned int flags;
    
    // Resource limits, timers, etc.
    struct rlimit rlim[RLIM_NLIMITS];
    // ...
};

// Signal handlers, shared by all threads in a thread group (CLONE_SIGHAND)
struct sighand_struct {
    spinlock_t siglock;
    refcount_t count;
    struct k_sigaction action[_NSIG];  // Handler for each signal
    // ...
};

// Signal action
struct k_sigaction {
    struct sigaction sa;
};

struct sigaction {
    __sighandler_t sa_handler;     // SIG_DFL, SIG_IGN, or handler
    unsigned long sa_flags;         // SA_SIGINFO, SA_RESTART, etc.
    sigset_t sa_mask;              // Signals to block during handler
};

Signal Delivery Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SIGNAL DELIVERY INTERNALS                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. SIGNAL GENERATION (kill(), kernel, terminal)                            │
│     ──────────────────────────────────────────                              │
│     do_send_sig_info()                                                      │
│     └── __send_signal()                                                     │
│         ├── Allocate sigqueue structure                                     │
│         ├── Add to target's pending queue                                   │
│         ├── Set TIF_SIGPENDING flag                                         │
│         └── Wake up target if needed                                        │
│                                                                              │
│  2. SIGNAL DELIVERY (on return to user space)                               │
│     ─────────────────────────────────────────                               │
│     exit_to_user_mode_prepare()                                             │
│     └── do_signal()                                                         │
│         ├── get_signal() - dequeue pending signal                          │
│         ├── Check: blocked? ignored?                                        │
│         └── handle_signal()                                                 │
│             ├── Setup signal frame on user stack                            │
│             ├── Save registers, return address                              │
│             └── Set RIP to handler address                                  │
│                                                                              │
│  3. HANDLER EXECUTION (user space)                                          │
│     ──────────────────────────────                                          │
│     User's signal handler runs                                              │
│     └── Returns via sigreturn syscall                                       │
│         └── Kernel restores original context                                │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Signal Frame (x86-64)

// arch/x86/include/asm/sigframe.h
struct rt_sigframe {
    char __user *pretcode;           // Return address (sigreturn)
    struct ucontext uc;              // User context
    struct siginfo info;             // Signal info
    // ... saved FPU state ...
};

struct ucontext {
    unsigned long uc_flags;
    struct ucontext *uc_link;
    stack_t uc_stack;                // Alternate signal stack
    struct sigcontext_64 uc_mcontext; // Saved registers
    sigset_t uc_sigmask;             // Blocked signals
};

Signal Handling Best Practices

Async-Signal-Safe Functions

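// Only async-signal-safe functions may be called from a signal handler.
// POSIX defines the full list (see signal-safety(7)). Common examples:
// - write() (not printf!)
// - _exit() (not exit!)
// - signal(), sigaction()
// Setting a volatile sig_atomic_t flag is also safe.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

volatile sig_atomic_t got_signal = 0;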

void handler(int sig) {
    got_signal = 1;  // Safe: atomic write
    
    // UNSAFE in signal handler:
    // printf("Got signal\n");  // NO! Uses locks
    // malloc(100);             // NO! Uses locks
    // exit(0);                 // NO! Calls atexit handlers
    
    // Safe:
    write(STDERR_FILENO, "Got signal\n", 11);
}

int main() {
    signal(SIGINT, handler);
    while (!got_signal) {
        // Do work
    }
    printf("Exiting cleanly\n");  // Safe here, not in handler
    return 0;
}

sigaction() vs signal()

// Prefer sigaction() over signal()
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = handler;
sa.sa_flags = SA_RESTART;           // Restart interrupted syscalls
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask, SIGTERM);    // Block SIGTERM during handler
sigaction(SIGINT, &sa, NULL);

// Flags:
// SA_RESTART   - Restart interrupted syscalls
// SA_NOCLDSTOP - Don't get SIGCHLD when child stops
// SA_SIGINFO   - Use sa_sigaction instead of sa_handler
// SA_NODEFER   - Don't block signal during handler
// SA_RESETHAND - Reset to default after first delivery

signalfd for Event Loop Integration

#include <sys/signalfd.h>

// Block signals in normal path
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, SIGINT);
sigaddset(&mask, SIGTERM);
sigprocmask(SIG_BLOCK, &mask, NULL);

// Create signalfd
int sfd = signalfd(-1, &mask, SFD_NONBLOCK);

// Use in event loop (epoll, select, poll)
struct signalfd_siginfo info;
ssize_t n = read(sfd, &info, sizeof(info));
if (n == sizeof(info)) {
    if (info.ssi_signo == SIGINT) {
        printf("Got SIGINT from PID %u\n", info.ssi_pid);
    }
}

Pipes

The simplest form of IPC:
┌─────────────────────────────────────────────────────────────────────────────┐
│                         PIPE IMPLEMENTATION                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  User Space                                                                  │
│  ┌────────────────┐                    ┌────────────────┐                   │
│  │   Process A    │                    │   Process B    │                   │
│  │                │                    │                │                   │
│  │  write(fd[1])  │                    │  read(fd[0])   │                   │
│  └───────┬────────┘                    └───────▲────────┘                   │
│          │                                     │                            │
│  ════════│═════════════════════════════════════│════════════════════════   │
│          │         Kernel Space                │                            │
│  ════════│═════════════════════════════════════│════════════════════════   │
│          │                                     │                            │
│          ▼                                     │                            │
│  ┌─────────────────────────────────────────────┴─────────────────────────┐  │
│  │                        struct pipe_inode_info                         │  │
│  │                                                                       │  │
│  │  ┌─────────────────────────────────────────────────────────────────┐ │  │
│  │  │              Circular Buffer (pipe_buffer array)                 │ │  │
│  │  │                                                                  │ │  │
│  │  │  ┌──────┬──────┬──────┬──────┬──────┬──────┬──────┬──────┐      │ │  │
│  │  │  │ buf0 │ buf1 │ buf2 │ buf3 │ buf4 │ buf5 │ buf6 │ buf7 │      │ │  │
│  │  │  └──────┴──────┴──────┴──────┴──────┴──────┴──────┴──────┘      │ │  │
│  │  │    ▲                                                             │ │  │
│  │  │    └── Each buffer is a page                                    │ │  │
│  │  │                                                                  │ │  │
│  │  │  head ──► next write position                                   │ │  │
│  │  │  tail ──► next read position                                    │ │  │
│  │  │                                                                  │ │  │
│  │  └──────────────────────────────────────────────────────────────────┘ │  │
│  │                                                                       │  │
│  │  wait_queue:  readers/writers waiting for data/space                 │  │
│  │  mutex:       protects buffer state                                  │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  Default: 16 pages = 64KB buffer (adjustable via F_SETPIPE_SZ)              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Pipe Internals

// fs/pipe.c (simplified)
struct pipe_inode_info {
    struct mutex mutex;
    wait_queue_head_t rd_wait, wr_wait;
    unsigned int head;        // Write position
    unsigned int tail;        // Read position
    unsigned int max_usage;   // Max buffers
    unsigned int ring_size;   // Buffer array size
    unsigned int readers;     // Reader count
    unsigned int writers;     // Writer count
    struct pipe_buffer *bufs; // Buffer array
};

struct pipe_buffer {
    struct page *page;        // Page containing data
    unsigned int offset;      // Start offset in page
    unsigned int len;         // Length of data
    unsigned int flags;       // PIPE_BUF_FLAG_*
};
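
From user space, the ring buffer is reached through pipe(), write() and read(). A minimal sketch, assuming a forked child as the writer; pipe_roundtrip() is a hypothetical helper name.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child writes msg into the pipe; the parent reads until EOF.
 * Returns the number of bytes received, or -1 on setup failure. */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t cap)
{
    int fd[2];
    if (pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                     /* child: writer */
        close(fd[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fd[1], msg, len);
        close(fd[1]);
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Parent: close the write end, otherwise read() never sees EOF. */
    close(fd[1]);
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(fd[0], out + total, cap - total)) > 0)
        total += (size_t)n;
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

EOF on a pipe is reported only when every write-end descriptor is closed, which is why each side closes the end it does not use.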

Named Pipes (FIFOs)

# Create named pipe
mkfifo /tmp/myfifo

# Writer
echo "Hello" > /tmp/myfifo

# Reader (in another terminal)
cat /tmp/myfifo

// Kernel creates inode with pipe_inode_info
// Accessed like regular file but with pipe semantics
int fd = open("/tmp/myfifo", O_RDONLY);

Unix Domain Sockets

For high-performance local IPC:
┌─────────────────────────────────────────────────────────────────────────────┐
│                    UNIX DOMAIN SOCKET TYPES                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  SOCK_STREAM (like TCP, connection-oriented)                                 │
│  ─────────────────────────────────────────                                  │
│  ┌──────────┐     connection     ┌──────────┐                               │
│  │ Server   │◄──────────────────►│ Client   │                               │
│  │          │     bi-directional │          │                               │
│  │ accept() │     byte stream    │ connect()│                               │
│  └──────────┘                    └──────────┘                               │
│                                                                              │
│  SOCK_DGRAM (like UDP, connectionless)                                       │
│  ──────────────────────────────────────                                     │
│  ┌──────────┐                    ┌──────────┐                               │
│  │ Process A│     datagrams      │ Process B│                               │
│  │          │────────────────────►│          │                               │
│  │ sendto() │◄────────────────────│ recvfrom │                               │
│  └──────────┘                    └──────────┘                               │
│                                                                              │
│  SOCK_SEQPACKET (connection-oriented with message boundaries)                │
│  ─────────────────────────────────────────────────────────                  │
│  ┌──────────┐    [msg1][msg2]    ┌──────────┐                               │
│  │ Server   │◄──────────────────►│ Client   │                               │
│  │          │   messages intact  │          │                               │
│  └──────────┘                    └──────────┘                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
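
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int sv[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

if (fork() == 0) {
    // Child
    close(sv[0]);
    write(sv[1], "Hello from child", 16);  // 16 bytes, no NUL terminator
    close(sv[1]);
    exit(0);
} else {
    // Parent
    close(sv[1]);
    char buf[100];
    ssize_t n = read(sv[0], buf, sizeof(buf) - 1);
    if (n >= 0) {
        buf[n] = '\0';  // read() does not NUL-terminate
        printf("Received: %s\n", buf);
    }
    close(sv[0]);
}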
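
The message-boundary property of SOCK_SEQPACKET shown in the diagram can be checked directly. A small sketch, assuming both ends live in one process for simplicity; seqpacket_boundaries() is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends two messages back-to-back, then receives twice with a buffer
 * large enough for both. Stores the size of each recv() in sizes[0..1].
 * Returns 0 on success, -1 on failure. */
int seqpacket_boundaries(ssize_t sizes[2])
{
    int sv[2];
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) == -1)
        return -1;

    int ok = send(sv[0], "ab", 2, 0) == 2 &&
             send(sv[0], "cdef", 4, 0) == 4;
    if (ok) {
        /* Each recv() returns exactly one message, never a merged stream. */
        sizes[0] = recv(sv[1], buf, sizeof(buf), 0);
        sizes[1] = recv(sv[1], buf, sizeof(buf), 0);
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

With SOCK_STREAM the first recv() could legally return all six bytes at once, so a stream protocol needs its own framing.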

File Descriptor Passing

Unix sockets can pass file descriptors between processes:
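#include <string.h>
#include <sys/socket.h>

// Send file descriptor
void send_fd(int sock, int fd)
{
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;
    // The union forces the alignment that struct cmsghdr requires
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u = {0};
    
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    
    // At least one byte of regular data must accompany the control message
    struct iovec iov = { .iov_base = "x", .iov_len = 1 };
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    
    sendmsg(sock, &msg, 0);
}

// Receive file descriptor (returns -1 on failure)
int recv_fd(int sock)
{
    struct msghdr msg = {0};
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u = {0};
    char dummy[1];
    struct iovec iov = { .iov_base = dummy, .iov_len = 1 };
    int fd;
    
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    
    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    
    // The peer may have sent data without any control message
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}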

Shared Memory

The fastest IPC - memory is shared directly:

POSIX Shared Memory

#include <sys/mman.h>
#include <fcntl.h>

// Create or open shared memory object
int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0666);
ftruncate(fd, 4096);  // Set size

// Map into address space
void *ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
close(fd);  // Can close fd after mmap

// Use the shared memory
strcpy(ptr, "Hello from process A");

// Cleanup
munmap(ptr, 4096);
shm_unlink("/my_shm");  // Delete when done

Kernel Implementation

// mm/shmem.c - tmpfs-backed shared memory
// The shared memory region is backed by tmpfs

struct shmem_inode_info {
    spinlock_t lock;
    unsigned int seals;      // For memfd sealing
    unsigned long flags;
    unsigned long alloced;   // Pages allocated
    unsigned long swapped;   // Pages swapped out
    struct list_head shrinklist;
    struct list_head swaplist;
    struct simple_xattrs xattrs;
    struct inode vfs_inode;
};

Memory-Mapped Files

// Map a regular file for sharing
int fd = open("/path/to/file", O_RDWR);
void *ptr = mmap(NULL, file_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

// Changes visible to other processes mapping same file
// Also persisted to disk

// For anonymous shared memory (no file)
void *anon = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
// Shared with children after fork()

Message Queues

Structured messages with types:

POSIX Message Queues

#include <mqueue.h>

// Create queue
struct mq_attr attr = {
    .mq_maxmsg = 10,    // Max messages in queue
    .mq_msgsize = 256   // Max message size
};
mqd_t mq = mq_open("/my_queue", O_CREAT | O_RDWR, 0666, &attr);

// Send message
char msg[] = "Hello";
mq_send(mq, msg, sizeof(msg), 1);  // Priority 1

// Receive message (blocks if empty)
char buf[256];
unsigned int prio;
mq_receive(mq, buf, sizeof(buf), &prio);
printf("Received: %s (priority %u)\n", buf, prio);

// Cleanup
mq_close(mq);
mq_unlink("/my_queue");

POSIX vs System V Message Queues

Feature        POSIX                          System V
API            mq_open, mq_send               msgget, msgsnd
Namespace      /dev/mqueue filesystem         Integer keys
Notification   mq_notify (signals, threads)   None
Cleanup        mq_unlink                      Explicit or ipcrm
Message size   Configurable                   Fixed (MSGMAX)

Semaphores

For process synchronization:

POSIX Named Semaphores

#include <semaphore.h>

// Create or open
sem_t *sem = sem_open("/my_sem", O_CREAT, 0666, 1);  // Initial value 1

// Wait (decrement)
sem_wait(sem);      // Blocks if 0
sem_trywait(sem);   // Non-blocking

// Post (increment)
sem_post(sem);

// Cleanup
sem_close(sem);
sem_unlink("/my_sem");

Unnamed Semaphores (shared memory)

// In shared memory between processes
sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
sem_init(sem, 1, 1);  // 1 = shared between processes, initial value 1

// Use normally
sem_wait(sem);
// ... critical section ...
sem_post(sem);

// Cleanup
sem_destroy(sem);
munmap(sem, sizeof(sem_t));

eventfd: Lightweight Notification

#include <sys/eventfd.h>

// Create eventfd
int efd = eventfd(0, EFD_NONBLOCK | EFD_SEMAPHORE);

// Writer: signal event
uint64_t val = 1;
write(efd, &val, sizeof(val));

// Reader: wait for event
uint64_t count;
read(efd, &count, sizeof(count));
// In EFD_SEMAPHORE mode: count is 1 and decrements counter
// Normal mode: count is total and resets to 0

// Use with epoll/select for event loops
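
The difference between the two read modes can be sketched like this. eventfd_first_read() is a hypothetical helper that writes 3 and then 2 to a fresh eventfd and reports what the first read() returns.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* flags is passed straight to eventfd(): 0 for counter mode or
 * EFD_SEMAPHORE for semaphore mode. Returns 0 on failure. */
uint64_t eventfd_first_read(int flags)
{
    uint64_t a = 3, b = 2, got = 0;

    int efd = eventfd(0, flags);
    if (efd == -1)
        return 0;

    /* Each write adds its 8-byte value to the kernel counter (now 5). */
    if (write(efd, &a, sizeof(a)) != (ssize_t)sizeof(a) ||
        write(efd, &b, sizeof(b)) != (ssize_t)sizeof(b) ||
        read(efd, &got, sizeof(got)) != (ssize_t)sizeof(got))
        got = 0;

    close(efd);
    return got;
}
```

In counter mode one read drains everything, which suits "something happened" wakeups; semaphore mode hands out one unit per read, which suits counting available work items.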

IPC Performance Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                    IPC MECHANISM COMPARISON                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Mechanism       Latency    Throughput   Synchronization   Notes            │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  Shared Memory   ~100ns     Highest      Manual (locks)    Fastest for      │
│                                                            large data       │
│                                                                              │
│  Unix Socket     ~1μs       High         Built-in         FD passing,      │
│  (stream)                                (recv blocks)     reliable         │
│                                                                              │
│  Pipe            ~1μs       Medium       Built-in         Simple,          │
│                                                            unidirectional   │
│                                                                              │
│  Message Queue   ~10μs      Medium       Built-in         Structured,      │
│                                          (blocking recv)   priority         │
│                                                                              │
│  Signal          ~10μs      Low          N/A              Notification     │
│                                                            only, limited    │
│                                                                              │
│  eventfd         ~100ns     N/A          Built-in         Counter/flag,    │
│                                                            epoll-friendly   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Container IPC Considerations

IPC Namespaces

// Each container can have isolated IPC:
// - System V IPC (semaphores, message queues, shared memory)
// - POSIX message queues

unshare(CLONE_NEWIPC);  // New IPC namespace

// After this, IPC objects are invisible to other namespaces

Sharing Between Containers

# Docker: share IPC namespace
docker run --ipc=container:other_container myimage

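# Kubernetes: containers in the same Pod already share an IPC namespace.
# (shareProcessNamespace shares the PID namespace, not IPC.)
# To use the host's IPC namespace instead:
spec:
  hostIPC: true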

Interview Questions

Answer: SIGKILL is special:
  • Cannot be caught, blocked, or ignored
  • Kernel handles it directly
  • Process is terminated immediately (after current syscall)
The flow:
  1. Signal is queued to target process
  2. TIF_SIGPENDING flag is set
  3. On next return to userspace (or wakeup), kernel checks flag
  4. get_signal() sees SIGKILL
  5. do_exit() called immediately
  6. No handler, no cleanup - process dies
Exceptions: Uninterruptible sleep (D state) - process won’t die until it wakes up. This is why processes stuck in disk I/O can’t be killed. (Zombies are a different case: they are already dead and only wait to be reaped by their parent.)
Answer: Several approaches:
  1. Shared memory + semaphores (fastest):
// Shared circular buffer
// Semaphores: empty (producer waits), full (consumer waits), mutex
sem_wait(empty);
sem_wait(mutex);
// produce into buffer
sem_post(mutex);
sem_post(full);
  2. Unix socket pair (simpler):
// No explicit locking needed
// Built-in flow control via socket buffers
write(sock[1], data, size);  // Producer
read(sock[0], data, size);   // Consumer (blocks if empty)
  3. Pipe (if one direction only):
pipe(pipefd);
// Producer writes to pipefd[1]
// Consumer reads from pipefd[0]
Recommendation: Unix socket for most cases (reliable, bidirectional, FD passing). Shared memory only if latency-critical and you understand the locking.
Answer: The problem: Signal handlers run asynchronously - they can interrupt your code at almost any point.
Dangers:
  1. Non-reentrant functions: malloc(), printf() use internal locks. If interrupted mid-call and handler calls same function → deadlock
  2. Non-atomic operations:
// Main code
if (flag) {      // Interrupted here
    use_data();  // Handler sets flag=0
}                // Now using stale data
  3. errno clobbering: Handler might set errno, affecting interrupted code
Safe practices:
  • Use volatile sig_atomic_t for flags
  • Only call async-signal-safe functions
  • Save/restore errno if needed
  • Keep handlers minimal - just set flag
  • Use signalfd for complex handling
Answer: Unix sockets can pass FDs between unrelated processes using ancillary (control) messages.
The mechanism:
  1. Sender uses sendmsg() with SCM_RIGHTS control message
  2. Kernel takes sender’s FD, finds underlying file object
  3. Kernel creates new FD in receiver’s FD table pointing to same file
  4. Receiver uses recvmsg() to get new FD number
Key points:
  • Underlying file object is shared (same offset, flags)
  • FD numbers may differ in sender/receiver
  • Works across fork() and exec()
  • Used by container runtimes, systemd socket activation
Use cases:
  • Pass socket from parent to worker process
  • Zero-downtime server restart (pass listening socket)
  • Container file sharing
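
The "same offset" point can be demonstrated with a sketch. For brevity both socket ends live in one process, and a tmpfile() stands in for a real shared file; send_one_fd(), recv_one_fd(), init_msg() and shared_offset_demo() are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer, aligned as struct cmsghdr requires. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Prepare msg with a 1-byte payload and a zeroed control buffer. */
static void init_msg(struct msghdr *msg, struct iovec *iov, char *byte,
                     union fd_ctrl *ctrl)
{
    memset(msg, 0, sizeof(*msg));
    memset(ctrl, 0, sizeof(*ctrl));
    iov->iov_base = byte;
    iov->iov_len = 1;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = ctrl->buf;
    msg->msg_controllen = sizeof(ctrl->buf);
}

static int send_one_fd(int sock, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    union fd_ctrl ctrl;
    char byte = 'x';

    init_msg(&msg, &iov, &byte, &ctrl);
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_one_fd(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    union fd_ctrl ctrl;
    char byte;
    int fd;

    init_msg(&msg, &iov, &byte, &ctrl);
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}

/* Returns the original fd's offset after 3 bytes were read through the
 * received fd, or -1 on failure. */
long shared_offset_demo(void)
{
    int sv[2];
    char tmp[3];
    long result = -1;

    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (write(fd, "abcdef", 6) != 6 || lseek(fd, 0, SEEK_SET) != 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        fclose(f);
        return -1;
    }
    if (send_one_fd(sv[0], fd) == 0) {
        int passed = recv_one_fd(sv[1]);    /* new fd number, same file */
        if (passed >= 0) {
            if (passed != fd && read(passed, tmp, 3) == 3)
                result = (long)lseek(fd, 0, SEEK_CUR);
            close(passed);
        }
    }
    close(sv[0]);
    close(sv[1]);
    fclose(f);
    return result;
}
```

Reading through the received descriptor moves the offset seen by the original one, because both numbers refer to a single open file description.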

Debugging IPC

# View IPC resources
ipcs                    # System V IPC
ipcs -m                 # Shared memory only
ipcs -q                 # Message queues only
ipcs -s                 # Semaphores only

# View process signals
cat /proc/PID/status | grep -i sig
# SigPnd: 0000000000000000  (pending signals)
# SigBlk: 0000000000010000  (blocked signals)
# SigIgn: 0000000000001000  (ignored signals)
# SigCgt: 0000000180004007  (caught signals)

# Trace signal delivery
strace -e signal,kill ./myprogram

# View the maximum pipe buffer size an unprivileged process may request
cat /proc/sys/fs/pipe-max-size

# View message queues
ls /dev/mqueue/

# View shared memory
ls /dev/shm/

Summary

Mechanism        Best For                              Limitations
Signals          Notifications, process control        Limited data, async complexity
Pipes            Streaming between related processes   Unidirectional, related only
Unix Sockets     General IPC, FD passing               More setup than pipes
Shared Memory    High-throughput data sharing          Manual synchronization
Message Queues   Structured messages, priorities       Fixed message sizes
Semaphores       Synchronization                       Just counting, no data
eventfd          Event notification                    Just counter, no data

Interview Deep-Dive

Strong Answer:
  • When Docker calls kill(container_pid, SIGTERM), the kernel’s do_send_sig_info() allocates a sigqueue structure, adds it to the target task’s pending signal queue, sets the TIF_SIGPENDING flag on the task, and wakes the task if it is sleeping. The signal is delivered when the task returns to user space: exit_to_user_mode_prepare() calls do_signal(), which dequeues the signal and either invokes the registered handler or performs the default action (terminate for SIGTERM).
  • The reason many containers do not shut down gracefully is that the application runs as PID 1 and does not register a SIGTERM handler. PID 1 in a PID namespace is special: the kernel does not deliver signals with default actions to PID 1 unless a handler is registered. This is because killing PID 1 would destroy the namespace, so the kernel protects it from accidental termination. If the application does not call sigaction(SIGTERM, ...), SIGTERM is silently ignored.
  • After the 10-second timeout, Docker sends SIGKILL. SIGKILL cannot be caught, blocked, or ignored — not even by PID 1. The kernel’s get_signal() function detects SIGKILL and calls do_exit() immediately, bypassing any handler. The process is terminated, all its resources are cleaned up, and all other processes in the PID namespace receive SIGKILL as well (because PID 1 exited).
  • The fix is to either use an init system like tini (Docker’s --init flag) that properly handles signals and forwards them to the application, or explicitly register a SIGTERM handler in the application code.
Follow-up: What happens if the application is in an uninterruptible sleep (D state) when SIGKILL arrives?
Follow-up Answer:
  • If the process is in TASK_UNINTERRUPTIBLE state (typically waiting for disk I/O or a kernel lock), even SIGKILL cannot immediately terminate it. The kernel sets the TIF_SIGPENDING flag and the signal remains queued, but the process does not check for signals until it transitions out of the D state. This is why kill -9 sometimes appears to have no effect on processes stuck in D state. The process can only be killed when the I/O completes or the lock is released, at which point the signal delivery happens. Linux 2.6.25 introduced TASK_KILLABLE (a variant of uninterruptible sleep that responds to fatal signals such as SIGKILL) to reduce the frequency of this problem for common code paths.
Strong Answer:
  • The strategy is: start the new process, pass the listening socket’s fd to it via a Unix domain socket, have the new process start accepting connections, then gracefully drain and stop the old process.
  • The fd-passing mechanism uses ancillary (control) messages on Unix domain sockets. The sender constructs a msghdr with a cmsghdr containing SCM_RIGHTS and the fd number. When sendmsg() processes this, the kernel does not just send the integer — it looks up the sender’s struct file * for that fd, increments its reference count, and serializes a reference in the control message.
  • On the receiving side, recvmsg() creates a new fd in the receiver’s file descriptor table pointing to the same struct file. The receiving process now has an independent fd number (possibly different from the sender’s) that references the same underlying socket. Both processes can now accept() connections on the same listening socket.
  • The kernel mechanism in scm_send() and scm_recv() in net/core/scm.c handles the fd translation: it calls fget() on the sender’s fd to get the struct file *, stores it in the scm_cookie, and on the receiving side calls receive_fd() to install the file in the receiver’s fd table with __receive_fd().
  • For zero-downtime: the new process calls accept() on the inherited socket. Both old and new processes can accept simultaneously during the transition window. The old process stops accepting, drains in-flight requests, and exits. The socket’s listen backlog is shared, so no connections are dropped during the transition.
Follow-up: What happens to connections that were already accepted by the old process? How would you drain them gracefully?
Follow-up Answer:
  • Already-accepted connections are individual sockets with their own file descriptors in the old process. These are not affected by passing the listening socket. The old process should stop accepting new connections (close or stop polling the listening fd), then continue processing in-flight requests on existing accepted sockets until they complete. It can set a deadline and, after the deadline, close remaining connections with a proper TCP FIN (graceful close). If the service uses HTTP, it should send Connection: close headers on in-flight responses. For long-lived connections (WebSockets, gRPC streams), the application protocol needs its own graceful shutdown mechanism — the kernel-level fd passing only handles the listening socket.
Strong Answer:
  • Pipes provide unidirectional byte stream communication with a kernel-managed circular buffer (default 64KB, adjustable via F_SETPIPE_SZ). Each write() and read() involves a syscall, data copy from user to kernel buffer (write side) and kernel buffer to user (read side) — two copies per transfer. Latency is approximately 1 microsecond for small messages. I would use pipes for simple parent-child communication where data flows one direction, like shell pipelines.
  • Unix domain sockets support bidirectional communication, both stream and datagram modes, and critically support fd passing. Performance is similar to pipes (two data copies, ~1 microsecond latency), but with the overhead of full socket buffer management. For SOCK_DGRAM, message boundaries are preserved. I would use Unix sockets for any IPC that needs bidirectional communication, fd passing, or integration with event loops (epoll). They are the default choice for general-purpose local IPC.
  • Shared memory eliminates all data copying: both processes map the same physical pages into their address spaces via mmap() with MAP_SHARED. Data written by one process is immediately visible to the other (subject to cache coherency, which is handled by hardware on x86). Latency is approximately 100 nanoseconds. However, shared memory requires explicit synchronization (mutexes, semaphores, or lock-free algorithms) because the kernel provides no ordering guarantees. I would use shared memory only for high-throughput, latency-critical IPC (inter-process queues for trading systems, database buffer pools) where the application team can correctly implement the synchronization.
  • The key decision factor is: how much data and how fast? For small, infrequent messages, Unix sockets are simplest and safest. For high-throughput bulk data transfer, shared memory avoids the copy overhead that dominates at high rates.
Follow-up: How does eventfd fit into this picture, and why is it commonly used with shared memory?
Follow-up Answer:
  • eventfd is a lightweight notification mechanism: it provides a file descriptor that represents a counter. Writing increments the counter, reading returns and resets it. The overhead is minimal — a single syscall to signal an event, and the fd is epoll-compatible. Shared memory alone has no built-in notification mechanism: a reader must either busy-poll (waste CPU) or use a separate signaling channel. eventfd fills this role perfectly: the writer updates shared memory, then writes to the eventfd to notify the reader. The reader blocks on epoll_wait() including the eventfd, wakes up on notification, and reads the shared memory. This combines shared memory’s zero-copy data transfer with eventfd’s efficient notification, giving the best of both worlds for high-performance IPC.