Skip to main content

Synchronization

Synchronization ensures correct behavior when multiple threads access shared resources. It’s one of the most challenging areas of systems programming and a favorite topic in senior interviews.
Interview Frequency: Very High
Key Topics: Mutexes, semaphores, condition variables, atomics
Time to Master: 12-14 hours

The Need for Synchronization

Race Condition Example

// Shared counter - BUGGY CODE
int counter = 0;

void increment() {
    counter++;  // NOT atomic!
}

// What counter++ actually does:
// 1. LOAD counter from memory to register
// 2. ADD 1 to register
// 3. STORE register back to memory

// Race condition with 2 threads:
// Thread A                  Thread B
// LOAD counter (0)          
//                           LOAD counter (0)
// ADD 1 (register = 1)      
//                           ADD 1 (register = 1)
// STORE (counter = 1)       
//                           STORE (counter = 1)
//
// Expected: 2, Actual: 1 ❌

Critical Section

A critical section is code that accesses shared resources: Critical Section Pattern Requirements for critical section solutions:
  1. Mutual Exclusion: Only one thread in CS at a time
  2. Progress: If no one in CS, someone waiting can enter
  3. Bounded Waiting: No thread waits forever (no starvation)

Hardware Support

Test-and-Set (TAS)

// Atomic test-and-set instruction
// Returns old value and sets to true
bool test_and_set(bool *target) {
    bool old = *target;  // These three operations
    *target = true;      // happen atomically
    return old;          // in hardware
}

// Simple spinlock using TAS
typedef struct { bool locked; } spinlock_t;

void spin_lock(spinlock_t *lock) {
    while (test_and_set(&lock->locked)) {
        // Spin (busy wait)
    }
}

void spin_unlock(spinlock_t *lock) {
    lock->locked = false;
}

Compare-and-Swap (CAS)

// Atomic compare-and-swap
// If *addr == expected, set to new and return true
bool compare_and_swap(int *addr, int expected, int new_val) {
    if (*addr == expected) {  // Atomic
        *addr = new_val;
        return true;
    }
    return false;
}

// Lock-free counter using CAS
void atomic_increment(int *counter) {
    int old, new_val;
    do {
        old = *counter;
        new_val = old + 1;
    } while (!compare_and_swap(counter, old, new_val));
}

Fetch-and-Add

// Returns old value, adds to it atomically
int fetch_and_add(int *addr, int value) {
    int old = *addr;
    *addr += value;  // Atomic
    return old;
}

// Ticket lock using fetch-and-add
typedef struct {
    int next_ticket;
    int now_serving;
} ticket_lock_t;

void ticket_lock(ticket_lock_t *lock) {
    int my_ticket = fetch_and_add(&lock->next_ticket, 1);
    while (lock->now_serving != my_ticket) {
        // Spin
    }
}

void ticket_unlock(ticket_lock_t *lock) {
    lock->now_serving++;
}

Mutex (Mutual Exclusion Lock)

POSIX Mutex

#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int shared_data = 0;

void *worker(void *arg) {
    pthread_mutex_lock(&mutex);
    
    // Critical section - safe to access shared_data
    shared_data++;
    
    pthread_mutex_unlock(&mutex);
    
    return NULL;
}

Mutex Types

pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);

// Normal: Undefined behavior if same thread locks twice
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);

// Recursive: Same thread can lock multiple times
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

// Error-check: Returns error on double-lock, self-unlock
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

pthread_mutex_t mutex;
pthread_mutex_init(&mutex, &attr);

Spinlock vs Mutex

// Spinlock: Busy wait
void spinlock_acquire() {
    while (test_and_set(&lock)) {
        // Burns CPU cycles
    }
}

// Mutex: Sleep wait
void mutex_acquire() {
    if (try_lock(&lock)) {
        return;
    }
    // Add to wait queue
    // Sleep (yield CPU)
}
AspectSpinlockMutex
Wait mechanismBusy wait (spin)Sleep (block)
CPU usage100% while waiting0% while waiting
Context switchNoneYes (if contended)
Best forVery short critical sectionsLonger critical sections
SMP considerationGood on multicoreGood anywhere
Can hold acrossNever sleep!Can sleep (blocking I/O)

Semaphores

A semaphore is a counter with atomic wait and signal operations:
typedef struct {
    int value;
    queue_t waiters;
    pthread_mutex_t lock;
} semaphore_t;

void sem_wait(semaphore_t *sem) {  // Also called P() or down()
    pthread_mutex_lock(&sem->lock);
    sem->value--;
    if (sem->value < 0) {
        // Add to waiters queue
        // Block (release lock atomically)
    }
    pthread_mutex_unlock(&sem->lock);
}

void sem_post(semaphore_t *sem) {  // Also called V() or up()
    pthread_mutex_lock(&sem->lock);
    sem->value++;
    if (sem->value <= 0) {
        // Wake one waiter from queue
    }
    pthread_mutex_unlock(&sem->lock);
}

Binary Semaphore (like Mutex)

sem_t mutex;
sem_init(&mutex, 0, 1);  // Initial value = 1

void critical_section() {
    sem_wait(&mutex);    // Decrement (0 means locked)
    // ... work ...
    sem_post(&mutex);    // Increment (1 means unlocked)
}

Counting Semaphore (Resource Pool)

#define POOL_SIZE 5

sem_t pool;
sem_init(&pool, 0, POOL_SIZE);

void use_resource() {
    sem_wait(&pool);     // Acquire (value decrements)
    
    // Use one of the 5 resources
    // Up to 5 threads can be here simultaneously
    
    sem_post(&pool);     // Release (value increments)
}

Producer-Consumer with Semaphores

Producer-Consumer
#define BUFFER_SIZE 10

int buffer[BUFFER_SIZE];
int in = 0, out = 0;

sem_t empty;   // Counts empty slots
sem_t full;    // Counts filled slots
sem_t mutex;   // Protects buffer access

void init() {
    sem_init(&empty, 0, BUFFER_SIZE);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
}

void *producer(void *arg) {
    while (1) {
        int item = produce_item();
        
        sem_wait(&empty);  // Wait for empty slot
        sem_wait(&mutex);  // Enter critical section
        
        buffer[in] = item;
        in = (in + 1) % BUFFER_SIZE;
        
        sem_post(&mutex);  // Leave critical section
        sem_post(&full);   // Signal item available
    }
}

void *consumer(void *arg) {
    while (1) {
        sem_wait(&full);   // Wait for item
        sem_wait(&mutex);  // Enter critical section
        
        int item = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        
        sem_post(&mutex);  // Leave critical section
        sem_post(&empty);  // Signal slot available
        
        consume_item(item);
    }
}

Condition Variables

Condition variables allow threads to wait for a condition to become true:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;

void *waiter(void *arg) {
    pthread_mutex_lock(&mutex);
    
    while (!ready) {  // ALWAYS use while, not if
        pthread_cond_wait(&cond, &mutex);
        // wait() atomically:
        // 1. Releases mutex
        // 2. Sleeps until signaled
        // 3. Re-acquires mutex before returning
    }
    
    // ready is now true, mutex is held
    
    pthread_mutex_unlock(&mutex);
    return NULL;
}

void *signaler(void *arg) {
    pthread_mutex_lock(&mutex);
    
    ready = 1;
    pthread_cond_signal(&cond);  // Wake one waiter
    // pthread_cond_broadcast(&cond);  // Wake ALL waiters
    
    pthread_mutex_unlock(&mutex);
    return NULL;
}

Why Use while Not if?

// WRONG - can miss wakeup or have spurious wakeup
if (!ready) {
    pthread_cond_wait(&cond, &mutex);
}
// Might proceed even if ready is still false!

// RIGHT - always re-check condition
while (!ready) {
    pthread_cond_wait(&cond, &mutex);
}
// Guaranteed ready is true when we continue
Reasons:
  1. Spurious wakeups: pthread_cond_wait can return without signal
  2. Multiple waiters: One signal, multiple threads wake (with broadcast)
  3. Stolen wakeup: Condition may change between signal and wake

Bounded Buffer with Condition Variables

typedef struct {
    int buffer[SIZE];
    int count, in, out;
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} bounded_buffer_t;

void put(bounded_buffer_t *bb, int item) {
    pthread_mutex_lock(&bb->mutex);
    
    while (bb->count == SIZE) {  // Buffer full
        pthread_cond_wait(&bb->not_full, &bb->mutex);
    }
    
    bb->buffer[bb->in] = item;
    bb->in = (bb->in + 1) % SIZE;
    bb->count++;
    
    pthread_cond_signal(&bb->not_empty);  // Wake consumers
    
    pthread_mutex_unlock(&bb->mutex);
}

int get(bounded_buffer_t *bb) {
    pthread_mutex_lock(&bb->mutex);
    
    while (bb->count == 0) {  // Buffer empty
        pthread_cond_wait(&bb->not_empty, &bb->mutex);
    }
    
    int item = bb->buffer[bb->out];
    bb->out = (bb->out + 1) % SIZE;
    bb->count--;
    
    pthread_cond_signal(&bb->not_full);  // Wake producers
    
    pthread_mutex_unlock(&bb->mutex);
    return item;
}

Memory Ordering & Barriers

Deep down, hardware does not guarantee that instructions execute in the order you write them.

The Problem: Reordering

Compilers and CPUs reorder instructions to hide latency.
// Thread 1           // Thread 2
x = 1;                while (!flag);
flag = 1;             print(x);
Question: Can Thread 2 print 0? Answer: YES! The CPU might write flag=1 BEFORE x=1 because they are independent variables.

Solution: Memory Barriers (Fences)

We must tell the CPU: “Finish all operations before this line, before doing anything after this line.”

C11 Atomics Memory Order

  1. memory_order_relaxed: No ordering guarantees. Just atomic. Fast.
  2. memory_order_acquire: (Read) No subsequent reads/writes can be moved before this load.
  3. memory_order_release: (Write) No prior reads/writes can be moved after this store.
  4. memory_order_seq_cst: (Sequential Consistency) Global total ordering. Slowest but safest (Default).
// Correct Producer-Consumer with Acquire-Release
atomic_store_explicit(&data, 42, memory_order_relaxed); // A
atomic_store_explicit(&ready, 1, memory_order_release); // B (A happens before B)

// --- Other Thread ---

while (!atomic_load_explicit(&ready, memory_order_acquire)); // C (Synchronizes with B)
result = atomic_load_explicit(&data, memory_order_relaxed);  // D (C happens before D)
// Guaranteed: result is 42.

Read-Copy-Update (RCU)

A synchronization mechanism used heavily in the Linux Kernel (e.g., routing tables, file descriptors).

The Philosophy

Readers should be blazing fast and wait for no one. Writers do the heavy lifting.

How it Works

  1. Readers: Just read! No locks, no atomics. Zero overhead.
  2. Writer:
    • Read: Read the old data structure.
    • Copy: Make a copy and modify it.
    • Update: Atomically pointer-swap global pointer to the new copy.
    • Wait: Wait for all pre-existing readers to finish (Grace Period).
    • Free: Delete the old copy.
RCU Diagram

RCU Example Code (Conceptual)

struct config_t *global_config; // Shared pointer

// Reader: fast, no locks
void read_config() {
    rcu_read_lock();  // Disable preemption (cheap)
    
    struct config_t *conf = rcu_dereference(global_config);
    printf("Value: %d\n", conf->value);
    
    rcu_read_unlock();
}

// Writer: expensive
void update_config(int new_val) {
    struct config_t *new_conf = malloc(sizeof(struct config_t));
    struct config_t *old_conf;
    
    // Copy & Update
    *new_conf = *global_config;
    new_conf->value = new_val;
    
    // Swap pointers (Atomic Publisher)
    old_conf = global_config;
    rcu_assign_pointer(global_config, new_conf);
    
    // Wait for all readers who saw old_conf to leave
    synchronize_rcu();
    
    // Safe to free now
    free(old_conf);
}
Use Case: Read-mostly workloads (99% reads, 1% writes). Trade-off: Writers are slow and can block; memory usage spikes during update.

Lock-Free Programming

Avoid locks entirely using atomics:

Lock-Free Stack

#include <stdatomic.h>

typedef struct node {
    int data;
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) top;
} lock_free_stack_t;

void push(lock_free_stack_t *stack, int data) {
    node_t *new_node = malloc(sizeof(node_t));
    new_node->data = data;
    
    node_t *old_top;
    do {
        old_top = atomic_load(&stack->top);
        new_node->next = old_top;
    } while (!atomic_compare_exchange_weak(&stack->top, &old_top, new_node));
}

bool pop(lock_free_stack_t *stack, int *data) {
    node_t *old_top;
    node_t *new_top;
    
    do {
        old_top = atomic_load(&stack->top);
        if (old_top == NULL) return false;
        new_top = old_top->next;
    } while (!atomic_compare_exchange_weak(&stack->top, &old_top, new_top));
    
    *data = old_top->data;
    // Can't free old_top yet! (ABA problem, use hazard pointers)
    return true;
}

ABA Problem

ABA Problem

Common Synchronization Problems

Dining Philosophers

#define N 5
pthread_mutex_t forks[N];

void philosopher(int id) {
    while (1) {
        think();
        
        // Pick up forks (DEADLOCK possible!)
        int left = id;
        int right = (id + 1) % N;
        
        // Solution: Always pick up lower-numbered fork first
        if (left < right) {
            pthread_mutex_lock(&forks[left]);
            pthread_mutex_lock(&forks[right]);
        } else {
            pthread_mutex_lock(&forks[right]);
            pthread_mutex_lock(&forks[left]);
        }
        
        eat();
        
        pthread_mutex_unlock(&forks[left]);
        pthread_mutex_unlock(&forks[right]);
    }
}

Readers-Writers (with writer preference)

typedef struct {
    pthread_mutex_t mutex;
    pthread_mutex_t write_lock;
    pthread_cond_t reader_cond;
    int readers;
    int waiting_writers;
    bool writer_active;
} rw_lock_t;

void read_lock(rw_lock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    
    while (rw->writer_active || rw->waiting_writers > 0) {
        pthread_cond_wait(&rw->reader_cond, &rw->mutex);
    }
    rw->readers++;
    
    pthread_mutex_unlock(&rw->mutex);
}

void read_unlock(rw_lock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    rw->readers--;
    if (rw->readers == 0) {
        pthread_cond_signal(&rw->reader_cond);  // Wake writers
    }
    pthread_mutex_unlock(&rw->mutex);
}

Interview Deep Dive Questions

Answer:
// Method 1: Double-checked locking (with proper memory ordering)
#include <stdatomic.h>

typedef struct { int data; } Singleton;

static _Atomic(Singleton *) instance = NULL;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

Singleton *get_instance() {
    Singleton *p = atomic_load_explicit(&instance, memory_order_acquire);
    if (p == NULL) {
        pthread_mutex_lock(&mutex);
        p = atomic_load_explicit(&instance, memory_order_relaxed);
        if (p == NULL) {
            p = malloc(sizeof(Singleton));
            init_singleton(p);
            atomic_store_explicit(&instance, p, memory_order_release);
        }
        pthread_mutex_unlock(&mutex);
    }
    return p;
}

// Method 2: pthread_once (simpler, recommended)
static Singleton *instance = NULL;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void init_instance() {
    instance = malloc(sizeof(Singleton));
    init_singleton(instance);
}

Singleton *get_instance() {
    pthread_once(&once, init_instance);
    return instance;
}
Why double-checked locking needs memory ordering:
  • Without release/acquire, compiler/CPU may reorder
  • Other thread might see uninitialized Singleton
Answer:Four conditions for deadlock (all must hold):
  1. Mutual Exclusion: Resource can’t be shared
  2. Hold and Wait: Thread holds resource while waiting for another
  3. No Preemption: Can’t force thread to release
  4. Circular Wait: A waits for B, B waits for A Deadlock Conditions
Prevention strategies:
StrategyHowTrade-off
Break mutual exclusionUse lock-free data structuresComplex to implement
Break hold-and-waitAcquire all locks at onceReduces concurrency
Break no preemptionUse trylock with timeoutMay need retry logic
Break circular waitLock orderingMust maintain order everywhere
Lock ordering example:
// Always acquire in consistent order (e.g., by address)
void safe_transfer(account_t *from, account_t *to, int amount) {
    account_t *first = (from < to) ? from : to;
    account_t *second = (from < to) ? to : from;
    
    pthread_mutex_lock(&first->mutex);
    pthread_mutex_lock(&second->mutex);
    
    from->balance -= amount;
    to->balance += amount;
    
    pthread_mutex_unlock(&second->mutex);
    pthread_mutex_unlock(&first->mutex);
}
Answer:The problem: CPUs and compilers reorder memory operations for performance.
// Thread 1            Thread 2
x = 1;                 while (!ready);
ready = true;          print(x);  // Might print 0!
Memory orderings (C11 atomics):
OrderingGuarantee
relaxedAtomicity only, no ordering
acquireOperations after can’t move before
releaseOperations before can’t move after
acq_relBoth acquire and release
seq_cstTotal global order (strongest, default)
Fixed version:
atomic_int x, ready;

// Thread 1
atomic_store_explicit(&x, 1, memory_order_relaxed);
atomic_store_explicit(&ready, 1, memory_order_release);  // Release

// Thread 2
while (!atomic_load_explicit(&ready, memory_order_acquire));  // Acquire
print(atomic_load_explicit(&x, memory_order_relaxed));  // Sees 1
Hardware barriers:
  • x86: Strong ordering, mostly don’t need explicit barriers
  • ARM: Weak ordering, need barriers (dmb, dsb)
  • Compiler barrier: asm volatile("" ::: "memory")
Answer:
#include <pthread.h>
#include <time.h>

typedef struct {
    int max_requests;       // Max requests per window
    int window_ms;          // Window size in milliseconds
    int current_count;      // Requests in current window
    long window_start;      // Window start time
    pthread_mutex_t mutex;
} RateLimiter;

long current_time_ms() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

bool try_acquire(RateLimiter *rl) {
    pthread_mutex_lock(&rl->mutex);
    
    long now = current_time_ms();
    
    // Check if window expired
    if (now - rl->window_start >= rl->window_ms) {
        rl->window_start = now;
        rl->current_count = 0;
    }
    
    bool allowed = false;
    if (rl->current_count < rl->max_requests) {
        rl->current_count++;
        allowed = true;
    }
    
    pthread_mutex_unlock(&rl->mutex);
    return allowed;
}
Improvements:
  • Sliding window: Track request timestamps, more accurate
  • Token bucket: Allows bursts, smoother rate limiting
  • Distributed: Use Redis + Lua for atomic operations
Token bucket:
typedef struct {
    double tokens;
    double rate;      // Tokens per second
    double capacity;  // Max tokens
    long last_update;
    pthread_mutex_t mutex;
} TokenBucket;

bool try_acquire_token(TokenBucket *tb) {
    pthread_mutex_lock(&tb->mutex);
    
    long now = current_time_ms();
    double elapsed = (now - tb->last_update) / 1000.0;
    
    // Add tokens based on elapsed time
    tb->tokens = fmin(tb->capacity, tb->tokens + elapsed * tb->rate);
    tb->last_update = now;
    
    bool allowed = false;
    if (tb->tokens >= 1.0) {
        tb->tokens -= 1.0;
        allowed = true;
    }
    
    pthread_mutex_unlock(&tb->mutex);
    return allowed;
}
Answer:Use spinlock when:
  • Critical section is very short (< 1 μs)
  • Running on multicore system
  • Can’t sleep (interrupt context, kernel code)
  • Lock contention is low
Use mutex when:
  • Critical section is longer
  • May need to sleep while holding lock
  • Single-core system (spinning wastes the only CPU)
  • High contention (sleeping saves CPU)
Adaptive locks (best of both):
void adaptive_lock(lock_t *lock) {
    int spins = 0;
    
    while (!try_lock(lock)) {
        if (spins++ < SPIN_THRESHOLD) {
            cpu_relax();  // Pause instruction (reduces power)
        } else {
            // Lock holder running on another CPU?
            if (is_lock_holder_running(lock)) {
                continue;  // Keep spinning
            }
            // Lock holder sleeping, so should we
            sleep_until_lock_available(lock);
            spins = 0;
        }
    }
}
Linux kernel:
  • spin_lock(): Always spin
  • mutex_lock(): Hybrid (optimistic spin, then sleep)
  • rt_mutex: Priority inheritance for real-time

Practice Exercises

1

Implement Barrier

Create a reusable barrier that N threads wait at before continuing.
2

Lock-Free Queue

Implement a lock-free MPSC (multi-producer, single-consumer) queue.
3

RW Lock from Primitives

Build a readers-writer lock using only mutexes and condition variables.
4

Deadlock Detector

Create a lock wrapper that detects potential deadlocks using lock ordering.

Key Takeaways

Mutex for Most Cases

Default choice. Sleep-waits, works everywhere, simple to use.

While, Not If

Always recheck condition after cond_wait returns.

Lock Ordering Prevents Deadlock

Consistent order eliminates circular wait.

Lock-Free is Hard

ABA problem, memory ordering. Use only when necessary.

Next: Deadlocks