Synchronization
Synchronization ensures correct behavior when multiple threads access shared resources. It’s one of the most challenging areas of systems programming and a favorite topic in senior interviews.
Interview Frequency: Very High
Key Topics: Mutexes, semaphores, condition variables, atomics
Time to Master: 12-14 hours
The Need for Synchronization
Race Condition Example
// Shared counter - BUGGY CODE
int counter = 0;

void increment() {
    counter++;   // NOT atomic!
}

// What counter++ actually does:
// 1. LOAD counter from memory to register
// 2. ADD 1 to register
// 3. STORE register back to memory

// Race condition with 2 threads:
// Thread A                 Thread B
// LOAD counter (0)
//                          LOAD counter (0)
// ADD 1 (register = 1)
//                          ADD 1 (register = 1)
// STORE (counter = 1)
//                          STORE (counter = 1)
//
// Expected: 2, Actual: 1 ❌
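The portable fix is to make the increment a single indivisible read-modify-write. A minimal sketch using C11 atomics:

#include <stdatomic.h>

atomic_int counter = 0;

void increment() {
    atomic_fetch_add(&counter, 1);   // One indivisible LOAD+ADD+STORE
}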
Critical Section
A critical section is a region of code that accesses shared resources and must not be executed by more than one thread at a time.
Requirements for a critical-section solution:
Mutual Exclusion: Only one thread in the critical section at a time
Progress: If no thread is in the critical section, a waiting thread can enter
Bounded Waiting: No thread waits forever (no starvation)
Hardware Support
Test-and-Set (TAS)
// Atomic test-and-set instruction
// Returns the old value and sets the flag to true
bool test_and_set(bool *target) {
    bool old = *target;   // These three operations
    *target = true;       // happen atomically
    return old;           // in hardware
}

// Simple spinlock using TAS
typedef struct { bool locked; } spinlock_t;

void spin_lock(spinlock_t *lock) {
    while (test_and_set(&lock->locked)) {
        // Spin (busy wait)
    }
}

void spin_unlock(spinlock_t *lock) {
    lock->locked = false;
}
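In portable C11, the same spinlock is usually written with atomic_flag, whose test-and-set is guaranteed to be lock-free. A minimal sketch:

#include <stdatomic.h>

typedef struct {
    atomic_flag flag;   // Initialize with ATOMIC_FLAG_INIT
} c11_spinlock_t;

void c11_spin_lock(c11_spinlock_t *lock) {
    // Returns the previous value; loop until we flip it from clear to set
    while (atomic_flag_test_and_set_explicit(&lock->flag, memory_order_acquire)) {
        // Spin
    }
}

void c11_spin_unlock(c11_spinlock_t *lock) {
    atomic_flag_clear_explicit(&lock->flag, memory_order_release);
}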
Compare-and-Swap (CAS)
// Atomic compare-and-swap
// If *addr == expected, set it to new_val and return true
bool compare_and_swap(int *addr, int expected, int new_val) {
    if (*addr == expected) {   // The check and the write
        *addr = new_val;       // happen atomically
        return true;
    }
    return false;
}

// Lock-free counter using CAS
void atomic_increment(int *counter) {
    int old, new_val;
    do {
        old = *counter;
        new_val = old + 1;
    } while (!compare_and_swap(counter, old, new_val));
}
Fetch-and-Add
// Returns the old value, adds to it atomically
int fetch_and_add(int *addr, int value) {
    int old = *addr;
    *addr += value;   // Atomic
    return old;
}

// Ticket lock using fetch-and-add
typedef struct {
    int next_ticket;
    int now_serving;
} ticket_lock_t;

void ticket_lock(ticket_lock_t *lock) {
    int my_ticket = fetch_and_add(&lock->next_ticket, 1);
    while (lock->now_serving != my_ticket) {
        // Spin
    }
}

void ticket_unlock(ticket_lock_t *lock) {
    lock->now_serving++;
}
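For reference, here is the same ticket lock in portable C11 atomics - a sketch, with acquire/release orderings chosen so the critical section is properly fenced (a production version would also add a pause hint in the spin loop):

#include <stdatomic.h>

typedef struct {
    atomic_int next_ticket;
    atomic_int now_serving;
} c11_ticket_lock_t;

void c11_ticket_lock(c11_ticket_lock_t *lock) {
    int my_ticket = atomic_fetch_add(&lock->next_ticket, 1);   // Take a ticket
    // Acquire pairs with the release in unlock
    while (atomic_load_explicit(&lock->now_serving, memory_order_acquire) != my_ticket) {
        // Spin
    }
}

void c11_ticket_unlock(c11_ticket_lock_t *lock) {
    atomic_fetch_add_explicit(&lock->now_serving, 1, memory_order_release);
}

Ticket locks add FIFO fairness: threads enter in arrival order, which a plain TAS spinlock does not guarantee.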
Mutex (Mutual Exclusion Lock)
POSIX Mutex
#include <pthread.h>

pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int shared_data = 0;

void *worker(void *arg) {
    pthread_mutex_lock(&mutex);
    // Critical section - safe to access shared_data
    shared_data++;
    pthread_mutex_unlock(&mutex);
    return NULL;
}
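A minimal driver to see this in action (thread count is an arbitrary choice; compile with -pthread):

#include <stdio.h>

#define NUM_THREADS 4

int main(void) {
    pthread_t threads[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    printf("shared_data = %d\n", shared_data);   // Reliably 4 with the mutex
    return 0;
}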
Mutex Types
pthread_mutexattr_t attr;
pthread_mutexattr_init(&attr);

// Normal: Undefined behavior if the same thread locks twice
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL);

// Recursive: Same thread can lock multiple times
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);

// Error-check: Returns an error on double-lock or self-unlock
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

pthread_mutex_t mutex;
pthread_mutex_init(&mutex, &attr);
pthread_mutexattr_destroy(&attr);   // Attribute object is no longer needed
Spinlock vs Mutex
// Spinlock: Busy wait
void spinlock_acquire() {
    while (test_and_set(&lock)) {
        // Burns CPU cycles
    }
}

// Mutex: Sleep wait
void mutex_acquire() {
    if (try_lock(&lock)) {
        return;
    }
    // Add to wait queue
    // Sleep (yield CPU)
}
| Aspect | Spinlock | Mutex |
|---|---|---|
| Wait mechanism | Busy wait (spin) | Sleep (block) |
| CPU usage | 100% while waiting | ~0% while waiting |
| Context switch | None | Yes (if contended) |
| Best for | Very short critical sections | Longer critical sections |
| SMP consideration | Good on multicore | Good anywhere |
| Hold across blocking calls? | Never - a spinlock holder must not sleep! | Yes - can sleep (blocking I/O) |
Semaphores
A semaphore is a counter with atomic wait and signal operations:
typedef struct {
    int value;
    queue_t waiters;
    pthread_mutex_t lock;
} semaphore_t;

void sem_wait(semaphore_t *sem) {   // Also called P() or down()
    pthread_mutex_lock(&sem->lock);
    sem->value--;
    if (sem->value < 0) {
        // Add to waiters queue
        // Block (releasing the lock atomically)
    }
    pthread_mutex_unlock(&sem->lock);
}

void sem_post(semaphore_t *sem) {   // Also called V() or up()
    pthread_mutex_lock(&sem->lock);
    sem->value++;
    if (sem->value <= 0) {
        // Wake one waiter from the queue
    }
    pthread_mutex_unlock(&sem->lock);
}
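The sketch above is conceptual. Here's a runnable version of the same idea built from a mutex and a condition variable - a sketch only, since POSIX already provides sem_t; the my_sem_* names are mine:

#include <pthread.h>

typedef struct {
    int value;
    pthread_mutex_t lock;
    pthread_cond_t cond;
} my_sem_t;

void my_sem_init(my_sem_t *sem, int value) {
    sem->value = value;
    pthread_mutex_init(&sem->lock, NULL);
    pthread_cond_init(&sem->cond, NULL);
}

void my_sem_wait(my_sem_t *sem) {
    pthread_mutex_lock(&sem->lock);
    while (sem->value == 0) {               // Recheck after every wakeup
        pthread_cond_wait(&sem->cond, &sem->lock);
    }
    sem->value--;
    pthread_mutex_unlock(&sem->lock);
}

void my_sem_post(my_sem_t *sem) {
    pthread_mutex_lock(&sem->lock);
    sem->value++;
    pthread_cond_signal(&sem->cond);        // Wake one waiter
    pthread_mutex_unlock(&sem->lock);
}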
Binary Semaphore (like Mutex)
#include <semaphore.h>

sem_t mutex;
sem_init(&mutex, 0, 1);   // Initial value = 1

void critical_section() {
    sem_wait(&mutex);   // Decrement (0 means locked)
    // ... work ...
    sem_post(&mutex);   // Increment (1 means unlocked)
}
Counting Semaphore (Resource Pool)
#define POOL_SIZE 5
sem_t pool;
sem_init(&pool, 0, POOL_SIZE);

void use_resource() {
    sem_wait(&pool);   // Acquire (value decrements)
    // Use one of the 5 resources
    // Up to 5 threads can be here simultaneously
    sem_post(&pool);   // Release (value increments)
}
Producer-Consumer with Semaphores
#define BUFFER_SIZE 10

int buffer[BUFFER_SIZE];
int in = 0, out = 0;

sem_t empty;   // Counts empty slots
sem_t full;    // Counts filled slots
sem_t mutex;   // Protects buffer access

void init() {
    sem_init(&empty, 0, BUFFER_SIZE);
    sem_init(&full, 0, 0);
    sem_init(&mutex, 0, 1);
}

void *producer(void *arg) {
    while (1) {
        int item = produce_item();
        sem_wait(&empty);   // Wait for an empty slot
        sem_wait(&mutex);   // Enter critical section
        buffer[in] = item;
        in = (in + 1) % BUFFER_SIZE;
        sem_post(&mutex);   // Leave critical section
        sem_post(&full);    // Signal item available
    }
}

void *consumer(void *arg) {
    while (1) {
        sem_wait(&full);    // Wait for an item
        sem_wait(&mutex);   // Enter critical section
        int item = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        sem_post(&mutex);   // Leave critical section
        sem_post(&empty);   // Signal slot available
        consume_item(item);
    }
}
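Note the ordering of the two sem_wait calls: the producer must wait on empty before mutex. If they were swapped, a producer holding mutex while blocked on a full buffer would prevent consumers from ever posting empty - deadlock. A minimal driver, assuming produce_item()/consume_item() are defined elsewhere:

int main(void) {
    pthread_t p, c;
    init();   // Set up the three semaphores
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);   // Both loops run forever in this sketch
    pthread_join(c, NULL);
    return 0;
}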
Condition Variables
Condition variables allow threads to wait for a condition to become true:
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;

void *waiter(void *arg) {
    pthread_mutex_lock(&mutex);
    while (!ready) {   // ALWAYS use while, not if
        pthread_cond_wait(&cond, &mutex);
        // wait() atomically:
        // 1. Releases mutex
        // 2. Sleeps until signaled
        // 3. Re-acquires mutex before returning
    }
    // ready is now true, mutex is held
    pthread_mutex_unlock(&mutex);
    return NULL;
}

void *signaler(void *arg) {
    pthread_mutex_lock(&mutex);
    ready = 1;
    pthread_cond_signal(&cond);   // Wake one waiter
    // pthread_cond_broadcast(&cond);   // Wake ALL waiters
    pthread_mutex_unlock(&mutex);
    return NULL;
}
Why Use while Not if?
// WRONG - proceeds after any wakeup, even a spurious one
if (!ready) {
    pthread_cond_wait(&cond, &mutex);
}
// Might proceed even if ready is still false!

// RIGHT - always re-check the condition
while (!ready) {
    pthread_cond_wait(&cond, &mutex);
}
// Guaranteed ready is true when we continue
Reasons:
Spurious wakeups: pthread_cond_wait may return even though no thread signaled
Multiple waiters: pthread_cond_broadcast wakes every waiter, but only one may find the condition still true
Stolen wakeup: another thread may change the condition between the signal and the waiter re-acquiring the mutex
Bounded Buffer with Condition Variables
#define SIZE 10   // Buffer capacity (arbitrary choice)

typedef struct {
    int buffer[SIZE];
    int count, in, out;
    pthread_mutex_t mutex;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} bounded_buffer_t;

void put(bounded_buffer_t *bb, int item) {
    pthread_mutex_lock(&bb->mutex);
    while (bb->count == SIZE) {   // Buffer full
        pthread_cond_wait(&bb->not_full, &bb->mutex);
    }
    bb->buffer[bb->in] = item;
    bb->in = (bb->in + 1) % SIZE;
    bb->count++;
    pthread_cond_signal(&bb->not_empty);   // Wake a consumer
    pthread_mutex_unlock(&bb->mutex);
}

int get(bounded_buffer_t *bb) {
    pthread_mutex_lock(&bb->mutex);
    while (bb->count == 0) {   // Buffer empty
        pthread_cond_wait(&bb->not_empty, &bb->mutex);
    }
    int item = bb->buffer[bb->out];
    bb->out = (bb->out + 1) % SIZE;
    bb->count--;
    pthread_cond_signal(&bb->not_full);   // Wake a producer
    pthread_mutex_unlock(&bb->mutex);
    return item;
}
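The struct's members must be initialized before first use; a minimal sketch (the bb_init name is mine):

void bb_init(bounded_buffer_t *bb) {
    bb->count = bb->in = bb->out = 0;
    pthread_mutex_init(&bb->mutex, NULL);
    pthread_cond_init(&bb->not_empty, NULL);
    pthread_cond_init(&bb->not_full, NULL);
}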
Memory Ordering & Barriers
Neither the compiler nor the CPU guarantees that memory operations take effect in the order you wrote them.
The Problem: Reordering
Compilers and CPUs reorder instructions to hide latency.

// Thread 1            // Thread 2
x = 1;                 while (!flag);
flag = 1;              print(x);

Question: Can Thread 2 print 0?
Answer: YES! The compiler or CPU may commit flag = 1 before x = 1, since the two stores look independent.
Solution: Memory Barriers (Fences)
A barrier tells both the compiler and the CPU: complete every memory operation before this line before performing any operation after it.
C11 Atomics Memory Order
memory_order_relaxed: No ordering guarantees, just atomicity. Fastest.
memory_order_acquire: (on a load) No later reads/writes may be moved before this load.
memory_order_release: (on a store) No earlier reads/writes may be moved after this store.
memory_order_seq_cst: Sequential consistency - a single global total order. Slowest but safest (the default).
// Correct producer-consumer handoff with acquire-release
atomic_int data, ready;   // Both initially 0

// Producer thread
atomic_store_explicit(&data, 42, memory_order_relaxed);   // A
atomic_store_explicit(&ready, 1, memory_order_release);   // B (A happens-before B)

// Consumer thread
while (!atomic_load_explicit(&ready, memory_order_acquire));      // C (synchronizes-with B)
int result = atomic_load_explicit(&data, memory_order_relaxed);   // D (C happens-before D)
// Guaranteed: result is 42.
Read-Copy-Update (RCU)
A synchronization mechanism used heavily in the Linux Kernel (e.g., routing tables, file descriptors).
The Philosophy
Readers should be blazing fast and wait for no one. Writers do the heavy lifting.
How it Works
Readers: Just read! No locks, no atomic read-modify-writes - near-zero overhead.
Writer:
Read: Read the old data structure.
Copy: Make a copy and modify it.
Update: Atomically swap the global pointer to the new copy.
Wait: Wait for all pre-existing readers to finish (the grace period).
Free: Delete the old copy.
RCU Example Code (Conceptual)
struct config_t *global_config;   // Shared pointer

// Reader: fast, no locks
void read_config() {
    rcu_read_lock();   // Disables preemption in classic kernel RCU (cheap)
    struct config_t *conf = rcu_dereference(global_config);
    printf("Value: %d\n", conf->value);
    rcu_read_unlock();
}

// Writer: expensive
void update_config(int new_val) {
    struct config_t *new_conf = malloc(sizeof(struct config_t));
    struct config_t *old_conf;

    // Copy & update
    *new_conf = *global_config;
    new_conf->value = new_val;

    // Publish the new version (atomic pointer swap)
    old_conf = global_config;
    rcu_assign_pointer(global_config, new_conf);

    // Wait for all readers who saw old_conf to leave
    synchronize_rcu();

    // Safe to free now
    free(old_conf);
}
Use Case: Read-mostly workloads (99% reads, 1% writes).
Trade-off: Writers are slow and can block; memory usage spikes during an update.
Lock-Free Programming
Avoid locks entirely using atomics:
Lock-Free Stack
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *next;
} node_t;

typedef struct {
    _Atomic(node_t *) top;
} lock_free_stack_t;

void push(lock_free_stack_t *stack, int data) {
    node_t *new_node = malloc(sizeof(node_t));
    new_node->data = data;
    node_t *old_top;
    do {
        old_top = atomic_load(&stack->top);
        new_node->next = old_top;
    } while (!atomic_compare_exchange_weak(&stack->top, &old_top, new_node));
}

bool pop(lock_free_stack_t *stack, int *data) {
    node_t *old_top;
    node_t *new_top;
    do {
        old_top = atomic_load(&stack->top);
        if (old_top == NULL) return false;
        new_top = old_top->next;
    } while (!atomic_compare_exchange_weak(&stack->top, &old_top, new_top));
    *data = old_top->data;
    // Can't free old_top yet! (ABA problem - use hazard pointers)
    return true;
}
ABA Problem
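The CAS in pop only checks that top still equals old_top - not that the stack is unchanged. If top went A → B → A between the load and the CAS, the CAS succeeds even though the structure was rebuilt underneath us:

// Thread 1                            Thread 2
// old_top = A, new_top = B (A->next)
//                                     pop A, pop B, free both
//                                     push C (malloc reuses A's address!)
// CAS(top, A, B) SUCCEEDS
// top now points to freed node B ❌

Mitigations: pack a version counter next to the pointer (tagged pointer / double-width CAS), hazard pointers, or epoch-based reclamation - all ways to detect or delay reuse of freed nodes.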
Common Synchronization Problems
Dining Philosophers
#define N 5
pthread_mutex_t forks[N];

void philosopher(int id) {
    while (1) {
        think();
        int left = id;
        int right = (id + 1) % N;
        // Picking up forks in arbitrary order can DEADLOCK
        // Solution: always pick up the lower-numbered fork first
        if (left < right) {
            pthread_mutex_lock(&forks[left]);
            pthread_mutex_lock(&forks[right]);
        } else {
            pthread_mutex_lock(&forks[right]);
            pthread_mutex_lock(&forks[left]);
        }
        eat();
        pthread_mutex_unlock(&forks[left]);
        pthread_mutex_unlock(&forks[right]);
    }
}
Readers-Writers (with writer preference)
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t reader_cond;   // Both readers and writers wait here
    int readers;
    int waiting_writers;
    bool writer_active;
} rw_lock_t;

void read_lock(rw_lock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    while (rw->writer_active || rw->waiting_writers > 0) {   // Writer preference
        pthread_cond_wait(&rw->reader_cond, &rw->mutex);
    }
    rw->readers++;
    pthread_mutex_unlock(&rw->mutex);
}

void read_unlock(rw_lock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    rw->readers--;
    if (rw->readers == 0) {
        // Broadcast, not signal: a signal could wake a blocked reader
        // instead of the waiting writer, stalling everyone
        pthread_cond_broadcast(&rw->reader_cond);
    }
    pthread_mutex_unlock(&rw->mutex);
}
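The writer side is missing above; a sketch that completes it under the same design (writer preference via waiting_writers):

void write_lock(rw_lock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    rw->waiting_writers++;               // Blocks new readers (writer preference)
    while (rw->writer_active || rw->readers > 0) {
        pthread_cond_wait(&rw->reader_cond, &rw->mutex);
    }
    rw->waiting_writers--;
    rw->writer_active = true;
    pthread_mutex_unlock(&rw->mutex);
}

void write_unlock(rw_lock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    rw->writer_active = false;
    pthread_cond_broadcast(&rw->reader_cond);   // Wake waiting readers and writers
    pthread_mutex_unlock(&rw->mutex);
}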
Interview Deep Dive Questions
Q1: Implement a thread-safe singleton
Answer:

// Method 1: Double-checked locking (with proper memory ordering)
#include <stdatomic.h>
#include <stdlib.h>
#include <pthread.h>

typedef struct { int data; } Singleton;

static _Atomic(Singleton *) instance = NULL;
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

Singleton *get_instance() {
    Singleton *p = atomic_load_explicit(&instance, memory_order_acquire);
    if (p == NULL) {
        pthread_mutex_lock(&mutex);
        p = atomic_load_explicit(&instance, memory_order_relaxed);
        if (p == NULL) {
            p = malloc(sizeof(Singleton));
            init_singleton(p);
            atomic_store_explicit(&instance, p, memory_order_release);
        }
        pthread_mutex_unlock(&mutex);
    }
    return p;
}

// Method 2: pthread_once (simpler, recommended)
static Singleton *instance = NULL;
static pthread_once_t once = PTHREAD_ONCE_INIT;

static void init_instance() {
    instance = malloc(sizeof(Singleton));
    init_singleton(instance);
}

Singleton *get_instance() {
    pthread_once(&once, init_instance);
    return instance;
}
Why double-checked locking needs memory ordering:
Without release/acquire, the compiler or CPU may reorder the pointer store before the initialization writes
Another thread could then see a non-NULL pointer to an uninitialized Singleton
Q2: What causes deadlock and how to prevent it?
Answer: Four conditions for deadlock (all must hold):
Mutual Exclusion: The resource can't be shared
Hold and Wait: A thread holds one resource while waiting for another
No Preemption: The resource can't be forcibly taken away
Circular Wait: A waits for B, B waits for A

Prevention strategies:

| Strategy | How | Trade-off |
|---|---|---|
| Break mutual exclusion | Use lock-free data structures | Complex to implement |
| Break hold-and-wait | Acquire all locks at once | Reduces concurrency |
| Break no preemption | Use trylock with timeout | May need retry logic |
| Break circular wait | Lock ordering | Must maintain the order everywhere |
Lock ordering example:

// Always acquire in a consistent order (e.g., by address)
void safe_transfer(account_t *from, account_t *to, int amount) {
    account_t *first = (from < to) ? from : to;
    account_t *second = (from < to) ? to : from;
    pthread_mutex_lock(&first->mutex);
    pthread_mutex_lock(&second->mutex);
    from->balance -= amount;
    to->balance += amount;
    pthread_mutex_unlock(&second->mutex);
    pthread_mutex_unlock(&first->mutex);
}
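The table's "break no preemption" row can be made concrete with pthread_mutex_trylock - back off and retry instead of blocking. A sketch (sched_yield is one of several reasonable backoff choices):

#include <pthread.h>
#include <sched.h>

// Acquire both locks or neither - never hold one while blocking on the other
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    for (;;) {
        pthread_mutex_lock(a);
        if (pthread_mutex_trylock(b) == 0) {
            return;                  // Got both
        }
        pthread_mutex_unlock(a);     // Release and retry (also breaks hold-and-wait)
        sched_yield();               // Back off briefly
    }
}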
Q3: Explain memory ordering and memory barriers
Answer: The problem: CPUs and compilers reorder memory operations for performance.

// Thread 1           // Thread 2
x = 1;                while (!ready);
ready = true;         print(x);   // Might print 0!

Memory orderings (C11 atomics):

| Ordering | Guarantee |
|---|---|
| relaxed | Atomicity only, no ordering |
| acquire | Operations after can't move before |
| release | Operations before can't move after |
| acq_rel | Both acquire and release |
| seq_cst | Total global order (strongest, default) |
Fixed version:

atomic_int x, ready;

// Thread 1
atomic_store_explicit(&x, 1, memory_order_relaxed);
atomic_store_explicit(&ready, 1, memory_order_release);   // Release

// Thread 2
while (!atomic_load_explicit(&ready, memory_order_acquire));   // Acquire
print(atomic_load_explicit(&x, memory_order_relaxed));         // Sees 1
Hardware barriers:
x86: Strong ordering; explicit barriers mostly unnecessary
ARM: Weak ordering; needs explicit barriers (dmb, dsb)
Compiler barrier: asm volatile("" ::: "memory")
Q4: Design a rate limiter
Answer:

#include <pthread.h>
#include <stdbool.h>
#include <time.h>

typedef struct {
    int max_requests;       // Max requests per window
    int window_ms;          // Window size in milliseconds
    int current_count;      // Requests in the current window
    long window_start;      // Window start time
    pthread_mutex_t mutex;
} RateLimiter;

long current_time_ms() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

bool try_acquire(RateLimiter *rl) {
    pthread_mutex_lock(&rl->mutex);
    long now = current_time_ms();
    // Reset the window if it has expired
    if (now - rl->window_start >= rl->window_ms) {
        rl->window_start = now;
        rl->current_count = 0;
    }
    bool allowed = false;
    if (rl->current_count < rl->max_requests) {
        rl->current_count++;
        allowed = true;
    }
    pthread_mutex_unlock(&rl->mutex);
    return allowed;
}
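Usage might look like this (the initializer values are arbitrary examples):

RateLimiter rl = {
    .max_requests = 100,                 // 100 requests...
    .window_ms = 1000,                   // ...per second
    .mutex = PTHREAD_MUTEX_INITIALIZER,
};

void handle_request(void) {
    if (try_acquire(&rl)) {
        // Process the request
    } else {
        // Reject: 429 Too Many Requests
    }
}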
Improvements:
Sliding window: Track request timestamps; more accurate
Token bucket: Allows bursts; smoother rate limiting
Distributed: Use Redis + Lua for atomic operations
Token bucket:

#include <math.h>   // For fmin()

typedef struct {
    double tokens;
    double rate;       // Tokens per second
    double capacity;   // Max tokens
    long last_update;
    pthread_mutex_t mutex;
} TokenBucket;

bool try_acquire_token(TokenBucket *tb) {
    pthread_mutex_lock(&tb->mutex);
    long now = current_time_ms();
    double elapsed = (now - tb->last_update) / 1000.0;
    // Refill tokens based on elapsed time, capped at capacity
    tb->tokens = fmin(tb->capacity, tb->tokens + elapsed * tb->rate);
    tb->last_update = now;
    bool allowed = false;
    if (tb->tokens >= 1.0) {
        tb->tokens -= 1.0;
        allowed = true;
    }
    pthread_mutex_unlock(&tb->mutex);
    return allowed;
}
Q5: When should you use spinlock vs mutex?
Answer: Use spinlock when:
Critical section is very short (< 1 μs)
Running on a multicore system
Can't sleep (interrupt context, kernel code)
Lock contention is low

Use mutex when:
Critical section is longer
May need to sleep while holding the lock
Single-core system (spinning wastes the only CPU)
High contention (sleeping saves CPU)
Adaptive locks (best of both):

void adaptive_lock(lock_t *lock) {
    int spins = 0;
    while (!try_lock(lock)) {
        if (spins++ < SPIN_THRESHOLD) {
            cpu_relax();   // Pause instruction (reduces power)
        } else {
            // Is the lock holder running on another CPU?
            if (is_lock_holder_running(lock)) {
                continue;   // Keep spinning
            }
            // Lock holder is sleeping, so we should sleep too
            sleep_until_lock_available(lock);
            spins = 0;
        }
    }
}
Linux kernel:
spin_lock(): Always spins
mutex_lock(): Hybrid (optimistic spin, then sleep)
rt_mutex: Priority inheritance for real-time
Practice Exercises
Implement Barrier
Create a reusable barrier that N threads wait at before continuing.
Lock-Free Queue
Implement a lock-free MPSC (multi-producer, single-consumer) queue.
RW Lock from Primitives
Build a readers-writer lock using only mutexes and condition variables.
Deadlock Detector
Create a lock wrapper that detects potential deadlocks using lock ordering.
Key Takeaways
Mutex for Most Cases: The default choice. Sleep-waits, works everywhere, simple to use.
While, Not If: Always recheck the condition after cond_wait returns.
Lock Ordering Prevents Deadlock: A consistent order eliminates circular wait.
Lock-Free is Hard: ABA problem, memory ordering. Use only when necessary.
Next: Deadlocks →