Deadlocks

A deadlock is a situation where two or more threads are waiting indefinitely for resources held by each other. Understanding deadlocks is essential for senior engineers — they can bring entire systems down.

Interview Frequency: High
Key Topics: Four conditions, Banker’s algorithm, prevention strategies
Time to Master: 6-8 hours

Deadlock Fundamentals

Classic Deadlock Scenario

The Four Necessary Conditions

Coffman Conditions: ALL four must hold for deadlock to occur. Breaking ANY one prevents deadlock.

1. Mutual Exclusion

Resources can only be held by one thread at a time.Example: A mutex can only be held by one thread.

2. Hold and Wait

Thread holds resources while waiting for others.Example: Thread holds mutex1 while waiting for mutex2.

3. No Preemption

Resources can only be released voluntarily.Example: Can’t force a thread to release its mutex.

4. Circular Wait

Circular chain of threads waiting for each other.Example: A waits for B, B waits for A.

Resource Allocation Graph

Visualize resource allocation to detect deadlocks:

Detection rule:

Single instance per resource: Cycle = Deadlock
Multiple instances: Cycle is necessary but not sufficient

Deadlock Prevention

Break one of the four conditions:

1. Break Mutual Exclusion

// Use lock-free data structures
// Example: Atomic counter instead of mutex-protected counter

#include <stdatomic.h>
atomic_int counter = 0;

void increment() {
    atomic_fetch_add(&counter, 1);  // No mutex needed
}

Limitation: Not all resources can be shared (e.g., printer, write access)

2. Break Hold and Wait

// Acquire ALL locks at once, or none

typedef struct {
    pthread_mutex_t mutex;
    int id;
} resource_t;

bool acquire_all(resource_t *resources[], int count) {
    // Try to lock all
    for (int i = 0; i < count; i++) {
        if (pthread_mutex_trylock(&resources[i]->mutex) != 0) {
            // Failed to acquire one, release all held
            for (int j = 0; j < i; j++) {
                pthread_mutex_unlock(&resources[j]->mutex);
            }
            return false;  // Retry later
        }
    }
    return true;  // Got all resources
}

Limitation: Low resource utilization, may cause starvation

3. Break No Preemption

// If can't get resource, release what you have

bool acquire_with_preemption(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    while (1) {
        pthread_mutex_lock(m1);
        
        if (pthread_mutex_trylock(m2) == 0) {
            return true;  // Got both
        }
        
        // Can't get m2, release m1 (preempt self)
        pthread_mutex_unlock(m1);
        
        // Back off to avoid livelock
        usleep(random() % 1000);
    }
}

Limitation: Only works for restartable resources (not printers, etc.)

4. Break Circular Wait (Lock Ordering)

// Always acquire locks in consistent order

// Define global ordering by resource address
void safe_lock_two(pthread_mutex_t *a, pthread_mutex_t *b) {
    if (a < b) {
        pthread_mutex_lock(a);
        pthread_mutex_lock(b);
    } else {
        pthread_mutex_lock(b);
        pthread_mutex_lock(a);
    }
}

// Or use explicit priority
#define LOCK_PRIORITY_DB 1
#define LOCK_PRIORITY_CACHE 2
#define LOCK_PRIORITY_LOG 3

// Always lock in priority order: DB → Cache → Log

This is the most practical prevention method!

Deadlock Avoidance

Allow the system to deny requests that COULD lead to deadlock.

Safe State

A state is safe if there exists a sequence where all processes can complete:

Banker’s Algorithm

// System state
int n_processes, n_resources;
int available[n_resources];          // Available resources
int max[n_processes][n_resources];   // Maximum demand
int allocation[n_processes][n_resources];  // Currently allocated
int need[n_processes][n_resources];  // max - allocation

bool is_safe_state() {
    bool finish[n_processes] = {false};
    int work[n_resources];
    memcpy(work, available, sizeof(available));
    
    int completed = 0;
    while (completed < n_processes) {
        bool found = false;
        
        for (int i = 0; i < n_processes; i++) {
            if (finish[i]) continue;
            
            // Check if process i can finish with available resources
            bool can_finish = true;
            for (int j = 0; j < n_resources; j++) {
                if (need[i][j] > work[j]) {
                    can_finish = false;
                    break;
                }
            }
            
            if (can_finish) {
                // Simulate process completing
                for (int j = 0; j < n_resources; j++) {
                    work[j] += allocation[i][j];
                }
                finish[i] = true;
                completed++;
                found = true;
            }
        }
        
        if (!found) return false;  // No process can finish
    }
    
    return true;  // Found safe sequence
}

bool request_resources(int process_id, int request[]) {
    // Check if request exceeds need
    for (int j = 0; j < n_resources; j++) {
        if (request[j] > need[process_id][j]) {
            return false;  // Error: exceeds maximum claim
        }
        if (request[j] > available[j]) {
            return false;  // Must wait
        }
    }
    
    // Pretend to allocate
    for (int j = 0; j < n_resources; j++) {
        available[j] -= request[j];
        allocation[process_id][j] += request[j];
        need[process_id][j] -= request[j];
    }
    
    // Check if resulting state is safe
    if (is_safe_state()) {
        return true;  // Grant request
    }
    
    // Not safe, rollback
    for (int j = 0; j < n_resources; j++) {
        available[j] += request[j];
        allocation[process_id][j] -= request[j];
        need[process_id][j] += request[j];
    }
    
    return false;  // Deny request
}

Complexity: O(n² × m) where n = processes, m = resource types

Deadlock Detection

Instead of preventing/avoiding, detect and recover:

Detection Algorithm

// Similar to Banker's safety check
bool detect_deadlock() {
    bool finish[n_processes];
    int work[n_resources];
    
    memcpy(work, available, sizeof(available));
    
    // Initially, only processes with no allocation are "finished"
    for (int i = 0; i < n_processes; i++) {
        bool has_allocation = false;
        for (int j = 0; j < n_resources; j++) {
            if (allocation[i][j] > 0) {
                has_allocation = true;
                break;
            }
        }
        finish[i] = !has_allocation;
    }
    
    // Find processes that can complete
    bool changed;
    do {
        changed = false;
        for (int i = 0; i < n_processes; i++) {
            if (finish[i]) continue;
            
            bool can_finish = true;
            for (int j = 0; j < n_resources; j++) {
                if (request[i][j] > work[j]) {
                    can_finish = false;
                    break;
                }
            }
            
            if (can_finish) {
                finish[i] = true;
                for (int j = 0; j < n_resources; j++) {
                    work[j] += allocation[i][j];
                }
                changed = true;
            }
        }
    } while (changed);
    
    // Any unfinished process is deadlocked
    for (int i = 0; i < n_processes; i++) {
        if (!finish[i]) return true;  // Deadlock detected
    }
    return false;
}

When to Run Detection

Strategy	Frequency	Trade-off
Every request	Immediate detection	High overhead
Periodic (every N seconds)	Balance	May delay detection
When CPU utilization drops	Triggered	Resource dependent

Deadlock Recovery

Once detected, how to break the deadlock:

1. Process Termination

void recover_by_termination() {
    // Option A: Kill all deadlocked processes (drastic)
    for (int i = 0; i < n_processes; i++) {
        if (is_deadlocked(i)) {
            kill_process(i);
        }
    }
    
    // Option B: Kill one at a time until deadlock resolved
    while (detect_deadlock()) {
        int victim = select_victim();  // Based on priority, resources, etc.
        kill_process(victim);
    }
}

int select_victim() {
    // Criteria for victim selection:
    // 1. Priority (kill lowest priority)
    // 2. Resources held (kill one with most)
    // 3. Progress (kill one with least work done)
    // 4. Resources needed (kill one needing most)
    // 5. Interactive vs batch (prefer killing batch)
}

2. Resource Preemption

void recover_by_preemption() {
    // Take resources from some processes, give to others
    
    while (detect_deadlock()) {
        int victim = select_victim();
        
        // Roll back victim to safe state
        rollback_process(victim);
        
        // Reallocate its resources
        for (int j = 0; j < n_resources; j++) {
            available[j] += allocation[victim][j];
            allocation[victim][j] = 0;
        }
    }
}

Challenges:

Rolling back may lose work
Same process may be selected repeatedly (starvation)
Need checkpointing for rollback

Locking Design Playbook

This section is a practical checklist for designing locking in a new subsystem so you avoid deadlocks up front.

Step 1: Identify Resources and Contention Points

List all shared data structures and external resources (sockets, files, devices) that multiple threads touch.
For each, decide:
- Is it mostly read or write heavy?
- How large is the critical section?
- Is ordering important (e.g., parent → child, global → local)?

Step 2: Define a Global Lock Ordering

Assign each lock a total order (e.g., GLOBAL < DB < CACHE < LOG).
Enforce the rule: always acquire locks in increasing order.
Document this in comments and design docs so future contributors follow it.

Step 3: Choose the Right Primitive

Coarse mutex for simple systems with low contention.
Fine-grained locks where contention hotspots are identified.
Reader-writer locks for read-mostly data.
Lock-free / RCU only when the performance benefit justifies the complexity.

Step 4: Add Timeouts and Diagnostics

Prefer trylock + backoff in non-critical paths where waiting forever is unacceptable.
Add lock acquisition logging at debug level for tricky subsystems.
In the kernel, enable tools like lockdep; in user space, integrate watch-dog threads.

Step 5: Simulate and Test

Write stress tests that:
- Start many threads acquiring locks in randomized orders (still following your documented rules).
- Intentionally break the ordering in a test-only build to ensure your detection mechanisms fire.
Use tools like helgrind, TSan, or kernel lock validators where applicable.

A good locking design is boring: clearly ordered, well documented, and easy to reason about. Use this playbook whenever you add a new subsystem or refactor an old one.

Livelock

Threads keep changing state but make no progress:

// LIVELOCK EXAMPLE
void thread_a() {
    while (1) {
        lock(mutex_a);
        if (trylock(mutex_b) == 0) {
            // Got both
            do_work();
            unlock(mutex_b);
            unlock(mutex_a);
            return;
        }
        unlock(mutex_a);  // Release and retry
        // Both threads keep doing this forever!
    }
}

// SOLUTION: Add randomized backoff
void thread_a_fixed() {
    while (1) {
        lock(mutex_a);
        if (trylock(mutex_b) == 0) {
            do_work();
            unlock(mutex_b);
            unlock(mutex_a);
            return;
        }
        unlock(mutex_a);
        usleep(random() % 1000);  // Random backoff
    }
}

Starvation

A thread never gets the resources it needs:

// STARVATION EXAMPLE
// Reader-writer lock with reader preference
// Writers may starve if readers keep coming

// SOLUTION: Fair scheduling
// - Limit how long readers can hold lock
// - Give waiting writers priority after threshold
// - Use fair lock (FIFO ordering)

Priority Inversion

Low-priority thread blocks high-priority thread:

High priority (H):   Waiting for lock held by L
Medium priority (M): Running (doesn't need lock)
Low priority (L):    Holds lock, preempted by M

H is blocked by M (indirectly)!

Solutions:

Priority inheritance: L temporarily gets H’s priority
Priority ceiling: Lock has priority, holder gets it

Interview Deep Dive Questions

Q1: Design a system that's deadlock-free

Answer:For database transactions:

-- 1. Lock ordering by table name/ID
-- Always lock accounts before orders
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
SELECT * FROM orders WHERE account_id = 1 FOR UPDATE;
-- Work...
COMMIT;

-- 2. Lock timeout
SET lock_timeout = '5s';
-- Transaction aborts if can't get lock in 5s

-- 3. Deadlock detection (PostgreSQL does this)
-- Detect cycles, abort youngest transaction

For application locks:

// 1. Lock ordering
// Define consistent order based on resource ID
void transfer(Account *from, Account *to, int amount) {
    Account *first = from->id < to->id ? from : to;
    Account *second = from->id < to->id ? to : from;
    
    lock(&first->mutex);
    lock(&second->mutex);
    // Transfer...
    unlock(&second->mutex);
    unlock(&first->mutex);
}

// 2. Try-lock with timeout
bool acquire_or_abort(mutex_t *locks[], int count) {
    for (int i = 0; i < count; i++) {
        if (!try_lock_timeout(&locks[i], 100)) {
            // Timeout, release all and abort
            for (int j = 0; j < i; j++) {
                unlock(locks[j]);
            }
            return false;
        }
    }
    return true;
}

Q2: Walk through Banker's algorithm with an example

Answer:

Resources: A=10, B=5, C=7

Process  Allocation   Max      Need     
         A  B  C    A  B  C   A  B  C   
P0       0  1  0    7  5  3   7  4  3   
P1       2  0  0    3  2  2   1  2  2   
P2       3  0  2    9  0  2   6  0  0   
P3       2  1  1    2  2  2   0  1  1   
P4       0  0  2    4  3  3   4  3  1   

Available: A=3, B=3, C=2

Finding safe sequence:

1. Check P0: Need (7,4,3) > Available (3,3,2)? YES → Skip
2. Check P1: Need (1,2,2) ≤ Available (3,3,2)? YES
   → P1 can complete
   → Work = (3,3,2) + (2,0,0) = (5,3,2)

3. Check P0: (7,4,3) > (5,3,2)? YES → Skip
4. Check P2: (6,0,0) > (5,3,2)? YES → Skip  
5. Check P3: (0,1,1) ≤ (5,3,2)? YES
   → P3 can complete
   → Work = (5,3,2) + (2,1,1) = (7,4,3)

6. Check P0: (7,4,3) ≤ (7,4,3)? YES
   → P0 can complete
   → Work = (7,4,3) + (0,1,0) = (7,5,3)

7. Check P2: (6,0,0) ≤ (7,5,3)? YES
   → Work = (7,5,3) + (3,0,2) = (10,5,5)

8. Check P4: (4,3,1) ≤ (10,5,5)? YES
   → P4 can complete

Safe sequence: <P1, P3, P0, P2, P4>
System is in safe state!

Request evaluation:

P1 requests (1, 0, 2):

1. Check: Request ≤ Need? (1,0,2) ≤ (1,2,2)? YES
2. Check: Request ≤ Available? (1,0,2) ≤ (3,3,2)? YES
3. Pretend to allocate:
   Allocation[P1] = (2,0,0) + (1,0,2) = (3,0,2)
   Available = (3,3,2) - (1,0,2) = (2,3,0)
   Need[P1] = (1,2,2) - (1,0,2) = (0,2,0)

4. Check if safe: Run safety algorithm...
   → If safe, grant request
   → If not safe, deny and rollback

Q3: How does a database handle deadlocks?

Answer:PostgreSQL approach:

Wait-for graph: Track which transaction waits for which
Deadlock detection: Periodically (every deadlock_timeout, default 1s) check for cycles
Victim selection: Abort transaction that:
- Has done least work (youngest)
- Holds fewest locks
- Is easiest to rollback

-- Example deadlock
-- Session 1                    -- Session 2
BEGIN;                          BEGIN;
UPDATE accounts SET             UPDATE accounts SET
  balance = 100                   balance = 200
  WHERE id = 1;                   WHERE id = 2;

-- Session 1 waits              -- Session 2 waits
UPDATE accounts SET             UPDATE accounts SET
  balance = 150                   balance = 250
  WHERE id = 2;                   WHERE id = 1;

-- DEADLOCK! PostgreSQL detects and aborts one:
-- ERROR: deadlock detected
-- DETAIL: Process 1234 waits for ShareLock on transaction 5678;
--         Process 5678 waits for ShareLock on transaction 1234.
-- HINT: See server log for query details.

Prevention strategies:

-- 1. Lock ordering
-- Always update rows in consistent order (by ID)
SELECT * FROM accounts WHERE id IN (1, 2) 
ORDER BY id FOR UPDATE;

-- 2. Lock timeout
SET lock_timeout = '5s';

-- 3. Use NOWAIT
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- Fails immediately if lock not available

-- 4. Advisory locks for complex operations
SELECT pg_advisory_lock(12345);
-- ... operations ...
SELECT pg_advisory_unlock(12345);

Q4: Explain priority inversion with a real-world example

Answer:Mars Pathfinder (1997):

System had three threads:
- High priority: Communication task
- Medium priority: Data gathering
- Low priority: Meteorological task (held shared bus lock)

What happened:
1. Low-priority task acquires bus lock
2. High-priority task preempts, needs bus lock, blocks
3. Medium-priority task preempts low-priority
4. Medium runs indefinitely
5. Low can't release lock (preempted)
6. High starves, watchdog timer triggers reset

Result: Spacecraft kept resetting!

Solutions implemented:

Priority Inheritance:

void lock_with_inheritance(mutex_t *m) {
    disable_preemption();
    
    if (m->holder != NULL) {
        // Boost holder's priority to ours
        if (current->priority > m->holder->priority) {
            m->holder->effective_priority = current->priority;
        }
    }
    
    // Now wait for lock
    while (!try_lock(m)) {
        sleep_on_mutex(m);
    }
    
    enable_preemption();
}

void unlock_with_inheritance(mutex_t *m) {
    // Restore original priority
    current->effective_priority = current->base_priority;
    m->holder = NULL;
    wake_waiters(m);
}

Priority Ceiling Protocol:

// Each mutex has ceiling = highest priority of any thread that uses it

void lock_with_ceiling(mutex_t *m) {
    // Immediately boost to ceiling priority
    saved_priority = current->priority;
    current->priority = m->ceiling;
    
    acquire_lock(m);
}

void unlock_with_ceiling(mutex_t *m) {
    release_lock(m);
    current->priority = saved_priority;
}

Mars Pathfinder fix: Enabled priority inheritance on VxWorks mutexes via remote patch upload!

Q5: Implement a deadlock detection system

Answer:

// Lock wrapper with deadlock detection

#include <pthread.h>
#include <stdbool.h>

#define MAX_THREADS 100
#define MAX_LOCKS 100

typedef struct {
    pthread_mutex_t internal_mutex;
    int id;
    int holder_tid;  // Thread holding this lock
} tracked_mutex_t;

// Adjacency matrix for wait-for graph
// waits_for[i][j] = true if thread i waits for thread j
bool waits_for[MAX_THREADS][MAX_THREADS];
pthread_mutex_t graph_lock = PTHREAD_MUTEX_INITIALIZER;

bool has_cycle_dfs(int node, bool visited[], bool in_stack[]) {
    visited[node] = true;
    in_stack[node] = true;
    
    for (int i = 0; i < MAX_THREADS; i++) {
        if (waits_for[node][i]) {
            if (!visited[i]) {
                if (has_cycle_dfs(i, visited, in_stack)) {
                    return true;
                }
            } else if (in_stack[i]) {
                return true;  // Back edge = cycle!
            }
        }
    }
    
    in_stack[node] = false;
    return false;
}

bool detect_deadlock() {
    bool visited[MAX_THREADS] = {false};
    bool in_stack[MAX_THREADS] = {false};
    
    for (int i = 0; i < MAX_THREADS; i++) {
        if (!visited[i]) {
            if (has_cycle_dfs(i, visited, in_stack)) {
                return true;
            }
        }
    }
    return false;
}

int tracked_lock(tracked_mutex_t *m) {
    int my_tid = get_thread_id();
    
    // Try to acquire
    if (pthread_mutex_trylock(&m->internal_mutex) == 0) {
        m->holder_tid = my_tid;
        return 0;
    }
    
    // Update wait-for graph
    pthread_mutex_lock(&graph_lock);
    waits_for[my_tid][m->holder_tid] = true;
    
    if (detect_deadlock()) {
        waits_for[my_tid][m->holder_tid] = false;
        pthread_mutex_unlock(&graph_lock);
        return EDEADLK;  // Would deadlock!
    }
    pthread_mutex_unlock(&graph_lock);
    
    // Actually wait
    pthread_mutex_lock(&m->internal_mutex);
    
    // Got lock, update graph
    pthread_mutex_lock(&graph_lock);
    waits_for[my_tid][m->holder_tid] = false;
    m->holder_tid = my_tid;
    pthread_mutex_unlock(&graph_lock);
    
    return 0;
}

Practice Exercises

Banker's Simulator

Implement Banker’s algorithm. Test with various request sequences.

Deadlock Detector

Build a system that wraps pthread_mutex and detects potential deadlocks.

Dining Philosophers

Solve with different strategies: lock ordering, waiter, chandy-misra.

Priority Inheritance

Implement mutex with priority inheritance protocol.

Key Takeaways

Four Conditions

Mutual exclusion, hold-and-wait, no preemption, circular wait. Break any one.

Lock Ordering Works

Most practical prevention. Define consistent order, always follow it.

Detection + Recovery

For complex systems. Run detection periodically, abort victim.

Databases Handle It

Built-in deadlock detection. Abort youngest transaction on cycle.

Next: Inter-Process Communication →

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Deadlocks

​Deadlock Fundamentals

​Classic Deadlock Scenario

​The Four Necessary Conditions

1. Mutual Exclusion

2. Hold and Wait

3. No Preemption

4. Circular Wait

​Resource Allocation Graph

​Deadlock Prevention

​1. Break Mutual Exclusion

​2. Break Hold and Wait

​3. Break No Preemption

​4. Break Circular Wait (Lock Ordering)

​Deadlock Avoidance

​Safe State

​Banker’s Algorithm

​Deadlock Detection

​Detection Algorithm

​When to Run Detection

​Deadlock Recovery

​1. Process Termination

​2. Resource Preemption

​Locking Design Playbook

​Step 1: Identify Resources and Contention Points

​Step 2: Define a Global Lock Ordering

​Step 3: Choose the Right Primitive

​Step 4: Add Timeouts and Diagnostics

​Step 5: Simulate and Test

​Related Problems

​Livelock

​Starvation

​Priority Inversion

​Interview Deep Dive Questions

​Practice Exercises

​Key Takeaways

Four Conditions

Lock Ordering Works

Deadlocks

Deadlock Fundamentals

Classic Deadlock Scenario

The Four Necessary Conditions

Resource Allocation Graph

Deadlock Prevention

1. Break Mutual Exclusion

2. Break Hold and Wait

3. Break No Preemption

4. Break Circular Wait (Lock Ordering)

Deadlock Avoidance

Safe State

Banker’s Algorithm

Deadlock Detection

Detection Algorithm

When to Run Detection

Deadlock Recovery

1. Process Termination

2. Resource Preemption

Locking Design Playbook

Step 1: Identify Resources and Contention Points

Step 2: Define a Global Lock Ordering

Step 3: Choose the Right Primitive

Step 4: Add Timeouts and Diagnostics

Step 5: Simulate and Test

Related Problems

Livelock

Starvation

Priority Inversion

Interview Deep Dive Questions

Practice Exercises

Key Takeaways