Threads & Concurrency
A thread is the smallest unit of CPU execution — a lightweight process that shares memory with other threads in the same process. Understanding threading is critical for senior engineers building high-performance systems.
Interview Frequency: Very High
Key Topics: Threading models, pthreads, thread safety, thread pools
Time to Master: 10-12 hours
Thread vs Process
Process
- Heavy — has own address space
- Expensive to create (fork + COW)
- Isolated — protected from other processes
- Crash is contained
- Communication needs IPC
Thread
- Light — shares address space
- Cheap to create
- Shared memory — easy communication
- One thread crash can kill all
- Direct memory sharing (with care)
What Threads Share (and Don’t)
- Shared between threads in a process: code (text), global and static data, the heap, open file descriptors, signal handlers, working directory
- Private to each thread: stack, registers and program counter, thread ID, errno, signal mask, scheduling priority
Thread Models
User-Level Threads (N:1)
Threads managed entirely in user space by a library:

| Pros | Cons |
|---|---|
| Very fast thread switch (no kernel) | One block blocks all threads |
| No kernel modifications needed | Can’t use multiple CPUs |
| Portable across OSes | Not true parallelism |
Kernel-Level Threads (1:1)
Each user thread maps to one kernel thread:

| Pros | Cons |
|---|---|
| True parallelism on multiple CPUs | Slower thread creation |
| One block doesn’t affect others | Kernel resources per thread |
| Full kernel scheduling features | More context switch overhead |
Hybrid (M:N)
M user threads mapped to N kernel threads:

| Pros | Cons |
|---|---|
| Best of both worlds | Complex implementation |
| Efficient thread switching | Scheduling challenges |
| True parallelism | Priority inversion possible |
POSIX Threads (pthreads)
The standard threading API on Unix systems:
Creating and Joining Threads
gcc -pthread program.c -o program
Thread Attributes
Detached Threads
Detached vs Joinable:
- Joinable (default): Resources held until pthread_join() is called
- Detached: Resources freed immediately on thread exit; can’t retrieve return value
Thread-Local Storage (TLS)
Data that’s private to each thread:
Method 1: pthread_key (Traditional)
Method 2: __thread keyword (GCC/Modern)
Thread Cancellation
Thread Pool Pattern
Creating threads is expensive. Thread pools reuse threads:
Thread Pool Sizing
Modern Concurrency: Coroutines & Green Threads
Go Goroutines (M:N)
- Goroutines are multiplexed onto OS threads (M:N model)
- Initial stack is ~2KB (vs 2MB for pthreads)
- Stack grows dynamically as needed
- Runtime handles scheduling, not kernel
Python asyncio (Cooperative)
- Coroutines run on a single OS thread inside an event loop
- Tasks yield control only at explicit await points (cooperative, not preemptive)
- No parallelism for CPU-bound work; scales well for I/O-bound concurrency
Thread Safety
Identifying Thread-Unsafe Code
Reentrancy vs Thread Safety
| Property | Reentrant | Thread-Safe |
|---|---|---|
| Multiple threads | Safe | Safe |
| Interrupted and re-entered | Safe | Not necessarily |
| Uses global/static state | No | Maybe (with locks) |
| Calls non-reentrant functions | No | Maybe |
| Uses locks | Usually no | Often yes |
Reentrant versions: strtok_r, localtime_r, rand_r
Non-reentrant: strtok, localtime, rand
Interview Deep Dive Questions
Q1: Why are threads lighter than processes?
Answer:
- Memory: Threads share address space (code, data, heap), only need private stack and registers
- Creation: No page table copy, no memory mapping setup
- Context switch: No TLB flush (same address space), cache stays warm for shared data
- Communication: Direct memory access vs IPC mechanisms
Rough orders of magnitude:
- Process creation: ~10,000 CPU cycles
- Thread creation: ~1,000 CPU cycles
- Thread stack: 2-8 MB default
- Process overhead: PCB + page tables + mappings
Q2: Explain Go's goroutine scheduling
Answer: Go uses the GMP model:
- G (Goroutine): Lightweight user-space thread
- M (Machine): OS thread
- P (Processor): Logical processor context (GOMAXPROCS)
How scheduling works:
- P holds a run queue of Gs
- M takes P to execute Gs
- When G blocks on syscall, M releases P for another M
- Work stealing: idle P steals Gs from busy P’s queue
Why it scales:
- Thousands of goroutines on few OS threads
- Low creation cost (~2KB initial stack)
- Asynchronously preemptive since Go 1.14 (earlier versions could only preempt at function calls)
- Efficient I/O multiplexing with netpoller
Q3: How would you size a thread pool for a web server?
Answer:
Analysis:
- Profile typical request: CPU time (S) and I/O wait time (W)
- Determine request characteristics
Sizing formula:
N = N_CPU * (1 + W/S)
Example:
- 8 CPUs
- Average request: 10ms CPU, 90ms I/O (database)
- N = 8 * (1 + 90/10) = 8 * 10 = 80 threads
Practical considerations:
- Monitor queue length and response times
- Use separate pools for CPU-bound vs I/O-bound
- Consider connection limits (database connections)
- Add bounded queue to handle bursts
Q4: What problems can occur with thread cancellation?
Answer:
Problems:
- Resource leaks: Thread holding malloc’d memory
- Lock abandonment: Thread holding mutex when canceled
- Inconsistent state: Canceled mid-transaction
- Deadlock: Other threads waiting on canceled thread’s lock
Mitigations:
- Cleanup handlers: pthread_cleanup_push()/pthread_cleanup_pop() release locks and memory on cancel
- Cancellation points: Define where cancellation can happen
- Defer cancellation: PTHREAD_CANCEL_DEFERRED (the default) restricts cancellation to cancellation points
- Better design: Use cooperative shutdown with a flag the thread checks
Q5: How do you debug a multi-threaded race condition?
Answer:
Detection tools:
- ThreadSanitizer (TSan): gcc -fsanitize=thread
- Helgrind: Part of the Valgrind suite
- Intel Inspector: Commercial tool
Manual techniques:
- Stress testing: Run with many threads and iterations
- Add sleeps: Widen timing windows to make the race reproducible
- Logging: Track thread IDs and operations
- Core dumps: Analyze thread states
Prevention:
- Minimize shared state
- Use immutable data
- Lock ordering: Prevent deadlocks
- Code review: Focus on shared access
Practice Exercises
1. Producer-Consumer: Implement a thread-safe bounded buffer with multiple producers and consumers using pthreads.
2. Thread Pool: Build a thread pool with dynamic sizing based on queue length.
3. Parallel Matrix Multiply: Parallelize matrix multiplication with proper thread count selection.
4. Read-Write Lock: Implement a readers-writer lock from mutexes and condition variables.
Key Takeaways
1:1 is Standard
Modern Linux/Windows use 1:1 model. Goroutines/Erlang use M:N for scale.
Thread Pool Always
Create pools, not individual threads. Size based on workload profile.
TLS for Thread State
Use thread-local storage for per-thread data, not globals.
Cancellation is Dangerous
Prefer cooperative shutdown with flags over pthread_cancel.