Kernel Synchronization

The Linux kernel is massively concurrent - multiple CPUs executing kernel code simultaneously, interrupts preempting at any moment, and softirqs running in parallel. Understanding synchronization primitives is essential for reading kernel code and debugging race conditions.
Interview Frequency: Very High (critical for systems roles)
Key Topics: Spinlocks, mutexes, RCU, memory barriers, deadlock debugging
Time to Master: 14-16 hours

Why Kernel Synchronization is Different

User-space synchronization is simpler because:
  • Processes are preemptible (scheduler handles it)
  • Only threads of the same process share address space
  • You can always sleep when waiting for a lock
Kernel synchronization is harder because:
  • Interrupt context: Cannot sleep, must use spinlocks
  • Preemption: Kernel code can be preempted (on PREEMPT kernels)
  • Multiple CPUs: True parallelism, not just concurrency
  • Diverse contexts: Process context, softirq, hardirq - different rules for each

Synchronization Contexts

┌─────────────────────────────────────────────────────────────────────────────┐
│                   KERNEL EXECUTION CONTEXTS                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Context         Can Sleep?    Preemptible?    Lock Types Allowed           │
│  ────────────────────────────────────────────────────────────────────────   │
│                                                                              │
│  Process         Yes           Yes             mutex, semaphore,             │
│  Context         (GFP_KERNEL)  (CONFIG_       spinlock, rwlock              │
│                                 PREEMPT)                                    │
│                                                                              │
│  Softirq/        No            No              spinlock (with               │
│  Tasklet                                       IRQ disabling if             │
│                                                sharing with hardirq)        │
│                                                                              │
│  Hardirq         No            No              spin_lock_irqsave            │
│                  (atomic)                      only                         │
│                                                                              │
│  NMI             No            No              raw_spinlock,                │
│                  (super                        trylock only                 │
│                   atomic)                                                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
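
If you are not sure which context a piece of code runs in, the kernel exposes predicates built on the preempt count. A minimal sketch, with a made-up report_context() helper (in_hardirq() is the newer spelling of in_irq()):

#include <linux/preempt.h>
#include <linux/interrupt.h>
#include <linux/printk.h>

// Hypothetical helper: report which execution context we are currently in
static void report_context(void)
{
    if (in_nmi())
        pr_info("NMI context: trylock only, never sleep\n");
    else if (in_hardirq())
        pr_info("hardirq context: spinlocks only, never sleep\n");
    else if (in_serving_softirq())
        pr_info("softirq/tasklet context: spinlocks only, never sleep\n");
    else if (in_task())
        pr_info("process context: mutexes and sleeping allocations allowed\n");
}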

Atomic Operations

The foundation of all synchronization - operations that complete as a single indivisible step, even with other CPUs accessing the same data:

Atomic Types

#include <linux/atomic.h>

// Basic atomic integer
atomic_t counter = ATOMIC_INIT(0);

// 64-bit atomic
atomic64_t big_counter = ATOMIC64_INIT(0);

// Atomic operations
atomic_read(&counter);           // Read value
atomic_set(&counter, 5);         // Set value
atomic_add(10, &counter);        // Add
atomic_sub(3, &counter);         // Subtract
atomic_inc(&counter);            // Increment
atomic_dec(&counter);            // Decrement

// Atomic read-modify-write
int old = atomic_fetch_add(5, &counter);   // Returns old value
old = atomic_xchg(&counter, 10);           // Exchange, returns old value

// Atomic test operations
atomic_dec_and_test(&counter);   // Returns true if result is 0
atomic_add_negative(n, &counter); // Returns true if result is negative

// Compare and swap
atomic_cmpxchg(&counter, old, new);  // If counter==old, set to new
atomic_try_cmpxchg(&counter, &old, new);  // Returns bool

When to Use Atomic Operations

// Good: Simple counter
static atomic_t connection_count = ATOMIC_INIT(0);

void on_connect(void) {
    atomic_inc(&connection_count);
}

void on_disconnect(void) {
    atomic_dec(&connection_count);
}

// Bad: Compound operation (not atomic!)
if (atomic_read(&counter) == 0) {  // RACE!
    atomic_inc(&counter);          // Another CPU could change it
}

// Good: Use atomic primitive
atomic_cmpxchg(&counter, 0, 1);  // Atomic "compare and set if zero"
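
When the compound update is more than "set if zero", the usual idiom is a cmpxchg retry loop. A minimal sketch, assuming a kernel recent enough to have atomic_try_cmpxchg() (the inc_below_limit() helper is made up):

#include <linux/atomic.h>
#include <linux/types.h>

// Increment v, but refuse to go past limit - performed as one atomic step
static bool inc_below_limit(atomic_t *v, int limit)
{
    int old = atomic_read(v);

    do {
        if (old >= limit)
            return false;               // already at the cap, do not increment
        // on failure, atomic_try_cmpxchg() refreshes 'old' with the current value
    } while (!atomic_try_cmpxchg(v, &old, old + 1));

    return true;
}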

Spinlocks

The most basic lock - a waiting CPU spins in a tight loop until the holder releases the lock:

Basic Spinlock

#include <linux/spinlock.h>

// Static initialization
static DEFINE_SPINLOCK(my_lock);

// Dynamic initialization
spinlock_t my_lock;
spin_lock_init(&my_lock);

// Basic usage
spin_lock(&my_lock);
// ... critical section ...
spin_unlock(&my_lock);

// With trylock (non-blocking)
if (spin_trylock(&my_lock)) {
    // ... critical section ...
    spin_unlock(&my_lock);
} else {
    // Lock not available, handle gracefully
}

Spinlock Variants

// Basic (use in process context when not sharing with interrupts)
spin_lock(&lock);
spin_unlock(&lock);

// Disable softirqs (use when sharing with softirq context)
spin_lock_bh(&lock);      // Disable bottom halves
spin_unlock_bh(&lock);    // Re-enable bottom halves

// Disable all interrupts (use when sharing with hardirq context)
unsigned long flags;
spin_lock_irqsave(&lock, flags);     // Save IRQ state, disable IRQs
// ... critical section ...
spin_unlock_irqrestore(&lock, flags); // Restore IRQ state

// Disable interrupts (unconditionally)
spin_lock_irq(&lock);     // Assumes IRQs were enabled
spin_unlock_irq(&lock);   // Re-enables IRQs

// Warning: Don't nest irq variants wrong!
spin_lock_irq(&lock1);
spin_lock_irq(&lock2);    // WRONG! First unlock will re-enable IRQs
spin_unlock_irq(&lock2);  // IRQs now enabled, but lock1 still held!
spin_unlock_irq(&lock1);

// Correct: Use irqsave for inner locks
spin_lock_irq(&lock1);
spin_lock(&lock2);        // IRQs already disabled
spin_unlock(&lock2);
spin_unlock_irq(&lock1);

Spinlock Selection Guide

┌─────────────────────────────────────────────────────────────────────────────┐
│                   WHICH SPINLOCK VARIANT TO USE?                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Q: Is the lock ever acquired in hardirq context?                           │
│  │                                                                           │
│  ├─ YES → Always use spin_lock_irqsave() / spin_unlock_irqrestore()         │
│  │        (Even from process context - must disable IRQs to prevent         │
│  │         deadlock if interrupt tries to acquire same lock)                │
│  │                                                                           │
│  └─ NO ──► Is the lock ever acquired in softirq/tasklet context?            │
│            │                                                                 │
│            ├─ YES → Use spin_lock_bh() / spin_unlock_bh() from process ctx  │
│            │        Use spin_lock() / spin_unlock() from softirq ctx        │
│            │                                                                 │
│            └─ NO ──► Use spin_lock() / spin_unlock()                        │
│                      (But consider if a mutex would be better)              │
│                                                                              │
│  DANGER: Never sleep while holding a spinlock!                               │
│  Waiting CPUs spin with preemption off, so a sleeping holder can deadlock.  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
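
As a concrete illustration of the first branch, here is a hedged sketch of one lock shared between process context and a hardirq handler (the device fields and handler name are made up):

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(dev_lock);
static int dev_pending;   // shared between process context and the IRQ handler

// Process context: must disable local IRQs, otherwise the handler below
// could interrupt us while we hold dev_lock and deadlock trying to take it
void submit_request(void)
{
    unsigned long flags;

    spin_lock_irqsave(&dev_lock, flags);
    dev_pending++;
    spin_unlock_irqrestore(&dev_lock, flags);
}

// Hardirq context: this IRQ line is already masked locally,
// so a plain spin_lock() is sufficient here
static irqreturn_t dev_irq_handler(int irq, void *dev_id)
{
    spin_lock(&dev_lock);
    dev_pending--;
    spin_unlock(&dev_lock);
    return IRQ_HANDLED;
}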

Reader-Writer Locks

When reads are much more common than writes:

Read-Write Spinlocks

#include <linux/spinlock.h>   // provides the rwlock API (<linux/rwlock.h> is not meant to be included directly)

static DEFINE_RWLOCK(my_rwlock);

// Reader (shared access)
read_lock(&my_rwlock);
// ... read data ...
read_unlock(&my_rwlock);

// Writer (exclusive access)
write_lock(&my_rwlock);
// ... modify data ...
write_unlock(&my_rwlock);

// IRQ-safe variants
unsigned long flags;
read_lock_irqsave(&my_rwlock, flags);
write_lock_irqsave(&my_rwlock, flags);

Seqlock (Sequence Lock)

For data with very frequent reads and rare writes:
#include <linux/seqlock.h>

static DEFINE_SEQLOCK(my_seqlock);

// Writer (takes exclusive lock)
write_seqlock(&my_seqlock);
// ... modify data ...
write_sequnlock(&my_seqlock);

// Reader (lockless, may need to retry)
unsigned int seq;
do {
    seq = read_seqbegin(&my_seqlock);
    // ... read data into local variables ...
} while (read_seqretry(&my_seqlock, seq));

// If read_seqretry returns true, a write occurred during read
// and we must retry
When to use seqlock:
  • Data is read very frequently
  • Writes are rare
  • Readers can tolerate occasional retries
  • Example: System time (jiffies)

Mutexes

For when you can sleep - more efficient than spinlocks for long waits:
#include <linux/mutex.h>

// Static initialization
static DEFINE_MUTEX(my_mutex);

// Dynamic initialization
struct mutex my_mutex;
mutex_init(&my_mutex);

// Basic usage
mutex_lock(&my_mutex);           // Sleep if locked
// ... critical section (can sleep here!) ...
mutex_unlock(&my_mutex);

// Interruptible (can be interrupted by signals)
if (mutex_lock_interruptible(&my_mutex)) {
    return -ERESTARTSYS;  // Signal received
}
// ... critical section ...
mutex_unlock(&my_mutex);

// Killable (only SIGKILL can interrupt)
if (mutex_lock_killable(&my_mutex)) {
    return -ERESTARTSYS;
}

// Trylock (non-blocking)
if (mutex_trylock(&my_mutex)) {
    // ... got the lock ...
    mutex_unlock(&my_mutex);
} else {
    // Lock not available
}

Mutex Rules

  1. Only the owner can unlock: Unlike spinlocks, mutexes track ownership
  2. Cannot be used in interrupt context: They might sleep
  3. Cannot be acquired while holding a spinlock: the spinlock holder must not sleep
  4. Recursive locking forbidden: Will deadlock
// WRONG: Recursive locking
mutex_lock(&my_mutex);
foo();  // foo() also calls mutex_lock(&my_mutex) → DEADLOCK!
mutex_unlock(&my_mutex);

// WRONG: Holding mutex in hardirq
irqreturn_t my_irq_handler(int irq, void *dev_id) {
    mutex_lock(&my_mutex);  // BUG: Cannot sleep in IRQ context!
    // ...
}

Semaphores

Counting semaphores for limiting concurrency:
#include <linux/semaphore.h>

// Static initialization (two-argument DEFINE_SEMAPHORE on newer kernels)
static DEFINE_SEMAPHORE(my_sem, 3);  // Allow 3 concurrent accessors

// Dynamic initialization
struct semaphore my_sem;
sema_init(&my_sem, 3);

// Acquire (decrement)
down(&my_sem);              // Sleep if count is 0
down_interruptible(&my_sem); // Can be interrupted
down_trylock(&my_sem);      // Non-blocking, returns 0 if acquired

// Release (increment)
up(&my_sem);
Use cases:
  • Limiting concurrent I/O operations (sketched below)
  • Resource pools (e.g., connection pool)
  • Producer-consumer with bounded buffer
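
A minimal sketch of the first use case, bounding the number of in-flight I/O operations (do_io() and the count are illustrative; the two-argument DEFINE_SEMAPHORE needs a recent kernel, otherwise use sema_init()):

#include <linux/semaphore.h>
#include <linux/errno.h>

#define MAX_INFLIGHT 3

static DEFINE_SEMAPHORE(io_sem, MAX_INFLIGHT);  // at most 3 callers inside do_io()

static void do_io(void);    // hypothetical helper that performs one I/O operation

int submit_io(void)
{
    // Sleep until a slot is free; bail out cleanly if a signal arrives
    if (down_interruptible(&io_sem))
        return -ERESTARTSYS;

    do_io();                // at most MAX_INFLIGHT of these run concurrently

    up(&io_sem);
    return 0;
}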

RCU (Read-Copy-Update)

RCU is Linux’s secret weapon for read-mostly data structures. It allows readers to access data without locks while writers make copies and update pointers atomically.

RCU Concept

┌─────────────────────────────────────────────────────────────────────────────┐
│                        RCU MECHANISM                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Traditional Locking:                                                        │
│  ───────────────────                                                        │
│  Reader 1: [====lock====][read][====unlock====]                             │
│  Reader 2:                    [wait...][====lock====][read][unlock]         │
│  Writer:   [wait..................][lock][write][unlock]                    │
│                                                                              │
│  RCU:                                                                        │
│  ─────                                                                       │
│  Reader 1: [rcu_read_lock][read old data][rcu_read_unlock]                  │
│  Reader 2:     [rcu_read_lock][read][rcu_read_unlock]                       │
│  Writer:   [copy][update copy][publish ptr][wait for readers][free old]    │
│            │                  │            │                                 │
│            │                  │            └─ synchronize_rcu()             │
│            │                  └─ rcu_assign_pointer()                       │
│            └─ kmalloc + copy data                                           │
│                                                                              │
│  Key Insight:                                                                │
│  • Readers never wait                                                       │
│  • Writers copy, update pointer atomically, wait, then free old            │
│  • "Grace period" = time for all pre-existing readers to finish            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

RCU API

#include <linux/rcupdate.h>

// Reader-side
rcu_read_lock();
struct my_data *ptr = rcu_dereference(global_ptr);
// ... use ptr (do not dereference after rcu_read_unlock) ...
rcu_read_unlock();

// Writer-side (update); 'old' is the pointer currently published in global_ptr
struct my_data *old = rcu_dereference_protected(global_ptr, 1);  // update side, writer lock assumed held
struct my_data *new = kmalloc(sizeof(*new), GFP_KERNEL);
*new = *old;                    // Copy old data
new->field = new_value;         // Modify
rcu_assign_pointer(global_ptr, new);  // Atomic publish
synchronize_rcu();              // Wait for readers
kfree(old);                     // Safe to free now

// Asynchronous free (don't wait)
call_rcu(&old->rcu_head, my_free_callback);

RCU-Protected List

#include <linux/rculist.h>

struct my_entry {
    int data;
    struct list_head list;
    struct rcu_head rcu;  // For call_rcu
};

static LIST_HEAD(my_list);
static DEFINE_SPINLOCK(list_lock);  // Protects writers

// Reader (lockless)
void read_list(void)
{
    struct my_entry *entry;
    
    rcu_read_lock();
    list_for_each_entry_rcu(entry, &my_list, list) {
        process(entry->data);
    }
    rcu_read_unlock();
}

// Writer (needs lock for mutual exclusion with other writers)
void add_entry(int data)
{
    struct my_entry *new = kmalloc(sizeof(*new), GFP_KERNEL);
    if (!new)
        return;
    new->data = data;
    
    spin_lock(&list_lock);
    list_add_rcu(&new->list, &my_list);
    spin_unlock(&list_lock);
}

// Delete with RCU
static void entry_free_rcu(struct rcu_head *head)
{
    struct my_entry *entry = container_of(head, struct my_entry, rcu);
    kfree(entry);
}

void remove_entry(struct my_entry *entry)
{
    spin_lock(&list_lock);
    list_del_rcu(&entry->list);
    spin_unlock(&list_lock);
    
    call_rcu(&entry->rcu, entry_free_rcu);  // Free after grace period
}

When to Use RCU

Use Case                  Good for RCU?   Why
──────────────────────────────────────────────────────────────────────
Read-mostly data          ✅ Yes          Readers are lockless
Frequently updated data   ❌ No           Writers block waiting for grace period
Large data structures     ⚠️ Depends      Copying cost may be high
Pointer-based updates     ✅ Yes          Single pointer update is atomic
Field-by-field updates    ❌ No           Need copy-on-write for consistency

Memory Barriers

Compilers and CPUs reorder memory operations for performance. Barriers prevent unwanted reordering:

Why Memory Barriers?

// Without barriers, this might be reordered!
int ready = 0;
int data = 0;

// CPU 1 (writer)
data = 42;          // Might execute after ready=1!
ready = 1;

// CPU 2 (reader)
while (!ready);     // Might read ready=1 but...
print(data);        // ...data still shows 0!

// With barriers
// CPU 1 (writer)
data = 42;
smp_wmb();          // Write barrier: data visible before ready
ready = 1;

// CPU 2 (reader)
while (!ready);
smp_rmb();          // Read barrier: read data after ready
print(data);        // Now guaranteed to see 42

Barrier Types

#include <asm/barrier.h>   // barrier primitives; READ_ONCE/WRITE_ONCE come from <linux/compiler.h>

// Full memory barrier (orders both reads and writes)
mb();           // Also orders accesses to device (I/O) memory
smp_mb();       // Full barrier only on SMP; reduces to a compiler barrier on UP

// Write barrier (writes before cannot pass writes after)
wmb();
smp_wmb();

// Read barrier (reads before cannot pass reads after)
rmb();
smp_rmb();

// Acquire/Release semantics (lighter than full barriers)
smp_load_acquire(&variable);          // No loads/stores after this can be reordered before it
smp_store_release(&variable, value);  // No loads/stores before this can be reordered after it

// Compiler-only barrier (doesn't affect CPU)
barrier();      // Prevents compiler reordering
READ_ONCE(x);   // Prevent compiler from caching/reordering read
WRITE_ONCE(x, val);  // Prevent compiler from tearing/reordering write

Common Barrier Patterns

// Pattern 1: Flag-guarded data
struct {
    int data;
    int ready;  // Flag
} shared;

// Writer
WRITE_ONCE(shared.data, 42);
smp_wmb();                      // Ensure data written before ready
WRITE_ONCE(shared.ready, 1);

// Reader
while (!READ_ONCE(shared.ready))
    cpu_relax();
smp_rmb();                      // Ensure ready read before data
int value = READ_ONCE(shared.data);

// Pattern 2: Ring buffer
// Producer
buffer[head] = item;
smp_wmb();                      // Item visible before head update
WRITE_ONCE(head, next_head);

// Consumer
while (READ_ONCE(head) == tail)
    cpu_relax();
smp_rmb();                      // Head visible before reading item
item = buffer[tail];

Per-CPU Variables

Avoid locking entirely by giving each CPU its own copy:
#include <linux/percpu.h>

// Static per-CPU variable
static DEFINE_PER_CPU(int, my_counter);

// Dynamic allocation
int __percpu *my_ptr;
my_ptr = alloc_percpu(int);

// Access (disable preemption so we stay on the same CPU for the whole access)
void increment_counter(void)
{
    preempt_disable();             // Or get_cpu()
    __this_cpu_inc(my_counter);    // Increment this CPU's copy (preemption already off)
    preempt_enable();              // Or put_cpu()
}

// Alternative: get_cpu() returns CPU number and disables preemption
void access_percpu(void)
{
    int cpu = get_cpu();
    int *ptr = per_cpu_ptr(&my_counter, cpu);
    (*ptr)++;
    put_cpu();
}

// Read all CPUs (for statistics)
int total = 0;
int cpu;
for_each_possible_cpu(cpu) {
    total += per_cpu(my_counter, cpu);
}

// Atomic per-CPU operations (no preemption disable needed)
this_cpu_add(my_counter, 5);
this_cpu_read(my_counter);
this_cpu_write(my_counter, 0);

Deadlock Prevention and Lockdep

Lock Ordering Rules

┌─────────────────────────────────────────────────────────────────────────────┐
│                       DEADLOCK PREVENTION                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Classic Deadlock:                                                           │
│  ─────────────────                                                          │
│  CPU 0:                      CPU 1:                                         │
│  lock(A)                     lock(B)                                        │
│  lock(B)  ← waits for B      lock(A)  ← waits for A                        │
│  ...                         ...                                            │
│  unlock(B)                   unlock(A)                                      │
│  unlock(A)                   unlock(B)                                      │
│                                                                              │
│  Both CPUs wait forever!                                                     │
│                                                                              │
│  Prevention: Consistent Lock Ordering                                        │
│  ─────────────────────────────────────                                      │
│  Rule: Always acquire locks in the same order                               │
│                                                                              │
│  CPU 0:                      CPU 1:                                         │
│  lock(A)                     lock(A)  ← waits for A                        │
│  lock(B)                     ...      ← eventually gets A                   │
│  ...                         lock(B)                                        │
│  unlock(B)                   ...                                            │
│  unlock(A)                   unlock(B)                                      │
│                              unlock(A)                                      │
│                                                                              │
│  Document your lock ordering!                                                │
│  Example: i_mutex → dentry->d_lock → dcache_lock                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
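
When the two locks are instances of the same class (so there is no natural "outer" and "inner"), a common trick is to order them by address. A hedged sketch with a made-up account structure (it assumes a != b):

#include <linux/spinlock.h>

struct account {
    spinlock_t lock;
    long balance;
};

// Always lock the lower-addressed account first, so two concurrent
// transfers can never take the pair in opposite orders
void transfer(struct account *a, struct account *b, long amount)
{
    struct account *first  = a < b ? a : b;
    struct account *second = a < b ? b : a;

    spin_lock(&first->lock);
    spin_lock_nested(&second->lock, SINGLE_DEPTH_NESTING);  // tell lockdep the nesting is intentional

    a->balance -= amount;
    b->balance += amount;

    spin_unlock(&second->lock);
    spin_unlock(&first->lock);
}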

Lockdep: Lock Validator

Linux has a runtime lock validator that catches potential deadlocks:
# Enable in kernel config
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCK_STAT=y

# Check lockdep warnings in dmesg
dmesg | grep -i "lockdep\|deadlock\|held lock"

Lockdep Output Example

=============================================
WARNING: possible recursive locking detected
---------------------------------------------
kworker/0:1/1234 is trying to acquire lock:
 (&my_lock){+.+.}, at: my_function+0x20/0x100

but task is already holding lock:
 (&my_lock){+.+.}, at: my_function+0x10/0x100

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&my_lock);
  lock(&my_lock);   <-- Recursive!

 *** DEADLOCK ***

Lock Annotations

// Annotate lock ordering for lockdep
static DEFINE_MUTEX(outer_lock);
static DEFINE_MUTEX(inner_lock);

void my_function(void)
{
    mutex_lock(&outer_lock);
    mutex_lock_nested(&inner_lock, SINGLE_DEPTH_NESTING);  // Tell lockdep this is intentional
    // ...
    mutex_unlock(&inner_lock);
    mutex_unlock(&outer_lock);
}

// Lock class annotation (for locks that seem same but aren't)
static struct lock_class_key my_lock_key;
__mutex_init(&my_lock, "my_lock", &my_lock_key);

Lock Statistics

# Enable lock statistics
echo 1 > /proc/sys/kernel/lock_stat

# View lock contention
cat /proc/lock_stat

# Sample output:
# class name                 con-bounces    contentions   waittime-min   ...
# &mm->mmap_lock-W:            123456         12345          0.50
# &inode->i_lock:               98765          9876          0.10

Complete Example: Thread-Safe Data Structure

#include <linux/spinlock.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_data {
    int value;
    struct rcu_head rcu;
};

struct my_container {
    struct my_data __rcu *cur;       // RCU-protected pointer ("current" is a kernel macro, so avoid it as a name)
    spinlock_t update_lock;          // Writer mutual exclusion
};

// Initialize
void my_container_init(struct my_container *c)
{
    struct my_data *data = kzalloc(sizeof(*data), GFP_KERNEL);
    spin_lock_init(&c->update_lock);
    rcu_assign_pointer(c->cur, data);
}

// Read (lockless, can be called from any context including IRQ)
int my_container_read(struct my_container *c)
{
    int value;
    
    rcu_read_lock();
    struct my_data *data = rcu_dereference(c->cur);
    value = data->value;
    rcu_read_unlock();
    
    return value;
}

// Update (process context only, might sleep)
void my_container_update(struct my_container *c, int new_value)
{
    struct my_data *old, *new;
    
    new = kmalloc(sizeof(*new), GFP_KERNEL);
    if (!new)
        return;
    
    spin_lock(&c->update_lock);
    
    old = rcu_dereference_protected(c->cur,
                                    lockdep_is_held(&c->update_lock));
    new->value = new_value;
    rcu_assign_pointer(c->cur, new);
    
    spin_unlock(&c->update_lock);
    
    synchronize_rcu();  // Wait for readers
    kfree(old);
}

Interview Questions

Q: When would you use a spinlock versus a mutex?

Answer: Use spinlock when:
  • Critical section is very short (microseconds)
  • Code might run in interrupt context
  • Cannot sleep (holding other spinlocks)
  • Lock contention is low
Use mutex when:
  • Critical section might be long
  • Code can sleep (e.g., waiting for I/O)
  • Always in process context
  • Better for longer waits (sleeping is more efficient than spinning)
Key difference: Spinlocks busy-wait (waste CPU), mutexes sleep (yield CPU). Spinlocks are faster for very short holds; mutexes are better for anything longer.

Q: How does RCU work, and when would you use it?

Answer: RCU (Read-Copy-Update) provides lockless reads for read-mostly data.
How it works:
  1. Readers use rcu_read_lock() - just disables preemption, no actual lock
  2. Writers copy data, modify copy, atomically update pointer
  3. Writers wait for grace period (all pre-existing readers finish)
  4. Writers free old data
Use when:
  • Reads vastly outnumber writes
  • Can afford to copy data on write
  • Updates are pointer-based (not field-by-field)
Examples: Routing tables, configuration data, module lists
Advantage: Zero overhead for readers on the fast path

Q: What conditions cause a deadlock, and how do you prevent it?

Answer: Four conditions must all hold for a deadlock:
  1. Mutual exclusion: Resource can only be held by one thread
  2. Hold and wait: Thread holds one lock while waiting for another
  3. No preemption: Locks cannot be forcibly taken
  4. Circular wait: A→B→C→A dependency chain
Prevention strategies:
  1. Lock ordering: Always acquire locks in the same global order
  2. Lock hierarchy: Document and enforce lock levels
  3. Try-lock with backoff: use trylock; if it fails, release the locks you hold and retry (sketched below)
  4. Avoid holding locks while calling unknown code
Detection: Use lockdep (CONFIG_PROVE_LOCKING=y)
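
A minimal sketch of the try-lock-with-backoff strategy (lock_a and lock_b are illustrative; some other code path is assumed to take them in the opposite order):

static DEFINE_SPINLOCK(lock_a);
static DEFINE_SPINLOCK(lock_b);

void take_both(void)
{
    for (;;) {
        spin_lock(&lock_a);
        if (spin_trylock(&lock_b))
            break;                  // got both locks - proceed
        spin_unlock(&lock_a);       // contended: back off completely
        cpu_relax();                // give the other CPU a chance to finish
    }

    // ... critical section holding both locks ...

    spin_unlock(&lock_b);
    spin_unlock(&lock_a);
}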

Q: Why are memory barriers needed? Give an example.

Answer: CPUs and compilers reorder memory operations for performance. On multi-core systems, one CPU can observe operations in a different order than another CPU performed them.
Example without barriers:
// CPU 1                    // CPU 2
data = 42;                  while (!flag);
flag = 1;                   print(data);  // Might print 0!
Why it fails: CPU 1 might reorder (flag=1 before data=42), or CPU 2 might read stale data from its cache.
Solution:
// CPU 1                    // CPU 2
data = 42;                  while (!READ_ONCE(flag));
smp_wmb();                  smp_rmb();
WRITE_ONCE(flag, 1);        print(data);  // Now guaranteed 42
Barrier types: smp_wmb() (writes), smp_rmb() (reads), smp_mb() (full)

Summary Table

Primitive        Can Sleep?    Use Case                   Context
──────────────────────────────────────────────────────────────────────
atomic_t         N/A           Simple counters            Any
spinlock         No            Short critical sections    Any
spin_lock_irq    No            Shared with hardirq        Process/softirq
rwlock           No            Read-heavy, short hold     Any
seqlock          No            Very frequent reads        Any
mutex            Yes           Long critical sections     Process only
semaphore        Yes           Counting/limiting          Process only
RCU              Readers: No   Read-mostly data           Readers: Any
per-cpu          N/A           Avoid sharing              Any

Key Takeaways

  1. Context matters: Know if you can sleep (affects lock choice)
  2. Spin vs sleep: Spin for microseconds, sleep for longer
  3. RCU is magic: Use it for read-mostly data structures
  4. Lock ordering: Document and follow it religiously
  5. Use lockdep: It catches bugs before production
  6. Barriers are subtle: Prefer higher-level primitives when possible

Next Steps