Interrupts are fundamental to how Linux handles hardware events, system calls, and exceptional conditions. Understanding the interrupt subsystem is crucial for debugging performance issues and writing high-performance systems code.

The analogy: Imagine you're a chef cooking an elaborate meal (the running process). A kitchen timer goes off (hardware interrupt): you must stop what you're doing, turn off the oven, and then decide. Do you handle the hot dish right now (hardirq), or set it aside on the counter to plate later when you have a moment (softirq/workqueue)? The key insight is the same as in the kernel: acknowledge the interrupt immediately, but defer the heavy work. If you spend too long handling the timer, all your other dishes burn.

This "top-half / bottom-half" split is the single most important concept in interrupt handling. Get it wrong and you get latency spikes, dropped packets, and unresponsive systems.
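The split can be modeled in plain userspace C. This is not kernel code, and every name in it (event_ring, top_half, bottom_half) is hypothetical. The "top half" does constant-time work only: it records the event in a small ring buffer and flags that deferred work is pending. The "bottom half" later drains everything queued in one batch. A full ring drops events, much as a NIC drops packets when its receive ring overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8  /* hypothetical capacity; a power of two keeps index wraparound correct */

struct event_ring {
    int events[RING_SIZE];
    unsigned head;     /* next slot the top half writes */
    unsigned tail;     /* next slot the bottom half reads */
    unsigned dropped;  /* events lost because the ring was full */
    bool bh_pending;   /* "bottom half scheduled" flag */
};

/* Top half: O(1) and never blocks. Record the event, schedule the bottom half. */
static void top_half(struct event_ring *r, int event)
{
    if (r->head - r->tail == RING_SIZE) {
        r->dropped++;  /* ring full: the event is lost */
        return;
    }
    r->events[r->head % RING_SIZE] = event;
    r->head++;
    r->bh_pending = true;
}

/* Bottom half: runs later and processes every queued event in one batch.
 * Summing the events stands in for the real (expensive) processing. */
static long bottom_half(struct event_ring *r, size_t *processed)
{
    long total = 0;
    *processed = 0;
    while (r->tail != r->head) {
        total += r->events[r->tail % RING_SIZE];
        r->tail++;
        (*processed)++;
    }
    r->bh_pending = false;
    return total;
}
```

The design choice mirrors the kernel's. Keeping the top half tiny bounds how long other interrupts wait. The batch in the bottom half amortizes per-event overhead.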
Interview Frequency: High (especially for performance-critical roles)
Key Topics: IRQ handling, softirqs, tasklets, workqueues, interrupt coalescing
Time to Master: 12-14 hours
Every time a network packet arrives, a disk I/O completes, or a timer fires, an interrupt is involved. Understanding interrupts explains:
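Why context switches happen: Interrupts can preempt any code running with interrupts enabled, and returning from an interrupt is one of the points where the scheduler can switch tasks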
Network performance: Interrupt coalescing and NAPI
CPU affinity effects: IRQ pinning and load balancing
Latency sources: Interrupt storms and processing time
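Interrupt coalescing can be modeled with two thresholds. The device raises one interrupt after a given number of frames, or after a given time since the first unreported frame, whichever comes first. The function below is a simplified sketch of that rule. The parameter names rx_frames and rx_usecs echo the `ethtool -C` coalescing settings, but real NICs implement the timers in hardware and details vary by vendor.

```c
#include <assert.h>
#include <stddef.h>

/* Count interrupts raised for packets arriving at the given (sorted) times, in microseconds.
 * An interrupt fires when rx_frames packets are pending, or when rx_usecs have passed
 * since the oldest pending packet arrived. */
static size_t count_coalesced_irqs(const long *arrival_us, size_t n,
                                   unsigned rx_frames, long rx_usecs)
{
    size_t irqs = 0;
    unsigned pending = 0;
    long first_pending_us = 0;

    assert(rx_frames >= 1);
    for (size_t i = 0; i < n; i++) {
        long t = arrival_us[i];
        /* The timer expired before this packet arrived: flush what is pending. */
        if (pending > 0 && t >= first_pending_us + rx_usecs) {
            irqs++;
            pending = 0;
        }
        if (pending == 0)
            first_pending_us = t;
        pending++;
        /* Frame threshold reached: interrupt immediately. */
        if (pending >= rx_frames) {
            irqs++;
            pending = 0;
        }
    }
    if (pending > 0)
        irqs++;  /* the timer eventually fires for the leftovers */
    return irqs;
}
```

Consider 100 packets spaced 1 µs apart with rx_frames=32 and rx_usecs=50. They produce 4 interrupts instead of 100. The price is latency: a packet may wait up to rx_usecs before the CPU hears about it.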
A real-world example: At 10Gbps with minimum-size 64-byte frames, a NIC can receive about 14.88 million packets per second. With one interrupt per packet, that is over 14 million interrupts per second. If each interrupt takes 5 microseconds of CPU time, that is more than 70 seconds of CPU time every second: the equivalent of over 70 cores doing nothing but handling interrupts, far beyond what a server can spare. This is why NAPI (polling mode) was invented, and why interrupt affinity tuning is a routine task for infrastructure engineers at companies like Cloudflare and Datadog.
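The packet-rate figure comes from Ethernet framing. Every frame carries 20 extra bytes on the wire: a 7-byte preamble, a 1-byte start-of-frame delimiter, and a 12-byte inter-frame gap. A 64-byte frame therefore occupies 84 bytes, or 672 bits. Below is a small sketch of the arithmetic, assuming one interrupt per packet as in the example. The function names are illustrative.

```c
#include <assert.h>

/* Bytes each Ethernet frame occupies on the wire beyond the frame itself:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Maximum frames per second at line rate (frame_bytes includes the 4-byte FCS). */
static double max_packets_per_sec(double link_bps, double frame_bytes)
{
    assert(frame_bytes > 0.0);
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0);
}

/* CPU-seconds consumed per wall-clock second if every packet costs one interrupt. */
static double irq_cpu_seconds_per_sec(double packets_per_sec, double us_per_irq)
{
    return packets_per_sec * us_per_irq * 1e-6;
}
```

For 10 Gbps and 64-byte frames this gives about 14.88 million packets per second. At 5 µs per interrupt, that is about 74 CPU-seconds per wall-clock second.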
The analogy: If hardirqs are the smoke alarm (drop everything, acknowledge immediately), softirqs are the cleanup crew that arrives after the alarm stops. They run with interrupts re-enabled, so new alarms can still ring, but they still cannot take a nap (no sleeping) because they need to be fast. Softirqs are how the kernel processes network packets in bulk, completes block I/O, and runs timer callbacks — all the heavy lifting deferred from the hardirq handler.
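A simplified model of how the kernel drains pending softirqs may help here. Each CPU keeps a bitmask of raised softirq vectors. The handler loop clears the mask, runs handlers in vector-number order, and restarts if new softirqs were raised in the meantime. After a bounded number of passes it stops and wakes ksoftirqd instead. The bound is MAX_SOFTIRQ_RESTART, which is 10 in mainline kernel/softirq.c. The kernel also applies a roughly 2 ms time limit and stops early if a reschedule is needed, which this model omits.

The sketch is userspace C, not kernel code. The names cpu_model, reraise, and do_softirq_model are hypothetical, and __builtin_ctz assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Softirq vector numbers, in the order mainline Linux defines them. */
enum {
    HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ, BLOCK_SOFTIRQ,
    IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ, RCU_SOFTIRQ,
    NR_SOFTIRQS
};

#define MODEL_MAX_RESTART 10  /* same value as MAX_SOFTIRQ_RESTART in kernel/softirq.c */

struct cpu_model {
    uint32_t pending;          /* one bit per softirq vector */
    int runs[NR_SOFTIRQS];     /* how many times each handler ran */
    int reraise[NR_SOFTIRQS];  /* hypothetical: times a handler re-raises itself (more work) */
    bool ksoftirqd_woken;
};

static void raise_softirq(struct cpu_model *c, int nr) { c->pending |= 1u << nr; }

static void run_handler(struct cpu_model *c, int nr)
{
    c->runs[nr]++;
    if (c->reraise[nr] > 0) {  /* e.g. NET_RX found more packets than it could handle */
        c->reraise[nr]--;
        raise_softirq(c, nr);
    }
}

/* Simplified softirq loop: handle pending vectors, restart while new work appears,
 * and hand any remainder to ksoftirqd after MODEL_MAX_RESTART passes. */
static void do_softirq_model(struct cpu_model *c)
{
    int restart = MODEL_MAX_RESTART;
    do {
        uint32_t snapshot = c->pending;
        c->pending = 0;  /* clear before running handlers so re-raises are seen */
        while (snapshot) {
            int nr = __builtin_ctz(snapshot);  /* lowest vector number first */
            snapshot &= snapshot - 1;
            run_handler(c, nr);
        }
    } while (c->pending && --restart);
    if (c->pending)
        c->ksoftirqd_woken = true;
}
```

The restart bound is what prevents a constant packet flood from starving user space. Once it is hit, the remaining work moves to ksoftirqd, a schedulable thread.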
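```c
#include <linux/errno.h>
#include <linux/workqueue.h>

static struct workqueue_struct *bound_wq, *unbound_wq;

static int create_workqueues(void)  /* hypothetical init helper */
{
    /* Bound (per-CPU) workqueue: work runs on the CPU that queued it.
     * max_active = 0 selects the kernel default. */
    bound_wq = alloc_workqueue("my_bound_wq", WQ_MEM_RECLAIM, 0);
    if (!bound_wq)
        return -ENOMEM;

    /* Unbound workqueue: work can run on any CPU */
    unbound_wq = alloc_workqueue("my_unbound_wq", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
    if (!unbound_wq) {
        destroy_workqueue(bound_wq);
        return -ENOMEM;
    }
    return 0;
}

/* Flags (defined in <linux/workqueue.h>):
 *   WQ_UNBOUND       (1 << 1)  work can run on any CPU
 *   WQ_FREEZABLE     (1 << 2)  participates in system suspend (frozen)
 *   WQ_MEM_RECLAIM   (1 << 3)  has a rescuer thread; required if used for memory reclaim
 *   WQ_HIGHPRI       (1 << 4)  high-priority workers
 *   WQ_CPU_INTENSIVE (1 << 5)  CPU-intensive work, excluded from concurrency management
 *
 * System workqueues (no allocation needed):
 *   system_wq            default, bound
 *   system_highpri_wq    high priority
 *   system_long_wq       for long-running work
 *   system_unbound_wq    not bound to any CPU, not concurrency-managed
 *   system_freezable_wq  like system_wq, but freezable for suspend
 */
```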
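Workqueues run deferred work in kernel worker threads (process context), which is why work functions may sleep. The same shape can be sketched in userspace with POSIX threads. A mutex protects the queue, a condition variable wakes the worker, and the worker loop runs each item without holding the lock.

This is an analogue for intuition, not the kernel implementation. struct work, wq_queue(), wq_drain_and_stop(), and slow_double() are hypothetical names. The kernel's real workqueues add per-CPU worker pools and concurrency management on top of this basic idea.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct work {  /* loosely mirrors struct work_struct */
    void (*fn)(struct work *w);
    struct work *next;
    int arg;
    int result;
};

struct workqueue {
    pthread_mutex_t lock;
    pthread_cond_t more_work;
    struct work *head, *tail;
    bool stopping;
    pthread_t worker;
};

/* Worker thread: an ordinary schedulable thread, so work functions may block. */
static void *worker_main(void *p)
{
    struct workqueue *wq = p;
    pthread_mutex_lock(&wq->lock);
    for (;;) {
        while (wq->head == NULL && !wq->stopping)
            pthread_cond_wait(&wq->more_work, &wq->lock);
        if (wq->head == NULL)
            break;  /* stopping, and the queue is drained */
        struct work *w = wq->head;
        wq->head = w->next;
        if (wq->head == NULL)
            wq->tail = NULL;
        pthread_mutex_unlock(&wq->lock);
        w->fn(w);  /* run the item without holding the lock */
        pthread_mutex_lock(&wq->lock);
    }
    pthread_mutex_unlock(&wq->lock);
    return NULL;
}

static int wq_start(struct workqueue *wq)
{
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->more_work, NULL);
    wq->head = wq->tail = NULL;
    wq->stopping = false;
    return pthread_create(&wq->worker, NULL, worker_main, wq);
}

/* Like queue_work(): append the item and wake the worker; never runs it inline. */
static void wq_queue(struct workqueue *wq, struct work *w)
{
    w->next = NULL;
    pthread_mutex_lock(&wq->lock);
    if (wq->tail)
        wq->tail->next = w;
    else
        wq->head = w;
    wq->tail = w;
    pthread_cond_signal(&wq->more_work);
    pthread_mutex_unlock(&wq->lock);
}

/* Like flushing and then destroying a workqueue: finish queued items, then stop. */
static void wq_drain_and_stop(struct workqueue *wq)
{
    pthread_mutex_lock(&wq->lock);
    wq->stopping = true;
    pthread_cond_signal(&wq->more_work);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->worker, NULL);
    pthread_cond_destroy(&wq->more_work);
    pthread_mutex_destroy(&wq->lock);
}

/* Example item that sleeps: fine in a worker thread, forbidden in hardirq/softirq context. */
static void slow_double(struct work *w)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 };  /* 1 ms */
    nanosleep(&ts, NULL);
    w->result = w->arg * 2;
}
```

Queueing is cheap and never runs the item inline. The expensive, possibly sleeping part happens later in the worker. That is the same division of labor the kernel's queue_work() provides.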
```
DEFERRED WORK DECISION TREE

Need to defer work from interrupt context?
  │
  └─► Does the work need to sleep?
        │
        ├─ YES ──► Use WORKQUEUE (or a threaded IRQ handler)
        │          • Can sleep (mutex_lock, kmalloc with GFP_KERNEL)
        │          • Full kernel API available
        │          • Higher latency (runs in a kernel thread, needs a context switch)
        │
        └─ NO ───► Is it performance-critical networking/block I/O?
                     │
                     ├─ YES ──► Use SOFTIRQ (if modifying core kernel code)
                     │          or NAPI (for network drivers)
                     │          • Lowest latency
                     │          • Runs at highest priority (ahead of all tasks, unless deferred to ksoftirqd)
                     │          • Must handle SMP locking (the same softirq can run on several CPUs at once)
                     │
                     └─ NO ───► Use TASKLET (legacy)
                                • Simpler than softirq
                                • Serialized: a given tasklet never runs on two CPUs at once
                                • New code should prefer threaded IRQs or workqueues
```
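NAPI, the network-driver option in the tree, can be sketched as a small state machine:

1. The first interrupt masks the device interrupt and schedules polling.
2. Each poll processes at most a budget of packets.
3. A poll that uses less than its whole budget means the ring is empty, so polling completes and the interrupt is unmasked.

This is a userspace model with hypothetical names (nic_model, packets_arrive, run_net_rx). The budget of 64 matches mainline's default NAPI weight (NAPI_POLL_WEIGHT). The real NET_RX softirq also caps total work per round across all devices (net.core.netdev_budget), which this model omits.

```c
#include <assert.h>
#include <stdbool.h>

#define NAPI_BUDGET 64  /* mainline's default NAPI weight (NAPI_POLL_WEIGHT) */

struct nic_model {
    int rx_queue;         /* packets waiting in the RX ring */
    bool irq_enabled;     /* device interrupt for this queue */
    bool napi_scheduled;
    int irqs_taken;
    int polls;
};

/* Hardirq: mask the device interrupt and schedule polling (like napi_schedule()). */
static void nic_irq(struct nic_model *n)
{
    if (!n->irq_enabled)
        return;  /* masked: the device raises no interrupt */
    n->irqs_taken++;
    n->irq_enabled = false;
    n->napi_scheduled = true;
}

/* One poll: process up to `budget` packets. If the ring drains before the budget is
 * used up, complete and re-enable the interrupt (like napi_complete_done()). */
static int napi_poll(struct nic_model *n, int budget)
{
    int done = n->rx_queue < budget ? n->rx_queue : budget;
    n->rx_queue -= done;
    n->polls++;
    if (done < budget) {
        n->napi_scheduled = false;
        n->irq_enabled = true;
    }
    return done;
}

/* Packets land in the ring; the NIC interrupts only if its interrupt is unmasked. */
static void packets_arrive(struct nic_model *n, int count)
{
    n->rx_queue += count;
    nic_irq(n);
}

/* NET_RX softirq stand-in: keep polling while NAPI stays scheduled. */
static void run_net_rx(struct nic_model *n)
{
    while (n->napi_scheduled)
        napi_poll(n, NAPI_BUDGET);
}
```

While polling is active, further packet arrivals raise no interrupts at all. That is how NAPI keeps the interrupt rate bounded under heavy load.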
```bash
# View current affinity (hex bitmask)
cat /proc/irq/121/smp_affinity
# 0000000f means CPUs 0-3

# Set affinity to CPUs 0 and 1 (requires root)
echo 3 > /proc/irq/121/smp_affinity

# Using affinity list (human-readable)
echo 0-3 > /proc/irq/121/smp_affinity_list
echo 0,2,4,6 > /proc/irq/121/smp_affinity_list

# Check if irqbalance is running (it may override your settings)
systemctl status irqbalance

# Disable irqbalance for manual control
systemctl stop irqbalance
```
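smp_affinity takes a hex bitmask, while smp_affinity_list takes a CPU list, and converting between them is simple bit arithmetic. Below is a minimal sketch with a hypothetical helper name. It assumes at most 64 CPUs and only the common comma-and-range list syntax. On machines with more than 32 CPUs the kernel prints smp_affinity as comma-separated 32-bit hex groups, which this sketch does not handle.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Convert a CPU list such as "0-3" or "0,2,4,6" (smp_affinity_list format)
 * into a bitmask (smp_affinity format). Handles CPUs 0-63 and the
 * comma/range syntax only; returns 0 on a parse error. */
static uint64_t cpulist_to_mask(const char *s)
{
    uint64_t mask = 0;
    while (*s != '\0') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0 || lo > 63)
            return 0;
        long hi = lo;
        if (*end == '-') {  /* range "lo-hi" */
            const char *p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo || hi > 63)
                return 0;
        }
        for (long cpu = lo; cpu <= hi; cpu++)
            mask |= UINT64_C(1) << cpu;
        if (*end == ',') {
            s = end + 1;  /* next element */
        } else if (*end == '\0' || *end == '\n') {
            break;        /* end of list (echo appends '\n') */
        } else {
            return 0;
        }
    }
    return mask;
}
```

For example, "0,2,4,6" yields 0x55, so `echo 55 > /proc/irq/121/smp_affinity` selects the same CPUs as writing that list to smp_affinity_list.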
```bash
# Detect interrupt storms: watch for rapidly increasing counts
watch -d -n 1 cat /proc/interrupts
# If a single IRQ line is incrementing by thousands per second,
# you likely have a misbehaving device or driver bug

# Check if ksoftirqd is consuming excessive CPU
# This usually indicates softirq backlog -- the kernel cannot keep up
# (top -p accepts at most 20 PIDs; on larger machines, filter top's output instead)
top -p $(pgrep -d',' ksoftirqd)
# If ksoftirqd is using >50% of a core, investigate which softirq type

# Identify which softirq type is dominating
cat /proc/softirqs
# NET_RX growing much faster than others → heavy network receive load
# BLOCK growing fast → storage completion backlog
# TIMER growing fast → timer callback overload

# Trace hardirq handler duration (find slow handlers)
sudo bpftrace -e '
tracepoint:irq:irq_handler_entry { @start[args->irq] = nsecs; }
tracepoint:irq:irq_handler_exit /@start[args->irq]/ {
  $dur = (nsecs - @start[args->irq]) / 1000;
  if ($dur > 100) {  // handlers > 100us are suspicious
    printf("SLOW IRQ %d: %d us\n", args->irq, $dur);
  }
  delete(@start[args->irq]);
}'

# Check for IRQ affinity imbalance
# One CPU handling all interrupts while others are idle.
# Sums the per-CPU columns of numbered (device) IRQ lines; the header gives the CPU count.
awk 'NR == 1 { ncpu = NF; next }
     $1 ~ /^[0-9]+:$/ { for (i = 2; i <= ncpu + 1; i++) sum[i - 2] += $i }
     END { for (c = 0; c < ncpu; c++) print "CPU" c ": " sum[c] }' /proc/interrupts
```
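The imbalance check can also be done programmatically. The sketch below parses text in the /proc/interrupts layout: a header of CPUn columns, then one line per interrupt source. It sums the per-CPU counts of numbered device IRQ lines and skips summary rows such as NMI and LOC. sum_irqs_per_cpu is a hypothetical helper that takes the file contents as a string, so it can be tested without a real /proc. It assumes at most 64 CPU columns.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 64

/* Sum per-CPU counts of numbered (device) IRQ lines in /proc/interrupts-style text.
 * Returns the number of CPU columns found in the header, or -1 on error. */
static int sum_irqs_per_cpu(const char *text, unsigned long long sums[MAX_CPUS])
{
    const char *nl = strchr(text, '\n');
    if (!nl)
        return -1;

    /* Header: one "CPUn" token per online CPU. */
    int ncpu = 0;
    for (const char *p = text; p < nl; p++)
        if (strncmp(p, "CPU", 3) == 0 && (p == text || isspace((unsigned char)p[-1])))
            ncpu++;
    if (ncpu == 0 || ncpu > MAX_CPUS)
        return -1;
    memset(sums, 0, sizeof(sums[0]) * MAX_CPUS);

    for (const char *line = nl + 1; *line != '\0'; line = nl + 1) {
        nl = strchr(line, '\n');
        if (!nl)
            nl = line + strlen(line);  /* last line without a trailing newline */

        const char *p = line;
        while (p < nl && *p == ' ')
            p++;
        char *end;
        /* Device IRQs look like "121:"; summary rows (NMI:, LOC:, ERR:) are skipped. */
        if (p < nl && isdigit((unsigned char)*p)) {
            (void)strtoul(p, &end, 10);
            if (*end == ':') {
                p = end + 1;
                for (int cpu = 0; cpu < ncpu; cpu++) {
                    unsigned long long v = strtoull(p, &end, 10);
                    if (end == p || end > nl)
                        break;  /* ran out of numeric columns on this line */
                    sums[cpu] += v;
                    p = end;
                }
            }
        }
        if (*nl == '\0')
            break;
    }
    return ncpu;
}
```

A large gap between the per-CPU totals points at the affinity imbalance described above. For example, one CPU with millions of device interrupts while its neighbors have a handful.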
Common Misconception: "Setting IRQ affinity is enough to control interrupt placement." In practice, the irqbalance daemon may override your manual settings. Always check systemctl status irqbalance and stop it if you need manual control. Also, some drivers (NVMe, for example) request kernel-managed affinity for their per-queue MSI-X interrupts. Those interrupts cannot be moved at all: writes to their smp_affinity fail with an I/O error.
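Usually true, but when the softirq backlog is high, processing is handed off to the per-CPU ksoftirqd kernel threads. These run in process context and compete with ordinary tasks for CPU time, and each ksoftirqd thread stays bound to its own CPU.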
"Workqueues are slow"
For deferred work that needs to sleep, workqueues typically add only a few microseconds of overhead (the cost of waking a kernel worker thread). The alternative, busy-waiting in atomic context, is far worse.
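"NAPI polling wastes CPU"
NAPI only polls while there are packets to process. When a poll handles fewer packets than its budget, the queue has drained. The driver then calls napi_complete_done(), re-enables the device interrupt, and polling stops.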
"Tasklets are the recommended bottom-half mechanism"
Tasklets are effectively semi-deprecated in modern kernel development, and new code should prefer threaded IRQs or workqueues. Tasklets have problematic semantics: their latency is unpredictable, and a long-running tasklet delays softirq processing for other devices on the same CPU.
“Disabling interrupts with spin_lock_irq is always safe”
Only safe if you know interrupts were enabled before the lock is taken, because spin_unlock_irq() unconditionally re-enables them. Use spin_lock_irqsave()/spin_unlock_irqrestore() in code paths that might be called with interrupts already disabled.
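The difference between spin_lock_irq and spin_lock_irqsave comes down to what happens on unlock. The userspace model below uses a single boolean in place of the CPU's interrupt-enable flag. All names are hypothetical. In the kernel, spin_lock_irqsave() is a macro that stores the saved state in an unsigned long flags variable. The _irq variant re-enables interrupts unconditionally, while the _irqsave variant restores whatever state the caller had.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the CPU's interrupt-enable flag. */
static bool irqs_enabled = true;

static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void)  { irqs_enabled = true; }

/* spin_lock_irq()/spin_unlock_irq() model: disable, then unconditionally enable. */
static void lock_irq(void)   { model_irq_disable(); /* ...acquire the spinlock... */ }
static void unlock_irq(void) { /* ...release the spinlock... */ model_irq_enable(); }

/* spin_lock_irqsave()/spin_unlock_irqrestore() model: save the old state, restore it. */
static void lock_irqsave(bool *flags)  { *flags = irqs_enabled; model_irq_disable(); }
static void unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* Helpers that may be called with interrupts either on or off. */
static void helper_irq(void)     { lock_irq(); unlock_irq(); }
static void helper_irqsave(void) { bool flags; lock_irqsave(&flags); unlock_irqrestore(flags); }
```

If a caller already runs with interrupts disabled, helper_irq() silently turns them back on. An interrupt handler could then fire inside a region the caller believed was protected. helper_irqsave() leaves the caller's state exactly as it found it.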