Interrupts & Exception Handling
Interrupts are fundamental to how Linux handles hardware events, system calls, and exceptional conditions. Understanding the interrupt subsystem is crucial for debugging performance issues and writing high-performance systems code.
Interview Frequency: High (especially for performance-critical roles)
Key Topics: IRQ handling, softirqs, tasklets, workqueues, interrupt coalescing
Time to Master: 12-14 hours
Why Interrupts Matter
Every time a network packet arrives, a disk I/O completes, or a timer fires, an interrupt is involved. Understanding interrupts explains:
Why context switches happen: Interrupts can preempt any code
Network performance: Interrupt coalescing and NAPI
CPU affinity effects: IRQ pinning and load balancing
Latency sources: Interrupt storms and processing time
Interrupt Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ LINUX INTERRUPT ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ HARDWARE LAYER │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌──────┐ ┌─────┐ ││
│ │ │ NIC │ │ Disk│ │Timer│ │ PCIe │ │ USB │ ││
│ │ └──┬──┘ └──┬──┘ └──┬──┘ └───┬──┘ └──┬──┘ ││
│ │ │ │ │ │ │ ││
│ │ └─────────┴────┬────┴──────────┴─────────┘ ││
│ │ ▼ ││
│ │ ┌─────────────────────┐ ││
│ │ │ Interrupt │ Modern: MSI/MSI-X ││
│ │ │ Controller │ (Message Signaled Interrupts) ││
│ │ │ (APIC/GIC) │ ││
│ │ └──────────┬──────────┘ ││
│ └─────────────────────│───────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ CPU │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ ││
│ │ 1. CPU receives interrupt signal ││
│ │ 2. Saves current context (registers, flags) ││
│ │ 3. Looks up handler in IDT (Interrupt Descriptor Table) ││
│ │ 4. Switches to kernel stack ││
│ │ 5. Jumps to interrupt handler ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ KERNEL INTERRUPT HANDLING │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ ││
│ │ ┌───────────────────────────────────────────────────────────────┐ ││
│ │ │ HARDIRQ CONTEXT (interrupts disabled) │ ││
│ │ │ │ ││
│ │ │ • Acknowledge interrupt to hardware │ ││
│ │ │ • Do MINIMAL work (read status, copy data to buffer) │ ││
│ │ │ • Schedule deferred work (softirq, tasklet, workqueue) │ ││
│ │ │ • Return ASAP (microseconds, not milliseconds) │ ││
│ │ │ │ ││
│ │ └────────────────────────────┬──────────────────────────────────┘ ││
│ │ │ ││
│ │ ▼ ││
│ │ ┌───────────────────────────────────────────────────────────────┐ ││
│ │ │ SOFTIRQ CONTEXT (interrupts enabled) │ ││
│ │ │ │ ││
│ │ │ • Process accumulated data (network packets, block I/O) │ ││
│ │ │ • Can be preempted by hardirqs │ ││
│ │ │ • Cannot sleep or block │ ││
│ │ │ │ ││
│ │ └────────────────────────────┬──────────────────────────────────┘ ││
│ │ │ ││
│ │ ▼ ││
│ │ ┌───────────────────────────────────────────────────────────────┐ ││
│ │ │ PROCESS CONTEXT (workqueues) │ ││
│ │ │ │ ││
│ │ │ • Can sleep and block │ ││
│ │ │ • Can allocate memory with GFP_KERNEL │ ││
│ │ │ • Full kernel API available │ ││
│ │ │ │ ││
│ │ └───────────────────────────────────────────────────────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Interrupt Types
Exceptions vs Interrupts
Type                Cause                 Synchronous?  Examples
────                ─────                 ────────────  ────────
Exception           CPU execution         Yes           Page fault, divide by zero, syscall
Hardware IRQ        External device       No            NIC, disk, timer, keyboard
Software interrupt  Explicit instruction  Yes           int 0x80, syscall
Exception Categories
┌─────────────────────────────────────────────────────────────────────────────┐
│ EXCEPTION TYPES (x86-64) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ FAULTS │
│ ──────── │
│ • Recoverable: execution can continue after handling │
│ • Return address = instruction that caused fault │
│ • Examples: page fault (#PF), segment not present (#NP) │
│ │
│ TRAPS │
│ ────── │
│ • Intentional: used for debugging and system calls │
│ • Return address = next instruction │
│ • Examples: breakpoint (#BP), overflow (#OF), syscall │
│ │
│ ABORTS │
│ ──────── │
│ • Severe: cannot continue execution │
│ • Examples: machine check (#MC), double fault (#DF) │
│ │
│ COMMON EXCEPTION VECTORS │
│ ───────────────────────── │
│ 0: #DE - Divide Error │
│ 3: #BP - Breakpoint │
│ 6: #UD - Invalid Opcode │
│ 8: #DF - Double Fault │
│ 13: #GP - General Protection │
│ 14: #PF - Page Fault │
│ 18: #MC - Machine Check │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Interrupt Descriptor Table (IDT)
The IDT maps interrupt vectors to handlers:
// arch/x86/kernel/idt.c (simplified)
struct idt_data {
    unsigned int    vector;     // Interrupt number (0-255)
    unsigned int    segment;    // Code segment selector
    struct idt_bits bits;       // Type, DPL, present
    const void      *addr;      // Handler address
};

// IDT entries
static const __initconst struct idt_data early_idts[] = {
    INTG(X86_TRAP_DE,  asm_exc_divide_error),        // #DE
    INTG(X86_TRAP_NMI, asm_exc_nmi),                 // NMI
    INTG(X86_TRAP_BP,  asm_exc_int3),                // #BP
    INTG(X86_TRAP_OF,  asm_exc_overflow),            // #OF
    INTG(X86_TRAP_UD,  asm_exc_invalid_op),          // #UD
    INTG(X86_TRAP_DF,  asm_exc_double_fault),        // #DF
    INTG(X86_TRAP_GP,  asm_exc_general_protection),  // #GP
    INTG(X86_TRAP_PF,  asm_exc_page_fault),          // #PF
    // ... more entries
};

// Hardware interrupts (IRQs) start at vector 32
#define FIRST_EXTERNAL_VECTOR 0x20
# View interrupt statistics
cat /proc/interrupts
# Example output:
# CPU0 CPU1 CPU2 CPU3
# 0: 25 0 0 0 IR-IO-APIC 2-edge timer
# 8: 0 0 0 1 IR-IO-APIC 8-edge rtc0
# 16: 0 0 0 0 IR-IO-APIC 16-fasteoi ehci_hcd
# 120: 0 0 0 0 DMAR-MSI 0-edge dmar0
# 121: 234567 12345 9876 8765 IR-PCI-MSI 512000-edge nvme0q0
# 122: 0 987654 0 0 IR-PCI-MSI 512001-edge nvme0q1
# NMI: 0 0 0 0 Non-maskable interrupts
# LOC: 1234567 1234567 1234567 1234567 Local timer interrupts
# View affinity for a specific IRQ
cat /proc/irq/121/smp_affinity # Hex mask of allowed CPUs
cat /proc/irq/121/smp_affinity_list # Human-readable CPU list
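The hex mask and the list form encode the same set of CPUs. A minimal sketch of the conversion in both directions (pure Python; the function names are illustrative, not a kernel or libc API):

```python
def affinity_mask_to_cpus(mask_hex: str) -> list[int]:
    """Convert an smp_affinity hex mask (e.g. 'f' or '0000000f')
    to the list of CPU numbers it allows."""
    mask = int(mask_hex.replace(",", ""), 16)  # large masks use comma-separated groups
    cpus = []
    bit = 0
    while mask:
        if mask & 1:
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus

def cpus_to_affinity_mask(cpus: list[int]) -> str:
    """Inverse: CPU list -> hex mask suitable for writing to smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

# '0000000f' allows CPUs 0-3, matching the smp_affinity_list form
print(affinity_mask_to_cpus("0000000f"))    # [0, 1, 2, 3]
print(cpus_to_affinity_mask([0, 2, 4, 6]))  # '55'
```

The same bit math applies when writing masks back, e.g. `echo 3` allows CPUs 0 and 1.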
Hardware IRQ Handling
IRQ Handler Registration
// include/linux/interrupt.h
int request_irq(unsigned int irq,
                irq_handler_t handler,
                unsigned long flags,
                const char *name,
                void *dev_id);

// flags:
#define IRQF_SHARED         0x00000080  // Multiple devices share IRQ
#define IRQF_TRIGGER_HIGH   0x00000004  // Level-triggered, active high
#define IRQF_TRIGGER_RISING 0x00000001  // Edge-triggered, rising
#define IRQF_ONESHOT        0x00002000  // IRQ disabled until handler completes

// Handler return values
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);
#define IRQ_NONE        (0)  // Interrupt wasn't from this device
#define IRQ_HANDLED     (1)  // Interrupt was handled
#define IRQ_WAKE_THREAD (2)  // Wake threaded handler
Example: Network Driver IRQ Handler
// Simplified network driver interrupt handler
static irqreturn_t my_net_irq_handler(int irq, void *dev_id)
{
    struct my_net_device *dev = dev_id;
    u32 status;

    // Read interrupt status register
    status = my_read_reg(dev, INTR_STATUS);
    if (!(status & MY_INTR_MASK))
        return IRQ_NONE;  // Not our interrupt (shared IRQ line)

    // Acknowledge interrupt to hardware
    my_write_reg(dev, INTR_ACK, status);

    if (status & INTR_RX_DONE) {
        // Disable RX interrupts, schedule NAPI poll
        my_write_reg(dev, INTR_DISABLE, INTR_RX_DONE);
        napi_schedule(&dev->napi);  // Schedule softirq
    }

    if (status & INTR_TX_DONE) {
        // TX completion - can do minimal work here
        tasklet_schedule(&dev->tx_tasklet);
    }

    return IRQ_HANDLED;
}
Softirqs: High-Priority Deferred Work
Softirq Types
// include/linux/interrupt.h
enum {
    HI_SOFTIRQ = 0,     // High-priority tasklets
    TIMER_SOFTIRQ,      // Timer processing
    NET_TX_SOFTIRQ,     // Network transmit
    NET_RX_SOFTIRQ,     // Network receive
    BLOCK_SOFTIRQ,      // Block device completion
    IRQ_POLL_SOFTIRQ,   // IRQ polling
    TASKLET_SOFTIRQ,    // Regular tasklets
    SCHED_SOFTIRQ,      // Scheduler load balancing
    HRTIMER_SOFTIRQ,    // High-resolution timers
    RCU_SOFTIRQ,        // RCU callbacks
    NR_SOFTIRQS
};
Softirq Properties
┌─────────────────────────────────────────────────────────────────────────────┐
│ SOFTIRQ CHARACTERISTICS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Properties: │
│ • Fixed number (10 currently, compile-time) │
│ • Run with interrupts enabled │
│ • Cannot sleep or block │
│ • Same softirq can run simultaneously on different CPUs │
│ • Must use appropriate locking │
│ │
│ Execution Points: │
│ • After hardirq handler returns │
│ • When explicitly enabled (local_bh_enable()) │
│ • By ksoftirqd kernel threads (when too many pending) │
│ │
│ Priority Order: │
│ HI_SOFTIRQ > TIMER > NET_TX > NET_RX > BLOCK > ... > RCU │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Viewing Softirq Activity
# View softirq statistics per CPU
cat /proc/softirqs
# CPU0 CPU1 CPU2 CPU3
# HI: 0 0 0 0
# TIMER: 12345678 12345678 12345678 12345678
# NET_TX: 12345 12345 12345 12345
# NET_RX: 98765432 1234567 123456 12345
# BLOCK: 123456 234567 345678 456789
# IRQ_POLL: 0 0 0 0
# TASKLET: 1234 2345 3456 4567
# SCHED: 1234567 1234567 1234567 1234567
# HRTIMER: 123456 123456 123456 123456
# RCU: 1234567 1234567 1234567 1234567
# Watch ksoftirqd CPU usage
top -p $( pgrep -d ',' ksoftirqd )
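The counts in /proc/softirqs are cumulative since boot, so a rate requires diffing two samples. A sketch of the parsing and diff logic (the sample text below is synthetic, not real output):

```python
def parse_softirqs(text: str) -> dict[str, list[int]]:
    """Parse /proc/softirqs-style text into {softirq_name: per-CPU counts}."""
    counts = {}
    for line in text.strip().splitlines()[1:]:   # skip the CPU header row
        name, *values = line.split()
        counts[name.rstrip(":")] = [int(v) for v in values]
    return counts

def softirq_rates(before: dict, after: dict, interval_s: float) -> dict:
    """Per-softirq events/sec, summed across CPUs, between two samples."""
    return {name: (sum(after[name]) - sum(vals)) / interval_s
            for name, vals in before.items()}

sample_t0 = """\
        CPU0  CPU1
NET_RX:  100   200
TIMER:   500   500
"""
sample_t1 = """\
        CPU0  CPU1
NET_RX:  600   700
TIMER:   550   550
"""
rates = softirq_rates(parse_softirqs(sample_t0), parse_softirqs(sample_t1), 1.0)
print(rates["NET_RX"])  # 1000.0 events/sec across both CPUs
```

On a live system, replace the synthetic strings with two timed reads of /proc/softirqs.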
Tasklets: Dynamic Deferred Work
Tasklets are built on top of softirqs but more flexible:
// Tasklet declaration
struct tasklet_struct {
    struct tasklet_struct *next;
    unsigned long state;          // TASKLET_STATE_SCHED, TASKLET_STATE_RUN
    atomic_t count;               // Disable count
    void (*func)(unsigned long);
    unsigned long data;
};

// Static initialization
DECLARE_TASKLET(my_tasklet, my_tasklet_handler, data);
DECLARE_TASKLET_DISABLED(my_tasklet, my_tasklet_handler, data);

// Dynamic initialization
tasklet_init(&my_tasklet, my_tasklet_handler, data);

// Schedule for execution
tasklet_schedule(&my_tasklet);     // Normal priority (TASKLET_SOFTIRQ)
tasklet_hi_schedule(&my_tasklet);  // High priority (HI_SOFTIRQ)

// Control
tasklet_disable(&my_tasklet);  // Prevent execution
tasklet_enable(&my_tasklet);   // Re-enable
tasklet_kill(&my_tasklet);     // Remove (waits if running)
Tasklet vs Softirq
Feature      Softirq                                Tasklet
───────      ───────                                ───────
Concurrency  Same softirq can run on multiple CPUs  Same tasklet runs on only one CPU at a time
Definition   Static (compile-time)                  Dynamic (runtime)
Locking      Must handle SMP yourself               Serialized per-tasklet
Use case     High-frequency, performance-critical   General deferred work
Workqueues: Process Context Deferred Work
When you need to sleep or allocate memory:
// Create work item
struct work_struct my_work;
INIT_WORK(&my_work, my_work_handler);

// Work handler
static void my_work_handler(struct work_struct *work)
{
    // Can sleep, allocate memory, etc.
    struct my_device *dev = container_of(work, struct my_device, work);

    // Do heavy processing
    process_data(dev);

    // Allocate memory (can sleep)
    void *buf = kmalloc(4096, GFP_KERNEL);

    // Call functions that might sleep
    mutex_lock(&dev->lock);
    // ...
    mutex_unlock(&dev->lock);
}

// Schedule work
schedule_work(&my_work);                   // Global workqueue
queue_work(my_workqueue, &my_work);        // Custom workqueue
schedule_delayed_work(&my_dwork, HZ * 5);  // Delayed by 5 seconds
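As a userspace analogy (not the kernel API), the hardirq/workqueue split resembles a fast callback handing blocking work to a thread pool; the names below are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
import time

# The pool plays the role of the workqueue's worker threads.
wq = ThreadPoolExecutor(max_workers=2)

def slow_bottom_half(data: str) -> str:
    """Blocking work: fine here, in 'process context'."""
    time.sleep(0.01)  # stands in for mutex_lock/kmalloc/file I/O
    return data.upper()

def fast_top_half(data: str):
    """Like a hardirq handler: never blocks, only schedules deferred work."""
    return wq.submit(slow_bottom_half, data)

future = fast_top_half("packet")
print(future.result())  # PACKET
```

The key property mirrored here is that the submitter returns immediately while the blocking work runs later in a context where sleeping is allowed.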
Workqueue Types
// Create a custom workqueue
// WQ_UNBOUND: work may run on any CPU; omit it for a bound,
// per-CPU workqueue whose work runs on the submitting CPU
struct workqueue_struct *wq = alloc_workqueue("my_wq",
        WQ_UNBOUND | WQ_MEM_RECLAIM, max_active);

// Flags:
#define WQ_UNBOUND       (1 << 1)  // Work can run on any CPU
#define WQ_FREEZABLE     (1 << 2)  // Participate in system suspend
#define WQ_MEM_RECLAIM   (1 << 3)  // Can be used for memory reclaim
#define WQ_HIGHPRI       (1 << 4)  // High priority workers
#define WQ_CPU_INTENSIVE (1 << 5)  // CPU-intensive work

// System workqueues
system_wq           // Default, bound
system_highpri_wq   // High priority
system_long_wq      // For long-running work
system_unbound_wq   // Not bound to any CPU; good for CPU-intensive work
system_freezable_wq // Freezable for suspend
Concurrency Managed Workqueue (cmwq)
┌─────────────────────────────────────────────────────────────────────────────┐
│ CONCURRENCY MANAGED WORKQUEUE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Work Items │
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ W1 │ │ W2 │ │ W3 │ │ W4 │ │ W5 │ │
│ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ │
│ │ │ │ │ │ │
│ └───────┴───────┼───────┴───────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ Per-CPU Worker Pools ││
│ │ ││
│ │ CPU 0 Pool CPU 1 Pool CPU 2 Pool ││
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ ││
│ │ │ Worker threads: │ │ Worker threads: │ │ Worker threads: │ ││
│ │ │ [kworker/0:0] │ │ [kworker/1:0] │ │ [kworker/2:0] │ ││
│ │ │ [kworker/0:1] │ │ [kworker/1:1] │ │ [kworker/2:1] │ ││
│ │ │ (dynamic) │ │ (dynamic) │ │ (dynamic) │ ││
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
│ Key Properties: │
│ • Workers created/destroyed dynamically based on load │
│ • Work items queued to per-CPU pools for locality │
│ • Automatic concurrency management (avoids thundering herd) │
│ • WQ_UNBOUND work uses separate unbound pools │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Choosing the Right Deferred Work
┌─────────────────────────────────────────────────────────────────────────────┐
│ DEFERRED WORK DECISION TREE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Need to defer work from interrupt context? │
│ │ │
│ └─► Does the work need to sleep? │
│ │ │
│ ├─ YES ──► Use WORKQUEUE │
│ │ • Can sleep (mutex_lock, kmalloc with GFP_KERNEL) │
│ │ • Full kernel API available │
│ │ • Higher latency (process context switch) │
│ │ │
│ └─ NO ───► Is it performance-critical networking/block I/O? │
│ │ │
│ ├─ YES ──► Use SOFTIRQ (if modifying kernel) │
│ │ or NAPI (for network drivers) │
│ │ • Lowest latency │
│ │ • Runs at highest priority │
│ │ • Need to handle SMP locking │
│ │ │
│ └─ NO ───► Use TASKLET │
│ • Simpler than softirq │
│ • Serialized execution │
│ • Good for most driver deferred work │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Setting IRQ Affinity
# View current affinity (hex bitmask)
cat /proc/irq/121/smp_affinity
# 0000000f means CPUs 0-3
# Set affinity to CPUs 0 and 1
echo 3 > /proc/irq/121/smp_affinity
# Using affinity list (human-readable)
echo 0-3 > /proc/irq/121/smp_affinity_list
echo 0,2,4,6 > /proc/irq/121/smp_affinity_list
# Check if irqbalance is running (it may override your settings)
systemctl status irqbalance
# Disable irqbalance for manual control
systemctl stop irqbalance
┌─────────────────────────────────────────────────────────────────────────────┐
│ IRQ AFFINITY STRATEGIES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. NETWORK-INTENSIVE WORKLOADS │
│ ──────────────────────────── │
│ • Pin NIC IRQs to dedicated CPUs │
│ • Use RPS/RFS for receive steering │
│ • Enable XPS for transmit steering │
│ • Consider isolcpus for application CPUs │
│ │
│ Example (4-queue NIC, 8 CPUs): │
│ IRQ 121 (rx-0) → CPU 0 │
│ IRQ 122 (rx-1) → CPU 1 │
│ IRQ 123 (rx-2) → CPU 2 │
│ IRQ 124 (rx-3) → CPU 3 │
│ CPUs 4-7 → Application threads │
│ │
│ 2. LATENCY-SENSITIVE WORKLOADS │
│ ────────────────────────── │
│ • Isolate CPUs for application (isolcpus=) │
│ • Pin all IRQs to housekeeping CPUs │
│ • Use nohz_full for tickless operation │
│ • Consider disabling irqbalance │
│ │
│ 3. NUMA-AWARE AFFINITY │
│ ─────────────────── │
│ • Pin IRQs to same NUMA node as device │
│ • Avoid cross-NUMA memory access │
│ • Check: cat /sys/class/net/eth0/device/numa_node │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
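The one-queue-per-CPU pinning in strategy 1 is easy to script. A sketch that only computes the mask each IRQ would receive (the IRQ numbers match the earlier example; actually applying the masks means writing them to /proc/irq/<n>/smp_affinity as root, with irqbalance stopped):

```python
def plan_queue_pinning(first_irq: int, num_queues: int, first_cpu: int = 0) -> dict[int, str]:
    """Map consecutive RX-queue IRQs to consecutive single-CPU hex masks."""
    plan = {}
    for q in range(num_queues):
        irq = first_irq + q
        cpu = first_cpu + q
        plan[irq] = format(1 << cpu, "x")  # single-CPU affinity mask
    return plan

plan = plan_queue_pinning(first_irq=121, num_queues=4)
for irq, mask in plan.items():
    # To apply: echo {mask} > /proc/irq/{irq}/smp_affinity  (as root)
    print(f"IRQ {irq} -> mask {mask}")
```

This reproduces the table above: IRQ 121 → CPU 0 (mask 1), IRQ 122 → CPU 1 (mask 2), and so on.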
Measuring Interrupt Latency
# Using cyclictest for latency measurement
sudo cyclictest -p 80 -t1 -n -i 1000 -l 10000
# -p 80: priority
# -t1: 1 thread
# -n: use nanosleep
# -i 1000: 1ms interval
# -l 10000: 10000 loops
# Using perf to examine scheduling latency (record a trace first, then report)
sudo perf sched record -- sleep 5
sudo perf sched latency
# Using bpftrace
sudo bpftrace -e '
tracepoint:irq:irq_handler_entry {
@start[args->irq] = nsecs;
}
tracepoint:irq:irq_handler_exit /@start[args->irq]/ {
@latency_ns = hist(nsecs - @start[args->irq]);
delete(@start[args->irq]);
}'
NAPI: Network Interrupt Coalescing
NAPI (New API) reduces interrupt overhead for high-throughput networking:
┌─────────────────────────────────────────────────────────────────────────────┐
│ NAPI MECHANISM │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Traditional (interrupt per packet): │
│ ────────────────────────────────── │
│ Packet 1 → IRQ → Handle → Return │
│ Packet 2 → IRQ → Handle → Return │
│ Packet 3 → IRQ → Handle → Return │
│ ... (high CPU overhead) │
│ │
│ NAPI (polling mode): │
│ ───────────────────── │
│ Packet 1 → IRQ → Disable IRQ → Schedule NAPI │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────┐ │
│ │ NAPI Poll Loop │ │
│ │ │ │
│ │ while (budget > 0) { │ │
│ │ packet = poll_device(); │ │
│ │ if (!packet) break; │ │
│ │ process(packet); │ │
│ │ budget--; │ │
│ │ } │ │
│ │ │ │
│ │ if (budget > 0) │ │
│ │ napi_complete(); // Re-enable │ │
│ │ enable_irq(); // IRQ │ │
│ │ else │ │
│ │ reschedule(); // More work │ │
│ │ │ │
│ └─────────────────────────────────────┘ │
│ │
│ Benefits: │
│ • Reduced interrupt overhead │
│ • Better CPU cache utilization │
│ • Back-pressure mechanism (budget) │
│ • Automatic adaptation to load │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
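The budget logic in the poll loop above can be modeled in a few lines. An illustrative userspace model, not kernel code — it only mirrors the control flow of the diagram:

```python
def napi_poll(queue: list, budget: int) -> tuple[int, bool]:
    """Model of a NAPI poll: drain up to `budget` packets from `queue`.
    Returns (work_done, irq_reenabled). If the budget is exhausted,
    interrupts stay masked and the poll would be rescheduled."""
    work_done = 0
    while work_done < budget and queue:
        queue.pop(0)  # "process" one packet
        work_done += 1
    # Queue drained before the budget ran out -> re-enable the IRQ
    irq_reenabled = work_done < budget
    return work_done, irq_reenabled

# Light load: 3 packets, budget 64 -> drain and return to interrupt mode
print(napi_poll(list(range(3)), 64))    # (3, True)
# Heavy load: 100 packets, budget 64 -> budget exhausted, stay in polling mode
print(napi_poll(list(range(100)), 64))  # (64, False)
```

The budget is the back-pressure mechanism: under sustained load the driver keeps polling, and interrupts only come back once a poll finishes with budget to spare.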
NAPI API
// Initialize NAPI
netif_napi_add (netdev, & dev -> napi , my_poll, NAPI_POLL_WEIGHT);
napi_enable ( & dev -> napi );
// In IRQ handler
static irqreturn_t my_irq_handler ( int irq , void * dev_id )
{
struct my_device * dev = dev_id;
// Disable interrupts and schedule NAPI
if ( napi_schedule_prep ( & dev -> napi )) {
disable_irq (dev);
__napi_schedule ( & dev -> napi );
}
return IRQ_HANDLED;
}
// Poll function
static int my_poll ( struct napi_struct * napi , int budget )
{
struct my_device * dev = container_of (napi, struct my_device, napi);
int work_done = 0 ;
while (work_done < budget) {
struct sk_buff * skb = get_next_packet (dev);
if ( ! skb)
break ;
// Process packet
napi_gro_receive (napi, skb);
work_done ++ ;
}
if (work_done < budget) {
napi_complete_done (napi, work_done);
enable_irq (dev); // Re-enable interrupts
}
return work_done;
}
Threaded IRQs
For handlers that need more flexibility:
// Request threaded IRQ
int request_threaded_irq(unsigned int irq,
                         irq_handler_t handler,    // Hardirq handler
                         irq_handler_t thread_fn,  // Threaded handler
                         unsigned long flags,
                         const char *name,
                         void *dev_id);

// Example
static irqreturn_t my_hardirq_handler(int irq, void *dev_id)
{
    // Quick check: is this our interrupt?
    if (!check_interrupt_pending(dev_id))
        return IRQ_NONE;

    // Acknowledge hardware
    ack_interrupt(dev_id);

    // Wake threaded handler
    return IRQ_WAKE_THREAD;
}

static irqreturn_t my_threaded_handler(int irq, void *dev_id)
{
    // Can sleep here!
    mutex_lock(&dev_lock);
    process_data(dev_id);
    mutex_unlock(&dev_lock);

    return IRQ_HANDLED;
}

// Registration
request_threaded_irq(irq, my_hardirq_handler, my_threaded_handler,
                     IRQF_ONESHOT, "my_device", dev);
Interview Questions
Q: Explain the difference between hardirq and softirq context
Answer: Hardirq context (top half):
Runs with interrupts disabled on the local CPU
Must be extremely fast (microseconds)
Cannot sleep or allocate memory with GFP_KERNEL
Preempts everything, including kernel code
Softirq context (bottom half):
Runs with interrupts enabled
Can be preempted by hardirqs
Still cannot sleep (atomic context)
Used for deferred processing (networking, block I/O)
Key insight: Split processing into minimal hardirq work (acknowledge, disable, schedule) and heavier softirq work (process data).
Q: When would you use a workqueue vs a tasklet?
Answer: Use a workqueue when:
You need to sleep (mutex, blocking I/O)
You need to allocate memory with GFP_KERNEL
The work is not time-critical
You need to call functions that might block
Use a tasklet when:
Work must be done quickly after the interrupt
You don't need to sleep
Work is small and fast
You want serialization (the same tasklet won't run concurrently)
Example: Network driver TX completion → tasklet (fast, no sleeping). Firmware loading → workqueue (needs file I/O, can sleep).
Q: How does NAPI improve network performance?
Answer: Under load, NAPI replaces per-packet interrupts with polling: the first packet's IRQ masks further RX interrupts and schedules a softirq poll loop that drains packets in batches up to a budget. This cuts interrupt overhead, improves CPU cache utilization, and provides back-pressure through the budget; interrupts are re-enabled only once the queue drains, so the driver adapts automatically between interrupt and polling mode.
Q: How would you debug an interrupt storm?
Answer: Symptoms:
High CPU usage in si (softirq) or hi (hardirq)
System unresponsive
/proc/interrupts shows rapidly increasing counts
Debugging steps:
# 1. Identify which IRQ
watch -n 1 cat /proc/interrupts
# 2. Check which device
cat /proc/irq/<irq_num>/smp_affinity_list
ls -la /proc/irq/<irq_num>/
# 3. Monitor interrupt rate
perf stat -e irq:irq_handler_entry -a sleep 1
# 4. Trace specific IRQ
sudo bpftrace -e '
tracepoint:irq:irq_handler_entry /args->irq == 121/ {
@count = count();
}'
Common causes:
Faulty hardware generating spurious interrupts
Driver bug not acknowledging interrupt properly
Shared IRQ with misbehaving device
Misconfigured interrupt coalescing
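The first two debugging steps can be automated by diffing /proc/interrupts samples and flagging IRQs above a rate threshold. A sketch over synthetic samples (the threshold and sample text are illustrative):

```python
def parse_interrupts(text: str) -> dict[str, int]:
    """Total count per IRQ from /proc/interrupts-style text
    (sums per-CPU columns; stops at the controller/device description)."""
    totals = {}
    for line in text.strip().splitlines()[1:]:  # skip CPU header row
        fields = line.split()
        irq = fields[0].rstrip(":")
        counts = []
        for f in fields[1:]:
            if f.isdigit():
                counts.append(int(f))
            else:
                break  # reached the non-numeric description columns
        totals[irq] = sum(counts)
    return totals

def storm_candidates(before: dict, after: dict, interval_s: float,
                     threshold_per_s: float = 10000) -> dict:
    """IRQs whose firing rate between two samples exceeds the threshold."""
    return {irq: (after[irq] - n) / interval_s
            for irq, n in before.items()
            if (after[irq] - n) / interval_s > threshold_per_s}

t0 = """\
      CPU0 CPU1
121:  1000 2000 IR-PCI-MSI nvme0q0
122:  10   20   IR-PCI-MSI nvme0q1
"""
t1 = """\
      CPU0 CPU1
121:  90000 50000 IR-PCI-MSI nvme0q0
122:  15    25    IR-PCI-MSI nvme0q1
"""
print(storm_candidates(parse_interrupts(t0), parse_interrupts(t1), 1.0))
```

On a live system the two samples would be timed reads of /proc/interrupts; any flagged IRQ is the one to trace further with perf or bpftrace.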
Practice Exercises
Interrupt Statistics
Write a script that monitors /proc/interrupts and alerts when any IRQ rate exceeds a threshold
Affinity Configuration
Set up IRQ affinity for a multi-queue NIC to optimize for either throughput or latency
eBPF Tracing
Write a bpftrace script to trace interrupt handler latency and identify slow handlers
Workqueue Analysis
Use workqueue:* tracepoints to analyze work item execution patterns
Summary
Mechanism     Context    Can Sleep?  Use Case
─────────     ───────    ──────────  ────────
Hardirq       Interrupt  No          Acknowledge HW, schedule deferred work
Softirq       Atomic     No          High-frequency networking, block I/O
Tasklet       Atomic     No          Driver deferred work, serialized
Workqueue     Process    Yes         General deferred work, can block
Threaded IRQ  Process    Yes         Device handling that needs sleeping
Key Takeaways
Split interrupt handling: Minimal work in hardirq, bulk processing in softirq/workqueue
Choose the right mechanism: Workqueue if you need to sleep, tasklet/softirq otherwise
IRQ affinity matters: Align with NUMA topology and application threads
NAPI is essential: For any high-performance networking
Monitor interrupt rates: High rates indicate potential issues
Next Steps