Interrupts & Exception Handling

Interrupts are fundamental to how Linux handles hardware events, system calls, and exceptional conditions. Understanding the interrupt subsystem is crucial for debugging performance issues and writing high-performance systems code.

Interview Frequency: High (especially for performance-critical roles)
Key Topics: IRQ handling, softirqs, tasklets, workqueues, interrupt coalescing
Time to Master: 12-14 hours

Why Interrupts Matter

Every time a network packet arrives, a disk I/O completes, or a timer fires, an interrupt is involved. Understanding interrupts explains:

Why execution gets preempted: an interrupt can arrive during almost any code path, user or kernel
Network performance: Interrupt coalescing and NAPI
CPU affinity effects: IRQ pinning and load balancing
Latency sources: Interrupt storms and processing time

Interrupt Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    LINUX INTERRUPT ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  HARDWARE LAYER                                                              │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │  ┌─────┐   ┌─────┐   ┌─────┐   ┌──────┐   ┌─────┐                      ││
│  │  │ NIC │   │ Disk│   │Timer│   │ PCIe │   │ USB │                      ││
│  │  └──┬──┘   └──┬──┘   └──┬──┘   └───┬──┘   └──┬──┘                      ││
│  │     │         │         │          │         │                          ││
│  │     └─────────┴────┬────┴──────────┴─────────┘                          ││
│  │                    ▼                                                     ││
│  │          ┌─────────────────────┐                                        ││
│  │          │   Interrupt         │    Modern: MSI/MSI-X                   ││
│  │          │   Controller        │    (Message Signaled Interrupts)       ││
│  │          │   (APIC/GIC)        │                                        ││
│  │          └──────────┬──────────┘                                        ││
│  └─────────────────────│───────────────────────────────────────────────────┘│
│                        │                                                     │
│                        ▼                                                     │
│  CPU                                                                         │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                                                                          ││
│  │  1. CPU receives interrupt signal                                       ││
│  │  2. Looks up handler in IDT (Interrupt Descriptor Table)               ││
│  │  3. Switches to kernel stack                                            ││
│  │  4. Saves current context (registers, flags)                           ││
│  │  5. Jumps to interrupt handler                                          ││
│  │                                                                          ││
│  └─────────────────────────────────────────────────────────────────────────┘│
│                        │                                                     │
│                        ▼                                                     │
│  KERNEL INTERRUPT HANDLING                                                   │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                                                                          ││
│  │  ┌───────────────────────────────────────────────────────────────┐      ││
│  │  │           HARDIRQ CONTEXT (interrupts disabled)               │      ││
│  │  │                                                               │      ││
│  │  │  • Acknowledge interrupt to hardware                          │      ││
│  │  │  • Do MINIMAL work (read status, copy data to buffer)        │      ││
│  │  │  • Schedule deferred work (softirq, tasklet, workqueue)      │      ││
│  │  │  • Return ASAP (microseconds, not milliseconds)              │      ││
│  │  │                                                               │      ││
│  │  └────────────────────────────┬──────────────────────────────────┘      ││
│  │                               │                                          ││
│  │                               ▼                                          ││
│  │  ┌───────────────────────────────────────────────────────────────┐      ││
│  │  │          SOFTIRQ CONTEXT (interrupts enabled)                  │      ││
│  │  │                                                               │      ││
│  │  │  • Process accumulated data (network packets, block I/O)     │      ││
│  │  │  • Can be preempted by hardirqs                              │      ││
│  │  │  • Cannot sleep or block                                     │      ││
│  │  │                                                               │      ││
│  │  └────────────────────────────┬──────────────────────────────────┘      ││
│  │                               │                                          ││
│  │                               ▼                                          ││
│  │  ┌───────────────────────────────────────────────────────────────┐      ││
│  │  │           PROCESS CONTEXT (workqueues)                         │      ││
│  │  │                                                               │      ││
│  │  │  • Can sleep and block                                       │      ││
│  │  │  • Can allocate memory with GFP_KERNEL                       │      ││
│  │  │  • Full kernel API available                                 │      ││
│  │  │                                                               │      ││
│  │  └───────────────────────────────────────────────────────────────┘      ││
│  │                                                                          ││
│  └─────────────────────────────────────────────────────────────────────────┘│
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Interrupt Types

Exceptions vs Interrupts

Type	Cause	Synchronous?	Examples
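Exception	CPU execution	Yes	Page fault, divide by zero, general protection fault
Hardware IRQ	External device	No	NIC, disk, timer, keyboard
Software Interrupt	Explicit instruction	Yes	`int 0x80` (legacy system call), `int3` (breakpoint)

Note: on x86-64 the `syscall` instruction enters the kernel through an MSR-configured entry point (IA32_LSTAR) rather than through the IDT, so it is not an interrupt in the strict sense even though it is also a synchronous, intentional kernel entry.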

Exception Categories

┌─────────────────────────────────────────────────────────────────────────────┐
│                         EXCEPTION TYPES (x86-64)                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  FAULTS                                                                      │
│  ────────                                                                   │
│  • Recoverable: execution can continue after handling                       │
│  • Return address = instruction that caused fault                           │
│  • Examples: page fault (#PF), segment not present (#NP)                   │
│                                                                              │
│  TRAPS                                                                       │
│  ──────                                                                     │
│  • Intentional: used for debugging and system calls                        │
│  • Return address = next instruction                                        │
│  • Examples: breakpoint (#BP), overflow (#OF), int 0x80                    │
│                                                                              │
│  ABORTS                                                                      │
│  ────────                                                                   │
│  • Severe: cannot continue execution                                        │
│  • Examples: machine check (#MC), double fault (#DF)                       │
│                                                                              │
│  COMMON EXCEPTION VECTORS                                                    │
│  ─────────────────────────                                                  │
│  0:  #DE - Divide Error                                                     │
│  3:  #BP - Breakpoint                                                       │
│  6:  #UD - Invalid Opcode                                                   │
│  8:  #DF - Double Fault                                                     │
│  13: #GP - General Protection                                               │
│  14: #PF - Page Fault                                                       │
│  18: #MC - Machine Check                                                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
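
The classification above can be written down as a small lookup table. The sketch below is user-space C for illustration only, not kernel code; the names exc_info, exc_lookup, and exc_return_target are invented for this example, and only the vectors listed in the box are included.

```c
#include <assert.h>
#include <stddef.h>

enum exc_class { EXC_FAULT, EXC_TRAP, EXC_ABORT };

struct exc_info {
    int vector;            /* IDT vector number */
    const char *mnemonic;  /* e.g. "#PF" */
    enum exc_class cls;
};

/* Only the vectors shown in the box above; the real list is longer. */
static const struct exc_info exc_table[] = {
    { 0,  "#DE", EXC_FAULT },
    { 3,  "#BP", EXC_TRAP  },
    { 6,  "#UD", EXC_FAULT },
    { 8,  "#DF", EXC_ABORT },
    { 13, "#GP", EXC_FAULT },
    { 14, "#PF", EXC_FAULT },
    { 18, "#MC", EXC_ABORT },
};

/* Returns the table entry for a vector, or NULL if it is not in the table. */
const struct exc_info *exc_lookup(int vector)
{
    assert(vector >= 0 && vector <= 255);
    for (size_t i = 0; i < sizeof(exc_table) / sizeof(exc_table[0]); i++)
        if (exc_table[i].vector == vector)
            return &exc_table[i];
    return NULL;
}

/* Where the saved instruction pointer points when the handler runs. */
const char *exc_return_target(enum exc_class cls)
{
    switch (cls) {
    case EXC_FAULT: return "faulting instruction (re-executed after the fix-up)";
    case EXC_TRAP:  return "next instruction";
    case EXC_ABORT: return "not reliably restartable";
    }
    return "unknown";
}
```

The fault/trap distinction is what lets a page fault be transparent: because the saved address is the faulting instruction itself, the CPU simply retries it once the page has been mapped.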

Interrupt Descriptor Table (IDT)

The IDT maps interrupt vectors to handlers:

// arch/x86/kernel/idt.c (simplified)
struct idt_data {
    unsigned int    vector;      // Interrupt number (0-255)
    unsigned int    segment;     // Code segment selector
    struct idt_bits bits;        // Type, DPL, present
    const void      *addr;       // Handler address
};

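// IDT entries (illustrative subset: the real file splits entries across
// several tables and uses SYSG()/ISTG() variants for some vectors)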
static const __initconst struct idt_data early_idts[] = {
    INTG(X86_TRAP_DE,     asm_exc_divide_error),       // #DE
    INTG(X86_TRAP_NMI,    asm_exc_nmi),                // NMI
    INTG(X86_TRAP_BP,     asm_exc_int3),               // #BP
    INTG(X86_TRAP_OF,     asm_exc_overflow),           // #OF
    INTG(X86_TRAP_UD,     asm_exc_invalid_op),         // #UD
    INTG(X86_TRAP_DF,     asm_exc_double_fault),       // #DF
    INTG(X86_TRAP_GP,     asm_exc_general_protection), // #GP
    INTG(X86_TRAP_PF,     asm_exc_page_fault),         // #PF
    // ... more entries
};

// Hardware interrupts (IRQs) start at vector 32
#define FIRST_EXTERNAL_VECTOR 0x20

Viewing Interrupt Statistics and Affinity

# View interrupt statistics
cat /proc/interrupts

# Example output:
#            CPU0       CPU1       CPU2       CPU3
#   0:         25          0          0          0  IR-IO-APIC   2-edge      timer
#   8:          0          0          0          1  IR-IO-APIC   8-edge      rtc0
#  16:          0          0          0          0  IR-IO-APIC  16-fasteoi   ehci_hcd
# 120:          0          0          0          0  DMAR-MSI    0-edge       dmar0
# 121:     234567      12345       9876       8765  IR-PCI-MSI 512000-edge  nvme0q0
# 122:          0     987654          0          0  IR-PCI-MSI 512001-edge  nvme0q1
# NMI:          0          0          0          0   Non-maskable interrupts
# LOC:    1234567    1234567    1234567    1234567   Local timer interrupts

# View affinity for a specific IRQ
cat /proc/irq/121/smp_affinity      # Hex mask of allowed CPUs
cat /proc/irq/121/smp_affinity_list # Human-readable CPU list
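
When the per-CPU columns need to be processed programmatically (for example to spot an IRQ that lands almost entirely on one CPU), a row can be parsed with a few lines of C. This is a minimal sketch; the function name irq_row_total is made up here, and it assumes the caller already knows the CPU count from the header row.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one row of /proc/interrupts, e.g.
 *   "121:     234567      12345       9876       8765  IR-PCI-MSI 512000-edge  nvme0q0"
 * Fills counts[0..ncpus-1] and returns their sum, or -1 if the row does
 * not contain ncpus numeric columns after the "label:" prefix.
 */
long long irq_row_total(const char *line, int ncpus, unsigned long long *counts)
{
    const char *p;
    unsigned long long total = 0;

    assert(line && counts && ncpus > 0);

    p = strchr(line, ':');
    if (!p)
        return -1;
    p++;  /* skip the ':' that ends the IRQ number or name */

    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            return -1;  /* fewer columns than CPUs */
        counts[cpu] = strtoull(p, &end, 10);
        total += counts[cpu];
        p = end;
    }
    return (long long)total;
}
```

Sampling the same row twice and subtracting gives a per-CPU interrupt rate, which is usually more informative than the absolute counters.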

Hardware IRQ Handling

IRQ Handler Registration

// include/linux/interrupt.h
int request_irq(unsigned int irq,
                irq_handler_t handler,
                unsigned long flags,
                const char *name,
                void *dev_id);

// flags:
#define IRQF_SHARED         0x00000080  // Multiple devices share IRQ
#define IRQF_TRIGGER_HIGH   0x00000004  // Level-triggered, active high
#define IRQF_TRIGGER_RISING 0x00000001  // Edge-triggered, rising
#define IRQF_ONESHOT        0x00002000  // Keep IRQ line masked until the threaded handler completes

// Handler return values (include/linux/irqreturn.h)
enum irqreturn {
    IRQ_NONE        = (0 << 0),  // Interrupt wasn't from this device
    IRQ_HANDLED     = (1 << 0),  // Interrupt was handled
    IRQ_WAKE_THREAD = (1 << 1),  // Wake threaded handler
};
typedef enum irqreturn irqreturn_t;

// Handler prototype
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

Example: Network Driver IRQ Handler

// Simplified network driver interrupt handler
static irqreturn_t my_net_irq_handler(int irq, void *dev_id)
{
    struct my_net_device *dev = dev_id;
    u32 status;
    
    // Read interrupt status register
    status = my_read_reg(dev, INTR_STATUS);
    
    if (!(status & MY_INTR_MASK))
        return IRQ_NONE;  // Not our interrupt (shared IRQ line)
    
    // Acknowledge interrupt to hardware
    my_write_reg(dev, INTR_ACK, status);
    
    if (status & INTR_RX_DONE) {
        // Disable RX interrupts, schedule NAPI poll
        my_write_reg(dev, INTR_DISABLE, INTR_RX_DONE);
        napi_schedule(&dev->napi);  // Schedule softirq
    }
    
    if (status & INTR_TX_DONE) {
        // TX completion: defer buffer cleanup to a tasklet
        tasklet_schedule(&dev->tx_tasklet);
    }
    
    return IRQ_HANDLED;
}
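
The IRQ_NONE path matters because of how a shared line is dispatched: every registered handler is called, and the results are combined. The following user-space model sketches that behaviour. It is not kernel code; fake_dev, fake_handler, and dispatch_shared_irq are hypothetical names, and the `pending` field stands in for a device's interrupt status register.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1, IRQ_WAKE_THREAD = 2 } irqreturn_t;
typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* One registered handler on a shared line (models a request_irq() call). */
struct irq_action {
    irq_handler_t handler;
    void *dev_id;
};

/* Hypothetical device: 'pending' plays the role of the status register. */
struct fake_dev {
    int pending;
    int handled;
};

irqreturn_t fake_handler(int irq, void *dev_id)
{
    struct fake_dev *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;   /* another device on the line raised it */
    dev->pending = 0;      /* acknowledge */
    dev->handled++;
    return IRQ_HANDLED;
}

/* Call every handler on the line and OR the results together. */
unsigned dispatch_shared_irq(int irq, const struct irq_action *actions, size_t n)
{
    unsigned ret = IRQ_NONE;

    assert(actions || n == 0);
    for (size_t i = 0; i < n; i++)
        ret |= actions[i].handler(irq, actions[i].dev_id);
    return ret;
}
```

If every handler reports IRQ_NONE, nobody claimed the interrupt; the kernel counts such events and can eventually disable a line that keeps firing unclaimed.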

Softirqs: High-Priority Deferred Work

Softirq Types

// include/linux/interrupt.h
enum {
    HI_SOFTIRQ = 0,        // High-priority tasklets
    TIMER_SOFTIRQ,         // Timer processing
    NET_TX_SOFTIRQ,        // Network transmit
    NET_RX_SOFTIRQ,        // Network receive
    BLOCK_SOFTIRQ,         // Block device completion
    IRQ_POLL_SOFTIRQ,      // IRQ polling
    TASKLET_SOFTIRQ,       // Regular tasklets
    SCHED_SOFTIRQ,         // Scheduler load balancing
    HRTIMER_SOFTIRQ,       // High-resolution timers
    RCU_SOFTIRQ,           // RCU callbacks
    NR_SOFTIRQS
};

Softirq Properties

┌─────────────────────────────────────────────────────────────────────────────┐
│                         SOFTIRQ CHARACTERISTICS                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Properties:                                                                 │
│  • Fixed number (10 currently, compile-time)                                │
│  • Run with interrupts enabled                                              │
│  • Cannot sleep or block                                                    │
│  • Same softirq can run simultaneously on different CPUs                    │
│  • Must use appropriate locking                                             │
│                                                                              │
│  Execution Points:                                                           │
│  • After hardirq handler returns                                            │
│  • When explicitly enabled (local_bh_enable())                              │
│  • By ksoftirqd kernel threads (when too many pending)                      │
│                                                                              │
│  Priority Order:                                                             │
│  HI_SOFTIRQ > TIMER > NET_TX > NET_RX > BLOCK > ... > RCU                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
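
The priority order falls out of how pending softirqs are tracked: one bit per softirq in a per-CPU mask, scanned from the lowest bit upward. The sketch below models that scan in user-space C. The function run_pending_softirqs is invented for this example and simply records the order instead of invoking handlers.

```c
#include <assert.h>
#include <stdint.h>

enum {
    HI_SOFTIRQ = 0, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ, IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ, RCU_SOFTIRQ, NR_SOFTIRQS
};

/*
 * Walk a pending bitmask from bit 0 upward and record which softirqs
 * would run, in order. Returns the number of entries written to order[].
 */
int run_pending_softirqs(uint32_t pending, int order[NR_SOFTIRQS])
{
    int n = 0;

    assert((pending >> NR_SOFTIRQS) == 0);  /* only known softirqs */

    for (int nr = 0; nr < NR_SOFTIRQS; nr++) {
        if (pending & (1u << nr))
            order[n++] = nr;  /* the kernel would call the handler here */
    }
    return n;
}
```

Raising a softirq is just setting its bit, so raising the same softirq twice before it runs results in a single invocation.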

Viewing Softirq Activity

# View softirq statistics per CPU
cat /proc/softirqs
#                    CPU0       CPU1       CPU2       CPU3
#          HI:          0          0          0          0
#       TIMER:   12345678   12345678   12345678   12345678
#      NET_TX:      12345      12345      12345      12345
#      NET_RX:   98765432    1234567     123456      12345
#       BLOCK:     123456     234567     345678     456789
#    IRQ_POLL:          0          0          0          0
#     TASKLET:       1234       2345       3456       4567
#       SCHED:    1234567    1234567    1234567    1234567
#     HRTIMER:     123456     123456     123456     123456
#         RCU:    1234567    1234567    1234567    1234567

# Watch ksoftirqd CPU usage
top -p $(pgrep -d',' ksoftirqd)

Tasklets: Dynamic Deferred Work

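Tasklets are built on top of softirqs but are more flexible: they can be created at runtime, and a given tasklet never runs on two CPUs at once. New code is generally steered toward threaded IRQs or workqueues, but tasklets remain common in existing drivers:

// Tasklet declaration (include/linux/interrupt.h, abridged)
struct tasklet_struct {
    struct tasklet_struct *next;
    unsigned long state;      // TASKLET_STATE_SCHED, TASKLET_STATE_RUN
    atomic_t count;           // Disable count (0 = enabled)
    void (*func)(unsigned long);  // Legacy callback; kernels >= 5.9 also carry
    unsigned long data;           // a callback taking struct tasklet_struct *
};

// Static initialization (pick one). Shown with the legacy three-argument
// form; since 5.9 these macros take (name, callback) and the callback
// receives the tasklet pointer instead of a data word.
DECLARE_TASKLET(my_tasklet, my_tasklet_handler, data);
DECLARE_TASKLET_DISABLED(my_tasklet, my_tasklet_handler, data);

// Dynamic initialization (tasklet_setup(&my_tasklet, callback) in the newer API)
tasklet_init(&my_tasklet, my_tasklet_handler, data);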

// Schedule for execution
tasklet_schedule(&my_tasklet);     // Normal priority (TASKLET_SOFTIRQ)
tasklet_hi_schedule(&my_tasklet);  // High priority (HI_SOFTIRQ)

// Control
tasklet_disable(&my_tasklet);  // Prevent execution
tasklet_enable(&my_tasklet);   // Re-enable
tasklet_kill(&my_tasklet);     // Remove (waits if running)

Tasklet vs Softirq

Feature	Softirq	Tasklet
Concurrency	Same softirq can run on multiple CPUs	Same tasklet runs on only one CPU at a time
Definition	Static (compile-time)	Dynamic (runtime)
Locking	Must handle SMP yourself	Serialized per-tasklet
Use case	High-frequency, performance-critical	General deferred work

Workqueues: Process Context Deferred Work

When the deferred work needs to sleep (take a mutex, block on I/O, or allocate memory with GFP_KERNEL):

// Embed the work item in the device structure so the handler can
// recover the device with container_of()
struct my_device {
    struct work_struct work;
    struct mutex lock;
    // ...
};

// Work handler (runs in process context on a kworker thread)
static void my_work_handler(struct work_struct *work)
{
    struct my_device *dev = container_of(work, struct my_device, work);
    void *buf;

    // Allocate memory (GFP_KERNEL may sleep)
    buf = kmalloc(4096, GFP_KERNEL);
    if (!buf)
        return;

    // Call functions that might sleep
    mutex_lock(&dev->lock);
    process_data(dev);        // Do heavy processing
    mutex_unlock(&dev->lock);

    kfree(buf);
}

// Initialize once, e.g. in the driver's probe function
INIT_WORK(&dev->work, my_work_handler);

// Schedule work
schedule_work(&dev->work);                  // Global workqueue
queue_work(my_workqueue, &dev->work);       // Custom workqueue
schedule_delayed_work(&my_dwork, HZ * 5);   // Delayed by 5 seconds (my_dwork is a struct delayed_work)
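
The handler only receives a pointer to the work item, so the surrounding device is recovered by pointer arithmetic. The user-space sketch below shows that container_of pattern in isolation. The structures are cut-down stand-ins for the kernel ones, and run_work is a hypothetical helper that plays the role of the worker thread.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel macro: step back from a member to its container. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_struct;
typedef void (*work_func_t)(struct work_struct *work);

struct work_struct {
    work_func_t func;
};

struct my_device {
    int id;
    int processed;
    struct work_struct work;  /* embedded, not a pointer */
};

void my_work_handler(struct work_struct *work)
{
    struct my_device *dev = container_of(work, struct my_device, work);

    dev->processed++;
}

/* Stand-in for a worker thread picking up a queued item. */
void run_work(struct work_struct *work)
{
    assert(work && work->func);
    work->func(work);
}
```

This is why the work item has to be embedded in the device structure: with a free-standing work_struct there is no container to step back into.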

Workqueue Types

// Create workqueues
// Unbound workqueue: work may run on any CPU. Drop WQ_UNBOUND for a
// bound (per-CPU) workqueue, where work runs on the submitting CPU.
struct workqueue_struct *wq = alloc_workqueue("my_wq",
    WQ_UNBOUND | WQ_MEM_RECLAIM, max_active);

// Flags:
#define WQ_UNBOUND       (1 << 1)   // Work can run on any CPU
#define WQ_FREEZABLE     (1 << 2)   // Frozen during system suspend
#define WQ_MEM_RECLAIM   (1 << 3)   // May be needed during memory reclaim (gets a rescuer thread)
#define WQ_HIGHPRI       (1 << 4)   // High priority workers
#define WQ_CPU_INTENSIVE (1 << 5)   // CPU-intensive work

// System workqueues
system_wq              // Default, bound
system_highpri_wq      // High priority
system_long_wq         // For long-running work
system_unbound_wq      // Unbound: workers not tied to a specific CPU
system_freezable_wq    // Freezable for suspend

Concurrency Managed Workqueue (cmwq)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONCURRENCY MANAGED WORKQUEUE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Work Items                                                                  │
│  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐                                   │
│  │ W1  │ │ W2  │ │ W3  │ │ W4  │ │ W5  │                                   │
│  └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘ └──┬──┘                                   │
│     │       │       │       │       │                                        │
│     └───────┴───────┼───────┴───────┘                                        │
│                     │                                                         │
│                     ▼                                                         │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                    Per-CPU Worker Pools                                  ││
│  │                                                                          ││
│  │  CPU 0 Pool              CPU 1 Pool              CPU 2 Pool             ││
│  │  ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐    ││
│  │  │ Worker threads: │     │ Worker threads: │     │ Worker threads: │    ││
│  │  │ [kworker/0:0]  │     │ [kworker/1:0]  │     │ [kworker/2:0]  │    ││
│  │  │ [kworker/0:1]  │     │ [kworker/1:1]  │     │ [kworker/2:1]  │    ││
│  │  │ (dynamic)      │     │ (dynamic)      │     │ (dynamic)      │    ││
│  │  └─────────────────┘     └─────────────────┘     └─────────────────┘    ││
│  │                                                                          ││
│  └─────────────────────────────────────────────────────────────────────────┘│
│                                                                              │
│  Key Properties:                                                             │
│  • Workers created/destroyed dynamically based on load                       │
│  • Work items queued to per-CPU pools for locality                          │
│  • Automatic concurrency management (avoids thundering herd)                │
│  • WQ_UNBOUND work uses separate unbound pools                              │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Choosing the Right Deferred Work

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DEFERRED WORK DECISION TREE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Need to defer work from interrupt context?                                  │
│  │                                                                           │
│  └─► Does the work need to sleep?                                           │
│      │                                                                       │
│      ├─ YES ──► Use WORKQUEUE                                               │
│      │          • Can sleep (mutex_lock, kmalloc with GFP_KERNEL)           │
│      │          • Full kernel API available                                 │
│      │          • Higher latency (process context switch)                   │
│      │                                                                       │
│      └─ NO ───► Is it performance-critical networking/block I/O?            │
│                 │                                                            │
│                 ├─ YES ──► Use SOFTIRQ (if modifying kernel)               │
│                 │          or NAPI (for network drivers)                    │
│                 │          • Lowest latency                                 │
│                 │          • Runs at highest priority                       │
│                 │          • Need to handle SMP locking                     │
│                 │                                                            │
│                 └─ NO ───► Use TASKLET                                      │
│                            • Simpler than softirq                           │
│                            • Serialized execution                           │
│                            • Good for most driver deferred work             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
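
The tree reduces to two questions, which the following sketch encodes directly. It is only a restatement of the diagram in C; the enum and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

enum deferral { USE_WORKQUEUE, USE_SOFTIRQ_OR_NAPI, USE_TASKLET };

/* Mirror of the decision tree: sleeping first, then hot-path check. */
enum deferral choose_deferral(bool needs_sleep, bool hot_net_or_block_path)
{
    if (needs_sleep)
        return USE_WORKQUEUE;        /* only process context may sleep */
    if (hot_net_or_block_path)
        return USE_SOFTIRQ_OR_NAPI;  /* lowest latency, own SMP locking */
    return USE_TASKLET;              /* serialized, simpler */
}

const char *deferral_name(enum deferral d)
{
    static const char *const names[] = { "workqueue", "softirq/NAPI", "tasklet" };

    assert((unsigned)d <= USE_TASKLET);
    return names[d];
}
```

Note that the sleep question dominates: work that must sleep goes to a workqueue even on a performance-critical path.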

Interrupt Affinity and Performance

Setting IRQ Affinity

# View current affinity (hex bitmask)
cat /proc/irq/121/smp_affinity
# 0000000f means CPUs 0-3

# Set affinity to CPUs 0 and 1
echo 3 > /proc/irq/121/smp_affinity

# Using affinity list (human-readable)
echo 0-3 > /proc/irq/121/smp_affinity_list
echo 0,2,4,6 > /proc/irq/121/smp_affinity_list

# Check if irqbalance is running (it may override your settings)
systemctl status irqbalance

# Disable irqbalance for manual control
systemctl stop irqbalance
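
The hex mask and the list form describe the same set of CPUs. The sketch below converts one into the other in user-space C, which can help when reading masks by hand. Both function names are invented for this example, and the code assumes at most 64 CPUs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "0000000f" or "00000000,0000000f" into a bitmask (bit N = CPU N).
 * Sketch only: handles at most 64 CPUs. Returns 0 on success, -1 on bad input.
 */
int parse_affinity_mask(const char *hex, unsigned long long *mask)
{
    assert(hex && mask);
    if (strlen(hex) == 0)
        return -1;

    *mask = 0;
    for (; *hex; hex++) {
        int d;

        if (*hex == ',' || *hex == '\n')
            continue;  /* 32-bit group separator or trailing newline */
        if (*hex >= '0' && *hex <= '9')
            d = *hex - '0';
        else if (*hex >= 'a' && *hex <= 'f')
            d = *hex - 'a' + 10;
        else if (*hex >= 'A' && *hex <= 'F')
            d = *hex - 'A' + 10;
        else
            return -1;
        *mask = (*mask << 4) | (unsigned)d;
    }
    return 0;
}

/*
 * Render a mask in the smp_affinity_list style, e.g. 0x0f -> "0-3".
 * Returns the string length, or -1 if 'out' is too small.
 */
int format_cpu_list(unsigned long long mask, char *out, size_t outsz)
{
    size_t len = 0;

    assert(out && outsz > 0);
    out[0] = '\0';

    for (int cpu = 0; cpu < 64; cpu++) {
        int first, n;

        if (!((mask >> cpu) & 1))
            continue;
        first = cpu;
        while (cpu + 1 < 64 && ((mask >> (cpu + 1)) & 1))
            cpu++;  /* extend the run of consecutive CPUs */

        n = (first == cpu)
            ? snprintf(out + len, outsz - len, "%s%d", len ? "," : "", first)
            : snprintf(out + len, outsz - len, "%s%d-%d", len ? "," : "", first, cpu);
        if (n < 0 || (size_t)n >= outsz - len)
            return -1;
        len += (size_t)n;
    }
    return (int)len;
}
```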

Performance Tuning Patterns

┌─────────────────────────────────────────────────────────────────────────────┐
│                    IRQ AFFINITY STRATEGIES                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. NETWORK-INTENSIVE WORKLOADS                                              │
│     ────────────────────────────                                            │
│     • Pin NIC IRQs to dedicated CPUs                                        │
│     • Use RPS/RFS for receive steering                                      │
│     • Enable XPS for transmit steering                                      │
│     • Consider isolcpus for application CPUs                                │
│                                                                              │
│     Example (4-queue NIC, 8 CPUs):                                          │
│     IRQ 121 (rx-0) → CPU 0                                                  │
│     IRQ 122 (rx-1) → CPU 1                                                  │
│     IRQ 123 (rx-2) → CPU 2                                                  │
│     IRQ 124 (rx-3) → CPU 3                                                  │
│     CPUs 4-7 → Application threads                                          │
│                                                                              │
│  2. LATENCY-SENSITIVE WORKLOADS                                              │
│     ──────────────────────────                                              │
│     • Isolate CPUs for application (isolcpus=)                              │
│     • Pin all IRQs to housekeeping CPUs                                     │
│     • Use nohz_full for tickless operation                                  │
│     • Consider disabling irqbalance                                         │
│                                                                              │
│  3. NUMA-AWARE AFFINITY                                                      │
│     ───────────────────                                                     │
│     • Pin IRQs to same NUMA node as device                                  │
│     • Avoid cross-NUMA memory access                                        │
│     • Check: cat /sys/class/net/eth0/device/numa_node                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Measuring Interrupt Latency

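# Using cyclictest (rt-tests) to measure timer wake-up latency,
# which includes both interrupt and scheduling delay
sudo cyclictest -p 80 -t1 -n -i 1000 -l 10000
# -p 80: real-time priority of the measurement thread
# -t1: 1 thread
# -n: use clock_nanosleep
# -i 1000: 1000 µs (1 ms) interval
# -l 10000: 10000 loops

# Using perf for scheduler wake-up latency (record first, then report)
sudo perf sched record -- sleep 10
sudo perf sched latency

# Using bpftrace: histogram of hardirq handler run time
sudo bpftrace -e '
tracepoint:irq:irq_handler_entry {
    @start[cpu, args->irq] = nsecs;
}
tracepoint:irq:irq_handler_exit /@start[cpu, args->irq]/ {
    @handler_ns = hist(nsecs - @start[cpu, args->irq]);
    delete(@start[cpu, args->irq]);
}'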

NAPI: Network Interrupt Coalescing

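NAPI (New API) reduces interrupt overhead for high-throughput networking by switching from per-packet interrupts to polling under load. It complements hardware interrupt coalescing (configured with ethtool -C), which batches events inside the NIC itself: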

┌─────────────────────────────────────────────────────────────────────────────┐
│                         NAPI MECHANISM                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Traditional (interrupt per packet):                                         │
│  ──────────────────────────────────                                         │
│  Packet 1 → IRQ → Handle → Return                                           │
│  Packet 2 → IRQ → Handle → Return                                           │
│  Packet 3 → IRQ → Handle → Return                                           │
│  ... (high CPU overhead)                                                    │
│                                                                              │
│  NAPI (polling mode):                                                        │
│  ─────────────────────                                                      │
│  Packet 1 → IRQ → Disable IRQ → Schedule NAPI                              │
│                                  │                                           │
│                                  ▼                                           │
│                    ┌─────────────────────────────────────┐                  │
│                    │         NAPI Poll Loop              │                  │
│                    │                                     │                  │
│                    │  while (budget > 0) {               │                  │
│                    │      packet = poll_device();        │                  │
│                    │      if (!packet) break;            │                  │
│                    │      process(packet);               │                  │
│                    │      budget--;                      │                  │
│                    │  }                                  │                  │
│                    │                                     │                  │
│                    │  if (budget > 0) {                  │                  │
│                    │      napi_complete();  // Re-enable │                  │
│                    │      enable_irq();     // device IRQ│                  │
│                    │  } else {                           │                  │
│                    │      reschedule();     // More work │                  │
│                    │  }                                  │                  │
│                    │                                     │                  │
│                    └─────────────────────────────────────┘                  │
│                                                                              │
│  Benefits:                                                                   │
│  • Reduced interrupt overhead                                               │
│  • Better CPU cache utilization                                             │
│  • Back-pressure mechanism (budget)                                         │
│  • Automatic adaptation to load                                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

NAPI API

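// Initialize NAPI (default weight is NAPI_POLL_WEIGHT, 64).
// Kernels before 6.1 take the weight as a fourth argument; newer ones
// use netif_napi_add_weight() for a non-default weight.
netif_napi_add(netdev, &dev->napi, my_poll);
napi_enable(&dev->napi);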

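// In IRQ handler
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    
    // Mask the device's RX interrupt and schedule NAPI.
    // my_mask_rx_irq()/my_unmask_rx_irq() are placeholder driver helpers that
    // write the device's interrupt-enable register. Do not call disable_irq()
    // here: it waits for running handlers to finish and would deadlock.
    if (napi_schedule_prep(&dev->napi)) {
        my_mask_rx_irq(dev);
        __napi_schedule(&dev->napi);
    }
    return IRQ_HANDLED;
}

// Poll function
static int my_poll(struct napi_struct *napi, int budget)
{
    struct my_device *dev = container_of(napi, struct my_device, napi);
    int work_done = 0;
    
    while (work_done < budget) {
        struct sk_buff *skb = get_next_packet(dev);
        if (!skb)
            break;
        
        // Process packet
        napi_gro_receive(napi, skb);
        work_done++;
    }
    
    // Ring drained: leave polling mode. napi_complete_done() returns false
    // if polling must continue, so unmask only when it returns true.
    if (work_done < budget && napi_complete_done(napi, work_done))
        my_unmask_rx_irq(dev);  // Re-enable device interrupts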
    
    return work_done;
}
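
The interaction between the handler and the poll function is easier to see in a user-space simulation. The model below is a sketch with invented names (fake_nic, fake_irq, fake_poll) and no real hardware: a counter stands in for the RX ring and a boolean for the device's interrupt-enable bit.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_nic {
    int rx_queue;      /* packets waiting in the RX ring */
    bool irq_enabled;  /* device-level RX interrupt enable */
    int irq_count;     /* interrupts actually taken */
};

/*
 * Interrupt arrives: mask further RX interrupts and ask for a poll.
 * Returns true if a poll should be scheduled.
 */
bool fake_irq(struct fake_nic *nic)
{
    assert(nic);
    if (!nic->irq_enabled)
        return false;  /* already in polling mode */
    nic->irq_enabled = false;
    nic->irq_count++;
    return true;
}

/* One poll pass: consume up to 'budget' packets. */
int fake_poll(struct fake_nic *nic, int budget)
{
    int work_done = 0;

    assert(nic && budget > 0);
    while (work_done < budget && nic->rx_queue > 0) {
        nic->rx_queue--;  /* stands in for napi_gro_receive() */
        work_done++;
    }
    if (work_done < budget)
        nic->irq_enabled = true;  /* ring drained: back to interrupt mode */
    return work_done;
}
```

With 100 queued packets and a budget of 64, the first pass uses the whole budget and stays in polling mode; the second pass drains the remaining 36 and re-enables the interrupt, so all 100 packets cost a single interrupt.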

Threaded IRQs

For handlers that need to sleep: the hardirq part stays minimal and the rest runs in a dedicated kernel thread (process context):

// Request threaded IRQ
int request_threaded_irq(unsigned int irq,
                         irq_handler_t handler,        // Hardirq handler
                         irq_handler_t thread_fn,      // Threaded handler
                         unsigned long flags,
                         const char *name,
                         void *dev_id);

// Example
static irqreturn_t my_hardirq_handler(int irq, void *dev_id)
{
    // Quick check: is this our interrupt?
    if (!check_interrupt_pending(dev_id))
        return IRQ_NONE;
    
    // Acknowledge hardware
    ack_interrupt(dev_id);
    
    // Wake threaded handler
    return IRQ_WAKE_THREAD;
}

static irqreturn_t my_threaded_handler(int irq, void *dev_id)
{
    // Can sleep here!
    mutex_lock(&dev_lock);
    process_data(dev_id);
    mutex_unlock(&dev_lock);
    
    return IRQ_HANDLED;
}

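// Registration (e.g. in probe()); returns 0 on success or a negative errno.
// IRQF_ONESHOT keeps the line masked until the threaded handler returns.
int ret = request_threaded_irq(irq, my_hardirq_handler, my_threaded_handler,
                               IRQF_ONESHOT, "my_device", dev);
if (ret)
    return ret;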

Interview Questions

Q: Explain the difference between hardirq and softirq context

Answer:

Hardirq context (top half):

Runs with interrupts disabled on the local CPU
Must be extremely fast (microseconds)
Cannot sleep or allocate memory with GFP_KERNEL
Preempts whatever is running on that CPU, including kernel code, unless interrupts are disabled

Softirq context (bottom half):

Runs with interrupts enabled
Can be preempted by hardirqs
Still cannot sleep (atomic context)
Used for deferred processing (networking, block I/O)

Key insight: Split processing into minimal hardirq work (acknowledge, disable, schedule) and heavier softirq work (process data).
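
The split can be illustrated with a toy user-space model. This is only a sketch: every name below (`toy_hardirq`, `toy_run_pending`, and so on) is hypothetical and none of it is kernel API. It loosely mirrors how pending softirqs are tracked as a per-CPU bitmask: the "top half" only sets a bit, and the "bottom half" later runs the action for each set bit.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the top-half/bottom-half split (illustrative names only). */
enum { TOY_NET_RX, TOY_BLOCK, TOY_NR_SOFTIRQS };

typedef void (*toy_action_t)(void);

static uint32_t toy_pending;                      /* one bit per deferred job */
static toy_action_t toy_actions[TOY_NR_SOFTIRQS]; /* registered bottom halves */
static int toy_rx_processed;                      /* work done by the bottom half */

/* "Top half": do the minimum and mark the work as pending. */
static void toy_hardirq(int nr)
{
    assert(nr >= 0 && nr < TOY_NR_SOFTIRQS);
    toy_pending |= 1u << nr;
}

/* "Bottom half": run each pending action once, lowest bit first.
 * Returns the number of actions that ran. */
static int toy_run_pending(void)
{
    uint32_t pending = toy_pending;
    int ran = 0;

    toy_pending = 0;    /* snapshot and clear, so later raises are not lost */
    for (int nr = 0; nr < TOY_NR_SOFTIRQS; nr++) {
        if ((pending & (1u << nr)) && toy_actions[nr]) {
            toy_actions[nr]();
            ran++;
        }
    }
    return ran;
}

/* Stands in for the heavy packet processing done outside hardirq context. */
static void toy_net_rx_action(void)
{
    toy_rx_processed++;
}
```

Raising the same bit several times before the bottom half runs results in a single run of the action, which is why a real bottom half must drain all available work rather than assume one event per invocation.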

Q: When would you use a workqueue vs a tasklet?

Answer:

Use workqueue when:

You need to sleep (mutex, blocking I/O)
You need to allocate memory with GFP_KERNEL
The work is not time-critical
You need to call functions that might block

Use tasklet when:

Work must be done quickly after interrupt
You don’t need to sleep
Work is small and fast
You want serialization (same tasklet won’t run concurrently)

Example: Network driver TX completion → tasklet (fast, no sleeping). Firmware loading → workqueue (needs file I/O, can sleep).

Q: How does NAPI improve network performance?

Answer:

Problem: At high packet rates, per-packet interrupts cause:

High CPU overhead (interrupt entry and exit for every packet)
Cache thrashing
Interrupt storms (CPU spends all its time in IRQ handlers)

NAPI solution:

First packet triggers interrupt
Disable further interrupts for that queue
Switch to polling mode (softirq)
Process packets in batches (budget-based)
Re-enable interrupts when queue is empty

Benefits:

Amortized interrupt cost across many packets
Better cache locality (process batch together)
Natural back-pressure (the budget caps work per poll; under overload, excess packets are dropped in the NIC ring instead of consuming CPU)
Scales to millions of packets/second
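
The interrupt-then-poll cycle described above can be sketched as a toy user-space simulation. It assumes a single RX queue and uses hypothetical names (`struct toy_nic`, `toy_packet_arrives`, `toy_poll`); it is not driver code, only a model of the state transitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of NAPI-style receive on one queue (illustrative names only). */
struct toy_nic {
    int  ring;          /* packets waiting in the RX ring */
    bool irq_enabled;   /* is the device interrupt unmasked? */
    bool scheduled;     /* is a poll pass pending? */
    long irqs;          /* interrupts actually taken */
    long delivered;     /* packets handed to the stack */
};

/* A packet arrives; it raises an interrupt only while interrupts are unmasked. */
static void toy_packet_arrives(struct toy_nic *nic)
{
    nic->ring++;
    if (nic->irq_enabled) {
        nic->irqs++;
        nic->irq_enabled = false;   /* handler masks the device ... */
        nic->scheduled = true;      /* ... and schedules polling */
    }
}

/* One poll pass: process at most `budget` packets. */
static int toy_poll(struct toy_nic *nic, int budget)
{
    int work_done = 0;

    assert(budget > 0);
    while (work_done < budget && nic->ring > 0) {
        nic->ring--;
        nic->delivered++;
        work_done++;
    }
    if (work_done < budget) {       /* ring drained: back to interrupt mode */
        nic->scheduled = false;
        nic->irq_enabled = true;
    }
    return work_done;
}
```

In this model, a burst of 100 packets arriving before the first poll costs one interrupt rather than 100, and with a budget of 64 the burst is drained in two poll passes before interrupts are unmasked again.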

Q: How would you debug an interrupt storm?

Answer:

Symptoms:

High CPU time in the si (softirq) or hi (hardirq) fields of top
System unresponsive
/proc/interrupts shows rapidly increasing counts

Debugging steps:

# 1. Identify which IRQ
watch -n 1 cat /proc/interrupts

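# 2. Check which device (the last column of /proc/interrupts names it;
#    /proc/irq/<irq_num>/ has one subdirectory per registered handler)
grep -E '^ *<irq_num>:' /proc/interrupts
ls -la /proc/irq/<irq_num>/
cat /proc/irq/<irq_num>/smp_affinity_list   # CPUs allowed to service this IRQ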

# 3. Monitor interrupt rate
perf stat -e irq:irq_handler_entry -a sleep 1

# 4. Trace specific IRQ (121 is an example; substitute the number from step 1)
sudo bpftrace -e '
tracepoint:irq:irq_handler_entry /args->irq == 121/ {
    @count = count();
}'

Common causes:

Faulty hardware generating spurious interrupts
Driver bug not acknowledging interrupt properly
Shared IRQ with misbehaving device
Misconfigured interrupt coalescing
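
Watching /proc/interrupts by eye works for a quick check, but the rate calculation can also be automated. Below is a minimal sketch of the parsing step, assuming the usual row layout of a label, a colon, then one counter per CPU. The function names `irq_row_total` and `irq_rate` are hypothetical helpers, and the caller is assumed to supply the number of CPU columns.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one row of /proc/interrupts, for example
 *   " 121:   1500   2500   IR-PCI-MSI 524288-edge   eth0-rx-0"
 * Copies the IRQ label ("121") into `label` and sums up to `ncpus`
 * per-CPU counters into `*total`. Returns 0 on success, or -1 if the
 * row has no "<label>:" prefix (such as the CPU header row). */
static int irq_row_total(const char *line, int ncpus,
                         char *label, size_t label_len,
                         unsigned long long *total)
{
    const char *colon, *p;
    size_t n;

    assert(line && label && total);
    colon = strchr(line, ':');
    if (!colon || label_len == 0)
        return -1;
    while (isspace((unsigned char)*line))
        line++;
    n = (size_t)(colon - line);
    if (n == 0 || n >= label_len)
        return -1;
    memcpy(label, line, n);
    label[n] = '\0';

    *total = 0;
    p = colon + 1;
    for (int cpu = 0; cpu < ncpus; cpu++) {
        char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (!isdigit((unsigned char)*p))
            break;                  /* fewer counters than CPUs on this row */
        *total += strtoull(p, &end, 10);
        p = end;
    }
    return 0;
}

/* Interrupts per second between two samples taken `seconds` apart. */
static double irq_rate(unsigned long long before, unsigned long long after,
                       double seconds)
{
    if (seconds <= 0.0 || after < before)
        return 0.0;
    return (double)(after - before) / seconds;
}
```

A monitor would read the file twice, call `irq_row_total` on each row of both snapshots, and flag any label whose `irq_rate` exceeds a chosen threshold.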

Practice Exercises

Interrupt Statistics

Write a script that monitors /proc/interrupts and alerts when any IRQ rate exceeds a threshold

Affinity Configuration

Set up IRQ affinity for a multi-queue NIC to optimize for either throughput or latency

eBPF Tracing

Write a bpftrace script to trace interrupt handler latency and identify slow handlers

Workqueue Analysis

Use workqueue:* tracepoints to analyze work item execution patterns

Summary

Mechanism	Context	Can Sleep?	Use Case
Hardirq	Interrupt	No	Acknowledge HW, schedule deferred work
Softirq	Atomic	No	High-frequency networking, block I/O
Tasklet	Atomic	No	Driver deferred work, serialized
Workqueue	Process	Yes	General deferred work, can block
Threaded IRQ	Process	Yes	Device handling that needs sleeping

Key Takeaways

Split interrupt handling: Minimal work in hardirq, bulk processing in softirq/workqueue
Choose the right mechanism: Workqueue or threaded IRQ if you need to sleep, tasklet/softirq otherwise
IRQ affinity matters: Align with NUMA topology and application threads
NAPI is essential: For any high-performance networking
Monitor interrupt rates: Unexpectedly high or fast-growing counts often point to a storm, a driver bug, or misconfigured coalescing

Next Steps

Kernel Synchronization: Locking mechanisms for concurrent access
Networking Stack: See NAPI in action
eBPF: Trace interrupt handlers in production
