I/O Systems
The I/O subsystem bridges the gap between software and hardware devices. Understanding I/O is crucial for performance optimization and is frequently tested in systems interviews.

Interview Frequency: Medium-High
Key Topics: DMA, interrupt handling, I/O scheduling, io_uring
Time to Master: 8-10 hours
I/O Hardware Basics
Device Controllers
Device Registers
Every device controller has registers accessible by the CPU:

| Register | Purpose |
|---|---|
| Status | Device state (busy, error, ready) |
| Command | What operation to perform |
| Data In | Data from device to CPU |
| Data Out | Data from CPU to device |
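The register table above can be sketched as a driver-side struct. This is a minimal illustration, not any real device's layout: the register names mirror the table, and the status bits are invented for the example. On real hardware the struct would be overlaid on a memory-mapped region; here it is an ordinary struct.

```c
#include <stdint.h>

/* Hypothetical register block mirroring the table above. */
struct dev_regs {
    volatile uint32_t status;   /* device state: busy, error, ready    */
    volatile uint32_t command;  /* operation for the device to perform */
    volatile uint32_t data_in;  /* data moving device -> CPU           */
    volatile uint32_t data_out; /* data moving CPU -> device           */
};

/* Status bits (made up for illustration). */
#define DEV_STATUS_READY 0x1u
#define DEV_STATUS_BUSY  0x2u
#define DEV_STATUS_ERROR 0x4u

/* Returns nonzero when the device can accept a new command. */
static inline int dev_is_ready(const struct dev_regs *r)
{
    return (r->status & DEV_STATUS_READY) && !(r->status & DEV_STATUS_BUSY);
}
```

The `volatile` qualifier matters: it stops the compiler from caching or reordering register accesses, which real devices depend on.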
I/O Addressing

Port-Mapped I/O
- Separate I/O address space
- Requires special CPU instructions (`in`/`out`)
- Common on x86

Memory-Mapped I/O
- Device registers mapped into the normal memory address space
- Accessed with ordinary load/store instructions
- Common on ARM and for modern devices (e.g. PCIe BARs)
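The memory-mapped style can be illustrated in user space: registers are just addresses, so plain pointer dereferences drive the "device". In this sketch a static array stands in for the mapped region; on real hardware the pointer would come from `mmap()` of the device's physical range, and the names are invented for the example.

```c
#include <stdint.h>

static uint32_t fake_mmio_region[4];        /* stand-in for device registers */

static volatile uint32_t *reg(unsigned idx) /* register accessor */
{
    return &fake_mmio_region[idx];
}

/* Write a command (register 1), then read status back (register 0). */
static uint32_t mmio_roundtrip(uint32_t cmd)
{
    *reg(1) = cmd;            /* ordinary store = device register write   */
    *reg(0) = cmd ? 1u : 0u;  /* a real device would update status itself */
    return *reg(0);           /* ordinary load = device register read     */
}
```

Port-mapped I/O has no such plain-pointer form: it needs privileged `in`/`out` instructions and a separate address space.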
I/O Methods
Programmed I/O (Polling)
CPU actively polls the device status register in a loop.

Cons: Wastes CPU cycles, doesn’t scale
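The waste is easy to see with a simulated device. In this sketch the "device" becomes ready only after a fixed number of status reads, standing in for real hardware latency; every poll before that is a burned CPU cycle. All names here are invented for the illustration.

```c
#include <stdint.h>

struct sim_dev {
    int polls_until_ready;  /* status reads before READY is reported */
    uint32_t data;          /* value in the data-in register         */
};

/* Busy-wait on the (simulated) status register, counting wasted polls. */
static uint32_t pio_read(struct sim_dev *dev, int *wasted_polls)
{
    *wasted_polls = 0;
    while (dev->polls_until_ready > 0) {  /* poll the status register */
        dev->polls_until_ready--;
        (*wasted_polls)++;
    }
    return dev->data;                     /* finally read the data register */
}
```

A slow device means thousands of useless iterations per transfer, which is exactly why interrupt-driven I/O exists.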
Interrupt-Driven I/O
Device signals the CPU when it is ready, freeing the CPU in between.

Interrupt Lifecycle: Step-by-Step
What exactly happens when you press a key or a packet arrives?

- Device Signal: Network card asserts the IRQ line on the bus.
- Controller: Interrupt Controller (APIC) prioritizes it and signals the CPU.
- CPU Context Save:
  - CPU finishes the current instruction.
  - Saves EFLAGS, CS, EIP (Program Counter) to the kernel stack.
  - Switches to kernel mode (Ring 0).
- Vector Lookup: CPU reads the Interrupt Descriptor Table (IDT) using the IRQ number.
- Execution (Top Half): Jumps to the registered ISR (Interrupt Service Routine).
- Ack: ISR acknowledges the interrupt to the APIC so it can send more.
- Schedule: Schedules a “Bottom Half” (SoftIRQ/Tasklet) for heavy processing.
- Context Restore: CPU executes IRET, restoring registers and resuming the interrupted process.
Block Layer Subsystem
Between the File System (VFS) and the Device Driver lies the Block Layer. It optimizes I/O performance.

The bio Structure
The basic unit of block I/O in Linux is the struct bio. It represents an in-flight read/write request.
Request Merging (I/O Scheduling)
Disks hate random seeking. The block layer acts as a traffic controller:

- Merging: If App A reads sectors 100-107 and App B reads 108-115, merge them into one request (100-115).
- Sorting: Reorder requests to minimize disk head movement (Elevator Algorithm).
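Both tricks fit in a toy scheduler: sort pending sector numbers into elevator order, then count how many contiguous runs (i.e., merged requests) remain. The real block layer works on sector ranges via `struct bio`; single sectors keep this sketch short, and the function names are invented.

```c
#include <stdlib.h>

static int cmp_sector(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Sorts reqs in place (elevator order); returns the number of merged
 * requests actually issued to the disk. */
static int elevator_schedule(unsigned *reqs, int n)
{
    int issued = 0;
    qsort(reqs, (size_t)n, sizeof reqs[0], cmp_sector);
    for (int i = 0; i < n; i++) {
        /* A new disk request starts only where there is a gap. */
        if (i == 0 || reqs[i] != reqs[i - 1] + 1)
            issued++;
    }
    return issued;
}
```

Five scattered requests for sectors 108, 100, 101, 109, 102 collapse into just two contiguous transfers (100-102 and 108-109).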
Direct Memory Access (DMA)
Without DMA, the CPU must copy every byte (PIO). With DMA, the CPU delegates the transfer to the DMA controller.

Cycle Stealing & Arbitration
- The Problem: CPU and DMA Controller share the same system bus. They can’t use it simultaneously.
- Arbitration: The DMA controller requests the bus. The CPU “pauses” (stalls) for a few cycles to grant access.
- Cycle Stealing: DMA steals potential CPU instruction fetch cycles to transfer data.
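A back-of-the-envelope model makes the trade-off concrete: during a DMA transfer of n words the CPU loses roughly one stolen bus cycle per word but keeps the rest, whereas with PIO it spends several cycles per word copying data itself. The numbers below are illustrative assumptions, not from any datasheet.

```c
struct xfer_cost {
    long cpu_cycles_lost;   /* cycles the CPU could not use */
    long cpu_cycles_free;   /* cycles left for real work    */
};

static struct xfer_cost dma_transfer(long words, long total_cycles)
{
    struct xfer_cost c;
    c.cpu_cycles_lost = words;                   /* one stolen bus cycle per word */
    c.cpu_cycles_free = total_cycles - c.cpu_cycles_lost;
    return c;
}

static struct xfer_cost pio_transfer(long words, long total_cycles,
                                     long cycles_per_word)
{
    struct xfer_cost c;
    c.cpu_cycles_lost = words * cycles_per_word; /* CPU copies every word itself */
    c.cpu_cycles_free = total_cycles - c.cpu_cycles_lost;
    return c;
}
```

With 1000 words and an assumed 10 CPU cycles per PIO word, DMA loses 1000 cycles to stealing while PIO loses 10000 outright.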
Character vs Block Devices
| Aspect | Character Device | Block Device |
|---|---|---|
| Access | Byte stream | Fixed-size blocks |
| Buffering | No kernel buffering | Uses buffer cache |
| Random access | Not supported | Supported |
| Examples | Keyboard, serial port | Disk, SSD |
| Major/Minor | Yes | Yes |
Interview Deep Dive Questions
Q1: Explain DMA and its advantages over PIO
Answer:

PIO (Programmed I/O):
- CPU executes every transfer instruction
- CPU utilization: 100% during transfer
- Good for: Small transfers, simple devices

Advantages of DMA:
- CPU efficiency: CPU free during transfer
- Higher throughput: DMA controller optimized for transfers
- Lower latency: No per-byte CPU overhead

Drawbacks of DMA:
- Needs contiguous physical memory (or scatter-gather DMA)
- Cache coherency: CPU cache may have stale data
- Setup overhead: Small transfers may be slower than PIO
Q2: Why is io_uring faster than traditional async I/O?
Answer:

Traditional async I/O costs one syscall per operation, with arguments copied in and results copied out on every call. io_uring instead maps a submission queue (SQ) and a completion queue (CQ) into memory shared between the application and the kernel.

Speed advantages:
- Batching: 1 syscall for 1000 operations
- SQPOLL mode: Kernel thread polls SQ, zero user syscalls
- Zero-copy: Shared memory rings, no copy in/out
- Registered buffers: Pre-registered, skip validation
- Registered files: File descriptors pre-validated
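The batching advantage can be modeled without the real API (which needs liburing and a recent kernel): two shared rings, where the app fills submission entries with no syscalls and a single simulated "enter" call drains the whole batch and posts completions. Real io_uring shares the rings with the kernel via `mmap`; this sketch keeps everything in one process and all names are invented.

```c
#include <stdint.h>

#define RING_SZ 8              /* power of two, like the real rings */

struct ring {
    uint32_t head, tail;
    int entries[RING_SZ];
};

static void ring_push(struct ring *r, int v)
{
    r->entries[r->tail & (RING_SZ - 1)] = v;
    r->tail++;                 /* app advances tail; no syscall needed */
}

/* One "enter" call: consume every pending SQE, post one CQE each. */
static int fake_enter(struct ring *sq, struct ring *cq)
{
    int n = 0;
    while (sq->head != sq->tail) {
        int op = sq->entries[sq->head & (RING_SZ - 1)];
        sq->head++;
        ring_push(cq, op);     /* "completion" carries the op back */
        n++;
    }
    return n;                  /* operations completed by ONE call */
}
```

One crossing of the user/kernel boundary amortizes over the whole batch, which is the core of io_uring's win over one-syscall-per-op designs.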
Q3: How does an interrupt work from hardware to user process?
Answer:

The full path is the lifecycle described above: IRQ assertion, APIC routing, context save, IDT lookup, ISR execution, then return via IRET; the kernel then wakes the waiting process, which the scheduler eventually resumes in user mode.

Top half vs Bottom half:
- Top half: Runs with interrupts disabled, must be fast
- Bottom half: Can sleep, do heavy processing
- NAPI (networking): Disable interrupts, switch to polling under load
- Threaded IRQs: Handler runs in kernel thread context
Q4: Design a high-performance logging system
Answer:

Requirements:
- Write-heavy (100K+ logs/sec)
- Durability (survive crashes)
- Low latency (don’t block application)

Key optimizations:
- Thread-local buffers: No contention between threads
- Ring buffers: Fixed memory, no allocation
- Batching: Combine small logs into larger writes
- io_uring: Async, batched I/O submission
- O_APPEND: Kernel handles concurrent appends atomically
- Periodic fsync: Balance durability vs performance
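The buffering side of this design can be sketched in a few lines: log entries accumulate in a fixed buffer and are flushed as one large write when the buffer fills. The flush target here is a `FILE*` for portability; the real system would use io_uring or `write(2)` on an `O_APPEND` fd, and the sizes and names are assumptions for the example.

```c
#include <stdio.h>
#include <string.h>

#define LOG_BUF_SZ 4096

struct logbuf {
    char buf[LOG_BUF_SZ];
    size_t used;
    int flushes;               /* how many big writes were issued */
};

static void log_flush(struct logbuf *lb, FILE *out)
{
    if (lb->used == 0)
        return;
    fwrite(lb->buf, 1, lb->used, out);   /* one large write, not many small */
    lb->used = 0;
    lb->flushes++;
}

static void log_append(struct logbuf *lb, FILE *out, const char *line)
{
    size_t n = strlen(line);
    if (lb->used + n > LOG_BUF_SZ)       /* no room: flush the batch */
        log_flush(lb, out);
    memcpy(lb->buf + lb->used, line, n);
    lb->used += n;
}
```

Hundreds of small log calls collapse into a handful of large writes, which is the batching the answer calls for; per-thread instances of `struct logbuf` would give the thread-local, contention-free variant.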
Q5: Explain O_DIRECT and when to use it
Answer:

Normal I/O path: application → page cache → device (the kernel buffers and copies the data).

O_DIRECT I/O path: application buffer → device (bypasses the page cache entirely).

Requirements for O_DIRECT:
- Buffer must be aligned (typically 512 bytes or 4KB)
- Offset must be aligned
- Size must be aligned

Use O_DIRECT for:
- Databases (manage own buffer pool)
- Video streaming (no reread, waste cache)
- Large sequential scans (would pollute cache)
- When you need predictable latency

Avoid O_DIRECT for:
- General file access
- Small random reads (cache helps)
- When kernel caching is beneficial
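The alignment rule above is the part that trips people up: a plain `malloc()` buffer usually fails with `EINVAL` on an `O_DIRECT` fd. `posix_memalign()` returns suitably aligned memory; 4096 is assumed here as a common logical block size, so query the device rather than hard-coding it in real code.

```c
#include <stdlib.h>

/* Allocate a buffer that satisfies O_DIRECT's alignment rules.
 * size should itself be a multiple of the alignment. */
static void *alloc_direct_buf(size_t size)
{
    void *buf = NULL;
    if (posix_memalign(&buf, 4096, size) != 0)
        return NULL;
    return buf;   /* pass to read()/write() on an fd opened with O_DIRECT */
}
```

The matching open call would be `open(path, O_RDWR | O_DIRECT)` on Linux, with offsets and sizes kept to the same multiple.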
Practice Exercises
1. I/O Scheduler Comparison: Use fio to benchmark random vs sequential I/O with different schedulers.
2. io_uring Program: Write a file copy program using io_uring. Compare performance with cp.
3. DMA Simulation: Implement a simulation of DMA controller behavior with interrupt generation.
4. Simple Driver: Write a character device driver that logs all read/write operations.
Key Takeaways
DMA Frees the CPU
Device transfers data directly. CPU only sets up and handles completion.
io_uring is Revolutionary
Batched, zero-copy async I/O. Essential for high-performance systems.
Buffering Matters
Double buffering enables overlapping I/O and processing.
Scheduler Choice Matters
Use none/kyber for NVMe, mq-deadline for HDDs.
Next: Synchronization →