Linux Kernel Architecture
Understanding the architecture of the Linux kernel is the foundation for everything else in this course. This module covers how the kernel is organized, why certain design decisions were made, and how to navigate the massive codebase.
Key Topics: Monolithic design, kernel source navigation, boot process, modules
Time to Master: 10-12 hours
| Aspect | Monolithic (Linux) | Microkernel |
|---|---|---|
| Performance | Direct function calls | IPC overhead for every operation |
| Latency | Lower (no message passing) | Higher (context switches) |
| Complexity | Harder to maintain safety | Cleaner separation |
| Reliability | One bug can crash kernel | Components isolated |
| Development | Faster prototyping | More engineering overhead |
The Famous Torvalds-Tanenbaum Debate
In 1992, Linus Torvalds and Andrew Tanenbaum debated kernel design:
- Tanenbaum: Microkernels are the future; monolithic is obsolete
- Torvalds: Performance matters; Linux’s approach is pragmatic
Kernel Source Tree Organization
The Linux kernel source is massive (30+ million lines), but well-organized:
Key Files to Know
| File | Purpose |
|---|---|
| init/main.c | Kernel entry point (start_kernel()) |
| arch/x86/entry/entry_64.S | System call entry point |
| kernel/sched/core.c | Scheduler core |
| mm/page_alloc.c | Page allocator (buddy system) |
| mm/slab.c | Slab allocator (legacy SLAB; current kernels use SLUB in mm/slub.c) |
| fs/read_write.c | read/write system calls |
| net/core/dev.c | Core networking |
Boot Process Deep Dive
Understanding how Linux boots is essential for systems engineers:
start_kernel() - The Heart of Boot
Key Boot Parameters
| Parameter | Purpose | Example |
|---|---|---|
| root= | Root filesystem device | root=/dev/sda1 |
| init= | First user process | init=/bin/bash |
| quiet | Suppress boot messages | quiet |
| debug | Enable debug messages | debug |
| nokaslr | Disable KASLR | nokaslr |
| isolcpus= | Isolate CPUs from scheduler | isolcpus=2,3 |
| nosmp | Disable SMP | nosmp |
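As a rough illustration of how the parameters in the table are structured, the sketch below splits a command line string into name/value pairs. The function name `parse_cmdline` and the sample string are made up for this example; this is not how the kernel itself parses its arguments.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as 'quiet' map to True. Hypothetical helper for
    illustration only; the kernel's own parser is more involved.
    """
    params = {}
    for token in shlex.split(cmdline):
        # Split on the first '=' only, so root=UUID=abc keeps 'UUID=abc'.
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params


# Made-up command line built from the parameters in the table.
sample = "root=/dev/sda1 init=/bin/bash quiet nokaslr isolcpus=2,3"
params = parse_cmdline(sample)
print(params["root"])
print(params["isolcpus"].split(","))
print(params.get("nosmp", False))
```

On a running Linux system, the same function could be fed the contents of `/proc/cmdline`, which holds the command line the kernel was booted with.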
Kernel Address Space Layout
On x86-64, the virtual address space is split between user and kernel:
KASLR (Kernel Address Space Layout Randomization)
KASLR randomizes kernel addresses at boot for security:
Loadable Kernel Modules
Modules allow extending the kernel without recompiling:
Module Structure
Module Loading Process
Module Management Commands
Module Parameters
Kernel Threads
Kernel threads (kthreads) are processes that run entirely in kernel mode:
Important Kernel Threads
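One way to tell kernel threads apart from ordinary processes is the `PF_KTHREAD` bit in the task flags, which Linux exposes as the ninth field of `/proc/<pid>/stat`. The sketch below checks that bit. The helper name and the two sample lines are illustrative and truncated, not captured from a real machine.

```python
PF_KTHREAD = 0x00200000  # task flag set on kernel threads (include/linux/sched.h)


def is_kernel_thread(stat_line):
    """Return True if a /proc/<pid>/stat line describes a kernel thread."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' rather than on whitespace.
    fields = stat_line[stat_line.rfind(")") + 1:].split()
    # After comm: state, ppid, pgrp, session, tty_nr, tpgid, flags, ...
    flags = int(fields[6])
    return bool(flags & PF_KTHREAD)


# Illustrative, truncated stat lines (assumed values).
kthreadd_line = "2 (kthreadd) S 0 0 0 0 -1 2129984 0 0 0 0"
user_line = "1234 (my (odd) name) S 1 1234 1234 0 -1 4194304 0 0 0 0"

print(is_kernel_thread(kthreadd_line))
print(is_kernel_thread(user_line))
```

To survey a live system, the same check could be applied to every `/proc/<pid>/stat` file.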
Lab Exercises
Lab 1: Navigate Kernel Source
Lab 2: Build and Load Module
Lab 3: Analyze Boot Process
Interview Questions
Q1: Why is Linux monolithic, and what are the trade-offs?
Advantages:
- Direct function calls between subsystems (no IPC overhead)
- Single address space eliminates context switch on internal calls
- Simpler data sharing between components
Trade-offs:
- A bug in any component can crash the entire kernel
- Larger attack surface (all code runs privileged)
- More complex codebase to maintain
Mitigations:
- Loadable modules for flexibility
- Namespaces and cgroups for isolation
- Seccomp for syscall filtering
- Strong code review process
Q2: Walk through what happens when you type 'ls' and press Enter
- Shell process:
  - `fork()` → creates child process (clone syscall)
  - `execve("/bin/ls")` → replaces process image
- execve processing:
  - Kernel opens ELF binary
  - Maps code/data sections into memory
  - Sets up stack with arguments/environment
  - Loads dynamic linker (ld.so)
- ls execution:
  - Dynamic linker loads libc
  - ls calls `opendir()` → `getdents64` syscall
  - Kernel reads directory entries from filesystem
  - ls calls `write()` → output to terminal
- Termination:
  - ls calls `exit()` → `exit_group` syscall
  - Kernel cleans up resources
  - Parent shell’s `wait()` returns
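The fork, exec, and wait sequence in this answer can be sketched from user space with Python's thin wrappers around the same system calls. This is a minimal sketch of what a shell does for one command, assuming a Unix-like system. The name `run_like_a_shell` is made up, and a real shell also handles job control, signals, and redirection.

```python
import os
import sys


def run_like_a_shell(argv):
    """Run argv the way a shell would: fork, exec in the child, wait in the parent."""
    sys.stdout.flush()  # avoid duplicating buffered output in the child
    pid = os.fork()     # child gets a copy of this process
    if pid == 0:
        try:
            # Replace the child's process image; returns only on failure.
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # conventional "command not found" status
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    print("exit status:", run_like_a_shell(["ls", "/"]))
```

`os.waitstatus_to_exitcode` requires Python 3.9 or newer. It returns a negative number if the child was killed by a signal.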
Q3: How do kernel modules differ from user-space shared libraries?
Q4: Explain KASLR and its security benefits
What it does:
- Randomizes kernel base address at each boot
- Makes it harder to exploit memory corruption vulnerabilities
- Attacker can’t hardcode kernel addresses
How it works:
- Random offset chosen during early boot
- All kernel symbols shifted by this offset
- `/proc/kallsyms` shows randomized addresses
Limitations:
- Information leaks can reveal base address
- Side-channel attacks (Meltdown/Spectre) can bypass it
- Doesn’t protect against local attackers with kernel memory access
Complementary protections:
- SMEP (Supervisor Mode Execution Prevention)
- SMAP (Supervisor Mode Access Prevention)
- Stack canaries
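Because every kernel symbol is shifted by the same offset, one known symbol is enough to recover the offset and translate any other runtime address back to its link-time value. The arithmetic is sketched below. The function names and runtime addresses are invented for illustration, and the link-time base is an assumed value for a typical x86-64 build.

```python
def kaslr_offset(runtime_addr, link_addr):
    """Offset applied at boot: where a symbol is now minus where it was linked."""
    return runtime_addr - link_addr


def to_link_address(runtime_addr, offset):
    """Translate a randomized runtime address back to its link-time address."""
    return runtime_addr - offset


# Assumed values for illustration only.
LINK_TEXT = 0xFFFFFFFF81000000      # link-time address of _text (e.g. from System.map)
RUNTIME_TEXT = 0xFFFFFFFF9A000000   # address of _text as seen in /proc/kallsyms

offset = kaslr_offset(RUNTIME_TEXT, LINK_TEXT)
print(hex(offset))

# A faulting address from the same boot maps back with the same offset.
faulting = 0xFFFFFFFF9A2B4C10
print(hex(to_link_address(faulting, offset)))
```

This is also why a single leaked kernel pointer undermines KASLR: the attacker can do the same subtraction.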
Key Takeaways
Architecture Choice
Source Organization
Boot Process
Module System
Further Reading
- Kernel Newbies - Great for getting started
- LWN.net - In-depth kernel coverage
- kernel.org Documentation - Official docs
- Linux Kernel Development by Robert Love - Essential book
Interview Deep-Dive
Your team is debugging a production server that panicked. The only clue is a kernel oops pointing to a loadable module. Walk me through how you would investigate this, starting from the oops output.
- First, I would parse the oops message to extract the faulting instruction pointer (RIP), the module name, and the call trace. The RIP tells me exactly which function and offset within the module triggered the fault, and I can use `addr2line` or `objdump -d` against the module’s `.ko` file with debug symbols to map that to a source line.
- Next, I would inspect the module with `modinfo` to confirm its version, parameters, and whether it was built for the running kernel. A common root cause is loading a module compiled against a different kernel version, which causes struct layout mismatches since the kernel does not guarantee a stable internal ABI.
- I would also examine the register dump and stack trace to understand what data the function was operating on. For example, if a NULL pointer dereference is indicated, I would look at which struct field was being accessed and trace backward through the call chain to find who passed a NULL pointer.
- Finally, if this is reproducible, I would boot with `module_blacklist=<mod>` to confirm the module is the cause, then load it with dynamic debug enabled (`dyndbg='+p'`) or add `pr_debug` statements to the module source to trace the exact code path leading to the crash.
- KASLR randomizes the kernel base address at each boot, so the faulting address in the oops does not correspond to the compile-time addresses in the symbol table. To decode the address, I need either the `/proc/kallsyms` output from that exact boot session (before the panic), or I need to subtract the KASLR offset, which the oops message itself sometimes prints. If the system was configured with kdump, the crash dump captures the full memory image including the randomized layout, and tools like `crash` can resolve symbols automatically. For modules, the oops typically prints the module load address, and I can compute the offset from there.
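The first step of that investigation, pulling the function, offset, and module out of the RIP line, can be sketched with a regular expression. The sample line is fabricated (`demo_write` and `demo_mod` are hypothetical names), and the pattern only covers the common `symbol+0xoff/0xsize [module]` shape. It is not a full oops parser.

```python
import re

# Matches e.g. "RIP: 0010:func+0x1a/0x50 [module]"; the module part is optional.
RIP_RE = re.compile(
    r"RIP:\s*[0-9a-fA-F]{4}:(?P<function>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)"
    r"/0x(?P<size>[0-9a-fA-F]+)(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_rip(line):
    """Extract function, offset, size and module from an oops RIP line."""
    m = RIP_RE.search(line)
    if m is None:
        return None
    return {
        "function": m["function"],
        "offset": int(m["offset"], 16),  # bytes into the function
        "size": int(m["size"], 16),      # total size of the function
        "module": m["module"],           # None for built-in kernel code
    }


# Fabricated example line; names are hypothetical.
sample = "RIP: 0010:demo_write+0x1a/0x50 [demo_mod]"
print(parse_rip(sample))
```

The `function+offset` pair is relative to the symbol rather than to the randomized base, so it can be matched against the disassembly of the module's `.ko` file without knowing the KASLR offset.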
Explain the trade-off Linux makes by using a monolithic kernel with loadable modules versus a pure microkernel. If you were designing a new OS for a safety-critical embedded system, which would you choose and why?
- The monolithic design means all kernel subsystems — scheduler, memory manager, filesystem, device drivers — share a single address space and communicate via direct function calls. This eliminates IPC overhead that microkernels pay on every cross-subsystem call, which can be hundreds of nanoseconds per message pass. For a general-purpose OS handling millions of syscalls per second, this performance advantage is decisive.
- The trade-off is reliability and security. A bug in any driver or subsystem can corrupt kernel memory and crash the entire system. Microkernels like QNX isolate each component in its own address space, so a faulty network driver crashes only that process and can be restarted without rebooting.
- Linux mitigates the monolithic downsides through loadable modules (which separate code at build time but still share the kernel address space at run time), static analysis tools like Sparse and Coccinelle, extensive code review, and runtime checkers like KASAN, UBSAN, and lockdep. But these are mitigations, not guarantees.
- For a safety-critical embedded system — say, an avionics flight controller — I would choose a microkernel like seL4 or a certified RTOS. The formal verification guarantees and fault isolation outweigh the performance overhead, because correctness is non-negotiable. However, for a high-throughput infrastructure server, Linux’s monolithic approach with modules remains the pragmatic choice.
- Loadable modules give Linux a degree of runtime flexibility that pure monolithic kernels lack. You can load a filesystem driver only when a specific filesystem is mounted, unload a network driver when the interface goes down, and update drivers without rebooting. However, modules still run in kernel address space with full privileges — there is no isolation boundary. A buggy module can still panic the kernel. So modules provide deployment flexibility (similar to microkernel services that can be started/stopped independently) without the fault isolation that defines a true microkernel.
During the boot process, start_kernel() calls dozens of initialization functions in a specific order. Why does the order matter, and what would happen if you swapped mm_init() and sched_init()?
- The initialization order in start_kernel() reflects hard dependencies between subsystems. Memory management must be initialized before the scheduler because the scheduler needs to allocate data structures — runqueues, per-CPU variables, and the initial task_struct copies — and those allocations require a functioning page allocator and slab allocator.
- If you swapped them, sched_init() would attempt to call kmalloc or alloc_percpu before the memory allocator is ready. This would either trigger a NULL pointer dereference (if the allocator pointers are not yet set up) or corrupt memory by writing to uninitialized data structures. The kernel would panic very early in boot, likely before any console output is visible.
- This ordering principle extends throughout the boot: interrupts must be initialized before timers (timers are delivered via interrupts), VFS must be initialized before mounting the root filesystem, and RCU must be set up before any RCU-protected data structures are used. Each subsystem’s init function documents its dependencies implicitly through the call order in start_kernel().
- The primary tool is `earlyprintk`, a kernel boot parameter that configures a simple output driver (serial port, VGA) before the normal console subsystem is ready. For example, `earlyprintk=serial,ttyS0,115200` sends kernel messages to the serial port immediately. If even that fails, I would use JTAG or a hardware debugger to set breakpoints at start_kernel() and single-step through the initialization sequence. On virtual machines, QEMU with the `-serial stdio` and `-s -S` flags lets me attach GDB to the kernel from the very first instruction.
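The ordering argument in this answer amounts to a dependency graph: any valid boot sequence is a topological order of it. The toy model below uses a heavily simplified, hypothetical dependency map. Only `mm_init` and `sched_init` are names from the discussion above; the other labels are placeholders, not real kernel functions.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified map: init step -> steps that must run first.
INIT_DEPS = {
    "mm_init": set(),
    "sched_init": {"mm_init"},   # scheduler allocates its data structures
    "interrupts": set(),
    "timers": {"interrupts"},    # timers are delivered via interrupts
    "vfs": {"mm_init"},
    "mount_root": {"vfs"},       # VFS must exist before mounting root
}


def boot_order(deps):
    """Return one valid initialization order for the dependency map."""
    return list(TopologicalSorter(deps).static_order())


def violations(order, deps):
    """List (step, needed) pairs where 'needed' does not run before 'step'."""
    position = {step: i for i, step in enumerate(order)}
    return [
        (step, need)
        for step in order
        for need in sorted(deps.get(step, ()))
        if position.get(need, len(order)) > position[step]
    ]


print(boot_order(INIT_DEPS))
# Swapping the two steps from the question breaks exactly one dependency.
print(violations(["sched_init", "mm_init"], INIT_DEPS))
```

The kernel does not compute this order at run time. It is fixed by hand in the source, which is why a misplaced call shows up as an early-boot crash rather than as a diagnosable error.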
A colleague proposes writing a complex monitoring agent as a kernel module instead of a user-space daemon. What concerns would you raise, and when might a kernel module actually be the right choice?
- My primary concern is stability risk. A kernel module runs with full kernel privileges, and any bug — buffer overflow, use-after-free, deadlock — can panic the entire system, not just crash the agent. User-space processes are isolated by virtual memory: a segfault kills the process, not the machine.
- Second, development velocity suffers. Kernel modules cannot link against the standard C library, user-space memory debuggers like Valgrind and AddressSanitizer do not apply (the kernel has its own counterparts, such as KASAN), and testing requires either VMs or reboots. User-space development is dramatically faster.
- Third, there is a maintenance burden. The kernel does not guarantee a stable internal ABI, so a module compiled for kernel 5.15 may not load on 5.19 if struct layouts changed. The module must be recompiled for each kernel version the fleet runs.
- A kernel module is the right choice when you need access to information or hooks that are not exposed to user space — for example, intercepting every context switch for precise scheduling analysis, or implementing a custom block I/O scheduler. However, with eBPF now providing safe, verified access to kernel hooks from user space, most monitoring use cases should prefer eBPF over custom modules.
- eBPF programs pass through a verifier before loading that proves termination (bounded loops, no unbounded recursion), memory safety (all pointer accesses bounds-checked), and type safety (correct helper function arguments). The verifier rejects any program that could crash the kernel. Additionally, eBPF programs run in a restricted execution environment: they cannot call arbitrary kernel functions, only approved helper functions, and they have a limited stack (512 bytes). This makes eBPF a safe middle ground between full kernel module access and user-space isolation.
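Returning to the maintenance concern raised earlier in this answer: a module records the kernel release it was built for in its `vermagic` string, which `modinfo` displays. A fleet-wide sanity check could compare that release with the running kernel, as in the sketch below. The helper names and the sample string are hypothetical, and the kernel's real load-time check also compares the remaining flags and, with modversions enabled, per-symbol CRCs.

```python
def vermagic_release(vermagic):
    """Return the kernel release, which is the first token of a vermagic string."""
    return vermagic.split()[0]


def built_for(vermagic, running_release):
    """True if the module's vermagic names the given kernel release."""
    return vermagic_release(vermagic) == running_release


# Made-up vermagic string in the usual "release flags..." shape.
sample_vermagic = "5.15.0-91-generic SMP mod_unload modversions"

print(built_for(sample_vermagic, "5.15.0-91-generic"))
print(built_for(sample_vermagic, "5.19.0-50-generic"))
```

On a live machine, `platform.release()` returns the running kernel release (the same value as `uname -r`) and could be passed as `running_release`.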