Linux Kernel Architecture - Ring layers, modules, and subsystems

Linux Kernel Architecture

Understanding the architecture of the Linux kernel is the foundation for everything else in this course. This module covers how the kernel is organized, why certain design decisions were made, and how to navigate the massive codebase.
Interview Frequency: Very High
Key Topics: Monolithic design, kernel source navigation, boot process, modules
Time to Master: 10-12 hours

| Aspect | Monolithic | Microkernel |
|---|---|---|
| Performance | Direct function calls | IPC overhead for every operation |
| Latency | Lower (no message passing) | Higher (context switches) |
| Complexity | Harder to maintain safety | Cleaner separation |
| Reliability | One bug can crash kernel | Components isolated |
| Development | Faster prototyping | More engineering overhead |
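Interview Insight: Linux uses a “modular monolithic” approach: a monolithic core plus loadable modules. This keeps the performance of a monolithic kernel while adding some of the flexibility of a microkernel. Modules still run in kernel space with full privilege, however, so they do not provide a microkernel's fault isolation.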

The Famous Torvalds-Tanenbaum Debate

In 1992, Linus Torvalds and Andrew Tanenbaum debated kernel design:
  • Tanenbaum: Microkernels are the future; monolithic is obsolete
  • Torvalds: Performance matters; Linux’s approach is pragmatic
Outcome: Linux went on to dominate servers, cloud infrastructure, and (through Android) smartphones. Microkernels still power many embedded systems (QNX in cars, L4 variants in phones).

Kernel Source Tree Organization

The Linux kernel source is massive (30+ million lines), but well-organized:
linux/
├── arch/           # Architecture-specific code (x86, arm64, riscv)
│   ├── x86/
│   │   ├── boot/       # Boot code
│   │   ├── kernel/     # x86-specific kernel code
│   │   ├── mm/         # x86 memory management
│   │   └── entry/      # Syscall entry points
│   └── arm64/

├── block/          # Block layer, I/O scheduling
├── certs/          # Signing certificates
├── crypto/         # Cryptographic API and algorithms
├── Documentation/  # Kernel documentation

├── drivers/        # Device drivers (largest directory)
│   ├── net/            # Network drivers
│   ├── block/          # Block device drivers
│   ├── gpu/            # GPU drivers (including drm)
│   ├── nvme/           # NVMe drivers
│   └── ...

├── fs/             # Filesystems
│   ├── ext4/           # ext4 filesystem
│   ├── xfs/            # XFS filesystem
│   ├── btrfs/          # Btrfs filesystem
│   ├── proc/           # procfs
│   └── ...

├── include/        # Header files
│   ├── linux/          # Core kernel-internal headers (not exported to user space)
│   ├── uapi/           # User-space API headers
│   └── asm-generic/    # Generic fallbacks for architecture asm/ headers

├── init/           # Kernel initialization
│   └── main.c          # start_kernel() lives here

├── ipc/            # Inter-process communication
├── kernel/         # Core kernel code
│   ├── sched/          # Scheduler
│   ├── locking/        # Locks, mutexes
│   ├── trace/          # Tracing infrastructure
│   └── bpf/            # BPF subsystem

├── lib/            # Kernel libraries
├── mm/             # Memory management
│   ├── slub.c          # SLUB slab allocator (the older slab.c was removed in 6.8)
│   ├── page_alloc.c    # Page allocator
│   ├── mmap.c          # Memory mapping
│   └── ...

├── net/            # Networking
│   ├── core/           # Core networking
│   ├── ipv4/           # IPv4 stack
│   ├── ipv6/           # IPv6 stack
│   ├── netfilter/      # Packet filtering
│   └── ...

├── scripts/        # Build and helper scripts
├── security/       # Security modules (SELinux, AppArmor)
├── sound/          # Sound subsystem
├── tools/          # Userspace tools (perf, bpf)
│   ├── perf/           # perf tool
│   ├── bpf/            # BPF tools
│   └── ...

├── Kconfig         # Build configuration
├── Makefile        # Main makefile
└── MAINTAINERS     # Who maintains what

Key Files to Know

| File | Purpose |
|---|---|
| init/main.c | Architecture-independent kernel entry point (start_kernel()) |
| arch/x86/entry/entry_64.S | System call entry point (x86-64) |
| kernel/sched/core.c | Scheduler core |
| mm/page_alloc.c | Page allocator (buddy system) |
| mm/slub.c | SLUB slab allocator (mm/slab.c in kernels before 6.8) |
| fs/read_write.c | read/write system calls |
| net/core/dev.c | Core networking |
Navigation Tip: Use tools like cscope, ctags, or online browsers like elixir.bootlin.com to navigate the source.

Boot Process Deep Dive

Understanding how Linux boots is essential for systems engineers.

[Figure: Linux Boot Sequence]

start_kernel() - The Heart of Boot

// init/main.c - simplified
asmlinkage __visible void __init start_kernel(void)
{
    char *command_line;             // Set by setup_arch() to the boot command line

    // Very early setup
    set_task_stack_end_magic(&init_task);
    
    // Memory management initialization
    setup_arch(&command_line);      // Arch-specific setup
    mm_init();                      // Memory subsystem (renamed mm_core_init() in recent kernels)
    
    // Core subsystems
    sched_init();                   // Scheduler
    rcu_init();                     // RCU
    
    // Interrupts and timers
    init_IRQ();
    tick_init();
    
    // Various subsystems
    vfs_caches_init();              // VFS
    signals_init();                 // Signals
    
    // Start init process
    rest_init();                    // Spawns PID 1 (kernel_init) and kthreadd (PID 2), then idles
}

Key Boot Parameters

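| Parameter | Purpose | Example |
|---|---|---|
| root= | Root filesystem device | root=/dev/sda1 |
| init= | Program to run as the first user process (PID 1) | init=/bin/bash |
| quiet | Suppress most boot messages | quiet |
| debug | Enable debug messages | debug |
| nokaslr | Disable KASLR | nokaslr |
| isolcpus= | Remove CPUs from general scheduler load balancing | isolcpus=2,3 |
| nosmp | Disable SMP (boot with a single CPU) | nosmp |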
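Debug Tip: Boot with init=/bin/bash to get a root shell as PID 1 in place of the normal init system. This is useful for recovery. The root filesystem is usually still mounted read-only at that point, so run mount -o remount,rw / before making changes.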

Kernel Address Space Layout

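On x86-64 with 4-level paging (48-bit virtual addresses), the virtual address space is split evenly between user and kernel. The kernel region bases shown are the defaults. KASLR randomizes the kernel text and, with CONFIG_RANDOMIZE_MEMORY, the direct-map, vmalloc, and vmemmap bases: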
┌─────────────────────────────────────────────────────────────────────────────┐
│                    x86-64 VIRTUAL ADDRESS SPACE (48-bit)                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  0xFFFFFFFFFFFFFFFF ┌────────────────────────────────────────────────────┐  │
│                     │                                                     │  │
│                     │              KERNEL SPACE (128 TB)                  │  │
│                     │                                                     │  │
│                     │  0xFFFFFFFF80000000 - Kernel text (code)           │  │
│                     │  0xFFFF888000000000 - Direct physical mapping      │  │
│                     │  0xFFFFC90000000000 - vmalloc area                 │  │
│                     │  0xFFFFEA0000000000 - Virtual memory map           │  │
│                     │                                                     │  │
│  0xFFFF800000000000 ├────────────────────────────────────────────────────┤  │
│                     │              NON-CANONICAL HOLE                     │  │
│                     │         (addresses that cause fault)                │  │
│  0x0000800000000000 ├────────────────────────────────────────────────────┤  │
│                     │                                                     │  │
│                     │              USER SPACE (128 TB)                    │  │
│                     │                                                     │  │
│                     │  Stack (grows down from near top)                  │  │
│                     │  mmap region (shared libs, anonymous maps)         │  │
│                     │  Heap (grows up from end of data)                  │  │
│                     │  BSS, Data, Text (program sections)                │  │
│                     │                                                     │  │
│  0x0000000000000000 └────────────────────────────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

KASLR (Kernel Address Space Layout Randomization)

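KASLR randomizes the kernel's load address at each boot (and, on x86-64, the bases of regions such as the direct map and vmalloc area), so exploits cannot rely on fixed kernel addresses:
# Check whether KASLR was disabled on the command line
# (it can also be compiled out with CONFIG_RANDOMIZE_BASE=n)
grep -qw nokaslr /proc/cmdline && echo "KASLR disabled (nokaslr)" || echo "KASLR not disabled on the command line"

# See kernel text base (differs each boot with KASLR; needs root, since
# unprivileged reads of /proc/kallsyms show zeroed addresses)
sudo grep -w _text /proc/kallsyms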

Loadable Kernel Modules

Modules allow extending the kernel without recompiling:

Module Structure

// hello_module.c - Simple kernel module
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Your Name");
MODULE_DESCRIPTION("A simple Hello World module");
MODULE_VERSION("1.0");

static int __init hello_init(void)
{
    printk(KERN_INFO "Hello, Kernel!\n");
    return 0;  // 0 = success
}

static void __exit hello_exit(void)
{
    printk(KERN_INFO "Goodbye, Kernel!\n");
}

module_init(hello_init);
module_exit(hello_exit);

Module Loading Process

[Figure: Module Loading Process]

Module Management Commands

# List loaded modules
lsmod

# Module info
modinfo ext4

# Load module
sudo modprobe ext4

# Load with parameters
sudo modprobe loop max_loop=64

# Remove module
sudo rmmod loop

# Show module dependencies
modprobe --show-depends ext4

# Show module parameters
systool -vm ext4

Module Parameters

// Module with parameters
static int buffer_size = 1024;
module_param(buffer_size, int, 0644);  // sysfs: writable by root, readable by all
MODULE_PARM_DESC(buffer_size, "Size of internal buffer");

static char *device_name = "mydev";
module_param(device_name, charp, 0444);  // sysfs: read-only for everyone
MODULE_PARM_DESC(device_name, "Device name");

Kernel Threads

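Kernel threads (kthreads) are tasks that run entirely in kernel mode and have no user-space address space (their task_struct's mm is NULL):
// Creating a kernel thread
#include <linux/err.h>       // IS_ERR(), PTR_ERR()
#include <linux/kthread.h>   // kthread_run(), kthread_should_stop(), kthread_stop()
#include <linux/sched.h>     // schedule_timeout_interruptible()

static struct task_struct *my_thread;

static int thread_function(void *data)
{
    while (!kthread_should_stop()) {
        // Do work
        schedule_timeout_interruptible(HZ);  // Sleep ~1 second (HZ jiffies)
    }
    return 0;
}

// In module init:
my_thread = kthread_run(thread_function, NULL, "my_kthread");
if (IS_ERR(my_thread))
    return PTR_ERR(my_thread);   // kthread_run() returns an ERR_PTR() on failure

// In module exit (kthread_stop() wakes the thread and waits for it to return):
kthread_stop(my_thread);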

Important Kernel Threads

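# View kernel threads (ps shows them in brackets because they have no command line)
ps aux | grep '\[.*\]'

# More precise: kthreadd (PID 2) plus all of its children
ps --ppid 2 -p 2 -o pid,comm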

# Common kernel threads:
# [kthreadd]     - Parent of all kernel threads
# [ksoftirqd/N]  - Soft IRQ handling for CPU N
# [kworker/N:M]  - Workqueue workers
# [kswapd0]      - Memory reclaim
# [jbd2/sda1-8]  - Journal block device (ext4 journaling)
# [kcompactd0]   - Memory compaction

Lab Exercises

Objective: Get comfortable with kernel source tree
# Clone kernel source
git clone --depth=1 https://github.com/torvalds/linux.git
cd linux

# Find start_kernel
grep -rn "asmlinkage.*start_kernel" init/

# Find syscall table (x86-64)
find arch/x86 -name "*syscall*"

# Count lines in different subsystems
wc -l mm/*.c       # Memory management
wc -l kernel/*.c   # Core kernel
wc -l fs/*.c       # Filesystems
Objective: Write, compile, and load a kernel module
// Save as hello.c
#include <linux/init.h>
#include <linux/module.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void)
{
    pr_info("Hello from kernel module!\n");
    return 0;
}

static void __exit hello_exit(void)
{
    pr_info("Goodbye from kernel module!\n");
}

module_init(hello_init);
module_exit(hello_exit);
# Makefile
obj-m := hello.o

KDIR := /lib/modules/$(shell uname -r)/build

all:
	make -C $(KDIR) M=$(PWD) modules

clean:
	make -C $(KDIR) M=$(PWD) clean
# Build and test
make
sudo insmod hello.ko
sudo dmesg | tail    # sudo is needed where kernel.dmesg_restrict=1
sudo rmmod hello
sudo dmesg | tail
Objective: Understand boot timing and initialization
# View boot messages
dmesg | head -100

# Boot timing analysis
systemd-analyze
systemd-analyze blame
systemd-analyze critical-chain

# Kernel command line used
cat /proc/cmdline

# Initramfs contents (Debian/Ubuntu)
lsinitramfs /boot/initrd.img-$(uname -r) | head -50

# Fedora/RHEL equivalent (dracut)
lsinitrd /boot/initramfs-$(uname -r).img | head -50

Interview Questions

Question: Why is Linux a monolithic kernel, and what are the trade-offs?
Answer: Linux chose a monolithic design for performance:
  • Direct function calls between subsystems (no IPC overhead)
  • Single address space eliminates context switch on internal calls
  • Simpler data sharing between components
Trade-offs:
  • A bug in any component can crash the entire kernel
  • Larger attack surface (all code runs privileged)
  • More complex codebase to maintain
Mitigations in Linux:
  • Loadable modules for flexibility
  • Namespaces and cgroups for isolation
  • Seccomp for syscall filtering
  • Strong code review process
Question: What happens inside the kernel when you run ls from a shell?
Answer (kernel perspective):
  1. Shell process:
    • fork() → creates child process (clone syscall)
    • execve("/bin/ls") → replaces process image
  2. execve processing:
    • Kernel opens ELF binary
    • Maps code/data sections into memory
    • Sets up stack with arguments/environment
    • Maps the dynamic linker (ld.so) named in the PT_INTERP header and starts execution there
  3. ls execution:
    • Dynamic linker loads libc
    • ls calls opendir()/readdir() → openat and getdents64 syscalls
    • Kernel reads directory entries from filesystem
    • ls calls write() → output to terminal
  4. Termination:
    • ls calls exit() → exit_group syscall
    • Kernel cleans up resources
    • Parent shell’s wait() returns
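Question: How does a kernel module differ from a shared library?
Answer:

| Aspect | Kernel Module | Shared Library |
|---|---|---|
| Privilege | Runs in kernel mode | Runs in user mode |
| Address space | Kernel address space | Process address space |
| Fault impact | Can crash the whole system | Crashes only that process |
| Symbol resolution | Kernel symbol table | User-space dynamic linker |
| Memory | Uses kmalloc, vmalloc | Uses malloc, mmap |
| Loading | insmod, modprobe | ld.so, dlopen |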
Key insight: Modules are essentially kernel code with a defined entry/exit point, while shared libraries are user-space code loaded by the dynamic linker.
Question: What is KASLR, and what are its limitations?
Answer: KASLR (Kernel Address Space Layout Randomization):
  • Randomizes kernel base address at each boot
  • Makes it harder to exploit memory corruption vulnerabilities
  • Attacker can’t hardcode kernel addresses
Implementation:
  • Random offset chosen during early boot
  • All kernel symbols shifted by this offset
  • /proc/kallsyms shows the randomized addresses (to privileged users only)
Limitations:
  • Information leaks can reveal base address
  • Side-channel attacks (Meltdown/Spectre-class leaks) can bypass it
  • Doesn’t protect against local attackers with kernel memory access
Related protections:
  • SMEP (Supervisor Mode Execution Prevention)
  • SMAP (Supervisor Mode Access Prevention)
  • Stack canaries

Key Takeaways

Architecture Choice

Linux’s monolithic design prioritizes performance while modules add flexibility

Source Organization

Understanding source tree layout is essential for kernel development and debugging

Boot Process

From firmware to init, each stage has specific responsibilities and debugging points

Module System

Modules extend kernel functionality at runtime without recompilation
