The System Boot Process
Booting an operating system is a series of “handoffs” where each stage initializes more hardware and increases the CPU’s capability until the full kernel is in control. For a systems engineer, this sequence is where the “magic” of the hardware-software boundary happens.
Mastery Level: Senior Systems Engineer
Key Internals: Reset Vector 0xFFFFFFF0, CR0/CR4/EFER registers, GDT/IDT layout, Page Table bootstrapping
Prerequisites: CPU Architectures, Memory Management
1. The Reset Vector: The CPU’s First Breath
When you press the power button, the CPU comes up in 16-bit “Real Mode,” but with a twist: it does not start executing at address 0x0000.
- The Address: On x86-64, the CPU begins execution at 0xFFFFFFF0 (16 bytes below the 4GB mark).
- The Hidden Base: While in 16-bit mode the address space is normally 1MB, at reset the Code Segment (CS) register has a hidden base of 0xFFFF0000 and IP holds 0xFFF0. Thus, CS:IP points to the top of the 4GB space, which is mapped by the motherboard to the Flash ROM containing the BIOS or UEFI.
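The address arithmetic above can be checked with a few lines of C. This is only an illustrative sketch: the function and macro names are invented here, and the code models the addition the CPU performs rather than touching any hardware.

```c
#include <stdint.h>

/* Values the CPU holds immediately after reset. */
#define RESET_CS_BASE     0xFFFF0000u  /* hidden CS base */
#define RESET_CS_SELECTOR 0xF000u      /* visible CS selector */
#define RESET_IP          0xFFF0u

/* Linear address = segment base + offset. */
uint32_t linear_address(uint32_t cs_base, uint16_t ip)
{
    return cs_base + ip;
}

/* Ordinary real-mode rule: base = selector << 4 (stays below 1MB). */
uint32_t real_mode_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}
```

With the hidden base, `linear_address(RESET_CS_BASE, RESET_IP)` gives 0xFFFFFFF0. With the ordinary rule, the same selector and IP would give 0xFFFF0, just below 1MB, which is why the hidden base matters.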
2. Firmware: BIOS vs. UEFI
The firmware’s job is to perform POST (Power-On Self Test) and find a bootable device.
2.1 Legacy BIOS (Basic Input/Output System)
- Mode: Runs entirely in 16-bit Real Mode.
- Disk Format: Uses MBR (Master Boot Record). The BIOS reads the first 512-byte sector of the disk and jumps to it.
- Limitations: Max 2TB disks, 4 primary partitions, slow interrupt-based I/O.
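As a rough illustration of the MBR layout and its limits, the sketch below inspects a 512-byte sector held in memory. The helper names are hypothetical. It assumes the classic layout: four 16-byte partition entries starting at offset 446, and a 0x55 0xAA signature in the last two bytes.

```c
#include <stdint.h>

#define MBR_SECTOR_SIZE 512u
#define MBR_PART_TABLE  446u  /* offset of the first partition entry */
#define MBR_PART_ENTRY  16u   /* bytes per partition entry */
#define MBR_SIG_OFFSET  510u  /* 446 + 4 * 16 */

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The BIOS only jumps to a sector that ends in 0x55 0xAA. */
int mbr_has_boot_signature(const uint8_t *sector)
{
    return sector[MBR_SIG_OFFSET] == 0x55 && sector[MBR_SIG_OFFSET + 1] == 0xAA;
}

/* Starting LBA of primary partition `index` (0..3). */
uint32_t mbr_partition_start(const uint8_t *sector, unsigned index)
{
    const uint8_t *entry = sector + MBR_PART_TABLE + index * MBR_PART_ENTRY;
    return le32(entry + 8);
}

/* 32-bit sector numbers with 512-byte sectors: 2^32 * 512 bytes = 2 TiB. */
uint64_t mbr_max_addressable_bytes(void)
{
    return ((uint64_t)1 << 32) * MBR_SECTOR_SIZE;
}
```

The 2TB ceiling follows directly from the 32-bit LBA fields, and the four-partition limit from the fixed table size.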
2.2 Modern UEFI (Unified Extensible Firmware Interface)
- Mode: Leaves Real Mode early in firmware initialization; drivers and boot applications run in 32-bit protected mode or 64-bit long mode.
- Disk Format: Uses GPT (GUID Partition Table) and a dedicated EFI System Partition (ESP).
- The Protocol: Instead of jumping to a sector, UEFI understands filesystems (FAT32) and loads PE/COFF executables (e.g., grubx64.efi).
- Boot Services: Used by the bootloader (e.g., to read files). Terminated once the OS loader calls ExitBootServices().
- Runtime Services: Available even after the OS boots (e.g., setting UEFI variables, NVRAM).
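Because UEFI loads PE/COFF images rather than raw sectors, a loader has to validate the file headers first. The sketch below shows the kind of check involved, using a hypothetical function name. It looks only at the DOS "MZ" magic, the PE signature, and the machine field; real firmware validates much more.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PE_MACHINE_X86_64 0x8664u  /* COFF machine type for x86-64 */

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf looks like a PE image built for x86-64, else 0. */
int looks_like_x64_pe(const uint8_t *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                          /* no DOS header */
    uint32_t pe_off = rd32(buf + 0x3C);    /* e_lfanew: offset of "PE\0\0" */
    if (pe_off > len || len - pe_off < 6)
        return 0;                          /* header would run past the buffer */
    if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
        return 0;
    return rd16(buf + pe_off + 4) == PE_MACHINE_X86_64;
}
```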
3. The Road to 64-bit: Mode Transitions
The most complex part of the boot process is transitioning the CPU from its 16-bit legacy state to full 64-bit Long Mode.
Step 1: Real Mode to Protected Mode (32-bit)
- Disable Interrupts: cli (Clear Interrupt Flag) to prevent interrupts before the IDT is ready.
- Enable A20 Gate: A legacy hack to allow addressing above 1MB.
- Load GDT: The Global Descriptor Table defines how memory segments work.
- Set CR0.PE: Set the Protection Enable bit in the CR0 register.
- Far Jump: A special jump that flushes the CPU pipeline and loads the new 32-bit Code Segment.
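The far jump names its target code segment through a segment selector, which is an index into the GDT plus two small fields. The sketch below decodes one and shows the CR0.PE bit. The helper names are illustrative, and the choice of selector 0x08 (GDT entry 1) for the code segment is a common convention assumed here, not something the steps above require.

```c
#include <stdint.h>

#define CR0_PE (1u << 0)  /* Protection Enable is bit 0 of CR0 */

/* Selector layout: bits 15..3 index, bit 2 table (0 = GDT), bits 1..0 RPL. */
unsigned selector_index(uint16_t sel)    { return sel >> 3; }
unsigned selector_uses_ldt(uint16_t sel) { return (sel >> 2) & 1u; }
unsigned selector_rpl(uint16_t sel)      { return sel & 3u; }

uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1u) << 2) | (rpl & 3u));
}

/* Byte offset of the selected descriptor inside the table (8 bytes each). */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)selector_index(sel) * 8u;
}

/* Value to write back to CR0 so that protected mode is enabled. */
uint32_t cr0_enable_protection(uint32_t cr0)
{
    return cr0 | CR0_PE;
}
```

Under that assumed convention, a far jump to selector 0x08 picks GDT entry 1 at privilege level 0, and selector 0x10 would pick entry 2 for data.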
Step 2: Protected Mode to Long Mode (64-bit)
- Set CR4.PAE: Physical Address Extension is required for 64-bit mode.
- Setup Page Tables: You must enable paging to enter Long Mode. The kernel builds a minimal “Identity Map” (Virtual Address = Physical Address) for the first 1GB of memory.
- Set EFER.LME: Set the Long Mode Enable bit in the Extended Feature Enable Register.
- Set CR0.PG: Enable Paging. The CPU is now in “Compatibility Mode.”
- Final Far Jump: A jump to a 64-bit segment officially enters Long Mode.
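To make the page-table step concrete, here is a user-space model of a 1GB identity map built from 2MiB pages: one PML4 entry, one PDPT entry, and 512 page-directory entries. It is a simulation under stated assumptions. The three tables live in an ordinary array, their "physical addresses" are invented (0x1000, 0x2000, 0x3000), and `translate()` mimics the hardware walk without loading CR3.

```c
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_HUGE     (1ull << 7)  /* PS bit: this PD entry maps a 2MiB page */
#define ENTRIES      512
#define ADDR_MASK    0x000FFFFFFFFFF000ull
#define UNMAPPED     UINT64_MAX

/* pool[0] = PML4, pool[1] = PDPT, pool[2] = PD (simulated tables). */
static uint64_t pool[3][ENTRIES];
#define TABLE_PHYS(i) (0x1000ull * ((uint64_t)(i) + 1))  /* pretend address */

static uint64_t *table_at(uint64_t phys)
{
    return pool[phys / 0x1000 - 1];
}

/* Map virtual 0..1GB onto physical 0..1GB using 512 pages of 2MiB. */
void build_identity_map_1g(void)
{
    memset(pool, 0, sizeof pool);
    pool[0][0] = TABLE_PHYS(1) | PTE_PRESENT | PTE_WRITABLE;
    pool[1][0] = TABLE_PHYS(2) | PTE_PRESENT | PTE_WRITABLE;
    for (uint64_t i = 0; i < ENTRIES; i++)
        pool[2][i] = (i << 21) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}

/* Walk PML4 -> PDPT -> PD the way the MMU would. */
uint64_t translate(uint64_t vaddr)
{
    uint64_t e = pool[0][(vaddr >> 39) & 0x1FF];
    if (!(e & PTE_PRESENT))
        return UNMAPPED;
    e = table_at(e & ADDR_MASK)[(vaddr >> 30) & 0x1FF];
    if (!(e & PTE_PRESENT))
        return UNMAPPED;
    e = table_at(e & ADDR_MASK)[(vaddr >> 21) & 0x1FF];
    if (!(e & PTE_PRESENT) || !(e & PTE_HUGE))
        return UNMAPPED;
    return (e & ADDR_MASK & ~0x1FFFFFull) | (vaddr & 0x1FFFFF);
}
```

After `build_identity_map_1g()`, any address below 1GB translates to itself, and the first address past 1GB is unmapped. In real boot code, the physical address of the PML4 would be written to CR3 before CR0.PG is set.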
4. Kernel Data Structures: GDT and IDT
The kernel must define how it will handle memory and interrupts before it can do anything else.
4.1 The GDT (Global Descriptor Table)
The GDT is an array of 8-byte descriptors. Even in “Flat” 64-bit mode where segmentation is mostly unused, the GDT is required to define:
- Kernel Code Segment: Ring 0, Executable, Readable.
- Kernel Data Segment: Ring 0, Readable, Writable.
- User Code/Data Segments: Ring 3.
- TSS (Task State Segment): Holds the stack pointers the CPU switches to when an interrupt or exception crosses into Ring 0.
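The descriptors listed above are packed bit fields, which is easiest to see by building one. The sketch below encodes an 8-byte code or data descriptor from its parts. The macro and function names are illustrative, and the flat base-0 segments with a 0xFFFFF limit are the usual convention rather than anything the CPU requires in 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Access byte bits. */
#define ACC_PRESENT   0x80u
#define ACC_DPL(ring) ((unsigned)(ring) << 5)
#define ACC_CODE_DATA 0x10u  /* S bit: code/data, not a system descriptor */
#define ACC_EXEC      0x08u
#define ACC_RW        0x02u  /* readable (code) or writable (data) */

/* Flags nibble. */
#define FLAG_GRAN_4K  0x8u   /* limit counted in 4KB units */
#define FLAG_32BIT    0x4u   /* D/B bit */
#define FLAG_LONG     0x2u   /* L bit: 64-bit code segment */

uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu && flags <= 0xFu);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);              /* limit bits 0..15  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base bits 0..23   */
    d |= (uint64_t)access << 40;                   /* access byte       */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit bits 16..19 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* flags nibble      */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;   /* base bits 24..31  */
    return d;
}

uint64_t kernel_code64(void)
{
    return gdt_entry(0, 0xFFFFF,
                     ACC_PRESENT | ACC_DPL(0) | ACC_CODE_DATA | ACC_EXEC | ACC_RW,
                     FLAG_GRAN_4K | FLAG_LONG);
}

uint64_t kernel_data(void)
{
    return gdt_entry(0, 0xFFFFF,
                     ACC_PRESENT | ACC_DPL(0) | ACC_CODE_DATA | ACC_RW,
                     FLAG_GRAN_4K | FLAG_32BIT);
}

uint64_t user_code64(void)
{
    return gdt_entry(0, 0xFFFFF,
                     ACC_PRESENT | ACC_DPL(3) | ACC_CODE_DATA | ACC_EXEC | ACC_RW,
                     FLAG_GRAN_4K | FLAG_LONG);
}
```

Note that in Long Mode the TSS descriptor is a system descriptor that occupies 16 bytes (two consecutive GDT slots), so it does not fit this 8-byte helper.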
4.2 The IDT (Interrupt Descriptor Table)
The IDT maps interrupt vectors (0-255) to handler functions.
- Vectors 0-31: Reserved for CPU exceptions (Divide by Zero, Page Fault, etc.).
- Vectors 32-255: Available for hardware interrupts and system calls.
- Gate Types: Interrupt Gates (clear IF), Trap Gates (don’t clear IF).
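In 64-bit mode each IDT entry is a 16-byte gate that splits the handler address across three fields. The sketch below builds and reads back such a gate. The struct and function names are illustrative, and the handler address used in any example is arbitrary.

```c
#include <stdint.h>

#define GATE_INTERRUPT 0x8Eu  /* present, DPL 0, type 0xE: clears IF on entry */
#define GATE_TRAP      0x8Fu  /* present, DPL 0, type 0xF: leaves IF alone    */

struct idt_gate {
    uint16_t offset_low;   /* handler address bits 0..15  */
    uint16_t selector;     /* code segment selector       */
    uint8_t  ist;          /* Interrupt Stack Table index, 0 = none */
    uint8_t  type_attr;    /* gate type, DPL, present bit */
    uint16_t offset_mid;   /* handler address bits 16..31 */
    uint32_t offset_high;  /* handler address bits 32..63 */
    uint32_t reserved;
};

_Static_assert(sizeof(struct idt_gate) == 16, "64-bit IDT gates are 16 bytes");

struct idt_gate make_gate(uint64_t handler, uint16_t selector, uint8_t type_attr)
{
    struct idt_gate g;
    g.offset_low  = (uint16_t)(handler & 0xFFFF);
    g.selector    = selector;
    g.ist         = 0;
    g.type_attr   = type_attr;
    g.offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF);
    g.offset_high = (uint32_t)(handler >> 32);
    g.reserved    = 0;
    return g;
}

/* Reassemble the handler address from its three pieces. */
uint64_t gate_offset(const struct idt_gate *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}

/* Vectors 0..31 are reserved for CPU exceptions. */
int is_cpu_exception(unsigned vector)
{
    return vector < 32;
}
```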
5. The Kernel Entry Sequence (Linux)
Once the bootloader (GRUB) loads the kernel into memory, it jumps to the kernel’s entry point.
5.1 Decompression (head_64.S)
The Linux kernel is actually a self-extracting executable (vmlinuz).
- The early code decompresses the “real” kernel image into a higher memory address.
- It sets up a temporary stack.
- It jumps to the decompressed kernel’s entry point.
5.2 The start_kernel() Function
This is the “Big Bang” of the operating system. It is architecture-independent C code that calls:
- setup_arch(): Handles CPU-specific initialization.
- mm_init(): Initializes the full Buddy Allocator and Slab Allocator.
- sched_init(): Sets up the scheduler and the “Idle” task.
- rest_init(): Spawns Process 1 (init) and Process 2 (kthreadd).
6. The Handover to User Space
6.1 Initramfs (Initial RAM Filesystem)
The kernel cannot mount the real root disk immediately (it might need a driver for the SSD or a network driver for NFS).
- The bootloader loads a small CPIO archive into memory (initrd/initramfs).
- The kernel mounts this as /.
- It runs /init from the ramfs, which loads necessary drivers and finally “Switches Root” to the real disk.
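To give a feel for what the kernel unpacks, the sketch below parses one record header from a CPIO archive. It assumes the "newc" layout commonly used for initramfs images: a 110-byte ASCII header that starts with the magic 070701, followed by thirteen 8-digit hexadecimal fields. The function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CPIO_NEWC_MAGIC   "070701"
#define CPIO_HEADER_LEN   110u  /* 6 magic bytes + 13 fields * 8 hex digits */
#define CPIO_OFF_FILESIZE 54u   /* 7th field  */
#define CPIO_OFF_NAMESIZE 94u   /* 12th field */

/* Parse exactly 8 ASCII hex digits. Returns 0 on success, -1 on bad input. */
int parse_hex8(const char *s, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 8; i++) {
        char c = s[i];
        unsigned d;
        if (c >= '0' && c <= '9')      d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (unsigned)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (unsigned)(c - 'A' + 10);
        else return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* Extract the file size and name length from one newc header. */
int cpio_parse_header(const char *hdr, size_t len,
                      uint32_t *filesize, uint32_t *namesize)
{
    if (len < CPIO_HEADER_LEN || memcmp(hdr, CPIO_NEWC_MAGIC, 6) != 0)
        return -1;
    if (parse_hex8(hdr + CPIO_OFF_FILESIZE, filesize) != 0)
        return -1;
    return parse_hex8(hdr + CPIO_OFF_NAMESIZE, namesize);
}
```

Each header is followed by the file name and then the file data, so a full unpacker would repeat this parse for every record in the archive.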
6.2 PID 1: systemd / SysV init
The final step is to execute the first user-space process.
- Path: /sbin/init (or whatever the init= kernel parameter specifies).
- The PID 1 Rule: This process is the ancestor of all others. If it ever exits, the kernel triggers a Kernel Panic.
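The path selection can be sketched as a small command-line parser. This is a simplified model, not the kernel's own parsing code. The function name is hypothetical, tokens are assumed to be separated by spaces, and the only default modeled is /sbin/init.

```c
#include <stddef.h>
#include <string.h>

#define DEFAULT_INIT "/sbin/init"

/*
 * Copy the init path chosen by `cmdline` into `out`.
 * A token must start with "init=" to match, so "rdinit=" is ignored.
 * The last matching token wins. Returns 0 on success, -1 if out is too small.
 */
int resolve_init_path(const char *cmdline, char *out, size_t outlen)
{
    const char *p = cmdline;
    const char *val = DEFAULT_INIT;
    size_t vlen = strlen(DEFAULT_INIT);

    while (*p) {
        while (*p == ' ')
            p++;                          /* skip separators */
        const char *tok = p;
        while (*p && *p != ' ')
            p++;                          /* find end of token */
        size_t tlen = (size_t)(p - tok);
        if (tlen > 5 && strncmp(tok, "init=", 5) == 0) {
            val = tok + 5;
            vlen = tlen - 5;
        }
    }
    if (vlen + 1 > outlen)
        return -1;
    memcpy(out, val, vlen);
    out[vlen] = '\0';
    return 0;
}
```

Booting with init=/bin/sh on the command line is a common recovery technique, since it bypasses the normal init system entirely.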
7. Interview Deep Dive: Senior Level
What is the 'A20 Gate' and why is it still relevant?
On the original 8086, memory addresses wrapped around at 1MB. The 80286 could address more, but some old programs relied on the wrap-around behavior. IBM added a gate on address line A20 (the 21st line, counting from A0) so that the wrap-around could be switched on or off. For compatibility, a PC booted through legacy BIOS may still hand over with A20 disabled, so a real-mode bootloader must check it and enable it before touching memory above 1MB. UEFI firmware already runs in protected or long mode, so A20 is enabled by the time a UEFI bootloader starts.
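The wrap-around is easy to reproduce numerically. The sketch below models the gate as forcing address bit 20 to zero when disabled, and uses segment:offset FFFF:0010, the first address past 1MB, as the test case. The function names are illustrative.

```c
#include <stdint.h>

#define A20_BIT (1u << 20)

/* Real-mode address formation: segment * 16 + offset. */
uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* With the gate disabled, address bit 20 is forced to zero. */
uint32_t apply_a20(uint32_t linear, int a20_enabled)
{
    return a20_enabled ? linear : (linear & ~A20_BIT);
}
```

FFFF:0010 forms linear address 0x100000. With A20 enabled that address is used as-is, and with A20 disabled it collapses to 0x000000, which is the behavior old programs depended on.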
How does the kernel find its own code in memory before paging is enabled?
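The kernel is compiled as a Position Independent Executable (PIE) or uses relative addressing. In 64-bit code, early boot instructions address data relative to the Instruction Pointer (RIP). RIP-relative addressing only exists in 64-bit mode, which already requires paging, so 32-bit code that runs before paging is enabled instead works out its load address at run time (for example with a call/pop sequence) and addresses data relative to that. Once the kernel sets up the initial page tables and enables paging, it can “jump” into the virtual address space it has defined.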
Explain the 'Identity Mapping' during boot.
When the kernel enables paging (setting CR0.PG), the CPU immediately begins interpreting all addresses as virtual. If the kernel didn’t “Identity Map” (map Virtual 0x1234 to Physical 0x1234) the code it is currently executing, the very next instruction fetch would fail because the MMU wouldn’t know where to find the code, resulting in an immediate crash.
8. Advanced Practice
- GDT Inspector: Use gdb and QEMU (-s -S) to inspect the GDT of a booting kernel. Use the command monitor info registers to read the GDT base and limit, then dump the descriptors at that address with gdb’s x command.
- Early Printk: Add a printk("Hello from early boot!"); to the start_kernel function in a Linux source tree and compile it. Observe when the message appears during boot.
- UEFI Shell: Boot into a UEFI shell and use the ls and map commands to see how the firmware sees your disks and partitions.
9. Checklist: From Power-On to Login Prompt
Use this quick reference to understand (or debug) the complete boot sequence:
Phase-by-Phase Checklist
Debugging Boot Issues
| Symptom | Phase | Debug Command |
|---|---|---|
| No display, fans spin | Firmware | Check POST codes, clear CMOS |
| “No bootable device” | Firmware/ESP | efibootmgr -v, check ESP mount |
| GRUB rescue prompt | Bootloader | Boot from live USB, reinstall GRUB |
| Kernel panic early | Kernel early | Add debug to kernel cmdline |
| initramfs drops to shell | Initramfs | Check /dev, load missing drivers |
| systemd fails | Init system | systemctl --failed, journalctl -b |
Quick Boot Time Analysis