
The System Boot Process

Booting an operating system is a series of “handoffs” where each stage initializes more hardware and increases the CPU’s capability until the full kernel is in control. For a systems engineer, this sequence is where the “magic” of the hardware-software boundary happens.
Mastery Level: Senior Systems Engineer
Key Internals: Reset Vector 0xFFFFFFF0, CR0/CR4/EFER registers, GDT/IDT layout, Page Table bootstrapping
Prerequisites: CPU Architectures, Memory Management

1. The Reset Vector: The CPU’s First Breath

After power-on or reset, an x86 CPU starts in “Real Mode” (16-bit), but with a twist: it does not begin fetching instructions at address 0x0000.
  • The Address: On x86-64, the CPU begins execution at 0xFFFFFFF0 (16 bytes below the 4GB mark).
  • The Hidden Base: In 16-bit mode the address space is normally limited to 1MB, but at reset the Code Segment (CS) register holds the selector 0xF000 with a hidden base of 0xFFFF0000, and IP is 0xFFF0. CS:IP therefore points just below the top of the 4GB space, which the motherboard maps to the Flash ROM containing the BIOS or UEFI firmware. The hidden base stays in effect until the firmware first reloads CS, typically with a far jump.
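The address arithmetic above can be checked with a short sketch. It assumes the documented x86 reset state (CS selector 0xF000, IP 0xFFF0); the function names are illustrative only.

```python
def real_mode_linear(segment: int, offset: int) -> int:
    """Normal real-mode translation: the segment base is selector * 16."""
    return (segment << 4) + offset


def reset_linear(hidden_cs_base: int = 0xFFFF0000, ip: int = 0xFFF0) -> int:
    """At reset the CPU uses the cached (hidden) CS base, not selector * 16."""
    return hidden_cs_base + ip


# First instruction fetch after reset: 16 bytes below 4GB.
print(hex(reset_linear()))                    # 0xfffffff0
# Same selector:offset once CS has been reloaded: inside the first 1MB.
print(hex(real_mode_linear(0xF000, 0xFFF0)))  # 0xffff0
```

Only 16 bytes fit between the reset vector and the 4GB boundary, which is why firmware typically places a jump there.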

2. Firmware: BIOS vs. UEFI

The firmware’s job is to perform POST (Power-On Self Test) and find a bootable device.
┌─────────────────────────────────────────────────────────────────────┐
│                      BIOS VS UEFI COMPARISON                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  BIOS (Legacy)                          UEFI (Modern)               │
│  ┌───────────────────────────────┐     ┌──────────────────────────┐│
│  │                               │     │                          ││
│  │  1. Power On                  │     │  1. Power On             ││
│  │     • CPU starts in Real Mode │     │     • CPU in 32/64-bit   ││
│  │     • 16-bit addressing       │     │     • Full addressing    ││
│  │                               │     │                          ││
│  │  2. POST (Power-On Self Test) │     │  2. POST + Init          ││
│  │     • Memory check            │     │     • DXE (Driver Exec)  ││
│  │     • Hardware detection      │     │     • Load drivers       ││
│  │                               │     │                          ││
│  │  3. Find Boot Device          │     │  3. Boot Manager         ││
│  │     • Check boot order        │     │     • Read EFI variables ││
│  │     • Read first sector (MBR) │     │     • Load from ESP      ││
│  │     • 512 bytes max           │     │     • Read FAT32 FS      ││
│  │                               │     │                          ││
│  │  4. Load & Execute MBR        │     │  4. Load EFI Application ││
│  │     • Jump to 0x7C00          │     │     • .efi PE/COFF exec  ││
│  │     • 446 bytes of code       │     │     • Graphics, mouse    ││
│  │     • Chain load bootloader   │     │     • Full environment   ││
│  │                               │     │                          ││
│  │  Limitations:                 │     │  Advantages:             ││
│  │  • 2TB disk max (32-bit LBA)  │     │  • 9.4ZB disk (GPT)      ││
│  │  • 4 primary partitions       │     │  • 128 partitions        ││
│  │  • No security features       │     │  • Secure Boot           ││
│  │  • Slow int 13h I/O           │     │  • Fast block I/O        ││
│  │  • Text mode only             │     │  • GUI support           ││
│  │                               │     │                          ││
│  └───────────────────────────────┘     └──────────────────────────┘│
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

2.1 Legacy BIOS (Basic Input/Output System)

  • Mode: Runs entirely in 16-bit Real Mode.
  • Disk Format: Uses MBR (Master Boot Record). The BIOS loads the first 512-byte sector of the boot disk to address 0x7C00, checks for the 0x55 0xAA signature, and jumps to it.
  • Limitations: Max 2TB disks (32-bit LBA with 512-byte sectors), 4 primary partitions, slow interrupt-based I/O.
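The disk-size limits follow directly from the width of the LBA field. A minimal sketch of the arithmetic, assuming 512-byte logical sectors (the helper name is hypothetical):

```python
SECTOR_SIZE = 512  # bytes; assumed logical sector size


def max_disk_bytes(lba_bits: int, sector_size: int = SECTOR_SIZE) -> int:
    """Largest addressable disk for a given LBA field width."""
    return (2 ** lba_bits) * sector_size


mbr_limit = max_disk_bytes(32)  # MBR stores 32-bit LBAs
gpt_limit = max_disk_bytes(64)  # GPT stores 64-bit LBAs

print(mbr_limit // 2**40, "TiB")           # 2 TiB
print(round(gpt_limit / 10**21, 1), "ZB")  # 9.4 ZB
```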
MBR Structure:
┌─────────────────────────────────────────────────────────────────────┐
│                  MBR (Master Boot Record) Layout                    │
│                          512 Bytes Total                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Offset 0x000 - 0x1BD (446 bytes): Bootstrap Code                  │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  • First-stage bootloader code                                 │ │
│  │  • Loaded at 0x7C00 in memory                                  │ │
│  │  • Jumps to partition boot sector or loads stage 2             │ │
│  │  • Example: GRUB stage 1                                       │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  Offset 0x1BE - 0x1FD (64 bytes): Partition Table                  │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  Partition 1 (16 bytes):                                       │ │
│  │  ┌──────────────────────────────────────────────────────────┐ │ │
│  │  │  Boot flag    │ Type    │ Start LBA  │ Size (sectors)   │ │ │
│  │  │  0x80 (active)│ 0x83 (Linux) │ 2048  │ 204800          │ │ │
│  │  └──────────────────────────────────────────────────────────┘ │ │
│  │  Partition 2 (16 bytes): [similar structure]                  │ │
│  │  Partition 3 (16 bytes): [similar structure]                  │ │
│  │  Partition 4 (16 bytes): [similar structure]                  │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  Offset 0x1FE - 0x1FF (2 bytes): Boot Signature                    │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  0x55 0xAA  ← Must be present for BIOS to recognize bootable  │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
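The layout above maps directly onto a few struct reads. The following is a minimal sketch, not a full partition tool: it ignores the CHS fields and extended partitions, and the demo sector is synthetic, built from the example values in the diagram.

```python
import struct


def parse_mbr(sector: bytes) -> list:
    """Return the non-empty primary partition entries of a 512-byte MBR."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing 0x55 0xAA boot signature")
    partitions = []
    for i in range(4):
        entry = sector[446 + 16 * i : 446 + 16 * (i + 1)]
        # Layout: boot flag, CHS start (3 bytes, skipped), type,
        # CHS end (3 bytes, skipped), start LBA, sector count.
        flag, ptype, start_lba, count = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0x00 marks an unused slot
            partitions.append(
                {"active": flag == 0x80, "type": ptype,
                 "start_lba": start_lba, "sectors": count}
            )
    return partitions


# Synthetic MBR: one active Linux partition, as in the diagram.
demo = bytearray(512)
demo[446:462] = struct.pack("<B3xB3xII", 0x80, 0x83, 2048, 204800)
demo[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(demo)))
```

On a real system the same function could be fed the first 512 bytes of a disk image; reading a raw device usually needs root.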
BIOS Boot Process:
; BIOS loads MBR to 0x7C00 and jumps here
org 0x7C00
bits 16

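start:
    ; Disable interrupts while segments and stack are being set up
    cli

    ; Set up segments
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00    ; Stack grows down from just below the MBR

    ; Enable interrupts
    sti
    cld               ; String instructions (lodsb) must count upward

    ; Load rest of bootloader from disk
    ; DL already holds the boot drive number passed in by the BIOS,
    ; so it is left untouched instead of hard-coding 0x80
    mov ah, 0x02      ; BIOS read sectors function (CHS)
    mov al, 1         ; Number of sectors to read
    mov ch, 0         ; Cylinder
    mov cl, 2         ; Sector (sector 1 is MBR, start at 2)
    mov dh, 0         ; Head
    mov bx, 0x7E00    ; Load to ES:BX = 0x0000:0x7E00, right after the MBR
    int 0x13          ; Call BIOS interrupt

    jc error          ; Jump if carry flag set (error)

    ; Far jump to loaded code; this also forces CS to 0x0000
    jmp 0x0000:0x7E00

error:
    mov si, error_msg
    call print_string
.halt:
    cli               ; Without this, the next interrupt would wake the CPU
    hlt
    jmp .halt         ; Guard against NMI wake-ups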

print_string:
    lodsb
    or al, al
    jz .done
    mov ah, 0x0E
    int 0x10
    jmp print_string
.done:
    ret

error_msg: db 'Boot error!', 0

times 510-($-$$) db 0  ; Pad to 510 bytes
dw 0xAA55              ; Boot signature

2.2 Modern UEFI (Unified Extensible Firmware Interface)

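  • Mode: The CPU still resets in Real Mode, but the firmware switches to 32-bit Protected Mode or 64-bit Long Mode in its earliest phase, so UEFI drivers and boot applications never run in 16-bit mode.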
  • Disk Format: Uses GPT (GUID Partition Table) and a dedicated EFI System Partition (ESP).
  • The Protocol: Instead of jumping to a sector, UEFI understands filesystems (FAT32) and loads PE/COFF executables (e.g., grubx64.efi).
GPT Structure:
┌─────────────────────────────────────────────────────────────────────┐
│                  GPT (GUID Partition Table) Layout                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  LBA 0: Protective MBR                                              │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  Legacy MBR for backward compatibility                          │ │
│  │  Single partition entry covering entire disk                    │ │
│  │  Type: 0xEE (GPT Protective)                                    │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  LBA 1: GPT Header (Primary)                                        │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  Signature: "EFI PART"                                          │ │
│  │  Revision: 0x00010000                                           │ │
│  │  Header Size: 92 bytes                                          │ │
│  │  CRC32 checksum                                                 │ │
│  │  Current LBA: 1                                                 │ │
│  │  Backup LBA: (last LBA on disk)                                 │ │
│  │  First usable LBA: 34                                           │ │
│  │  Last usable LBA: (disk size - 34)                              │ │
│  │  Disk GUID: unique identifier                                   │ │
│  │  Partition entries start: LBA 2                                 │ │
│  │  Number of partition entries: 128                               │ │
│  │  Size of partition entry: 128 bytes                             │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  LBA 2-33: Partition Entry Array                                    │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  Entry 1: EFI System Partition                                 │ │
│  │  ┌──────────────────────────────────────────────────────────┐ │ │
│  │  │  Partition type GUID:                                     │ │ │
│  │  │  C12A7328-F81F-11D2-BA4B-00A0C93EC93B (ESP)               │ │ │
│  │  │  Unique partition GUID: (random)                          │ │ │
│  │  │  First LBA: 2048                                          │ │ │
│  │  │  Last LBA: 1048575                                        │ │ │
│  │  │  Attributes: 0x00 (no special flags)                      │ │ │
│  │  │  Partition name: "EFI System"                             │ │ │
│  │  └──────────────────────────────────────────────────────────┘ │ │
│  │                                                                 │ │
│  │  Entry 2: Root Partition                                       │ │
│  │  ┌──────────────────────────────────────────────────────────┐ │ │
│  │  │  Partition type GUID:                                     │ │ │
│  │  │  0FC63DAF-8483-4772-8E79-3D69D8477DE4 (Linux FS)         │ │ │
│  │  │  First LBA: 1048576                                       │ │ │
│  │  │  Last LBA: ...                                            │ │ │
│  │  └──────────────────────────────────────────────────────────┘ │ │
│  │                                                                 │ │
│  │  Entries 3-128: (unused or additional partitions)              │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  LBA 34+: Partition Data                                            │
│  LBA (end-32 to end-1): Backup Partition Entry Array                │
│  LBA (end): Backup GPT Header                                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
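The header fields in the diagram can be decoded and checksum-verified in a few lines. This is a minimal sketch assuming the standard 92-byte header and 512-byte sectors; `build_demo_header` is a hypothetical helper that fabricates a header for illustration, not something firmware provides.

```python
import struct
import zlib

# signature, revision, header size, header CRC32, reserved, current LBA,
# backup LBA, first usable, last usable, disk GUID, entries LBA,
# entry count, entry size, entries CRC32  (92 bytes total)
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")


def parse_gpt_header(lba1: bytes) -> dict:
    (sig, _rev, size, crc, _res, current, backup, first_usable, last_usable,
     _guid, entries_lba, n_entries, entry_size, _ecrc) = GPT_HEADER.unpack_from(lba1)
    if sig != b"EFI PART":
        raise ValueError("not a GPT header")
    # The CRC covers the header with its own CRC field set to zero.
    zeroed = lba1[:16] + b"\x00" * 4 + lba1[20:size]
    if zlib.crc32(zeroed) != crc:
        raise ValueError("header CRC mismatch")
    return {
        "current_lba": current,
        "backup_lba": backup,
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        "entries_lba": entries_lba,
        "entry_array_sectors": (n_entries * entry_size + 511) // 512,
    }


def build_demo_header(total_sectors: int) -> bytes:
    """Fabricate a primary header for a disk of `total_sectors` sectors."""
    last = total_sectors - 1

    def pack(crc: int) -> bytes:
        return GPT_HEADER.pack(b"EFI PART", 0x00010000, 92, crc, 0, 1, last,
                               34, last - 33, bytes(16), 2, 128, 128, 0)

    return pack(zlib.crc32(pack(0)))


print(parse_gpt_header(build_demo_header(2097152)))  # 1 GiB disk
```

With 128 entries of 128 bytes each, the entry array occupies 32 sectors, which is why the first usable LBA is 34 (protective MBR, header, then 32 array sectors).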
UEFI Boot Process:
┌─────────────────────────────────────────────────────────────────────┐
│                      UEFI BOOT FLOW                                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  1. SEC (Security Phase)                                            │
│     • CPU microcode validation                                      │
│     • Cache-as-RAM setup (CAR)                                      │
│     • Temporary memory before DRAM init                             │
│                                                                     │
│  2. PEI (Pre-EFI Initialization)                                    │
│     • Memory controller initialization                              │
│     • Initialize DRAM                                               │
│     • Copy firmware to RAM                                          │
│     • Prepare for DXE phase                                         │
│                                                                     │
│  3. DXE (Driver Execution Environment)                              │
│     • Load hardware drivers                                         │
│     • Initialize PCI, USB, SATA, etc.                               │
│     • Build ACPI tables                                             │
│     • Set up UEFI Boot Services and Runtime Services                │
│                                                                     │
│  4. BDS (Boot Device Selection)                                     │
│     • Read boot order from NVRAM                                    │
│     • BootOrder variable: {0003, 0001, 0002}                        │
│     • Boot0001 = "ubuntu" -> \EFI\ubuntu\shimx64.efi                │
│     • Boot0002 = "Windows" -> \EFI\Microsoft\Boot\bootmgfw.efi      │
│     • Boot0003 = "USB" -> \EFI\Boot\bootx64.efi                     │
│     • Scan ESP (EFI System Partition) on GPT disks                  │
│     • Look for removable media fallback paths                       │
│                                                                     │
│  5. Load EFI Application                                            │
│     • Mount ESP (FAT32 filesystem)                                  │
│     • Load bootloader (e.g., grubx64.efi, shimx64.efi)              │
│     • Provide Boot Services:                                        │
│       - LocateProtocol() - Find device drivers                      │
│       - LoadImage() - Load executables                              │
│       - StartImage() - Execute loaded image                         │
│       - AllocatePool() - Memory allocation                          │
│       - OpenProtocol() - Access device functions                    │
│                                                                     │
│  6. TSL (Transient System Load)                                     │
│     • Bootloader takes control                                      │
│     • Can return to boot menu if needed                             │
│     • Eventually calls ExitBootServices()                           │
│                                                                     │
│  7. RT (Runtime)                                                    │
│     • OS takes over                                                 │
│     • Boot Services terminated                                      │
│     • Runtime Services still available:                             │
│       - GetVariable() / SetVariable() - NVRAM access                │
│       - GetTime() / SetTime() - Hardware clock                      │
│       - ResetSystem() - Reboot/shutdown                             │
│       - UpdateCapsule() - Firmware updates                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
UEFI Services:
  • Boot Services: Used by the bootloader (e.g., to read files). Terminated once the OS starts.
  • Runtime Services: Available even after the OS boots (e.g., setting UEFI variables, NVRAM).
Example: Reading UEFI Variables from Linux:
# List all UEFI variables
efivar -l

# Read boot order (-p prints the variable selected with -n)
efivar -p -n 8be4df61-93ca-11d2-aa0d-00e098032b8c-BootOrder

# Read specific boot entry
efivar -p -n 8be4df61-93ca-11d2-aa0d-00e098032b8c-Boot0001

# Set new boot entry (requires root)
efibootmgr -c -d /dev/sda -p 1 -L "My Linux" -l "\EFI\linux\vmlinuz.efi"

# Change boot order
efibootmgr -o 0003,0001,0002
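The same variables are also exposed as files under /sys/firmware/efi/efivars on Linux, where each file starts with a 4-byte attribute word followed by the variable data. A minimal sketch of decoding BootOrder from such a file's contents (the function name is illustrative, and the sample bytes are synthetic):

```python
import struct


def decode_boot_order(raw: bytes) -> list:
    """Decode an efivarfs BootOrder blob into Boot#### variable names."""
    payload = raw[4:]  # skip the 32-bit attributes prefix
    if len(payload) % 2:
        raise ValueError("BootOrder must be a list of 16-bit values")
    ids = struct.unpack(f"<{len(payload) // 2}H", payload)
    return [f"Boot{n:04X}" for n in ids]


# Synthetic blob: attributes = 7, then entries 0003, 0001, 0002.
sample = struct.pack("<I3H", 7, 0x0003, 0x0001, 0x0002)
print(decode_boot_order(sample))  # ['Boot0003', 'Boot0001', 'Boot0002']
```

On a UEFI-booted machine, the input would come from reading the file BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c in that directory.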
Secure Boot:
┌─────────────────────────────────────────────────────────────────────┐
│                      UEFI SECURE BOOT                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Goal: Prevent unauthorized code from running during boot           │
│                                                                     │
│  Key Databases (stored in NVRAM):                                   │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │  PK (Platform Key)                                              │ │
│  │  • Single key, owned by OEM                                     │ │
│  │  • Controls access to KEK database                              │ │
│  │                                                                 │ │
│  │  KEK (Key Exchange Keys)                                        │ │
│  │  • List of keys that can update db/dbx                          │ │
│  │  • Typically includes Microsoft KEK and OEM/vendor KEK          │ │
│  │                                                                 │ │
│  │  db (Signature Database - Whitelist)                            │ │
│  │  • Certificates/hashes of allowed bootloaders                   │ │
│  │  • Microsoft Windows cert, shim cert (for Linux), etc.          │ │
│  │                                                                 │ │
│  │  dbx (Forbidden Signatures Database - Blacklist)                │ │
│  │  • Known-bad signatures (revoked certificates)                  │ │
│  │  • Updated via Windows Update or Linux vendor                   │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                     │
│  Boot Flow with Secure Boot:                                        │
│  1. UEFI loads bootloader image                                     │
│  2. Check signature against db (whitelist)                          │
│  3. Check signature against dbx (blacklist)                         │
│  4. If valid: execute bootloader                                    │
│  5. If invalid: refuse to boot, show error                          │
│                                                                     │
│  Linux Secure Boot:                                                 │
│  • Most distros use "shim" bootloader                               │
│  • shim.efi is signed with Microsoft key (in db)                    │
│  • shim embeds the distro's vendor cert; MOK adds owner keys        │
│  • shim verifies grub/kernel against vendor cert or MOK             │
│                                                                     │
│  Chain of Trust:                                                    │
│  UEFI → shim.efi (MS-signed) → grubx64.efi (distro-signed)          │
│       → vmlinuz (distro-signed) → kernel modules (kernel-signed)    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
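The allow/deny decision in the boot flow above can be modeled as two set lookups. This is a deliberately simplified sketch: real firmware verifies Authenticode signatures and certificate chains, whereas this toy model matches only a SHA-256 digest of the whole image. All names are hypothetical.

```python
import hashlib


def secure_boot_allows(image: bytes, db: set, dbx: set) -> bool:
    """Toy policy: a revocation in dbx always wins over an allowance in db."""
    digest = hashlib.sha256(image).hexdigest()
    if digest in dbx:
        return False
    return digest in db


good_loader = b"trusted bootloader image"
revoked_loader = b"bootloader with a known vulnerability"
unknown_loader = b"unsigned bootloader"

db = {hashlib.sha256(good_loader).hexdigest(),
      hashlib.sha256(revoked_loader).hexdigest()}
dbx = {hashlib.sha256(revoked_loader).hexdigest()}

for name, image in [("good", good_loader), ("revoked", revoked_loader),
                    ("unknown", unknown_loader)]:
    print(name, secure_boot_allows(image, db, dbx))
```

The revoked image stays blocked even though it is still listed in db, which is what makes dbx updates effective against previously trusted binaries.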

3. The Road to 64-bit: Mode Transitions

The most complex part of the boot process is transitioning the CPU from its 16-bit legacy state to full 64-bit Long Mode.

Step 1: Real Mode to Protected Mode (32-bit)

  1. Disable Interrupts: cli (Clear Interrupt Flag) to prevent interrupts before the IDT is ready.
  2. Enable A20 Gate: A legacy hack to allow addressing above 1MB.
  3. Load GDT: The Global Descriptor Table defines how memory segments work.
  4. Set CR0.PE: Set the Protection Enable bit in the CR0 register.
  5. Far Jump: A special jump that flushes the CPU pipeline and loads the new 32-bit Code Segment.
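The A20 step above is easiest to see with numbers. The sketch below models only the address-line masking, assuming a real-mode segment:offset pair; the function name is illustrative.

```python
def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address as seen on the bus, with or without the A20 gate."""
    linear = (segment << 4) + offset  # can reach up to 0x10FFEF
    if not a20_enabled:
        linear &= ~(1 << 20)          # address line 20 is forced low
    return linear


# 0xFFFF:0x0010 is the first byte above 1MB.
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=False)))  # 0x0
print(hex(physical_address(0xFFFF, 0x0010, a20_enabled=True)))   # 0x100000
```

With the gate closed, every odd megabyte aliases onto the even megabyte below it, so code loaded above 1MB would silently overwrite low memory.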

Step 2: Protected Mode to Long Mode (64-bit)

  1. Set CR4.PAE: Physical Address Extension is required for 64-bit mode.
  2. Setup Page Tables: You must enable paging to enter Long Mode. The kernel builds a minimal “Identity Map” (Virtual Address = Physical Address) for the first 1GB of memory and loads the physical address of the top-level table (PML4) into CR3.
  3. Set EFER.LME: Set the Long Mode Enable bit in the Extended Feature Enable Register.
  4. Set CR0.PG: Enable Paging. The CPU is now in “Compatibility Mode.”
  5. Final Far Jump: A jump to a 64-bit segment officially enters Long Mode.
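The page-table step can be modeled in user space. The sketch below builds a three-level identity map for the first 1GB using 2MB pages and walks it the way the MMU would. The table addresses (0x1000, 0x2000, 0x3000) are arbitrary assumptions, and the control-register constants are listed only to tie the bit names in the steps to their positions.

```python
PRESENT, WRITABLE, HUGE = 1 << 0, 1 << 1, 1 << 7  # page-table entry flags
CR4_PAE = 1 << 5        # CR4 bit 5
EFER_MSR = 0xC0000080   # MSR number of EFER
EFER_LME = 1 << 8       # EFER bit 8
CR0_PG = 1 << 31        # CR0 bit 31
ADDR_MASK = 0x000FFFFFFFFFF000


class PageFault(Exception):
    pass


def build_identity_map(pml4_at: int, pdpt_at: int, pd_at: int) -> dict:
    """Return {physical address: 512-entry table} mapping 0..1GB onto itself."""
    pml4 = [0] * 512
    pdpt = [0] * 512
    pml4[0] = pdpt_at | PRESENT | WRITABLE
    pdpt[0] = pd_at | PRESENT | WRITABLE
    # 512 entries * 2MB pages = 1GB
    pd = [(i << 21) | PRESENT | WRITABLE | HUGE for i in range(512)]
    return {pml4_at: pml4, pdpt_at: pdpt, pd_at: pd}


def translate(vaddr: int, cr3: int, memory: dict) -> int:
    """Walk PML4 -> PDPT -> PD and return the physical address."""
    table = memory[cr3]
    for shift in (39, 30):
        entry = table[(vaddr >> shift) & 0x1FF]
        if not entry & PRESENT:
            raise PageFault(hex(vaddr))
        table = memory[entry & ADDR_MASK]
    entry = table[(vaddr >> 21) & 0x1FF]
    if not entry & PRESENT:
        raise PageFault(hex(vaddr))
    return (entry & ADDR_MASK & ~0x1FFFFF) | (vaddr & 0x1FFFFF)


memory = build_identity_map(0x1000, 0x2000, 0x3000)
print(hex(translate(0x100000, 0x1000, memory)))  # 0x100000
```

Any address at or above 1GB raises PageFault in this model, which mirrors what would happen to code running outside the mapped range at the moment paging is switched on.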

4. Kernel Data Structures: GDT and IDT

The kernel must define how it will handle memory and interrupts before it can do anything else.

4.1 The GDT (Global Descriptor Table)

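The GDT is an array of 8-byte descriptors (in 64-bit mode, system descriptors such as the TSS occupy two slots, i.e. 16 bytes). Even in “Flat” 64-bit mode where segmentation is mostly unused, the GDT is required to define:
  • Kernel Code Segment: Ring 0, Executable, Readable.
  • Kernel Data Segment: Ring 0, Readable, Writable.
  • User Code/Data Segments: Ring 3.
  • TSS (Task State Segment): Holds the stack pointers the CPU loads when an interrupt or exception switches to Ring 0 (RSP0) or uses a dedicated Interrupt Stack Table (IST) entry.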
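To make the descriptor format concrete, the sketch below packs the scattered base, limit, access, and flag fields into the 8-byte layout. The access and flag values are the conventional flat-model ones and are assumptions for illustration; a real kernel's GDT may differ.

```python
def gdt_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a legacy 8-byte segment descriptor into a 64-bit integer."""
    return (
        (limit & 0xFFFF)                 # limit bits 0-15
        | ((base & 0xFFFFFF) << 16)      # base bits 0-23
        | ((access & 0xFF) << 40)        # P, DPL, S, type
        | (((limit >> 16) & 0xF) << 48)  # limit bits 16-19
        | ((flags & 0xF) << 52)          # G, D/B, L, AVL
        | (((base >> 24) & 0xFF) << 56)  # base bits 24-31
    )


def dpl(descriptor: int) -> int:
    """Extract the privilege level (ring) from a packed descriptor."""
    return (descriptor >> 45) & 0x3


NULL_DESC = 0
KERNEL_CODE = gdt_descriptor(0, 0xFFFFF, 0x9A, 0xA)  # ring 0, exec/read, L=1
KERNEL_DATA = gdt_descriptor(0, 0xFFFFF, 0x92, 0xC)  # ring 0, read/write
USER_CODE = gdt_descriptor(0, 0xFFFFF, 0xFA, 0xA)    # ring 3, exec/read, L=1
USER_DATA = gdt_descriptor(0, 0xFFFFF, 0xF2, 0xC)    # ring 3, read/write

for name, d in [("kernel code", KERNEL_CODE), ("kernel data", KERNEL_DATA),
                ("user code", USER_CODE), ("user data", USER_DATA)]:
    print(f"{name}: {d:#018x} (ring {dpl(d)})")
```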

4.2 The IDT (Interrupt Descriptor Table)

The IDT maps interrupt vectors (0-255) to handler functions.
  • Vectors 0-31: Reserved for CPU exceptions (Divide by Zero, Page Fault, etc.).
  • Vectors 32-255: Available for hardware interrupts and system calls.
  • Gate Types: Interrupt Gates clear IF on entry, so maskable interrupts stay disabled inside the handler; Trap Gates leave IF unchanged.
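In 64-bit mode each IDT entry is 16 bytes, with the handler address split across three fields. A minimal sketch of the encoding follows; the handler address and selector are made-up example values.

```python
import struct

INTERRUPT_GATE = 0x8E  # present, DPL 0, type 0xE: IF is cleared on entry
TRAP_GATE = 0x8F       # present, DPL 0, type 0xF: IF is left unchanged

# offset 0-15, selector, IST index, type/attributes,
# offset 16-31, offset 32-63, reserved
GATE = struct.Struct("<HHBBHII")


def idt_gate(handler: int, selector: int,
             type_attr: int = INTERRUPT_GATE, ist: int = 0) -> bytes:
    """Encode one 16-byte long-mode IDT entry."""
    return GATE.pack(handler & 0xFFFF, selector, ist & 0x7, type_attr,
                     (handler >> 16) & 0xFFFF, (handler >> 32) & 0xFFFFFFFF, 0)


def gate_handler(gate: bytes) -> int:
    """Reassemble the handler address from an encoded entry."""
    low, _sel, _ist, _attr, mid, high, _res = GATE.unpack(gate)
    return low | (mid << 16) | (high << 32)


# Hypothetical page-fault handler address, kernel code selector 0x08.
entry = idt_gate(0xFFFFFFFF81001234, 0x08)
print(len(entry), hex(gate_handler(entry)))
```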

5. The Kernel Entry Sequence (Linux)

Once the bootloader (GRUB) loads the kernel into memory, it jumps to the kernel’s entry point.

5.1 Decompression (head_64.S)

The Linux kernel is actually a self-extracting executable (vmlinuz).
  1. The early code decompresses the “real” kernel image into a higher memory address.
  2. It sets up a temporary stack.
  3. It jumps to the decompressed kernel’s entry point.
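The self-extracting idea can be imitated in user space: a small stub is followed by a compressed payload, and the extractor locates the payload by its magic bytes. The sketch below assumes gzip compression and a synthetic image; real kernel images may use other compressors and a different layout.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"


def extract_payload(image: bytes) -> bytes:
    """Find the first valid gzip stream in `image` and decompress it."""
    start = image.find(GZIP_MAGIC)
    while start != -1:
        try:
            # wbits=31 selects gzip framing; decoding stops at the end of
            # the stream, so trailing bytes after the payload are ignored.
            return zlib.decompressobj(31).decompress(image[start:])
        except zlib.error:
            # The magic bytes appeared by chance inside the stub; keep looking.
            start = image.find(GZIP_MAGIC, start + 1)
    raise ValueError("no gzip payload found")


# Synthetic "vmlinuz": stub code, compressed payload, trailing data.
payload = b"decompressed kernel image " * 10
image = b"\x90" * 64 + gzip.compress(payload) + b"trailer"
print(extract_payload(image) == payload)
```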

5.2 The start_kernel() Function

This is the “Big Bang” of the operating system. It is architecture-independent C code that calls, among many other initializers:
  1. setup_arch(): Handles CPU-specific initialization.
  2. mm_init(): Initializes the full Buddy Allocator and Slab Allocator (renamed mm_core_init() in recent kernels).
  3. sched_init(): Sets up the scheduler and the “Idle” task.
  4. rest_init(): Spawns Process 1 (init) and Process 2 (kthreadd).

6. The Handover to User Space

6.1 Initramfs (Initial RAM Filesystem)

The kernel cannot mount the real root disk immediately (it might need a driver for the SSD or a network driver for NFS).
  1. The bootloader loads a small CPIO archive into memory (initrd/initramfs).
  2. The kernel unpacks this archive into its in-memory root filesystem (rootfs), which becomes /.
  3. It runs /init from the ramfs, which loads necessary drivers and finally “Switches Root” to the real disk.
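An initramfs is commonly a cpio archive in the "newc" format: a 110-byte ASCII header per file, then the name and data, each padded to a 4-byte boundary. The sketch below lists file names from an uncompressed archive; `cpio_entry` is a hypothetical helper that builds a tiny synthetic archive for the demonstration.

```python
def align4(n: int) -> int:
    return (n + 3) & ~3


def list_cpio_newc(archive: bytes) -> list:
    """Return the file names stored in an uncompressed newc cpio archive."""
    names, pos = [], 0
    while pos + 110 <= len(archive):
        header = archive[pos : pos + 110]
        if header[:6] != b"070701":
            raise ValueError("not a newc cpio header")
        # 13 fields of 8 hex digits each follow the 6-byte magic.
        fields = [int(header[6 + 8 * i : 14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + 110
        name = archive[name_start : name_start + namesize - 1].decode()
        if name == "TRAILER!!!":  # end-of-archive marker
            break
        names.append(name)
        data_start = align4(name_start + namesize)
        pos = align4(data_start + filesize)
    return names


def cpio_entry(name: str, data: bytes = b"", mode: int = 0o100755) -> bytes:
    """Build one newc entry (illustration only; most fields left at zero)."""
    encoded = name.encode() + b"\x00"
    fields = [0, mode, 0, 0, 1, 0, len(data), 0, 0, 0, 0, len(encoded), 0]
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + encoded
    out += b"\x00" * (-len(out) % 4)
    out += data
    out += b"\x00" * (-len(out) % 4)
    return out


demo_archive = (cpio_entry("init", b"#!/bin/sh\n")
                + cpio_entry("bin/busybox", b"\x7fELF")
                + cpio_entry("TRAILER!!!", mode=0))
print(list_cpio_newc(demo_archive))  # ['init', 'bin/busybox']
```

A real initramfs is usually compressed, so it would need to be decompressed before being passed to a parser like this one.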

6.2 PID 1: systemd / SysV init

The final step is to execute the first user-space process.
  • Path: /sbin/init (or whatever init= kernel parameter specifies).
  • The PID 1 Rule: This process is the ancestor of all others. If it ever exits, the kernel triggers a Kernel Panic.
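The path selection described above can be sketched as a small command-line parser. This is a simplified model, assuming the last init= token wins and ignoring the kernel's additional fallback paths; the function name is illustrative.

```python
def init_program(cmdline: str, default: str = "/sbin/init") -> str:
    """Pick the first user-space program from a kernel command line."""
    path = default
    for token in cmdline.split():
        if token.startswith("init="):
            path = token[len("init="):]
    return path


print(init_program("root=/dev/sda2 ro quiet"))       # /sbin/init
print(init_program("root=/dev/sda2 init=/bin/sh"))   # /bin/sh
```

On a running Linux system, the actual command line can be read from /proc/cmdline.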

7. Interview Deep Dive: Senior Level

Why does the A20 gate exist?
On the original 8086, memory addresses wrapped around at 1MB. When the 80286 arrived, it could address more, but some old programs relied on the wrap-around behavior. IBM added a gate on the 21st address line (A20) so the wrap-around could be switched on or off. For compatibility, a legacy BIOS may still hand over control with A20 masked, so a bootloader must verify (and if necessary enable) A20 before it touches memory above 1MB. On UEFI systems the firmware has already enabled it.
How does the kernel run before its final page tables exist?
The kernel is compiled as a Position Independent Executable (PIE) or uses relative addressing. Early boot code uses RIP-relative (instruction-pointer-relative) instructions to find its data, so it works wherever it was loaded. Once the kernel sets up the initial page tables and enables paging, it can “jump” into the virtual address space it has defined.
Why must the early page tables identity-map the running code?
When the kernel enables paging (setting CR0.PG), the CPU immediately begins interpreting all addresses as virtual. If the kernel didn’t “Identity Map” (map Virtual 0x1234 to Physical 0x1234) the code it is currently executing, the very next instruction fetch would fail because the MMU would have no translation for that address, resulting in an immediate crash.

8. Advanced Practice

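  1. GDT Inspector: Use gdb and QEMU (-s -S) to inspect the GDT of a booting kernel. QEMU’s monitor has no dedicated GDT dump, so run monitor info registers from gdb to read the GDT base and limit, then examine the descriptors at that base with gdb’s x command (for example x/8gx followed by the base address).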
  2. Early Printk: Add a printk("Hello from early boot!\n"); to the start_kernel function in a Linux source tree and compile it. Observe when the message appears during boot.
  3. UEFI Shell: Boot into a UEFI shell and use the ls and map commands to see how the firmware sees your disks and partitions.

9. Checklist: From Power-On to Login Prompt

Use this quick reference to understand (or debug) the complete boot sequence:

Phase-by-Phase Checklist

┌─────────────────────────────────────────────────────────────────────┐
│         BOOT SEQUENCE CHECKLIST (UEFI + Linux)                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  PHASE 1: HARDWARE (< 1 sec)                                        │
│  □ Power supply stabilizes                                          │
│  □ CPU executes reset vector (0xFFFFFFF0)                           │
│  □ UEFI firmware loads from flash ROM                               │
│                                                                     │
│  PHASE 2: FIRMWARE (2-10 sec)                                       │
│  □ SEC: CPU microcode, Cache-as-RAM                                 │
│  □ PEI: Memory controller init, DRAM available                      │
│  □ DXE: Load drivers (USB, SATA, NVMe, Graphics)                    │
│  □ BDS: Read BootOrder from NVRAM                                   │
│  □ Secure Boot: Validate bootloader signature                       │
│                                                                     │
│  PHASE 3: BOOTLOADER (1-3 sec)                                      │
│  □ GRUB/systemd-boot loads from ESP                                 │
│  □ Display boot menu (if configured)                                │
│  □ Load kernel (vmlinuz) + initramfs into RAM                       │
│  □ Pass kernel command line parameters                              │
│  □ ExitBootServices() - firmware hands off to kernel                │
│                                                                     │
│  PHASE 4: KERNEL EARLY (1-5 sec)                                    │
│  □ Decompress kernel (if compressed)                                │
│  □ Setup GDT, IDT, page tables                                      │
│  □ Long Mode already active (BIOS path: Real → Protected → Long)    │
│  □ start_kernel(): Initialize subsystems                            │
│  □ Mount initramfs as /                                             │
│  □ Run /init from initramfs                                         │
│                                                                     │
│  PHASE 5: INITRAMFS (1-5 sec)                                       │
│  □ Load essential drivers (storage, filesystem, LVM, RAID)          │
│  □ Find real root partition                                         │
│  □ Decrypt root (if LUKS encrypted)                                 │
│  □ switch_root to real filesystem                                   │
│                                                                     │
│  PHASE 6: INIT SYSTEM (2-10 sec)                                    │
│  □ systemd (PID 1) starts                                           │
│  □ Mount remaining filesystems (/etc/fstab)                         │
│  □ Start services (parallel by dependencies)                        │
│  □ Reach default.target (multi-user or graphical)                   │
│  □ Spawn getty on TTYs / Display Manager                            │
│                                                                     │
│  ═══════════════════════════════════════════════════════════════    │
│  LOGIN PROMPT APPEARS!                                              │
└─────────────────────────────────────────────────────────────────────┘

Debugging Boot Issues

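┌──────────────────────────┬──────────────┬────────────────────────────────────┐
│ Symptom                  │ Phase        │ Debug Command                      │
├──────────────────────────┼──────────────┼────────────────────────────────────┤
│ No display, fans spin    │ Firmware     │ Check POST codes, clear CMOS       │
│ “No bootable device”     │ Firmware/ESP │ efibootmgr -v, check ESP mount     │
│ GRUB rescue prompt       │ Bootloader   │ Boot from live USB, reinstall GRUB │
│ Kernel panic early       │ Kernel early │ Add debug to kernel cmdline        │
│ initramfs drops to shell │ Initramfs    │ Check /dev, load missing drivers   │
│ systemd fails            │ Init system  │ systemctl --failed, journalctl -b  │
└──────────────────────────┴──────────────┴────────────────────────────────────┘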

Quick Boot Time Analysis

# Total boot time breakdown
systemd-analyze

# Blame: which services took longest
systemd-analyze blame | head -10

# Critical path visualization
systemd-analyze critical-chain

# Full boot chart (generates SVG)
systemd-analyze plot > boot.svg
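The one-line summary printed by systemd-analyze can be split into per-phase timings for logging or comparison across boots. The sketch below assumes the usual "Startup finished in ... (phase) + ..." format; the sample line is made up for illustration and is not a real measurement.

```python
import re

UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}


def to_seconds(text: str) -> float:
    """Convert strings such as '1min 2.5s' or '789ms' to seconds."""
    return sum(float(value) * UNIT_SECONDS[unit]
               for value, unit in re.findall(r"([\d.]+)(min|ms|s)", text))


def parse_startup(line: str) -> dict:
    """Map each boot phase named in the summary line to its duration."""
    pairs = re.findall(r"((?:\d+min )?[\d.]+m?s) \((\w+)\)", line)
    return {phase: to_seconds(duration) for duration, phase in pairs}


sample_line = ("Startup finished in 4.321s (firmware) + 2.012s (loader) "
               "+ 1.503s (kernel) + 1min 6.250s (userspace) = 1min 14.086s")
phases = parse_startup(sample_line)
print(phases)
print(max(phases, key=phases.get))  # userspace
```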
