System Boot Process

The boot process transforms a powered-off machine into a running operating system. Understanding this sequence is essential for systems engineers who debug boot failures, configure servers, or work on system initialization.
Interview Frequency: Medium
Key Topics: BIOS vs UEFI, bootloaders, kernel parameters, init systems
Time to Master: 6-8 hours

Boot Sequence Overview

┌─────────────────────────────────────────────────────────────────┐
│                    COMPLETE BOOT SEQUENCE                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 1. POWER ON                                               │   │
│  │    • PSU provides power                                   │   │
│  │    • CPU reset vector → firmware                          │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 2. FIRMWARE (BIOS/UEFI)                                   │   │
│  │    • POST (Power-On Self-Test)                            │   │
│  │    • Initialize hardware                                  │   │
│  │    • Find bootable device                                 │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 3. BOOTLOADER (GRUB/systemd-boot)                        │   │
│  │    • Load kernel image                                    │   │
│  │    • Load initial ramdisk (initrd/initramfs)             │   │
│  │    • Pass parameters to kernel                            │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 4. KERNEL INITIALIZATION                                  │   │
│  │    • Decompress kernel                                    │   │
│  │    • Initialize memory management                         │   │
│  │    • Start scheduler                                      │   │
│  │    • Mount initramfs as temporary root                    │   │
│  └────────────────────────┬─────────────────────────────────┘   │
│                           ▼                                      │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ 5. INIT SYSTEM (systemd/SysV init)                       │   │
│  │    • PID 1 — first userspace process                      │   │
│  │    • Mount real root filesystem                           │   │
│  │    • Start system services                                │   │
│  │    • Reach default target (multi-user/graphical)         │   │
│  └──────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

CPU Reset Vector and Architecture Fundamentals

Understanding how different CPU architectures initialize is crucial for systems engineering:

The Reset Vector Explained

┌─────────────────────────────────────────────────────────────────┐
│                    CPU RESET VECTOR                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Definition: Fixed memory address where CPU fetches first       │
│  instruction after power-on or reset                            │
│                                                                  │
│  This address is mapped to ROM/flash memory containing:         │
│  • Legacy systems: BIOS code                                    │
│  • Modern systems: UEFI firmware                                │
│  • Embedded systems: Bootloader (U-Boot, etc.)                  │
│                                                                  │
│  Why a fixed address?                                           │
│  • CPU needs predictable starting point                         │
│  • No OS or memory initialization yet                           │
│  • Hardware designers wire ROM to this address                  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Architecture Comparison: x86 vs x86-64 vs ARM vs MIPS

┌─────────────────────────────────────────────────────────────────┐
│           ARCHITECTURE COMPARISON TABLE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Feature       x86 (32-bit)  x86-64        ARM        MIPS      │
│  ──────────────────────────────────────────────────────────────  │
│  Reset Vector  0xFFFFFFF0    0xFFFFFFF0    0x00000000 0xBFC00000│
│  Initial Mode  Real (16-bit) Real (16-bit) Supervisor Kernel    │
│  Word Size     32-bit        64-bit        32/64-bit  32/64-bit │
│  Address Space 4 GB          256 TB        4GB/Large  4GB/Large │
│  Registers     8 general     16 general    16 / 31    32 general│
│  Endianness    Little        Little        Bi-endian  Bi-endian │
│  ISA Type      CISC          CISC          RISC       RISC      │
│  Common Use    Legacy PC     PC/Server     Mobile/IoT Routers   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

x86 (32-bit) Architecture Details

┌─────────────────────────────────────────────────────────────────┐
│                    x86 (IA-32) ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Reset Vector: 0xFFFFFFF0 (16 bytes below 4GB)                 │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│                                                                  │
│  Initial CPU State:                                             │
│  • Real Mode (16-bit addressing)                                │
│  • CS:IP = 0xF000:0xFFF0                                        │
│    Physical = (CS << 4) + IP = 0xFFFF0                         │
│  • But reset vector is at 0xFFFFFFF0!                           │
│    How? Shadow register for CS base                             │
│    CS.base = 0xFFFF0000 (hidden, not 0xF0000)                  │
│    So: 0xFFFF0000 + 0xFFF0 = 0xFFFFFFF0 ✓                      │
│                                                                  │
│  Register Set (32-bit width):                                   │
│  ═════════════════════════════════                              │
│                                                                  │
│  General Purpose (8 registers):                                 │
│  ┌────────────────────────────────────────────────────┐        │
│  │ EAX - Accumulator (arithmetic operations)          │        │
│  │ EBX - Base (base pointer for memory access)        │        │
│  │ ECX - Counter (loop counter, shift/rotate count)   │        │
│  │ EDX - Data (I/O operations, multiplication)        │        │
│  │ ESI - Source Index (string/array operations)       │        │
│  │ EDI - Destination Index (string/array operations)  │        │
│  │ EBP - Base Pointer (stack frame base)              │        │
│  │ ESP - Stack Pointer (top of stack)                 │        │
│  └────────────────────────────────────────────────────┘        │
│                                                                  │
│  Backward Compatibility:                                        │
│  Each 32-bit register can access 16-bit and 8-bit parts:       │
│  ┌─────────────────────────────────────────┐                   │
│  │ EAX (32-bit): │ 0x12345678 │            │                   │
│  │               └─────┬──────┘            │                   │
│  │ AX  (16-bit):      │ 0x5678 │          │                   │
│  │                    ├────┬────┤          │                   │
│  │ AH  (8-bit high):  │0x56│    │          │                   │
│  │ AL  (8-bit low):   │    │0x78│          │                   │
│  └─────────────────────────────────────────┘                   │
│                                                                  │
│  Segment Registers (16-bit):                                    │
│  ┌────────────────────────────────────────────────────┐        │
│  │ CS - Code Segment                                  │        │
│  │ DS - Data Segment                                  │        │
│  │ SS - Stack Segment                                 │        │
│  │ ES - Extra Segment                                 │        │
│  │ FS - General Purpose Segment                       │        │
│  │ GS - General Purpose Segment                       │        │
│  └────────────────────────────────────────────────────┘        │
│                                                                  │
│  Special Purpose:                                               │
│  ┌────────────────────────────────────────────────────┐        │
│  │ EIP - Instruction Pointer (program counter)        │        │
│  │ EFLAGS - Status flags (zero, carry, overflow, etc)│        │
│  └────────────────────────────────────────────────────┘        │
│                                                                  │
│  Control Registers:                                             │
│  ┌────────────────────────────────────────────────────┐        │
│  │ CR0 - Protected mode, paging control               │        │
│  │ CR2 - Page fault linear address                    │        │
│  │ CR3 - Page directory base (PDBR)                   │        │
│  │ CR4 - Architecture extensions (PAE, PSE, etc)      │        │
│  └────────────────────────────────────────────────────┘        │
│                                                                  │
│  Memory Segmentation:                                           │
│  ┌──────────────────────────────────────┐                       │
│  │ Segment Selector (protected mode)    │                       │
│  │  • 16 bits: 13-bit index, TI, RPL    │                       │
│  │  • Index selects a GDT/LDT entry     │                       │
│  └──────────────────────────────────────┘                       │
│                                                                  │
│  Address Calculation (Real Mode):                               │
│    Physical = (Segment << 4) + Offset                           │
│    Example: CS:IP = 0xF000:0x0100                              │
│            = 0xF0000 + 0x0100 = 0xF0100                        │
│                                                                  │
│  Addressable Memory:                                            │
│  • Real Mode: 1 MB (20-bit addressing)                          │
│  • Protected Mode: 4 GB (32-bit addressing)                     │
│  • With PAE: 64 GB physical (36-bit physical addresses)         │
│                                                                  │
│  Limitations:                                                   │
│  • 4 GB virtual memory per process                              │
│  • Only 8 general-purpose registers (causes register pressure)  │
│  • Complex segmentation model                                   │
│  • Each register is 32-bit wide                                 │
│                                                                  │
│  Common Instructions (CISC - Complex Instruction Set):          │
│  MOV, PUSH, POP, ADD, SUB, JMP, CALL, INT, REP MOVSB           │
│  (Variable length: 1-15 bytes - complex to decode)              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
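The address arithmetic in the box above can be checked with a few lines of Python. This is a minimal sketch, not emulator code: the function names are made up for illustration, and the 20-bit mask assumes the original 8086 address bus.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address for a real-mode segment:offset pair.

    The 20-bit mask models the original 8086 address bus (an assumption;
    later CPUs with the A20 line enabled can reach slightly above 1 MB).
    """
    return ((segment << 4) + offset) & 0xFFFFF


def reset_fetch_address(cs_base: int, ip: int) -> int:
    """First fetch after reset: hidden CS base plus IP on a 32-bit bus."""
    return (cs_base + ip) & 0xFFFFFFFF


def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH, and AL views."""
    ax = eax & 0xFFFF
    return {"AX": ax, "AH": (ax >> 8) & 0xFF, "AL": ax & 0xFF}


if __name__ == "__main__":
    # Ordinary real-mode calculation from the box: 0xF000:0x0100
    print(hex(real_mode_address(0xF000, 0x0100)))        # 0xf0100
    # What CS:IP = 0xF000:0xFFF0 would give with the normal formula
    print(hex(real_mode_address(0xF000, 0xFFF0)))        # 0xffff0
    # What the CPU actually fetches, thanks to the hidden CS base
    print(hex(reset_fetch_address(0xFFFF0000, 0xFFF0)))  # 0xfffffff0
    # Register aliasing: EAX = 0x12345678
    print({name: hex(value) for name, value in sub_registers(0x12345678).items()})
```

The second and third calls show why the hidden CS base matters: the same CS:IP pair lands just below 1 MB with the usual formula, but just below 4 GB at reset.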

x86-64 (AMD64/Intel 64) Architecture Details

┌─────────────────────────────────────────────────────────────────┐
│                    x86-64 ARCHITECTURE                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Reset Vector: 0xFFFFFFF0 (backward compatible!)               │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│                                                                  │
│  Initial CPU State:                                             │
│  • Starts in Real Mode (16-bit) for compatibility!              │
│  • Firmware transitions to Long Mode (64-bit)                   │
│  • CS.base still 0xFFFF0000 at reset                            │
│                                                                  │
│  Key Improvements over x86:                                     │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                   │
│                                                                  │
│  1. Extended Registers:                                         │
│     • 16 general-purpose (RAX, RBX, ..., R15)                  │
│     • 64-bit wide (vs 32-bit)                                   │
│     • R8-R15 are new additions                                  │
│                                                                  │
│  2. Address Space:                                              │
│     • Theoretical: 64-bit = 16 EB                               │
│     • Practical: 48-bit = 256 TB (4-level paging)              │
│     • Canonical addressing: bits 48-63 = bit 47                │
│                                                                  │
│  3. Operating Modes:                                            │
│     ┌──────────────────────────────────────┐                    │
│     │ Real Mode → Protected → Long Mode    │                    │
│     │ (16-bit)    (32-bit)    (64-bit)     │                    │
│     └──────────────────────────────────────┘                    │
│                                                                  │
│  4. Segmentation Simplified:                                    │
│     • Flat memory model preferred                               │
│     • Segment registers mostly ignored                          │
│     • FS/GS still used (thread-local storage)                  │
│                                                                  │
│  5. Calling Conventions Changed (System V AMD64 ABI):           │
│     • First 6 integer args in registers (not stack)             │
│     • RDI, RSI, RDX, RCX, R8, R9                               │
│     • Red Zone: 128 bytes below RSP                             │
│     • Windows x64 differs: RCX, RDX, R8, R9; no red zone        │
│                                                                  │
│  6. Baseline Features:                                          │
│     • PAE paging required; SSE2 always present                  │
│     • NX bit supported by nearly all x86-64 CPUs                │
│     • Long mode drops v8086 mode, HW task switching             │
│                                                                  │
│  Transition Example (UEFI firmware):                            │
│  1. Start: Real mode at 0xFFFFFFF0                              │
│  2. Enable A20 gate if needed (legacy platforms)                │
│  3. Load GDT, set CR0.PE → protected mode                       │
│  4. Set CR4.PAE, load CR3 with PML4 address                     │
│  5. Set LME bit in EFER MSR                                     │
│  6. Set CR0.PG → long mode active (EFER.LMA = 1)                │
│  7. Far jump into a 64-bit code segment                         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
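The canonical-address rule (bits 48-63 must copy bit 47) is easy to get wrong by one bit, so here is a small Python sketch of it. The helper names and the `va_bits` parameter are illustrative assumptions, with 48 as the default to match the box above.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if every bit from va_bits up to 63 equals bit va_bits - 1."""
    if not 0 <= addr < (1 << 64):
        raise ValueError("addr must fit in 64 bits")
    upper = addr >> (va_bits - 1)              # bit 47 and everything above it
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits == 48
    return upper in (0, all_ones)


def canonicalize(addr: int, va_bits: int = 48) -> int:
    """Sign-extend the low va_bits of addr into a canonical 64-bit address."""
    addr &= (1 << va_bits) - 1
    if addr >> (va_bits - 1):                  # top implemented bit is set
        addr |= ((1 << 64) - 1) ^ ((1 << va_bits) - 1)
    return addr


if __name__ == "__main__":
    print(is_canonical(0x00007FFFFFFFFFFF))    # top of the lower half
    print(is_canonical(0xFFFF800000000000))    # bottom of the upper half
    print(is_canonical(0x0000800000000000))    # inside the non-canonical hole
    print(hex(canonicalize(0x800000000000)))
```

With 48 implemented bits the usable space is two 128 TB halves, one at the bottom and one at the top of the 64-bit range, which together give the 256 TB figure.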

How Instructions Actually Work: From Fetch to Execute

Understanding instruction execution is fundamental to system programming:
┌─────────────────────────────────────────────────────────────────┐
│              INSTRUCTION EXECUTION CYCLE                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  The CPU continuously executes this cycle (Fetch-Decode-Execute):│
│                                                                  │
│  ┌─────────┐      ┌─────────┐      ┌─────────┐      ┌────────┐ │
│  │ FETCH   │ ───► │ DECODE  │ ───► │ EXECUTE │ ───► │ WRITE  │ │
│  │         │      │         │      │         │      │  BACK  │ │
│  └─────────┘      └─────────┘      └─────────┘      └────────┘ │
│       │                                                    │     │
│       └────────────────────────────────────────────────────┘     │
│                    Repeat forever                                │
│                                                                  │
│  Step 1: FETCH                                                  │
│  ━━━━━━━━━━━━━━                                                  │
│  • Read instruction from memory at address in PC (Program Counter)│
│  • Instruction goes to Instruction Register (IR)                │
│  • PC = PC + instruction_size                                   │
│                                                                  │
│  Step 2: DECODE                                                 │
│  ━━━━━━━━━━━━━━━                                                 │
│  • Identify what instruction it is (ADD, MOV, JMP, etc.)        │
│  • Extract operands (which registers, memory addresses)         │
│  • For x86: Often split into micro-ops (µops)                   │
│  • For ARM: Fixed-width encoding, simpler decode                │
│                                                                  │
│  Step 3: EXECUTE                                                │
│  ━━━━━━━━━━━━━━━━                                                │
│  • Perform the actual operation                                 │
│  • ALU does arithmetic/logic                                    │
│  • Memory access (load/store)                                   │
│  • Branch evaluation                                            │
│                                                                  │
│  Step 4: WRITE BACK                                             │
│  ━━━━━━━━━━━━━━━━━━━                                             │
│  • Store result back to register or memory                      │
│  • Update flags (zero, carry, overflow, etc.)                   │
│  • Update PC if branch taken                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
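To make the cycle concrete, here is a toy interpreter in Python that walks through the same four steps. Everything in it is an assumption made for illustration: the three-instruction "ISA" (`mov`, `add`, `jnz`, plus `halt`), the register names, and the rule that only `add` updates the zero flag. It is not a model of any real CPU.

```python
def run(program, max_steps=1000):
    """Execute a list of tuple-encoded instructions and return the registers."""
    regs = {"r0": 0, "r1": 0}
    zero_flag = False
    pc = 0                                   # program counter (index into program)
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]                  # FETCH the instruction at PC
        pc += 1                              # PC now points at the next instruction
        op, *operands = instr                # DECODE into opcode and operands
        if op == "halt":
            break
        if op == "mov":
            dst, imm = operands
            regs[dst] = imm & 0xFFFFFFFF     # EXECUTE + WRITE BACK to register
        elif op == "add":
            dst, imm = operands
            regs[dst] = (regs[dst] + imm) & 0xFFFFFFFF
            zero_flag = regs[dst] == 0       # WRITE BACK also updates the flag
        elif op == "jnz":
            (target,) = operands
            if not zero_flag:
                pc = target                  # taken branch overwrites PC
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Add 5 to r1 three times, using r0 as a down-counter.
PROGRAM = [
    ("mov", "r0", 3),
    ("mov", "r1", 0),
    ("add", "r1", 5),    # index 2: loop body
    ("add", "r0", -1),   # sets the zero flag when r0 reaches 0
    ("jnz", 2),
    ("halt",),
]

if __name__ == "__main__":
    print(run(PROGRAM))  # {'r0': 0, 'r1': 15}
```

Note how the PC is advanced during fetch and only later overwritten by a taken branch, which mirrors the "Update PC if branch taken" step above.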

Assembly Language Crash Course

Let’s learn with practical examples on x86, ARM, and MIPS:
┌─────────────────────────────────────────────────────────────────┐
│           ASSEMBLY LANGUAGE BASICS                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Assembly = Human-readable representation of machine code       │
│  • One assembly instruction ≈ one machine instruction           │
│  • Assembler converts assembly → machine code                   │
│                                                                  │
│  Basic Syntax:                                                  │
│  ┌──────────────────────────────────────────────────────┐       │
│  │  MNEMONIC  DESTINATION, SOURCE                       │       │
│  │  ────────  ──────────────────────                    │       │
│  │  What to   Where result  Where data                  │       │
│  │  do        goes          comes from                  │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Example 1: Simple Addition

┌─────────────────────────────────────────────────────────────────┐
│           ADDING TWO NUMBERS IN ASSEMBLY                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Task: Calculate 5 + 3 and store in a register                 │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  x86 Assembly (Intel syntax):                                   │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  MOV  EAX, 5        ; EAX = 5                                   │
│  ADD  EAX, 3        ; EAX = EAX + 3 = 8                         │
│                                                                  │
│  Step-by-step execution:                                        │
│  ┌────────────────────────────────────────────────────┐         │
│  │ Before:  EAX = ???????? (garbage)                  │         │
│  │                                                     │         │
│  │ After MOV EAX, 5:                                  │         │
│  │   EAX = 0x00000005                                 │         │
│  │                                                     │         │
│  │ After ADD EAX, 3:                                  │         │
│  │   EAX = 0x00000008                                 │         │
│  │   Flags: ZF=0 (not zero), CF=0 (no carry)         │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  Machine code (hex):                                            │
│  B8 05 00 00 00    ; MOV EAX, 5  (5 bytes)                     │
│  83 C0 03          ; ADD EAX, 3  (3 bytes)                     │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  ARM Assembly (ARMv7):                                          │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  MOV  R0, #5        ; R0 = 5                                    │
│  ADD  R0, R0, #3    ; R0 = R0 + 3 = 8                           │
│                                                                  │
│  Machine code (hex):                                            │
│  E3 A0 00 05       ; MOV R0, #5  (4 bytes, always!)            │
│  E2 80 00 03       ; ADD R0, R0, #3  (4 bytes, always!)        │
│                                                                  │
│  Note: Fixed 32-bit length (A32 mode) makes decode simple!      │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  MIPS Assembly:                                                 │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  LI   $t0, 5        ; $t0 = 5 (pseudo-instruction)             │
│  ADDI $t0, $t0, 3   ; $t0 = $t0 + 3 = 8                         │
│                                                                  │
│  Expands to real instructions:                                  │
│  ORI  $t0, $zero, 5 ; $t0 = 0 | 5 = 5                           │
│  ADDI $t0, $t0, 3   ; $t0 = $t0 + 3                             │
│                                                                  │
│  Machine code (hex):                                            │
│  34 08 00 05       ; ORI $t0, $zero, 5  (4 bytes)              │
│  21 08 00 03       ; ADDI $t0, $t0, 3   (4 bytes)              │
│                                                                  │
│  Note: $zero register always reads as 0!                       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
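The machine-code bytes listed above can be reproduced by hand. The Python sketch below packs the MIPS I-type fields (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate) and builds the x86 `MOV EAX, imm32` encoding (opcode `B8` followed by a little-endian immediate). The function and constant names are hypothetical helpers, not part of any assembler.

```python
# MIPS opcodes and register numbers used in the example above
OP_ORI, OP_ADDI = 0x0D, 0x08
REG_ZERO, REG_T0 = 0, 8


def encode_mips_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack a MIPS I-type instruction: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_x86_mov_eax_imm32(imm: int) -> bytes:
    """Encode MOV EAX, imm32: opcode B8, then the immediate in little-endian."""
    return bytes([0xB8]) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


if __name__ == "__main__":
    # ORI $t0, $zero, 5
    print(f"{encode_mips_i_type(OP_ORI, REG_ZERO, REG_T0, 5):08X}")   # 34080005
    # ADDI $t0, $t0, 3
    print(f"{encode_mips_i_type(OP_ADDI, REG_T0, REG_T0, 3):08X}")    # 21080003
    # MOV EAX, 5
    print(encode_x86_mov_eax_imm32(5).hex())                          # b805000000
```

Every MIPS instruction comes out as exactly one 32-bit word, while the x86 encoder returns a byte string whose length depends on the instruction form; that is the fixed-length versus variable-length contrast in miniature.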

Example 2: Loading from Memory

┌─────────────────────────────────────────────────────────────────┐
│           MEMORY ACCESS OPERATIONS                               │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Task: Load a value from memory address 0x1000                 │
│                                                                  │
│  Memory layout:                                                 │
│  ┌──────────┬──────────┐                                        │
│  │ Address  │  Value   │                                        │
│  ├──────────┼──────────┤                                        │
│  │ 0x1000   │   42     │  ← We want this                       │
│  │ 0x1004   │   17     │                                        │
│  │ 0x1008   │   99     │                                        │
│  └──────────┴──────────┘                                        │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  x86 Assembly:                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  MOV  EAX, [0x1000]    ; EAX = *0x1000 (load from memory)      │
│                                                                  │
│  What happens:                                                  │
│  1. CPU sends address 0x1000 to memory                         │
│  2. Memory returns value at that address (42)                  │
│  3. Value stored in EAX                                         │
│  4. Result: EAX = 42                                            │
│                                                                  │
│  Complex addressing (x86 strength!):                            │
│  MOV  EAX, [EBX]           ; EAX = *EBX                         │
│  MOV  EAX, [EBX+4]         ; EAX = *(EBX+4)                     │
│  MOV  EAX, [EBX+ECX*4]     ; EAX = *(EBX + ECX*4) array[ECX]   │
│  MOV  EAX, [EBX+ECX*4+8]   ; Full complexity in ONE instruction│
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  ARM Assembly (Load/Store Architecture):                        │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  LDR  R0, =0x1000      ; R0 = address 0x1000                    │
│  LDR  R1, [R0]         ; R1 = *R0 = 42                          │
│                                                                  │
│  Key difference: Must load address first!                       │
│  • Step 1: Load address into register                           │
│  • Step 2: Load value from that address                         │
│  • More instructions, but simpler hardware                      │
│                                                                  │
│  Addressing modes (simpler than x86):                           │
│  LDR  R1, [R0]         ; R1 = *R0                               │
│  LDR  R1, [R0, #4]     ; R1 = *(R0+4)                           │
│  LDR  R1, [R0, R2]     ; R1 = *(R0+R2)                          │
│  LDR  R1, [R0, R2, LSL #2]  ; R1 = *(R0+R2*4) for arrays       │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  MIPS Assembly (Pure Load/Store):                              │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  LUI  $t0, 0x0000      ; Load upper 16 bits                     │
│  ORI  $t0, $t0, 0x1000 ; OR lower 16 bits → $t0 = 0x1000       │
│  LW   $t1, 0($t0)      ; $t1 = *($t0+0) = 42                    │
│                                                                  │
│  MIPS is the strictest: ONLY load/store touch memory!          │
│  All arithmetic MUST use registers                             │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
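The x86 addressing forms above all reduce to one formula: base + index * scale + displacement. The Python sketch below computes that effective address and then performs the load against a small simulated memory holding the three values from the layout table. The helper names and the flat `bytearray` memory are assumptions for illustration; little-endian byte order is used, as on x86.

```python
def effective_address(base: int = 0, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute an x86-style effective address, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF


def load32(memory: bytearray, addr: int) -> int:
    """Read a 32-bit little-endian word, like MOV EAX, [addr]."""
    return int.from_bytes(memory[addr:addr + 4], "little")


def make_memory() -> bytearray:
    """Build the example layout: 42, 17, 99 at 0x1000, 0x1004, 0x1008."""
    memory = bytearray(0x2000)
    for addr, value in ((0x1000, 42), (0x1004, 17), (0x1008, 99)):
        memory[addr:addr + 4] = value.to_bytes(4, "little")
    return memory


if __name__ == "__main__":
    mem = make_memory()
    ebx, ecx = 0x1000, 2
    print(load32(mem, effective_address(disp=0x1000)))                 # MOV EAX, [0x1000]
    print(load32(mem, effective_address(base=ebx, disp=4)))            # MOV EAX, [EBX+4]
    print(load32(mem, effective_address(base=ebx, index=ecx, scale=4)))  # MOV EAX, [EBX+ECX*4]
```

On ARM or MIPS the `effective_address` step would be one or more separate instructions before the load; on x86 the hardware does the same arithmetic inside a single `MOV`.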

Example 3: Complete Function

┌─────────────────────────────────────────────────────────────────┐
│           REAL FUNCTION: Add Two Numbers                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  C code:                                                        │
│  ┌────────────────────────────────────┐                         │
│  │ int add(int a, int b) {            │                         │
│  │     return a + b;                  │                         │
│  │ }                                  │                         │
│  └────────────────────────────────────┘                         │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  x86-64 Assembly (System V ABI):                                │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  add:                                                           │
│      MOV  EAX, EDI       ; EAX = a (1st arg in EDI)            │
│      ADD  EAX, ESI       ; EAX = a + b (2nd arg in ESI)        │
│      RET                 ; Return (EAX has result)             │
│                                                                  │
│  Calling convention:                                            │
│  • 1st arg in RDI/EDI                                           │
│  • 2nd arg in RSI/ESI                                           │
│  • Return value in RAX/EAX                                      │
│  • Just 3 instructions!                                         │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  ARM Assembly (AAPCS):                                          │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  add:                                                           │
│      ADD  R0, R0, R1     ; R0 = R0 + R1 (a + b)                │
│      BX   LR             ; Return (branch to link register)    │
│                                                                  │
│  Calling convention:                                            │
│  • 1st arg in R0                                                │
│  • 2nd arg in R1                                                │
│  • Return value in R0                                           │
│  • Even simpler: 2 instructions!                                │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  MIPS Assembly (O32 ABI):                                       │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  add:                                                           │
│      ADDU $v0, $a0, $a1  ; $v0 = $a0 + $a1 (no overflow trap)   │
│      JR   $ra            ; Jump to return address              │
│      NOP                 ; Branch delay slot (required!)       │
│                                                                  │
│  Calling convention:                                            │
│  • 1st arg in $a0                                               │
│  • 2nd arg in $a1                                               │
│  • Return value in $v0                                          │
│  • NOP needed due to branch delay slot                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Example 4: Array Access

┌─────────────────────────────────────────────────────────────────┐
│           ARRAY INDEXING: array[i]                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  C code:                                                        │
│  ┌────────────────────────────────────┐                         │
│  │ int arr[10];                       │                         │
│  │ int x = arr[5];  // Get element 5 │                         │
│  └────────────────────────────────────┘                         │
│                                                                  │
│  Memory layout:                                                 │
│  ┌──────────┬──────────┬──────────┐                             │
│  │ arr[0]   │ arr[1]   │  ...     │ arr[5] at offset 20       │
│  │ +0 bytes │ +4 bytes │          │ (5 * 4 bytes)              │
│  └──────────┴──────────┴──────────┘                             │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  x86 Assembly (shows CISC power!):                              │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  ; Assume EBX = base address of arr, ECX = 5 (index)           │
│  MOV  EAX, [EBX + ECX*4]   ; EAX = arr[ECX] IN ONE INSTRUCTION!│
│                                                                  │
│  Address calculation: EBX + (ECX * 4)                           │
│  • EBX: base address                                            │
│  • ECX: index (5)                                               │
│  • *4: size of int (4 bytes)                                    │
│  • Result: EBX + 20 → address of arr[5]                         │
│  • Load value at that address into EAX                          │
│                                                                  │
│  This is why x86 is powerful for dense code!                    │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  ARM Assembly (more explicit):                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  ; R0 = base address of arr, R1 = 5 (index)                    │
│  LSL  R2, R1, #2         ; R2 = R1 << 2 = R1 * 4 = 20          │
│  ADD  R2, R0, R2         ; R2 = R0 + R2 (address of arr[5])    │
│  LDR  R3, [R2]           ; R3 = *R2 (load value)               │
│                                                                  │
│  Or using indexed addressing:                                   │
│  LDR  R3, [R0, R1, LSL #2]  ; R3 = *(R0 + R1*4)                │
│                                                                  │
│  Three simple steps, or one load with a shifted index register! │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  MIPS Assembly (most explicit):                                 │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  ; $t0 = base address, $t1 = 5 (index)                          │
│  SLL  $t2, $t1, 2        ; $t2 = $t1 << 2 = $t1 * 4            │
│  ADD  $t2, $t0, $t2      ; $t2 = $t0 + $t2 (address)           │
│  LW   $t3, 0($t2)        ; $t3 = *($t2 + 0)                     │
│                                                                  │
│  Most steps, but hardware is simplest!                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
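
The address arithmetic in the listings above can be reproduced in C. This is a minimal sketch, not compiler output: `element_offset` and `load_element` are hypothetical helper names introduced here for illustration, and the element type is assumed to be a 4-byte `int32_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of element `index` when each element is `elem_size` bytes:
   the "index * scale" term of base + index * scale. */
size_t element_offset(size_t index, size_t elem_size)
{
    return index * elem_size;
}

/* Load base[index] the way the assembly does: start at the base address,
   add the scaled index, then read 4 bytes from the resulting address. */
int32_t load_element(const int32_t *base, size_t count, size_t index)
{
    assert(base != NULL);
    assert(index < count);                 /* stay inside the array */

    const unsigned char *addr =
        (const unsigned char *)base + element_offset(index, sizeof *base);

    int32_t value;
    memcpy(&value, addr, sizeof value);    /* the actual load */
    return value;
}
```

For a 10-element array, `element_offset(5, 4)` gives 20, matching the `ECX*4` term, and `load_element(arr, 10, 5)` returns the same value as `arr[5]`.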

Deep Dive: Register Operations

┌─────────────────────────────────────────────────────────────────┐
│           HOW REGISTERS ACTUALLY WORK                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Registers = Ultra-fast storage inside CPU                      │
│                                                                  │
│  Physical Implementation:                                       │
│  ┌────────────────────────────────────────────────────┐         │
│  │  Register File (Circuit of Flip-Flops)            │         │
│  │                                                    │         │
│  │  Each register = 32 or 64 flip-flops              │         │
│  │  (one flip-flop per bit)                          │         │
│  │                                                    │         │
│  │  EAX Register (32-bit):                           │         │
│  │  ┌──┬──┬──┬──┬──┬──┬──┬──┬   ┬──┬──┬──┬──┐       │         │
│  │  │31│30│29│28│27│26│25│24│...│03│02│01│00│       │         │
│  │  └──┴──┴──┴──┴──┴──┴──┴──┴   ┴──┴──┴──┴──┘       │         │
│  │   Each box = 1 flip-flop (holds 1 bit)           │         │
│  │                                                    │         │
│  │  Access time: ~0.3 nanoseconds                    │         │
│  │  Compare: RAM = 50-100 nanoseconds                │         │
│  │  Registers are 200x faster!                       │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  Register Transfer:                                             │
│  ═══════════════════                                            │
│                                                                  │
│  MOV  EBX, EAX    ; Copy value from EAX to EBX                 │
│                                                                  │
│  Internal steps (single CPU cycle):                             │
│  ┌────────────────────────────────────────────────────┐         │
│  │ 1. Read EAX register                               │         │
│  │    • Enable read line for EAX                      │         │
│  │    • Value appears on internal bus                 │         │
│  │                                                    │         │
│  │ 2. Write to EBX register                           │         │
│  │    • Enable write line for EBX                     │         │
│  │    • Value from bus stored in EBX flip-flops       │         │
│  │                                                    │         │
│  │ 3. Complete in ONE clock cycle!                    │         │
│  │    At 3 GHz: Takes 0.33 nanoseconds                │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  ADD Instruction (register to register):                        │
│  ═══════════════════════════════════════                        │
│                                                                  │
│  ADD  EAX, EBX    ; EAX = EAX + EBX                             │
│                                                                  │
│  Internal execution:                                            │
│  ┌────────────────────────────────────────────────────┐         │
│  │ 1. Read both operands:                             │         │
│  │    • EAX → Operand A bus                           │         │
│  │    • EBX → Operand B bus                           │         │
│  │                                                    │         │
│  │ 2. ALU (Arithmetic Logic Unit) performs addition: │         │
│  │    ┌─────────────────┐                            │         │
│  │    │      ALU        │                            │         │
│  │    │  ┌──────────┐   │                            │         │
│  │    │  │  Adder   │   │  Full adder circuit       │         │
│  │    │  │ Circuit  │   │  (ripple carry or         │         │
│  │    │  └──────────┘   │   carry lookahead)        │         │
│  │    └─────────────────┘                            │         │
│  │                                                    │         │
│  │ 3. Result written back to EAX:                    │         │
│  │    • ALU output → Result bus                      │         │
│  │    • Result bus → EAX register                    │         │
│  │                                                    │         │
│  │ 4. Update flags (EFLAGS register):                │         │
│  │    • ZF (Zero Flag): Set if result = 0            │         │
│  │    • CF (Carry Flag): Set on unsigned carry out    │         │
│  │    • SF (Sign Flag): Set if result negative        │         │
│  │    • OF (Overflow Flag): Set on signed overflow    │         │
│  │                                                    │         │
│  │ 5. Typically 1 clock cycle of latency!             │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  Memory Load (more complex):                                    │
│  ═══════════════════════════                                    │
│                                                                  │
│  MOV  EAX, [EBX]  ; Load from memory address in EBX            │
│                                                                  │
│  Internal execution (multiple cycles):                          │
│  ┌────────────────────────────────────────────────────┐         │
│  │ 1. Calculate address (cycle 1):                   │         │
│  │    • Read EBX → Address bus                        │         │
│  │                                                    │         │
│  │ 2. Send to memory controller (cycle 2):           │         │
│  │    • Address bus → Memory controller               │         │
│  │    • Assert READ signal                            │         │
│  │                                                    │         │
│  │ 3. Check cache (L1 → L2 → L3):                    │         │
│  │    • L1 hit: +4 cycles                             │         │
│  │    • L2 hit: +12 cycles                            │         │
│  │    • L3 hit: +40 cycles                            │         │
│  │    • All caches miss (RAM): +200+ cycles!          │         │
│  │                                                    │         │
│  │ 4. Data arrives on data bus:                      │         │
│  │    • Data bus → EAX register                       │         │
│  │                                                    │         │
│  │ Total: 5-200+ cycles depending on cache           │         │
│  │ This is why registers are so important!           │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
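
The flag updates listed for ADD can be modeled in a few lines of C. This is a sketch of the arithmetic only, assuming 32-bit operands; `struct add_flags` and `add32_with_flags` are hypothetical names, and the real EFLAGS register holds more bits than the four shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four flags discussed above. */
struct add_flags {
    bool zf;  /* result is zero */
    bool cf;  /* unsigned carry out of bit 31 */
    bool sf;  /* bit 31 of the result (negative if read as signed) */
    bool of;  /* signed overflow */
};

/* Model a 32-bit ADD: store the wrapped sum in *result, return the flags. */
struct add_flags add32_with_flags(uint32_t a, uint32_t b, uint32_t *result)
{
    assert(result != NULL);

    uint32_t sum = (uint32_t)(a + b);      /* wraps modulo 2^32 */
    struct add_flags f;

    f.zf = (sum == 0);
    f.cf = (sum < a);                      /* a wrapped sum means carry out */
    f.sf = (sum >> 31) != 0;
    /* Signed overflow: both operands share a sign the result does not. */
    f.of = ((~(a ^ b) & (a ^ sum)) >> 31) != 0;

    *result = sum;
    return f;
}
```

Adding 1 to `0x7FFFFFFF` sets SF and OF but not CF, while adding 1 to `0xFFFFFFFF` sets ZF and CF but not OF, which is the difference between signed overflow and unsigned carry.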

Why This Matters: Performance Impact

┌─────────────────────────────────────────────────────────────────┐
│           PERFORMANCE COMPARISON                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Operation               Latency        Example                 │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│  Register→Register       1 cycle        ADD EAX, EBX            │
│  L1 Cache hit            ~4 cycles      MOV EAX, [0x1000]       │
│  L2 Cache hit            ~12 cycles     (if not in L1)          │
│  L3 Cache hit            ~40 cycles     (if not in L2)          │
│  Main RAM                ~200 cycles    (cache miss)            │
│  NVMe SSD                ~25,000 cycles (page fault)            │
│  HDD                     ~10,000,000    (disk seek)             │
│                                                                  │
│  At 3 GHz CPU:                                                  │
│  • 1 cycle = 0.33 nanoseconds                                   │
│  • RAM access = 67 nanoseconds                                  │
│  • Register access = 0.33 nanoseconds                           │
│  • Register is 200x faster!                                     │
│                                                                  │
│  Real-World Example:                                            │
│  ═══════════════════════                                        │
│                                                                  │
│  // Slower: sum is reloaded and stored every iteration          │
│  for (int i = 0; i < 1000000; i++) {                            │
│      sum = sum + array[i];  // sum in memory (L1 hit)           │
│  }                                                               │
│  Time: ~5 cycles × 1M = 5M cycles ≈ 1.7ms                       │
│  (each iteration waits on store-to-load forwarding, ~5 cycles)  │
│                                                                  │
│  // Faster: keep the running total in a register                │
│  int temp_sum = sum;  // Load once                              │
│  for (int i = 0; i < 1000000; i++) {                            │
│      temp_sum = temp_sum + array[i];  // temp_sum in register!  │
│  }                                                               │
│  sum = temp_sum;  // Store once                                 │
│  Time: ~1 cycle × 1M = 1M cycles ≈ 0.33ms                       │
│  → ~5x faster; compilers do this when sum cannot alias array    │
│                                                                  │
│  Assembly (x86):                                                │
│  ───────────────                                                │
│  MOV  EAX, [sum]       ; Load sum once                          │
│  XOR  ECX, ECX         ; i = 0                                  │
│  loop_start:                                                    │
│      ADD  EAX, [array+ECX*4]  ; temp_sum += array[i]           │
│      INC  ECX                  ; i++                            │
│      CMP  ECX, 1000000         ; i < 1000000?                   │
│      JL   loop_start           ; if yes, continue               │
│  MOV  [sum], EAX       ; Store sum once                         │
│                                                                  │
│  Key: EAX stays in register the entire loop!                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
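
The accumulator pattern from the box above can be written as two C functions. This is a sketch with hypothetical names (`sum_via_memory`, `sum_via_register`). Here the running total is reached through a pointer, so the compiler must assume it may alias `array` and generally cannot keep it in a register on its own. How large the gap is in practice depends on the compiler, the optimization level, and the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulate through a pointer. Because *total might alias array[i],
   the compiler generally has to load and store *total every iteration. */
void sum_via_memory(int *total, const int *array, size_t n)
{
    assert(total != NULL && (array != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        *total = *total + array[i];
    }
}

/* Accumulate in a local variable. The local can stay in a register for
   the whole loop; *total is read once before and written once after. */
void sum_via_register(int *total, const int *array, size_t n)
{
    assert(total != NULL && (array != NULL || n == 0));
    int temp_sum = *total;               /* load once */
    for (size_t i = 0; i < n; i++) {
        temp_sum += array[i];
    }
    *total = temp_sum;                   /* store once */
}
```

Both functions produce the same total; only the number of loads and stores of the accumulator differs. Comparing the generated assembly (for example with `-O2 -S`) is a quick way to see whether the accumulator stays in a register.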

RISC vs CISC: Philosophy and Trade-offs

┌─────────────────────────────────────────────────────────────────┐
│           RISC vs CISC COMPARISON                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  CISC (Complex Instruction Set Computing)                       │
│  ═══════════════════════════════════════════                    │
│  Examples: x86, x86-64                                          │
│                                                                  │
│  Philosophy:                                                    │
│  • "Do more with each instruction"                              │
│  • Complex instructions that do multiple operations             │
│  • Variable-length instructions (1-15 bytes on x86)             │
│  • Instructions can directly access memory                      │
│  • Many addressing modes                                        │
│                                                                  │
│  Advantages:                                                    │
│  ✓ Dense code (fewer instructions)                              │
│  ✓ Powerful single instructions                                 │
│  ✓ Good for compiler optimization                               │
│  ✓ Backward compatibility (decades of code still runs!)         │
│                                                                  │
│  Disadvantages:                                                 │
│  ✗ Complex hardware (more transistors for decode logic)         │
│  ✗ Variable instruction length complicates pipeline             │
│  ✗ Higher power consumption                                     │
│  ✗ Requires microcode for complex instructions                  │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  RISC (Reduced Instruction Set Computing)                       │
│  ═══════════════════════════════════════════                    │
│  Examples: ARM, MIPS, RISC-V                                    │
│                                                                  │
│  Philosophy:                                                    │
│  • "Do simple things fast"                                      │
│  • Simple instructions, one per cycle                           │
│  • Fixed-length instructions (mostly 32-bit; Thumb adds 16-bit) │
│  • Load/store architecture (only LD/ST touch memory)           │
│  • Few addressing modes                                         │
│                                                                  │
│  Advantages:                                                    │
│  ✓ Simple hardware (fewer transistors, lower cost)              │
│  ✓ Fixed length simplifies pipeline                             │
│  ✓ Lower power consumption                                      │
│  ✓ Easier to optimize at hardware level                         │
│  ✓ Better for battery-powered devices                           │
│                                                                  │
│  Disadvantages:                                                 │
│  ✗ More instructions for same task                              │
│  ✗ Code bloat (larger binaries)                                 │
│  ✗ More memory bandwidth for instruction fetch                  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Same Task, Different Approaches

┌─────────────────────────────────────────────────────────────────┐
│           EXAMPLE: array[i] = array[i] + 5                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  x86 (CISC):                                                    │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  ADD  DWORD PTR [EBX+ECX*4], 5   ; ONE instruction!            │
│                                                                  │
│  What this does internally:                                     │
│  1. Calculate address: EBX + ECX*4                              │
│  2. Load value from memory                                      │
│  3. Add 5                                                       │
│  4. Store back to memory                                        │
│  All in ONE instruction! (but takes ~7-10 cycles)               │
│                                                                  │
│  Machine code: 83 04 8B 05  (4 bytes)                           │
│  Execution: ~7-10 cycles (cache hit)                            │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  ARM (RISC):                                                    │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  LSL  R2, R1, #2         ; R2 = R1 * 4                          │
│  ADD  R2, R0, R2         ; R2 = R0 + R2 (address)              │
│  LDR  R3, [R2]           ; R3 = *R2 (load)                      │
│  ADD  R3, R3, #5         ; R3 = R3 + 5                          │
│  STR  R3, [R2]           ; *R2 = R3 (store)                     │
│                                                                  │
│  5 instructions, but each is simple!                            │
│  Machine code: 20 bytes (5 × 4 bytes)                           │
│  Execution: ~5 cycles to issue (memory ops add latency)         │
│                                                                  │
│  Note: ARM can do it in 3 with indexed addressing:              │
│  LDR  R3, [R0, R1, LSL #2]  ; R3 = *(R0 + R1*4)                │
│  ADD  R3, R3, #5             ; R3 = R3 + 5                      │
│  STR  R3, [R0, R1, LSL #2]  ; *(R0 + R1*4) = R3                │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  MIPS (Pure RISC):                                              │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  SLL  $t2, $t1, 2        ; $t2 = $t1 << 2 (multiply by 4)      │
│  ADD  $t2, $t0, $t2      ; $t2 = $t0 + $t2 (address)           │
│  LW   $t3, 0($t2)        ; $t3 = *($t2 + 0) (load)             │
│  ADDI $t3, $t3, 5        ; $t3 = $t3 + 5                        │
│  SW   $t3, 0($t2)        ; *($t2 + 0) = $t3 (store)            │
│                                                                  │
│  5 instructions, strictest load/store                           │
│  Machine code: 20 bytes (5 × 4 bytes)                           │
│  Execution: ~5 cycles                                           │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
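
The four machine-code bytes quoted for the x86 version can be derived from the ModRM and SIB field layout. The encoder below is a sketch that handles only this one instruction form (`ADD DWORD PTR [base + index*4], imm8` in 32-bit mode); the enum and function names are hypothetical, and a real assembler covers many more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register numbers as used in ModRM/SIB fields (32-bit mode). */
enum x86_reg32 {
    X86_EAX = 0, X86_ECX = 1, X86_EDX = 2, X86_EBX = 3,
    X86_ESP = 4, X86_EBP = 5, X86_ESI = 6, X86_EDI = 7
};

/* Encode  ADD DWORD PTR [base + index*4], imm8  into out[0..3].
   Returns the number of bytes written (always 4 for this form). */
size_t encode_add_mem_imm8(enum x86_reg32 base, enum x86_reg32 index,
                           int8_t imm, uint8_t out[4])
{
    assert(out != NULL);
    assert(base != X86_EBP);   /* mod=00 with base=101 means disp32, no base */
    assert(index != X86_ESP);  /* index=100 means "no index register" */

    const unsigned mod = 0;        /* 00: memory operand, no displacement */
    const unsigned opcode_ext = 0; /* /0 selects ADD within opcode 0x83 */
    const unsigned rm = 4;         /* 100: a SIB byte follows */
    const unsigned scale = 2;      /* 2^2: multiply the index by 4 */

    out[0] = 0x83;                                        /* op r/m32, imm8 */
    out[1] = (uint8_t)((mod << 6) | (opcode_ext << 3) | rm);          /* ModRM */
    out[2] = (uint8_t)((scale << 6) | ((unsigned)index << 3) | (unsigned)base); /* SIB */
    out[3] = (uint8_t)imm;                                /* immediate */
    return 4;
}
```

Calling it with `X86_EBX`, `X86_ECX`, and 5 fills the buffer with `83 04 8B 05`, the same bytes shown in the box.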

The Microcode Secret: How x86 is Actually RISC Inside

┌─────────────────────────────────────────────────────────────────┐
│           x86 MICRO-OPERATIONS (µops)                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
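│  Modern x86 CPUs are RISC-like internally!                      │
│                                                                  │
│  Complex x86 instructions are broken into micro-operations      │
│  (µops) that look a lot like RISC instructions. Hardwired       │
│  decoders handle most instructions; the microcode ROM is only   │
│  used to sequence rare, complex ones (e.g. REP MOVS, CPUID).    │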
│                                                                  │
│  ┌──────────────────────────────────────────────────────┐       │
│  │         x86 Instruction Processing                   │       │
│  │                                                      │       │
│  │  ┌────────────┐                                     │       │
│  │  │ x86 Inst   │                                     │       │
│  │  │ (CISC)     │                                     │       │
│  │  └──────┬─────┘                                     │       │
│  │         │                                           │       │
│  │         ▼                                           │       │
│  │  ┌────────────┐    Decode Unit                     │       │
│  │  │  Decoder   │    (Most complex part)             │       │
│  │  └──────┬─────┘                                     │       │
│  │         │                                           │       │
│  │         ▼                                           │       │
│  │  ┌────────────┐                                     │       │
│  │  │ µops       │    Simple RISC-like operations     │       │
│  │  │ (1-4 µops) │                                     │       │
│  │  └──────┬─────┘                                     │       │
│  │         │                                           │       │
│  │         ▼                                           │       │
│  │  ┌────────────┐    Execute on RISC-like pipeline   │       │
│  │  │  Execute   │                                     │       │
│  │  │  Engine    │                                     │       │
│  │  └────────────┘                                     │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                  │
│  Example: ADD DWORD PTR [EBX+ESI*4+0x10], 42                    │
│                                                                  │
│  On typical Intel cores this instruction becomes 4 µops:        │
│  ┌────────────────────────────────────────────────────┐         │
│  │ µop1: TEMP = [EBX+ESI*4+0x10]  (load)              │         │
│  │ µop2: TEMP = TEMP + 42         (add, sets flags)   │         │
│  │ µop3: ADDR = EBX+ESI*4+0x10    (store address)     │         │
│  │ µop4: [ADDR] = TEMP            (store data)        │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  Benefits of Microcode:                                         │
│  ━━━━━━━━━━━━━━━━━━━━━━                                         │
│  ✓ Maintain backward compatibility (x86 since 1978!)            │
│  ✓ Add new features without changing ISA                        │
│  ✓ Can be updated (microcode updates via BIOS or OS)            │
│  ✓ Execute efficiently on modern pipeline                       │
│                                                                  │
│  Cost of Microcode:                                             │
│  ━━━━━━━━━━━━━━━━━━━                                            │
│  ✗ Decode stage is complex and power-hungry                     │
│  ✗ Adds decode area and power (estimates vary by design)        │
│  ✗ Variable-length instructions still complicate fetch          │
│  ✗ Multiple µops can clog pipeline                              │
│                                                                  │
│  Compare to ARM:                                                │
│  • Fixed 32-bit instructions decode directly                    │
│  • Little or no microcode needed                                │
│  • Decode stage: ~5% of die area/power                          │
│  • One instruction = one operation (usually)                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
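
One way to picture the decode step is a toy interpreter that cracks a read-modify-write ADD into simple steps and runs them in order. Everything here is an assumption made for illustration: the micro-op names, the load / add / store-address / store-data split, and the tiny word-addressed memory. Real micro-op encodings are not public, and flags are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical micro-op kinds for a memory-destination ADD. */
enum uop_kind { UOP_LOAD, UOP_ADD_IMM, UOP_STORE_ADDR, UOP_STORE_DATA };

struct uop {
    enum uop_kind kind;
    uint32_t      imm;         /* used by UOP_ADD_IMM only */
};

/* Toy machine state: a small word-addressed memory plus temporaries. */
struct core {
    uint32_t mem[64];
    uint32_t temp;             /* internal temporary register */
    size_t   store_addr;       /* latched by the store-address micro-op */
};

/* "Decode" ADD DWORD PTR [mem], imm into four micro-ops. */
size_t crack_add_mem_imm(uint32_t imm, struct uop out[4])
{
    assert(out != NULL);
    out[0] = (struct uop){ UOP_LOAD, 0 };
    out[1] = (struct uop){ UOP_ADD_IMM, imm };
    out[2] = (struct uop){ UOP_STORE_ADDR, 0 };
    out[3] = (struct uop){ UOP_STORE_DATA, 0 };
    return 4;
}

/* Execute a micro-op sequence against the memory word at index `addr`. */
void run_uops(struct core *c, const struct uop *uops, size_t n, size_t addr)
{
    assert(c != NULL && uops != NULL);
    assert(addr < sizeof c->mem / sizeof c->mem[0]);

    for (size_t i = 0; i < n; i++) {
        switch (uops[i].kind) {
        case UOP_LOAD:       c->temp = c->mem[addr];          break;
        case UOP_ADD_IMM:    c->temp += uops[i].imm;          break;
        case UOP_STORE_ADDR: c->store_addr = addr;            break;
        case UOP_STORE_DATA: c->mem[c->store_addr] = c->temp; break;
        }
    }
}
```

Starting with `mem[7] = 100`, cracking an add of 42 and running the four micro-ops leaves `mem[7] == 142`: one architectural instruction, several simple internal steps.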

Real-World Impact: Why It Matters

┌─────────────────────────────────────────────────────────────────┐
│           PERFORMANCE & POWER ANALYSIS                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Pipeline Depth Comparison:                                     │
│  ═══════════════════════════                                    │
│                                                                  │
│  x86-64 (Intel Core):                                           │
│  ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐  │
│  │Fch │Dec │Dec │Ren │Sch │Ex  │Ex  │Ex  │WB  │Ret │    │    │  │
│  └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘  │
│  14-19 stages (complex decode!)                                 │
│                                                                  │
│  ARM Cortex-A76:                                                │
│  ┌────┬────┬────┬────┬────┬────┬────┬────┐                      │
│  │Fch │Dec │Ren │Iss │Ex  │Ex  │WB  │Ret │                      │
│  └────┴────┴────┴────┴────┴────┴────┴────┘                      │
│  About 13 stages (simpler decode!)                              │
│                                                                  │
│  Impact: Shorter pipeline = faster branch recovery              │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Power Consumption Breakdown:                                   │
│  ═══════════════════════════════                                │
│                                                                  │
│  Intel x86 CPU (typical desktop; illustrative figures):         │
│  ┌────────────────────────────────────────┐                     │
│  │ Component          % Power             │                     │
│  ├────────────────────────────────────────┤                     │
│  │ Decode/µop cache   30%  ◄── CISC cost  │                     │
│  │ Execution units    40%                 │                     │
│  │ L1/L2 Cache        15%                 │                     │
│  │ Memory controller   8%                 │                     │
│  │ Other               7%                 │                     │
│  └────────────────────────────────────────┘                     │
│  Total: 65W TDP (desktop i7)                                    │
│                                                                  │
│  ARM CPU (typical mobile; illustrative figures):                │
│  ┌────────────────────────────────────────┐                     │
│  │ Component          % Power             │                     │
│  ├────────────────────────────────────────┤                     │
│  │ Decode              5% ◄── RISC benefit│                     │
│  │ Execution units    50%                 │                     │
│  │ L1/L2 Cache        20%                 │                     │
│  │ Memory controller  15%                 │                     │
│  │ Other              10%                 │                     │
│  └────────────────────────────────────────┘                     │
│  Total: 3-5W TDP (mobile A-series)                              │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Real Chips Comparison:                                         │
│  ═══════════════════════                                        │
│                                                                  │
│  Apple M1 (ARM):                                                │
│  • 8 cores @ 3.2 GHz                                            │
│  • 16 billion transistors                                       │
│  • 39W TDP (full chip including GPU)                            │
│  • Performance: ~1500 Cinebench R23 per core                    │
│  • Efficiency: 38.5 points/watt                                 │
│                                                                  │
│  Intel Core i7-11700K (x86):                                    │
│  • 8 cores @ 3.6 GHz (5.0 boost)                                │
│  • 10 billion transistors                                       │
│  • 125W TDP (CPU only!)                                         │
│  • Performance: ~1600 Cinebench R23 per core                    │
│  • Efficiency: 12.8 points/watt                                 │
│                                                                  │
│  → By this rough metric, M1 is ~3x more power efficient!        │
│  → x86 still leads in peak performance (for now)                │
│                                                                  │
│  Why ARM Dominates Mobile:                                      │
│  • Lower power = longer battery life                            │
│  • Simpler decode = less heat                                   │
│  • Fixed instruction length = better power management           │
│                                                                  │
│  Why x86 Still Leads Desktop/Server:                            │
│  • Decades of optimized software                                │
│  • Backward compatibility is critical                           │
│  • Complex instructions help specific workloads                 │
│  • Wall power available (not battery constrained)               │
│                                                                  │
│  The Future: AWS Graviton (ARM servers!)                        │
│  • Up to 40% better price/performance vs x86 (AWS)              │
│  • Up to 60% less energy for same performance (AWS)             │
│  • ARM is eating into datacenter too!                           │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
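
The efficiency figures in the comparison above come from dividing the per-core Cinebench score by the rated TDP. A minimal sketch of that arithmetic follows; the function name `points_per_watt` is hypothetical, and the inputs are the numbers quoted in the box. Treat the result as a crude proxy: TDP is a rated envelope rather than measured power, and a per-core score is being divided by whole-chip watts.

```python
def points_per_watt(score: float, tdp_watts: float) -> float:
    """Benchmark points per watt of rated TDP (a crude efficiency proxy)."""
    return score / tdp_watts


# Figures quoted in the comparison box (per-core Cinebench R23, rated TDP).
m1 = points_per_watt(1500, 39)    # Apple M1
i7 = points_per_watt(1600, 125)   # Intel Core i7-11700K
ratio = m1 / i7

print(f"M1: {m1:.1f} pts/W, i7-11700K: {i7:.1f} pts/W, ratio: {ratio:.1f}x")
# prints: M1: 38.5 pts/W, i7-11700K: 12.8 pts/W, ratio: 3.0x
```

Measured package power under the same workload would give a fairer comparison than TDP, if such data is available.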

RIP-Relative Addressing: Position-Independent Code

┌─────────────────────────────────────────────────────────────────┐
│           RIP-RELATIVE ADDRESSING (x86-64)                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  RIP = Instruction Pointer (Program Counter in x86-64)          │
│                                                                  │
│  Traditional x86 (32-bit):                                      │
│  ═══════════════════════════                                    │
│  MOV  EAX, [0x12345678]   ; Absolute address                    │
│                                                                  │
│  Problem: The address is baked into the instruction!            │
│  • If loaded at a different address, it breaks!                 │
│  • ASLR needs load-time fixups or a slow PIC thunk              │
│  • Shared libraries need fixups                                 │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Modern x86-64 with RIP-relative:                               │
│  ═══════════════════════════════════════                        │
│  MOV  EAX, [RIP+0x100]    ; Relative to current instruction    │
│                                                                  │
│  Benefits:                                                      │
│  ✓ Code is position-independent!                                │
│  ✓ Can be loaded anywhere in memory                             │
│  ✓ Enables ASLR for security                                    │
│  ✓ Shared libraries work without modification                   │
│  ✓ Smaller machine code (offset vs full 64-bit address)         │
│                                                                  │
│  Example:                                                       │
│  ┌────────────────────────────────────────────────────┐         │
│  │ Address    Instruction                             │         │
│  ├────────────────────────────────────────────────────┤         │
│  │ 0x1000:    MOV  EAX, [RIP+0x100]                   │         │
│  │            ; RIP points to NEXT instruction        │         │
│  │            ; RIP = 0x1006 (this MOV is 6 bytes)    │         │
│  │            ; Effective address = 0x1006 + 0x100    │         │
│  │            ;                   = 0x1106            │         │
│  │ 0x1106:    <data>                                  │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Real-World: ASLR (Address Space Layout Randomization)          │
│  ═══════════════════════════════════════════════════            │
│                                                                  │
│  Without ASLR (vulnerable):                                     │
│  ┌────────────────────────────────────────┐                     │
│  │ Run 1: Code loaded at 0x400000         │                     │
│  │ Run 2: Code loaded at 0x400000  (same!)│                     │
│  │ Run 3: Code loaded at 0x400000  (same!)│                     │
│  │                                        │                     │
│  │ Attacker knows exact addresses!        │                     │
│  │ → Easy to exploit buffer overflows     │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
│  With ASLR (hardened):                                          │
│  ┌────────────────────────────────────────┐                     │
│  │ Run 1: Code loaded at 0x7f2a31000000   │                     │
│  │ Run 2: Code loaded at 0x7f8b94000000   │                     │
│  │ Run 3: Code loaded at 0x7fc61a000000   │                     │
│  │                                        │                     │
│  │ Random every time!                     │                     │
│  │ → Attacker can't predict addresses     │                     │
│  │ → Much harder to exploit               │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
│  How RIP-relative enables this:                                 │
│                                                                  │
│  Code:                                                          │
│  ┌────────────────────────────────────────┐                     │
│  │ get_value:                             │                     │
│  │     MOV  EAX, [RIP+data]  ; Relative!  │                     │
│  │     RET                                │                     │
│  │                                        │                     │
│  │ data:                                  │                     │
│  │     .long 42                           │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
│  Loaded at 0x400000:                                            │
│  • get_value at 0x400000                                        │
│  • data at 0x400010                                             │
│  • RIP+offset still works!                                      │
│                                                                  │
│  Loaded at 0x7f2a31000000 (ASLR):                               │
│  • get_value at 0x7f2a31000000                                  │
│  • data at 0x7f2a31000010                                       │
│  • RIP+offset STILL works!                                      │
│                                                                  │
│  → Same code, any address! This is the power of PIC.            │
│                                                                  │
│  ARM note: ARM always had PC-relative addressing!               │
│  x86 gained it for data accesses with x86-64 (2003).            │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
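
The `get_value` / `data` example above can be checked numerically. The sketch below is an illustration, not assembler output: the helper names are hypothetical, and it assumes the 6-byte encoding of `MOV EAX, [RIP+disp32]` (opcode `8B`, ModRM `05`, 4-byte displacement). It shows that the displacement stored in the instruction is identical at both load addresses, which is why the same bytes work wherever the loader places them.

```python
MASK64 = (1 << 64) - 1
MOV_EAX_RIP_LEN = 6  # assumed encoding: 8B 05 <disp32>


def displacement_for(instr_addr: int, instr_len: int, data_addr: int) -> int:
    """Displacement the assembler would store: target minus NEXT instruction."""
    disp = data_addr - (instr_addr + instr_len)
    if not -(1 << 31) <= disp < (1 << 31):
        raise ValueError("target out of range for a signed 32-bit displacement")
    return disp


def rip_relative_target(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Effective address: RIP already points past the current instruction."""
    next_rip = instr_addr + instr_len
    return (next_rip + disp32) & MASK64


# Layout from the example: get_value at base + 0x00, data at base + 0x10.
for base in (0x400000, 0x7F2A31000000):
    disp = displacement_for(base, MOV_EAX_RIP_LEN, base + 0x10)
    target = rip_relative_target(base, MOV_EAX_RIP_LEN, disp)
    print(f"base={base:#x} disp={disp:#x} target={target:#x}")
# prints:
# base=0x400000 disp=0xa target=0x400010
# base=0x7f2a31000000 disp=0xa target=0x7f2a31000010
```

The range check reflects a real constraint: code and the data it references this way must sit within ±2 GiB of each other.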

Modern x86 Optimizations: Saving CPUs from Wasted Time

Intel and AMD invest billions to make CISC competitive with RISC:
┌─────────────────────────────────────────────────────────────────┐
│           x86 MICROARCHITECTURE OPTIMIZATIONS                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Problem: Decode is slow and power-hungry                       │
│  Solution: Cache the decoded results!                           │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  1. Micro-op Cache (µop Cache / Decoded Stream Buffer)          │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Traditional path (slow):                                       │
│  ┌────────┐   ┌────────┐   ┌────────┐   ┌─────────┐            │
│  │ Fetch  │──►│ Decode │──►│ µops   │──►│ Execute │            │
│  │  x86   │   │ (slow!)│   │        │   │         │            │
│  └────────┘   └────────┘   └────────┘   └─────────┘            │
│                                                                  │
│  With µop cache (fast!):                                        │
│  ┌────────┐                                                     │
│  │ Fetch  │──►X (skip decode!)                                 │
│  │  x86   │                                                     │
│  └────────┘   ┌────────────┐   ┌────────┐   ┌─────────┐        │
│               │ µop Cache  │──►│ µops   │──►│ Execute │        │
│               │ (hit!)     │   │        │   │         │        │
│               └────────────┘   └────────┘   └─────────┘        │
│                                                                  │
│  µop Cache Specs (Intel):                                       │
│  • Size: 1536-2304 µops                                         │
│  • Organization: 32-48 sets × 8 ways × 6 µops per line          │
│  • Can deliver 6 µops per cycle!                                │
│  • Hit rate: 80-95% in typical workloads                        │
│                                                                  │
│  Benefits:                                                      │
│  ✓ Skip decode entirely on cache hit                            │
│  ✓ Save ~30% power on frontend                                  │
│  ✓ Higher throughput (6 µops vs 4-5 from decoders)              │
│  ✓ Small loops fit entirely in the µop cache                    │
│                                                                  │
│  Example: Tight loop                                            │
│  ┌────────────────────────────────────┐                         │
│  │ loop:                              │                         │
│  │   ADD  EAX, [RBX+RCX*4]            │                         │
│  │   INC  RCX                         │                         │
│  │   CMP  RCX, 1000                   │                         │
│  │   JL   loop                        │                         │
│  └────────────────────────────────────┘                         │
│                                                                  │
│  First iteration: Decode (slow)                                 │
│  All subsequent: µop cache (fast!)                              │
│  Result: Loop runs at near-RISC speeds!                         │
│                                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  2. Instruction Queue & Parallel Decode                         │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Problem: Variable-length x86 makes fetch hard                  │
│  Solution: Aggressive prefetch + multiple decoders              │
│                                                                  │
│  Fetch Stage:                                                   │
│  ┌──────────────────────────────────────────────┐               │
│  │ Instruction Fetch Queue: 16-20 bytes         │               │
│  │ ┌──────────────────────────────────────────┐ │               │
│  │ │ [var len] [var len] [var len] [var len] │ │               │
│  │ └──────────────────────────────────────────┘ │               │
│  └──────────────────────────────────────────────┘               │
│          │      │       │       │                                │
│          ▼      ▼       ▼       ▼                                │
│  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐                   │
│  │Decoder │ │Decoder │ │Decoder │ │Decoder │                   │
│  │  #1    │ │  #2    │ │  #3    │ │  #4    │                   │
│  │(complex│ │(simple │ │(simple │ │(simple │                   │
│  │4 µops) │ │1 µop)  │ │1 µop)  │ │1 µop)  │                   │
│  └────────┘ └────────┘ └────────┘ └────────┘                   │
│          │      │       │       │                                │
│          └──────┴───────┴───────┘                                │
│                  │                                               │
│           Up to 4-5 µops/cycle                                  │
│                                                                  │
│  Predecode bits help:                                           │
│  • Mark instruction boundaries                                  │
│  • Identify instruction length                                  │
│  • Cache this info in L1-I cache                                │
│  • Makes parallel decode possible!                              │
│                                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│  3. Branch Prediction: Massive Investment                       │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Problem: x86 pipelines are 14-19 stages deep                   │
│            Wrong branch = flush pipeline (huge cost!)           │
│                                                                  │
│  Solution: Predict with 95-99% accuracy                         │
│                                                                  │
│  Branch Prediction Budget (rough estimate, Skylake-class):      │
│  ┌──────────────────────────────────────────────┐               │
│  │ Component              Transistors            │               │
│  ├──────────────────────────────────────────────┤               │
│  │ Branch Target Buffer   ~5 million transistors│               │
│  │ Pattern History Table  ~3 million transistors│               │
│  │ Return Stack Buffer    ~1 million transistors│               │
│  │ TOTAL: ~9 million just for branch prediction!│               │
│  └──────────────────────────────────────────────┘               │
│                                                                  │
│  Modern techniques:                                             │
│  ┌──────────────────────────────────────────────┐               │
│  │ • Two-level adaptive predictors               │               │
│  │ • Perceptron-based prediction (neural-like!)  │               │
│  │ • TAGE (TAgged GEometric) predictors          │               │
│  │ • Loop predictors (learn loop trip counts)   │               │
│  │ • Return address predictor                    │               │
│  └──────────────────────────────────────────────┘               │
│                                                                  │
│  Accuracy:                                                      │
│  • Simple if: ~95% correct                                      │
│  • Complex patterns: ~90% correct                               │
│  • Returns: ~99% correct                                        │
│  • Overall: ~97% in real workloads                              │
│                                                                  │
│  Cost of misprediction:                                         │
│  • 15-20 cycles wasted (entire pipeline flushed!)               │
│  • This is why branch predictors are SO important               │
│  • Shorter pipelines (8-12 stages on many ARM cores) pay less   │
│                                                                  │
│  Example impact:                                                │
│  ┌────────────────────────────────────┐                         │
│  │ if (condition) {                   │                         │
│  │   // 5 cycles of work              │                         │
│  │ }                                  │                         │
│  │                                    │                         │
│  │ Correct prediction: 5 cycles       │                         │
│  │ Wrong prediction: 5+15 = 20 cycles │                         │
│  │ → 4x slower on misprediction       │                         │
│  └────────────────────────────────────┘                         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
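
The example impact above shows only the two extremes (always right, always wrong). A small sketch of the average case follows, assuming the figures from the box: 5 cycles of useful work and a 15-cycle flush, the low end of the quoted 15-20 range. The function name `expected_cycles` is hypothetical, and the model ignores everything except the flush penalty.

```python
def expected_cycles(work_cycles: float, penalty_cycles: float, accuracy: float) -> float:
    """Average cost of a branch plus its body, weighted by prediction accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    miss_rate = 1.0 - accuracy
    return work_cycles + miss_rate * penalty_cycles


# Accuracy levels quoted in the box: complex patterns, overall, returns.
for accuracy in (0.90, 0.97, 0.99):
    cycles = expected_cycles(work_cycles=5, penalty_cycles=15, accuracy=accuracy)
    print(f"accuracy {accuracy:.0%}: {cycles:.2f} cycles on average")
# prints:
# accuracy 90%: 6.50 cycles on average
# accuracy 97%: 5.45 cycles on average
# accuracy 99%: 5.15 cycles on average
```

Under these assumptions, going from 90% to 99% accuracy removes most of the average overhead, which is one way to read why so much hardware is spent on prediction.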

The ARM Migration: Why the Industry is Shifting

┌─────────────────────────────────────────────────────────────────┐
│           THE GREAT ARM MIGRATION                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Market Share Shift (2020-2025, rough estimates):               │
│  ════════════════════════════════════════════════               │
│                                                                  │
│  Mobile Devices:                                                │
│  • 2020: ARM 90%, x86 negligible                                │
│  • 2025: ARM 95%, x86 effectively dead                          │
│  • Winner: ARM (total dominance)                                │
│                                                                  │
│  Laptops/Desktops:                                              │
│  • 2020: x86 90%, ARM 5%                                        │
│  • 2025: x86 70%, ARM 25% (Apple M-series!)                     │
│  • Trend: ARM gaining rapidly                                   │
│                                                                  │
│  Servers/Cloud:                                                 │
│  • 2020: x86 98%, ARM 2%                                        │
│  • 2025: x86 85%, ARM 15%                                       │
│  • Trend: ARM growing fast (AWS Graviton, Ampere Altra)        │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Real-World Adoption: Who's Using ARM Servers?                  │
│  ═══════════════════════════════════════════════════            │
│                                                                  │
│  AWS Graviton (ARM-based instances):                            │
│  ┌────────────────────────────────────────────┐                 │
│  │ • Graviton3: 64 cores, 2.6 GHz             │                 │
│  │ • Up to 40% better price/performance vs x86│                 │
│  │ • Up to 60% lower energy consumption       │                 │
│  │ • Used by: Netflix, Snap, Formula 1        │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  Companies that migrated:                                       │
│  ┌────────────────────────────────────────────────────┐         │
│  │ Netflix:                                           │         │
│  │ • Moved encoding workloads to Graviton            │         │
│  │ • 30-40% cost savings                              │         │
│  │ • Same or better performance                       │         │
│  │ • Lower latency (simpler pipeline)                 │         │
│  │                                                    │         │
│  │ Snap (Snapchat):                                   │         │
│  │ • Migrated compute-intensive workloads             │         │
│  │ • 30% better price/performance                     │         │
│  │ • Reduced data center power by 40%                │         │
│  │                                                    │         │
│  │ SmugMug (photo hosting):                           │         │
│  │ • Image processing on Graviton                     │         │
│  │ • 35% cost reduction                               │         │
│  │ • 20% performance improvement                      │         │
│  │                                                    │         │
│  │ Honeycomb.io (observability):                      │         │
│  │ • Database workloads to Graviton                   │         │
│  │ • 40% cost savings                                 │         │
│  │ • Better query performance                         │         │
│  └────────────────────────────────────────────────────┘         │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  The Shift Explained:                                           │
│  ═══════════════════════                                        │
│                                                                  │
│  Mobile → Already ARM (since forever)                           │
│  ┌────────────────────────────────────────────┐                 │
│  │ • Qualcomm Snapdragon                      │                 │
│  │ • Apple A-series                           │                 │
│  │ • Samsung Exynos                           │                 │
│  │ • MediaTek Dimensity                       │                 │
│  │ Why: Power efficiency critical for battery │                 │
│  │ x86 never had a chance                     │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  Laptops → Apple M-series proved ARM viable                     │
│  ┌────────────────────────────────────────────┐                 │
│  │ Apple M1/M2/M3:                            │                 │
│  │ • Faster than Intel/AMD in many workloads  │                 │
│  │ • 2-3x battery life                        │                 │
│  │ • Fanless operation (MacBook Air)          │                 │
│  │ • Runs x86 apps via Rosetta 2 translation! │                 │
│  │                                            │                 │
│  │ Others following:                          │                 │
│  │ • Qualcomm Snapdragon X Elite (Windows)    │                 │
│  │ • MediaTek for Chromebooks                 │                 │
│  │ • Microsoft Surface (ARM versions)         │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  Servers → Economics driving migration                          │
│  ┌────────────────────────────────────────────┐                 │
│  │ Cost breakdown (illustrative figures):     │                 │
│  │                                            │                 │
│  │ x86 server (Intel Xeon):                   │                 │
│  │ • Hardware: $5,000                         │                 │
│  │ • Power (3 years): $2,000                  │                 │
│  │ • Cooling (3 years): $1,500                │                 │
│  │ • Total: $8,500                            │                 │
│  │                                            │                 │
│  │ ARM server (e.g. Ampere Altra):            │                 │
│  │ • Hardware: $3,500                         │                 │
│  │ • Power (3 years): $800                    │                 │
│  │ • Cooling (3 years): $600                  │                 │
│  │ • Total: $4,900                            │                 │
│  │                                            │                 │
│  │ Savings: $3,600 per server!                │                 │
│  │ × 10,000 servers = $36 million saved!      │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Where x86 Still Dominates (and Fighting Back!):                │
│  ═══════════════════════════════════════════════                │
│                                                                  │
│  ✓ Desktop/Gaming PCs (90%+ market share)                       │
│    └─► Game engines optimized for x86                           │
│    └─► DirectX/Vulkan drivers mature on x86                     │
│    └─► Ecosystem lock-in (decades of software)                  │
│                                                                  │
│  ✓ Workstations & Content Creation                              │
│    └─► Adobe, Autodesk, etc. heavily optimized for x86          │
│    └─► Professional GPUs (NVIDIA/AMD) work best with x86        │
│                                                                  │
│  ✓ Legacy enterprise applications                               │
│    └─► Decades of x86-only software                             │
│    └─► Recompiling costs millions                               │
│                                                                  │
│  ✓ High-frequency trading                                       │
│    └─► Need absolute peak single-thread performance             │
│    └─► x86 still has slight edge here                           │
│                                                                  │
│  ✓ Majority of servers (still 85%)                              │
│    └─► But ARM growing fast in cloud                            │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Intel & AMD's Response: Fighting ARM with Efficiency           │
│  ═══════════════════════════════════════════════════════        │
│                                                                  │
│  Intel Core Ultra (Meteor Lake/Arrow Lake - 2023-2024):         │
│  ┌────────────────────────────────────────────┐                 │
│  │ Architecture: x86-64 (NOT ARM!)            │                 │
│  │                                            │                 │
│  │ Key Innovation: Hybrid "Tile" Design       │                 │
│  │ ┌────────────────────────────────────────┐ │                 │
│  │ │ Compute Tile (CPU cores):              │ │                 │
│  │ │ • P-cores: up to 8 Performance cores   │ │                 │
│  │ │   └─► High freq, high power (desktop)  │ │                 │
│  │ │ • E-cores: up to 16 Efficiency cores   │ │                 │
│  │ │   └─► Lower freq, 4x lower power       │ │                 │
│  │ │ • Scheduler decides which to use       │ │                 │
│  │ ├────────────────────────────────────────┤ │                 │
│  │ │ GPU Tile (integrated graphics):        │ │                 │
│  │ │ • Intel Arc (Xe architecture)          │ │                 │
│  │ │ • 128 execution units                  │ │                 │
│  │ ├────────────────────────────────────────┤ │                 │
│  │ │ NPU (Neural Processing Unit):          │ │                 │
│  │ │ • 10-34 TOPS (AI acceleration)         │ │                 │
│  │ │ • Power: <1W for AI tasks              │ │                 │
│  │ ├────────────────────────────────────────┤ │                 │
│  │ │ SoC Tile:                              │ │                 │
│  │ │ • Memory controller, I/O, etc.         │ │                 │
│  │ └────────────────────────────────────────┘ │                 │
│  │                                            │                 │
│  │ Power Efficiency Improvements:             │                 │
│  │ • E-cores based on Atom (RISC-like!)       │                 │
│  │ • Tile design: Power only what's needed    │                 │
│  │ • Intel 4 process (7nm equivalent)         │                 │
│  │ • Thread Director: Smart scheduling        │                 │
│  │   └─► Background tasks → E-cores           │                 │
│  │   └─► Performance tasks → P-cores          │                 │
│  │                                            │                 │
│  │ Results:                                   │                 │
│  │ • Idle power: 2-3W (was 10W on 12th gen)   │                 │
│  │ • Battery life: 10-15 hours (was 5-8)      │                 │
│  │ • Performance: Competitive with M2/M3      │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  AMD Ryzen (Zen 4/5 with c-cores):                              │
│  ┌────────────────────────────────────────────┐                 │
│  │ Similar hybrid approach:                   │                 │
│  │ • High-perf cores (up to 5.7 GHz)          │                 │
│  │ • Efficiency cores (Zen 4c - compact)      │                 │
│  │ • ~35% smaller core area, lower power      │                 │
│  │ • TSMC 4nm process                         │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                  │
│  Clarification: Intel Ultra 7/9 are x86-64, NOT ARM!            │
│  ═════════════════════════════════════════════════              │
│                                                                  │
│  Common Misconception:                                          │
│  "Intel Ultra uses ARM cores like Apple M-series"               │
│  → FALSE!                                                       │
│                                                                  │
│  Reality:                                                       │
│  • Intel Ultra 7/9: 100% x86-64 architecture                    │
│  • Uses Intel's own P-core (Performance-core) and E-core        │
│    (Efficiency-core) designs                                    │
│  • E-cores are based on Intel Atom (still x86-64!)              │
│  • Inspired by ARM's big.LITTLE, but NOT ARM cores              │
│                                                                  │
│  Comparison:                                                    │
│  ┌────────────────────────────────────────────┐                 │
│  │           Intel Ultra   Apple M3           │                 │
│  │ ISA:      x86-64        ARM64              │                 │
│  │ P-cores:  Redwood Cove  Apple-designed     │                 │
│  │ E-cores:  Crestmont     Apple-designed     │                 │
│  │ Process:  Intel 4 (7nm) TSMC 3nm           │                 │
│  │ TDP:      15-115W       ~20-40W            │                 │
│  │ Battery:  10-15 hours   15-22 hours        │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  Why the confusion?                                             │
│  • Both use hybrid architecture (P + E cores)                   │
│  • Both focus on efficiency                                     │
│  • Intel marketing emphasizes "mobile-first" design             │
│  • But Intel Ultra is firmly x86-64!                            │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  Realistic Future (2025-2030):                                  │
│  ═══════════════════════════                                    │
│                                                                  │
│  CORRECTED PREDICTION:                                          │
│                                                                  │
│  Mobile: ARM dominates (95%+) ✓                                 │
│  • Android (ARM), iOS (ARM)                                     │
│  • x86 effectively dead in phones/tablets                       │
│                                                                  │
│  Laptops: Mixed, trending toward ARM                            │
│  • 2025: x86 70%, ARM 25%, other 5% (rough estimate)            │
│  • 2030: x86 50%, ARM 45%, other 5% (predicted)                 │
│  • ARM gaining due to battery life                              │
│  • x86 holding due to software compatibility                    │
│                                                                  │
│  Desktops: x86 dominates (90%+) and will continue               │
│  • Gaming, workstations need x86 software                       │
│  • Power consumption less critical (plugged in)                 │
│  • Ecosystem lock-in very strong                                │
│                                                                  │
│  Servers/Cloud: x86 majority, ARM growing                       │
│  • 2025: x86 85%, ARM 15% (rough estimate)                      │
│  • 2030: x86 65%, ARM 35% (predicted)                           │
│  • Economics favor ARM for many workloads                       │
│  • Legacy apps keep x86 dominant                                │
│                                                                  │
│  REALISTIC TAKE:                                                │
│  • ARM is NOT "taking over everything"                          │
│  • ARM winning in: mobile, laptops (slowly), cloud (growing)    │
│  • x86 retaining: desktops, gaming, workstations, legacy        │
│  • Both architectures will coexist for decades                  │
│  • Competition is GOOD for consumers!                           │
│                                                                  │
│  Why x86 is MORE competitive than earlier suggested:            │
│  ✓ Hybrid architectures (P+E cores) closed efficiency gap       │
│  ✓ Advanced process nodes (Intel 4, TSMC 3nm for AMD)           │
│  ✓ Better power management (Thread Director, etc.)              │
│  ✓ NPUs for AI (Intel, AMD now have these too)                  │
│  ✓ Massive software ecosystem (decades of investment)           │
│  ✓ Backward compatibility (can run 1980s DOS programs!)         │
│                                                                  │
│  Why ARM is competitive:                                        │
│  ✓ Simpler, fixed-length ISA = cheaper decode                   │
│  ✓ Apple proved ARM can be fast AND efficient                   │
│  ✓ Qualcomm Snapdragon X Elite bringing ARM to Windows          │
│  ✓ AWS Graviton proving ARM viable for servers                  │
│  ✓ Software becoming more portable (containers, WASM, etc.)     │
│                                                                  │
│  Quote from industry analyst:                                   │
│  "x86 will be like mainframes: never fully dead, but            │
│   relegated to niche legacy applications. ARM is the            │
│   future for everything that matters."                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
CORRECTION: The quote above is overly pessimistic about x86. A more balanced view:
“x86 and ARM will coexist for decades. ARM dominates mobile and is growing in cloud/laptops due to efficiency. x86 remains dominant in desktops, gaming, and enterprise due to software ecosystem. Competition between them drives innovation—everyone wins!”
— Realistic Industry Analysis, 2025
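
One quick way to settle the "is this chip x86 or ARM?" question on a running system is to ask the OS which ISA it reports. The sketch below is only illustrative: `classify_isa` and the alias table are hypothetical helpers written for this example, and the list of strings returned by `platform.machine()` is an assumption that covers common cases, not every platform.

```python
import platform

# Common strings returned by platform.machine(), grouped by ISA family.
# This grouping is an illustrative assumption, not an exhaustive list.
_ISA_FAMILIES = {
    "x86-64": {"x86_64", "amd64"},
    "ARM64": {"arm64", "aarch64"},
}


def classify_isa(machine: str) -> str:
    """Map a platform.machine() style string to an ISA family name."""
    name = machine.lower()
    for family, aliases in _ISA_FAMILIES.items():
        if name in aliases:
            return family
    return "other"


if __name__ == "__main__":
    reported = platform.machine()
    print(reported, "->", classify_isa(reported))
```

On an Intel Core Ultra laptop this would be expected to land in the x86-64 family, and on Apple Silicon in the ARM64 family, regardless of how many P-cores or E-cores the chip has.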

ARM Architecture Details

┌─────────────────────────────────────────────────────────────────┐
│                    ARM ARCHITECTURE                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Reset Vector: 0x00000000 (or 0xFFFF0000 for high vectors)      │
│  (classic 32-bit ARM; AArch64 reset address is impl.-defined)   │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│                                                                  │
│  Philosophy: RISC (Reduced Instruction Set)                     │
│  • Simple, fixed-length instructions (32-bit, or 16-bit Thumb)  │
│  • Load/store architecture (only loads/stores touch memory)     │
│  • Many registers reduce memory traffic                         │
│  • Optimized for power efficiency                               │
│                                                                  │
│  Initial CPU State (AArch32):                                   │
│  • Supervisor mode (privileged)                                 │
│  • MMU disabled                                                 │
│  • Caches disabled                                              │
│  • PC = 0x00000000 (reset vector)                              │
│                                                                  │
│  Exception Vector Table (at reset vector):                      │
│  ┌─────────────┬────────────────────────┐                       │
│  │ 0x00000000  │ Reset                  │                       │
│  │ 0x00000004  │ Undefined Instruction  │                       │
│  │ 0x00000008  │ Supervisor Call (SVC)  │                       │
│  │ 0x0000000C  │ Prefetch Abort         │                       │
│  │ 0x00000010  │ Data Abort             │                       │
│  │ 0x00000014  │ Reserved               │                       │
│  │ 0x00000018  │ IRQ                    │                       │
│  │ 0x0000001C  │ FIQ                    │                       │
│  └─────────────┴────────────────────────┘                       │
│                                                                  │
│  Key Features:                                                  │
│  ━━━━━━━━━━━━━━                                                  │
│                                                                  │
│  1. Register File:                                              │
│     • AArch64: 31 general-purpose registers (X0-X30) + SP       │
│     • AArch32: 16 visible (R0-R15; R13=SP, R14=LR, R15=PC)      │
│     • AArch32 banks some registers per mode (e.g. SP, LR)       │
│                                                                  │
│  2. Instruction Set:                                            │
│     • ARM: 32-bit instructions                                  │
│     • Thumb: 16-bit compressed (code density)                   │
│     • Thumb-2: Mix of 16/32-bit                                 │
│                                                                  │
│  3. Conditional Execution (A32):                                │
│     • Almost every A32 instruction can be conditional           │
│     • Reduces branches                                          │
│     • A64 keeps only a few (B.cond, CSEL, CCMP, ...)            │
│     Example: ADDEQ R0, R1, R2  (add only if Z flag is set)      │
│                                                                  │
│  4. Exception Levels (ARMv8-A privilege):                       │
│     • EL0: User/Application                                     │
│     • EL1: OS Kernel                                            │
│     • EL2: Hypervisor                                           │
│     • EL3: Secure Monitor                                       │
│                                                                  │
│  5. Power Management:                                           │
│     • big.LITTLE: Mix fast/slow cores                           │
│     • WFI (Wait For Interrupt) for idle                         │
│     • Dynamic voltage/frequency scaling                         │
│                                                                  │
│  Differences from x86:                                          │
│  • No segmentation (flat memory)                                │
│  • Fixed instruction length (easier decode)                     │
│  • More registers (less stack usage)                            │
│  • Little or no microcode (simpler decode)                      │
│  • Better power efficiency                                      │
│                                                                  │
│  Common Use Cases:                                              │
│  • Smartphones, tablets (Apple Silicon, Qualcomm)               │
│  • IoT devices (Cortex-M series)                                │
│  • Embedded systems                                             │
│  • Servers (AWS Graviton, Ampere)                               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
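
The vector table and the condition field described above are both simple bit-level layouts, so they can be sketched in a few lines. This is a minimal illustration for classic 32-bit ARM (A32) only. The helper names `vector_address` and `condition_of` are hypothetical, and the word `0x00810002` is used on the assumption that it is the standard A32 encoding of `ADDEQ R0, R1, R2`.

```python
# Offsets of the classic AArch32 exception vectors, as listed in the table above.
VECTOR_OFFSETS = {
    "reset": 0x00,
    "undefined": 0x04,
    "svc": 0x08,
    "prefetch_abort": 0x0C,
    "data_abort": 0x10,
    "reserved": 0x14,
    "irq": 0x18,
    "fiq": 0x1C,
}

LOW_VECTOR_BASE = 0x00000000
HIGH_VECTOR_BASE = 0xFFFF0000

# A32 condition mnemonics, indexed by the value of bits 31..28.
CONDITIONS = [
    "EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
    "HI", "LS", "GE", "LT", "GT", "LE", "AL",
]


def vector_address(name: str, high_vectors: bool = False) -> int:
    """Address the core fetches from when the named exception is taken."""
    base = HIGH_VECTOR_BASE if high_vectors else LOW_VECTOR_BASE
    return base + VECTOR_OFFSETS[name]


def condition_of(instruction: int) -> str:
    """Return the condition mnemonic encoded in an A32 instruction word."""
    cond = (instruction >> 28) & 0xF
    if cond == 0xF:
        # 0b1111 is not a condition; it marks the unconditional space.
        return "unconditional"
    return CONDITIONS[cond]


if __name__ == "__main__":
    print(hex(vector_address("irq")))
    print(hex(vector_address("fiq", high_vectors=True)))
    print(condition_of(0x00810002))  # assumed encoding of ADDEQ R0, R1, R2
```

Because each vector slot is only 4 bytes, it can hold exactly one instruction, which is why real tables fill the slots with branches or PC loads rather than handler code.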

MIPS Architecture Details

┌─────────────────────────────────────────────────────────────────┐
│                    MIPS ARCHITECTURE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Reset Vector: 0xBFC00000 (in uncached kseg1 segment)          │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│                                                                  │
│  Philosophy: "Microprocessor without Interlocked Pipeline Stages"│
│  • Pure RISC design                                             │
│  • Emphasizes compiler optimization                             │
│  • Simple, regular instruction format                           │
│                                                                  │
│  Initial CPU State:                                             │
│  • Kernel mode                                                  │
│  • Fetches from kseg1 (unmapped: bypasses TLB and cache)        │
│  • Interrupts masked; boot exception vectors on (BEV=1)         │
│  • PC = 0xBFC00000 (ROM/flash mapped here)                     │
│                                                                  │
│  Memory Segments (32-bit MIPS):                                 │
│  ┌─────────────┬─────────────┬──────────────────┐               │
│  │ 0x00000000  │ kuseg       │ 2GB, user, TLB   │               │
│  │ 0x80000000  │ kseg0       │ 512MB, kernel,   │               │
│  │             │             │ cached, no TLB   │               │
│  │ 0xA0000000  │ kseg1       │ 512MB, kernel,   │               │
│  │             │             │ uncached, no TLB │ ← Reset here  │
│  │ 0xC0000000  │ kseg2       │ 1GB, kernel, TLB │               │
│  └─────────────┴─────────────┴──────────────────┘               │
│                                                                  │
│  Why 0xBFC00000?                                                │
│  • kseg1 = uncached (caches are not yet initialized)            │
│  • Physical address = Virtual address - 0xA0000000              │
│  • So 0xBFC00000 → physical 0x1FC00000                          │
│  • ROM/flash typically mapped here by hardware                  │
│                                                                  │
│  Key Features:                                                  │
│  ━━━━━━━━━━━━━━                                                  │
│                                                                  │
│  1. Register File:                                              │
│     • 32 general-purpose registers                              │
│     • $0 (zero) always reads as 0                               │
│     • $29 = stack pointer, $31 = return address                 │
│                                                                  │
│  2. Branch Delay Slot:                                          │
│     • Instruction after branch ALWAYS executes!                 │
│     Example:                                                    │
│       BEQ  $1, $2, target  # Branch if equal                    │
│       ADDI $3, $3, 1       # Delay slot: runs either way        │
│       SUB  $4, $4, $3      # Skipped when branch is taken       │
│       target: ...                                               │
│                                                                  │
│  3. Load Delay Slot (older MIPS):                               │
│     • Can't use loaded value in next instruction                │
│     • Need NOP or independent instruction                       │
│                                                                  │
│  4. Coprocessors:                                               │
│     • CP0: System control, MMU, exceptions                      │
│     • CP1: Floating point (FPU)                                 │
│     • CP2: Custom/vendor-specific                               │
│                                                                  │
│  5. Exception Handling:                                         │
│     • General exception vector at 0x80000180                    │
│       (TLB refill has its own vector at 0x80000000)             │
│     • Software reads CP0 Cause to determine exception type      │
│                                                                  │
│  Instruction Format (all 32-bit):                               │
│  ┌──────────────────────────────────────────┐                   │
│  │ R-type: op | rs | rt | rd | shamt | funct│                   │
│  │ I-type: op | rs | rt | immediate         │                   │
│  │ J-type: op | address                     │                   │
│  └──────────────────────────────────────────┘                   │
│                                                                  │
│  Differences from x86:                                          │
│  • No condition codes (compare sets register)                   │
│  • No dedicated stack ops (manual SP management)                │
│  • Exposed pipeline (delay slots)                               │
│  • Simpler, more regular instruction set                        │
│                                                                  │
│  Common Use Cases:                                              │
│  • Routers (Cisco, historically)                                │
│  • Network equipment                                            │
│  • Gaming consoles (PlayStation 1, 2, PSP)                      │
│  • Embedded systems                                             │
│  • Academic teaching (simple to understand)                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
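
The kseg address arithmetic and the three instruction formats above can be checked with a short sketch. The function names `kseg_to_physical` and `decode_fields` are hypothetical. The decoder is deliberately simplified: it treats opcode 0 as R-type, opcodes 2 and 3 (J, JAL) as J-type, and everything else as I-type, which ignores coprocessor and other special encodings.

```python
def kseg_to_physical(vaddr: int) -> int:
    """Translate an unmapped kseg0/kseg1 virtual address to a physical one."""
    if 0x80000000 <= vaddr < 0xA0000000:  # kseg0: cached, no TLB
        return vaddr - 0x80000000
    if 0xA0000000 <= vaddr < 0xC0000000:  # kseg1: uncached, no TLB
        return vaddr - 0xA0000000
    raise ValueError("address is TLB-mapped (kuseg or kseg2)")


def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its format fields."""
    op = (word >> 26) & 0x3F
    if op == 0:  # R-type: op | rs | rt | rd | shamt | funct
        return {
            "type": "R",
            "op": op,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):  # J-type: op | 26-bit address
        return {"type": "J", "op": op, "address": word & 0x03FFFFFF}
    # I-type: op | rs | rt | 16-bit immediate
    return {
        "type": "I",
        "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


if __name__ == "__main__":
    print(hex(kseg_to_physical(0xBFC00000)))
    print(decode_fields(0x3C1ABFC0))  # LUI $k0, 0xBFC0
    print(decode_fields(0x03400008))  # JR $k0
```

Register 26 in the decoded output is `$k0`, one of the two registers reserved by convention for kernel use.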

ROM/Flash Memory Mapping Comparison

┌─────────────────────────────────────────────────────────────────┐
│              FIRMWARE STORAGE COMPARISON                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  LEGACY: BIOS in ROM                                            │
│  ━━━━━━━━━━━━━━━━━━━━━━━                                         │
│                                                                  │
│  Physical Chip:                                                 │
│  • ROM (Read-Only Memory) or Flash                              │
│  • Soldered on motherboard                                      │
│  • Typically 512 KB - 2 MB                                      │
│  • Mapped to end at 4 GB (128 KB chip: 0xFFFE0000-0xFFFFFFFF)   │
│                                                                  │
│  Memory Map:                                                    │
│  ┌──────────────────────────────────────┐                       │
│  │ 0xFFFFFFFF                           │                       │
│  │    ┌────────────────────┐            │                       │
│  │    │  Reset vector      │ ← FAR JMP  │                       │
│  │ 0xFFFFFFF0 (last 16 bytes)           │                       │
│  │    ├────────────────────┤            │                       │
│  │    │  BIOS Code         │            │                       │
│  │    │  (512KB - 2MB)     │            │                       │
│  │    │                    │            │                       │
│  │    │  • POST routines   │            │                       │
│  │    │  • Hardware init   │            │                       │
│  │    │  • Boot device sel │            │                       │
│  │    │  • Setup menus     │            │                       │
│  │    └────────────────────┘            │                       │
│  │                                      │                       │
│  │         (Gap: Not mapped)            │                       │
│  │                                      │                       │
│  │    ┌────────────────────┐            │                       │
│  │    │  RAM (extended)    │            │                       │
│  │    │                    │            │                       │
│  │ 0x00100000 ← 1MB boundary            │                       │
│  │    ├────────────────────┤            │                       │
│  │    │  BIOS Shadow RAM   │ ← ROM copy │                       │
│  │ 0x000F0000 (optional, for speed)     │                       │
│  │    ├────────────────────┤            │                       │
│  │    │  Extended BIOS Area│            │                       │
│  │ 0x000E0000                           │                       │
│  │    ├────────────────────┤            │                       │
│  │    │  VGA BIOS          │            │                       │
│  │ 0x000C0000 (GPU firmware)           │                       │
│  │    ├────────────────────┤            │                       │
│  │    │  VGA RAM/Buffers   │            │                       │
│  │ 0x000A0000                           │                       │
│  │    ├────────────────────┤            │                       │
│  │    │  Low RAM           │            │                       │
│  │    │  (available to OS) │            │                       │
│  │ 0x00000000                           │                       │
│  └──────────────────────────────────────┘                       │
│                                                                  │
│  Limitations:                                                   │
│  • Fixed in ROM (hard to update)                                │
│  • Small size (limited features)                                │
│  • 16-bit real mode code                                        │
│                                                                  │
│  ═══════════════════════════════════════════════════════════    │
│                                                                  │
│  MODERN: UEFI in Flash                                          │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━                                       │
│                                                                  │
│  Physical Chip:                                                 │
│  • SPI Flash (Serial Peripheral Interface)                      │
│  • Easily updatable                                             │
│  • Typically 8 MB - 32 MB                                       │
│  • Memory-mapped just below 4 GB at reset                       │
│                                                                  │
│  UEFI Firmware Layout:                                          │
│  ┌──────────────────────────────────────┐                       │
│  │                                      │                       │
│  │  ┌────────────────────────────────┐  │                       │
│  │  │ Recovery/Fallback Region       │  │                       │
│  │  │ (Protected from updates)       │  │                       │
│  │  └────────────────────────────────┘  │                       │
│  │                                      │                       │
│  │  ┌────────────────────────────────┐  │                       │
│  │  │ Main Firmware Volume           │  │                       │
│  │  │                                │  │                       │
│  │  │  ┌──────────────────────────┐  │  │                       │
│  │  │  │ SEC (Security Phase)     │  │  │                       │
│  │  │  │ - First code executed    │  │  │                       │
│  │  │  │ - Verify firmware sig    │  │  │                       │
│  │  │  └──────────────────────────┘  │  │                       │
│  │  │                                │  │                       │
│  │  │  ┌──────────────────────────┐  │  │                       │
│  │  │  │ PEI (Pre-EFI Init)       │  │  │                       │
│  │  │  │ - Memory init            │  │  │                       │
│  │  │  │ - CPU setup              │  │  │                       │
│  │  │  └──────────────────────────┘  │  │                       │
│  │  │                                │  │                       │
│  │  │  ┌──────────────────────────┐  │  │                       │
│  │  │  │ DXE (Driver Execution)   │  │  │                       │
│  │  │  │ - Device drivers         │  │  │                       │
│  │  │  │ - Protocol handlers      │  │  │                       │
│  │  │  └──────────────────────────┘  │  │                       │
│  │  │                                │  │                       │
│  │  │  ┌──────────────────────────┐  │  │                       │
│  │  │  │ BDS (Boot Device Select) │  │  │                       │
│  │  │  │ - Boot manager           │  │  │                       │
│  │  │  └──────────────────────────┘  │  │                       │
│  │  │                                │  │                       │
│  │  └────────────────────────────────┘  │                       │
│  │                                      │                       │
│  │  ┌────────────────────────────────┐  │                       │
│  │  │ NVRAM (Variables)              │  │                       │
│  │  │ - Boot order, settings         │  │                       │
│  │  └────────────────────────────────┘  │                       │
│  │                                      │                       │
│  │  ┌────────────────────────────────┐  │                       │
│  │  │ Fault Tolerant Write (FTW)     │  │                       │
│  │  │ - Prevent brick on bad update  │  │                       │
│  │  └────────────────────────────────┘  │                       │
│  │                                      │                       │
│  └──────────────────────────────────────┘                       │
│                                                                  │
│  Advantages:                                                    │
│  • Updatable (security patches)                                 │
│  • Larger size (more features)                                  │
│  • 32/64-bit code (leaves real mode early in SEC)               │
│  • Modular driver architecture                                  │
│  • Secure Boot support                                          │
│  • Network boot, GUI                                            │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
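
Because x86 firmware flash is mapped so that its last byte sits at the top of the 32-bit address space, the base address follows directly from the chip size. The sketch below assumes the whole chip is mapped linearly and ends at 4 GB; real chipsets may expose only part of a large flash part this way. `flash_window` and `reset_vector_offset` are hypothetical helper names.

```python
from typing import Tuple

FOUR_GIB = 1 << 32
RESET_VECTOR = 0xFFFFFFF0


def flash_window(size_bytes: int) -> Tuple[int, int]:
    """Return (base, last) addresses of a flash chip mapped to end at 4 GB."""
    if size_bytes <= 0 or size_bytes > FOUR_GIB:
        raise ValueError("size must be between 1 byte and 4 GiB")
    base = FOUR_GIB - size_bytes
    return base, FOUR_GIB - 1


def reset_vector_offset(size_bytes: int) -> int:
    """Offset of the reset vector inside the flash image."""
    base, _ = flash_window(size_bytes)
    return RESET_VECTOR - base


if __name__ == "__main__":
    for size in (128 * 1024, 2 * 1024 * 1024, 16 * 1024 * 1024):
        base, last = flash_window(size)
        print(f"{size // 1024:>6} KB: {base:#010x}-{last:#010x}, "
              f"reset vector at image offset {reset_vector_offset(size):#x}")
```

Whatever the chip size, the reset vector always ends up 16 bytes before the end of the image, which is why firmware build tools place the first jump instruction there.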

Jump Loader and Early Firmware Execution

┌─────────────────────────────────────────────────────────────────┐
│                    JUMP LOADER MECHANICS                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  What is a Jump Loader?                                         │
│  ━━━━━━━━━━━━━━━━━━━━━                                           │
│  First few instructions at reset vector that redirect           │
│  execution to main firmware code                                │
│                                                                  │
│  Why needed?                                                    │
│  • Reset vector location is fixed by architecture               │
│  • Main firmware may be elsewhere in ROM                        │
│  • ROM may be bank-switched or have multiple regions            │
│                                                                  │
│  ═══════════════════════════════════════════════════════════    │
│                                                                  │
│  x86 BIOS Jump Loader:                                          │
│  ━━━━━━━━━━━━━━━━━━━━━━                                          │
│                                                                  │
│  At 0xFFFFFFF0 (Reset Vector):                                 │
│  ┌────────────────────────────────────────┐                     │
│  │ Machine Code    Assembly               │                     │
│  ├────────────────────────────────────────┤                     │
│  │ EA 5B E0 00 F0  JMP FAR F000:E05B      │                     │
│  │                                        │                     │
│  │ Breakdown:                             │                     │
│  │   EA          = Far jump opcode        │                     │
│  │   5B E0       = Offset E05B            │                     │
│  │   00 F0       = Segment F000           │                     │
│  │                                        │                     │
│  │ Physical address = F0000 + E05B        │                     │
│  │                  = 0xFE05B             │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
│  This 5-byte instruction fits in the last 16 bytes of the 4 GB  │
│  address space (0xFFFFFFF0 to 0xFFFFFFFF)                       │
│                                                                  │
│  ═══════════════════════════════════════════════════════════    │
│                                                                  │
│  ARM Boot Sequence (Typical):                                   │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                                   │
│                                                                  │
│  At 0x00000000 (Reset Vector):                                 │
│  ┌────────────────────────────────────────┐                     │
│  │ Machine Code    Assembly               │                     │
│  ├────────────────────────────────────────┤                     │
│  │ E59FF018        LDR PC, [PC, #24]      │                     │
│  │                                        │                     │
│  │ What this does:                        │                     │
│  │ • PC reads as instruction addr + 8     │                     │
│  │ • Load word from 0x00 + 8 + 24 = 0x20  │                     │
│  │ • Store into PC (program counter)      │                     │
│  │ • Effect: Jump to address in table     │                     │
│  │                                        │                     │
│  │ At 0x20 (reset_addr):                  │                     │
│  │ 0x00008000   ← Address of main code    │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
│  Vector Table Layout (ARM):                                     │
│  ┌────────┬─────────────────────────────┐                       │
│  │ Offset │ Exception                   │                       │
│  ├────────┼─────────────────────────────┤                       │
│  │ 0x00   │ Reset: LDR PC, [reset_addr] │                       │
│  │ 0x04   │ Undef: LDR PC, [undef_addr] │                       │
│  │ 0x08   │ SVC:   LDR PC, [svc_addr]   │                       │
│  │ 0x0C   │ Prefetch Abort              │                       │
│  │ 0x10   │ Data Abort                  │                       │
│  │ 0x14   │ Reserved                    │                       │
│  │ 0x18   │ IRQ                         │                       │
│  │ 0x1C   │ FIQ                         │                       │
│  ├────────┼─────────────────────────────┤                       │
│  │ 0x20   │ reset_addr: .word 0x8000    │  ← Jump addresses     │
│  │ 0x24   │ undef_addr: .word 0x8100    │                       │
│  │  ...   │  ...                        │                       │
│  └────────┴─────────────────────────────┘                       │
│                                                                  │
│  ═══════════════════════════════════════════════════════════    │
│                                                                  │
│  MIPS Boot Sequence:                                            │
│  ━━━━━━━━━━━━━━━━━━━                                             │
│                                                                  │
│  At 0xBFC00000 (Reset Vector in ROM):                          │
│  ┌────────────────────────────────────────┐                     │
│  │ Machine Code    Assembly               │                     │
│  ├────────────────────────────────────────┤                     │
│  │ 3C1ABFC0        LUI  $k0, 0xBFC0       │                     │
│  │ 375A1000        ORI  $k0, $k0, 0x1000  │                     │
│  │ 03400008        JR   $k0               │                     │
│  │ 00000000        NOP (delay slot)       │                     │
│  │                                        │                     │
│  │ Effect: Jump to 0xBFC01000             │                     │
│  │ (main bootloader code)                 │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
│  Why the complexity?                                            │
│  • Boot exception vectors (BEV=1) sit just above 0xBFC00000     │
│  • Reset code must jump past them to the main firmware          │
│  • Allow flexibility in code organization                       │
│                                                                  │
│  ═══════════════════════════════════════════════════════════    │
│                                                                  │
│  UEFI Jump (x86-64):                                            │
│  ━━━━━━━━━━━━━━━━━━                                              │
│                                                                  │
│  Still starts at 0xFFFFFFF0:                                    │
│  ┌────────────────────────────────────────┐                     │
│  │ JMP FAR to SEC phase entry point       │                     │
│  │                                        │                     │
│  │ SEC phase:                             │                     │
│  │ 1. Setup temporary stack (cache-as-RAM)│                     │
│  │ 2. Verify firmware signature           │                     │
│  │ 3. Call PEI entry point                │                     │
│  │                                        │                     │
│  │ Much more complex than BIOS!           │                     │
│  │ • Multiple phases                      │                     │
│  │ • Security checks                      │                     │
│  │ • Modular architecture                 │                     │
│  └────────────────────────────────────────┘                     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
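
The three jump sequences above reduce to small address calculations, sketched here under stated assumptions. The function names are hypothetical. The x86 helper applies plain real-mode `segment * 16 + offset` arithmetic, the ARM helper assumes A32 state (where PC reads as the instruction address plus 8) and only accepts the `LDR PC, [PC, #imm12]` form, and the MIPS helper simply combines the LUI and ORI immediates.

```python
import struct


def decode_x86_far_jump(code: bytes):
    """Decode a real-mode JMP FAR ptr16:16 (opcode EA).

    Returns (segment, offset, physical_address).
    """
    if len(code) < 5 or code[0] != 0xEA:
        raise ValueError("not a far jump (opcode EA)")
    # Operand is little-endian: 16-bit offset first, then 16-bit segment.
    offset, segment = struct.unpack_from("<HH", code, 1)
    return segment, offset, (segment << 4) + offset


def arm_ldr_pc_literal_address(instr_addr: int, word: int) -> int:
    """Address read by an A32 LDR PC, [PC, #imm12] located at instr_addr."""
    if word & 0xFFFFF000 != 0xE59FF000:
        raise ValueError("not LDR PC, [PC, #imm12]")
    imm12 = word & 0xFFF
    return instr_addr + 8 + imm12  # PC reads as instruction address + 8


def mips_lui_ori(upper: int, lower: int) -> int:
    """Value left in a register by LUI upper followed by ORI lower."""
    return ((upper & 0xFFFF) << 16) | (lower & 0xFFFF)


if __name__ == "__main__":
    seg, off, phys = decode_x86_far_jump(bytes.fromhex("EA5BE000F0"))
    print(f"x86:  {seg:04X}:{off:04X} -> {phys:#x}")
    print(f"ARM:  literal at {arm_ldr_pc_literal_address(0x0, 0xE59FF018):#x}")
    print(f"MIPS: jump target {mips_lui_ori(0xBFC0, 0x1000):#x}")
```

For the ARM case the literal lands at 0x20, the first word after the eight 4-byte vector slots, which matches the vector table layout shown above.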

Stage 0: Power-On and Hardware Initialization

Before any software runs, the hardware must be powered and initialized:
┌─────────────────────────────────────────────────────────────────┐
│              POWER-ON SEQUENCE (Stage 0)                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. PSU (Power Supply Unit) Activation                          │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│     • User presses power button                                 │
│     • PSU performs self-test                                    │
│     • Generates power rails: +12V, +5V, +3.3V, -12V, +5VSB     │
│     • Sends "Power Good" signal to motherboard                  │
│     • Typical delay: 100-500ms                                  │
│                                                                  │
│  2. Motherboard Power Sequencing                                │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│     • Chipset receives power                                    │
│     • Clock generator starts (crystal oscillator)               │
│     • CPU receives stable clock signal                          │
│     • Reset signal de-asserted                                  │
│                                                                  │
│  3. CPU Reset and Initialization                                │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│     CPU State at Reset:                                         │
│     ┌──────────────────────────────────────────┐                │
│     │ • Real mode (16-bit)                     │                │
│     │ • CS = 0xF000 (hidden base 0xFFFF0000)   │                │
│     │ • IP = 0xFFF0 (reset vector offset)      │                │
│     │   → Physical address 0xFFFFFFF0          │                │
│     │ • Caches disabled                        │                │
│     │ • Paging disabled                        │                │
│     │ • Interrupts disabled                    │                │
│     │ • Most registers at defined defaults     │                │
│     └──────────────────────────────────────────┘                │
│                                                                  │
│  4. Firmware ROM Mapping                                        │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│     Memory Map at Reset:                                        │
│     ┌────────────────────────────────────────────┐              │
│     │ 0xFFFFFFFF ┌──────────────────────┐        │              │
│     │            │                      │        │              │
│     │            │   Flash ROM/BIOS     │        │              │
│     │            │   (Firmware code)    │        │              │
│     │            │                      │        │              │
│     │ 0xFFFFF000 └──────────────────────┘        │              │
│     │ 0xFFFFFFF0 ← Reset vector (inside ROM)    │              │
│     │            ...                             │              │
│     │ 0x000FFFFF ┌──────────────────────┐        │              │
│     │            │   BIOS shadow        │        │              │
│     │ 0x000C0000 ├──────────────────────┤        │              │
│     │            │   VGA BIOS, buffers  │        │              │
│     │ 0x000A0000 ├──────────────────────┤        │              │
│     │            │   RAM                │        │              │
│     │ 0x00000000 └──────────────────────┘        │              │
│     └────────────────────────────────────────────┘              │
│                                                                  │
│  5. First Instruction Execution                                 │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│     • CPU fetches instruction at 0xFFFFFFF0                     │
│     • Typically a JMP instruction to BIOS entry point           │
│     • Control transfers to firmware initialization code         │
│                                                                  │
│     x86 Reset Vector:                                           │
│     Assembly at 0xFFFFFFF0:                                     │
│       JMP FAR F000:E05B    ; Jump to BIOS initialization        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
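The address arithmetic behind the reset vector can be checked with a few lines of Python. This is a minimal sketch, not firmware code: it assumes a 386-or-later CPU, where the CS selector resets to 0xF000 with a hidden base of 0xFFFF0000 and IP resets to 0xFFF0. The function names are made up for illustration.

```python
RESET_CS_BASE = 0xFFFF0000  # hidden CS base loaded at reset (386 and later)
RESET_IP = 0xFFF0           # instruction pointer at reset


def reset_vector() -> int:
    """Physical address of the first instruction fetch after reset."""
    return RESET_CS_BASE + RESET_IP


def real_mode_address(segment: int, offset: int) -> int:
    """Ordinary real-mode translation: segment * 16 + offset."""
    return (segment << 4) + offset


print(hex(reset_vector()))                     # first fetch, just below 4 GB
print(hex(real_mode_address(0xF000, 0xE05B)))  # target of JMP FAR F000:E05B
print(hex(real_mode_address(0xFFFF, 0x0000)))  # original 8086 reset vector
```

The far jump reloads CS, so the hidden base is replaced by the normal segment * 16 value and execution continues below 1 MB.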

Power Supply Details

┌─────────────────────────────────────────────────────────────────┐
│                   PSU POWER RAILS                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Voltage Rail    Purpose                    Current             │
│  ───────────────────────────────────────────────────────────    │
│  +12V           CPU, GPU, drives           High (20-40A+)       │
│  +5V            Legacy devices, USB        Medium (15-25A)      │
│  +3.3V          Memory, chipset, logic     Medium (15-25A)      │
│  +5VSB          Standby power              Low (2-3A)           │
│  -12V           Legacy serial/audio        Very low (<1A)       │
│                                                                  │
│  Power Good Signal:                                             │
│  • PSU monitors all rails                                       │
│  • If voltages stable: Assert PG signal (5V signal)             │
│  • Motherboard holds CPU in reset until PG received             │
│  • If voltage drops: De-assert PG → system reset                │
│                                                                  │
│  Soft-Off State (S5 → S0 transition):                           │
│  1. +5VSB always powered (wake-on-LAN, USB charging)            │
│  2. Power button press → Chipset pulls PS_ON# low               │
│  3. PSU starts, generates all rails                             │
│  4. Power sequencing ensures proper voltage order               │
│  5. PG signal → CPU released from reset                         │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
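The Power Good decision can be modeled as a simple tolerance check. The sketch below is illustrative only: the tolerance values (±5% on the positive rails, ±10% on -12V) are assumptions based on commonly cited ATX figures, and a real PSU does this in analog supervisor hardware, not software. The names `RAILS`, `rail_in_spec`, and `power_good` are hypothetical.

```python
# Nominal voltage and allowed fractional tolerance per rail.
# Assumed values: +/-5% on positive rails, +/-10% on -12V.
RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def rail_in_spec(name: str, measured: float) -> bool:
    """True if the measured voltage is within tolerance of nominal."""
    nominal, tolerance = RAILS[name]
    return abs(measured - nominal) <= abs(nominal) * tolerance


def power_good(measurements: dict) -> bool:
    """Assert PG only when every rail is present and in spec."""
    return all(
        name in measurements and rail_in_spec(name, measurements[name])
        for name in RAILS
    )


healthy = {"+12V": 12.1, "+5V": 5.02, "+3.3V": 3.31, "+5VSB": 5.0, "-12V": -11.8}
sagging = dict(healthy, **{"+12V": 11.0})  # +12V rail drooping under load

print(power_good(healthy))  # PG asserted, CPU released from reset
print(power_good(sagging))  # PG de-asserted, board holds or resets the CPU
```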

POST (Power-On Self-Test)

Comprehensive hardware verification before OS boot:
┌─────────────────────────────────────────────────────────────────┐
│                DETAILED POST SEQUENCE                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Phase 1: Critical Component Testing                            │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│  1. CPU verification                                            │
│     • Check CPU signature, stepping                             │
│     • Verify microcode version                                  │
│     • Initialize CPU caches                                     │
│     • Enable cache-as-RAM (CAR) if needed                       │
│                                                                  │
│  2. Memory detection and test                                   │
│     • Enumerate memory modules (SPD read via SMBus)             │
│     • Determine speed, capacity, timings                        │
│     • Memory controller initialization                          │
│     • Basic memory test (write/read patterns)                   │
│     • Build memory map                                          │
│                                                                  │
│  3. Chipset initialization                                      │
│     • North bridge: CPU, memory, PCIe                           │
│     • South bridge: I/O, USB, SATA                              │
│     • Configure buses and bridges                               │
│                                                                  │
│  Phase 2: Peripheral Discovery                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│  4. PCI/PCIe device enumeration                                 │
│     • Scan all buses, devices, functions                        │
│     • Read device IDs, capabilities                             │
│     • Assign memory-mapped I/O regions                          │
│     • Assign IRQs                                               │
│                                                                  │
│  5. Storage controller initialization                           │
│     • SATA/NVMe controller setup                                │
│     • Detect attached drives                                    │
│     • Read drive parameters                                     │
│                                                                  │
│  6. Video initialization                                        │
│     • Initialize GPU or integrated graphics                     │
│     • Set up framebuffer                                        │
│     • Display POST screen/logo                                  │
│                                                                  │
│  7. USB controller and devices                                  │
│     • Initialize USB host controllers                           │
│     • Enumerate USB devices                                     │
│     • Setup keyboard/mouse for BIOS                             │
│                                                                  │
│  Phase 3: Boot Preparation                                      │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                             │
│  8. Load Option ROMs                                            │
│     • Network card option ROMs (PXE boot)                       │
│     • Storage adapter ROMs                                      │
│     • GPU VBIOS                                                 │
│                                                                  │
│  9. ACPI table setup                                            │
│     • Build ACPI tables (DSDT, SSDT, etc.)                      │
│     • Provide hardware info to OS                               │
│                                                                  │
│  10. Boot device selection                                      │
│     • Check boot order in CMOS/NVRAM                            │
│     • Attempt boot from each device in order                    │
│                                                                  │
│  POST Beep Codes (illustrative; vary by vendor):                │
│  1 beep      → POST successful                                  │
│  2 beeps     → POST error, check display                        │
│  3 beeps     → Memory error                                     │
│  5 beeps     → CPU error                                        │
│  Continuous  → Power/cooling issue                              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
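The PCI enumeration step (scan all buses, devices, functions) can be sketched as follows. The address layout is the legacy configuration mechanism that uses I/O port 0xCF8. The `read_vendor_id` callback is a hypothetical stand-in for real port I/O, and the scan is simplified: it probes all eight functions whenever function 0 responds, instead of checking the multi-function bit first.

```python
NO_DEVICE = 0xFFFF  # vendor ID read back when nothing responds


def pci_config_address(bus: int, device: int, function: int, offset: int) -> int:
    """Build the 32-bit value written to I/O port 0xCF8."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 256:
        raise ValueError("offset out of range")
    # Bit 31 enables the access; the register offset is dword-aligned.
    return (1 << 31) | (bus << 16) | (device << 11) | (function << 8) | (offset & 0xFC)


def scan(read_vendor_id):
    """Return (bus, device, function, vendor_id) for every function that responds."""
    found = []
    for bus in range(256):
        for device in range(32):
            if read_vendor_id(bus, device, 0) == NO_DEVICE:
                continue  # function 0 absent, so the slot is empty
            for function in range(8):
                vendor = read_vendor_id(bus, device, function)
                if vendor != NO_DEVICE:
                    found.append((bus, device, function, vendor))
    return found


# Fake bus for demonstration: a host bridge and a two-function device.
fake_bus = {(0, 0, 0): 0x8086, (0, 2, 0): 0x10DE, (0, 2, 1): 0x10DE}
devices = scan(lambda b, d, f: fake_bus.get((b, d, f), NO_DEVICE))
print(devices)
print(hex(pci_config_address(1, 2, 3, 0x10)))
```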

Stage 1: Firmware (BIOS vs UEFI)

BIOS (Legacy)

Basic Input/Output System — the original PC firmware:
┌─────────────────────────────────────────────────────────────────┐
│                    BIOS BOOT PROCESS                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. CPU starts at reset vector (0xFFFFFFF0; 0xFFFF0 on 8086)    │
│  2. Jumps to BIOS ROM code                                      │
│  3. POST (Power-On Self-Test):                                  │
│     • Test RAM                                                  │
│     • Detect hardware (CPU, disks, video)                       │
│     • Initialize devices                                         │
│  4. Find boot device (order in BIOS settings)                   │
│  5. Load MBR (first 512 bytes of disk)                          │
│  6. Execute MBR bootloader code                                  │
│                                                                  │
│  MBR Layout (512 bytes):                                         │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Bootstrap Code     │  Partition Table  │  Signature    │    │
│  │     (446 bytes)     │   (64 bytes)      │  (2 bytes)    │    │
│  │                     │   4 × 16 bytes    │  0x55 0xAA    │    │
│  └─────────────────────┴───────────────────┴───────────────┘    │
│                                                                  │
│  Limitations:                                                    │
│  • 16-bit real mode                                             │
│  • 1 MB addressable memory                                       │
│  • 4 primary partitions max                                      │
│  • 2 TB disk size limit                                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
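The MBR layout above maps directly onto a fixed-size binary structure. Here is a minimal parsing sketch; the function name and the returned dictionary keys are illustrative choices. It reads a 512-byte sector, checks the two signature bytes at offsets 510 and 511, and decodes the four 16-byte partition entries starting at offset 446.

```python
import struct

ENTRY = struct.Struct("<B3sB3sII")  # flag, CHS start, type, CHS end, LBA start, sectors


def parse_mbr(sector: bytes) -> list:
    """Decode the partition table from a 512-byte MBR sector."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != b"\x55\xaa":
        raise ValueError("missing boot signature")
    partitions = []
    for index in range(4):
        offset = 446 + index * ENTRY.size
        flag, _chs_start, ptype, _chs_end, lba_start, sectors = ENTRY.unpack_from(sector, offset)
        if ptype == 0x00:
            continue  # unused slot
        partitions.append({
            "index": index + 1,
            "active": flag == 0x80,
            "type": ptype,
            "lba_start": lba_start,
            "sectors": sectors,
        })
    return partitions


# Build a synthetic MBR with one active Linux partition for demonstration.
demo = bytearray(512)
demo[446:462] = ENTRY.pack(0x80, b"\0\0\0", 0x83, b"\0\0\0", 2048, 204800)
demo[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(demo)))
```

To inspect a real disk, the first 512 bytes of the device would be read instead of the synthetic buffer (this needs root privileges on most systems).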

UEFI (Modern)

Unified Extensible Firmware Interface — modern replacement for BIOS:
┌─────────────────────────────────────────────────────────────────┐
│                    UEFI BOOT PROCESS                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. SEC (Security Phase)                                         │
│     • CPU reset, cache-as-RAM                                   │
│     • Verify firmware integrity                                  │
│                                                                  │
│  2. PEI (Pre-EFI Initialization)                                │
│     • Initialize chipset, memory controller                     │
│     • Discover memory                                            │
│                                                                  │
│  3. DXE (Driver Execution Environment)                          │
│     • Load UEFI drivers                                         │
│     • Initialize devices                                         │
│     • Build system tables                                        │
│                                                                  │
│  4. BDS (Boot Device Selection)                                 │
│     • Read boot variables (NVRAM)                               │
│     • Find EFI System Partition (ESP)                           │
│     • Load EFI bootloader                                        │
│                                                                  │
│  EFI System Partition (FAT32):                                   │
│  /boot/efi/                                                      │
│  └── EFI/                                                        │
│      ├── BOOT/                                                   │
│      │   └── BOOTX64.EFI    ← Default fallback                 │
│      ├── ubuntu/                                                 │
│      │   └── shimx64.efi    ← Ubuntu shim (loads GRUB)         │
│      └── Microsoft/                                              │
│          └── Boot/                                               │
│              └── bootmgfw.efi                                    │
│                                                                  │
│  Advantages over BIOS:                                           │
│  • 64-bit mode, GBs of addressable memory                       │
│  • GPT: 128 partitions, 9.4 ZB disk size                        │
│  • Secure Boot (cryptographic verification)                      │
│  • Network boot, GUI setup                                       │
│  • Modular drivers                                               │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
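During BDS the firmware consults the `BootOrder` variable, which is an array of little-endian 16-bit numbers, each naming a `Boot####` entry. The sketch below decodes such a value. It assumes the bytes come from Linux efivarfs, where each variable file starts with a 4-byte attribute field; the function name and the `has_attr_prefix` parameter are illustrative.

```python
import struct


def parse_boot_order(raw: bytes, has_attr_prefix: bool = True) -> list:
    """Decode a BootOrder value into Boot#### entry names, in priority order."""
    data = raw[4:] if has_attr_prefix else raw  # skip efivarfs attribute bytes
    if len(data) % 2:
        raise ValueError("BootOrder must be a whole number of 16-bit entries")
    count = len(data) // 2
    numbers = struct.unpack(f"<{count}H", data)
    return [f"Boot{number:04X}" for number in numbers]


# Synthetic example: attributes = 7, then entries 0x0001, 0x0000, 0x0010.
sample = struct.pack("<I", 7) + struct.pack("<3H", 0x0001, 0x0000, 0x0010)
print(parse_boot_order(sample))
```

On a running Linux system, `efibootmgr` shows the same information in decoded form.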

BIOS vs UEFI Comparison

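Feature              BIOS                UEFI
──────────────────────────────────────────────────────────────────────────────
Mode                 16-bit real mode    32-bit protected or 64-bit long mode
Partition table      MBR                 GPT
Max disk size        2 TB                9.4 ZB
Max partitions       4 primary           128 (default)
Boot code location   MBR (446 bytes)     ESP (FAT32 partition)
Secure Boot          No                  Yes
Network boot         Limited             Native
Boot time            Typically slower    Typically faster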
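
The two disk size limits in the comparison follow from the width of the LBA field. A short worked calculation, assuming 512-byte sectors (the function name is illustrative):

```python
SECTOR_SIZE = 512  # bytes; assumed, some drives use 4096-byte sectors


def max_disk_bytes(lba_bits: int, sector_size: int = SECTOR_SIZE) -> int:
    """Largest addressable capacity for a given LBA field width."""
    return (2 ** lba_bits) * sector_size


mbr_limit = max_disk_bytes(32)  # MBR: 32-bit LBA
gpt_limit = max_disk_bytes(64)  # GPT: 64-bit LBA

print(f"MBR: {mbr_limit:,} bytes = {mbr_limit / 2**40:.0f} TiB = {mbr_limit / 10**12:.2f} TB")
print(f"GPT: {gpt_limit:,} bytes = {gpt_limit / 2**70:.0f} ZiB = {gpt_limit / 10**21:.2f} ZB")
```

The 9.4 ZB figure uses decimal zettabytes (10^21 bytes); in binary units the same limit is exactly 8 ZiB.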

Understanding Partition Tables: MBR vs GPT

┌─────────────────────────────────────────────────────────────────┐
│                MBR (Master Boot Record)                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  What is MBR?                                                   │
│  ═══════════                                                    │
│  • First 512 bytes of disk                                      │
│  • Contains boot code + partition table                         │
│  • Legacy standard from 1983 (IBM PC DOS 2.0)                   │
│  • Still widely used, but being phased out                      │
│                                                                  │
│  MBR Structure:                                                 │
│  ┌────────────────────────────────────────────┐                 │
│  │ Byte 0-445: Boot code (446 bytes)         │                 │
│  │ ─────────────────────────────────────────  │                 │
│  │ • x86 assembly code                        │                 │
│  │ • Loads bootloader from active partition   │                 │
│  │ • Very limited space!                      │                 │
│  │                                            │                 │
│  │ Example boot code (simplified):            │                 │
│  │   ORG 0x7C00        ; BIOS loads here      │                 │
│  │   MOV SI, msg       ; "Loading..."         │                 │
│  │   CALL print        ; Display message      │                 │
│  │   MOV AH, 02h       ; Read disk            │                 │
│  │   INT 13h           ; BIOS disk service    │                 │
│  │   JMP 0x7E00        ; Jump to bootloader   │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ Byte 446-509: Partition table (64 bytes)  │                 │
│  │ ─────────────────────────────────────────  │                 │
│  │ 4 partition entries × 16 bytes each:       │                 │
│  │                                            │                 │
│  │ Partition 1 (16 bytes):                    │                 │
│  │   Byte 0:    Boot flag (0x80 = active)     │                 │
│  │   Byte 1-3:  CHS start address (legacy)    │                 │
│  │   Byte 4:    Partition type                │                 │
│  │              0x07 = NTFS                   │                 │
│  │              0x83 = Linux                  │                 │
│  │              0x82 = Linux swap             │                 │
│  │              0xEE = GPT protective         │                 │
│  │   Byte 5-7:  CHS end address               │                 │
│  │   Byte 8-11: LBA start (32-bit)            │                 │
│  │   Byte 12-15: Size in sectors (32-bit)     │                 │
│  │                                            │                 │
│  │ Partition 2 (16 bytes): [same format]     │                 │
│  │ Partition 3 (16 bytes): [same format]     │                 │
│  │ Partition 4 (16 bytes): [same format]     │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ Byte 510-511: Boot signature (0x55 0xAA)  │                 │
│  │ ─────────────────────────────────────────  │                 │
│  │ • Magic number to validate MBR             │                 │
│  │ • BIOS checks this before executing        │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  MBR Limitations:                                               │
│  ━━━━━━━━━━━━━━━━                                               │
│                                                                  │
│  1. Max 4 primary partitions                                    │
│     • Workaround: Extended partitions (logical drives)          │
│     • Complicated and error-prone                               │
│                                                                  │
│  2. Max disk size: 2 TB                                         │
│     • Uses 32-bit LBA (Logical Block Address)                   │
│     • 2^32 sectors × 512 bytes = 2 TiB (about 2.2 TB)           │
│     • Modern disks are 20+ TB!                                  │
│                                                                  │
│  3. No redundancy                                               │
│     • Single point of failure                                   │
│     • Corruption = unbootable system                            │
│                                                                  │
│  4. No partition names                                          │
│     • Only type codes                                           │
│                                                                  │
│  ═══════════════════════════════════════════════════════════════ │
│                                                                  │
│  GPT (GUID Partition Table)                                     │
│  ═══════════════════════════                                    │
│                                                                  │
│  Modern replacement for MBR:                                    │
│  • Part of UEFI standard                                        │
│  • Introduced in late 1990s, mainstream since 2010s             │
│  • Solves all MBR limitations                                   │
│                                                                  │
│  GPT Disk Layout:                                               │
│  ┌────────────────────────────────────────────┐                 │
│  │ LBA 0: Protective MBR                      │ ◄── Backwards   │
│  │   • Prevents old tools from damaging disk  │     compat      │
│  │   • Single partition (type 0xEE)           │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ LBA 1: Primary GPT Header                  │                 │
│  │   • Signature: "EFI PART"                  │                 │
│  │   • Disk GUID (unique identifier)          │                 │
│  │   • LBA of partition entries               │                 │
│  │   • Number of partitions (usually 128)     │                 │
│  │   • CRC32 checksum                         │                 │
│  │   • LBA of backup header (end of disk)     │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ LBA 2-33: Partition Entry Array           │                 │
│  │   • 128 entries × 128 bytes each           │                 │
│  │   • Each entry:                            │                 │
│  │     - Partition Type GUID                  │                 │
│  │     - Unique Partition GUID                │                 │
│  │     - Starting LBA (64-bit!)               │                 │
│  │     - Ending LBA (64-bit!)                 │                 │
│  │     - Attributes (read-only, hidden, etc.) │                 │
│  │     - Partition name (72 bytes, Unicode!)  │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ LBA 34-...: Actual partitions              │                 │
│  │   [data, data, data, ...]                  │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ Last 33 LBAs: Backup GPT                   │ ◄── Redundancy! │
│  │   • Backup partition entries               │                 │
│  │   • Backup GPT header                      │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  Example Partition Entry:                                       │
│  ┌────────────────────────────────────────────┐                 │
│  │ Type GUID: EBD0A0A2-B9E5-4433-...         │                 │
│  │   (Microsoft Basic Data)                   │                 │
│  │ Partition GUID: 1234-5678-ABCD-...        │                 │
│  │   (Unique to this specific partition)      │                 │
│  │ Start LBA: 2048 (1 MB offset)              │                 │
│  │ End LBA:   41943039 (20 GB partition)      │                 │
│  │ Attributes: 0 (none)                       │                 │
│  │ Name: "Basic data partition"               │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  GPT Advantages:                                                │
│  ━━━━━━━━━━━━━━                                                │
│                                                                  │
│  1. Up to 128 partitions (default, can be more!)                │
│  2. Max disk size: 9.4 ZB (Zettabytes!)                         │
│     • 64-bit LBA: 2^64 sectors × 512 bytes                      │
│     • 2^64 × 512 = 9,444,732,965,739,290,427,392 bytes          │
│     • = 9.4 ZB = 9,400,000,000 TB!                              │
│     • No disk will hit this limit for decades                   │
│  3. Redundancy: Backup header + entries at end of disk          │
│  4. CRC32 checksums detect corruption                           │
│  5. Partition names (human-readable, Unicode)                   │
│  6. GUIDs prevent conflicts                                     │
│                                                                  │
│  Why 9.4 ZB Specifically?                                       │
│  • 2^64 = 18,446,744,073,709,551,616 sectors                    │
│  • × 512 bytes/sector = 9,444,732,965,739,290,427,392 bytes     │
│  • ÷ 10^21 (bytes → ZB) ≈ 9.44 ZB (= 8 ZiB)                     │
│                                                                  │
│  For context:                                                   │
│  • 1 KB = 1,024 bytes                                           │
│  • 1 MB = 1,024 KB = 1,048,576 bytes                            │
│  • 1 GB = 1,024 MB = 1,073,741,824 bytes                        │
│  • 1 TB = 1,024 GB ≈ 1 trillion bytes                           │
│  • 1 PB = 1,024 TB ≈ 1 quadrillion bytes                        │
│  • 1 EB = 1,024 PB ≈ 1 quintillion bytes                        │
│  • 1 ZB = 1,024 EB ≈ 1 sextillion bytes                         │
│                                                                  │
│  Largest disk today (2025): ~30 TB                              │
│  → GPT supports 300 million times larger!                       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
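The GPT header fields listed above can be decoded and checksum-verified with the standard library. This is a sketch under stated assumptions: the field order follows the 92-byte header layout in the UEFI specification, and the header CRC32 is computed over the header with its own CRC field zeroed. `parse_gpt_header` and `build_demo_header` are illustrative names, and the demo header is synthetic (its partition array CRC is left at 0 because no entry array is attached).

```python
import struct
import uuid
import zlib

GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")  # 92 bytes, little-endian


def parse_gpt_header(sector: bytes) -> dict:
    """Decode and validate a GPT header read from LBA 1."""
    (signature, revision, header_size, header_crc, _reserved,
     current_lba, backup_lba, first_usable, last_usable,
     disk_guid, entries_lba, num_entries, entry_size,
     entries_crc) = GPT_HEADER.unpack_from(sector)
    if signature != b"EFI PART":
        raise ValueError("not a GPT header")
    if not GPT_HEADER.size <= header_size <= len(sector):
        raise ValueError("implausible header size")
    covered = bytearray(sector[:header_size])
    covered[16:20] = bytes(4)  # CRC field is zeroed while computing the CRC
    if zlib.crc32(bytes(covered)) != header_crc:
        raise ValueError("header CRC32 mismatch")
    return {
        "disk_guid": uuid.UUID(bytes_le=disk_guid),
        "current_lba": current_lba,
        "backup_lba": backup_lba,
        "first_usable": first_usable,
        "last_usable": last_usable,
        "entries_lba": entries_lba,
        "num_entries": num_entries,
        "entry_size": entry_size,
    }


def build_demo_header(total_sectors: int = 1000) -> bytes:
    """Create a synthetic, CRC-valid header for a tiny imaginary disk."""
    guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
    raw = bytearray(GPT_HEADER.pack(
        b"EFI PART", 0x00010000, GPT_HEADER.size, 0, 0,
        1, total_sectors - 1, 34, total_sectors - 34,
        guid.bytes_le, 2, 128, 128, 0))
    raw[16:20] = struct.pack("<I", zlib.crc32(bytes(raw)))
    return bytes(raw).ljust(512, b"\0")


print(parse_gpt_header(build_demo_header()))
```

A failed CRC check on the primary header is the case where a tool would fall back to the backup header at the end of the disk.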

Why Partition Tables Matter

┌─────────────────────────────────────────────────────────────────┐
│                PARTITION TABLE IMPORTANCE                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. Boot Process Dependency                                     │
│  ════════════════════════════                                   │
│                                                                  │
│  BIOS boot:                                                     │
│  • BIOS reads MBR (LBA 0)                                       │
│  • Executes boot code in MBR                                    │
│  • Boot code reads partition table                              │
│  • Finds active partition                                       │
│  • Loads VBR (Volume Boot Record) from partition                │
│  • VBR loads bootloader (GRUB, etc.)                            │
│                                                                  │
│  → If MBR corrupted: System won't boot!                         │
│                                                                  │
│  UEFI boot:                                                     │
│  • UEFI reads GPT header                                        │
│  • Finds EFI System Partition (ESP)                             │
│  • Mounts ESP as FAT32                                          │
│  • Loads .efi bootloader from ESP                               │
│                                                                  │
│  → If GPT corrupted: Can use backup at end of disk!             │
│                                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                  │
│  2. Operating System Installation                               │
│  ══════════════════════════════════                             │
│                                                                  │
│  Partitioning scheme affects what you can install:              │
│                                                                  │
│  Windows 11 requirements:                                       │
│  • MUST use GPT                                                 │
│  • MUST use UEFI (no BIOS/MBR)                                  │
│  • Secure Boot capable                                          │
│  • TPM 2.0 required                                             │
│                                                                  │
│  Linux:                                                         │
│  • Works with both MBR and GPT                                  │
│  • Works with both BIOS and UEFI                                │
│  • More flexible!                                               │
│                                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                  │
│  3. Multi-Boot Scenarios                                        │
│  ═════════════════════                                          │
│                                                                  │
│  Example dual-boot setup (GPT):                                 │
│  ┌────────────────────────────────────────────┐                 │
│  │ Partition 1: EFI System Partition (ESP)   │ 500 MB         │
│  │   • FAT32                                  │                 │
│  │   • Contains bootloaders for all OSes      │                 │
│  │   • /EFI/Microsoft/Boot/bootmgfw.efi       │                 │
│  │   • /EFI/ubuntu/shimx64.efi                │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ Partition 2: Windows C:                    │ 500 GB         │
│  │   • NTFS                                   │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ Partition 3: Linux /                       │ 100 GB         │
│  │   • ext4                                   │                 │
│  ├────────────────────────────────────────────┤                 │
│  │ Partition 4: Linux swap                    │ 16 GB          │
│  ├────────────────────────────────────────────┤                 │
│  │ Partition 5: Shared data                   │ 400 GB         │
│  │   • exFAT (readable by both OSes)          │                 │
│  └────────────────────────────────────────────┘                 │
│                                                                  │
│  With MBR: Only 4 partitions, would need extended partitions!   │
│  With GPT: Easy! Can have 128 partitions if needed.             │
│                                                                  │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ │
│                                                                  │
│  4. Data Recovery                                               │
│  ══════════════                                                 │
│                                                                  │
│  MBR corruption:                                                │
│  • No redundant copy; needs a saved backup or a disk scan       │
│  • Tools like TestDisk can sometimes help                       │
│                                                                  │
│  GPT corruption:                                                │
│  • Backup GPT at end of disk!                                   │
│  • gdisk can automatically repair from backup                   │
│  • Much more resilient                                          │
│                                                                  │
│  Example recovery:                                              │
│  $ gdisk /dev/sda                                               │
│  Command: r (recovery menu)                                     │
│  Command: b (use backup GPT header)                             │
│  Command: c (load backup partition table)                       │
│  Command: w (write repaired GPT)                                │
│  → Partition table restored!                                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
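The gdisk recovery shown in the box works because GPT keeps a second header in the last sector of the disk. As a rough illustration, the sketch below looks for the 8-byte `EFI PART` signature at both locations in a disk image file. The function name `gpt_header_present` and the 512-byte default sector size are assumptions made for this example; it reads an image file, not a block device.

```shell
# gpt_header_present IMAGE primary|backup [SECTOR_SIZE]
# Succeeds if the GPT signature "EFI PART" sits where the primary header
# (LBA 1) or the backup header (last sector) is expected.
gpt_header_present() {
    img=$1
    copy=$2
    sector=${3:-512}                            # assumed logical sector size
    size=$(wc -c < "$img")
    case $copy in
        primary) offset=$sector ;;              # LBA 1, right after the protective MBR
        backup)  offset=$((size - sector)) ;;   # last sector of the image
        *)       return 2 ;;
    esac
    sig=$(dd if="$img" bs=1 skip="$offset" count=8 2>/dev/null | tr -d '\0')
    [ "$sig" = "EFI PART" ]
}

# Demo on a scratch image that only has a primary signature.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=64 2>/dev/null
printf 'EFI PART' | dd of="$img" bs=1 seek=512 conv=notrunc 2>/dev/null
gpt_header_present "$img" primary && echo "primary header signature found"
gpt_header_present "$img" backup  || echo "backup header signature missing"
rm -f "$img"
```

On a real disk, `gdisk -l` or `sgdisk --verify` perform a far more thorough check (CRCs, header locations) than this signature test.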

UEFI POST Integration

Modern UEFI firmware runs POST during the PEI phase:
┌─────────────────────────────────────────────────────────────────┐
│                UEFI PHASES WITH POST                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. SEC (Security Phase) - Immediate after reset                │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━                │
│     Duration: ~10-50ms                                          │
│     ┌────────────────────────────────────────────┐              │
│     │ • CPU reset vector executed                │              │
│     │ • Minimal CPU initialization               │              │
│     │ • Cache-as-RAM (CAR) setup                 │              │
│     │   └─► No RAM yet! Use CPU cache as stack   │              │
│     │ • Verify firmware integrity (measured boot)│              │
│     │   └─► TPM measurements                     │              │
│     │ • Jump to PEI                              │              │
│     └────────────────────────────────────────────┘              │
│                                                                  │
│  2. PEI (Pre-EFI Init) - THIS IS WHERE POST HAPPENS!            │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│     Duration: ~100-500ms                                        │
│     ┌────────────────────────────────────────────┐              │
│     │ PEI Module 1: CPU Initialization           │              │
│     │   • Microcode update load                  │              │
│     │   • Enable CPU features (SSE, AVX, etc.)   │              │
│     │   • Set up MTRRs (Memory Type Range Regs)  │              │
│     │   • Configure power management             │              │
│     │                                            │              │
│     │ PEI Module 2: Memory Controller Init       │              │
│     │   • Read SPD from RAM modules (I2C/SMBus)  │              │
│     │   • Configure memory timings               │              │
│     │   • Train memory (signal integrity tests)  │              │
│     │     └─► This takes MOST of POST time!      │              │
│     │   • Memory scrambling (security)           │              │
│     │   • Build memory map (E820/UEFI memory map)│              │
│     │   • Migrate from CAR to real RAM           │              │
│     │                                            │              │
│     │ PEI Module 3: Chipset Initialization       │              │
│     │   • PCIe link training                     │              │
│     │   • USB controller pre-init                │              │
│     │   • SATA controller setup                  │              │
│     │   • Integrated GPU initialization          │              │
│     └────────────────────────────────────────────┘              │
│                                                                  │
│  Memory Training Deep Dive:                                     │
│  ┌──────────────────────────────────────────────────────┐       │
│  │ Why it takes so long:                                │       │
│  │                                                      │       │
│  │ • DDR4/DDR5 runs at 3200-6400 MT/s                   │       │
│  │ • Signal integrity critical at these speeds          │       │
│  │ • UEFI must find optimal timings                     │       │
│  │                                                      │       │
│  │ Training process (per DIMM):                         │       │
│  │ 1. Write test patterns to memory                     │       │
│  │ 2. Read back with varying delays                     │       │
│  │ 3. Adjust per-lane settings:                         │       │
│  │    • Write leveling (DQS-to-clock alignment)         │       │
│  │    • Read/write DQ-DQS delays (per-bit deskew)       │       │
│  │    • Reference voltage (Vref)                        │       │
│  │    (tCL, tRCD, tRP come from SPD/XMP, not training)  │       │
│  │ 4. Find "eye" (stable region)                        │       │
│  │ 5. Set final timings in memory controller            │       │
│  │                                                      │       │
│  │ Result: Stable, optimized memory operation           │       │
│  │                                                      │       │
│  │ Fast Boot optimization:                              │       │
│  │ • Save training results to NVRAM                     │       │
│  │ • Skip training on subsequent boots                  │       │
│  │ • Re-train only if config changes                    │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                  │
│  3. DXE (Driver Execution Environment)                          │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│     Duration: ~200-1000ms                                       │
│     ┌────────────────────────────────────────────┐              │
│     │ • Full driver environment with C runtime   │              │
│     │ • Load device drivers from firmware volume │              │
│     │ • PCI enumeration and resource allocation  │              │
│     │ • ACPI table generation                    │              │
│     │ • SMBIOS table creation                    │              │
│     │ • Graphics output protocol (display logo)  │              │
│     │ • Console I/O (keyboard/mouse)             │              │
│     │ • Network stack (PXE boot support)         │              │
│     └────────────────────────────────────────────┘              │
│                                                                  │
│  4. BDS (Boot Device Selection)                                 │
│  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━              │
│     Duration: ~100-500ms                                        │
│     ┌────────────────────────────────────────────┐              │
│     │ • Connect console devices                  │              │
│     │ • Process BootOrder variable               │              │
│     │ • Display boot menu (if configured)        │              │
│     │ • Launch boot manager or EFI application   │              │
│     │ • Hand off to OS loader                    │              │
│     └────────────────────────────────────────────┘              │
│                                                                  │
│  Total UEFI boot time: ~500ms - 2 seconds                       │
│  (Compare to legacy BIOS: 3-10 seconds!)                        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
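The BDS phase walks the `BootOrder` variable to decide which entry to try first. On a running Linux system the same variable is visible through `efibootmgr`. The sketch below reads that tool's text output and prints the entries in boot order. The function name `boot_order_labels` is hypothetical, and the sample input is illustrative rather than captured from a real machine.

```shell
# boot_order_labels: read `efibootmgr` output on stdin and print the
# entries in the sequence listed by the BootOrder variable.
boot_order_labels() {
    awk '
        /^BootOrder:/ { n = split($2, order, ",") }
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][* ]/ {
            id = substr($0, 5, 4)         # the four hex digits after "Boot"
            label = substr($0, 11)        # text after "BootXXXX* "
            sub(/\t.*/, "", label)        # drop the device path, if printed
            labels[id] = label
        }
        END { for (i = 1; i <= n; i++) print order[i] ": " labels[order[i]] }
    '
}

# Illustrative input only.
boot_order_labels <<'EOF'
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0000
Boot0000* Windows Boot Manager
Boot0001* ubuntu
EOF
```

On a UEFI machine the real data would come from `efibootmgr | boot_order_labels`, which usually needs root.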

Stage 2: Bootloader

GRUB 2 (Grand Unified Bootloader)

The most common Linux bootloader:
┌─────────────────────────────────────────────────────────────────┐
│                    GRUB 2 BOOT STAGES                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  BIOS Mode:                                                      │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ Stage 1: boot.img (MBR, 446 bytes)                     │     │
│  │ • Loads Stage 1.5                                       │     │
│  └───────────────────────────┬────────────────────────────┘     │
│                              ▼                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ Stage 1.5: core.img (post-MBR gap or partition)       │     │
│  │ • Filesystem drivers                                    │     │
│  │ • Loads Stage 2                                         │     │
│  └───────────────────────────┬────────────────────────────┘     │
│                              ▼                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ Stage 2: /boot/grub/* (full GRUB)                     │     │
│  │ • Read grub.cfg                                         │     │
│  │ • Display menu                                          │     │
│  │ • Load kernel + initrd                                  │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                  │
│  UEFI Mode:                                                      │
│  ┌────────────────────────────────────────────────────────┐     │
│  │ grubx64.efi (on ESP)                                   │     │
│  │ • Single EFI application                                │     │
│  │ • Contains all GRUB functionality                       │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
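Stage 1 is limited to 446 bytes because the rest of the 512-byte first sector is taken by the 64-byte partition table and the 2-byte boot signature (0x55 0xAA). The sketch below inspects those regions in a disk image. The function name `mbr_layout` is an assumption for this example, and it only reports presence, not validity, of boot code.

```shell
# mbr_layout IMAGE
# Reports whether sector 0 ends in the 0x55 0xAA boot signature and
# whether the 446-byte boot code area holds anything other than zeros.
mbr_layout() {
    img=$1
    sig=$(od -An -tx1 -j510 -N2 "$img" | tr -d ' \n')
    # Strip spaces, zeros, newlines and od's "*" repeat marker;
    # anything left means at least one non-zero byte.
    code=$(od -An -tx1 -N446 "$img" | tr -d ' 0\n*')
    if [ "$sig" = "55aa" ]; then
        echo "boot signature: present"
    else
        echo "boot signature: missing"
    fi
    if [ -n "$code" ]; then
        echo "boot code: present"
    else
        echo "boot code: empty"
    fi
}

# Demo: a blank sector with only the signature written.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null
mbr_layout "$img"
rm -f "$img"
```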

GRUB Configuration

# /boot/grub/grub.cfg (auto-generated, don't edit directly)
# Edit /etc/default/grub instead

# /etc/default/grub
GRUB_DEFAULT=0                    # Default menu entry
GRUB_TIMEOUT=5                    # Seconds to wait
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"  # Normal entries only (not recovery)
GRUB_CMDLINE_LINUX=""             # Applied to all entries, including recovery

# Regenerate grub.cfg after changes
sudo update-grub  # Debian/Ubuntu
sudo grub2-mkconfig -o /boot/grub2/grub.cfg  # RHEL/Fedora
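Editing `/etc/default/grub` by hand is fine for one machine, but scripted changes should avoid appending the same parameter twice. The sketch below shows one way to do that. The function name `grub_add_default_param` is hypothetical, the parameter is used as a regular expression by `grep` (so names containing `.` match loosely), and the demo runs on a scratch copy rather than the real file.

```shell
# grub_add_default_param FILE PARAM
# Appends PARAM to GRUB_CMDLINE_LINUX_DEFAULT unless it is already listed.
# The previous version of FILE is kept as FILE.bak.
grub_add_default_param() {
    file=$1
    param=$2
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0                          # already present, nothing to do
    fi
    sed -i.bak "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}

# Demo on a scratch copy of the settings shown above.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
GRUB_CMDLINE_LINUX=""
EOF
grub_add_default_param "$cfg" nomodeset
grub_add_default_param "$cfg" nomodeset   # second call changes nothing
grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$cfg"
rm -f "$cfg" "$cfg.bak"
```

After changing the real file, grub.cfg still has to be regenerated with the commands above.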

Important Kernel Parameters

# Common kernel command line parameters
root=/dev/sda2          # Root filesystem
ro                      # Mount root read-only initially
quiet                   # Suppress boot messages
splash                  # Show graphical splash screen
init=/bin/bash          # Override init (recovery)
single                  # Single-user mode
nomodeset               # Disable kernel mode setting
mem=4G                  # Limit usable memory
maxcpus=2               # Limit CPUs
console=ttyS0,115200    # Serial console
systemd.unit=rescue.target  # Boot to rescue mode

# View current parameters
cat /proc/cmdline
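Scripts often need the value of a single parameter rather than the whole command line. A minimal sketch of such a lookup follows. The function name `cmdline_get` is hypothetical; it returns the first match only and does not handle quoted values that contain spaces.

```shell
# cmdline_get KEY [FILE]
# Prints the value of KEY=value from a kernel command line and exits 0.
# For a bare flag (e.g. "quiet") it prints nothing and exits 0.
# Exits 1 if KEY is absent. FILE defaults to /proc/cmdline.
cmdline_get() {
    awk -v key="$1" '
        {
            for (i = 1; i <= NF; i++) {
                eq = index($i, "=")
                name = eq ? substr($i, 1, eq - 1) : $i
                if (name == key) {
                    if (eq) print substr($i, eq + 1)
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }
    ' "${2:-/proc/cmdline}"
}

# Demo against a sample command line instead of the live one.
sample=$(mktemp)
echo 'BOOT_IMAGE=/vmlinuz root=UUID=1234-abcd ro quiet console=ttyS0,115200' > "$sample"
cmdline_get root "$sample"
cmdline_get console "$sample"
cmdline_get quiet "$sample" && echo "quiet is set"
cmdline_get nomodeset "$sample" || echo "nomodeset is not set"
rm -f "$sample"
```

Only the first `=` splits name from value, so `root=UUID=1234-abcd` yields `UUID=1234-abcd`.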

Other Bootloaders

Bootloader      Use Case
────────────    ───────────────────────────────────────
GRUB 2          Most Linux distributions
systemd-boot    Simple UEFI-only, used by Arch, Pop!_OS
LILO            Legacy, rarely used now
rEFInd          Multi-boot with nice GUI
Syslinux        Lightweight, USB/CD boot
U-Boot          Embedded systems, ARM

Stage 3: Kernel Initialization

Kernel Loading

┌─────────────────────────────────────────────────────────────────┐
│                    KERNEL LOADING PROCESS                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Bootloader loads:                                               │
│  1. vmlinuz (compressed kernel image)                           │
│  2. initrd/initramfs (initial RAM disk)                         │
│                                                                  │
│  Kernel image structure:                                         │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │  Boot sector  │  Setup code  │  Compressed kernel       │    │
│  │  (legacy)     │  (real mode) │  (protected mode)        │    │
│  └───────────────┴──────────────┴──────────────────────────┘    │
│                                                                  │
│  Decompression:                                                  │
│  1. Setup code switches to protected mode                        │
│  2. Decompresses kernel to fixed address                         │
│  3. Jumps to kernel entry point                                  │
│                                                                  │
│  Kernel start (start_kernel):                                    │
│  • Initialize memory management                                  │
│  • Set up interrupt handlers                                     │
│  • Initialize scheduler                                          │
│  • Start kernel threads                                          │
│  • Mount initramfs                                               │
│  • Execute /init from initramfs                                  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

initramfs (Initial RAM Filesystem)

┌─────────────────────────────────────────────────────────────────┐
│                    INITRAMFS PURPOSE                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Problem: Kernel needs drivers to mount root, but drivers       │
│           might be on the root filesystem!                       │
│                                                                  │
│  Solution: initramfs is a minimal filesystem in RAM with        │
│            essential drivers and tools                           │
│                                                                  │
│  Contents of initramfs:                                          │
│  /                                                               │
│  ├── bin/                  # Busybox, essential binaries        │
│  ├── sbin/                 # udevd, modprobe                    │
│  ├── lib/                  # Shared libraries                   │
│  ├── lib/modules/          # Kernel modules                     │
│  │   └── 5.15.0/                                                │
│  │       ├── ext4.ko       # Filesystem drivers                 │
│  │       ├── ahci.ko       # Disk controller drivers            │
│  │       └── ...                                                │
│  ├── etc/                  # Configuration                      │
│  ├── init                  # First script to run                │
│  └── dev/, proc/, sys/     # Virtual filesystems                │
│                                                                  │
│  Flow:                                                           │
│  1. Kernel unpacks initramfs to RAM                             │
│  2. Runs /init script                                            │
│  3. init loads necessary modules                                 │
│  4. init mounts real root filesystem                            │
│  5. init does switch_root to real root                          │
│  6. Executes real /sbin/init (systemd)                          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Examining initramfs

# List contents
lsinitramfs /boot/initrd.img-$(uname -r)  # Debian/Ubuntu
lsinitrd /boot/initramfs-$(uname -r).img  # RHEL/Fedora

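# Extract and examine
mkdir -p /tmp/initrd
cd /tmp/initrd
# Works only if the image is a single gzip-compressed cpio archive
zcat /boot/initrd.img-$(uname -r) | cpio -idmv
# Debian/Ubuntu: handles prepended microcode and zstd/lz4/xz compression
unmkinitramfs /boot/initrd.img-$(uname -r) /tmp/initrd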

# Regenerate initramfs
update-initramfs -u  # Debian/Ubuntu
dracut --force       # RHEL/Fedora
mkinitcpio -P        # Arch Linux
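Whether `zcat` can unpack an image depends on how it was compressed, and many distributions no longer default to gzip. The sketch below guesses the format from the first bytes of the file. The function name `initramfs_format` is hypothetical, and only the first archive is examined: an image that starts with an uncompressed microcode cpio reports `cpio` even if a compressed archive follows it.

```shell
# initramfs_format FILE
# Guesses the container format from the leading magic bytes.
initramfs_format() {
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case $magic in
        1f8b*)         echo gzip ;;
        28b52ffd*)     echo zstd ;;
        fd377a585a00)  echo xz ;;
        02214c18*)     echo lz4 ;;       # legacy LZ4 frame
        303730373031)  echo cpio ;;      # ASCII "070701": uncompressed newc archive
        *)             echo unknown ;;
    esac
}

# Demo on a scratch file; on a real system pass the image under /boot.
img=$(mktemp)
printf 'demo' | gzip -c > "$img"
initramfs_format "$img"
rm -f "$img"
```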

Stage 4: Init System

Evolution of Init Systems

┌─────────────────────────────────────────────────────────────────┐
│                    INIT SYSTEM EVOLUTION                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  SysV Init (1983)                                                │
│  └── Upstart (2006, Ubuntu)                                     │
│      └── systemd (2010, most distros now)                       │
│                                                                  │
│  BSD-style init                                                  │
│  └── OpenRC (Gentoo, Alpine)                                    │
│                                                                  │
│  Other:                                                          │
│  • runit (Void Linux)                                           │
│  • s6 (minimal systems)                                         │
│  • launchd (macOS)                                              │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

SysV Init (Traditional)

# Runlevels (Red Hat convention; Debian treats 2-5 as the same multi-user level)
0 - Halt
1 - Single-user mode
2 - Multi-user without networking
3 - Multi-user with networking
4 - Unused/custom
5 - Multi-user with GUI
6 - Reboot

# Boot process
# 1. /sbin/init reads /etc/inittab
# 2. Runs /etc/rc.d/rc.sysinit (system initialization)
# 3. Runs scripts for target runlevel: /etc/rc.d/rc3.d/

# Script naming convention
/etc/rc.d/rc3.d/       # Debian uses /etc/rc3.d/
├── K01bluetooth   # K = Kill (stop), 01 = order
├── S01sysstat     # S = Start, 01 = order
├── S10network
└── S99local

# All are symlinks to /etc/init.d/servicename
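The naming convention determines the order directly: when entering a runlevel, the K scripts are called with `stop` and then the S scripts with `start`, each group in lexical order. The sketch below prints that sequence for a directory. The function name `rc_order` is hypothetical, and the demo builds a scratch directory with the file names from the listing above.

```shell
# rc_order DIR
# Prints the action SysV init would take for each script in DIR,
# in execution order: K* (stop) first, then S* (start).
rc_order() {
    dir=$1
    for script in "$dir"/K[0-9][0-9]* "$dir"/S[0-9][0-9]*; do
        [ -e "$script" ] || continue      # skip patterns that matched nothing
        name=${script##*/}
        case $name in
            K*) echo "stop  ${name#K[0-9][0-9]}" ;;
            S*) echo "start ${name#S[0-9][0-9]}" ;;
        esac
    done
}

# Demo with empty placeholder files instead of real init scripts.
dir=$(mktemp -d)
touch "$dir/K01bluetooth" "$dir/S01sysstat" "$dir/S10network" "$dir/S99local"
rc_order "$dir"
rm -rf "$dir"
```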

systemd (Modern)

┌─────────────────────────────────────────────────────────────────┐
│                    SYSTEMD ARCHITECTURE                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Core Components:                                                │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                      systemd (PID 1)                      │  │
│  │  • Service manager                                        │  │
│  │  • Socket/device activation                               │  │
│  │  • Parallelized boot                                      │  │
│  │  • Dependency management                                   │  │
│  └───────────────────────────────────────────────────────────┘  │
│                              │                                   │
│     ┌────────────────────────┼────────────────────────┐         │
│     │                        │                        │         │
│     ▼                        ▼                        ▼         │
│  ┌─────────┐           ┌──────────┐           ┌────────────┐   │
│  │journald │           │ logind   │           │ networkd   │   │
│  │(logging)│           │(sessions)│           │(networking)│   │
│  └─────────┘           └──────────┘           └────────────┘   │
│                                                                  │
│  Unit Types:                                                     │
│  • .service  - Daemons and processes                            │
│  • .socket   - Socket activation                                │
│  • .target   - Groups of units (like runlevels)                 │
│  • .mount    - Mount points                                     │
│  • .timer    - Scheduled tasks (like cron)                      │
│  • .path     - Path-based activation                            │
│  • .device   - Device units                                     │
│  • .slice    - Resource management (cgroups)                    │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

systemd Boot Targets

# Targets replace runlevels
poweroff.target    # Runlevel 0
rescue.target      # Runlevel 1
multi-user.target  # Runlevels 2, 3, 4
graphical.target   # Runlevel 5
reboot.target      # Runlevel 6

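# Show the default target (what the next boot will reach)
systemctl get-default

# List targets that are active right now
systemctl list-units --type=target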

# Change default target
systemctl set-default multi-user.target

# Switch target immediately
systemctl isolate rescue.target
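When porting old scripts that still think in runlevels, the mapping can be captured in a small helper. This is a sketch with a hypothetical function name, `runlevel_to_target`; it assumes systemd's standard mapping, where runlevels 2 through 4 all correspond to multi-user.target.

```shell
# runlevel_to_target N
# Prints the systemd target that corresponds to SysV runlevel N.
runlevel_to_target() {
    case $1 in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

for level in 0 1 3 5 6; do
    echo "runlevel $level -> $(runlevel_to_target "$level")"
done
```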

systemd Service Files

# /etc/systemd/system/myapp.service
[Unit]
Description=My Application
Documentation=https://example.com/docs
After=network.target postgresql.service
Requires=postgresql.service

[Service]
Type=simple
User=myapp
Group=myapp
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/server --config /etc/myapp/config.yaml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
StandardOutput=journal
StandardError=journal

# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes

[Install]
WantedBy=multi-user.target
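In the unit above, postgresql.service is listed under both `After=` and `Requires=`. The two settings are independent: `Requires=` pulls the dependency in, while `After=` controls ordering, so a unit named only in `Requires=` may start in parallel with the service that needs it. The sketch below flags that situation. The function name `requires_without_after` is hypothetical, and it only handles single-line `After=` and `Requires=` entries in one file (no drop-ins).

```shell
# requires_without_after UNIT_FILE
# Prints every unit named in Requires= that is not also named in After=.
requires_without_after() {
    awk -F= '
        $1 == "After"    { for (i = split($2, a, " "); i > 0; i--) after[a[i]] = 1 }
        $1 == "Requires" { for (i = split($2, r, " "); i > 0; i--) req[r[i]] = 1 }
        END { for (u in req) if (!(u in after)) print u }
    ' "$1"
}

# Demo: redis.service (an invented dependency) is required but not ordered.
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Unit]
Description=My Application
After=network.target postgresql.service
Requires=postgresql.service redis.service
EOF
requires_without_after "$unit"
rm -f "$unit"
```

For a full check of a real unit, `systemd-analyze verify` is the tool to reach for.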

systemd Commands

# Service management
systemctl start nginx           # Start service
systemctl stop nginx            # Stop service
systemctl restart nginx         # Restart
systemctl reload nginx          # Reload config
systemctl status nginx          # Show status
systemctl enable nginx          # Enable at boot
systemctl disable nginx         # Disable at boot
systemctl is-enabled nginx      # Check if enabled

# System commands
systemctl poweroff              # Shutdown
systemctl reboot                # Reboot
systemctl suspend               # Suspend
systemctl hibernate             # Hibernate

# Diagnostics
systemctl list-units            # List active units
systemctl list-unit-files       # List installed unit files and their state
systemctl list-dependencies     # Show dependencies
systemctl --failed              # Show failed units
systemd-analyze                 # Boot time analysis
systemd-analyze blame           # Time per service
systemd-analyze critical-chain  # Critical path

# Logs
journalctl -u nginx             # Logs for unit
journalctl -f                   # Follow logs
journalctl -b                   # Current boot
journalctl --since "1 hour ago" # Time filter

Boot Time Optimization

Analyzing Boot Time

# Overall boot time
$ systemd-analyze
Startup finished in 3.456s (firmware) + 1.234s (loader) + 
                   2.345s (kernel) + 5.678s (userspace) = 12.713s

# Per-service breakdown
$ systemd-analyze blame
          5.012s NetworkManager-wait-online.service
          2.345s docker.service
          1.234s snapd.service
          ...

# Critical path
$ systemd-analyze critical-chain
graphical.target @5.678s
└─multi-user.target @5.678s
  └─docker.service @3.333s +2.345s
    └─network.target @3.332s
      └─NetworkManager.service @2.123s +1.209s
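The blame output is easier to act on when filtered down to the units above a threshold. The sketch below does that for text in the format shown above. The function name `slow_units` is hypothetical, and it understands only the `ms`, `s` and `min` suffixes.

```shell
# slow_units SECONDS
# Reads `systemd-analyze blame` output on stdin and prints the units that
# took at least SECONDS to start, with their time converted to seconds.
slow_units() {
    awk -v limit="$1" '
        {
            secs = 0
            unit = $NF                    # last field is the unit name
            for (i = 1; i < NF; i++) {    # earlier fields are time parts
                if ($i ~ /ms$/)       secs += $i / 1000
                else if ($i ~ /min$/) secs += $i * 60
                else if ($i ~ /s$/)   secs += $i
            }
            if (secs >= limit) printf "%.3f %s\n", secs, unit
        }
    '
}

# Demo with the sample figures from above.
slow_units 2 <<'EOF'
          5.012s NetworkManager-wait-online.service
          2.345s docker.service
          1.234s snapd.service
EOF
```

On a live system: `systemd-analyze blame | slow_units 2`.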

Optimization Techniques

# 1. Disable unnecessary services
systemctl disable bluetooth
systemctl mask NetworkManager-wait-online.service

# 2. Use socket activation (defer startup)
# Service starts only when socket is accessed

# 3. Reduce kernel modules in initramfs
# /etc/initramfs-tools/modules - only needed modules

# 4. Use faster filesystem
# ext4 with fast_commit, or XFS

# 5. Parallel service startup (systemd default)
# Ensure proper After=/Requires= dependencies

# 6. Profile the boot as an SVG timeline
systemd-analyze plot > boot.svg
# (systemd-bootchart is a separate package on current distros)

Troubleshooting Boot Issues

Common Problems

┌─────────────────────────────────────────────────────────────────┐
│                    BOOT TROUBLESHOOTING                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Symptom                    Likely Cause                         │
│  ────────────────────────   ─────────────────────────────────   │
│  No POST                    Hardware (PSU, RAM, CPU)            │
│  BIOS but no bootloader     MBR/ESP corrupted, wrong boot order │
│  GRUB rescue prompt         Missing grub files                  │
│  Kernel panic               Missing initramfs, wrong root=      │
│  initramfs drops to shell   Can't find/mount root filesystem    │
│  systemd emergency mode     Failed mounts, bad fstab            │
│  Hangs at starting X        GPU driver issues                   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Recovery Techniques

# Boot to rescue mode
# In GRUB: edit entry, add 'single' or 'init=/bin/bash'

# Fix GRUB
# Boot from live USB
mount /dev/sda2 /mnt
mount /dev/sda1 /mnt/boot/efi  # If UEFI
for i in /dev /dev/pts /proc /sys /run; do
    mount -B "$i" "/mnt$i"
done
chroot /mnt
# BIOS: install the boot code to the disk (not a partition)
grub-install /dev/sda
# UEFI: run this INSTEAD of the command above
grub-install --target=x86_64-efi --efi-directory=/boot/efi
update-grub
exit
reboot

# Fix initramfs
chroot /mnt
update-initramfs -u -k all

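# Fix fstab (common boot hang cause)
# Boot with init=/bin/bash
mount -o remount,rw /
vim /etc/fstab  # Fix or comment out bad entries
sync
# bash is PID 1 here, so plain 'reboot' has no init to talk to;
# hand over to the real init and continue booting instead
exec /sbin/init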

# View boot logs after recovery
journalctl -b -1  # Previous boot
journalctl -xb     # Current boot with explanations
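A frequent cause of the emergency-mode case is an fstab entry that points at a device which no longer exists. The sketch below reports such entries before the next reboot. The function name `fstab_missing_devices` is hypothetical, and it checks only entries given as `/dev/...` paths; `UUID=` and `LABEL=` entries are skipped.

```shell
# fstab_missing_devices [FILE]
# Prints one line per fstab entry whose /dev/... device node is missing.
# FILE defaults to /etc/fstab.
fstab_missing_devices() {
    while read -r spec mountpoint rest; do
        case $spec in
            ''|'#'*) continue ;;          # blank line or comment
            /dev/*)  [ -e "$spec" ] || echo "$mountpoint: $spec not found" ;;
        esac
    done < "${1:-/etc/fstab}"
}

# Demo on a sample file; /dev/no-such-disk is an invented name.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# <device>        <mount>  <type>  <options>  <dump> <pass>
/dev/null         /        ext4    defaults   0      1
/dev/no-such-disk /data    ext4    defaults   0      2
UUID=1234-abcd    /boot    vfat    defaults   0      2
EOF
fstab_missing_devices "$sample"
rm -f "$sample"
```

Recent util-linux versions also ship `findmnt --verify`, which checks the whole file including UUID and LABEL entries.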

Interview Questions

Q: Walk through the Linux boot process from power-on to login.
Answer:
  1. PSU provides power, CPU reset vector loads firmware
  2. BIOS/UEFI POST tests hardware
  3. Firmware finds boot device, loads bootloader
  4. Bootloader (GRUB) loads kernel + initramfs
  5. Kernel initializes hardware, mounts initramfs
  6. initramfs /init finds and mounts real root
  7. systemd (PID 1) starts and brings up services
  8. Login prompt or display manager appears

Q: What are the differences between BIOS and UEFI?
Key differences:
  • Mode: BIOS is 16-bit real mode, UEFI is 32/64-bit
  • Partition: BIOS uses MBR (2TB limit), UEFI uses GPT
  • Boot code: BIOS in MBR (446 bytes), UEFI in ESP (FAT32)
  • Security: UEFI has Secure Boot
  • Speed: UEFI boots faster
  • Interface: UEFI has graphical setup, network boot

Q: What is initramfs and why is it needed?
Answer: The kernel needs drivers to mount the root filesystem, but those drivers might be on the root filesystem (chicken-and-egg problem).
initramfs is a temporary root filesystem in RAM containing:
  • Essential kernel modules (disk, filesystem drivers)
  • Tools to find and mount the real root
  • Support for LVM, RAID, encryption
After mounting the real root, initramfs switches to it.

Q: What does systemd improve over SysV init?
Improvements:
  • Parallel startup: Dependencies allow concurrent start
  • Socket activation: Services start on-demand
  • Unified logging: journald replaces multiple log files
  • cgroups: Resource management per service
  • Declarative: Unit files vs shell scripts
  • Dependency management: Automatic ordering
  • Transactional: Start/stop is all-or-nothing

Summary

┌─────────────────────────────────────────────────────────────────┐
│                    BOOT PROCESS SUMMARY                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Stage       │ Component        │ Key Files                     │
│  ───────────────────────────────────────────────────────────    │
│  Firmware    │ BIOS/UEFI        │ ROM, NVRAM                    │
│  Bootloader  │ GRUB             │ /boot/grub/grub.cfg           │
│  Kernel      │ Linux            │ /boot/vmlinuz-*               │
│  initramfs   │ Initial root     │ /boot/initrd.img-*            │
│  Init        │ systemd          │ /etc/systemd/system/*         │
│                                                                  │
│  Key Commands:                                                   │
│  • systemd-analyze      - Boot time analysis                    │
│  • journalctl -b        - Boot logs                             │
│  • update-grub          - Regenerate GRUB config                │
│  • update-initramfs -u  - Regenerate initramfs                  │
│  • systemctl            - Manage services                       │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘