Linux Internals Deep Dive
If you love understanding how things actually work, this chapter is for you. If you just want to run commands and get things done, feel free to skip ahead. No judgment.
This chapter takes you beneath the surface of Linux. We will explore how the kernel manages processes, understand how system calls bridge user space and kernel space, and demystify the virtual filesystem. This knowledge is what transforms a Linux user into a Linux engineer.
Why Internals Matter
Understanding Linux internals helps you:
- Debug performance issues when top and htop are not enough
- Write better software that works with the kernel, not against it
- Ace interviews where internals questions are common
- Understand containers since Docker relies on kernel features
- Troubleshoot production systems at a deeper level
User Space vs Kernel Space
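The most fundamental concept: Linux divides memory into two distinct spaces, user space where applications run and kernel space where the kernel itself runs. This separation provides:
- Security: Buggy user programs cannot crash the kernel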
- Stability: One process cannot corrupt another
- Abstraction: Applications do not need to know hardware details
System Calls: The Bridge
When a user program needs kernel services (read file, open network connection, create process), it makes a system call.
Anatomy of a System Call
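As a rough illustration, Python's os module exposes thin wrappers around several of these calls, so you can watch user-space code ask the kernel to open, write, read, and close a file. This is a minimal sketch; the helper name roundtrip and the scratch file name are made up for the example.

```python
import os
import tempfile


def roundtrip(message: bytes) -> bytes:
    """Write message to a scratch file and read it back via raw descriptors."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "syscall_demo.txt")  # hypothetical file name

    # os.open asks the kernel to open the file and returns an integer
    # file descriptor, the handle used by every later call.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, message)  # write system call
    finally:
        os.close(fd)  # close system call

    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, len(message))  # read system call
    finally:
        os.close(fd)

    os.unlink(path)
    os.rmdir(directory)
    return data


if __name__ == "__main__":
    print(roundtrip(b"hello from user space\n"))
```

Each os.* call here crosses from user space into the kernel and back, which is exactly the bridge this section describes.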
Common System Calls
| Category | System Calls | Purpose |
|---|---|---|
| Process | fork, exec, exit, wait | Create and manage processes |
| File | open, read, write, close | File operations |
| Network | socket, bind, listen, accept | Network operations |
| Memory | mmap, brk, mprotect | Memory management |
| IPC | pipe, shmget, semget | Inter-process communication |
Tracing System Calls
Process Management
What is a Process?
A process is a running program. It includes:
- Code: The program instructions
- Data: Variables and heap
- Stack: Function calls and local variables
- Registers: CPU state
- File descriptors: Open files, sockets
- Memory mappings: Virtual memory layout
Process Control Block (PCB)
The kernel maintains a task_struct for each process:
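The task_struct itself is only visible inside the kernel, but /proc/<pid>/status exposes a text view of some of the information it tracks (name, state, PID, memory figures). Here is a small sketch that parses that file; the function names are invented for the example.

```python
def parse_status(text: str) -> dict:
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_own_status() -> dict:
    """Parse the status file of the calling process."""
    with open("/proc/self/status") as f:
        return parse_status(f.read())


if __name__ == "__main__":
    status = read_own_status()
    for key in ("Name", "State", "Pid", "PPid", "Threads"):
        print(f"{key:<8} {status.get(key, '?')}")
```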
Process States
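Every process is in exactly one state at a time, and the kernel reports it as a single letter in /proc/<pid>/stat (the same letter ps and top show). The sketch below extracts that letter and maps it to a description. The descriptions are paraphrased from the proc man page, and the helper name is made up.

```python
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "T": "stopped by a signal",
    "t": "stopped by a tracer",
    "Z": "zombie (exited, not yet reaped)",
    "X": "dead",
    "I": "idle kernel thread",
}


def state_from_stat(stat_line: str) -> str:
    """Extract the one-letter state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the LAST closing parenthesis.
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


if __name__ == "__main__":
    with open("/proc/self/stat") as f:
        letter = state_from_stat(f.read())
    print(letter, STATE_NAMES.get(letter, "unknown"))
```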
The Scheduler
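The core idea behind fair scheduling is easy to sketch: track a virtual runtime per task, always run the task with the smallest one, and let heavier-weighted tasks accumulate virtual runtime more slowly. The toy simulation below shows only that idea. It is not the kernel's algorithm (which uses a red-black tree and handles sleeping tasks, groups, and much more), and the weights in the usage example are illustrative rather than real nice-level weights.

```python
import heapq


def simulate(weights: dict, ticks: int, slice_ns: int = 1_000_000) -> dict:
    """Toy fair scheduler: always run the task with the smallest vruntime.

    weights maps a task name to its load weight (1024 is the weight the
    kernel gives a nice-0 task). Returns CPU time received per task in ns.
    """
    nice_0_weight = 1024
    # Heap entries are (vruntime, name); ties go to the smaller name.
    queue = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(queue)
    cpu_time = {name: 0 for name in weights}

    for _ in range(ticks):
        vruntime, name = heapq.heappop(queue)
        cpu_time[name] += slice_ns
        # Heavier tasks accumulate vruntime more slowly, so they are
        # picked more often and receive a larger share of the CPU.
        vruntime += slice_ns * nice_0_weight / weights[name]
        heapq.heappush(queue, (vruntime, name))
    return cpu_time


if __name__ == "__main__":
    shares = simulate({"editor": 1024, "compiler": 2048}, ticks=300)
    for name, ns in shares.items():
        print(f"{name:<10} {ns / 1e6:.0f} ms")
```

With one task weighted twice as heavily as the other, the heavier task ends up with twice the CPU time, which is the proportional fairness the scheduler aims for.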
Linux uses the Completely Fair Scheduler (CFS) for normal processes (kernels 6.6 and later replace it with the EEVDF scheduler, which keeps the same fair-share goal):
Real-Time Scheduling
For time-critical tasks, Linux provides real-time scheduling policies alongside the default one:
| Policy | Description |
|---|---|
| SCHED_FIFO | First-in, first-out, runs until it blocks or yields |
| SCHED_RR | Round-robin, time-sliced FIFO |
| SCHED_DEADLINE | Earliest deadline first (added in Linux 3.14) |
| SCHED_OTHER | Default time-sharing policy (not real-time), handled by CFS |
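You can ask the kernel which policy a process is running under. A minimal sketch using Python's os module follows; note that os does not define a constant for SCHED_DEADLINE, so the lookup table covers only the other three policies, and the helper names are invented.

```python
import os

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
}


def describe_current_policy() -> str:
    """Name the scheduling policy of the calling process."""
    policy = os.sched_getscheduler(0)  # pid 0 means "this process"
    return POLICY_NAMES.get(policy, f"policy {policy}")


def priority_range(policy: int) -> tuple:
    """Return the (min, max) static priority the kernel allows for a policy."""
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


if __name__ == "__main__":
    print("current policy:", describe_current_policy())
    for policy, name in POLICY_NAMES.items():
        print(f"{name:<12} priority range {priority_range(policy)}")
```

Switching a process to a real-time policy (os.sched_setscheduler) normally requires elevated privileges, so the sketch only reads.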
Memory Management
Virtual Memory
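A process can inspect the layout of its own address space through /proc/self/maps, where each line describes one mapped region. The sketch below parses such a line; the function name and dictionary keys are made up for the example.

```python
def parse_maps_line(line: str) -> dict:
    """Parse one line of /proc/<pid>/maps into its address range and label."""
    parts = line.split(maxsplit=5)
    start_hex, end_hex = parts[0].split("-")
    start, end = int(start_hex, 16), int(end_hex, 16)
    return {
        "start": start,
        "end": end,
        "size": end - start,
        "perms": parts[1],
        # Anonymous mappings have no sixth field.
        "name": parts[5].strip() if len(parts) > 5 else "",
    }


if __name__ == "__main__":
    with open("/proc/self/maps") as f:
        for line in f:
            region = parse_maps_line(line)
            if region["name"] in ("[heap]", "[stack]"):
                print(f"{region['name']:<8} {region['size'] // 1024} KiB "
                      f"{region['perms']}")
```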
Every process gets its own virtual address space:
Page Tables
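To make the translation concrete, here is a sketch of how a virtual address splits into table indices, assuming x86-64 with 4-level paging and 4 KiB pages (9 bits of index per level, 12 bits of page offset). Other architectures and 5-level paging use different splits. The level names follow the kernel's pgd/pud/pmd/pte terminology.

```python
PAGE_SHIFT = 12        # 4 KiB pages: low 12 bits are the offset in the page
BITS_PER_LEVEL = 9     # 512 entries per table
LEVELS = ("pgd", "pud", "pmd", "pte")  # top-level table first


def split_virtual_address(addr: int) -> dict:
    """Split an address into per-level table indices and a page offset."""
    parts = {"offset": addr & ((1 << PAGE_SHIFT) - 1)}
    index_mask = (1 << BITS_PER_LEVEL) - 1
    # Walk from the lowest level (pte) up to the top level (pgd).
    for depth, level in enumerate(reversed(LEVELS)):
        shift = PAGE_SHIFT + depth * BITS_PER_LEVEL
        parts[level] = (addr >> shift) & index_mask
    return parts


if __name__ == "__main__":
    # Build an address from known indices, then take it apart again.
    addr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(hex(addr), split_virtual_address(addr))
```

The hardware uses each index to pick an entry in the corresponding table, and the final entry supplies the physical page to which the offset is added.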
Virtual addresses translate to physical addresses via page tables:
The Page Cache
Linux aggressively caches file data in RAM:
- Read a file? It stays in cache for future reads
- Write a file? Goes to cache first, flushed to disk later
- Running low on memory? Clean cache pages are evicted first, since they can be re-read from disk
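Because writes land in the page cache first, a program that needs its data on disk before continuing has to ask for that explicitly. A minimal sketch, with a made-up helper name, of forcing the flush with fsync:

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and block until the kernel has pushed it to storage."""
    with open(path, "wb") as f:
        f.write(data)          # sits in Python's own buffer
        f.flush()              # write syscall: data is now in the page cache
        os.fsync(f.fileno())   # fsync syscall: force the cached pages to disk


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, "important.dat")  # hypothetical name
        durable_write(target, b"must survive a power cut\n")
        print("written and synced:", os.path.getsize(target), "bytes")
```

Without the fsync call the write still succeeds, but the data may stay in RAM for a while before the kernel writes it back.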
Memory Allocation
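One way to see that allocation and physical memory are separate steps is to map a region and watch the resident set size before and after touching it. This is a rough sketch: it reads /proc/self/statm, the function names are invented, and the exact numbers vary by system.

```python
import mmap


def resident_pages() -> int:
    """Resident set size of this process in pages, from /proc/self/statm."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1])


def demand_paging_demo(size: int = 32 * 1024 * 1024) -> tuple:
    """Return resident pages (before mapping, after mapping, after touching)."""
    before = resident_pages()
    region = mmap.mmap(-1, size)       # anonymous mapping: address space only
    mapped = resident_pages()
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1             # first write faults a physical page in
    touched = resident_pages()
    region.close()
    return before, mapped, touched


if __name__ == "__main__":
    before, mapped, touched = demand_paging_demo()
    print(f"before map:  {before} pages")
    print(f"after map:   {mapped} pages")
    print(f"after touch: {touched} pages")
```

Mapping the region barely changes the resident count; the growth shows up only once the pages are actually written.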
When a process requests memory:
The Virtual Filesystem (VFS)
Linux abstracts all filesystems through a common interface.
VFS Architecture
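A quick way to see how many different filesystems sit behind that one interface is to count the types currently mounted. The sketch below parses the format of /proc/self/mounts (device, mount point, type, options); the function name is made up.

```python
from collections import Counter


def filesystem_types(mounts_text: str) -> Counter:
    """Count mounted filesystems by type from /proc/self/mounts content."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1  # third field is the filesystem type
    return counts


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        for fs_type, count in filesystem_types(f.read()).most_common():
            print(f"{fs_type:<12} {count}")
```

Disk-backed, network, and purely virtual filesystems all show up in the same list, and programs reach every one of them through the same open, read, and write calls.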
Key VFS Concepts
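Hard links make the split between a file's inode and its name easy to observe: two names in a directory can point at one inode. A small sketch using a temporary directory; the file names and function name are hypothetical.

```python
import os
import tempfile


def hard_link_demo() -> tuple:
    """Create a file plus a hard link; return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "original.txt")
        alias = os.path.join(directory, "alias.txt")
        with open(original, "w") as f:
            f.write("same data, two names\n")
        os.link(original, alias)  # new directory entry, same inode

        a, b = os.stat(original), os.stat(alias)
        return a.st_ino == b.st_ino, a.st_nlink


if __name__ == "__main__":
    same_inode, links = hard_link_demo()
    print("same inode:", same_inode, "| link count:", links)
```

Both names report the same inode number, and the inode's link count records how many names refer to it.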
Inode: Metadata about a file (permissions, size, timestamps, block pointers). Does NOT contain the filename; names live in directory entries that point to the inode.
Everything is a File
This Unix philosophy extends to:
| Path | What It Is |
|---|---|
| /dev/sda | Block device (hard drive) |
| /dev/null | Bit bucket (discards writes) |
| /dev/random | Random number generator |
| /proc/cpuinfo | CPU information (kernel data) |
| /sys/class/net | Network interface info |
| /dev/stdin | Standard input |
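Since all of these are reachable as paths, the ordinary stat call can tell you what kind of object sits behind each one. A minimal sketch with an invented helper name:

```python
import os
import stat


def file_kind(path: str) -> str:
    """Classify a path by the file type bits that stat() reports."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    return "regular file"


if __name__ == "__main__":
    for path in ("/dev/null", "/dev/random", "/proc/cpuinfo", "/sys/class/net"):
        if os.path.exists(path):
            print(f"{path:<16} {file_kind(path)}")
    # The same write call works on a device node as on a regular file.
    with open("/dev/null", "wb") as sink:
        sink.write(b"discarded")
```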
Networking Internals
The Network Stack
Socket Buffers
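The kernel's buffer structures are not visible from user space, but each socket reports how much memory it may use for queued data. The sketch below reads those limits from a connected socket pair and sends a few bytes through it; the function name is made up, and the sizes you see depend on system settings.

```python
import socket


def buffer_sizes() -> tuple:
    """Return (send_limit, receive_limit, echoed_bytes) for a socket pair."""
    left, right = socket.socketpair()
    try:
        send_limit = left.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv_limit = right.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        left.sendall(b"ping")   # queued inside the kernel until it is read
        echoed = right.recv(4)
        return send_limit, recv_limit, echoed
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    send_limit, recv_limit, echoed = buffer_sizes()
    print(f"send buffer limit:    {send_limit} bytes")
    print(f"receive buffer limit: {recv_limit} bytes")
    print("received:", echoed)
```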
Data flows through socket buffers:
Netfilter and iptables
Netfilter provides hooks for packet processing:
Interview Deep Dive Questions
What happens when you run a program?
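Answer: 1) Shell calls fork() to create a child process, 2) Child calls exec() to load the program, 3) Kernel loads the ELF binary and sets up memory mappings (for a dynamically linked binary it also maps the dynamic linker, which runs first and loads shared libraries), 4) Kernel sets up the stack with argc, argv, and the environment, 5) Control transfers to the program entry point (usually _start, supplied by the C runtime startup code), 6) _start hands off to libc initialization (__libc_start_main() in glibc), which calls main(), 7) Program runs, 8) exit() is called, the kernel cleans up resources, and the parent collects the exit status with wait().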
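The user-space half of that sequence (fork, exec, wait) fits in a few lines. This is a simplified sketch of what a shell does, without redirection, job control, or signal handling; the function name run is made up.

```python
import os
import sys


def run(argv: list) -> int:
    """Run a program roughly the way a shell does; return its exit status."""
    pid = os.fork()                    # duplicate the current process
    if pid == 0:
        try:
            os.execvp(argv[0], argv)   # replace the child's program image
        finally:
            os._exit(127)              # only reached if exec failed
    _, status = os.waitpid(pid, 0)     # parent blocks until the child exits
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # negative value: killed by a signal


if __name__ == "__main__":
    code = run([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with", code)
```

Exit status 127 for a failed exec follows the usual shell convention for "command not found".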
Explain the difference between processes and threads
Answer: Processes have separate address spaces, file descriptor tables, and resources. Threads share the address space, heap, and file descriptors within a process, but have separate stacks and registers. In Linux, both are represented by a task_struct: threads share one mm_struct (memory) while processes have separate ones. Threads are cheaper to create (no new address space to set up), but a memory bug in one thread can corrupt the whole process.
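The difference in memory sharing is easy to demonstrate: a thread's write to a shared variable is visible to its creator, while a forked child modifies only its own copy. A small sketch with invented names:

```python
import os
import threading

counter = {"value": 0}


def bump() -> None:
    counter["value"] += 1


def thread_sees_shared_memory() -> int:
    """A thread shares the address space, so its update is visible here."""
    counter["value"] = 0
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]


def child_process_gets_a_copy() -> int:
    """A forked child has its own address space; the parent sees no change."""
    counter["value"] = 0
    pid = os.fork()
    if pid == 0:
        bump()          # modifies the child's private copy only
        os._exit(0)
    os.waitpid(pid, 0)
    return counter["value"]


if __name__ == "__main__":
    print("after thread:", thread_sees_shared_memory())
    print("after fork:  ", child_process_gets_a_copy())
```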
What is a context switch?
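Answer: When the kernel switches the CPU from one task to another: 1) Save the current task's registers (on its kernel stack and in task_struct), 2) Select the next task to run (scheduler), 3) If the next task belongs to a different process, switch the MMU to that process's page tables, 4) Restore the new task's registers, 5) Resume the new task at its saved instruction pointer. Context switches are expensive: beyond the direct cost, CPU caches go cold and TLB entries may need to be flushed (less so on CPUs that tag TLB entries per address space). Switching between threads of the same process skips the page-table switch.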
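The kernel counts context switches per process, split into voluntary (the process blocked or yielded) and involuntary (the scheduler preempted it). A rough sketch that reads those counters through getrusage follows; the helper names are made up, and the counts you see will vary from run to run.

```python
import resource
import time


def context_switches() -> tuple:
    """Return (voluntary, involuntary) context switches for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def voluntary_switches_during_sleep(naps: int = 5) -> int:
    """Count voluntary switches recorded while sleeping a few times."""
    before, _ = context_switches()
    for _ in range(naps):
        time.sleep(0.01)  # blocking gives up the CPU voluntarily
    after, _ = context_switches()
    return after - before


if __name__ == "__main__":
    print("voluntary, involuntary so far:", context_switches())
    print("voluntary switches while sleeping:", voluntary_switches_during_sleep())
```

The same counters appear as voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/<pid>/status.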
How does Linux handle memory overcommit?
Answer: By default, Linux allows processes to allocate more virtual memory than there is physical RAM (overcommit). Actual physical pages are allocated on first access (demand paging). If the system runs out of memory, the OOM killer selects and kills processes. Controlled by vm.overcommit_memory: 0=heuristic, 1=always allow, 2=strict accounting against a commit limit (swap plus vm.overcommit_ratio percent of RAM).
Explain the purpose of /proc and /sys
Answer: Both are virtual filesystems - no actual disk storage. /proc exposes kernel data structures and process info (originated in Unix). /sys is newer, provides structured hardware/driver info (Linux 2.6+). /proc has accumulated cruft, /sys is more organized. Examples: /proc/meminfo, /proc/1234/status, /sys/class/net/eth0/address.
What is the OOM killer and how does it work?
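Answer: The Out-of-Memory killer is invoked when the system is critically low on memory and reclaim cannot free enough. It computes an oom_score for each process; in modern kernels this is based mainly on how much memory the process uses (resident pages, swap, and page tables), adjusted by oom_score_adj, while older kernels also weighed runtime, nice value, and privilege. The highest score gets killed first. oom_score_adj (-1000 to 1000) can be set in /proc/<pid>/oom_score_adj. A value of -1000 exempts the process from the OOM killer (risky).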
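Both values are plain files under /proc, so checking how exposed a process is takes only a couple of reads. A minimal sketch with a made-up function name:

```python
def read_oom_values(pid="self") -> tuple:
    """Return (oom_score, oom_score_adj) for a process."""
    base = f"/proc/{pid}"
    with open(f"{base}/oom_score") as f:
        score = int(f.read())
    with open(f"{base}/oom_score_adj") as f:
        adjustment = int(f.read())
    return score, adjustment


if __name__ == "__main__":
    score, adjustment = read_oom_values()
    print(f"oom_score={score} oom_score_adj={adjustment}")
```

Lowering oom_score_adj for another user's process, or below its current value for your own, generally requires elevated privileges, so the sketch only reads.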
Exploring Internals Yourself
Key Takeaways
- User space and kernel space are separated - for security and stability
- System calls are the bridge - only way to request kernel services
- Everything is a file - devices, processes, kernel data exposed as files
- Virtual memory provides isolation - each process has its own address space
- CFS scheduler ensures fairness - virtual runtime tracks CPU usage
- Page cache makes I/O fast - files cached in RAM automatically
- VFS abstracts filesystems - same interface for ext4, NFS, procfs
- Namespaces and cgroups enable containers - isolation and resource limits
Ready to master the command line? Next up: Linux Permissions where we will dive deep into users, groups, and access control.