Virtual File System (VFS)
0. Files, Inodes, and Paths: The Basics
What is a File?
What is an Inode?
What is a Directory?
Hard Links and Soft Links
Path Resolution
1. The VFS Architecture
The Four Primary Objects
2. Path Lookup: The “Walk”
The Lookup Process
Optimization: RCU-Walk
3. On-Disk Layout & Allocation
Block Allocation
Journaling: Atomic Operations
4. Log-Structured File Systems (LFS)
5. Page Cache & Writeback
6. Virtual Filesystems (procfs, sysfs)
7. Choosing a Filesystem in Production
Decision Flowchart
Comparison Table
Production Checklist
Quick Commands
Summary for Senior Engineers

The File System is the primary abstraction for long-term storage. A “Senior” understanding requires knowing how the kernel provides a unified interface across dozens of filesystem types (on-disk, network, and in-memory) and how it optimizes the expensive process of finding a file on disk.

0. Files, Inodes, and Paths: The Basics

Before diving into VFS internals, make sure these fundamentals are crystal clear.

What is a File?

A file is an abstraction the OS provides over raw disk blocks. To the user, a file is:

A name (like report.txt).
A sequence of bytes (the content).
Some metadata (size, permissions, timestamps).

To the kernel, a file is its inode, not its name; the name is just a directory entry that points to the inode.

What is an Inode?

An Inode (Index Node) is the kernel’s internal representation of a file. It contains:

Metadata: size, owner (UID/GID), permissions, timestamps (atime, mtime, ctime).
Block pointers: Where on the disk the file’s data actually lives.
No filename: The inode does not store the file’s name.
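On a live system, GNU `stat` prints these inode fields directly. A minimal sketch, assuming GNU coreutils and a throwaway file created with `mktemp`. None of the format directives reads a name out of the inode, because there is none: `%n` only echoes the path you passed in.

```shell
# Inspect the metadata kept in a file's inode (assumes GNU coreutils stat).
dir=$(mktemp -d)
printf 'hello\n' > "$dir/report.txt"

# %i inode number, %h hard-link count, %s size in bytes,
# %u/%g owner UID/GID, %a permission bits in octal
stat -c 'inode=%i links=%h size=%s uid=%u gid=%g mode=%a' "$dir/report.txt"

# The three timestamps: last access, last data modification, last status (inode) change
stat -c 'atime=%x' "$dir/report.txt"
stat -c 'mtime=%y' "$dir/report.txt"
stat -c 'ctime=%z' "$dir/report.txt"

size=$(stat -c %s "$dir/report.txt")    # 6: "hello" plus the newline
links=$(stat -c %h "$dir/report.txt")   # 1: a single name refers to this inode
rm -rf "$dir"
```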

Think of an inode as a library catalog card: it tells you everything about the book (author, subject, location) but not what the book is called on the shelf.

What is a Directory?

A directory is a special file that maps names → inode numbers. When you ls a directory, the kernel:

Reads the directory’s data blocks.
Lists all (name, inode) pairs inside.

This is why renaming a file within the same filesystem is instant: only directory entries change; the file’s data blocks are neither copied nor moved, and it keeps the same inode number.
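You can watch this from the shell: `ls -i` prints the (inode, name) pairs a directory holds, and a rename within one filesystem leaves the inode number untouched. A minimal sketch using a temporary directory and GNU `stat`:

```shell
# Directories map names to inode numbers; rename(2) only rewrites those entries.
dir=$(mktemp -d)
printf 'data\n' > "$dir/old.txt"
before=$(stat -c %i "$dir/old.txt")

ls -i "$dir"                      # "<inode> old.txt": the directory's (inode, name) pair

mv "$dir/old.txt" "$dir/new.txt"  # same filesystem, so mv uses rename(2); no data is copied
after=$(stat -c %i "$dir/new.txt")

ls -i "$dir"                      # same inode number, new name
echo "before=$before after=$after"
rm -rf "$dir"
```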

Hard Links and Soft Links

Hard link: Another name pointing to the same inode. Both names are equal; deleting one doesn’t free the data until the link count reaches zero and no process still has the file open.
Soft link (symlink): A file whose content is the path to another file. It can break if the target is deleted.
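The difference shows up in the link count and in what survives a delete. This sketch (temporary files, GNU `stat` assumed) also shows that an open descriptor keeps the data readable after the last name is removed, since the inode is released only when both its link count and its open count drop to zero.

```shell
dir=$(mktemp -d)
printf 'payload\n' > "$dir/a"

ln "$dir/a" "$dir/b"          # hard link: a second name for the same inode
ln -s "$dir/a" "$dir/sym"     # symlink: a separate inode whose content is the path

links=$(stat -c %h "$dir/a")  # 2 names now refer to the inode
same_inode=no
[ "$(stat -c %i "$dir/a")" = "$(stat -c %i "$dir/b")" ] && same_inode=yes
target=$(readlink "$dir/sym") # the path text stored in the symlink

rm "$dir/a"                   # removes one name; the data is still reachable via b
survivor=$(cat "$dir/b")
sym_state=ok
[ -e "$dir/sym" ] || sym_state=dangling   # -e follows the link, which now points nowhere

exec 3< "$dir/b"              # hold the file open...
rm "$dir/b"                   # ...then delete its last name (link count 0)
still_open=$(cat <&3)         # still readable: freed only after the descriptor closes
exec 3<&-

echo "links=$links same_inode=$same_inode target=$target survivor=$survivor sym=$sym_state open=$still_open"
rm -rf "$dir"
```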

Path Resolution

When you open("/home/user/file.txt", ...):

Kernel starts at the root inode (/); a relative path starts at the process’s current working directory instead.
Looks up home in /’s directory → gets inode for /home.
Looks up user in /home → gets inode for /home/user.
Looks up file.txt in /home/user → gets inode for the file.
Creates an open file object for that inode and returns a file descriptor referring to it.

This is the path walk, and optimizing it (via the dentry cache) is one of the kernel’s most critical jobs.
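The walk can be imitated from user space by running `stat` on each successively longer prefix. This is only an illustration: every `stat` call makes the kernel redo the walk for its own prefix, so it shows which inodes are visited, not the kernel’s code path. The `walk_path` helper is hypothetical; util-linux’s `namei` utility prints a similar per-component view.

```shell
# Hypothetical helper: print the inode reached after each component of an absolute path.
walk_path() {
  cur=/
  echo "/ -> inode $(stat -c %i /)"
  saved_ifs=$IFS
  IFS=/                 # split the path on '/'
  set -f                # don't glob-expand components such as '*'
  for comp in $1; do
    [ -n "$comp" ] || continue            # skip the empty field before the leading '/'
    if [ "$cur" = / ]; then cur="/$comp"; else cur="$cur/$comp"; fi
    echo "$comp -> inode $(stat -c %i "$cur")"   # stat fails if a component is missing
  done
  set +f
  IFS=$saved_ifs
}

walk_path /usr/bin
```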

1. The VFS Architecture

The Virtual File System (VFS) is a software layer in the kernel that provides the standard file-related system calls (open, read, write) to user-space, regardless of the underlying filesystem (EXT4, XFS, NFS, ProcFS).

The Four Primary Objects

Superblock: Represents an entire mounted filesystem. It contains metadata like the block size, total number of inodes, and the “magic number.”
Inode (Index Node): Represents a specific file (or directory) on disk. It contains all metadata (size, permissions, timestamps, block pointers) but not the filename.
Dentry (Directory Entry): Connects an Inode to a filename. Dentries are transient objects created in memory to speed up path lookups.
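File: Represents an open file, created by each successful open(). It stores the current “cursor” position (f_pos) and the access mode (Read/Write). File descriptors in several processes can share one File object (for example after fork() or dup()), and with it the position.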
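Each of these objects leaves a trace visible from user space. A sketch assuming GNU coreutils and a Linux /proc: `stat -f` reports superblock-level facts, plain `stat` reports the inode, and `/proc/<pid>/fdinfo/<fd>` exposes a File object’s position. The two `dd` runs share one File object through descriptor 3, while a second `open()` on descriptor 4 gets its own.

```shell
dir=$(mktemp -d)
printf '0123456789' > "$dir/data"

# Superblock: filesystem-wide facts (type, block size, inode totals)
stat -f -c 'type=%T block_size=%S total_inodes=%c free_inodes=%d' "$dir"
block_size=$(stat -f -c %S "$dir")

# Inode: per-file metadata
stat -c 'inode=%i size=%s' "$dir/data"

# File: one open() creates one File object with its own position
exec 3< "$dir/data"
first=$(dd bs=1 count=3 <&3 2>/dev/null)    # "012"; the subshell inherits fd 3
second=$(dd bs=1 count=3 <&3 2>/dev/null)   # "345": same File object, shared f_pos
pos=$(awk '$1 == "pos:" { print $2 }' /proc/$$/fdinfo/3)   # offset is now 6

exec 4< "$dir/data"                          # a second open() means a second File object
other=$(dd bs=1 count=3 <&4 2>/dev/null)     # "012": independent position

exec 3<&- 4<&-
echo "first=$first second=$second pos=$pos other=$other"
rm -rf "$dir"
```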

2. Path Lookup: The “Walk”

Converting a string like /home/user/code/main.c into an Inode is one of the most performance-critical paths in the kernel.

The Lookup Process

Split the path: Divide by /.
Dentry Cache (dcache) Lookup: The kernel first checks the memory-resident Dcache. If the dentry for home is found, it moves to the next part.
Directory Traversal: If not in cache, the kernel must read the directory’s data blocks from disk to find the mapping from the filename to an Inode number.
Inode Cache: Once the Inode number is found, the kernel checks the Inode cache or reads it from the Inode table on disk.
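The dcache’s size can be read from /proc/sys/fs/dentry-state. A minimal sketch, assuming a Linux /proc; the first fields are nr_dentry and nr_unused (unused dentries kept for reuse) per the kernel’s sysctl documentation. The counts are system-wide, so the change after creating files also reflects whatever else the machine is doing.

```shell
# Fields: nr_dentry nr_unused age_limit want_pages [nr_negative dummy]
read -r nr_dentry nr_unused rest < /proc/sys/fs/dentry-state
echo "dentries cached: $nr_dentry (unused, kept for reuse: $nr_unused)"

# Create 500 new names; each one gets a dentry in the cache
dir=$(mktemp -d)
i=0
while [ "$i" -lt 500 ]; do
  : > "$dir/f$i"
  i=$((i + 1))
done

read -r after rest < /proc/sys/fs/dentry-state
echo "dentries cached after creating 500 files: $after"
rm -rf "$dir"
```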

Optimization: RCU-Walk

To handle thousands of concurrent lookups (e.g., on a web server), modern Linux uses RCU-walk. It walks the dcache without taking locks or reference counts, relying on RCU and sequence counters to detect a concurrent rename or unlink. If a change is detected, or a step would have to block (for example, a dcache miss that needs disk I/O), it falls back to the slower “ref-walk”, which takes references and locks.

3. On-Disk Layout & Allocation

How data is actually stored on the physical platters or NAND cells.

Block Allocation

Extents: Instead of a list of block numbers (which is inefficient for large files), modern filesystems (EXT4, XFS) use Extents, each a starting block number plus a length (e.g., start 1000, length 1000, covering blocks 1000 to 1999).
Delayed Allocation: The kernel buffers writes in memory and waits as long as possible before choosing physical blocks on disk. Knowing how much data is pending lets the allocator place it in as few contiguous extents as free space allows (ideally one for the whole file), reducing fragmentation.
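To make the extent representation concrete, this sketch collapses a sorted list of block numbers into (start, length) pairs. The `to_extents` helper is hypothetical and only models the idea; on a real ext4 or XFS file, `filefrag -v` from e2fsprogs lists the actual extents.

```shell
# Hypothetical helper: read ascending block numbers (one per line) and
# print one "start length" line per contiguous run, i.e. per extent.
to_extents() {
  awk '
    NR == 1 { start = $1; prev = $1; next }           # first block opens a run
    $1 == prev + 1 { prev = $1; next }                # contiguous: extend the run
    { print start, prev - start + 1; start = $1; prev = $1 }   # gap: emit run, start new one
    END { if (NR > 0) print start, prev - start + 1 } # emit the final run
  '
}

# 1,000 contiguous blocks plus a 3-block fragment become two extents
{ seq 1000 1999; seq 5000 5002; } | to_extents    # prints "1000 1000" then "5000 3"
```

Storing 1,003 block numbers shrinks to two pairs, which is why extent-based metadata stays small for large, mostly contiguous files.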

Journaling: Atomic Operations

Appending to a file involves several separate disk writes: updating the free-space bitmap, the Inode, and the data blocks. If power is lost midway, the filesystem can be left inconsistent (e.g., a block marked as used that no Inode references).

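The Journal: A dedicated circular log on disk. The kernel first writes the intended changes to the journal, followed by a commit record. Only after the commit is durable does it write the changes to their final locations (checkpointing). After a crash, mounting the filesystem “replays” every committed transaction in the journal; transactions without a commit record are discarded, so no update is left half-applied.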
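The commit-then-replay rule can be modeled with a toy redo log. This uses a hypothetical key/value format, not jbd2’s block-level layout: each line is `<txid> SET <key> <value>` or `<txid> COMMIT`, and `replay` applies only updates from transactions that have a commit record. Transaction 2 below simulates a crash before its commit, so none of its updates survive.

```shell
# Hypothetical helper: replay a toy journal, applying only committed transactions.
replay() {
  awk '
    $2 == "SET"    { tx[NR] = $1; key[NR] = $3; val[NR] = $4 }
    $2 == "COMMIT" { committed[$1] = 1 }
    END {
      # apply in log order so later committed updates win
      for (i = 1; i <= NR; i++)
        if ((i in tx) && (tx[i] in committed)) state[key[i]] = val[i]
      for (k in state) print k, state[k]
    }
  ' "$1" | sort
}

journal=$(mktemp)
cat > "$journal" <<'EOF'
1 SET bitmap.blk42 used
1 SET inode7.size 4096
1 COMMIT
2 SET bitmap.blk43 used
2 SET inode7.size 8192
EOF

state=$(replay "$journal")
echo "$state"        # only transaction 1's updates are applied
rm -f "$journal"
```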

4. Log-Structured File Systems (LFS)

Used in Flash/SSDs (e.g., F2FS).

Philosophy: Never overwrite data in place. Treat the entire disk as a circular log.
Benefit: Converts random writes into sequential writes, which is significantly faster for NAND and reduces wear.
Trade-off: Requires a Garbage Collector to reclaim space from “dead” versions of files.
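A toy model shows both the append-only rule and why cleaning is needed. In this hypothetical key/value log, every write appends a record, a read returns the newest record for a key, and `log_gc` rewrites the log keeping only live records. Real LFS cleaners such as F2FS’s work segment by segment on blocks, not on a whole log.

```shell
log=$(mktemp)

# Writes only ever append; "a" is updated twice without overwriting anything.
printf 'a v1\nb v1\na v2\nc v1\na v3\n' >> "$log"

# Hypothetical helper: the newest record for a key wins.
read_key() { awk -v k="$1" '$1 == k { v = $2 } END { print v }' "$log"; }

# Hypothetical helper: copy only live records (the last one per key) into a fresh log.
log_gc() {
  awk '{ k[NR] = $1; line[NR] = $0; last[$1] = NR }
       END { for (i = 1; i <= NR; i++) if (last[k[i]] == i) print line[i] }' "$log" > "$log.new"
  mv "$log.new" "$log"
}

before=$(wc -l < "$log")   # 5 records, 2 of them dead versions of "a"
log_gc
after=$(wc -l < "$log")    # 3 live records remain
value=$(read_key a)        # still "v3" after cleaning
echo "records: $before -> $after, a=$value"
rm -f "$log"
```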

5. Page Cache & Writeback

The filesystem does not talk directly to the disk for every write().

Write to Page Cache: The write() syscall simply copies data into kernel memory (Pages). The page is marked as Dirty.
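Flushing (Writeback): Kernel flusher workers (one writeback context per backing device, struct bdi_writeback) write dirty pages to disk in the background: periodically, for data older than vm.dirty_expire_centisecs, and whenever dirty memory exceeds vm.dirty_background_ratio. An application that needs durability calls fsync() to force its dirty pages out immediately.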
Direct I/O: Applications (like Databases) can use O_DIRECT to bypass the page cache and manage their own buffers to avoid “Double Buffering.”
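The effect is visible in /proc/meminfo. A sketch assuming Linux and GNU coreutils `dd`: the buffered write usually raises the Dirty counter until writeback or `sync` flushes it, and `oflag=direct` requests O_DIRECT, which some filesystems reject. The counters are system-wide, so other activity affects the numbers.

```shell
dir=$(mktemp -d)
dirty_kb() { awk '$1 == "Dirty:" { print $2 }' /proc/meminfo; }

echo "Dirty before: $(dirty_kb) kB"
dd if=/dev/zero of="$dir/blob" bs=1M count=16 2>/dev/null   # buffered: lands in the page cache
echo "Dirty after write: $(dirty_kb) kB"
sync                                     # force writeback of all dirty data now
echo "Dirty after sync: $(dirty_kb) kB"

# Writeback tunables (centiseconds): max age of dirty data, flusher wake-up interval
expire=$(cat /proc/sys/vm/dirty_expire_centisecs)
interval=$(cat /proc/sys/vm/dirty_writeback_centisecs)
echo "dirty_expire_centisecs=$expire dirty_writeback_centisecs=$interval"

# O_DIRECT bypasses the page cache; it needs aligned I/O and filesystem support
if dd if=/dev/zero of="$dir/direct" bs=1M count=4 oflag=direct 2>/dev/null; then
  echo "O_DIRECT write succeeded"
else
  echo "O_DIRECT not supported here"
fi

size=$(stat -c %s "$dir/blob")   # 16 MiB = 16777216 bytes
rm -rf "$dir"
```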

6. Virtual Filesystems (procfs, sysfs)

Not all filesystems represent disks.

ProcFS (/proc): Exposes kernel data structures (processes, memory stats) as files.
SysFS (/sys): Exposes the hardware device tree and driver configurations.
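TmpFS: A filesystem whose contents live only in memory (the page cache); nothing is written to disk, although its pages can be swapped out under memory pressure.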
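A few commands show that these files are generated by the kernel rather than read from a disk. A sketch assuming Linux with /proc and /sys mounted; the tmpfs mount line is commented out because it needs root, and /mnt/scratch is a hypothetical mount point.

```shell
# procfs: contents are synthesized on each read, so the reported size is 0
grep -E '^(Name|State):' /proc/self/status    # describes the grep process itself
proc_size=$(stat -c %s /proc/self/status)

# sysfs: the device/driver tree, e.g. one entry per network interface
ls /sys/class/net 2>/dev/null || echo "sysfs not mounted here"

# tmpfs mounts currently active (mount point and type from /proc/mounts)
awk '$3 == "tmpfs" { print $2, $3 }' /proc/mounts

# Creating one (root required; contents vanish on unmount):
# mount -t tmpfs -o size=64m tmpfs /mnt/scratch

echo "size reported for /proc/self/status: $proc_size"
```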

7. Choosing a Filesystem in Production

Selecting the right filesystem for your workload is a key architectural decision. Here’s a practical guide:

Decision Flowchart

┌─────────────────────────────────────────────────────────────────────┐
│                  FILESYSTEM SELECTION GUIDE                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  What's your primary concern?                                       │
│                                                                     │
│  ┌─────────────────┐     ┌─────────────────┐    ┌────────────────┐ │
│  │  DATA INTEGRITY │     │   PERFORMANCE   │    │   FLEXIBILITY  │ │
│  └────────┬────────┘     └────────┬────────┘    └───────┬────────┘ │
│           │                       │                      │          │
│           ▼                       ▼                      ▼          │
│  ┌─────────────────┐     ┌─────────────────┐    ┌────────────────┐ │
│  │      ZFS        │     │   XFS or EXT4   │    │     Btrfs      │ │
│  │  • Checksums    │     │   • Low latency │    │  • Snapshots   │ │
│  │  • Scrubbing    │     │   • High IOPS   │    │  • Compression │ │
│  │  • RAID-Z       │     │   • Scalability │    │  • Subvolumes  │ │
│  └─────────────────┘     └─────────────────┘    └────────────────┘ │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Comparison Table

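| Filesystem | Best For | Avoid When | Crash Consistency | Max File Size |
|---|---|---|---|---|
| EXT4 | General Linux, boot partitions | Files larger than 16 TiB | Journal: metadata (default) or data too (data=journal) | 16 TiB (4 KiB blocks) |
| XFS | Large files, high parallelism | Lots of small files, shrinking | Metadata journal | 8 EiB |
| Btrfs | Snapshots, compression, dev | Random-write-heavy databases (CoW fragmentation) | Copy-on-Write | 16 EiB |
| ZFS | Data integrity, NAS, backups | Low-memory systems (less than 8GB; the ARC cache is RAM-hungry) | Copy-on-Write | 16 EiB |
| F2FS | SSDs, flash storage | HDDs | Log-structured | About 3.94 TiB (4 KiB blocks) |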

Production Checklist

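Database workloads (MySQL, PostgreSQL): XFS, or EXT4 with its default data=ordered journaling (data= is an EXT4-only option); mount either with noatime.
Container hosts (Docker, Kubernetes): XFS formatted with ftype=1 (required by the overlay2 driver), or Btrfs for Docker’s btrfs storage driver.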
Backup servers: ZFS (checksums detect silent corruption) or Btrfs (snapshots).
High-throughput analytics: XFS (better large file and parallel write performance).
Embedded/Flash: F2FS (reduces write amplification).
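Before relying on the checklist, it helps to confirm what a mount actually uses. The `mount_opts` helper below is hypothetical and simply reads /proc/mounts, whose fields are device, mount point, type, and options; the commented `xfs_info` line shows where XFS reports `ftype`, using the document’s /data mount point.

```shell
# Hypothetical helper: print "type options" for a mount point from /proc/mounts.
# If something is mounted over the same path, the last (topmost) entry wins.
mount_opts() {
  awk -v mp="$1" '$2 == mp { fs = $3; opts = $4 } END { if (fs != "") print fs, opts }' /proc/mounts
}

mount_opts /          # prints the filesystem type and its mount options
case "$(mount_opts /)" in
  *noatime*) echo "/ is mounted with noatime" ;;
  *)         echo "/ updates access times (consider noatime for databases)" ;;
esac

# XFS for overlay2: look for ftype=1 in the naming section, e.g.
#   xfs_info /data | grep ftype
```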

Quick Commands

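```shell
# Check current filesystem
df -T /

# Create XFS filesystem (recommended for servers)
# WARNING: -f overwrites any existing filesystem on /dev/sdb1
mkfs.xfs -f /dev/sdb1

# Mount with noatime (which already implies nodiratime)
mount -o noatime /dev/sdb1 /data

# On SSDs, periodic trimming (fstrim, or the fstrim.timer unit) is often
# preferred over mounting with -o discard
fstrim -v /data

# Check filesystem-specific features
tune2fs -l /dev/sda1 | grep features   # EXT4
xfs_info /dev/sdb1                      # XFS
btrfs filesystem show /data             # Btrfs
zpool status                            # ZFS
```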

Summary for Senior Engineers

Filenames are just pointers: A file can have multiple names (Hard Links) pointing to the same Inode.
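Dentry Cache misses dominate cold-start performance for metadata-heavy workloads that touch many small files (e.g., npm install), since each uncached path component may require reading a directory from disk.
Extents and Delayed Allocation are the primary defenses against disk fragmentation.
Journaling protects the filesystem metadata, but not necessarily your data (unless data=journal mode is used); either way, applications must still call fsync() to know their writes are durable.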
