Chapter 3: Staging & Index
The staging area is Git’s “preparation zone” for commits, and it is one of the features that sets Git apart from other version control systems. In this chapter, we’ll implement the index file and the add and status commands.
Think of the staging area like a photographer’s “shot list.” You don’t have to include every photo from the day in the album — you deliberately choose which shots make the cut. The staging area gives you the same power: you decide exactly which changes go into the next commit, even if you’ve modified ten files. This is one of Git’s most underappreciated features, and building it yourself will make you understand why git add -p (partial staging) is even possible.
Prerequisites: Completed Chapter 2: Object Model
Time: 2-3 hours
Outcome: Working add and status commands
Why a Staging Area?
Other version control systems commit all changes at once. Git’s staging area lets you choose exactly which changes go into the next commit.
The Index File Format
The index is a binary file at .git/index. Let’s understand its format:
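As a rough illustration, a version 2 index file begins with a 12-byte header: the 4-byte signature DIRC, a 4-byte version number, and a 4-byte entry count, all big-endian. The sketch below reads that header from a Buffer. The function name parseIndexHeader is hypothetical and chosen only for this example.

```javascript
// Minimal sketch: read the 12-byte header of a version 2 index file.
// parseIndexHeader is an illustrative name, not part of any library.
function parseIndexHeader(buf) {
  if (buf.length < 12) {
    throw new Error('index too short to contain a header');
  }
  const signature = buf.toString('ascii', 0, 4);
  if (signature !== 'DIRC') {
    throw new Error(`unexpected signature: ${signature}`);
  }
  const version = buf.readUInt32BE(4);    // big-endian 32-bit version
  const entryCount = buf.readUInt32BE(8); // big-endian 32-bit entry count
  return { signature, version, entryCount };
}

// Usage with a hand-built header (no real repository needed).
const header = Buffer.alloc(12);
header.write('DIRC', 0, 'ascii');
header.writeUInt32BE(2, 4);
header.writeUInt32BE(3, 8);
console.log(parseIndexHeader(header));
```

Against a real repository you could pass the result of fs.readFileSync('.git/index') instead of the hand-built buffer.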
Implementation
Step 1: Index Parser/Writer
src/utils/index.js
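One way the writer half of such a module might serialize a single entry is sketched below, assuming the version 2 layout: ten 32-bit stat fields, a 20-byte SHA-1, a 16-bit flags field holding the path length, then the path padded with NUL bytes to a multiple of 8. The name encodeEntry and the entry field names are assumptions made for this example.

```javascript
// Minimal sketch of serializing one version 2 index entry.
// encodeEntry and the entry field names are illustrative only.
function encodeEntry(entry) {
  const pathBytes = Buffer.from(entry.path, 'utf8');
  // 62 fixed bytes + path + at least one NUL, rounded up to a multiple of 8.
  const length = Math.ceil((62 + pathBytes.length + 1) / 8) * 8;
  const buf = Buffer.alloc(length); // zero-filled, so the padding is already NUL

  const statFields = [
    entry.ctimeSec, entry.ctimeNsec, entry.mtimeSec, entry.mtimeNsec,
    entry.dev, entry.ino, entry.mode, entry.uid, entry.gid, entry.size,
  ];
  let offset = 0;
  for (const value of statFields) {
    buf.writeUInt32BE(value >>> 0, offset); // truncate to an unsigned 32-bit value
    offset += 4;
  }

  Buffer.from(entry.sha, 'hex').copy(buf, 40);               // 20-byte SHA-1
  buf.writeUInt16BE(Math.min(pathBytes.length, 0xfff), 60);  // low 12 bits: path length
  pathBytes.copy(buf, 62);
  return buf;
}

const encoded = encodeEntry({
  ctimeSec: 1700000000, ctimeNsec: 0, mtimeSec: 1700000000, mtimeNsec: 0,
  dev: 0, ino: 0, mode: 0o100644, uid: 0, gid: 0, size: 6,
  sha: 'ce013625030ba8dba906f756967f9e9ca394464a',
  path: 'a.txt',
});
console.log(encoded.length); // 72
```

The sketch leaves the other flag bits (such as the merge stage) at zero, which is enough for a simple add command.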
Step 2: Implement the Add Command
src/commands/add.js
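The core of an add command can be sketched as two steps: hash the file content as a blob, then insert or replace the matching index entry while keeping entries sorted by path. The helper names hashBlob and stageFile and the shape of the entry object are assumptions for this example. Writing the blob to the object store and saving the index to disk are left out.

```javascript
const crypto = require('node:crypto');

// Git hashes a blob as SHA-1 over "blob <size>\0" followed by the content.
function hashBlob(content) {
  const header = Buffer.from(`blob ${content.length}\0`);
  return crypto.createHash('sha1').update(header).update(content).digest('hex');
}

// Hypothetical helper: returns a new entry list with `path` staged.
// `stat` only needs the fields used below (as returned by fs.statSync).
function stageFile(entries, path, content, stat) {
  const entry = {
    path,
    sha: hashBlob(content),
    size: stat.size,
    mtimeMs: stat.mtimeMs,
    ino: stat.ino,
    mode: 0o100644, // regular, non-executable file
  };
  const others = entries.filter((e) => e.path !== path);
  // Index entries are kept sorted by path; plain string comparison is
  // sufficient for the ASCII paths in this sketch.
  return [...others, entry].sort((a, b) => (a.path < b.path ? -1 : a.path > b.path ? 1 : 0));
}

// Usage with in-memory content and a fake stat result.
const fakeStat = { size: 6, mtimeMs: 1700000000000, ino: 42 };
let staged = stageFile([], 'b.txt', Buffer.from('hello\n'), fakeStat);
staged = stageFile(staged, 'a.txt', Buffer.from('hello\n'), fakeStat);
console.log(staged.map((e) => `${e.sha} ${e.path}`));
```

Staging the same path twice replaces the earlier entry rather than adding a second one, which is what lets a later add pick up newer content.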
Step 3: Implement the Status Command
src/commands/status.js
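A status command boils down to comparing three path-to-hash maps: the HEAD tree, the index, and the working directory. The sketch below assumes those maps have already been built elsewhere. computeStatus and the state labels are illustrative names, and the working-directory hashes are taken as given rather than computed.

```javascript
// Minimal sketch: classify paths by comparing HEAD, index, and working tree.
// Each argument is a Map from path to blob hash; all names are illustrative.
function computeStatus(head, index, working) {
  const staged = [];
  const unstaged = [];
  const untracked = [];

  for (const [path, sha] of index) {
    // Index vs. HEAD: what the next commit would change.
    if (!head.has(path)) staged.push({ path, state: 'new file' });
    else if (head.get(path) !== sha) staged.push({ path, state: 'modified' });

    // Working directory vs. index: changes not yet staged.
    if (!working.has(path)) unstaged.push({ path, state: 'deleted' });
    else if (working.get(path) !== sha) unstaged.push({ path, state: 'modified' });
  }

  // Paths in HEAD that are no longer in the index are staged deletions.
  for (const path of head.keys()) {
    if (!index.has(path)) staged.push({ path, state: 'deleted' });
  }

  // Paths on disk that the index does not know about.
  for (const path of working.keys()) {
    if (!index.has(path)) untracked.push(path);
  }

  return { staged, unstaged, untracked };
}

const head = new Map([['a.txt', '1'], ['b.txt', '2'], ['gone.txt', '9']]);
const index = new Map([['a.txt', '1'], ['b.txt', '3'], ['c.txt', '4']]);
const working = new Map([['a.txt', '5'], ['b.txt', '3'], ['c.txt', '4'], ['notes.txt', '7']]);
const result = computeStatus(head, index, working);
console.log(result);
```

In a fuller implementation the working-directory hashes would only be computed for files whose cached stat data no longer matches.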
Step 4: Update CLI Entry Point
src/mygit.js
Testing Your Implementation
Exercises
Exercise 1: Implement rm command
Create an rm command that removes files from the index.
Exercise 2: Add glob patterns
Support glob patterns in the add command.
Hint: Use the minimatch npm package.
Exercise 3: Implement diff
Show the actual differences in status output.
Key Concepts
Three Trees
Git has three areas: Working Directory, Index (Stage), and HEAD commit
Binary Index
The index is a binary file optimized for fast status checks
Metadata Caching
File timestamps help Git quickly detect changes without hashing
Atomic Updates
The index is written atomically to prevent corruption
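One common way to get an atomic update is to write the new content to a lock file next to the target and then rename it into place. Git itself uses .git/index.lock for this purpose. The sketch below shows the pattern with Node's fs module. writeFileAtomic is a hypothetical helper name, and the temporary directory stands in for a real .git directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Minimal sketch: write to "<target>.lock", then rename over the target.
// Readers see either the old file or the new one, never a partial write.
function writeFileAtomic(target, data) {
  const lockPath = `${target}.lock`;
  // 'wx' fails if the lock already exists, so two writers cannot overlap.
  const fd = fs.openSync(lockPath, 'wx');
  try {
    fs.writeFileSync(fd, data);
    fs.fsyncSync(fd); // flush to disk before the rename makes it visible
  } catch (err) {
    fs.closeSync(fd);
    fs.unlinkSync(lockPath);
    throw err;
  }
  fs.closeSync(fd);
  fs.renameSync(lockPath, target);
}

// Usage in a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'mygit-'));
const indexPath = path.join(dir, 'index');
writeFileAtomic(indexPath, Buffer.from('first version'));
writeFileAtomic(indexPath, Buffer.from('second version'));
console.log(fs.readFileSync(indexPath, 'utf8')); // second version
```

If a process crashes between opening the lock and renaming it, the stale lock file remains and later writers fail until it is removed, which mirrors the behavior you may have seen with a leftover index.lock.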
What’s Next?
In Chapter 4: Commits & History, we’ll implement:
- The commit command
- Tree objects from the index
- Commit objects with parent linking
- The log command
Next: Commits & History
Learn how Git creates commits and maintains history
Interview Deep-Dive
Why does Git have a staging area (index) at all? Other version control systems commit all changes at once. What does the index enable?
Strong Answer:
- The staging area enables partial commits: you can modify ten files but only commit three. This lets you create logical, atomic commits that each represent one coherent change, even when your working directory contains multiple unrelated modifications. This is essential for maintaining a clean, reviewable commit history.
- It enables git add -p (patch mode), where you can stage individual hunks within a file. You might fix a bug and refactor a function in the same file; the index lets you commit the bugfix in one commit and the refactor in another.
- The index also serves as a performance cache. It stores file metadata (mtime, size, inode number) alongside the blob hash. When git status runs, it compares current file stats against the index’s cached stats. If they match, Git skips the expensive step of re-hashing the file. This “stat cache” is why git status is fast even on repositories with tens of thousands of files.
- The three-tree model (working directory, index, HEAD) gives Git its flexibility. Each comparison reveals different information: index vs. HEAD shows staged changes; working directory vs. index shows unstaged changes; working directory vs. HEAD shows the total diff. These correspond to git diff --staged, git diff, and git diff HEAD, respectively.
How does git status determine whether a file has been modified without reading the file's entire contents?
Strong Answer:
- Git uses a stat cache optimization. The index stores filesystem metadata for each tracked file: mtime (modification time), ctime (status change time), file size, inode number, and device number. When git status runs, it calls stat() on each tracked file and compares the results against the cached values.
- If mtime, size, and inode all match, Git assumes the file has not changed and skips hashing. This is the fast path that makes git status sub-second on large repositories.
- If any stat field differs, Git reads the file, computes its SHA-1 hash, and compares it to the hash stored in the index. If the hashes match (the file was touched but not actually changed, like a build system updating mtime), Git may update the cached stat values in the index to avoid re-checking next time. This is called “index refresh.”
- The edge case is the “racy git” problem: if a file is modified within the same second as the index was written (so mtime matches), Git cannot detect the change via stat alone. Git handles this by flagging entries written in the same second as the index and always re-hashing them on the next status check. This ensures correctness at the cost of a small performance penalty for recently-modified files.
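The decision described in the points above can be sketched as a single predicate: trust the cached entry only when size, mtime, and inode all match and the entry is not racily clean. The function name needsRehash, the entry fields, and the indexMtimeSec parameter (the second in which the index file was last written) are assumptions for this example. Real Git compares more fields and at finer timestamp resolution.

```javascript
// Minimal sketch of the stat-cache check. Returns true when the file
// content must be re-hashed to know whether it changed.
// `stat` is shaped like the result of fs.statSync; other names are illustrative.
function needsRehash(entry, stat, indexMtimeSec) {
  const statMtimeSec = Math.floor(stat.mtimeMs / 1000);

  if (stat.size !== entry.size) return true;
  if (statMtimeSec !== entry.mtimeSec) return true;
  if (stat.ino !== entry.ino) return true;

  // Racily clean: the file was last modified in the same second as (or
  // after) the index write, so matching stat data proves nothing.
  if (entry.mtimeSec >= indexMtimeSec) return true;

  return false;
}

const cached = { size: 10, mtimeSec: 100, ino: 7 };
const onDisk = { size: 10, mtimeMs: 100000, ino: 7 };
console.log(needsRehash(cached, onDisk, 105)); // false: safe to trust the cache
console.log(needsRehash(cached, onDisk, 100)); // true: racily clean entry
```

Treating racy entries as dirty trades a few extra hashes for correctness, which is the same trade-off the answer describes.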
The racy case arises when a file is modified and git add and git status both run in the same second. If the filesystem’s mtime resolution is one second (common on ext3, HFS+), the index write and the file modification have the same timestamp. On the next git status, Git cannot tell whether the file was modified after the index write. Git’s defense is to treat entries whose mtime matches the index write time as “racily clean” and always re-hash them. You can reproduce it with a script that modifies a file and runs git add in a tight loop, checking whether git status ever misses a modification. On modern filesystems with nanosecond timestamps (ext4, APFS), the window is much smaller, but the protection logic still exists for portability.
What would happen if the index file became corrupted? How would you recover?
Strong Answer:
- The index has a SHA-1 checksum appended at the end. If the file is corrupted, Git detects it on the next read and refuses to proceed with an error like “index file corrupt.” This prevents silent data loss.
- Recovery is straightforward: delete .git/index and run git reset. This regenerates the index from the HEAD commit’s tree. You lose your staged changes (anything you added with git add but did not commit), but committed history is unaffected because it is stored in the object database, not the index.
- If you had important staged changes, you might try to recover them from the object database: git fsck --unreachable lists objects not referenced by any commit, and some of those might be blobs you had staged. You can inspect them with git cat-file -p <hash> and manually recover the content.
- The broader lesson is that the index is a derived, regenerable cache. The authoritative data is in the object store (blobs, trees, commits) and refs (branches, tags). Losing the index is an inconvenience, not a disaster. This is by design: transient working state should never be the only copy of important data.
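The corruption check mentioned in the first point can be sketched as follows: the last 20 bytes of the index hold a SHA-1 over everything before them, so a reader can recompute the hash and compare. The helper names verifyIndexChecksum and appendChecksum are hypothetical, and the buffer here is a hand-built index with zero entries rather than a real .git/index.

```javascript
const crypto = require('node:crypto');

// Minimal sketch: the trailing 20 bytes are a SHA-1 of all preceding bytes.
function verifyIndexChecksum(buf) {
  if (buf.length < 20) return false;
  const body = buf.subarray(0, buf.length - 20);
  const stored = buf.subarray(buf.length - 20);
  const actual = crypto.createHash('sha1').update(body).digest();
  return actual.equals(stored);
}

// Hypothetical helper for the demo: append the checksum to index content.
function appendChecksum(body) {
  const sum = crypto.createHash('sha1').update(body).digest();
  return Buffer.concat([body, sum]);
}

// Build a valid empty index: "DIRC", version 2, zero entries, checksum.
const emptyBody = Buffer.alloc(12);
emptyBody.write('DIRC', 0, 'ascii');
emptyBody.writeUInt32BE(2, 4);
emptyBody.writeUInt32BE(0, 8);
const validIndex = appendChecksum(emptyBody);

// Flip one byte to simulate corruption.
const corruptedIndex = Buffer.from(validIndex);
corruptedIndex[5] ^= 0xff;

console.log(verifyIndexChecksum(validIndex));     // true
console.log(verifyIndexChecksum(corruptedIndex)); // false
```

A parser would typically run this check before trusting the entry count or any entry data.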
An index copied from another machine would cause git status on machine B to treat every file as potentially changed and re-hash it (because the inodes and timestamps do not match). This is why the index is local to each repository: Git never tracks anything inside .git, so .git/index is never transferred during git clone or git fetch. Each clone generates its own index from the HEAD tree.