Chapter 3: Staging & Index

The staging area is Git’s “preparation zone” for commits, and it is one of the features that sets Git apart from most other version control systems. In this chapter, we’ll implement the index file and the add and status commands.

Think of the staging area like a photographer’s “shot list.” You don’t have to include every photo from the day in the album; you deliberately choose which shots make the cut. The staging area gives you the same power: you decide exactly which changes go into the next commit, even if you’ve modified ten files. This is one of Git’s most underappreciated features, and building it yourself shows why git add -p (partial staging) is even possible.
Prerequisites: Completed Chapter 2: Object Model
Time: 2-3 hours
Outcome: Working add and status commands

Why a Staging Area?

Many other version control systems commit every changed file at once. Git’s staging area adds an intermediate step between editing and committing:
┌─────────────────────────────────────────────────────────────────────────────┐
│                        GIT'S THREE-TREE ARCHITECTURE                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   WORKING DIRECTORY        INDEX (STAGING)           HEAD (LAST COMMIT)     │
│   ─────────────────        ───────────────           ─────────────────      │
│                                                                              │
│   ┌─────────────┐          ┌─────────────┐          ┌─────────────┐         │
│   │ file.txt    │   add    │ file.txt    │  commit  │ file.txt    │         │
│   │ (edited)    │ ───────► │ (staged)    │ ───────► │ (committed) │         │
│   └─────────────┘          └─────────────┘          └─────────────┘         │
│                                                                              │
│   • Your actual files      • Snapshot for next      • Previous commit       │
│   • What you edit            commit                 • Immutable             │
│   • Not tracked by Git     • Binary index file      • Object in store       │
│                              (.git/index)                                    │
│                                                                              │
│   BENEFITS:                                                                  │
│   • Stage partial changes from a file                                       │
│   • Review what will be committed                                           │
│   • Build commits incrementally                                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

The Index File Format

The index is a binary file at .git/index. Let’s understand its format:
INDEX FILE FORMAT (Version 2)
==============================

HEADER (12 bytes)
├── Signature: "DIRC" (4 bytes) - stands for "DirCache"
├── Version: 2 (4 bytes, big-endian)
└── Number of entries (4 bytes, big-endian)

ENTRIES (variable)
├── Entry 1
│   ├── ctime seconds (4 bytes)
│   ├── ctime nanoseconds (4 bytes)
│   ├── mtime seconds (4 bytes)
│   ├── mtime nanoseconds (4 bytes)
│   ├── dev (4 bytes)
│   ├── ino (4 bytes)
│   ├── mode (4 bytes) - e.g., 100644 (octal)
│   ├── uid (4 bytes)
│   ├── gid (4 bytes)
│   ├── file size (4 bytes)
│   ├── SHA-1 hash (20 bytes)
│   ├── flags (2 bytes) - includes name length
│   ├── name (variable, null-terminated)
│   └── padding (0-7 NUL bytes so the entry size is a multiple of 8)
├── Entry 2
│   └── ...
└── Entry N

CHECKSUM (20 bytes)
└── SHA-1 of everything above
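To make the entry layout concrete, here is a small worked example of the size arithmetic. The helper name entryLayout is hypothetical and not part of the chapter's code; it only adds up the fixed fields, the name, the NUL terminator, and the padding described above.

```javascript
// Fixed-width part of a version-2 entry: ten 4-byte stat fields,
// a 20-byte SHA-1, and 2 bytes of flags.
const FIXED_ENTRY_BYTES = 10 * 4 + 20 + 2; // 62

// entryLayout is a hypothetical helper used only for illustration.
function entryLayout(name) {
    const nameBytes = Buffer.byteLength(name, 'utf8');
    const unpadded = FIXED_ENTRY_BYTES + nameBytes + 1; // +1 for the NUL terminator
    const padding = (8 - (unpadded % 8)) % 8;           // 0-7 extra NUL bytes
    return { nameBytes, padding, total: unpadded + padding };
}

console.log(entryLayout('hello.txt')); // 62 + 9 + 1 = 72, already a multiple of 8
console.log(entryLayout('a.txt'));     // 62 + 5 + 1 = 68, so 4 padding bytes bring it to 72
```

Note that the name length is measured in bytes, not characters: a path containing non-ASCII characters takes more bytes than its JavaScript string length suggests.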
The index format is complex because Git optimizes for speed. File metadata (mtime, size) helps Git quickly detect changes without reading file contents. This is called a “stat cache” — Git compares the file’s modification time and size against what the index remembers. If they match, Git skips the expensive step of hashing the file’s content. On large repos with tens of thousands of files, this optimization turns git status from a multi-second operation into a sub-second one.
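The stat cache comparison can be sketched in a few lines. This is a simplified illustration, not Git's actual logic: statMatches is a hypothetical helper, and it checks only size and mtime seconds, whereas real Git compares more fields.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// statMatches is a hypothetical helper: true means "assume unchanged, skip hashing".
function statMatches(entry, stat) {
    return entry.size === (stat.size >>> 0) &&
        entry.mtimeSeconds === Math.floor(stat.mtimeMs / 1000);
}

// Demo against a throwaway file in the system temp directory.
const file = path.join(fs.mkdtempSync(path.join(os.tmpdir(), 'statcache-')), 'hello.txt');
fs.writeFileSync(file, 'Hello\n');
const stat = fs.statSync(file);
const cached = { size: stat.size, mtimeSeconds: Math.floor(stat.mtimeMs / 1000) };

const unchanged = statMatches(cached, fs.statSync(file)); // nothing touched the file
fs.writeFileSync(file, 'Hello, world\n');
const afterEdit = statMatches(cached, fs.statSync(file)); // the size now differs
console.log({ unchanged, afterEdit });
```

Only when a check like this fails does the file need to be read and hashed.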

Implementation

Step 1: Index Parser/Writer

src/utils/index.js
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

/**
 * Represents a single index entry
 */
class IndexEntry {
    // Each field maps directly to a fixed-width binary field in the index file.
    // ctime/mtime: Filesystem timestamps used for the stat cache optimization.
    // dev/ino: Device and inode numbers -- help detect if a file was replaced
    //          by a different file with the same name (e.g., `mv other file.txt`).
    // mode: Unix file permissions. 0o100644 = regular file, 0o100755 = executable.
    // uid/gid: Owner user/group. Mostly zeros on Windows.
    // size: File size in bytes. Combined with mtime, this is Git's fast-path
    //       for detecting "did this file change?" without re-hashing.
    // hash: SHA-1 of the blob stored in the object database.
    // flags: Packed field; the lower 12 bits store the path name length.
    // name: Relative path from the repo root (always forward slashes).
    constructor({
        ctimeSeconds = 0,
        ctimeNanoseconds = 0,
        mtimeSeconds = 0,
        mtimeNanoseconds = 0,
        dev = 0,
        ino = 0,
        mode = 0o100644,
        uid = 0,
        gid = 0,
        size = 0,
        hash = '',
        flags = 0,
        name = ''
    }) {
        this.ctimeSeconds = ctimeSeconds;
        this.ctimeNanoseconds = ctimeNanoseconds;
        this.mtimeSeconds = mtimeSeconds;
        this.mtimeNanoseconds = mtimeNanoseconds;
        this.dev = dev;
        this.ino = ino;
        this.mode = mode;
        this.uid = uid;
        this.gid = gid;
        this.size = size;
        this.hash = hash;
        this.flags = flags;
        this.name = name;
    }
    
    /**
     * Create an entry from a file and its hash
     */
    static fromFile(filePath, hash, repoRoot) {
        const stat = fs.statSync(filePath);
        const relativePath = path.relative(repoRoot, filePath)
            .split(path.sep).join('/'); // Normalize to forward slashes
        
        return new IndexEntry({
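            ctimeSeconds: Math.floor(stat.ctimeMs / 1000),
            ctimeNanoseconds: Math.floor((stat.ctimeMs % 1000) * 1000000),
            mtimeSeconds: Math.floor(stat.mtimeMs / 1000),
            mtimeNanoseconds: Math.floor((stat.mtimeMs % 1000) * 1000000),
            // dev, ino, uid, gid, and size can exceed 32 bits on some systems;
            // keep the low 32 bits so writeUInt32BE does not throw.
            dev: stat.dev >>> 0,
            ino: stat.ino >>> 0,
            // Git records only two modes for regular files: 100755 if the
            // owner-executable bit is set, otherwise 100644.
            mode: (stat.mode & 0o100) ? 0o100755 : 0o100644,
            uid: stat.uid >>> 0,
            gid: stat.gid >>> 0,
            size: stat.size >>> 0,
            hash: hash,
            // 12-bit name length, measured in bytes rather than UTF-16 units
            flags: Math.min(Buffer.byteLength(relativePath), 0xFFF),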
            name: relativePath
        });
    }
}

/**
 * The Git index (staging area)
 */
class Index {
    constructor() {
        this.version = 2;
        this.entries = new Map(); // name -> IndexEntry
    }
    
    /**
     * Add or update an entry.
     * Using a Map keyed by name means adding the same file twice
     * silently overwrites the old entry -- exactly what `git add` does.
     */
    addEntry(entry) {
        this.entries.set(entry.name, entry);
    }
    
    /**
     * Remove an entry
     */
    removeEntry(name) {
        this.entries.delete(name);
    }
    
    /**
     * Get an entry by name
     */
    getEntry(name) {
        return this.entries.get(name);
    }
    
    /**
     * Get all entries sorted by name.
     * Sorting is mandatory -- the index file format requires entries in
     * sorted order so that Git can binary-search for paths. If you skip
     * sorting, your index file will be unreadable by real Git.
     */
    getEntries() {
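        // Compare raw UTF-8 bytes, as Git does. localeCompare would apply
        // locale-aware collation (for example, placing "a" before "B").
        return Array.from(this.entries.values())
            .sort((a, b) => Buffer.compare(Buffer.from(a.name), Buffer.from(b.name)));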
    }
    
    /**
     * Read index from .git/index file
     */
    static read(gitDir) {
        const indexPath = path.join(gitDir, 'index');
        const index = new Index();
        
        if (!fs.existsSync(indexPath)) {
            return index; // Empty index
        }
        
        const data = fs.readFileSync(indexPath);
        let offset = 0;
        
        // Parse header
        const signature = data.slice(0, 4).toString();
        if (signature !== 'DIRC') {
            throw new Error('Invalid index file signature');
        }
        offset += 4;
        
        const version = data.readUInt32BE(offset);
        if (version !== 2) {
            throw new Error(`Unsupported index version: ${version}`);
        }
        offset += 4;
        
        const entryCount = data.readUInt32BE(offset);
        offset += 4;
        
        // Parse entries
        for (let i = 0; i < entryCount; i++) {
            const entry = new IndexEntry({});
            
            entry.ctimeSeconds = data.readUInt32BE(offset); offset += 4;
            entry.ctimeNanoseconds = data.readUInt32BE(offset); offset += 4;
            entry.mtimeSeconds = data.readUInt32BE(offset); offset += 4;
            entry.mtimeNanoseconds = data.readUInt32BE(offset); offset += 4;
            entry.dev = data.readUInt32BE(offset); offset += 4;
            entry.ino = data.readUInt32BE(offset); offset += 4;
            entry.mode = data.readUInt32BE(offset); offset += 4;
            entry.uid = data.readUInt32BE(offset); offset += 4;
            entry.gid = data.readUInt32BE(offset); offset += 4;
            entry.size = data.readUInt32BE(offset); offset += 4;
            
            // SHA-1 hash (20 bytes)
            entry.hash = data.slice(offset, offset + 20).toString('hex');
            offset += 20;
            
            // Flags (2 bytes, includes name length in lower 12 bits)
            entry.flags = data.readUInt16BE(offset);
            offset += 2;
            
            // Name (null-terminated)
            const nameStart = offset;
            const nullIndex = data.indexOf(0, nameStart);
            entry.name = data.slice(nameStart, nullIndex).toString();
            offset = nullIndex + 1;
            
            // Skip padding (align to 8 bytes)
            // Entry size so far: 62 bytes + name length in bytes + 1 (null byte)
            const entrySize = 62 + (nullIndex - nameStart) + 1;
            const padding = (8 - (entrySize % 8)) % 8;
            offset += padding;
            
            index.addEntry(entry);
        }
        
        return index;
    }
    
    /**
     * Write index to .git/index file
     */
    write(gitDir) {
        const entries = this.getEntries();
        const chunks = [];
        
        // Header
        const header = Buffer.alloc(12);
        header.write('DIRC', 0);
        header.writeUInt32BE(this.version, 4);
        header.writeUInt32BE(entries.length, 8);
        chunks.push(header);
        
        // Entries
        for (const entry of entries) {
            const entryData = Buffer.alloc(62); // Fixed fields
            let offset = 0;
            
            entryData.writeUInt32BE(entry.ctimeSeconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.ctimeNanoseconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.mtimeSeconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.mtimeNanoseconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.dev, offset); offset += 4;
            entryData.writeUInt32BE(entry.ino, offset); offset += 4;
            entryData.writeUInt32BE(entry.mode, offset); offset += 4;
            entryData.writeUInt32BE(entry.uid, offset); offset += 4;
            entryData.writeUInt32BE(entry.gid, offset); offset += 4;
            entryData.writeUInt32BE(entry.size, offset); offset += 4;
            
            // SHA-1 hash
            const hashBuffer = Buffer.from(entry.hash, 'hex');
            hashBuffer.copy(entryData, offset);
            offset += 20;
            
            // Flags
            entryData.writeUInt16BE(entry.flags, offset);
            
            chunks.push(entryData);
            
            // Name (null-terminated with padding)
            const nameBuffer = Buffer.from(entry.name + '\0');
            chunks.push(nameBuffer);
            
            // Padding to 8-byte boundary
            // (nameBuffer.length is the name in bytes plus the null byte)
            const entrySize = 62 + nameBuffer.length;
            const padding = (8 - (entrySize % 8)) % 8;
            if (padding > 0) {
                chunks.push(Buffer.alloc(padding, 0));
            }
        }
        
        // Combine all chunks
        const content = Buffer.concat(chunks);
        
        // Add checksum
        const checksum = crypto.createHash('sha1').update(content).digest();
        const finalData = Buffer.concat([content, checksum]);
        
        // Write to file
        const indexPath = path.join(gitDir, 'index');
        fs.writeFileSync(indexPath, finalData);
    }
}

module.exports = { Index, IndexEntry };
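The read() method above trusts the file and never looks at the trailing checksum. A check could be added along these lines. This is a minimal sketch: verifyIndexChecksum is a hypothetical helper, not part of the Index class, and the demo builds the smallest possible index (a header with zero entries) rather than reading a real one.

```javascript
const crypto = require('crypto');

// verifyIndexChecksum is a hypothetical helper used only for illustration.
function verifyIndexChecksum(data) {
    if (data.length < 12 + 20) return false;         // need at least header + checksum
    const body = data.subarray(0, data.length - 20);
    const stored = data.subarray(data.length - 20);
    const actual = crypto.createHash('sha1').update(body).digest();
    return actual.equals(stored);
}

// Smallest valid index: a 12-byte header with zero entries, then its SHA-1.
const header = Buffer.alloc(12);
header.write('DIRC', 0);
header.writeUInt32BE(2, 4); // version
header.writeUInt32BE(0, 8); // entry count
const emptyIndex = Buffer.concat([header, crypto.createHash('sha1').update(header).digest()]);

const corrupted = Buffer.from(emptyIndex); // copy, then damage one byte
corrupted[5] ^= 0xff;

console.log(verifyIndexChecksum(emptyIndex)); // true
console.log(verifyIndexChecksum(corrupted));  // false
```

Calling a check like this at the top of read() and throwing on failure would surface a damaged index early instead of parsing garbage.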

Step 2: Implement the Add Command

src/commands/add.js
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index, IndexEntry } = require('../utils/index');

/**
 * add - Add file contents to the index
 * 
 * Usage:
 *   mygit add <file>...
 *   mygit add .           # Add all files
 *   mygit add -A          # Add all files (this simplified version does not stage deletions)
 */
function execute(args) {
    if (args.length === 0) {
        throw new Error('Nothing specified, nothing added.');
    }
    
    const gitDir = requireGitDir();
    const repoRoot = getRepoRoot(gitDir);
    const index = Index.read(gitDir);
    
    // Collect all files to add
    const filesToAdd = [];
    
    for (const arg of args) {
        if (arg === '-A' || arg === '--all' || arg === '.') {
            // Add all tracked and untracked files
            collectAllFiles(repoRoot, repoRoot, filesToAdd);
        } else {
            const filePath = path.resolve(arg);
            
            if (!fs.existsSync(filePath)) {
                throw new Error(`fatal: pathspec '${arg}' did not match any files`);
            }
            
            if (fs.statSync(filePath).isDirectory()) {
                collectAllFiles(filePath, repoRoot, filesToAdd);
            } else {
                filesToAdd.push(filePath);
            }
        }
    }
    
    // Add each file to the index
    for (const filePath of filesToAdd) {
        addFileToIndex(filePath, repoRoot, gitDir, index);
    }
    
    // Write updated index
    index.write(gitDir);
}

/**
 * Recursively collect all files in a directory
 */
function collectAllFiles(dir, repoRoot, files) {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    
    for (const entry of entries) {
        const fullPath = path.join(dir, entry.name);
        
        // Skip .git directory
        if (entry.name === '.git') continue;
        
        // Skip common ignored patterns (simplified). Note that the dotfile
        // check below also skips files real Git would add, such as .gitignore.
        if (entry.name === 'node_modules') continue;
        if (entry.name.startsWith('.')) continue;
        
        if (entry.isDirectory()) {
            collectAllFiles(fullPath, repoRoot, files);
        } else if (entry.isFile()) {
            files.push(fullPath);
        }
    }
}

/**
 * Add a single file to the index
 */
function addFileToIndex(filePath, repoRoot, gitDir, index) {
    // Read file content
    const content = fs.readFileSync(filePath);
    
    // Hash and store as blob
    const { hash, data } = hashObject(content, 'blob');
    writeObject(gitDir, hash, data);
    
    // Create index entry
    const entry = IndexEntry.fromFile(filePath, hash, repoRoot);
    index.addEntry(entry);
}

module.exports = { execute };
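The execute() function above treats -A the same as '.', so a file that was deleted from the working directory stays in the index. One way to handle that, as a sketch: walk the existing entries and drop those whose file is gone. pruneDeleted is a hypothetical helper, and the Map here stands in for Index.entries.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// pruneDeleted is a hypothetical helper. `entries` is a Map keyed by
// repo-relative path (forward slashes), like Index.entries in this chapter.
function pruneDeleted(entries, repoRoot) {
    const removed = [];
    for (const name of [...entries.keys()]) {
        if (!fs.existsSync(path.join(repoRoot, name))) {
            entries.delete(name);
            removed.push(name);
        }
    }
    return removed;
}

// Demo: one tracked file still exists, the other was deleted.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'prune-'));
fs.writeFileSync(path.join(root, 'kept.txt'), 'still here\n');
const entries = new Map([['kept.txt', {}], ['gone.txt', {}]]);
const removed = pruneDeleted(entries, root);
console.log(removed, [...entries.keys()]);
```

In the add command, this would run only for -A / --all, before index.write(gitDir).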

Step 3: Implement the Status Command

src/commands/status.js
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, readObject } = require('../utils/objects');
const { Index } = require('../utils/index');

/**
 * status - Show the working tree status
 * 
 * Compares:
 * 1. HEAD commit tree vs Index (staged changes)
 * 2. Index vs Working directory (unstaged changes)
 * 3. Working directory vs Index (untracked files)
 */
function execute(args) {
    const gitDir = requireGitDir();
    const repoRoot = getRepoRoot(gitDir);
    const index = Index.read(gitDir);
    
    // Get current branch
    const branch = getCurrentBranch(gitDir);
    console.log(`On branch ${branch}`);
    console.log();
    
    // Get HEAD tree (if exists)
    const headTree = getHeadTree(gitDir);
    
    // Categorize files
    const staged = [];      // Changes between HEAD and index
    const unstaged = [];    // Changes between index and working dir
    const untracked = [];   // Files not in index
    
    // Check staged changes (HEAD vs Index)
    const indexEntries = index.getEntries();
    const headEntries = headTree ? parseTreeRecursive(gitDir, headTree, '') : new Map();
    
    for (const entry of indexEntries) {
        const headHash = headEntries.get(entry.name);
        
        if (!headHash) {
            staged.push({ name: entry.name, status: 'new file' });
        } else if (headHash !== entry.hash) {
            staged.push({ name: entry.name, status: 'modified' });
        }
    }
    
    // Check for deleted files (in HEAD but not in index)
    for (const name of headEntries.keys()) {
        if (!index.getEntry(name)) {
            staged.push({ name, status: 'deleted' });
        }
    }
    
    // Check unstaged changes (Index vs Working Directory)
    for (const entry of indexEntries) {
        const filePath = path.join(repoRoot, entry.name);
        
        if (!fs.existsSync(filePath)) {
            unstaged.push({ name: entry.name, status: 'deleted' });
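        } else {
            // Simplified: always re-hash. Real Git first compares the cached
            // stat data (mtime, size) and only hashes when they differ.
            const content = fs.readFileSync(filePath);
            const { hash } = hashObject(content, 'blob');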
            
            if (hash !== entry.hash) {
                unstaged.push({ name: entry.name, status: 'modified' });
            }
        }
    }
    
    // Check for untracked files
    collectUntracked(repoRoot, repoRoot, index, untracked);
    
    // Print results
    if (staged.length > 0) {
        console.log('Changes to be committed:');
        console.log('  (use "mygit restore --staged <file>..." to unstage)');
        console.log();
        for (const { name, status } of staged) {
            console.log(`\t\x1b[32m${status}:   ${name}\x1b[0m`);
        }
        console.log();
    }
    
    if (unstaged.length > 0) {
        console.log('Changes not staged for commit:');
        console.log('  (use "mygit add <file>..." to update what will be committed)');
        console.log();
        for (const { name, status } of unstaged) {
            console.log(`\t\x1b[31m${status}:   ${name}\x1b[0m`);
        }
        console.log();
    }
    
    if (untracked.length > 0) {
        console.log('Untracked files:');
        console.log('  (use "mygit add <file>..." to include in what will be committed)');
        console.log();
        for (const name of untracked) {
            console.log(`\t\x1b[31m${name}\x1b[0m`);
        }
        console.log();
    }
    
    if (staged.length === 0 && unstaged.length === 0 && untracked.length === 0) {
        console.log('nothing to commit, working tree clean');
    }
}

/**
 * Get the current branch name
 */
function getCurrentBranch(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: refs/heads/')) {
        return headContent.slice('ref: refs/heads/'.length);
    }
    
    // Detached HEAD
    return headContent.slice(0, 7);
}

/**
 * Get the tree hash from HEAD commit
 */
function getHeadTree(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    let commitHash;
    
    if (headContent.startsWith('ref: ')) {
        // Symbolic reference
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (!fs.existsSync(refFile)) {
            return null; // No commits yet
        }
        
        commitHash = fs.readFileSync(refFile, 'utf8').trim();
    } else {
        commitHash = headContent;
    }
    
    // Read commit object to get tree hash
    try {
        const { content } = readObject(gitDir, commitHash);
        const lines = content.toString().split('\n');
        
        for (const line of lines) {
            if (line.startsWith('tree ')) {
                return line.slice(5);
            }
        }
    } catch (e) {
        return null;
    }
    
    return null;
}

/**
 * Parse a tree object recursively, returning Map of path -> hash
 */
function parseTreeRecursive(gitDir, treeHash, prefix) {
    const result = new Map();
    const { content } = readObject(gitDir, treeHash);
    
    let offset = 0;
    while (offset < content.length) {
        const spaceIndex = content.indexOf(0x20, offset);
        const mode = content.slice(offset, spaceIndex).toString();
        
        const nullIndex = content.indexOf(0, spaceIndex);
        const name = content.slice(spaceIndex + 1, nullIndex).toString();
        
        const hashBytes = content.slice(nullIndex + 1, nullIndex + 21);
        const hash = hashBytes.toString('hex');
        
        offset = nullIndex + 21;
        
        const fullPath = prefix ? `${prefix}/${name}` : name;
        
        if (mode === '40000') {
            // Recurse into subtree
            const subEntries = parseTreeRecursive(gitDir, hash, fullPath);
            for (const [subPath, subHash] of subEntries) {
                result.set(subPath, subHash);
            }
        } else {
            result.set(fullPath, hash);
        }
    }
    
    return result;
}

/**
 * Collect untracked files
 */
function collectUntracked(dir, repoRoot, index, untracked) {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    
    for (const entry of entries) {
        const fullPath = path.join(dir, entry.name);
        const relativePath = path.relative(repoRoot, fullPath)
            .split(path.sep).join('/');
        
        // Skip .git
        if (entry.name === '.git') continue;
        if (entry.name === 'node_modules') continue;
        
        if (entry.isDirectory()) {
            collectUntracked(fullPath, repoRoot, index, untracked);
        } else if (entry.isFile()) {
            if (!index.getEntry(relativePath)) {
                untracked.push(relativePath);
            }
        }
    }
}

module.exports = { execute };
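parseTreeRecursive relies on the binary layout of tree objects from Chapter 2: each entry is the mode, a space, the name, a NUL byte, then 20 raw hash bytes. The sketch below builds a tiny tree buffer by hand and parses it with the same offset logic, without touching the object store. Both function names and the placeholder hashes are made up for illustration.

```javascript
// A tree entry is "<mode> <name>\0" followed by 20 raw hash bytes.
function encodeTreeEntry(mode, name, hexHash) {
    return Buffer.concat([
        Buffer.from(`${mode} ${name}\0`),
        Buffer.from(hexHash, 'hex'),
    ]);
}

// Same walking logic as parseTreeRecursive, minus the recursion.
function parseTreeEntries(content) {
    const entries = [];
    let offset = 0;
    while (offset < content.length) {
        const spaceIndex = content.indexOf(0x20, offset);
        const nullIndex = content.indexOf(0, spaceIndex);
        entries.push({
            mode: content.subarray(offset, spaceIndex).toString(),
            name: content.subarray(spaceIndex + 1, nullIndex).toString(),
            hash: content.subarray(nullIndex + 1, nullIndex + 21).toString('hex'),
        });
        offset = nullIndex + 21; // skip the NUL and the 20 hash bytes
    }
    return entries;
}

const tree = Buffer.concat([
    encodeTreeEntry('100644', 'hello.txt', 'aa'.repeat(20)), // placeholder hash
    encodeTreeEntry('40000', 'src', 'bb'.repeat(20)),        // placeholder hash
]);
console.log(parseTreeEntries(tree));
```

The directory entry carries mode 40000 with no leading zero, which is why the recursive version compares against the string '40000'.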

Step 4: Update CLI Entry Point

src/mygit.js
#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
    add: require('./commands/add'),
    status: require('./commands/status'),
};

// ... rest of main() function

Testing Your Implementation

# Initialize a new repo
mygit init

# Create some files
echo "Hello" > hello.txt
echo "World" > world.txt

# Check status (should show untracked)
mygit status
# Expected:
# Untracked files:
#     hello.txt
#     world.txt

# Stage a file
mygit add hello.txt

# Check status again
mygit status
# Expected:
# Changes to be committed:
#     new file:   hello.txt
# Untracked files:
#     world.txt

# Modify the staged file
echo "Hello Modified" > hello.txt

# Check status
mygit status
# Expected:
# Changes to be committed:
#     new file:   hello.txt
# Changes not staged for commit:
#     modified:   hello.txt

Exercises

Create an rm command that removes files from the index:
// mygit rm <file>
// 1. Remove from index
// 2. Optionally delete from working directory (--cached to keep file)

Support glob patterns in the add command:
mygit add "*.js"
mygit add "src/**/*.ts"
Hint: Use the minimatch npm package.

Show the actual differences in status output:
mygit status -v  # Show diff for staged changes

Key Concepts

Three Trees

Git has three areas: Working Directory, Index (Stage), and HEAD commit

Binary Index

The index is a binary file optimized for fast status checks

Metadata Caching

File timestamps help Git quickly detect changes without hashing

Atomic Updates

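Real Git writes the index atomically (to a lock file that is then renamed into place) to prevent corruption. The simplified write() in this chapter overwrites the file directly.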
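An atomic write can be sketched with a lock file and a rename, which is the general pattern Git follows with .git/index.lock. writeIndexAtomically is a hypothetical helper, not the chapter's Index.write(), and the demo writes placeholder bytes rather than a real index.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// writeIndexAtomically is a hypothetical helper used only for illustration.
function writeIndexAtomically(gitDir, data) {
    const indexPath = path.join(gitDir, 'index');
    const lockPath = indexPath + '.lock';
    // 'wx' fails with EEXIST if the lock file already exists,
    // meaning another process is in the middle of an update.
    const fd = fs.openSync(lockPath, 'wx');
    try {
        fs.writeFileSync(fd, data);
        fs.fsyncSync(fd);
    } catch (err) {
        fs.closeSync(fd);
        fs.unlinkSync(lockPath);
        throw err;
    }
    fs.closeSync(fd);
    // Readers see either the old file or the new one, never a partial write.
    fs.renameSync(lockPath, indexPath);
}

// Demo in a throwaway directory.
const gitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-'));
writeIndexAtomically(gitDir, Buffer.from('first'));
writeIndexAtomically(gitDir, Buffer.from('second'));
console.log(fs.readFileSync(path.join(gitDir, 'index'), 'utf8'));
console.log(fs.existsSync(path.join(gitDir, 'index.lock')));
```

If the process crashes mid-write, the damage is confined to the lock file and the previous index stays intact.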

What’s Next?

In Chapter 4: Commits & History, we’ll implement:
  • The commit command
  • Tree objects from the index
  • Commit objects with parent linking
  • The log command

Interview Deep-Dive

Strong Answer:
  • The staging area enables partial commits: you can modify ten files but only commit three. This lets you create logical, atomic commits that each represent one coherent change, even when your working directory contains multiple unrelated modifications. This is essential for maintaining a clean, reviewable commit history.
  • It enables git add -p (patch mode), where you can stage individual hunks within a file. You might fix a bug and refactor a function in the same file — the index lets you commit the bugfix in one commit and the refactor in another, even though they are in the same file.
  • The index also serves as a performance cache. It stores file metadata (mtime, size, inode number) alongside the blob hash. When git status runs, it compares current file stats against the index’s cached stats. If they match, Git skips the expensive step of re-hashing the file. This “stat cache” is why git status is fast even on repositories with tens of thousands of files.
  • The three-tree model (working directory, index, HEAD) gives Git its flexibility. Each comparison reveals different information: index vs. HEAD shows staged changes; working directory vs. index shows unstaged changes; working directory vs. HEAD shows the total diff. git diff, git diff --staged, and git diff HEAD correspond to these three comparisons.
Follow-up: The index is a binary file. Why not just use a text file like everything else in .git?
Performance. The index needs to be read on nearly every Git operation (status, diff, commit, checkout) and can contain entries for tens of thousands of files. A binary format with fixed-width fields allows direct memory mapping and offset-based access. Parsing a text format would require scanning for delimiters, converting strings to numbers, and handling encoding. The binary format also includes a SHA-1 checksum of the entire index at the end, enabling corruption detection. The trade-off is that the index is not human-readable (unlike commit objects or refs), but since it is a transient working cache rather than part of the permanent history, this is an acceptable cost.
Strong Answer:
  • Git uses a stat cache optimization. The index stores filesystem metadata for each tracked file: mtime (modification time), ctime (status change time), file size, inode number, and device number. When git status runs, it calls stat() on each tracked file and compares the results against the cached values.
  • If mtime, size, and inode all match, Git assumes the file has not changed and skips hashing. This is the fast path that makes git status sub-second on large repositories.
  • If any stat field differs, Git reads the file, computes its SHA-1 hash, and compares it to the hash stored in the index. If the hashes match (the file was touched but not actually changed, like a build system updating mtime), Git may update the cached stat values in the index to avoid re-checking next time. This is called “index refresh.”
  • The edge case is the “racy git” problem: if a file is modified within the same second as the index was written (so mtime matches), Git cannot detect the change via stat alone. Git handles this by flagging entries written in the same second as the index and always re-hashing them on the next status check. This ensures correctness at the cost of a small performance penalty for recently-modified files.
Follow-up: What is the racy git problem, and how would you reproduce it?
Write to a file and immediately run git add and git status in the same second. If the filesystem’s mtime resolution is one second (common on ext3, HFS+), the index write and the file modification have the same timestamp. On the next git status, Git cannot tell whether the file was modified after the index write. Git’s defense is to mark entries whose mtime matches the index write time as “racily clean” and always re-hash them. You can reproduce it with a script that modifies a file and runs git add in a tight loop, checking if git status ever misses a modification. On modern filesystems with nanosecond timestamps (ext4, APFS), the window is much smaller, but the protection logic still exists for portability.
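The "racily clean" rule can be sketched as follows. This is a simplified model that compares whole seconds only; the helper names and the plain-object shape of `stat` (with a precomputed mtimeSeconds) are assumptions for illustration, not Git's actual data structures.

```javascript
// isRacilyClean is a hypothetical helper. `indexMtimeSeconds` is the mtime of
// the index file itself, e.g. Math.floor(fs.statSync(indexPath).mtimeMs / 1000).
function isRacilyClean(entry, indexMtimeSeconds) {
    // Modified at or after the moment the index was written:
    // matching stat data proves nothing, so it cannot be trusted.
    return entry.mtimeSeconds >= indexMtimeSeconds;
}

function needsRehash(entry, stat, indexMtimeSeconds) {
    const statChanged =
        entry.size !== stat.size || entry.mtimeSeconds !== stat.mtimeSeconds;
    return statChanged || isRacilyClean(entry, indexMtimeSeconds);
}

const indexWrittenAt = 1700000100;
const older = { size: 6, mtimeSeconds: 1700000000 }; // staged well before the index write
const racy = { size: 6, mtimeSeconds: 1700000100 };  // same second as the index write

console.log(needsRehash(older, { size: 6, mtimeSeconds: 1700000000 }, indexWrittenAt)); // false
console.log(needsRehash(racy, { size: 6, mtimeSeconds: 1700000100 }, indexWrittenAt));  // true
console.log(needsRehash(older, { size: 9, mtimeSeconds: 1700000050 }, indexWrittenAt)); // true
```

The second case is the interesting one: the stat data matches exactly, yet the entry is still re-hashed because a same-second edit would be invisible to stat alone.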
Strong Answer:
  • The index has a SHA-1 checksum appended at the end. If the file is corrupted, Git detects it on the next read and refuses to proceed with an error like “index file corrupt.” This prevents silent data loss.
  • Recovery is straightforward: delete .git/index and run git reset. This regenerates the index from the HEAD commit’s tree. You lose your staged changes (anything you git added but did not commit), but committed history is unaffected because it is stored in the object database, not the index.
  • If you had important staged changes, you might try to recover them from the object database: git fsck --unreachable lists objects not referenced by any commit, and some of those might be blobs you had staged. You can inspect them with git cat-file -p <hash> and manually recover the content.
  • The broader lesson is that the index is a derived, regenerable cache. The authoritative data is in the object store (blobs, trees, commits) and refs (branches, tags). Losing the index is an inconvenience, not a disaster. This is by design — transient working state should never be the only copy of important data.
Follow-up: Is it safe to copy .git/index between machines or share it in a repository?
No. The index contains host-specific metadata: inode numbers, device IDs, and filesystem timestamps that differ between machines. With an index copied from machine A, none of the cached stat data would match on machine B, so Git could not trust the stat cache and would have to re-hash every file. The index also never needs to be shared: Git does not track anything inside the .git directory, so the index cannot be committed, and it is not transferred during git clone or git fetch. Each clone generates its own index from the HEAD tree.