Chapter 3: Staging & Index

The staging area is Git’s “preparation zone” for commits. It’s what makes Git special compared to other version control systems. In this chapter, we’ll implement the index file and the add and status commands.

Prerequisites: Completed Chapter 2: Object Model
Time: 2-3 hours
Outcome: Working add and status commands

Why a Staging Area?

Other VCS systems commit all changes at once. Git’s staging area lets you:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        GIT'S THREE-TREE ARCHITECTURE                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   WORKING DIRECTORY        INDEX (STAGING)           HEAD (LAST COMMIT)     │
│   ─────────────────        ───────────────           ─────────────────      │
│                                                                              │
│   ┌─────────────┐          ┌─────────────┐          ┌─────────────┐         │
│   │ file.txt    │   add    │ file.txt    │  commit  │ file.txt    │         │
│   │ (edited)    │ ───────► │ (staged)    │ ───────► │ (committed) │         │
│   └─────────────┘          └─────────────┘          └─────────────┘         │
│                                                                              │
│   • Your actual files      • Snapshot for next      • Previous commit       │
│   • What you edit            commit                 • Immutable             │
│   • Not tracked by Git     • Binary index file      • Object in store       │
│                              (.git/index)                                    │
│                                                                              │
│   BENEFITS:                                                                  │
│   • Stage partial changes from a file                                       │
│   • Review what will be committed                                           │
│   • Build commits incrementally                                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

The Index File Format

The index is a binary file at .git/index. Let’s understand its format:

INDEX FILE FORMAT (Version 2)
==============================

HEADER (12 bytes)
├── Signature: "DIRC" (4 bytes) - stands for "DirCache"
├── Version: 2 (4 bytes, big-endian)
└── Number of entries (4 bytes, big-endian)

ENTRIES (variable)
├── Entry 1
│   ├── ctime seconds (4 bytes)
│   ├── ctime nanoseconds (4 bytes)
│   ├── mtime seconds (4 bytes)
│   ├── mtime nanoseconds (4 bytes)
│   ├── dev (4 bytes)
│   ├── ino (4 bytes)
│   ├── mode (4 bytes) - e.g., 100644
│   ├── uid (4 bytes)
│   ├── gid (4 bytes)
│   ├── file size (4 bytes)
│   ├── SHA-1 hash (20 bytes)
│   ├── flags (2 bytes) - includes name length
│   ├── name (variable, null-terminated)
│   └── padding (1-8 bytes to align to 8 bytes)
├── Entry 2
│   └── ...
└── Entry N

CHECKSUM (20 bytes)
└── SHA-1 of everything above

The index format is complex because Git optimizes for speed. File metadata (mtime, size) helps Git quickly detect changes without reading file contents.

Implementation

Step 1: Index Parser/Writer

src/utils/index.js

const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

/**
 * Represents a single index entry
 */
class IndexEntry {
    constructor({
        ctimeSeconds = 0,
        ctimeNanoseconds = 0,
        mtimeSeconds = 0,
        mtimeNanoseconds = 0,
        dev = 0,
        ino = 0,
        mode = 0o100644,
        uid = 0,
        gid = 0,
        size = 0,
        hash = '',
        flags = 0,
        name = ''
    }) {
        this.ctimeSeconds = ctimeSeconds;
        this.ctimeNanoseconds = ctimeNanoseconds;
        this.mtimeSeconds = mtimeSeconds;
        this.mtimeNanoseconds = mtimeNanoseconds;
        this.dev = dev;
        this.ino = ino;
        this.mode = mode;
        this.uid = uid;
        this.gid = gid;
        this.size = size;
        this.hash = hash;
        this.flags = flags;
        this.name = name;
    }
    
    /**
     * Create an entry from a file and its hash
     */
    static fromFile(filePath, hash, repoRoot) {
        const stat = fs.statSync(filePath);
        const relativePath = path.relative(repoRoot, filePath)
            .split(path.sep).join('/'); // Normalize to forward slashes
        
        return new IndexEntry({
            ctimeSeconds: Math.floor(stat.ctimeMs / 1000),
            ctimeNanoseconds: (stat.ctimeMs % 1000) * 1000000,
            mtimeSeconds: Math.floor(stat.mtimeMs / 1000),
            mtimeNanoseconds: (stat.mtimeMs % 1000) * 1000000,
            dev: stat.dev,
            ino: stat.ino,
            mode: stat.mode,
            uid: stat.uid,
            gid: stat.gid,
            size: stat.size,
            hash: hash,
            flags: Math.min(relativePath.length, 0xFFF), // 12-bit name length
            name: relativePath
        });
    }
}

/**
 * The Git index (staging area)
 */
class Index {
    constructor() {
        this.version = 2;
        this.entries = new Map(); // name -> IndexEntry
    }
    
    /**
     * Add or update an entry
     */
    addEntry(entry) {
        this.entries.set(entry.name, entry);
    }
    
    /**
     * Remove an entry
     */
    removeEntry(name) {
        this.entries.delete(name);
    }
    
    /**
     * Get an entry by name
     */
    getEntry(name) {
        return this.entries.get(name);
    }
    
    /**
     * Get all entries sorted by name
     */
    getEntries() {
        return Array.from(this.entries.values())
            .sort((a, b) => a.name.localeCompare(b.name));
    }
    
    /**
     * Read index from .git/index file
     */
    static read(gitDir) {
        const indexPath = path.join(gitDir, 'index');
        const index = new Index();
        
        if (!fs.existsSync(indexPath)) {
            return index; // Empty index
        }
        
        const data = fs.readFileSync(indexPath);
        let offset = 0;
        
        // Parse header
        const signature = data.slice(0, 4).toString();
        if (signature !== 'DIRC') {
            throw new Error('Invalid index file signature');
        }
        offset += 4;
        
        const version = data.readUInt32BE(offset);
        if (version !== 2) {
            throw new Error(`Unsupported index version: ${version}`);
        }
        offset += 4;
        
        const entryCount = data.readUInt32BE(offset);
        offset += 4;
        
        // Parse entries
        for (let i = 0; i < entryCount; i++) {
            const entry = new IndexEntry({});
            
            entry.ctimeSeconds = data.readUInt32BE(offset); offset += 4;
            entry.ctimeNanoseconds = data.readUInt32BE(offset); offset += 4;
            entry.mtimeSeconds = data.readUInt32BE(offset); offset += 4;
            entry.mtimeNanoseconds = data.readUInt32BE(offset); offset += 4;
            entry.dev = data.readUInt32BE(offset); offset += 4;
            entry.ino = data.readUInt32BE(offset); offset += 4;
            entry.mode = data.readUInt32BE(offset); offset += 4;
            entry.uid = data.readUInt32BE(offset); offset += 4;
            entry.gid = data.readUInt32BE(offset); offset += 4;
            entry.size = data.readUInt32BE(offset); offset += 4;
            
            // SHA-1 hash (20 bytes)
            entry.hash = data.slice(offset, offset + 20).toString('hex');
            offset += 20;
            
            // Flags (2 bytes, includes name length in lower 12 bits)
            entry.flags = data.readUInt16BE(offset);
            offset += 2;
            
            // Name (null-terminated)
            const nameLength = entry.flags & 0xFFF;
            const nullIndex = data.indexOf(0, offset);
            entry.name = data.slice(offset, nullIndex).toString();
            offset = nullIndex + 1;
            
            // Skip padding (align to 8 bytes)
            // Entry size so far: 62 bytes + name length + 1 (null byte)
            const entrySize = 62 + entry.name.length + 1;
            const padding = (8 - (entrySize % 8)) % 8;
            offset += padding;
            
            index.addEntry(entry);
        }
        
        return index;
    }
    
    /**
     * Write index to .git/index file
     */
    write(gitDir) {
        const entries = this.getEntries();
        const chunks = [];
        
        // Header
        const header = Buffer.alloc(12);
        header.write('DIRC', 0);
        header.writeUInt32BE(this.version, 4);
        header.writeUInt32BE(entries.length, 8);
        chunks.push(header);
        
        // Entries
        for (const entry of entries) {
            const entryData = Buffer.alloc(62); // Fixed fields
            let offset = 0;
            
            entryData.writeUInt32BE(entry.ctimeSeconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.ctimeNanoseconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.mtimeSeconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.mtimeNanoseconds, offset); offset += 4;
            entryData.writeUInt32BE(entry.dev, offset); offset += 4;
            entryData.writeUInt32BE(entry.ino, offset); offset += 4;
            entryData.writeUInt32BE(entry.mode, offset); offset += 4;
            entryData.writeUInt32BE(entry.uid, offset); offset += 4;
            entryData.writeUInt32BE(entry.gid, offset); offset += 4;
            entryData.writeUInt32BE(entry.size, offset); offset += 4;
            
            // SHA-1 hash
            const hashBuffer = Buffer.from(entry.hash, 'hex');
            hashBuffer.copy(entryData, offset);
            offset += 20;
            
            // Flags
            entryData.writeUInt16BE(entry.flags, offset);
            
            chunks.push(entryData);
            
            // Name (null-terminated with padding)
            const nameBuffer = Buffer.from(entry.name + '\0');
            chunks.push(nameBuffer);
            
            // Padding to 8-byte boundary
            const entrySize = 62 + entry.name.length + 1;
            const padding = (8 - (entrySize % 8)) % 8;
            if (padding > 0) {
                chunks.push(Buffer.alloc(padding, 0));
            }
        }
        
        // Combine all chunks
        const content = Buffer.concat(chunks);
        
        // Add checksum
        const checksum = crypto.createHash('sha1').update(content).digest();
        const finalData = Buffer.concat([content, checksum]);
        
        // Write to file
        const indexPath = path.join(gitDir, 'index');
        fs.writeFileSync(indexPath, finalData);
    }
}

module.exports = { Index, IndexEntry };

Step 2: Implement the Add Command

src/commands/add.js

const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index, IndexEntry } = require('../utils/index');

/**
 * add - Add file contents to the index
 * 
 * Usage:
 *   mygit add <file>...
 *   mygit add .           # Add all files
 *   mygit add -A          # Add all files (including deletions)
 */
function execute(args) {
    if (args.length === 0) {
        throw new Error('Nothing specified, nothing added.');
    }
    
    const gitDir = requireGitDir();
    const repoRoot = getRepoRoot(gitDir);
    const index = Index.read(gitDir);
    
    // Collect all files to add
    const filesToAdd = [];
    
    for (const arg of args) {
        if (arg === '-A' || arg === '--all' || arg === '.') {
            // Add all tracked and untracked files
            collectAllFiles(repoRoot, repoRoot, filesToAdd);
        } else {
            const filePath = path.resolve(arg);
            
            if (!fs.existsSync(filePath)) {
                throw new Error(`fatal: pathspec '${arg}' did not match any files`);
            }
            
            if (fs.statSync(filePath).isDirectory()) {
                collectAllFiles(filePath, repoRoot, filesToAdd);
            } else {
                filesToAdd.push(filePath);
            }
        }
    }
    
    // Add each file to the index
    for (const filePath of filesToAdd) {
        addFileToIndex(filePath, repoRoot, gitDir, index);
    }
    
    // Write updated index
    index.write(gitDir);
}

/**
 * Recursively collect all files in a directory
 */
function collectAllFiles(dir, repoRoot, files) {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    
    for (const entry of entries) {
        const fullPath = path.join(dir, entry.name);
        
        // Skip .git directory
        if (entry.name === '.git') continue;
        
        // Skip common ignored patterns (simplified)
        if (entry.name === 'node_modules') continue;
        if (entry.name.startsWith('.')) continue;
        
        if (entry.isDirectory()) {
            collectAllFiles(fullPath, repoRoot, files);
        } else if (entry.isFile()) {
            files.push(fullPath);
        }
    }
}

/**
 * Add a single file to the index
 */
function addFileToIndex(filePath, repoRoot, gitDir, index) {
    // Read file content
    const content = fs.readFileSync(filePath);
    
    // Hash and store as blob
    const { hash, data } = hashObject(content, 'blob');
    writeObject(gitDir, hash, data);
    
    // Create index entry
    const entry = IndexEntry.fromFile(filePath, hash, repoRoot);
    index.addEntry(entry);
}

module.exports = { execute };

Step 3: Implement the Status Command

src/commands/status.js

const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot, findGitDir } = require('../utils/paths');
const { hashObject, readObject } = require('../utils/objects');
const { Index } = require('../utils/index');

/**
 * status - Show the working tree status
 * 
 * Compares:
 * 1. HEAD commit tree vs Index (staged changes)
 * 2. Index vs Working directory (unstaged changes)
 * 3. Working directory vs Index (untracked files)
 */
function execute(args) {
    const gitDir = requireGitDir();
    const repoRoot = getRepoRoot(gitDir);
    const index = Index.read(gitDir);
    
    // Get current branch
    const branch = getCurrentBranch(gitDir);
    console.log(`On branch ${branch}`);
    console.log();
    
    // Get HEAD tree (if exists)
    const headTree = getHeadTree(gitDir);
    
    // Categorize files
    const staged = [];      // Changes between HEAD and index
    const unstaged = [];    // Changes between index and working dir
    const untracked = [];   // Files not in index
    
    // Check staged changes (HEAD vs Index)
    const indexEntries = index.getEntries();
    const headEntries = headTree ? parseTreeRecursive(gitDir, headTree, '') : new Map();
    
    for (const entry of indexEntries) {
        const headHash = headEntries.get(entry.name);
        
        if (!headHash) {
            staged.push({ name: entry.name, status: 'new file' });
        } else if (headHash !== entry.hash) {
            staged.push({ name: entry.name, status: 'modified' });
        }
    }
    
    // Check for deleted files (in HEAD but not in index)
    for (const [name, hash] of headEntries) {
        if (!index.getEntry(name)) {
            staged.push({ name, status: 'deleted' });
        }
    }
    
    // Check unstaged changes (Index vs Working Directory)
    for (const entry of indexEntries) {
        const filePath = path.join(repoRoot, entry.name);
        
        if (!fs.existsSync(filePath)) {
            unstaged.push({ name: entry.name, status: 'deleted' });
        } else {
            const content = fs.readFileSync(filePath);
            const { hash } = hashObject(content, 'blob');
            
            if (hash !== entry.hash) {
                unstaged.push({ name: entry.name, status: 'modified' });
            }
        }
    }
    
    // Check for untracked files
    collectUntracked(repoRoot, repoRoot, index, untracked);
    
    // Print results
    if (staged.length > 0) {
        console.log('Changes to be committed:');
        console.log('  (use "mygit restore --staged <file>..." to unstage)');
        console.log();
        for (const { name, status } of staged) {
            console.log(`\t\x1b[32m${status}:   ${name}\x1b[0m`);
        }
        console.log();
    }
    
    if (unstaged.length > 0) {
        console.log('Changes not staged for commit:');
        console.log('  (use "mygit add <file>..." to update what will be committed)');
        console.log();
        for (const { name, status } of unstaged) {
            console.log(`\t\x1b[31m${status}:   ${name}\x1b[0m`);
        }
        console.log();
    }
    
    if (untracked.length > 0) {
        console.log('Untracked files:');
        console.log('  (use "mygit add <file>..." to include in what will be committed)');
        console.log();
        for (const name of untracked) {
            console.log(`\t\x1b[31m${name}\x1b[0m`);
        }
        console.log();
    }
    
    if (staged.length === 0 && unstaged.length === 0 && untracked.length === 0) {
        console.log('nothing to commit, working tree clean');
    }
}

/**
 * Get the current branch name
 */
function getCurrentBranch(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: refs/heads/')) {
        return headContent.slice('ref: refs/heads/'.length);
    }
    
    // Detached HEAD
    return headContent.slice(0, 7);
}

/**
 * Get the tree hash from HEAD commit
 */
function getHeadTree(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    let commitHash;
    
    if (headContent.startsWith('ref: ')) {
        // Symbolic reference
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (!fs.existsSync(refFile)) {
            return null; // No commits yet
        }
        
        commitHash = fs.readFileSync(refFile, 'utf8').trim();
    } else {
        commitHash = headContent;
    }
    
    // Read commit object to get tree hash
    try {
        const { content } = readObject(gitDir, commitHash);
        const lines = content.toString().split('\n');
        
        for (const line of lines) {
            if (line.startsWith('tree ')) {
                return line.slice(5);
            }
        }
    } catch (e) {
        return null;
    }
    
    return null;
}

/**
 * Parse a tree object recursively, returning Map of path -> hash
 */
function parseTreeRecursive(gitDir, treeHash, prefix) {
    const result = new Map();
    const { content } = readObject(gitDir, treeHash);
    
    let offset = 0;
    while (offset < content.length) {
        const spaceIndex = content.indexOf(0x20, offset);
        const mode = content.slice(offset, spaceIndex).toString();
        
        const nullIndex = content.indexOf(0, spaceIndex);
        const name = content.slice(spaceIndex + 1, nullIndex).toString();
        
        const hashBytes = content.slice(nullIndex + 1, nullIndex + 21);
        const hash = hashBytes.toString('hex');
        
        offset = nullIndex + 21;
        
        const fullPath = prefix ? `${prefix}/${name}` : name;
        
        if (mode === '40000') {
            // Recurse into subtree
            const subEntries = parseTreeRecursive(gitDir, hash, fullPath);
            for (const [subPath, subHash] of subEntries) {
                result.set(subPath, subHash);
            }
        } else {
            result.set(fullPath, hash);
        }
    }
    
    return result;
}

/**
 * Collect untracked files
 */
function collectUntracked(dir, repoRoot, index, untracked) {
    const entries = fs.readdirSync(dir, { withFileTypes: true });
    
    for (const entry of entries) {
        const fullPath = path.join(dir, entry.name);
        const relativePath = path.relative(repoRoot, fullPath)
            .split(path.sep).join('/');
        
        // Skip .git
        if (entry.name === '.git') continue;
        if (entry.name === 'node_modules') continue;
        
        if (entry.isDirectory()) {
            collectUntracked(fullPath, repoRoot, index, untracked);
        } else if (entry.isFile()) {
            if (!index.getEntry(relativePath)) {
                untracked.push(relativePath);
            }
        }
    }
}

module.exports = { execute };

Step 4: Update CLI Entry Point

src/mygit.js

#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
    add: require('./commands/add'),
    status: require('./commands/status'),
};

// ... rest of main() function

Testing Your Implementation

# Initialize a new repo
mygit init

# Create some files
echo "Hello" > hello.txt
echo "World" > world.txt

# Check status (should show untracked)
mygit status
# Expected:
# Untracked files:
#     hello.txt
#     world.txt

# Stage a file
mygit add hello.txt

# Check status again
mygit status
# Expected:
# Changes to be committed:
#     new file:   hello.txt
# Untracked files:
#     world.txt

# Modify the staged file
echo "Hello Modified" > hello.txt

# Check status
mygit status
# Expected:
# Changes to be committed:
#     new file:   hello.txt
# Changes not staged for commit:
#     modified:   hello.txt

Exercises

Exercise 1: Implement rm command

Create a rm command that removes files from the index:

// mygit rm <file>
// 1. Remove from index
// 2. Optionally delete from working directory (--cached to keep file)

Exercise 2: Add glob patterns

Support glob patterns in the add command:

mygit add "*.js"
mygit add "src/**/*.ts"

Hint: Use the minimatch npm package.

Exercise 3: Implement diff

Show the actual differences in status output:

mygit status -v  # Show diff for staged changes

Key Concepts

Three Trees

Git has three areas: Working Directory, Index (Stage), and HEAD commit

Binary Index

The index is a binary file optimized for fast status checks

Metadata Caching

File timestamps help Git quickly detect changes without hashing

Atomic Updates

The index is written atomically to prevent corruption

What’s Next?

In Chapter 4: Commits & History, we’ll implement:

The commit command
Tree objects from the index
Commit objects with parent linking
The log command

Next: Commits & History

Learn how Git creates commits and maintains history

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Chapter 3: Staging & Index

​Why a Staging Area?

​The Index File Format

​Implementation

​Step 1: Index Parser/Writer

​Step 2: Implement the Add Command

​Step 3: Implement the Status Command

​Step 4: Update CLI Entry Point

​Testing Your Implementation

​Exercises

​Key Concepts

Three Trees

Binary Index

Metadata Caching

Atomic Updates

​What’s Next?

Next: Commits & History

Chapter 3: Staging & Index

Why a Staging Area?

The Index File Format

Implementation

Step 1: Index Parser/Writer

Step 2: Implement the Add Command

Step 3: Implement the Status Command

Step 4: Update CLI Entry Point

Testing Your Implementation

Exercises

Key Concepts

What’s Next?