Chapter 3: Staging & Index
The staging area is Git’s “preparation zone” for commits. It’s one of the features that sets Git apart from other version control systems. In this chapter, we’ll implement the index file and the add and status commands.
Prerequisites: Completed Chapter 2: Object Model
Time: 2-3 hours
Outcome: Working add and status commands
Why a Staging Area?
Most other version control systems commit whatever has changed, all at once. Git’s staging area adds an intermediate step that gives you control over exactly what goes into the next commit:
┌─────────────────────────────────────────────────────────────────────────────┐
│ GIT'S THREE-TREE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ WORKING DIRECTORY INDEX (STAGING) HEAD (LAST COMMIT) │
│ ───────────────── ─────────────── ───────────────── │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ file.txt │ add │ file.txt │ commit │ file.txt │ │
│ │ (edited) │ ───────► │ (staged) │ ───────► │ (committed) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ • Your actual files • Snapshot for next • Previous commit │
│ • What you edit commit • Immutable │
│ • Not tracked by Git • Binary index file • Object in store │
│ (.git/index) │
│ │
│ BENEFITS: │
│ • Stage partial changes from a file │
│ • Review what will be committed │
│ • Build commits incrementally │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Index File Format
The index is a binary file at .git/index. Let’s walk through its format:
INDEX FILE FORMAT (Version 2)
==============================
HEADER (12 bytes)
├── Signature: "DIRC" (4 bytes) - stands for "DirCache"
├── Version: 2 (4 bytes, big-endian)
└── Number of entries (4 bytes, big-endian)
ENTRIES (variable)
├── Entry 1
│ ├── ctime seconds (4 bytes)
│ ├── ctime nanoseconds (4 bytes)
│ ├── mtime seconds (4 bytes)
│ ├── mtime nanoseconds (4 bytes)
│ ├── dev (4 bytes)
│ ├── ino (4 bytes)
│ ├── mode (4 bytes) - e.g., 100644
│ ├── uid (4 bytes)
│ ├── gid (4 bytes)
│ ├── file size (4 bytes)
│ ├── SHA-1 hash (20 bytes)
│ ├── flags (2 bytes) - includes name length
│ ├── name (variable, null-terminated)
│ └── padding (1-8 bytes to align to 8 bytes)
├── Entry 2
│ └── ...
└── Entry N
CHECKSUM (20 bytes)
└── SHA-1 of everything above
The index format is complex because Git optimizes for speed. File metadata (mtime, size) helps Git quickly detect changes without reading file contents.
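Before writing a full parser, it helps to peek at a real index. The throwaway script below is not part of mygit; it simply reads the 12-byte header of whatever repository you run it in and prints the signature, version, and entry count.
const fs = require('fs');

// Read the raw index of the repository in the current directory.
const data = fs.readFileSync('.git/index');

const signature = data.slice(0, 4).toString(); // should be "DIRC"
const version = data.readUInt32BE(4);          // usually 2; versions 3 and 4 also exist
const entryCount = data.readUInt32BE(8);       // number of staged entries

console.log({ signature, version, entryCount });
The entry count should line up with the number of tracked files that git ls-files reports for the same repository.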
Implementation
Step 1: Index Parser/Writer
src/utils/index.js
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');
/**
* Represents a single index entry
*/
class IndexEntry {
constructor({
ctimeSeconds = 0,
ctimeNanoseconds = 0,
mtimeSeconds = 0,
mtimeNanoseconds = 0,
dev = 0,
ino = 0,
mode = 0o100644,
uid = 0,
gid = 0,
size = 0,
hash = '',
flags = 0,
name = ''
}) {
this.ctimeSeconds = ctimeSeconds;
this.ctimeNanoseconds = ctimeNanoseconds;
this.mtimeSeconds = mtimeSeconds;
this.mtimeNanoseconds = mtimeNanoseconds;
this.dev = dev;
this.ino = ino;
this.mode = mode;
this.uid = uid;
this.gid = gid;
this.size = size;
this.hash = hash;
this.flags = flags;
this.name = name;
}
/**
* Create an entry from a file and its hash
*/
static fromFile(filePath, hash, repoRoot) {
const stat = fs.statSync(filePath);
const relativePath = path.relative(repoRoot, filePath)
.split(path.sep).join('/'); // Normalize to forward slashes
return new IndexEntry({
ctimeSeconds: Math.floor(stat.ctimeMs / 1000),
ctimeNanoseconds: (stat.ctimeMs % 1000) * 1000000,
mtimeSeconds: Math.floor(stat.mtimeMs / 1000),
mtimeNanoseconds: (stat.mtimeMs % 1000) * 1000000,
dev: stat.dev >>> 0, // keep within 32 bits; some platforms report larger device/inode numbers
ino: stat.ino >>> 0,
mode: stat.mode,
uid: stat.uid,
gid: stat.gid,
size: stat.size,
hash: hash,
flags: Math.min(Buffer.byteLength(relativePath), 0xFFF), // lower 12 bits hold the name length in bytes
name: relativePath
});
}
}
/**
* The Git index (staging area)
*/
class Index {
constructor() {
this.version = 2;
this.entries = new Map(); // name -> IndexEntry
}
/**
* Add or update an entry
*/
addEntry(entry) {
this.entries.set(entry.name, entry);
}
/**
* Remove an entry
*/
removeEntry(name) {
this.entries.delete(name);
}
/**
* Get an entry by name
*/
getEntry(name) {
return this.entries.get(name);
}
/**
* Get all entries sorted by name
*/
getEntries() {
return Array.from(this.entries.values())
.sort((a, b) => a.name.localeCompare(b.name));
}
/**
* Read index from .git/index file
*/
static read(gitDir) {
const indexPath = path.join(gitDir, 'index');
const index = new Index();
if (!fs.existsSync(indexPath)) {
return index; // Empty index
}
const data = fs.readFileSync(indexPath);
let offset = 0;
// Parse header
const signature = data.slice(0, 4).toString();
if (signature !== 'DIRC') {
throw new Error('Invalid index file signature');
}
offset += 4;
const version = data.readUInt32BE(offset);
if (version !== 2) {
throw new Error(`Unsupported index version: ${version}`);
}
offset += 4;
const entryCount = data.readUInt32BE(offset);
offset += 4;
// Parse entries
for (let i = 0; i < entryCount; i++) {
const entry = new IndexEntry({});
entry.ctimeSeconds = data.readUInt32BE(offset); offset += 4;
entry.ctimeNanoseconds = data.readUInt32BE(offset); offset += 4;
entry.mtimeSeconds = data.readUInt32BE(offset); offset += 4;
entry.mtimeNanoseconds = data.readUInt32BE(offset); offset += 4;
entry.dev = data.readUInt32BE(offset); offset += 4;
entry.ino = data.readUInt32BE(offset); offset += 4;
entry.mode = data.readUInt32BE(offset); offset += 4;
entry.uid = data.readUInt32BE(offset); offset += 4;
entry.gid = data.readUInt32BE(offset); offset += 4;
entry.size = data.readUInt32BE(offset); offset += 4;
// SHA-1 hash (20 bytes)
entry.hash = data.slice(offset, offset + 20).toString('hex');
offset += 20;
// Flags (2 bytes, includes name length in lower 12 bits)
entry.flags = data.readUInt16BE(offset);
offset += 2;
// Name (null-terminated; its byte length is also stored in the lower 12 bits of flags)
const nullIndex = data.indexOf(0, offset);
const nameByteLength = nullIndex - offset;
entry.name = data.slice(offset, nullIndex).toString();
offset = nullIndex + 1;
// Skip padding (entries are NUL-padded to an 8-byte boundary)
// Entry size so far: 62 fixed bytes + name bytes + 1 (null terminator)
const entrySize = 62 + nameByteLength + 1;
const padding = (8 - (entrySize % 8)) % 8;
offset += padding;
index.addEntry(entry);
}
return index;
}
/**
* Write index to .git/index file
*/
write(gitDir) {
const entries = this.getEntries();
const chunks = [];
// Header
const header = Buffer.alloc(12);
header.write('DIRC', 0);
header.writeUInt32BE(this.version, 4);
header.writeUInt32BE(entries.length, 8);
chunks.push(header);
// Entries
for (const entry of entries) {
const entryData = Buffer.alloc(62); // Fixed fields
let offset = 0;
entryData.writeUInt32BE(entry.ctimeSeconds, offset); offset += 4;
entryData.writeUInt32BE(entry.ctimeNanoseconds, offset); offset += 4;
entryData.writeUInt32BE(entry.mtimeSeconds, offset); offset += 4;
entryData.writeUInt32BE(entry.mtimeNanoseconds, offset); offset += 4;
entryData.writeUInt32BE(entry.dev, offset); offset += 4;
entryData.writeUInt32BE(entry.ino, offset); offset += 4;
entryData.writeUInt32BE(entry.mode, offset); offset += 4;
entryData.writeUInt32BE(entry.uid, offset); offset += 4;
entryData.writeUInt32BE(entry.gid, offset); offset += 4;
entryData.writeUInt32BE(entry.size, offset); offset += 4;
// SHA-1 hash
const hashBuffer = Buffer.from(entry.hash, 'hex');
hashBuffer.copy(entryData, offset);
offset += 20;
// Flags
entryData.writeUInt16BE(entry.flags, offset);
chunks.push(entryData);
// Name (null-terminated)
const nameBuffer = Buffer.from(entry.name + '\0');
chunks.push(nameBuffer);
// Pad with NUL bytes to an 8-byte boundary (use byte length, not character count)
const entrySize = 62 + nameBuffer.length;
const padding = (8 - (entrySize % 8)) % 8;
if (padding > 0) {
chunks.push(Buffer.alloc(padding, 0));
}
}
// Combine all chunks
const content = Buffer.concat(chunks);
// Add checksum
const checksum = crypto.createHash('sha1').update(content).digest();
const finalData = Buffer.concat([content, checksum]);
// Write to file
const indexPath = path.join(gitDir, 'index');
fs.writeFileSync(indexPath, finalData);
}
}
module.exports = { Index, IndexEntry };
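A quick way to exercise both halves of this module is a round trip: read an index, write it into a scratch directory, read it back, and compare the entries. The sketch below assumes a version-2 index (the only version our reader accepts) and is run from the repository root.
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Index } = require('./src/utils/index');

// Read the current repository's index (an empty Index if the file doesn't exist yet).
const original = Index.read('.git');

// Write it into a scratch "git dir", then read it back.
const scratchDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mygit-index-'));
original.write(scratchDir);
const reloaded = Index.read(scratchDir);

// Every entry should survive with the same hash.
for (const entry of original.getEntries()) {
  const copy = reloaded.getEntry(entry.name);
  console.log(copy && copy.hash === entry.hash ? 'ok      ' : 'MISMATCH', entry.name);
}
Comparing entries rather than raw bytes is deliberate: a real Git index may carry extensions (such as the cached tree), which our reader skips and our writer never reproduces.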
Step 2: Implement the Add Command
src/commands/add.js
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index, IndexEntry } = require('../utils/index');
/**
* add - Add file contents to the index
*
* Usage:
* mygit add <file>...
* mygit add . # Add all files
* mygit add -A # Add all files (including deletions)
*/
function execute(args) {
if (args.length === 0) {
throw new Error('Nothing specified, nothing added.');
}
const gitDir = requireGitDir();
const repoRoot = getRepoRoot(gitDir);
const index = Index.read(gitDir);
// Collect all files to add
const filesToAdd = [];
for (const arg of args) {
if (arg === '-A' || arg === '--all' || arg === '.') {
// Add all tracked and untracked files
collectAllFiles(repoRoot, repoRoot, filesToAdd);
} else {
const filePath = path.resolve(arg);
if (!fs.existsSync(filePath)) {
throw new Error(`fatal: pathspec '${arg}' did not match any files`);
}
if (fs.statSync(filePath).isDirectory()) {
collectAllFiles(filePath, repoRoot, filesToAdd);
} else {
filesToAdd.push(filePath);
}
}
}
// Add each file to the index
for (const filePath of filesToAdd) {
addFileToIndex(filePath, repoRoot, gitDir, index);
}
// Write updated index
index.write(gitDir);
}
/**
* Recursively collect all files in a directory
*/
function collectAllFiles(dir, repoRoot, files) {
const entries = fs.readdirSync(dir, { withFileTypes: true });
for (const entry of entries) {
const fullPath = path.join(dir, entry.name);
// Skip .git directory
if (entry.name === '.git') continue;
// Skip common ignored patterns (simplified)
if (entry.name === 'node_modules') continue;
if (entry.name.startsWith('.')) continue;
if (entry.isDirectory()) {
collectAllFiles(fullPath, repoRoot, files);
} else if (entry.isFile()) {
files.push(fullPath);
}
}
}
/**
* Add a single file to the index
*/
function addFileToIndex(filePath, repoRoot, gitDir, index) {
// Read file content
const content = fs.readFileSync(filePath);
// Hash and store as blob
const { hash, data } = hashObject(content, 'blob');
writeObject(gitDir, hash, data);
// Create index entry
const entry = IndexEntry.fromFile(filePath, hash, repoRoot);
index.addEntry(entry);
}
module.exports = { execute };
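add.js leans on hashObject and writeObject from Chapter 2’s src/utils/objects.js (status below also uses readObject from the same module). If your signatures drifted, this is the shape the code above assumes — a sketch only, written against the standard Git object layout (zlib-deflated, stored under .git/objects/<first two hex chars>/<remaining 38>); your Chapter 2 code remains the source of truth.
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');
const crypto = require('crypto');

// Prefix the content with "<type> <size>\0" and hash the result with SHA-1.
function hashObject(content, type) {
  const header = Buffer.from(`${type} ${content.length}\0`);
  const data = Buffer.concat([header, content]);
  const hash = crypto.createHash('sha1').update(data).digest('hex');
  return { hash, data };
}

// Store the deflated object under .git/objects/xx/yyyy... (skip if it already exists,
// since objects are content-addressed and never change).
function writeObject(gitDir, hash, data) {
  const dir = path.join(gitDir, 'objects', hash.slice(0, 2));
  const file = path.join(dir, hash.slice(2));
  if (fs.existsSync(file)) return;
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(file, zlib.deflateSync(data));
}

module.exports = { hashObject, writeObject };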
Step 3: Implement the Status Command
src/commands/status.js
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, readObject } = require('../utils/objects');
const { Index } = require('../utils/index');
/**
* status - Show the working tree status
*
* Compares:
* 1. HEAD commit tree vs Index (staged changes)
* 2. Index vs Working directory (unstaged changes)
* 3. Working directory vs Index (untracked files)
*/
function execute(args) {
const gitDir = requireGitDir();
const repoRoot = getRepoRoot(gitDir);
const index = Index.read(gitDir);
// Get current branch
const branch = getCurrentBranch(gitDir);
console.log(`On branch ${branch}`);
console.log();
// Get HEAD tree (if exists)
const headTree = getHeadTree(gitDir);
// Categorize files
const staged = []; // Changes between HEAD and index
const unstaged = []; // Changes between index and working dir
const untracked = []; // Files not in index
// Check staged changes (HEAD vs Index)
const indexEntries = index.getEntries();
const headEntries = headTree ? parseTreeRecursive(gitDir, headTree, '') : new Map();
for (const entry of indexEntries) {
const headHash = headEntries.get(entry.name);
if (!headHash) {
staged.push({ name: entry.name, status: 'new file' });
} else if (headHash !== entry.hash) {
staged.push({ name: entry.name, status: 'modified' });
}
}
// Check for deleted files (in HEAD but not in index)
for (const name of headEntries.keys()) {
if (!index.getEntry(name)) {
staged.push({ name, status: 'deleted' });
}
}
// Check unstaged changes (Index vs Working Directory)
for (const entry of indexEntries) {
const filePath = path.join(repoRoot, entry.name);
if (!fs.existsSync(filePath)) {
unstaged.push({ name: entry.name, status: 'deleted' });
} else {
const content = fs.readFileSync(filePath);
const { hash } = hashObject(content, 'blob');
if (hash !== entry.hash) {
unstaged.push({ name: entry.name, status: 'modified' });
}
}
}
// Check for untracked files
collectUntracked(repoRoot, repoRoot, index, untracked);
// Print results
if (staged.length > 0) {
console.log('Changes to be committed:');
console.log(' (use "mygit restore --staged <file>..." to unstage)');
console.log();
for (const { name, status } of staged) {
console.log(`\t\x1b[32m${status}: ${name}\x1b[0m`);
}
console.log();
}
if (unstaged.length > 0) {
console.log('Changes not staged for commit:');
console.log(' (use "mygit add <file>..." to update what will be committed)');
console.log();
for (const { name, status } of unstaged) {
console.log(`\t\x1b[31m${status}: ${name}\x1b[0m`);
}
console.log();
}
if (untracked.length > 0) {
console.log('Untracked files:');
console.log(' (use "mygit add <file>..." to include in what will be committed)');
console.log();
for (const name of untracked) {
console.log(`\t\x1b[31m${name}\x1b[0m`);
}
console.log();
}
if (staged.length === 0 && unstaged.length === 0 && untracked.length === 0) {
console.log('nothing to commit, working tree clean');
}
}
/**
* Get the current branch name
*/
function getCurrentBranch(gitDir) {
const headPath = path.join(gitDir, 'HEAD');
const headContent = fs.readFileSync(headPath, 'utf8').trim();
if (headContent.startsWith('ref: refs/heads/')) {
return headContent.slice('ref: refs/heads/'.length);
}
// Detached HEAD
return headContent.slice(0, 7);
}
/**
* Get the tree hash from HEAD commit
*/
function getHeadTree(gitDir) {
const headPath = path.join(gitDir, 'HEAD');
const headContent = fs.readFileSync(headPath, 'utf8').trim();
let commitHash;
if (headContent.startsWith('ref: ')) {
// Symbolic reference
const refPath = headContent.slice(5);
const refFile = path.join(gitDir, refPath);
if (!fs.existsSync(refFile)) {
return null; // No commits yet
}
commitHash = fs.readFileSync(refFile, 'utf8').trim();
} else {
commitHash = headContent;
}
// Read commit object to get tree hash
try {
const { content } = readObject(gitDir, commitHash);
const lines = content.toString().split('\n');
for (const line of lines) {
if (line.startsWith('tree ')) {
return line.slice(5);
}
}
} catch (e) {
return null;
}
return null;
}
/**
* Parse a tree object recursively, returning Map of path -> hash
*/
function parseTreeRecursive(gitDir, treeHash, prefix) {
const result = new Map();
const { content } = readObject(gitDir, treeHash);
let offset = 0;
while (offset < content.length) {
const spaceIndex = content.indexOf(0x20, offset);
const mode = content.slice(offset, spaceIndex).toString();
const nullIndex = content.indexOf(0, spaceIndex);
const name = content.slice(spaceIndex + 1, nullIndex).toString();
const hashBytes = content.slice(nullIndex + 1, nullIndex + 21);
const hash = hashBytes.toString('hex');
offset = nullIndex + 21;
const fullPath = prefix ? `${prefix}/${name}` : name;
if (mode === '40000') {
// Recurse into subtree
const subEntries = parseTreeRecursive(gitDir, hash, fullPath);
for (const [subPath, subHash] of subEntries) {
result.set(subPath, subHash);
}
} else {
result.set(fullPath, hash);
}
}
return result;
}
/**
* Collect untracked files
*/
function collectUntracked(dir, repoRoot, index, untracked) {
const entries = fs.readdirSync(dir, { withFileTypes: true });
for (const entry of entries) {
const fullPath = path.join(dir, entry.name);
const relativePath = path.relative(repoRoot, fullPath)
.split(path.sep).join('/');
// Skip .git
if (entry.name === '.git') continue;
if (entry.name === 'node_modules') continue;
if (entry.isDirectory()) {
collectUntracked(fullPath, repoRoot, index, untracked);
} else if (entry.isFile()) {
if (!index.getEntry(relativePath)) {
untracked.push(relativePath);
}
}
}
}
module.exports = { execute };
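One optional refinement, and the reason the index caches stat data in the first place: status above re-reads and re-hashes every tracked file. Real Git first compares the cached size and timestamps and only hashes when they disagree. Here is a sketch of that fast path; the helper name probablyUnchanged is ours, not Git’s.
const fs = require('fs');

// Returns true when the cached stat data still matches, meaning we can skip hashing.
// We compare the size and whole seconds of mtime; nanosecond comparison is possible too,
// but our IndexEntry.fromFile only stores an approximation of the fractional part.
function probablyUnchanged(entry, filePath) {
  let stat;
  try {
    stat = fs.statSync(filePath);
  } catch (e) {
    return false; // missing or unreadable: let the caller report it as deleted
  }
  return (
    stat.size === entry.size &&
    Math.floor(stat.mtimeMs / 1000) === entry.mtimeSeconds
  );
}
In the “Index vs Working Directory” loop, call probablyUnchanged(entry, filePath) first and fall back to hashing only when it returns false; on a large repository this is the difference between a status that reads every file and one that mostly reads metadata.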
Step 4: Update CLI Entry Point
src/mygit.js
#!/usr/bin/env node
const commands = {
init: require('./commands/init'),
'hash-object': require('./commands/hashObject'),
'cat-file': require('./commands/catFile'),
add: require('./commands/add'),
status: require('./commands/status'),
};
// ... rest of main() function
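Only the command table changes in this chapter; the dispatch is the same as in earlier chapters. For reference, a minimal main() consistent with the modules above (each exports an execute(args) function) could look roughly like this:
function main() {
  const [command, ...args] = process.argv.slice(2);

  if (!command || !commands[command]) {
    console.error(`mygit: '${command || ''}' is not a mygit command`);
    process.exit(1);
  }

  try {
    commands[command].execute(args);
  } catch (err) {
    // Commands signal user-facing failures by throwing; print the message and exit non-zero.
    console.error(err.message);
    process.exit(1);
  }
}

main();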
Testing Your Implementation
# Initialize a new repo
mygit init
# Create some files
echo "Hello" > hello.txt
echo "World" > world.txt
# Check status (should show untracked)
mygit status
# Expected:
# Untracked files:
# hello.txt
# world.txt
# Stage a file
mygit add hello.txt
# Check status again
mygit status
# Expected:
# Changes to be committed:
# new file: hello.txt
# Untracked files:
# world.txt
# Modify the staged file
echo "Hello Modified" > hello.txt
# Check status
mygit status
# Expected:
# Changes to be committed:
# new file: hello.txt
# Changes not staged for commit:
# modified: hello.txt
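If status ever disagrees with what you expect, dump the raw staging area with the parser from Step 1. A small debugging script (the file name debug-index.js is just a suggestion; run it from the repository root):
// debug-index.js — print every staged entry: hash, mode (octal), and path.
const { Index } = require('./src/utils/index');

const index = Index.read('.git');
for (const entry of index.getEntries()) {
  console.log(entry.hash, entry.mode.toString(8), entry.name);
}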
Exercises
Exercise 1: Implement rm command
Create an rm command that removes files from the index:
// mygit rm <file>
// 1. Remove from index
// 2. Optionally delete from working directory (--cached to keep the file)
Exercise 2: Add glob patterns
Support glob patterns in the add command:
mygit add "*.js"
mygit add "src/**/*.ts"
Hint: Use the minimatch npm package.
Exercise 3: Implement diff
Show the actual differences in the status output:
mygit status -v # Show diff for staged changes
Key Concepts
Three Trees
Git has three areas: Working Directory, Index (Stage), and HEAD commit
Binary Index
The index is a binary file optimized for fast status checks
Metadata Caching
File timestamps help Git quickly detect changes without hashing
Atomic Updates
Real Git writes the new index to .git/index.lock and renames it into place, so readers never see a half-written file (our simplified write() skips this; see the sketch below)
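Our write() above calls writeFileSync directly, so a crash mid-write could leave a truncated index. If you want the lock-file behavior in mygit, here is a small sketch you could drop in for the final write in Index.write(); the helper name is ours.
const fs = require('fs');
const path = require('path');

// Write the new index to index.lock first, then rename it over the old file.
// The 'wx' flag fails if another writer already holds the lock, and a rename
// within the same directory is atomic on typical filesystems.
function writeIndexAtomically(gitDir, finalData) {
  const lockPath = path.join(gitDir, 'index.lock');
  const indexPath = path.join(gitDir, 'index');

  fs.writeFileSync(lockPath, finalData, { flag: 'wx' });
  fs.renameSync(lockPath, indexPath);
}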
What’s Next?
In Chapter 4: Commits & History, we’ll implement:
- The commit command
- Tree objects built from the index
- Commit objects with parent linking
- The log command
Next: Commits & History
Learn how Git creates commits and maintains history