Chapter 4: Commits & History

A commit is a snapshot of your project at a point in time. In this chapter, we’ll implement the commit and log commands, learning how Git builds and traverses the commit graph.

Prerequisites: Completed Chapter 3: Staging & Index
Time: 2-3 hours
Outcome: Working commit and log commands

How Commits Work

┌─────────────────────────────────────────────────────────────────────────────┐
│                           COMMIT STRUCTURE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   COMMIT OBJECT                                                              │
│   ─────────────                                                              │
│   ┌──────────────────────────────────────┐                                  │
│   │ tree    d8329fc1cc938780ffdd9f94e0d36│──────► TREE (root directory)     │
│   │ parent  5a6f32abc123def456789abcdef  │──────► PARENT COMMIT (if any)    │
│   │ author  John <john@example.com>      │                                  │
│   │         1234567890 +0000             │                                  │
│   │ committer Jane <jane@example.com>    │                                  │
│   │           1234567890 +0000           │                                  │
│   │                                      │                                  │
│   │ Initial commit                       │ ◄───── COMMIT MESSAGE            │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
│   TREE OBJECT (from commit's tree)                                          │
│   ───────────                                                                │
│   ┌──────────────────────────────────────┐                                  │
│   │ 100644 blob abc123  README.md        │──────► BLOB (file content)       │
│   │ 040000 tree def456  src              │──────► TREE (subdirectory)       │
│   │ 100644 blob 789abc  package.json     │──────► BLOB (file content)       │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
│   BLOB (file content)                                                        │
│   ───────────────────                                                        │
│   ┌──────────────────────────────────────┐                                  │
│   │ # My Project                         │                                  │
│   │                                      │                                  │
│   │ This is a sample project...          │                                  │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
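Here is a sketch of how the commit fields in the diagram serialize to text, and how the object ID is derived from that text. It assumes Git's default SHA-1 object encoding: the hash covers a header of the form "type length\0" followed by the content. The names hashGitObject and formatCommit are hypothetical and are not part of mygit's modules.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash a Git object as SHA-1 over "<type> <byte length>\0<content>".
function hashGitObject(type, content) {
    const body = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
    const header = Buffer.from(`${type} ${body.length}\0`, 'utf8');
    return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// Hypothetical helper: serialize the fields shown in the diagram.
// `parents` is an array: empty for a root commit, two or more for a merge.
function formatCommit({ tree, parents, author, committer, message }) {
    const lines = [`tree ${tree}`];
    for (const parent of parents) {
        lines.push(`parent ${parent}`);
    }
    lines.push(`author ${author}`);
    lines.push(`committer ${committer}`);
    lines.push('');                 // blank line separates headers from the message
    lines.push(message);
    return lines.join('\n') + '\n'; // Git ends the message with a newline
}

// The empty tree is a handy, well-known object to point a root commit at
const emptyTree = hashGitObject('tree', '');
const text = formatCommit({
    tree: emptyTree,
    parents: [],
    author: 'John <john@example.com> 1234567890 +0000',
    committer: 'Jane <jane@example.com> 1234567890 +0000',
    message: 'Initial commit',
});
console.log(text);
console.log(hashGitObject('commit', text));
```

Changing any field, even the timestamp, changes the bytes and therefore the commit hash.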

The Commit Graph (DAG)

Commits form a Directed Acyclic Graph (DAG):

                                   (HEAD)
                                      │
                                      ▼
    ┌───────┐      ┌───────┐      ┌───────┐
    │   C1  │◄─────│   C2  │◄─────│   C4  │ (merge commit)
    └───┬───┘      └───────┘      └───┬───┘
        │                             │
        ▼                             ▼
    ┌───────┐                     ┌───────┐
    │   B1  │◄────────────────────│   C3  │ (branch)
    └───────┘                     └───────┘

Each commit stores the hash of its parent(s), so every arrow points from a commit to its parent.
Following parent pointers backward from HEAD is how history is traversed.

Why DAG, not just a linked list?
Merge commits have multiple parents, allowing branches to be combined while preserving all history.
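Here is a minimal sketch of history traversal over an in-memory graph with the same shape as the diagram: B1 is the root, and C4 merges C2 and C3. The graph object and the ancestors function are hypothetical. A real implementation would read each commit's parents from its commit object. The sketch uses breadth-first search with a visited set, and the visited set is what makes it safe to walk merges.

```javascript
// Hypothetical in-memory commit graph: each commit maps to its parent list.
const graph = {
    B1: [],
    C1: ['B1'],
    C2: ['C1'],
    C3: ['B1'],
    C4: ['C2', 'C3'],
};

// Collect every commit reachable from `start` by following parent pointers.
// Without `seen`, B1 would be visited twice (via C1 and via C3); in a large
// history a shared ancestor would be revisited once per path that reaches it.
function ancestors(graph, start) {
    const seen = new Set();
    const queue = [start];
    while (queue.length > 0) {
        const id = queue.shift();
        if (seen.has(id)) continue;
        seen.add(id);
        for (const parent of graph[id] || []) {
            queue.push(parent);
        }
    }
    return seen;
}

console.log([...ancestors(graph, 'C4')]); // [ 'C4', 'C2', 'C3', 'C1', 'B1' ]
```

A linked list could never reach both C2 and C3 from C4. The DAG reaches both, and the visited set ensures each commit is emitted exactly once.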

Implementation

Step 1: Build Tree from Index

Before creating a commit, we need to convert the flat index into a tree structure:

src/utils/tree.js

const path = require('path');
const { hashObject, writeObject } = require('./objects');

/**
 * Build a tree object from index entries
 * 
 * The index is flat (each entry has a full path), but trees are nested:
 *   Index: ["src/main.js", "src/utils/helper.js", "README.md"]
 *   Trees: root
 *          ├─ README.md
 *          └─ src
 *             ├─ main.js
 *             └─ utils
 *                └─ helper.js
 */
function buildTree(entries, gitDir) {
    // Build directory structure
    const root = { type: 'tree', entries: {} };
    
    for (const entry of entries) {
        const parts = entry.name.split('/');
        let current = root;
        
        // Navigate/create directories
        for (let i = 0; i < parts.length - 1; i++) {
            const dirName = parts[i];
            if (!current.entries[dirName]) {
                current.entries[dirName] = { type: 'tree', entries: {} };
            }
            current = current.entries[dirName];
        }
        
        // Add file entry
        const fileName = parts[parts.length - 1];
        current.entries[fileName] = {
            type: 'blob',
            mode: entry.mode,
            hash: entry.hash
        };
    }
    
    // Recursively write trees and return root hash
    return writeTreeRecursive(root, gitDir);
}

/**
 * Recursively serialize and write tree objects
 */
function writeTreeRecursive(node, gitDir) {
    const entries = [];
    
    for (const [name, child] of Object.entries(node.entries)) {
        if (child.type === 'tree') {
            // Recurse into directory
            const treeHash = writeTreeRecursive(child, gitDir);
            entries.push({
                mode: '40000',
                name: name,
                hash: treeHash
            });
        } else {
            // File entry
            const mode = child.mode.toString(8); // Convert to octal string
            entries.push({
                mode: mode,
                name: name,
                hash: child.hash
            });
        }
    }
    
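    // Sort entries by raw bytes, comparing directories as if their names
    // ended in '/'. This is Git's canonical tree order; localeCompare() is
    // locale-sensitive and can produce a different order (and hash) than Git.
    entries.sort((a, b) => {
        const aName = a.mode === '40000' ? a.name + '/' : a.name;
        const bName = b.mode === '40000' ? b.name + '/' : b.name;
        return Buffer.compare(Buffer.from(aName), Buffer.from(bName));
    });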
    
    // Serialize tree
    const chunks = [];
    for (const entry of entries) {
        // Format: "{mode} {name}\0{20-byte-binary-hash}"
        const modeAndName = Buffer.from(`${entry.mode} ${entry.name}\0`);
        const hashBytes = Buffer.from(entry.hash, 'hex');
        chunks.push(modeAndName, hashBytes);
    }
    
    const treeContent = Buffer.concat(chunks);
    const { hash, data } = hashObject(treeContent, 'tree');
    writeObject(gitDir, hash, data);
    
    return hash;
}

module.exports = { buildTree };
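The sketch below checks the tree format end to end: it serializes a few entries and then parses them back. The helpers compareTreeEntries, serializeTree and parseTree are hypothetical and exist only for illustration. The ordering rule itself is Git's: names are compared as raw bytes, and a directory is compared as "name/". The example is chosen to show two consequences of that rule. The file src.txt sorts before the directory src, because '.' (0x2e) is lower than '/' (0x2f). README.md sorts before index.js, because uppercase letters come first in byte order.

```javascript
// Git's tree order: raw byte comparison, directories compared as "name/".
function compareTreeEntries(a, b) {
    const aKey = Buffer.from(a.mode === '40000' ? `${a.name}/` : a.name);
    const bKey = Buffer.from(b.mode === '40000' ? `${b.name}/` : b.name);
    return Buffer.compare(aKey, bKey);
}

// Each entry: "<mode> <name>\0" followed by the 20-byte binary hash.
function serializeTree(entries) {
    const sorted = [...entries].sort(compareTreeEntries);
    const chunks = [];
    for (const { mode, name, hash } of sorted) {
        chunks.push(Buffer.from(`${mode} ${name}\0`), Buffer.from(hash, 'hex'));
    }
    return Buffer.concat(chunks);
}

// Inverse of serializeTree: walk the buffer entry by entry.
function parseTree(buf) {
    const entries = [];
    let pos = 0;
    while (pos < buf.length) {
        const space = buf.indexOf(0x20, pos);   // space ends the mode
        const nul = buf.indexOf(0x00, space);   // NUL ends the name
        const mode = buf.toString('utf8', pos, space);
        const name = buf.toString('utf8', space + 1, nul);
        const hash = buf.toString('hex', nul + 1, nul + 21); // 20 raw bytes
        entries.push({ mode, name, hash });
        pos = nul + 21;
    }
    return entries;
}

const emptyBlob = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391';
const emptyTree = '4b825dc642cb6eb9a060e54bf8d69288fbee4904';
const entries = [
    { mode: '100644', name: 'index.js', hash: emptyBlob },
    { mode: '40000', name: 'src', hash: emptyTree },
    { mode: '100644', name: 'src.txt', hash: emptyBlob },
    { mode: '100644', name: 'README.md', hash: emptyBlob },
];
console.log(parseTree(serializeTree(entries)).map((e) => e.name));
// [ 'README.md', 'index.js', 'src.txt', 'src' ]
```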

Step 2: Implement the Commit Command

src/commands/commit.js

const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index } = require('../utils/index');
const { buildTree } = require('../utils/tree');

/**
 * commit - Record changes to the repository
 * 
 * Usage:
 *   mygit commit -m "message"
 *   mygit commit -m "message" --author "Name <email>"
 */
function execute(args) {
    const gitDir = requireGitDir();
    const index = Index.read(gitDir);
    
    // Check if there's anything to commit
    const entries = index.getEntries();
    if (entries.length === 0) {
        throw new Error('nothing to commit (create/copy files and use "mygit add" to track)');
    }
    
    // Parse arguments
    let message = null;
    let author = getDefaultAuthor();
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '-m' || arg === '--message') {
            message = args[++i];
        } else if (arg === '--author') {
            author = parseAuthor(args[++i]);
        }
    }
    
    if (!message) {
        throw new Error('Aborting commit due to empty commit message.');
    }
    
    // Build tree from index
    const treeHash = buildTree(entries, gitDir);
    
    // Get parent commit (if any)
    const parent = getHead(gitDir);
    
    // Create commit object
    const timestamp = Math.floor(Date.now() / 1000);
    const timezone = getTimezone();
    
    const lines = [];
    lines.push(`tree ${treeHash}`);
    
    if (parent) {
        lines.push(`parent ${parent}`);
    }
    
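    // --author overrides only the author; the committer is whoever runs the
    // command, taken from the same environment defaults
    const committer = getDefaultAuthor();
    lines.push(`author ${author.name} <${author.email}> ${timestamp} ${timezone}`);
    lines.push(`committer ${committer.name} <${committer.email}> ${timestamp} ${timezone}`);
    lines.push('');
    lines.push(message);
    
    // Git terminates the commit message with a newline
    const commitContent = lines.join('\n') + '\n';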
    const { hash, data } = hashObject(commitContent, 'commit');
    writeObject(gitDir, hash, data);
    
    // Update HEAD/branch ref
    updateHead(gitDir, hash);
    
    // Print result
    const branch = getCurrentBranch(gitDir);
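    const shortHash = hash.slice(0, 7);
    const isRoot = !parent;
    const subject = message.split('\n')[0]; // Git prints only the first line
    
    console.log(`[${branch}${isRoot ? ' (root-commit)' : ''} ${shortHash}] ${subject}`);
    // This counts every file in the snapshot, not just the ones that changed
    // since the parent commit (that would require diffing the two trees)
    console.log(` ${entries.length} file(s) in snapshot`);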
}

/**
 * Get current HEAD commit hash
 */
function getHead(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (fs.existsSync(refFile)) {
            return fs.readFileSync(refFile, 'utf8').trim();
        }
        return null;
    }
    
    return headContent;
}

/**
 * Update HEAD to point to new commit
 */
function updateHead(gitDir, commitHash) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        // Update the branch ref
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        // Ensure directory exists
        const refDir = path.dirname(refFile);
        if (!fs.existsSync(refDir)) {
            fs.mkdirSync(refDir, { recursive: true });
        }
        
        fs.writeFileSync(refFile, commitHash + '\n');
    } else {
        // Detached HEAD: update HEAD directly
        fs.writeFileSync(headPath, commitHash + '\n');
    }
}

/**
 * Get current branch name
 */
function getCurrentBranch(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: refs/heads/')) {
        return headContent.slice('ref: refs/heads/'.length);
    }
    
    return 'HEAD';
}

/**
 * Get default author from environment or config
 */
function getDefaultAuthor() {
    // Try environment variables first
    const name = process.env.GIT_AUTHOR_NAME || 
                 process.env.USER || 
                 'Unknown';
    const email = process.env.GIT_AUTHOR_EMAIL || 
                  `${name.toLowerCase().replace(/\s+/g, '.')}@localhost`;
    
    return { name, email };
}

/**
 * Parse author string "Name <email>"
 */
function parseAuthor(str) {
    const match = str.match(/^(.+?)\s*<(.+)>$/);
    if (!match) {
        throw new Error(`Invalid author format: ${str}`);
    }
    return { name: match[1], email: match[2] };
}

/**
 * Get timezone offset string (+0000 format)
 */
function getTimezone() {
    const offset = new Date().getTimezoneOffset();
    const sign = offset <= 0 ? '+' : '-';
    const hours = Math.floor(Math.abs(offset) / 60).toString().padStart(2, '0');
    const minutes = (Math.abs(offset) % 60).toString().padStart(2, '0');
    return `${sign}${hours}${minutes}`;
}

module.exports = { execute };
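The sign flip in getTimezone() is easy to get backwards. Date#getTimezoneOffset() returns UTC minus local time, in minutes, so zones east of UTC produce negative values. A pure, testable variant makes the mapping explicit. The name formatTimezone is hypothetical, and the function takes the offset as a parameter instead of reading the clock.

```javascript
// `offsetMinutes` is what Date#getTimezoneOffset() returns: UTC minus local
// time, so UTC+5:30 gives -330 and UTC-5 gives 300.
function formatTimezone(offsetMinutes) {
    const sign = offsetMinutes <= 0 ? '+' : '-';
    const abs = Math.abs(offsetMinutes);
    const hours = String(Math.floor(abs / 60)).padStart(2, '0');
    const minutes = String(abs % 60).padStart(2, '0');
    return `${sign}${hours}${minutes}`;
}

console.log(formatTimezone(-330)); // +0530 (e.g. India)
console.log(formatTimezone(300));  // -0500 (e.g. US Eastern Standard Time)
console.log(formatTimezone(new Date().getTimezoneOffset())); // this machine
```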

Step 3: Implement the Log Command

src/commands/log.js

const fs = require('fs');
const path = require('path');
const { requireGitDir } = require('../utils/paths');
const { readObject } = require('../utils/objects');

/**
 * log - Show commit history
 * 
 * Usage:
 *   mygit log                    # Show full history
 *   mygit log --oneline          # Compact format
 *   mygit log -n 5               # Limit to 5 commits
 */
function execute(args) {
    const gitDir = requireGitDir();
    
    // Parse arguments
    let oneline = false;
    let limit = Infinity;
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '--oneline') {
            oneline = true;
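        } else if (arg === '-n') {
            limit = parseInt(args[++i], 10);
            if (Number.isNaN(limit) || limit < 0) {
                throw new Error('fatal: -n requires a non-negative number');
            }
        }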
    }
    
    // Get starting commit
    const head = getHead(gitDir);
    if (!head) {
        console.log('fatal: your current branch has no commits yet');
        return;
    }
    
    // Walk commit history
    let currentHash = head;
    let count = 0;
    
    while (currentHash && count < limit) {
        const commit = parseCommit(gitDir, currentHash);
        
        if (oneline) {
            printCommitOneline(currentHash, commit);
        } else {
            printCommitFull(currentHash, commit, count > 0);
        }
        
        // Follow only the first parent; the other parents of merge commits
        // are skipped (see Exercise 3)
        currentHash = commit.parents[0] || null;
        count++;
    }
}

/**
 * Get HEAD commit hash
 */
function getHead(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (fs.existsSync(refFile)) {
            return fs.readFileSync(refFile, 'utf8').trim();
        }
        return null;
    }
    
    return headContent;
}

/**
 * Parse a commit object
 */
function parseCommit(gitDir, hash) {
    const { content } = readObject(gitDir, hash);
    const text = content.toString();
    const lines = text.split('\n');
    
    const commit = {
        tree: null,
        parents: [],
        author: null,
        committer: null,
        message: ''
    };
    
    let i = 0;
    
    // Parse headers
    while (i < lines.length && lines[i] !== '') {
        const line = lines[i];
        
        if (line.startsWith('tree ')) {
            commit.tree = line.slice(5);
        } else if (line.startsWith('parent ')) {
            commit.parents.push(line.slice(7));
        } else if (line.startsWith('author ')) {
            commit.author = parseAuthorLine(line.slice(7));
        } else if (line.startsWith('committer ')) {
            commit.committer = parseAuthorLine(line.slice(10));
        }
        
        i++;
    }
    
    // Skip blank line
    i++;
    
    // Rest is the message (drop the trailing newline Git appends)
    commit.message = lines.slice(i).join('\n').replace(/\n+$/, '');
    
    return commit;
}

/**
 * Parse author line: "Name <email> timestamp timezone"
 */
function parseAuthorLine(line) {
    const match = line.match(/^(.+?) <(.+?)> (\d+) ([+-]\d{4})$/);
    if (!match) {
        return { name: 'Unknown', email: '', timestamp: 0, timezone: '+0000' };
    }
    
    return {
        name: match[1],
        email: match[2],
        timestamp: parseInt(match[3], 10),
        timezone: match[4]
    };
}

/**
 * Print commit in oneline format
 */
function printCommitOneline(hash, commit) {
    const shortHash = hash.slice(0, 7);
    const message = commit.message.split('\n')[0]; // First line only
    console.log(`\x1b[33m${shortHash}\x1b[0m ${message}`);
}

/**
 * Print commit in full format
 */
function printCommitFull(hash, commit, showSeparator) {
    if (showSeparator) {
        console.log();
    }
    
    console.log(`\x1b[33mcommit ${hash}\x1b[0m`);
    
    if (commit.author) {
        const date = formatDate(commit.author.timestamp, commit.author.timezone);
        console.log(`Author: ${commit.author.name} <${commit.author.email}>`);
        console.log(`Date:   ${date}`);
    }
    
    console.log();
    
    // Indent message
    const lines = commit.message.split('\n');
    for (const line of lines) {
        console.log(`    ${line}`);
    }
}

/**
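 * Format timestamp as a human-readable date in the commit's own timezone
 */
function formatDate(timestamp, timezone) {
    // Shift by the recorded offset (e.g. "+0200") so the UTC getters below
    // return the committer's local wall-clock time, as Git displays it
    const sign = timezone[0] === '-' ? -1 : 1;
    const offsetMinutes = sign * (parseInt(timezone.slice(1, 3), 10) * 60 +
                                  parseInt(timezone.slice(3, 5), 10));
    const date = new Date((timestamp + offsetMinutes * 60) * 1000);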
    
    const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
    const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
    
    const dayName = days[date.getUTCDay()];
    const month = months[date.getUTCMonth()];
    const day = date.getUTCDate();
    const hours = date.getUTCHours().toString().padStart(2, '0');
    const minutes = date.getUTCMinutes().toString().padStart(2, '0');
    const seconds = date.getUTCSeconds().toString().padStart(2, '0');
    const year = date.getUTCFullYear();
    
    return `${dayName} ${month} ${day} ${hours}:${minutes}:${seconds} ${year} ${timezone}`;
}

module.exports = { execute };
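Git prints each commit's date as the wall-clock time in the timezone recorded with that commit. The same instant therefore reads two hours later when the offset is +0200. Below is a standalone sketch of that conversion with a few checkable values. The name formatGitDate is hypothetical.

```javascript
// Git-style date: "Fri Feb 13 23:31:30 2009 +0000". Shift the UTC timestamp
// by the recorded offset, then read the fields with the UTC getters.
function formatGitDate(timestamp, timezone) {
    const sign = timezone[0] === '-' ? -1 : 1;
    const offsetMinutes = sign * (Number(timezone.slice(1, 3)) * 60 + Number(timezone.slice(3, 5)));
    const date = new Date((timestamp + offsetMinutes * 60) * 1000);

    const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
    const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
    const pad = (n) => String(n).padStart(2, '0');
    const time = `${pad(date.getUTCHours())}:${pad(date.getUTCMinutes())}:${pad(date.getUTCSeconds())}`;
    return `${days[date.getUTCDay()]} ${months[date.getUTCMonth()]} ${date.getUTCDate()} ${time} ${date.getUTCFullYear()} ${timezone}`;
}

console.log(formatGitDate(1234567890, '+0000')); // Fri Feb 13 23:31:30 2009 +0000
console.log(formatGitDate(1234567890, '+0200')); // Sat Feb 14 01:31:30 2009 +0200
```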

Step 4: Update CLI Entry Point

src/mygit.js

#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
    add: require('./commands/add'),
    status: require('./commands/status'),
    commit: require('./commands/commit'),
    log: require('./commands/log'),
};

function main() {
    const args = process.argv.slice(2);
    
    if (args.length === 0) {
        console.log('usage: mygit <command> [<args>]');
        console.log('\nAvailable commands:');
        console.log('   init          Initialize a new repository');
        console.log('   hash-object   Compute object ID');
        console.log('   cat-file      Show object content');
        console.log('   add           Add files to staging area');
        console.log('   status        Show working tree status');
        console.log('   commit        Record changes to repository');
        console.log('   log           Show commit history');
        process.exit(1);
    }
    
    const command = args[0];
    const commandArgs = args.slice(1);
    
    if (!commands[command]) {
        console.error(`mygit: '${command}' is not a mygit command.`);
        process.exit(1);
    }
    
    try {
        commands[command].execute(commandArgs);
    } catch (error) {
        console.error(error.message);
        process.exit(1);
    }
}

main();

Testing Your Implementation

# Initialize and create files
mygit init
echo "# My Project" > README.md
echo "console.log('hello');" > index.js

# Stage and commit
mygit add README.md index.js
mygit commit -m "Initial commit"
# Output: [master (root-commit) abc1234] Initial commit

# Make changes
echo "More content" >> README.md
mygit add README.md
mygit commit -m "Update README"

# View history
mygit log
# Shows both commits with full details

mygit log --oneline
# def5678 Update README
# abc1234 Initial commit
# (your hashes will differ: they depend on content, author and timestamp)

How Git Optimizes

Deduplication

If a file's content is identical across commits, Git stores it only once: each commit's tree simply points to the same blob hash.

Commit 1 tree → README.md → blob abc123
Commit 2 tree → README.md → blob abc123  ← Same blob!
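Deduplication falls out of content addressing, because an object's name is the hash of its bytes. Here is a minimal sketch that uses an in-memory Map as a hypothetical stand-in for .git/objects. The helpers blobHash and writeBlob are also hypothetical.

```javascript
const crypto = require('crypto');

// Blob ID: SHA-1 over "blob <byte length>\0<content>"
function blobHash(content) {
    const body = Buffer.from(content, 'utf8');
    const framed = Buffer.concat([Buffer.from(`blob ${body.length}\0`), body]);
    return crypto.createHash('sha1').update(framed).digest('hex');
}

// Hypothetical object store keyed by hash; identical content maps to the
// same key, so a second write of unchanged content stores nothing new.
const objects = new Map();
function writeBlob(content) {
    const hash = blobHash(content);
    if (!objects.has(hash)) {
        objects.set(hash, content);
    }
    return hash;
}

const readmeInCommit1 = writeBlob('# My Project\n');
const readmeInCommit2 = writeBlob('# My Project\n'); // unchanged file
const indexInCommit2 = writeBlob("console.log('hello');\n");
console.log(readmeInCommit1 === readmeInCommit2, objects.size); // true 2
```

Because the key is derived from the content, no separate duplicate check across commits is needed.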

Pack Files

Git periodically (for example during git gc) packs objects into .pack files with delta compression: similar objects are stored as deltas against a base object. We won't implement this, but it is a large part of why Git is so space-efficient.

Commit Caching

Real Git caches parsed commits in memory. For our implementation, we re-read each commit, which is fine for learning.
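If re-parsing ever became a bottleneck, for example when walking a large history, adding a per-process memo would be a small change. Here is a sketch. The names cached and fakeParse are hypothetical, and in mygit parseFn would stand in for something like parseCommit with gitDir already bound.

```javascript
// Wrap any (hash) => parsedCommit function so each commit is parsed at most
// once per process.
function cached(parseFn) {
    const cache = new Map();
    return (hash) => {
        if (!cache.has(hash)) {
            cache.set(hash, parseFn(hash));
        }
        return cache.get(hash);
    };
}

// Usage with a fake parser that counts how often it really runs
let parseCalls = 0;
const fakeParse = (hash) => {
    parseCalls++;
    return { hash, parents: [] };
};
const getCommit = cached(fakeParse);
getCommit('abc1234');
getCommit('abc1234');
getCommit('def5678');
console.log(parseCalls); // 2
```

Caching is safe here because commit objects are immutable: a given hash always names the same bytes, so a cached parse can never go stale.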

Exercises

Exercise 1: Add --graph flag to log

Show branch structure visually:

* abc1234 Merge feature branch
|\
| * def5678 Add feature
* | 789abcd Fix bug
|/
* 456789a Initial commit

Exercise 2: Implement diff between commits

Show what changed between two commits:

mygit diff abc123..def456

Exercise 3: Handle merge commits

Extend log to handle commits with multiple parents:

// In parseCommit, parents is already an array
// Extend log to show all parent hashes
// Handle walking multiple branches

Key Takeaways

Commits are Snapshots

Each commit points to a complete tree (directory snapshot), not diffs

History is a DAG

Commits form a directed acyclic graph via parent pointers

Branches are Pointers

A branch is just a file containing a commit hash

Trees are Nested

Tree objects contain blobs and other trees, creating directory structure

Further Reading

DSA: Graph Algorithms

Understand graph traversal for commit history

DSA: Trees

Learn tree traversal for directory structures

What’s Next?

In Chapter 5: Branches & Checkout, we’ll implement:

The branch command
The checkout command
Switching between branches
Detached HEAD state

Next: Branches & Checkout

Learn how Git manages and switches between branches
