Chapter 4: Commits & History

A commit is a snapshot of your project at a point in time. In this chapter, we’ll implement the commit and log commands, learning how Git builds and traverses the commit graph.
Prerequisites: Completed Chapter 3: Staging & Index
Time: 2-3 hours
Outcome: Working commit and log commands

How Commits Work

┌─────────────────────────────────────────────────────────────────────────────┐
│                           COMMIT STRUCTURE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   COMMIT OBJECT                                                              │
│   ─────────────                                                              │
│   ┌──────────────────────────────────────┐                                  │
│   │ tree    d8329fc1cc938780ffdd9f94e0d36│──────► TREE (root directory)     │
│   │ parent  5a6f32abc123def456789abcdef  │──────► PARENT COMMIT (if any)    │
│   │ author  John <john@example.com>      │                                  │
│   │         1234567890 +0000             │                                  │
│   │ committer Jane <jane@example.com>    │                                  │
│   │           1234567890 +0000           │                                  │
│   │                                      │                                  │
│   │ Initial commit                       │ ◄───── COMMIT MESSAGE            │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
│   TREE OBJECT (from commit's tree)                                          │
│   ───────────                                                                │
│   ┌──────────────────────────────────────┐                                  │
│   │ 100644 blob abc123  README.md        │──────► BLOB (file content)       │
│   │ 040000 tree def456  src              │──────► TREE (subdirectory)       │
│   │ 100644 blob 789abc  package.json     │──────► BLOB (file content)       │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
│   BLOB (file content)                                                        │
│   ───────────────────                                                        │
│   ┌──────────────────────────────────────┐                                  │
│   │ # My Project                         │                                  │
│   │                                      │                                  │
│   │ This is a sample project...          │                                  │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
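
The commit text in the diagram becomes an object ID by hashing it together with a short header. The following is a minimal sketch, assuming the standard `<type> <size>\0` header and SHA-1; `gitObjectId` is a hypothetical helper name used only for illustration (it is not the chapter's `hashObject`), and the author and committer lines are placeholder values.

```javascript
const crypto = require('crypto');

// Hypothetical helper: compute the ID Git would assign to raw object content
function gitObjectId(type, content) {
    const body = Buffer.from(content);
    const header = Buffer.from(`${type} ${body.length}\0`);
    return crypto
        .createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

// A root commit (no parent line) pointing at Git's well-known empty tree
const commitText = [
    'tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904',
    'author John <john@example.com> 1234567890 +0000',
    'committer Jane <jane@example.com> 1234567890 +0000',
    '',
    'Initial commit',
    ''
].join('\n');

const commitId = gitObjectId('commit', commitText);
console.log(commitId); // 40 hex characters
```

Changing any byte of the text (the message, a timestamp, the tree hash) produces a different ID, which is why a commit's hash identifies its entire history.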

The Commit Graph (DAG)

Commits form a Directed Acyclic Graph (DAG):
                     (HEAD)


    ┌───────┐      ┌───────┐
    │   C1  │◄─────│   C2  │◄─────┐
    └───────┘      └───────┘      │
         │                        │
         │                   ┌────┴────┐
         │                   │   C4    │ (merge commit)
         │                   └────┬────┘
         │                        │
         │              ┌─────────┘
         │              │
         ▼              ▼
    ┌───────┐      ┌───────┐
    │   B1  │◄─────│   C3  │ (branch)
    └───────┘      └───────┘

Each commit records the hash of its parent(s), so following parent pointers walks back through history.

Why a DAG and not just a linked list?
Merge commits have more than one parent, which lets branches be combined while preserving the full history of both sides.
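
Because a merge commit has several parents, walking history is a graph traversal rather than a simple loop over a list. Here is a small sketch using a hypothetical in-memory graph (the commit names and the `parentsOf` map are made up for illustration); it visits every ancestor exactly once, even when two branches share one.

```javascript
// Hypothetical history: each commit ID maps to the IDs of its parents
const parentsOf = new Map([
    ['A', []],          // root commit
    ['B', ['A']],       // commit on the main line
    ['C', ['A']],       // commit on a side branch
    ['M', ['B', 'C']],  // merge commit with two parents
]);

// Breadth-first walk over parent pointers, skipping commits already seen
function ancestors(start) {
    const seen = new Set();
    const queue = [start];
    const order = [];

    while (queue.length > 0) {
        const id = queue.shift();
        if (seen.has(id)) continue;
        seen.add(id);
        order.push(id);
        for (const parent of parentsOf.get(id) || []) {
            queue.push(parent);
        }
    }
    return order;
}

console.log(ancestors('M')); // [ 'M', 'B', 'C', 'A' ]
```

The `seen` set is what keeps the shared ancestor `A` from being reported twice.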

Implementation

Step 1: Build Tree from Index

Before creating a commit, we need to convert the flat index into a tree structure:
src/utils/tree.js
const path = require('path');
const { hashObject, writeObject } = require('./objects');

/**
 * Build a tree object from index entries
 * 
 * The index is flat (files have full paths), but trees are nested:
 * Index: ["src/main.js", "src/utils/helper.js", "README.md"]
 * Trees: root
 *          ├─ README.md
 *          └─ src
 *               ├─ main.js
 *               └─ utils
 *                    └─ helper.js
 */
function buildTree(entries, gitDir) {
    // Build directory structure
    const root = { type: 'tree', entries: {} };
    
    for (const entry of entries) {
        const parts = entry.name.split('/');
        let current = root;
        
        // Navigate/create directories
        for (let i = 0; i < parts.length - 1; i++) {
            const dirName = parts[i];
            if (!current.entries[dirName]) {
                current.entries[dirName] = { type: 'tree', entries: {} };
            }
            current = current.entries[dirName];
        }
        
        // Add file entry
        const fileName = parts[parts.length - 1];
        current.entries[fileName] = {
            type: 'blob',
            mode: entry.mode,
            hash: entry.hash
        };
    }
    
    // Recursively write trees and return root hash
    return writeTreeRecursive(root, gitDir);
}

/**
 * Recursively serialize and write tree objects
 */
function writeTreeRecursive(node, gitDir) {
    const entries = [];
    
    for (const [name, child] of Object.entries(node.entries)) {
        if (child.type === 'tree') {
            // Recurse into directory
            const treeHash = writeTreeRecursive(child, gitDir);
            entries.push({
                mode: '40000',
                name: name,
                hash: treeHash
            });
        } else {
            // File entry
            const mode = child.mode.toString(8); // Convert to octal string
            entries.push({
                mode: mode,
                name: name,
                hash: child.hash
            });
        }
    }
    
    // Sort entries the way Git does: byte by byte, comparing directory
    // names as if they ended with a trailing slash
    entries.sort((a, b) => {
        const aName = a.mode === '40000' ? a.name + '/' : a.name;
        const bName = b.mode === '40000' ? b.name + '/' : b.name;
        return Buffer.compare(Buffer.from(aName), Buffer.from(bName));
    });
    
    // Serialize tree
    const chunks = [];
    for (const entry of entries) {
        // Format: "{mode} {name}\0{20-byte-binary-hash}"
        const modeAndName = Buffer.from(`${entry.mode} ${entry.name}\0`);
        const hashBytes = Buffer.from(entry.hash, 'hex');
        chunks.push(modeAndName, hashBytes);
    }
    
    const treeContent = Buffer.concat(chunks);
    const { hash, data } = hashObject(treeContent, 'tree');
    writeObject(gitDir, hash, data);
    
    return hash;
}

module.exports = { buildTree };
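
Entry order matters because it is part of the bytes that get hashed: two trees with the same entries in a different order would get different IDs. Git compares names byte by byte and treats a directory name as if it ended in `/`. The sketch below illustrates the effect; `sortTreeEntries` and the `isDir` flag are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: order entries the way Git orders tree entries
function sortTreeEntries(entries) {
    // Directories are compared as "name/", files as "name"
    const key = (e) => Buffer.from(e.isDir ? e.name + '/' : e.name);
    return [...entries].sort((a, b) => Buffer.compare(key(a), key(b)));
}

const sorted = sortTreeEntries([
    { name: 'foo', isDir: true },
    { name: 'foo.txt', isDir: false },
    { name: 'foo-bar', isDir: false },
]);

// '-' (0x2d) < '.' (0x2e) < '/' (0x2f), so the directory sorts last
console.log(sorted.map((e) => e.name)); // [ 'foo-bar', 'foo.txt', 'foo' ]
```

A plain string sort on the bare names would put `foo` first, which would produce a tree hash that real Git does not agree with.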

Step 2: Implement the Commit Command

src/commands/commit.js
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index } = require('../utils/index');
const { buildTree } = require('../utils/tree');

/**
 * commit - Record changes to the repository
 * 
 * Usage:
 *   mygit commit -m "message"
 *   mygit commit --author "Name <email>"
 */
function execute(args) {
    const gitDir = requireGitDir();
    const index = Index.read(gitDir);
    
    // Check if there's anything to commit
    const entries = index.getEntries();
    if (entries.length === 0) {
        throw new Error('nothing to commit (create/copy files and use "mygit add" to track)');
    }
    
    // Parse arguments
    let message = null;
    let author = getDefaultAuthor();
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '-m' || arg === '--message') {
            message = args[++i];
        } else if (arg === '--author') {
            author = parseAuthor(args[++i]);
        }
    }
    
    if (!message) {
        throw new Error('Aborting commit due to empty commit message.');
    }
    
    // Build tree from index
    const treeHash = buildTree(entries, gitDir);
    
    // Get parent commit (if any)
    const parent = getHead(gitDir);
    
    // Create commit object
    const timestamp = Math.floor(Date.now() / 1000);
    const timezone = getTimezone();
    
    const lines = [];
    lines.push(`tree ${treeHash}`);
    
    if (parent) {
        lines.push(`parent ${parent}`);
    }
    
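    // --author overrides only the author; the committer stays the default identity
    const committer = getDefaultAuthor();
    lines.push(`author ${author.name} <${author.email}> ${timestamp} ${timezone}`);
    lines.push(`committer ${committer.name} <${committer.email}> ${timestamp} ${timezone}`);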
    lines.push('');
    lines.push(message);
    
    const commitContent = lines.join('\n');
    const { hash, data } = hashObject(commitContent, 'commit');
    writeObject(gitDir, hash, data);
    
    // Update HEAD/branch ref
    updateHead(gitDir, hash);
    
    // Print result
    const branch = getCurrentBranch(gitDir);
    const shortHash = hash.slice(0, 7);
    const isRoot = !parent;
    
    console.log(`[${branch}${isRoot ? ' (root-commit)' : ''} ${shortHash}] ${message}`);
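    // Simplification: this counts every file in the index, not only the
    // files that differ from the parent commit
    console.log(` ${entries.length} file(s) in snapshot`);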
}

/**
 * Get current HEAD commit hash
 */
function getHead(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (fs.existsSync(refFile)) {
            return fs.readFileSync(refFile, 'utf8').trim();
        }
        return null;
    }
    
    return headContent;
}

/**
 * Update HEAD to point to new commit
 */
function updateHead(gitDir, commitHash) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        // Update the branch ref
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        // Ensure directory exists
        const refDir = path.dirname(refFile);
        if (!fs.existsSync(refDir)) {
            fs.mkdirSync(refDir, { recursive: true });
        }
        
        fs.writeFileSync(refFile, commitHash + '\n');
    } else {
        // Detached HEAD: update HEAD directly
        fs.writeFileSync(headPath, commitHash + '\n');
    }
}

/**
 * Get current branch name
 */
function getCurrentBranch(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: refs/heads/')) {
        return headContent.slice('ref: refs/heads/'.length);
    }
    
    return 'HEAD';
}

/**
 * Get default author from environment or config
 */
function getDefaultAuthor() {
    // Try environment variables first
    const name = process.env.GIT_AUTHOR_NAME || 
                 process.env.USER || 
                 'Unknown';
    const email = process.env.GIT_AUTHOR_EMAIL || 
                  `${name.toLowerCase().replace(/\s+/g, '.')}@localhost`;
    
    return { name, email };
}

/**
 * Parse author string "Name <email>"
 */
function parseAuthor(str) {
    const match = str.match(/^(.+?)\s*<(.+)>$/);
    if (!match) {
        throw new Error(`Invalid author format: ${str}`);
    }
    return { name: match[1], email: match[2] };
}

/**
 * Get timezone offset string (+0000 format)
 */
function getTimezone() {
    const offset = new Date().getTimezoneOffset();
    const sign = offset <= 0 ? '+' : '-';
    const hours = Math.floor(Math.abs(offset) / 60).toString().padStart(2, '0');
    const minutes = (Math.abs(offset) % 60).toString().padStart(2, '0');
    return `${sign}${hours}${minutes}`;
}

module.exports = { execute };
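
To see how `HEAD`, the branch ref, and the commit hash relate, it can help to replay the first commit against a throwaway directory. This is a sketch only: the temporary directory stands in for `.git`, `resolveHead` is a hypothetical name that mirrors the logic of `getHead`, and the commit hash is a fake placeholder.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway directory standing in for .git (demo only)
const gitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mygit-demo-'));
fs.writeFileSync(path.join(gitDir, 'HEAD'), 'ref: refs/heads/master\n');

// Hypothetical helper mirroring getHead: follow HEAD to a commit hash
function resolveHead(dir) {
    const head = fs.readFileSync(path.join(dir, 'HEAD'), 'utf8').trim();
    if (!head.startsWith('ref: ')) {
        return head; // detached HEAD holds the hash directly
    }
    const refFile = path.join(dir, head.slice(5));
    return fs.existsSync(refFile)
        ? fs.readFileSync(refFile, 'utf8').trim()
        : null; // branch exists in name only: no commits yet
}

const before = resolveHead(gitDir);

// What updateHead does for the first commit: create the branch file
const fakeHash = 'a'.repeat(40); // placeholder, not a real commit ID
fs.mkdirSync(path.join(gitDir, 'refs', 'heads'), { recursive: true });
fs.writeFileSync(path.join(gitDir, 'refs', 'heads', 'master'), fakeHash + '\n');

const after = resolveHead(gitDir);
console.log(before, after);

// Clean up the demo directory
fs.rmSync(gitDir, { recursive: true, force: true });
```

Before the first commit `HEAD` names a branch whose file does not exist yet, which is why `getHead` returns `null` and the commit is written without a `parent` line.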

Step 3: Implement the Log Command

src/commands/log.js
const fs = require('fs');
const path = require('path');
const { requireGitDir } = require('../utils/paths');
const { readObject } = require('../utils/objects');

/**
 * log - Show commit history
 * 
 * Usage:
 *   mygit log                    # Show full history
 *   mygit log --oneline          # Compact format
 *   mygit log -n 5               # Limit to 5 commits
 */
function execute(args) {
    const gitDir = requireGitDir();
    
    // Parse arguments
    let oneline = false;
    let limit = Infinity;
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '--oneline') {
            oneline = true;
        } else if (arg === '-n') {
            limit = parseInt(args[++i], 10);
            if (Number.isNaN(limit)) {
                throw new Error("option '-n' requires a number");
            }
        }
    }
    
    // Get starting commit
    const head = getHead(gitDir);
    if (!head) {
        console.log('fatal: your current branch has no commits yet');
        return;
    }
    
    // Walk commit history
    let currentHash = head;
    let count = 0;
    
    while (currentHash && count < limit) {
        const commit = parseCommit(gitDir, currentHash);
        
        if (oneline) {
            printCommitOneline(currentHash, commit);
        } else {
            printCommitFull(currentHash, commit, count > 0);
        }
        
        // Move to parent
        currentHash = commit.parents[0] || null;
        count++;
    }
}

/**
 * Get HEAD commit hash
 */
function getHead(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (fs.existsSync(refFile)) {
            return fs.readFileSync(refFile, 'utf8').trim();
        }
        return null;
    }
    
    return headContent;
}

/**
 * Parse a commit object
 */
function parseCommit(gitDir, hash) {
    const { content } = readObject(gitDir, hash);
    const text = content.toString();
    const lines = text.split('\n');
    
    const commit = {
        tree: null,
        parents: [],
        author: null,
        committer: null,
        message: ''
    };
    
    let i = 0;
    
    // Parse headers
    while (i < lines.length && lines[i] !== '') {
        const line = lines[i];
        
        if (line.startsWith('tree ')) {
            commit.tree = line.slice(5);
        } else if (line.startsWith('parent ')) {
            commit.parents.push(line.slice(7));
        } else if (line.startsWith('author ')) {
            commit.author = parseAuthorLine(line.slice(7));
        } else if (line.startsWith('committer ')) {
            commit.committer = parseAuthorLine(line.slice(10));
        }
        
        i++;
    }
    
    // Skip blank line
    i++;
    
    // Rest is the message
    commit.message = lines.slice(i).join('\n');
    
    return commit;
}

/**
 * Parse author line: "Name <email> timestamp timezone"
 */
function parseAuthorLine(line) {
    const match = line.match(/^(.+?) <(.+?)> (\d+) ([+-]\d{4})$/);
    if (!match) {
        return { name: 'Unknown', email: '', timestamp: 0, timezone: '+0000' };
    }
    
    return {
        name: match[1],
        email: match[2],
        timestamp: parseInt(match[3], 10),
        timezone: match[4]
    };
}

/**
 * Print commit in oneline format
 */
function printCommitOneline(hash, commit) {
    const shortHash = hash.slice(0, 7);
    const message = commit.message.split('\n')[0]; // First line only
    console.log(`\x1b[33m${shortHash}\x1b[0m ${message}`);
}

/**
 * Print commit in full format
 */
function printCommitFull(hash, commit, showSeparator) {
    if (showSeparator) {
        console.log();
    }
    
    console.log(`\x1b[33mcommit ${hash}\x1b[0m`);
    
    if (commit.author) {
        const date = formatDate(commit.author.timestamp, commit.author.timezone);
        console.log(`Author: ${commit.author.name} <${commit.author.email}>`);
        console.log(`Date:   ${date}`);
    }
    
    console.log();
    
    // Indent message
    const lines = commit.message.split('\n');
    for (const line of lines) {
        console.log(`    ${line}`);
    }
}

/**
 * Format timestamp to human-readable date
 */
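function formatDate(timestamp, timezone) {
    // Shift by the recorded offset so the UTC getters below return the
    // wall-clock time in the commit's own timezone
    const sign = timezone.startsWith('-') ? -1 : 1;
    const offsetSeconds = sign *
        (Number(timezone.slice(1, 3)) * 3600 + Number(timezone.slice(3, 5)) * 60);
    const date = new Date((timestamp + offsetSeconds) * 1000);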
    
    const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
    const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
    
    const dayName = days[date.getUTCDay()];
    const month = months[date.getUTCMonth()];
    const day = date.getUTCDate();
    const hours = date.getUTCHours().toString().padStart(2, '0');
    const minutes = date.getUTCMinutes().toString().padStart(2, '0');
    const seconds = date.getUTCSeconds().toString().padStart(2, '0');
    const year = date.getUTCFullYear();
    
    return `${dayName} ${month} ${day} ${hours}:${minutes}:${seconds} ${year} ${timezone}`;
}

module.exports = { execute };
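
A commit stores its time as seconds since the Unix epoch plus a UTC offset, and the date shown to the user is that instant expressed in the recorded offset. The worked example below is a standalone sketch of that arithmetic; `formatGitDate` is a hypothetical name, and the timestamp is the one used in the diagram at the start of the chapter.

```javascript
// Hypothetical helper: render an epoch timestamp in the given "+HHMM" offset
function formatGitDate(timestamp, timezone) {
    const sign = timezone.startsWith('-') ? -1 : 1;
    const offsetSeconds = sign *
        (Number(timezone.slice(1, 3)) * 3600 + Number(timezone.slice(3, 5)) * 60);

    // Shift the instant, then read it back with the UTC getters
    const d = new Date((timestamp + offsetSeconds) * 1000);

    const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
    const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
    const pad = (n) => String(n).padStart(2, '0');

    return `${days[d.getUTCDay()]} ${months[d.getUTCMonth()]} ${d.getUTCDate()} ` +
        `${pad(d.getUTCHours())}:${pad(d.getUTCMinutes())}:${pad(d.getUTCSeconds())} ` +
        `${d.getUTCFullYear()} ${timezone}`;
}

console.log(formatGitDate(1234567890, '+0000')); // Fri Feb 13 23:31:30 2009 +0000
console.log(formatGitDate(1234567890, '+0530')); // Sat Feb 14 05:01:30 2009 +0530
console.log(formatGitDate(1234567890, '-0800')); // Fri Feb 13 15:31:30 2009 -0800
```

All three lines describe the same instant; only the wall-clock rendering differs.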

Step 4: Update CLI Entry Point

src/mygit.js
#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
    add: require('./commands/add'),
    status: require('./commands/status'),
    commit: require('./commands/commit'),
    log: require('./commands/log'),
};

function main() {
    const args = process.argv.slice(2);
    
    if (args.length === 0) {
        console.log('usage: mygit <command> [<args>]');
        console.log('\nAvailable commands:');
        console.log('   init          Initialize a new repository');
        console.log('   hash-object   Compute object ID');
        console.log('   cat-file      Show object content');
        console.log('   add           Add files to staging area');
        console.log('   status        Show working tree status');
        console.log('   commit        Record changes to repository');
        console.log('   log           Show commit history');
        process.exit(1);
    }
    
    const command = args[0];
    const commandArgs = args.slice(1);
    
    if (!commands[command]) {
        console.error(`mygit: '${command}' is not a mygit command.`);
        process.exit(1);
    }
    
    try {
        commands[command].execute(commandArgs);
    } catch (error) {
        console.error(error.message);
        process.exit(1);
    }
}

main();

Testing Your Implementation

# Initialize and create files
mygit init
echo "# My Project" > README.md
echo "console.log('hello');" > index.js

# Stage and commit
mygit add README.md index.js
mygit commit -m "Initial commit"
# Output: [master (root-commit) abc1234] Initial commit

# Make changes
echo "More content" >> README.md
mygit add README.md
mygit commit -m "Update README"

# View history
mygit log
# Shows both commits with full details

mygit log --oneline
# def5678 Update README
# abc1234 Initial commit

How Git Optimizes

If you have the same file in multiple commits, Git stores it only once. The tree just points to the same blob hash.
Commit 1 tree → README.md → blob abc123
Commit 2 tree → README.md → blob abc123  ← Same blob!
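
This deduplication falls out of content addressing: the storage key is a hash of the content, so identical content always maps to the same key. A minimal sketch, assuming an in-memory `Map` as a stand-in for the object database (`blobId`, `put`, and `store` are hypothetical names):

```javascript
const crypto = require('crypto');

// Compute a blob ID from content using the "blob <size>\0" header
function blobId(content) {
    const body = Buffer.from(content);
    const header = Buffer.from(`blob ${body.length}\0`);
    return crypto
        .createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

// Hypothetical object store keyed by hash
const store = new Map();
function put(content) {
    const id = blobId(content);
    if (!store.has(id)) {
        store.set(id, content); // only new content takes up space
    }
    return id;
}

const first = put('# My Project\n');                 // README.md in commit 1
const second = put('# My Project\n');                // unchanged in commit 2
const third = put('# My Project\nMore content\n');   // edited in commit 3

console.log(first === second); // true
console.log(store.size);       // 2
```

Three versions were "committed", but only two distinct blobs are stored.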
Git periodically packs objects into .pack files with delta compression, so similar files are stored as deltas against a base object. We won't implement this, but it is a large part of why Git is so space-efficient.
Real Git caches parsed commits in memory. For our implementation, we re-read each commit, which is fine for learning.
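
If re-reading ever becomes a bottleneck, a small cache in front of the parser is enough to avoid it. The sketch below shows the idea only; `loadCommit` is a hypothetical stand-in for reading and parsing a commit from disk, and nothing here reflects how real Git structures its caches.

```javascript
// Hypothetical stand-in for the expensive step: read and parse from disk
let diskReads = 0;
function loadCommit(hash) {
    diskReads++;
    return { hash, parents: [] };
}

// Commits are immutable, so a parsed commit can be cached by hash forever
const cache = new Map();
function getCommit(hash) {
    if (!cache.has(hash)) {
        cache.set(hash, loadCommit(hash));
    }
    return cache.get(hash);
}

getCommit('abc123');
getCommit('abc123'); // served from the cache
getCommit('def456');

console.log(diskReads); // 2
```

Caching by hash is safe here because an object's content can never change without its hash changing too.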

Exercises

Show branch structure visually:
* abc1234 Merge feature branch
|\
| * def5678 Add feature
* | 789abcd Fix bug
|/
* 456789a Initial commit
Show what changed between two commits:
mygit diff abc123..def456
Extend log to handle commits with multiple parents:
// In parseCommit, parents is already an array
// Extend log to show all parent hashes
// Handle walking multiple branches
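
One possible starting point for the multiple-parent exercise is to keep a set of pending commits and always print the newest one next. This sketch uses the hashes from the graph above with made-up timestamps; the `commits` map stands in for whatever `parseCommit` returns, and `walkHistory` is a hypothetical name.

```javascript
// Hypothetical parsed commits: hash -> { parents, timestamp, message }
const commits = new Map([
    ['456789a', { parents: [], timestamp: 100, message: 'Initial commit' }],
    ['789abcd', { parents: ['456789a'], timestamp: 200, message: 'Fix bug' }],
    ['def5678', { parents: ['456789a'], timestamp: 300, message: 'Add feature' }],
    ['abc1234', { parents: ['789abcd', 'def5678'], timestamp: 400, message: 'Merge feature branch' }],
]);

// Visit every reachable commit once, newest first
function walkHistory(start) {
    const seen = new Set([start]);
    const pending = [start];
    const order = [];

    while (pending.length > 0) {
        // Simple approach: re-sort so the newest pending commit comes first
        pending.sort((a, b) => commits.get(b).timestamp - commits.get(a).timestamp);
        const hash = pending.shift();
        order.push(hash);

        for (const parent of commits.get(hash).parents) {
            if (!seen.has(parent)) {
                seen.add(parent);
                pending.push(parent);
            }
        }
    }
    return order;
}

for (const hash of walkHistory('abc1234')) {
    console.log(`${hash} ${commits.get(hash).message}`);
}
```

Re-sorting on every step is fine for small histories; a priority queue would be the natural replacement for larger ones.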

Key Takeaways

Commits are Snapshots

Each commit points to a complete tree (directory snapshot), not diffs

History is a DAG

Commits form a directed acyclic graph via parent pointers

Branches are Pointers

A branch is just a file containing a commit hash

Trees are Nested

Tree objects contain blobs and other trees, creating directory structure

What’s Next?

In Chapter 5: Branches & Checkout, we’ll implement:
  • The branch command
  • The checkout command
  • Switching between branches
  • Detached HEAD state
