Skip to main content

Documentation Index

Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt

Use this file to discover all available pages before exploring further.

Chapter 4: Commits & History

A commit is a snapshot of your project at a point in time. In this chapter, we’ll implement the commit and log commands, learning how Git builds and traverses the commit graph. A useful analogy: think of each commit as a photograph of every file in your project. The photo is labeled with the date, the photographer’s name (author), and a sticky note (commit message). The parent pointer is like writing “this is the photo that came right before me” on the back. By following those back-references, you can reconstruct the entire timeline of your project — that chain of pointers is the commit history.
Prerequisites: Completed Chapter 3: Staging & Index
Time: 2-3 hours
Outcome: Working commit and log commands

How Commits Work

┌─────────────────────────────────────────────────────────────────────────────┐
│                           COMMIT STRUCTURE                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   COMMIT OBJECT                                                              │
│   ─────────────                                                              │
│   ┌──────────────────────────────────────┐                                  │
│   │ tree    d8329fc1cc938780ffdd9f94e0d36│──────► TREE (root directory)     │
│   │ parent  5a6f32abc123def456789abcdef  │──────► PARENT COMMIT (if any)    │
│   │ author  John <john@example.com>      │                                  │
│   │         1234567890 +0000             │                                  │
│   │ committer Jane <jane@example.com>    │                                  │
│   │           1234567890 +0000           │                                  │
│   │                                      │                                  │
│   │ Initial commit                       │ ◄───── COMMIT MESSAGE            │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
│   TREE OBJECT (from commit's tree)                                          │
│   ───────────                                                                │
│   ┌──────────────────────────────────────┐                                  │
│   │ 100644 blob abc123  README.md        │──────► BLOB (file content)       │
│   │ 040000 tree def456  src              │──────► TREE (subdirectory)       │
│   │ 100644 blob 789abc  package.json     │──────► BLOB (file content)       │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
│   BLOB (file content)                                                        │
│   ───────────────────                                                        │
│   ┌──────────────────────────────────────┐                                  │
│   │ # My Project                         │                                  │
│   │                                      │                                  │
│   │ This is a sample project...          │                                  │
│   └──────────────────────────────────────┘                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

The Commit Graph (DAG)

Commits form a Directed Acyclic Graph (DAG):
                     (HEAD)


    ┌───────┐      ┌───────┐
    │   C1  │◄─────│   C2  │◄─────┐
    └───────┘      └───────┘      │
         │                        │
         │                   ┌────┴────┐
         │                   │   C4    │ (merge commit)
         │                   └────┬────┘
         │                        │
         │              ┌─────────┘
         │              │
         ▼              ▼
    ┌───────┐      ┌───────┐
    │   B1  │◄─────│   C3  │ (branch)
    └───────┘      └───────┘

Each commit knows its parent(s).
Following parent pointers = traversing history.
Why DAG, not just a linked list?
Merge commits have multiple parents, allowing branches to be combined while preserving all history. A linked list can only represent one straight line of work. A DAG captures the reality of software development: multiple people working in parallel, their work eventually converging. The “acyclic” constraint means you can never create a loop — commit A cannot be both an ancestor and a descendant of commit B — which guarantees that history always flows in one direction.

Implementation

Step 1: Build Tree from Index

Before creating a commit, we need to convert the flat index into a tree structure:
src/utils/tree.js
const path = require('path');
const { hashObject, writeObject } = require('./objects');

/**
 * Build a tree object from index entries.
 * 
 * The index is flat (files have full paths), but trees are nested:
 * Index: ["src/main.js", "src/utils/helper.js", "README.md"]
 * Trees: root -> src -> utils
 *              |-- README.md
 *              |-- main.js
 *                    |-- helper.js
 *
 * Why does this conversion matter? Because the commit object points to a
 * single root tree hash. That tree recursively contains sub-trees for every
 * directory. This nested structure mirrors the actual filesystem layout, and
 * it means two commits that share the same `src/` directory will point to
 * the same tree hash for `src/` -- another layer of deduplication.
 *
 * The algorithm: walk through the sorted index entries, split each path
 * by '/', and build a nested dictionary. Then recursively serialize each
 * level into a Git tree object (bottom-up, so child hashes are available
 * when the parent is written).
 */
function buildTree(entries, gitDir) {
    // Build directory structure
    const root = { type: 'tree', entries: {} };
    
    for (const entry of entries) {
        const parts = entry.name.split('/');
        let current = root;
        
        // Navigate/create directories
        for (let i = 0; i < parts.length - 1; i++) {
            const dirName = parts[i];
            if (!current.entries[dirName]) {
                current.entries[dirName] = { type: 'tree', entries: {} };
            }
            current = current.entries[dirName];
        }
        
        // Add file entry
        const fileName = parts[parts.length - 1];
        current.entries[fileName] = {
            type: 'blob',
            mode: entry.mode,
            hash: entry.hash
        };
    }
    
    // Recursively write trees and return root hash
    return writeTreeRecursive(root, gitDir);
}

/**
 * Recursively serialize and write tree objects
 */
function writeTreeRecursive(node, gitDir) {
    const entries = [];
    
    for (const [name, child] of Object.entries(node.entries)) {
        if (child.type === 'tree') {
            // Recurse into directory
            const treeHash = writeTreeRecursive(child, gitDir);
            entries.push({
                mode: '40000',
                name: name,
                hash: treeHash
            });
        } else {
            // File entry
            const mode = child.mode.toString(8); // Convert to octal string
            entries.push({
                mode: mode,
                name: name,
                hash: child.hash
            });
        }
    }
    
    // Sort entries (Git sorts directories without trailing slash)
    entries.sort((a, b) => {
        const aName = a.mode === '40000' ? a.name + '/' : a.name;
        const bName = b.mode === '40000' ? b.name + '/' : b.name;
        return aName.localeCompare(bName);
    });
    
    // Serialize tree
    const chunks = [];
    for (const entry of entries) {
        // Format: "{mode} {name}\0{20-byte-binary-hash}"
        const modeAndName = Buffer.from(`${entry.mode} ${entry.name}\0`);
        const hashBytes = Buffer.from(entry.hash, 'hex');
        chunks.push(modeAndName, hashBytes);
    }
    
    const treeContent = Buffer.concat(chunks);
    const { hash, data } = hashObject(treeContent, 'tree');
    writeObject(gitDir, hash, data);
    
    return hash;
}

module.exports = { buildTree };

Step 2: Implement the Commit Command

src/commands/commit.js
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index } = require('../utils/index');
const { buildTree } = require('../utils/tree');

/**
 * commit - Record changes to the repository
 * 
 * Usage:
 *   mygit commit -m "message"
 *   mygit commit --author "Name <email>"
 */
function execute(args) {
    const gitDir = requireGitDir();
    const index = Index.read(gitDir);
    
    // Check if there's anything to commit
    const entries = index.getEntries();
    if (entries.length === 0) {
        throw new Error('nothing to commit (create/copy files and use "mygit add" to track)');
    }
    
    // Parse arguments
    let message = null;
    let author = getDefaultAuthor();
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '-m' || arg === '--message') {
            message = args[++i];
        } else if (arg === '--author') {
            author = parseAuthor(args[++i]);
        }
    }
    
    if (!message) {
        throw new Error('Aborting commit due to empty commit message.');
    }
    
    // Build tree from index.
    // This converts the flat list of staged files into nested tree objects.
    // The returned hash is the root tree -- the "snapshot" the commit will reference.
    const treeHash = buildTree(entries, gitDir);
    
    // Get parent commit (if any).
    // The first commit in a repo has no parent -- it's the "root commit".
    // Every subsequent commit links back to its parent, forming the history chain.
    const parent = getHead(gitDir);
    
    // Create commit object.
    // The timestamp is Unix epoch seconds. Git stores timestamps as integers
    // rather than formatted strings to avoid locale/timezone parsing issues.
    const timestamp = Math.floor(Date.now() / 1000);
    const timezone = getTimezone();
    
    const lines = [];
    lines.push(`tree ${treeHash}`);
    
    if (parent) {
        lines.push(`parent ${parent}`);
    }
    
    lines.push(`author ${author.name} <${author.email}> ${timestamp} ${timezone}`);
    lines.push(`committer ${author.name} <${author.email}> ${timestamp} ${timezone}`);
    lines.push('');
    lines.push(message);
    
    const commitContent = lines.join('\n');
    const { hash, data } = hashObject(commitContent, 'commit');
    writeObject(gitDir, hash, data);
    
    // Update HEAD/branch ref
    updateHead(gitDir, hash);
    
    // Print result
    const branch = getCurrentBranch(gitDir);
    const shortHash = hash.slice(0, 7);
    const isRoot = !parent;
    
    console.log(`[${branch}${isRoot ? ' (root-commit)' : ''} ${shortHash}] ${message}`);
    console.log(` ${entries.length} file(s) changed`);
}

/**
 * Get current HEAD commit hash
 */
function getHead(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (fs.existsSync(refFile)) {
            return fs.readFileSync(refFile, 'utf8').trim();
        }
        return null;
    }
    
    return headContent;
}

/**
 * Update HEAD to point to new commit.
 *
 * This is the critical step that "advances" the current branch. HEAD says
 * "I'm on branch main," and main says "current commit is abc123." When we
 * commit, we write the new commit hash into the branch file -- so main now
 * points to the new commit, and the new commit's parent pointer links back
 * to the old one. This is how branches grow: the tip moves forward, one
 * commit at a time.
 *
 * In detached HEAD mode (HEAD contains a raw hash, not "ref: ..."), we
 * update HEAD itself, which is why commits made in detached HEAD can be
 * lost if you switch branches without creating a branch first.
 */
function updateHead(gitDir, commitHash) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        // Update the branch ref
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        // Ensure directory exists
        const refDir = path.dirname(refFile);
        if (!fs.existsSync(refDir)) {
            fs.mkdirSync(refDir, { recursive: true });
        }
        
        fs.writeFileSync(refFile, commitHash + '\n');
    } else {
        // Detached HEAD: update HEAD directly
        fs.writeFileSync(headPath, commitHash + '\n');
    }
}

/**
 * Get current branch name
 */
function getCurrentBranch(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: refs/heads/')) {
        return headContent.slice('ref: refs/heads/'.length);
    }
    
    return 'HEAD';
}

/**
 * Get default author from environment or config
 */
function getDefaultAuthor() {
    // Try environment variables first
    const name = process.env.GIT_AUTHOR_NAME || 
                 process.env.USER || 
                 'Unknown';
    const email = process.env.GIT_AUTHOR_EMAIL || 
                  `${name.toLowerCase().replace(/\s+/g, '.')}@localhost`;
    
    return { name, email };
}

/**
 * Parse author string "Name <email>"
 */
function parseAuthor(str) {
    const match = str.match(/^(.+?)\s*<(.+)>$/);
    if (!match) {
        throw new Error(`Invalid author format: ${str}`);
    }
    return { name: match[1], email: match[2] };
}

/**
 * Get timezone offset string (+0000 format)
 */
function getTimezone() {
    const offset = new Date().getTimezoneOffset();
    const sign = offset <= 0 ? '+' : '-';
    const hours = Math.floor(Math.abs(offset) / 60).toString().padStart(2, '0');
    const minutes = (Math.abs(offset) % 60).toString().padStart(2, '0');
    return `${sign}${hours}${minutes}`;
}

module.exports = { execute };

Step 3: Implement the Log Command

src/commands/log.js
const fs = require('fs');
const path = require('path');
const { requireGitDir } = require('../utils/paths');
const { readObject } = require('../utils/objects');

/**
 * log - Show commit history
 * 
 * Usage:
 *   mygit log                    # Show full history
 *   mygit log --oneline          # Compact format
 *   mygit log -n 5               # Limit to 5 commits
 */
function execute(args) {
    const gitDir = requireGitDir();
    
    // Parse arguments
    let oneline = false;
    let limit = Infinity;
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '--oneline') {
            oneline = true;
        } else if (arg === '-n') {
            limit = parseInt(args[++i], 10);
        }
    }
    
    // Get starting commit
    const head = getHead(gitDir);
    if (!head) {
        console.log('fatal: your current branch has no commits yet');
        return;
    }
    
    // Walk commit history
    let currentHash = head;
    let count = 0;
    
    while (currentHash && count < limit) {
        const commit = parseCommit(gitDir, currentHash);
        
        if (oneline) {
            printCommitOneline(currentHash, commit);
        } else {
            printCommitFull(currentHash, commit, count > 0);
        }
        
        // Move to parent
        currentHash = commit.parents[0] || null;
        count++;
    }
}

/**
 * Get HEAD commit hash
 */
function getHead(gitDir) {
    const headPath = path.join(gitDir, 'HEAD');
    const headContent = fs.readFileSync(headPath, 'utf8').trim();
    
    if (headContent.startsWith('ref: ')) {
        const refPath = headContent.slice(5);
        const refFile = path.join(gitDir, refPath);
        
        if (fs.existsSync(refFile)) {
            return fs.readFileSync(refFile, 'utf8').trim();
        }
        return null;
    }
    
    return headContent;
}

/**
 * Parse a commit object
 */
function parseCommit(gitDir, hash) {
    const { content } = readObject(gitDir, hash);
    const text = content.toString();
    const lines = text.split('\n');
    
    const commit = {
        tree: null,
        parents: [],
        author: null,
        committer: null,
        message: ''
    };
    
    let i = 0;
    
    // Parse headers
    while (i < lines.length && lines[i] !== '') {
        const line = lines[i];
        
        if (line.startsWith('tree ')) {
            commit.tree = line.slice(5);
        } else if (line.startsWith('parent ')) {
            commit.parents.push(line.slice(7));
        } else if (line.startsWith('author ')) {
            commit.author = parseAuthorLine(line.slice(7));
        } else if (line.startsWith('committer ')) {
            commit.committer = parseAuthorLine(line.slice(10));
        }
        
        i++;
    }
    
    // Skip blank line
    i++;
    
    // Rest is the message
    commit.message = lines.slice(i).join('\n');
    
    return commit;
}

/**
 * Parse author line: "Name <email> timestamp timezone"
 */
function parseAuthorLine(line) {
    const match = line.match(/^(.+?) <(.+?)> (\d+) ([+-]\d{4})$/);
    if (!match) {
        return { name: 'Unknown', email: '', timestamp: 0, timezone: '+0000' };
    }
    
    return {
        name: match[1],
        email: match[2],
        timestamp: parseInt(match[3], 10),
        timezone: match[4]
    };
}

/**
 * Print commit in oneline format
 */
function printCommitOneline(hash, commit) {
    const shortHash = hash.slice(0, 7);
    const message = commit.message.split('\n')[0]; // First line only
    console.log(`\x1b[33m${shortHash}\x1b[0m ${message}`);
}

/**
 * Print commit in full format
 */
function printCommitFull(hash, commit, showSeparator) {
    if (showSeparator) {
        console.log();
    }
    
    console.log(`\x1b[33mcommit ${hash}\x1b[0m`);
    
    if (commit.author) {
        const date = formatDate(commit.author.timestamp, commit.author.timezone);
        console.log(`Author: ${commit.author.name} <${commit.author.email}>`);
        console.log(`Date:   ${date}`);
    }
    
    console.log();
    
    // Indent message
    const lines = commit.message.split('\n');
    for (const line of lines) {
        console.log(`    ${line}`);
    }
}

/**
 * Format timestamp to human-readable date
 */
function formatDate(timestamp, timezone) {
    const date = new Date(timestamp * 1000);
    
    const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
    const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
    
    const dayName = days[date.getUTCDay()];
    const month = months[date.getUTCMonth()];
    const day = date.getUTCDate();
    const hours = date.getUTCHours().toString().padStart(2, '0');
    const minutes = date.getUTCMinutes().toString().padStart(2, '0');
    const seconds = date.getUTCSeconds().toString().padStart(2, '0');
    const year = date.getUTCFullYear();
    
    return `${dayName} ${month} ${day} ${hours}:${minutes}:${seconds} ${year} ${timezone}`;
}

module.exports = { execute };

Step 4: Update CLI Entry Point

src/mygit.js
#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
    add: require('./commands/add'),
    status: require('./commands/status'),
    commit: require('./commands/commit'),
    log: require('./commands/log'),
};

function main() {
    const args = process.argv.slice(2);
    
    if (args.length === 0) {
        console.log('usage: mygit <command> [<args>]');
        console.log('\nAvailable commands:');
        console.log('   init          Initialize a new repository');
        console.log('   hash-object   Compute object ID');
        console.log('   cat-file      Show object content');
        console.log('   add           Add files to staging area');
        console.log('   status        Show working tree status');
        console.log('   commit        Record changes to repository');
        console.log('   log           Show commit history');
        process.exit(1);
    }
    
    const command = args[0];
    const commandArgs = args.slice(1);
    
    if (!commands[command]) {
        console.error(`mygit: '${command}' is not a mygit command.`);
        process.exit(1);
    }
    
    try {
        commands[command].execute(commandArgs);
    } catch (error) {
        console.error(error.message);
        process.exit(1);
    }
}

main();

Testing Your Implementation

# Initialize and create files
mygit init
echo "# My Project" > README.md
echo "console.log('hello');" > index.js

# Stage and commit
mygit add README.md index.js
mygit commit -m "Initial commit"
# Output: [master (root-commit) abc1234] Initial commit

# Make changes
echo "More content" >> README.md
mygit add README.md
mygit commit -m "Update README"

# View history
mygit log
# Shows both commits with full details

mygit log --oneline
# abc1234 Update README
# def5678 Initial commit

How Git Optimizes

If you have the same file in multiple commits, Git stores it only once. The tree just points to the same blob hash.
Commit 1 tree → README.md → blob abc123
Commit 2 tree → README.md → blob abc123  ← Same blob!
Git periodically packs objects into .pack files with delta compression. Similar files are stored as deltas from a base object.We won’t implement this, but it’s why Git is so space-efficient.
Real Git caches parsed commits in memory. For our implementation, we re-read each commit, which is fine for learning.

Exercises

Show branch structure visually:
* abc1234 Merge feature branch
|\
| * def5678 Add feature
* | 789abcd Fix bug
|/
* 456789a Initial commit
Show what changed between two commits:
mygit diff abc123..def456
Extend log to handle commits with multiple parents:
// In parseCommit, parents is already an array
// Extend log to show all parent hashes
// Handle walking multiple branches

Key Takeaways

Commits are Snapshots

Each commit points to a complete tree (directory snapshot), not diffs

History is a DAG

Commits form a directed acyclic graph via parent pointers

Branches are Pointers

A branch is just a file containing a commit hash

Trees are Nested

Tree objects contain blobs and other trees, creating directory structure

Further Reading

DSA: Graph Algorithms

Understand graph traversal for commit history

DSA: Trees

Learn tree traversal for directory structures

What’s Next?

In Chapter 5: Branches & Checkout, we’ll implement:
  • The branch command
  • The checkout command
  • Switching between branches
  • Detached HEAD state

Next: Branches & Checkout

Learn how Git manages and switches between branches

Interview Deep-Dive

Strong Answer:
  • Snapshot-based storage makes checkout a constant-time operation. To materialize any commit, Git reads the root tree and recursively reads the blobs. There is no need to replay a chain of diffs from the beginning of history. Checking out commit 1 is as fast as checking out commit 10,000.
  • Diff-based systems (like older VCS tools) store the initial version and then a chain of deltas. To reconstruct version N, you must apply N-1 diffs sequentially. This gets slower as history grows and makes certain operations (like blame or bisect) expensive because they require reconstructing arbitrary versions.
  • Snapshots also make branching and merging simpler. A merge is a comparison of two tree snapshots against a common ancestor. With diffs, you would need to reconcile overlapping delta chains, which is more error-prone.
  • The apparent waste of storing full snapshots is mitigated by Git’s deduplication. Unchanged files share the same blob hash across commits, unchanged directories share tree hashes, and pack files apply delta compression after the fact. So Git gets the performance of snapshots with the storage efficiency of deltas — the best of both worlds.
Follow-up: What is the commit DAG, and why is it a DAG specifically rather than a simple linked list?A DAG (directed acyclic graph) allows merge commits to have multiple parents, representing the point where two branches converge. A linked list can only represent linear history — one commit after another. Real software development is inherently non-linear: multiple developers work in parallel, and their work eventually merges. The “acyclic” constraint means history always moves forward — commit A cannot be both an ancestor and a descendant of commit B. This guarantee is enforced by the hash chain: each commit’s hash depends on its parent hashes, so creating a cycle would require a hash collision. The DAG structure enables git log --graph to visualize parallel development, git merge-base to find common ancestors, and three-way merge to reconcile divergent changes.
Strong Answer:
  • First, Git reads the index (.git/index) to get the list of staged files with their blob hashes. It converts this flat list into a nested tree structure by grouping entries by directory.
  • It writes tree objects bottom-up: leaf directories first, then their parents, up to the root tree. Each tree object is hashed and stored in .git/objects/. If a subtree is identical to one that already exists (unchanged directory), the same hash is reused and no new object is written.
  • Git reads HEAD to find the current branch (ref: refs/heads/main), then reads the branch file to get the current commit hash (the parent).
  • It constructs the commit object: the root tree hash, the parent commit hash, author/committer lines with timestamps, and the commit message. This is hashed and written to the object store.
  • Finally, Git updates the branch file (.git/refs/heads/main) with the new commit’s hash. HEAD is not changed — it still says ref: refs/heads/main. Only the branch pointer advances.
  • Total new objects: one commit, one or more trees (only for changed directory paths), and the blobs were already written during git add. Total file updates: the branch ref file and possibly the index (to update stat cache entries).
Follow-up: What happens if the process crashes between writing the commit object and updating the branch ref?The commit object exists in the object store but is unreachable — no branch or tag points to it. It becomes a “dangling commit.” git fsck --unreachable will find it. The branch pointer still points to the old commit, so from the user’s perspective, the commit never happened. This is safe: the repository is in a consistent state, and the dangling object will be cleaned up by git gc after the reflog expiration period (default 30 days for unreachable objects). Git’s design ensures that the branch ref update is the atomic “commit point” — everything before that is preparation, and the repository is consistent whether the update happens or not.
Strong Answer:
  • git log starts at HEAD, reads the commit object, prints it, then follows the parent pointer to the next commit. For a linear history, this is a simple linked-list traversal — O(n) where n is the number of commits displayed.
  • For histories with branches and merges, git log performs a priority-queue-based topological sort of the DAG. It maintains a queue of commits sorted by timestamp, dequeues the newest, prints it, and enqueues its parents. This ensures commits are displayed in roughly chronological order while respecting the DAG structure.
  • For repositories with millions of commits (like the Linux kernel), the performance bottleneck is I/O: reading commit objects from pack files. Git mitigates this with commit-graph files (.git/objects/info/commit-graph), which pre-compute and cache commit metadata (parents, tree hash, generation number) in a memory-mappable binary format. With a commit graph, git log can traverse history without reading individual commit objects, reducing the operation from millions of object lookups to a single file scan.
  • Generation numbers in the commit graph enable even faster reachability queries: “is commit A an ancestor of commit B?” can be answered in O(1) if A’s generation number is greater than B’s (impossible to be an ancestor). This accelerates merge-base computation, fetch negotiation, and branch filtering.
Follow-up: What is the commit-graph file, and when should you regenerate it?The commit-graph file is a binary cache of commit metadata (OID, parent OIDs, tree OID, generation number, and commit timestamp) stored in a format optimized for random access. It is automatically maintained by git maintenance (enabled by default in modern Git) and updated incrementally during git gc. You should regenerate it after large imports or history rewrites with git commit-graph write --reachable. Without it, operations like git log --ancestry-path or git merge-base on large repositories can be orders of magnitude slower because they must open and parse individual commit objects from pack files instead of scanning the pre-computed graph.