Chapter 4: Commits & History
A commit is a snapshot of your project at a point in time. In this chapter, we’ll implement thecommit and log commands, learning how Git builds and traverses the commit graph.
Prerequisites: Completed Chapter 3: Staging & Index
Time: 2-3 hours
Outcome: Working
Time: 2-3 hours
Outcome: Working
commit and log commandsHow Commits Work
Copy
┌─────────────────────────────────────────────────────────────────────────────┐
│ COMMIT STRUCTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ COMMIT OBJECT │
│ ───────────── │
│ ┌──────────────────────────────────────┐ │
│ │ tree d8329fc1cc938780ffdd9f94e0d36│──────► TREE (root directory) │
│ │ parent 5a6f32abc123def456789abcdef │──────► PARENT COMMIT (if any) │
│ │ author John <[email protected]> │ │
│ │ 1234567890 +0000 │ │
│ │ committer Jane <[email protected]> │ │
│ │ 1234567890 +0000 │ │
│ │ │ │
│ │ Initial commit │ ◄───── COMMIT MESSAGE │
│ └──────────────────────────────────────┘ │
│ │
│ TREE OBJECT (from commit's tree) │
│ ─────────── │
│ ┌──────────────────────────────────────┐ │
│ │ 100644 blob abc123 README.md │──────► BLOB (file content) │
│ │ 040000 tree def456 src │──────► TREE (subdirectory) │
│ │ 100644 blob 789abc package.json │──────► BLOB (file content) │
│ └──────────────────────────────────────┘ │
│ │
│ BLOB (file content) │
│ ─────────────────── │
│ ┌──────────────────────────────────────┐ │
│ │ # My Project │ │
│ │ │ │
│ │ This is a sample project... │ │
│ └──────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Commit Graph (DAG)
Commits form a Directed Acyclic Graph (DAG):Copy
(HEAD)
│
▼
┌───────┐ ┌───────┐
│ C1 │◄─────│ C2 │◄─────┐
└───────┘ └───────┘ │
│ │
│ ┌────┴────┐
│ │ C4 │ (merge commit)
│ └────┬────┘
│ │
│ ┌─────────┘
│ │
▼ ▼
┌───────┐ ┌───────┐
│ B1 │◄─────│ C3 │ (branch)
└───────┘ └───────┘
Each commit knows its parent(s).
Following parent pointers = traversing history.
Why DAG, not just a linked list?
Merge commits have multiple parents, allowing branches to be combined while preserving all history.
Merge commits have multiple parents, allowing branches to be combined while preserving all history.
Implementation
Step 1: Build Tree from Index
Before creating a commit, we need to convert the flat index into a tree structure:src/utils/tree.js
Copy
const path = require('path');
const { hashObject, writeObject } = require('./objects');
/**
* Build a tree object from index entries
*
* The index is flat (files have full paths), but trees are nested:
* Index: ["src/main.js", "src/utils/helper.js", "README.md"]
* Trees: root -> src -> utils
* └─ README.md
* └─ main.js
* └─ helper.js
*/
function buildTree(entries, gitDir) {
// Build directory structure
const root = { type: 'tree', entries: {} };
for (const entry of entries) {
const parts = entry.name.split('/');
let current = root;
// Navigate/create directories
for (let i = 0; i < parts.length - 1; i++) {
const dirName = parts[i];
if (!current.entries[dirName]) {
current.entries[dirName] = { type: 'tree', entries: {} };
}
current = current.entries[dirName];
}
// Add file entry
const fileName = parts[parts.length - 1];
current.entries[fileName] = {
type: 'blob',
mode: entry.mode,
hash: entry.hash
};
}
// Recursively write trees and return root hash
return writeTreeRecursive(root, gitDir);
}
/**
* Recursively serialize and write tree objects
*/
function writeTreeRecursive(node, gitDir) {
const entries = [];
for (const [name, child] of Object.entries(node.entries)) {
if (child.type === 'tree') {
// Recurse into directory
const treeHash = writeTreeRecursive(child, gitDir);
entries.push({
mode: '40000',
name: name,
hash: treeHash
});
} else {
// File entry
const mode = child.mode.toString(8); // Convert to octal string
entries.push({
mode: mode,
name: name,
hash: child.hash
});
}
}
// Sort entries (Git sorts directories without trailing slash)
entries.sort((a, b) => {
const aName = a.mode === '40000' ? a.name + '/' : a.name;
const bName = b.mode === '40000' ? b.name + '/' : b.name;
return aName.localeCompare(bName);
});
// Serialize tree
const chunks = [];
for (const entry of entries) {
// Format: "{mode} {name}\0{20-byte-binary-hash}"
const modeAndName = Buffer.from(`${entry.mode} ${entry.name}\0`);
const hashBytes = Buffer.from(entry.hash, 'hex');
chunks.push(modeAndName, hashBytes);
}
const treeContent = Buffer.concat(chunks);
const { hash, data } = hashObject(treeContent, 'tree');
writeObject(gitDir, hash, data);
return hash;
}
module.exports = { buildTree };
Step 2: Implement the Commit Command
src/commands/commit.js
Copy
const fs = require('fs');
const path = require('path');
const { requireGitDir, getRepoRoot } = require('../utils/paths');
const { hashObject, writeObject } = require('../utils/objects');
const { Index } = require('../utils/index');
const { buildTree } = require('../utils/tree');
/**
* commit - Record changes to the repository
*
* Usage:
* mygit commit -m "message"
* mygit commit --author "Name <email>"
*/
function execute(args) {
const gitDir = requireGitDir();
const index = Index.read(gitDir);
// Check if there's anything to commit
const entries = index.getEntries();
if (entries.length === 0) {
throw new Error('nothing to commit (create/copy files and use "mygit add" to track)');
}
// Parse arguments
let message = null;
let author = getDefaultAuthor();
for (let i = 0; i < args.length; i++) {
const arg = args[i];
if (arg === '-m' || arg === '--message') {
message = args[++i];
} else if (arg === '--author') {
author = parseAuthor(args[++i]);
}
}
if (!message) {
throw new Error('Aborting commit due to empty commit message.');
}
// Build tree from index
const treeHash = buildTree(entries, gitDir);
// Get parent commit (if any)
const parent = getHead(gitDir);
// Create commit object
const timestamp = Math.floor(Date.now() / 1000);
const timezone = getTimezone();
const lines = [];
lines.push(`tree ${treeHash}`);
if (parent) {
lines.push(`parent ${parent}`);
}
lines.push(`author ${author.name} <${author.email}> ${timestamp} ${timezone}`);
lines.push(`committer ${author.name} <${author.email}> ${timestamp} ${timezone}`);
lines.push('');
lines.push(message);
const commitContent = lines.join('\n');
const { hash, data } = hashObject(commitContent, 'commit');
writeObject(gitDir, hash, data);
// Update HEAD/branch ref
updateHead(gitDir, hash);
// Print result
const branch = getCurrentBranch(gitDir);
const shortHash = hash.slice(0, 7);
const isRoot = !parent;
console.log(`[${branch}${isRoot ? ' (root-commit)' : ''} ${shortHash}] ${message}`);
console.log(` ${entries.length} file(s) changed`);
}
/**
* Get current HEAD commit hash
*/
function getHead(gitDir) {
const headPath = path.join(gitDir, 'HEAD');
const headContent = fs.readFileSync(headPath, 'utf8').trim();
if (headContent.startsWith('ref: ')) {
const refPath = headContent.slice(5);
const refFile = path.join(gitDir, refPath);
if (fs.existsSync(refFile)) {
return fs.readFileSync(refFile, 'utf8').trim();
}
return null;
}
return headContent;
}
/**
* Update HEAD to point to new commit
*/
function updateHead(gitDir, commitHash) {
const headPath = path.join(gitDir, 'HEAD');
const headContent = fs.readFileSync(headPath, 'utf8').trim();
if (headContent.startsWith('ref: ')) {
// Update the branch ref
const refPath = headContent.slice(5);
const refFile = path.join(gitDir, refPath);
// Ensure directory exists
const refDir = path.dirname(refFile);
if (!fs.existsSync(refDir)) {
fs.mkdirSync(refDir, { recursive: true });
}
fs.writeFileSync(refFile, commitHash + '\n');
} else {
// Detached HEAD: update HEAD directly
fs.writeFileSync(headPath, commitHash + '\n');
}
}
/**
* Get current branch name
*/
function getCurrentBranch(gitDir) {
const headPath = path.join(gitDir, 'HEAD');
const headContent = fs.readFileSync(headPath, 'utf8').trim();
if (headContent.startsWith('ref: refs/heads/')) {
return headContent.slice('ref: refs/heads/'.length);
}
return 'HEAD';
}
/**
* Get default author from environment or config
*/
function getDefaultAuthor() {
// Try environment variables first
const name = process.env.GIT_AUTHOR_NAME ||
process.env.USER ||
'Unknown';
const email = process.env.GIT_AUTHOR_EMAIL ||
`${name.toLowerCase().replace(/\s+/g, '.')}@localhost`;
return { name, email };
}
/**
* Parse author string "Name <email>"
*/
function parseAuthor(str) {
const match = str.match(/^(.+?)\s*<(.+)>$/);
if (!match) {
throw new Error(`Invalid author format: ${str}`);
}
return { name: match[1], email: match[2] };
}
/**
* Get timezone offset string (+0000 format)
*/
function getTimezone() {
const offset = new Date().getTimezoneOffset();
const sign = offset <= 0 ? '+' : '-';
const hours = Math.floor(Math.abs(offset) / 60).toString().padStart(2, '0');
const minutes = (Math.abs(offset) % 60).toString().padStart(2, '0');
return `${sign}${hours}${minutes}`;
}
module.exports = { execute };
Step 3: Implement the Log Command
src/commands/log.js
Copy
const fs = require('fs');
const path = require('path');
const { requireGitDir } = require('../utils/paths');
const { readObject } = require('../utils/objects');
/**
* log - Show commit history
*
* Usage:
* mygit log # Show full history
* mygit log --oneline # Compact format
* mygit log -n 5 # Limit to 5 commits
*/
function execute(args) {
const gitDir = requireGitDir();
// Parse arguments
let oneline = false;
let limit = Infinity;
for (let i = 0; i < args.length; i++) {
const arg = args[i];
if (arg === '--oneline') {
oneline = true;
} else if (arg === '-n') {
limit = parseInt(args[++i], 10);
}
}
// Get starting commit
const head = getHead(gitDir);
if (!head) {
console.log('fatal: your current branch has no commits yet');
return;
}
// Walk commit history
let currentHash = head;
let count = 0;
while (currentHash && count < limit) {
const commit = parseCommit(gitDir, currentHash);
if (oneline) {
printCommitOneline(currentHash, commit);
} else {
printCommitFull(currentHash, commit, count > 0);
}
// Move to parent
currentHash = commit.parents[0] || null;
count++;
}
}
/**
* Get HEAD commit hash
*/
function getHead(gitDir) {
const headPath = path.join(gitDir, 'HEAD');
const headContent = fs.readFileSync(headPath, 'utf8').trim();
if (headContent.startsWith('ref: ')) {
const refPath = headContent.slice(5);
const refFile = path.join(gitDir, refPath);
if (fs.existsSync(refFile)) {
return fs.readFileSync(refFile, 'utf8').trim();
}
return null;
}
return headContent;
}
/**
* Parse a commit object
*/
function parseCommit(gitDir, hash) {
const { content } = readObject(gitDir, hash);
const text = content.toString();
const lines = text.split('\n');
const commit = {
tree: null,
parents: [],
author: null,
committer: null,
message: ''
};
let i = 0;
// Parse headers
while (i < lines.length && lines[i] !== '') {
const line = lines[i];
if (line.startsWith('tree ')) {
commit.tree = line.slice(5);
} else if (line.startsWith('parent ')) {
commit.parents.push(line.slice(7));
} else if (line.startsWith('author ')) {
commit.author = parseAuthorLine(line.slice(7));
} else if (line.startsWith('committer ')) {
commit.committer = parseAuthorLine(line.slice(10));
}
i++;
}
// Skip blank line
i++;
// Rest is the message
commit.message = lines.slice(i).join('\n');
return commit;
}
/**
* Parse author line: "Name <email> timestamp timezone"
*/
function parseAuthorLine(line) {
const match = line.match(/^(.+?) <(.+?)> (\d+) ([+-]\d{4})$/);
if (!match) {
return { name: 'Unknown', email: '', timestamp: 0, timezone: '+0000' };
}
return {
name: match[1],
email: match[2],
timestamp: parseInt(match[3], 10),
timezone: match[4]
};
}
/**
* Print commit in oneline format
*/
function printCommitOneline(hash, commit) {
const shortHash = hash.slice(0, 7);
const message = commit.message.split('\n')[0]; // First line only
console.log(`\x1b[33m${shortHash}\x1b[0m ${message}`);
}
/**
* Print commit in full format
*/
function printCommitFull(hash, commit, showSeparator) {
if (showSeparator) {
console.log();
}
console.log(`\x1b[33mcommit ${hash}\x1b[0m`);
if (commit.author) {
const date = formatDate(commit.author.timestamp, commit.author.timezone);
console.log(`Author: ${commit.author.name} <${commit.author.email}>`);
console.log(`Date: ${date}`);
}
console.log();
// Indent message
const lines = commit.message.split('\n');
for (const line of lines) {
console.log(` ${line}`);
}
}
/**
* Format timestamp to human-readable date
*/
function formatDate(timestamp, timezone) {
const date = new Date(timestamp * 1000);
const days = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
const months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];
const dayName = days[date.getUTCDay()];
const month = months[date.getUTCMonth()];
const day = date.getUTCDate();
const hours = date.getUTCHours().toString().padStart(2, '0');
const minutes = date.getUTCMinutes().toString().padStart(2, '0');
const seconds = date.getUTCSeconds().toString().padStart(2, '0');
const year = date.getUTCFullYear();
return `${dayName} ${month} ${day} ${hours}:${minutes}:${seconds} ${year} ${timezone}`;
}
module.exports = { execute };
Step 4: Update CLI Entry Point
src/mygit.js
Copy
#!/usr/bin/env node
const commands = {
init: require('./commands/init'),
'hash-object': require('./commands/hashObject'),
'cat-file': require('./commands/catFile'),
add: require('./commands/add'),
status: require('./commands/status'),
commit: require('./commands/commit'),
log: require('./commands/log'),
};
function main() {
const args = process.argv.slice(2);
if (args.length === 0) {
console.log('usage: mygit <command> [<args>]');
console.log('\nAvailable commands:');
console.log(' init Initialize a new repository');
console.log(' hash-object Compute object ID');
console.log(' cat-file Show object content');
console.log(' add Add files to staging area');
console.log(' status Show working tree status');
console.log(' commit Record changes to repository');
console.log(' log Show commit history');
process.exit(1);
}
const command = args[0];
const commandArgs = args.slice(1);
if (!commands[command]) {
console.error(`mygit: '${command}' is not a mygit command.`);
process.exit(1);
}
try {
commands[command].execute(commandArgs);
} catch (error) {
console.error(error.message);
process.exit(1);
}
}
main();
Testing Your Implementation
Copy
# Initialize and create files
mygit init
echo "# My Project" > README.md
echo "console.log('hello');" > index.js
# Stage and commit
mygit add README.md index.js
mygit commit -m "Initial commit"
# Output: [master (root-commit) abc1234] Initial commit
# Make changes
echo "More content" >> README.md
mygit add README.md
mygit commit -m "Update README"
# View history
mygit log
# Shows both commits with full details
mygit log --oneline
# abc1234 Update README
# def5678 Initial commit
How Git Optimizes
Deduplication
Deduplication
If you have the same file in multiple commits, Git stores it only once. The tree just points to the same blob hash.
Copy
Commit 1 tree → README.md → blob abc123
Commit 2 tree → README.md → blob abc123 ← Same blob!
Pack Files
Pack Files
Git periodically packs objects into
.pack files with delta compression. Similar files are stored as deltas from a base object.We won’t implement this, but it’s why Git is so space-efficient.Commit Caching
Commit Caching
Real Git caches parsed commits in memory. For our implementation, we re-read each commit, which is fine for learning.
Exercises
Exercise 1: Add --graph flag to log
Exercise 1: Add --graph flag to log
Show branch structure visually:
Copy
* abc1234 Merge feature branch
|\
| * def5678 Add feature
* | 789abcd Fix bug
|/
* 456789a Initial commit
Exercise 2: Implement diff between commits
Exercise 2: Implement diff between commits
Show what changed between two commits:
Copy
mygit diff abc123..def456
Exercise 3: Handle merge commits
Exercise 3: Handle merge commits
Extend log to handle commits with multiple parents:
Copy
// In parseCommit, parents is already an array
// Extend log to show all parent hashes
// Handle walking multiple branches
Key Takeaways
Commits are Snapshots
Each commit points to a complete tree (directory snapshot), not diffs
History is a DAG
Commits form a directed acyclic graph via parent pointers
Branches are Pointers
A branch is just a file containing a commit hash
Trees are Nested
Tree objects contain blobs and other trees, creating directory structure
Further Reading
DSA: Graph Algorithms
Understand graph traversal for commit history
DSA: Trees
Learn tree traversal for directory structures
What’s Next?
In Chapter 5: Branches & Checkout, we’ll implement:- The
branchcommand - The
checkoutcommand - Switching between branches
- Detached HEAD state
Next: Branches & Checkout
Learn how Git manages and switches between branches