Chapter 2: The Object Model

Git’s object model is arguably the most elegant part of its design. In this chapter, we’ll implement its foundation: content-addressable storage built on SHA-1 hashing.

Prerequisites: Completed Chapter 1: Setup & Init
Time: 2-3 hours
Outcome: Working hash-object and cat-file commands

The Core Insight

Git stores everything as objects in a content-addressable store. The “address” (filename) is derived from the content itself using SHA-1 hashing.

Content: "Hello, World!"
   ↓
SHA-1("blob 13\0Hello, World!")
   ↓
Hash: b45ef6fec89518d314f546fd6c3025367b721684
   ↓
Stored at: .git/objects/b4/5ef6fec89518d314f546fd6c3025367b721684

Why is this brilliant?

Same content = same hash = stored only once (deduplication!)
Hash acts as a checksum (data integrity)
Immutable objects (can’t change content without changing address)
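The properties above can be checked in a few lines. This is only an illustrative sketch: `blobHash` is a hypothetical helper name (not part of the chapter's code), and it assumes nothing beyond Node's built-in `crypto` module.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash content the way Git hashes a blob,
// i.e. SHA-1 over "blob {size}\0{content}".
function blobHash(content) {
    const body = Buffer.from(content);
    const header = Buffer.from(`blob ${body.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

const a = blobHash('Hello, World!');
const b = blobHash('Hello, World!');
const c = blobHash('Hello, World?');

console.log(a === b); // true: identical content always maps to the same address
console.log(a === c); // false: any change to the content changes the address
```

Because the address is a pure function of the bytes, storing the same file twice naturally lands on the same path, which is where the deduplication comes from.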

The Three Object Types

┌─────────────────────────────────────────────────────────────────────────────┐
│                          GIT OBJECT TYPES                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   BLOB                    TREE                      COMMIT                   │
│   ────                    ────                      ──────                   │
│   Raw file content        Directory snapshot        Project snapshot         │
│                                                                              │
│   ┌─────────────┐        ┌─────────────────┐       ┌──────────────────┐     │
│   │ Hello World │        │ 100644 file.txt │       │ tree d8329fc...  │     │
│   │             │        │ → blob abc123   │       │ parent 5a6f32... │     │
│   └─────────────┘        │ 040000 src/     │       │ author Jane      │     │
│         ↓                │ → tree def456   │       │ committer Jane   │     │
│   SHA-1 of content       └─────────────────┘       │                  │     │
│                                ↓                   │ Initial commit   │     │
│                          SHA-1 of entries          └──────────────────┘     │
│                                                            ↓                │
│                                                    SHA-1 of metadata        │
│                                                                              │
│   ALL OBJECTS: header + content → SHA-1 → stored compressed                 │
└─────────────────────────────────────────────────────────────────────────────┘
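To make the COMMIT column of the diagram concrete, here is a rough sketch of how the text inside a commit object could be assembled and hashed. The author identity and timestamp are made-up placeholder values, the tree hash is simply the hash of an empty tree, and `buildCommitContent` and `hashWithHeader` are hypothetical helper names that do not appear elsewhere in this chapter.

```javascript
const crypto = require('crypto');

// Hypothetical helper: assemble the plain-text body of a commit object.
// Layout: "tree" line, optional "parent" lines, author, committer, blank line, message.
function buildCommitContent({ tree, parents = [], author, timestamp, message }) {
    const lines = [`tree ${tree}`];
    for (const parent of parents) {
        lines.push(`parent ${parent}`);
    }
    // Identity lines use: Name <email> <unix seconds> <timezone offset>
    lines.push(`author ${author} ${timestamp} +0000`);
    lines.push(`committer ${author} ${timestamp} +0000`);
    lines.push('', message);
    return lines.join('\n') + '\n';
}

// Same framing as every other object: "{type} {size}\0{content}"
function hashWithHeader(type, content) {
    const body = Buffer.from(content);
    const header = Buffer.from(`${type} ${body.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

const content = buildCommitContent({
    tree: '4b825dc642cb6eb9a060e54bf8d69288fbee4904', // placeholder: the empty tree
    author: 'Jane Doe <jane@example.com>',            // placeholder identity
    timestamp: 1700000000,                            // placeholder time
    message: 'Initial commit',
});

console.log(content);
console.log(hashWithHeader('commit', content));
```

A root commit has no `parent` line at all, which is why `parents` defaults to an empty array here.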

Implementation

Step 1: SHA-1 Hashing Utility

Create the core hashing function that all objects will use:

src/utils/objects.js

const crypto = require('crypto');
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

/**
 * Compute SHA-1 hash for a Git object
 * 
 * Git object format: "{type} {size}\0{content}"
 * - type: blob, tree, commit, or tag
 * - size: content length in bytes
 * - \0: null byte separator
 * - content: the actual data
 * 
 * @param {Buffer|string} content - The object content
 * @param {string} type - Object type (blob, tree, commit)
 * @returns {{hash: string, data: Buffer}} - Hash and full object data
 */
function hashObject(content, type = 'blob') {
    // Convert string to buffer if needed
    const contentBuffer = Buffer.isBuffer(content) 
        ? content 
        : Buffer.from(content);
    
    // Create header: "{type} {size}\0"
    const header = Buffer.from(`${type} ${contentBuffer.length}\0`);
    
    // Full object = header + content
    const fullObject = Buffer.concat([header, contentBuffer]);
    
    // Compute SHA-1 hash
    const hash = crypto
        .createHash('sha1')
        .update(fullObject)
        .digest('hex');
    
    return { hash, data: fullObject };
}

/**
 * Write an object to the Git object store
 * 
 * @param {string} gitDir - Path to .git directory
 * @param {string} hash - 40-character hex hash
 * @param {Buffer} data - Full object data (header + content)
 * @returns {string} - The hash
 */
function writeObject(gitDir, hash, data) {
    // Objects stored at: .git/objects/{first 2 chars}/{remaining 38 chars}
    const objectDir = path.join(gitDir, 'objects', hash.slice(0, 2));
    const objectPath = path.join(objectDir, hash.slice(2));
    
    // Don't write if already exists (content-addressable = immutable)
    if (fs.existsSync(objectPath)) {
        return hash;
    }
    
    // Create subdirectory if needed
    if (!fs.existsSync(objectDir)) {
        fs.mkdirSync(objectDir, { recursive: true });
    }
    
    // Compress with zlib and write
    const compressed = zlib.deflateSync(data);
    fs.writeFileSync(objectPath, compressed);
    
    return hash;
}

/**
 * Read an object from the Git object store
 * 
 * @param {string} gitDir - Path to .git directory
 * @param {string} hash - 40-character hex hash
 * @returns {{type: string, size: number, content: Buffer}}
 */
function readObject(gitDir, hash) {
    const objectPath = path.join(
        gitDir, 
        'objects', 
        hash.slice(0, 2), 
        hash.slice(2)
    );
    
    if (!fs.existsSync(objectPath)) {
        throw new Error(`fatal: Not a valid object name ${hash}`);
    }
    
    // Read and decompress
    const compressed = fs.readFileSync(objectPath);
    const data = zlib.inflateSync(compressed);
    
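    // Parse header: find the null byte that terminates it
    const nullIndex = data.indexOf(0);
    if (nullIndex === -1) {
        throw new Error(`Object ${hash} is corrupted: missing header`);
    }
    const header = data.subarray(0, nullIndex).toString();
    const content = data.subarray(nullIndex + 1);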
    
    // Parse header: "{type} {size}"
    const [type, sizeStr] = header.split(' ');
    const size = parseInt(sizeStr, 10);
    
    // Verify size matches
    if (content.length !== size) {
        throw new Error(`Object ${hash} is corrupted`);
    }
    
    return { type, size, content };
}

module.exports = {
    hashObject,
    writeObject,
    readObject
};
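As a quick sanity check of the storage format, the following standalone sketch performs the same write and read steps inline against a throwaway directory. It does not import the module above; the temporary directory stands in for a real `.git` directory purely for demonstration.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Throwaway directory standing in for a real .git directory (demo assumption)
const gitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mygit-demo-'));

// Frame and hash the content: "blob {size}\0{content}"
const content = Buffer.from('test content\n');
const full = Buffer.concat([Buffer.from(`blob ${content.length}\0`), content]);
const hash = crypto.createHash('sha1').update(full).digest('hex');

// Write: objects/{first 2 hex chars}/{remaining 38}, zlib-deflated
const objectPath = path.join(gitDir, 'objects', hash.slice(0, 2), hash.slice(2));
fs.mkdirSync(path.dirname(objectPath), { recursive: true });
fs.writeFileSync(objectPath, zlib.deflateSync(full));

// Read: inflate, then split header from content at the first null byte
const inflated = zlib.inflateSync(fs.readFileSync(objectPath));
const nullIndex = inflated.indexOf(0);
const [type, size] = inflated.subarray(0, nullIndex).toString().split(' ');
const roundTripped = inflated.subarray(nullIndex + 1);

console.log(hash);                         // d670460b4b4aece5915caf5c68d12f560a9fe3e4
console.log(type, size);                   // blob 13
console.log(roundTripped.equals(content)); // true

// Clean up the temporary directory
fs.rmSync(gitDir, { recursive: true, force: true });
```

The file on disk is not human-readable because of the zlib step; only after inflating does the `blob 13\0...` framing become visible.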

Step 2: Implement hash-object Command

The hash-object command computes the hash of a file and optionally stores it:

src/commands/hashObject.js

const fs = require('fs');
const path = require('path');
const { hashObject, writeObject } = require('../utils/objects');
const { findGitDir } = require('../utils/paths');

/**
 * hash-object - Compute object ID and optionally store it
 * 
 * Usage:
 *   mygit hash-object <file>           # Just compute hash
 *   mygit hash-object -w <file>        # Compute and write to object store
 *   mygit hash-object -t <type> <file> # Specify type (default: blob)
 *   mygit hash-object --stdin          # Read from stdin
 */
function execute(args) {
    // Parse options
    let write = false;
    let type = 'blob';
    let useStdin = false;
    const files = [];
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '-w') {
            write = true;
        } else if (arg === '-t') {
            type = args[++i];
            if (!['blob', 'tree', 'commit', 'tag'].includes(type)) {
                throw new Error(`invalid object type "${type}"`);
            }
        } else if (arg === '--stdin') {
            useStdin = true;
        } else if (!arg.startsWith('-')) {
            files.push(arg);
        }
    }
    
    // Handle stdin
    if (useStdin) {
        // File descriptor 0 is stdin. Without an encoding, readFileSync
        // returns a Buffer, so binary input is hashed byte-for-byte.
        const content = fs.readFileSync(0);
        processContent(content, type, write);
        return;
    }
    
    // Handle files
    if (files.length === 0) {
        throw new Error('no file specified');
    }
    
    for (const file of files) {
        if (!fs.existsSync(file)) {
            throw new Error(`fatal: could not open '${file}' for reading`);
        }
        
        const content = fs.readFileSync(file);
        processContent(content, type, write);
    }
}

function processContent(content, type, write) {
    const { hash, data } = hashObject(content, type);
    
    if (write) {
        const gitDir = findGitDir();
        if (!gitDir) {
            throw new Error('fatal: not a git repository');
        }
        writeObject(gitDir, hash, data);
    }
    
    console.log(hash);
}

module.exports = { execute };
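The command imports `findGitDir` from `../utils/paths`, which this chapter does not show. If your Chapter 1 code does not already provide it, one possible shape is sketched below. This is an assumption about its behavior (walk up from a starting directory until a `.git` folder is found, return `null` otherwise); the helper from Chapter 1 may differ.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed behavior: search startDir and each parent for a ".git" directory.
// Returns the absolute path to it, or null when none is found.
function findGitDir(startDir = process.cwd()) {
    let dir = path.resolve(startDir);

    while (true) {
        const candidate = path.join(dir, '.git');
        if (fs.existsSync(candidate) && fs.statSync(candidate).isDirectory()) {
            return candidate;
        }

        const parent = path.dirname(dir);
        if (parent === dir) {
            return null; // reached the filesystem root without finding .git
        }
        dir = parent;
    }
}

console.log(findGitDir()); // path to the nearest .git directory, or null
```

Returning `null` rather than throwing lets the caller decide on the error message, which matches how `processContent` checks the result before writing.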

Step 3: Implement cat-file Command

The cat-file command reads objects from the store:

src/commands/catFile.js

const { readObject } = require('../utils/objects');
const { requireGitDir } = require('../utils/paths');

/**
 * cat-file - Provide content or type of repository objects
 * 
 * Usage:
 *   mygit cat-file -t <hash>   # Show object type
 *   mygit cat-file -s <hash>   # Show object size
 *   mygit cat-file -p <hash>   # Pretty-print object content
 *   mygit cat-file <type> <hash>  # Show content, expecting type
 */
function execute(args) {
    if (args.length < 2) {
        throw new Error('usage: mygit cat-file (-t | -s | -p | <type>) <object>');
    }
    
    const gitDir = requireGitDir();
    const option = args[0];
    const hash = resolveHash(gitDir, args[1]);
    
    const { type, size, content } = readObject(gitDir, hash);
    
    switch (option) {
        case '-t':
            // Show type
            console.log(type);
            break;
            
        case '-s':
            // Show size
            console.log(size);
            break;
            
        case '-p':
            // Pretty-print based on type
            prettyPrint(type, content);
            break;
            
        default:
            // Expect specific type
            if (type !== option) {
                throw new Error(`fatal: git cat-file: expected ${option}, got ${type}`);
            }
            process.stdout.write(content);
    }
}

/**
 * Resolve a potentially abbreviated hash to full hash
 */
function resolveHash(gitDir, partialHash) {
    // If it's already 40 chars, return as-is
    if (partialHash.length === 40) {
        return partialHash;
    }
    
    // For abbreviated hashes, we need at least 4 chars
    if (partialHash.length < 4) {
        throw new Error('fatal: too short object hash');
    }
    
    const fs = require('fs');
    const path = require('path');
    
    const prefix = partialHash.slice(0, 2);
    const rest = partialHash.slice(2);
    const objectDir = path.join(gitDir, 'objects', prefix);
    
    if (!fs.existsSync(objectDir)) {
        throw new Error(`fatal: Not a valid object name ${partialHash}`);
    }
    
    const matches = fs.readdirSync(objectDir)
        .filter(name => name.startsWith(rest));
    
    if (matches.length === 0) {
        throw new Error(`fatal: Not a valid object name ${partialHash}`);
    }
    
    if (matches.length > 1) {
        throw new Error(`fatal: ambiguous argument '${partialHash}'`);
    }
    
    return prefix + matches[0];
}

/**
 * Pretty-print object content based on type
 */
function prettyPrint(type, content) {
    switch (type) {
        case 'blob':
            // Blob: just print content
            process.stdout.write(content);
            break;
            
        case 'tree':
            // Tree: formatted entries
            printTree(content);
            break;
            
        case 'commit':
            // Commit: print as-is (it's already text)
            process.stdout.write(content);
            break;
            
        default:
            process.stdout.write(content);
    }
}

/**
 * Parse and print tree entries
 */
function printTree(content) {
    let offset = 0;
    
    while (offset < content.length) {
        // Find space (separates mode from name)
        const spaceIndex = content.indexOf(0x20, offset);
        const mode = content.slice(offset, spaceIndex).toString();
        
        // Find null byte (separates name from hash)
        const nullIndex = content.indexOf(0, spaceIndex);
        const name = content.slice(spaceIndex + 1, nullIndex).toString();
        
        // Next 20 bytes are the SHA-1 hash
        const hashBytes = content.slice(nullIndex + 1, nullIndex + 21);
        const hash = hashBytes.toString('hex');
        
        // Determine type from mode
        const typeStr = mode === '40000' ? 'tree'
            : mode === '160000' ? 'commit' // submodule entry
            : 'blob';
        
        // Print in git's format: {mode} {type} {hash}\t{name}
        console.log(`${mode.padStart(6, '0')} ${typeStr} ${hash}\t${name}`);
        
        offset = nullIndex + 21;
    }
}

module.exports = { execute };
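The binary layout that `printTree` walks can be exercised without a repository. The sketch below builds two raw entries by hand and parses them back. `encodeTreeEntry` and `parseTree` are hypothetical names used only for this illustration, and the two hashes are placeholders (the empty blob and the empty tree).

```javascript
// Build one raw tree entry: "{mode} {name}\0" followed by the 20-byte binary hash
function encodeTreeEntry(mode, name, hexHash) {
    return Buffer.concat([
        Buffer.from(`${mode} ${name}\0`),
        Buffer.from(hexHash, 'hex'),
    ]);
}

// Parse raw tree content into an array of { mode, name, hash } objects
function parseTree(content) {
    const entries = [];
    let offset = 0;

    while (offset < content.length) {
        const spaceIndex = content.indexOf(0x20, offset);  // ends the mode
        const nullIndex = content.indexOf(0, spaceIndex);  // ends the name
        entries.push({
            mode: content.subarray(offset, spaceIndex).toString(),
            name: content.subarray(spaceIndex + 1, nullIndex).toString(),
            hash: content.subarray(nullIndex + 1, nullIndex + 21).toString('hex'),
        });
        offset = nullIndex + 21; // skip the null byte plus 20 hash bytes
    }
    return entries;
}

// Placeholder hashes: the empty blob and the empty tree
const raw = Buffer.concat([
    encodeTreeEntry('100644', 'file.txt', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'),
    encodeTreeEntry('40000', 'src', '4b825dc642cb6eb9a060e54bf8d69288fbee4904'),
]);

console.log(parseTree(raw));
```

Note that the hash is stored as 20 raw bytes, not 40 hex characters, so the parser has to advance by a fixed offset rather than search for a delimiter.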

Step 4: Update CLI Entry Point

src/mygit.js

#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
};

function main() {
    const args = process.argv.slice(2);
    
    if (args.length === 0) {
        console.log('usage: mygit <command> [<args>]');
        console.log('\nAvailable commands:');
        console.log('   init          Initialize a new repository');
        console.log('   hash-object   Compute object ID and optionally create a blob');
        console.log('   cat-file      Provide content or type of repository objects');
        process.exit(1);
    }
    
    const command = args[0];
    const commandArgs = args.slice(1);
    
    if (!commands[command]) {
        console.error(`mygit: '${command}' is not a mygit command.`);
        process.exit(1);
    }
    
    try {
        commands[command].execute(commandArgs);
    } catch (error) {
        console.error(`${error.message}`);
        process.exit(1);
    }
}

main();

Testing Your Implementation

Test hash-object

# Create a test file
echo "test content" > test.txt

# Hash without storing
mygit hash-object test.txt
# Expected: d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Hash and store
mygit hash-object -w test.txt

# Verify it was stored
ls .git/objects/d6/
# Should show: 70460b4b4aece5915caf5c68d12f560a9fe3e4

Test cat-file

# Store an object first
mygit hash-object -w test.txt
# Returns: d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Read back the content
mygit cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
# Expected: test content

# Check the type
mygit cat-file -t d670460b4b4aece5915caf5c68d12f560a9fe3e4
# Expected: blob

# Check the size
mygit cat-file -s d670460b4b4aece5915caf5c68d12f560a9fe3e4
# Expected: 13 (12 characters plus the newline that echo appends)

Compare with Real Git

# Use real git
git hash-object test.txt
# Should match your implementation!
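The same comparison can be scripted. The sketch below is one way to do it, assuming a `git` binary may or may not be on the PATH; `blobHash` and `realGitHash` are hypothetical helper names, and the script simply skips the comparison when git cannot be run.

```javascript
const crypto = require('crypto');
const { execFileSync } = require('child_process');

// Our own blob hash: SHA-1 over "blob {size}\0{content}"
function blobHash(content) {
    const header = Buffer.from(`blob ${content.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, content]))
        .digest('hex');
}

// Ask the real git binary for the hash of the same bytes.
// Returns null when git is not installed or the call fails.
function realGitHash(content) {
    try {
        return execFileSync('git', ['hash-object', '--stdin'], { input: content })
            .toString()
            .trim();
    } catch (error) {
        return null;
    }
}

const sample = Buffer.from('test content\n');
const ours = blobHash(sample);
const theirs = realGitHash(sample);

if (theirs === null) {
    console.log('git not available; skipping comparison');
} else {
    console.log(ours === theirs ? 'match' : `mismatch: ${ours} vs ${theirs}`);
}
```

Feeding both implementations the same Buffer avoids surprises from shells that add or strip trailing newlines.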

Understanding Blob Storage

Let’s trace through what happens when you store “Hello”:

// Input
content = "Hello"

// Step 1: Create header
header = "blob 5\0"  // type + space + size + null byte

// Step 2: Concatenate
fullObject = "blob 5\0Hello"

// Step 3: SHA-1 hash
hash = sha1("blob 5\0Hello") = "5ab2f8a4..."

// Step 4: Compress with zlib
compressed = zlib.deflate("blob 5\0Hello")

// Step 5: Store at path
path = ".git/objects/5a/b2f8a4..."

Why include type and size in the hash?

Including type and size in the hashed data:

Prevents collisions: A blob whose bytes happen to match a tree’s bytes still gets a different hash, because the type in the header differs
Enables verification: We can check that the stored size matches the actual content length
Self-describing: The object header tells us what it is
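The effect of the header is easy to observe with an empty payload, whose hashes are well known. This is a small sketch using only Node's `crypto` module; `objectHash` is a hypothetical helper name.

```javascript
const crypto = require('crypto');

const sha1 = (buffer) => crypto.createHash('sha1').update(buffer).digest('hex');

// Frame content as a Git object of the given type and hash it
function objectHash(type, content) {
    const body = Buffer.from(content);
    return sha1(Buffer.concat([Buffer.from(`${type} ${body.length}\0`), body]));
}

const bytes = ''; // the same (empty) payload for every case

console.log(sha1(Buffer.from(bytes)));  // da39a3ee5e6b4b0d3255bfef95601890afd80709 (no header)
console.log(objectHash('blob', bytes)); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
console.log(objectHash('tree', bytes)); // 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

Three different addresses come out of identical bytes, so an object of one type can never be mistaken for an object of another.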

Exercises

Exercise 1: Add stdin support

Implement reading from stdin for hash-object:

echo "Hello" | mygit hash-object --stdin

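Hint: Use fs.readFileSync(0) to read from stdin (file descriptor 0). Omit the encoding so you get a Buffer; decoding the input as 'utf8' can alter binary data and change the hash.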

Exercise 2: Implement tree objects

Create a utility to build tree objects:

// A tree entry is: "{mode} {name}\0{20-byte-binary-sha}"
// Modes: 100644 (regular file), 100755 (executable), 040000 (directory)

function createTree(entries) {
    // entries: [{mode, name, hash}, ...]
    // Return the hash of the tree object
}

Exercise 3: Pretty-print trees

Enhance cat-file -p to nicely format tree objects:

100644 blob a1b2c3d4... file.txt
040000 tree e5f6a7b8... src

Hint: Parse the binary tree format and detect type from mode.

Key Concepts Review

Content-Addressable

Objects are stored by their SHA-1 hash. Same content = same hash = same storage location.

Immutable Objects

Once written, objects never change. Changing content would change the hash.

Compression

Every object is zlib-compressed before it is written to disk, so larger text files usually take less space than the originals.

Object Format

Every object: {type} {size}\0{content} - simple and consistent.

Further Reading

DSA: Hash Maps

Understand the data structure behind content-addressable storage

Cryptography Basics

Learn more about SHA-1 and cryptographic hashing

What’s Next?

In Chapter 3: Staging & Index, we’ll implement:

The index (staging area) file format
The add command
The status command
