Chapter 2: The Object Model

Git’s object model is its most elegant design. In this chapter, we’ll implement the foundation of Git: content-addressable storage using SHA-1 hashing.
Prerequisites: Completed Chapter 1: Setup & Init
Time: 2-3 hours
Outcome: Working hash-object and cat-file commands

The Core Insight

Git stores everything as objects in a content-addressable store. The “address” (filename) is derived from the content itself using SHA-1 hashing.
Content: "Hello, World!"

SHA-1("blob 13\0Hello, World!")

Hash: 5dd01c177f5d7d1be5346a5bc18a569a7410c2ef

Stored at: .git/objects/5d/d01c177f5d7d1be5346a5bc18a569a7410c2ef
Why is this brilliant?
  • Same content = same hash = stored only once (deduplication!)
  • Hash acts as a checksum (data integrity)
  • Immutable objects (can’t change content without changing address)
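
These properties can be checked in a few lines of Node.js. The following is a minimal sketch, not part of the chapter's code: `objectId` and `objectPath` are hypothetical helper names, and only Node's built-in `crypto` and `path` modules are assumed.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper: SHA-1 over "blob {size}\0{content}", as described above
function objectId(text) {
    const content = Buffer.from(text);
    const header = Buffer.from(`blob ${content.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, content]))
        .digest('hex');
}

// Hypothetical helper: the first 2 hex chars name the directory,
// the remaining 38 name the file inside it
function objectPath(id) {
    return path.posix.join('.git', 'objects', id.slice(0, 2), id.slice(2));
}

const a = objectId('Hello, World!');
const b = objectId('Hello, World!');
const c = objectId('Hello, World?');

console.log(a === b);        // true: same content, same address
console.log(a === c);        // false: one changed byte, different address
console.log(objectPath(a));  // where this object would be stored
```

Because the address depends only on the bytes, storing the same file twice resolves to the same path, and any change to the content produces a different path instead of overwriting the old object.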

The Three Object Types

┌─────────────────────────────────────────────────────────────────────────────┐
│                          GIT OBJECT TYPES                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   BLOB                    TREE                      COMMIT                   │
│   ────                    ────                      ──────                   │
│   Raw file content        Directory snapshot        Project snapshot         │
│                                                                              │
│   ┌─────────────┐        ┌─────────────────┐       ┌──────────────────┐     │
│   │ Hello World │        │ 100644 file.txt │       │ tree d8329fc...  │     │
│   │             │        │ → blob abc123   │       │ parent 5a6f32... │     │
│   └─────────────┘        │ 040000 src/     │       │ author Jane      │     │
│         ↓                │ → tree def456   │       │ committer Jane   │     │
│   SHA-1 of content       └─────────────────┘       │                  │     │
│                                ↓                   │ Initial commit   │     │
│                          SHA-1 of entries          └──────────────────┘     │
│                                                            ↓                │
│                                                    SHA-1 of metadata        │
│                                                                              │
│   ALL OBJECTS: header + content → SHA-1 → stored compressed                 │
└─────────────────────────────────────────────────────────────────────────────┘
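
To make the diagram concrete, here is a rough sketch that frames one payload of each type with the same header and hashes it. The helper name `frameAndHash` and the author details are made up for illustration; the tree is deliberately empty so that no entry encoding is needed yet.

```javascript
const crypto = require('crypto');

// Frame a payload as "{type} {size}\0{payload}" and return its SHA-1 in hex
function frameAndHash(type, payload) {
    const body = Buffer.isBuffer(payload) ? payload : Buffer.from(payload);
    const header = Buffer.from(`${type} ${body.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

// Blob: raw file bytes
const blobId = frameAndHash('blob', 'Hello World');

// Tree: a tree with no entries has an empty payload
const treeId = frameAndHash('tree', Buffer.alloc(0));

// Commit: plain text that points at a tree (author details are made up)
const commitText = [
    `tree ${treeId}`,
    'author Jane <jane@example.com> 1700000000 +0000',
    'committer Jane <jane@example.com> 1700000000 +0000',
    '',
    'Initial commit',
    ''
].join('\n');
const commitId = frameAndHash('commit', commitText);

console.log({ blobId, treeId, commitId });
```

The `treeId` printed here should be 4b825dc642cb6eb9a060e54bf8d69288fbee4904, the ID that real Git reports for an empty tree, which is a quick way to confirm the framing is right.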

Implementation

Step 1: SHA-1 Hashing Utility

Create the core hashing function that all objects will use:
src/utils/objects.js
const crypto = require('crypto');
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

/**
 * Compute SHA-1 hash for a Git object
 * 
 * Git object format: "{type} {size}\0{content}"
 * - type: blob, tree, commit, or tag
 * - size: content length in bytes
 * - \0: null byte separator
 * - content: the actual data
 * 
 * @param {Buffer|string} content - The object content
 * @param {string} type - Object type (blob, tree, commit, or tag)
 * @returns {{hash: string, data: Buffer}} - Hash and full object data
 */
function hashObject(content, type = 'blob') {
    // Convert string to buffer if needed
    const contentBuffer = Buffer.isBuffer(content) 
        ? content 
        : Buffer.from(content);
    
    // Create header: "{type} {size}\0"
    const header = Buffer.from(`${type} ${contentBuffer.length}\0`);
    
    // Full object = header + content
    const fullObject = Buffer.concat([header, contentBuffer]);
    
    // Compute SHA-1 hash
    const hash = crypto
        .createHash('sha1')
        .update(fullObject)
        .digest('hex');
    
    return { hash, data: fullObject };
}

/**
 * Write an object to the Git object store
 * 
 * @param {string} gitDir - Path to .git directory
 * @param {string} hash - 40-character hex hash
 * @param {Buffer} data - Full object data (header + content)
 * @returns {string} - The hash
 */
function writeObject(gitDir, hash, data) {
    // Objects stored at: .git/objects/{first 2 chars}/{remaining 38 chars}
    const objectDir = path.join(gitDir, 'objects', hash.slice(0, 2));
    const objectPath = path.join(objectDir, hash.slice(2));
    
    // Don't write if already exists (content-addressable = immutable)
    if (fs.existsSync(objectPath)) {
        return hash;
    }
    
    // Create subdirectory if needed
    if (!fs.existsSync(objectDir)) {
        fs.mkdirSync(objectDir, { recursive: true });
    }
    
    // Compress with zlib and write
    const compressed = zlib.deflateSync(data);
    fs.writeFileSync(objectPath, compressed);
    
    return hash;
}

/**
 * Read an object from the Git object store
 * 
 * @param {string} gitDir - Path to .git directory
 * @param {string} hash - 40-character hex hash
 * @returns {{type: string, size: number, content: Buffer}}
 */
function readObject(gitDir, hash) {
    const objectPath = path.join(
        gitDir, 
        'objects', 
        hash.slice(0, 2), 
        hash.slice(2)
    );
    
    if (!fs.existsSync(objectPath)) {
        throw new Error(`fatal: Not a valid object name ${hash}`);
    }
    
    // Read and decompress
    const compressed = fs.readFileSync(objectPath);
    const data = zlib.inflateSync(compressed);
    
    // Parse header: find the null byte that ends it
    const nullIndex = data.indexOf(0);
    if (nullIndex === -1) {
        throw new Error(`Object ${hash} is corrupted`);
    }
    const header = data.slice(0, nullIndex).toString();
    const content = data.slice(nullIndex + 1);
    
    // Parse header: "{type} {size}"
    const [type, sizeStr] = header.split(' ');
    const size = parseInt(sizeStr, 10);
    
    // Verify size matches
    if (content.length !== size) {
        throw new Error(`Object ${hash} is corrupted`);
    }
    
    return { type, size, content };
}

module.exports = {
    hashObject,
    writeObject,
    readObject
};
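
The write and read paths above can be exercised together without touching a real repository. The sketch below inlines the same steps as hashObject, writeObject, and readObject against a throwaway temporary directory; the directory name prefix and the variable names are assumptions made for this demo only.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Throwaway directory standing in for .git (an assumption for this demo)
const gitDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mygit-demo-'));

// Frame and hash the content the same way hashObject does
const content = Buffer.from('Hello');
const full = Buffer.concat([Buffer.from(`blob ${content.length}\0`), content]);
const hash = crypto.createHash('sha1').update(full).digest('hex');

// Write: compress and store under objects/{2 chars}/{38 chars}
const objectDir = path.join(gitDir, 'objects', hash.slice(0, 2));
const objectFile = path.join(objectDir, hash.slice(2));
fs.mkdirSync(objectDir, { recursive: true });
fs.writeFileSync(objectFile, zlib.deflateSync(full));

// Read: decompress, split at the first null byte, check the declared size
const raw = zlib.inflateSync(fs.readFileSync(objectFile));
const nul = raw.indexOf(0);
const [type, size] = raw.subarray(0, nul).toString().split(' ');
const body = raw.subarray(nul + 1);

console.log(type, size);                    // blob 5
console.log(body.length === Number(size));  // true
console.log(body.toString());               // Hello

// Clean up the temporary directory
fs.rmSync(gitDir, { recursive: true, force: true });
```

The file on disk holds only the compressed bytes; the type and size are recovered from the header after inflating, which is why readObject needs no separate metadata.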

Step 2: Implement hash-object Command

The hash-object command computes the hash of a file and optionally stores it:
src/commands/hashObject.js
const fs = require('fs');
const path = require('path');
const { hashObject, writeObject } = require('../utils/objects');
const { findGitDir } = require('../utils/paths');

/**
 * hash-object - Compute object ID and optionally store it
 * 
 * Usage:
 *   mygit hash-object <file>           # Just compute hash
 *   mygit hash-object -w <file>        # Compute and write to object store
 *   mygit hash-object -t <type> <file> # Specify type (default: blob)
 *   mygit hash-object --stdin          # Read from stdin
 */
function execute(args) {
    // Parse options
    let write = false;
    let type = 'blob';
    let useStdin = false;
    const files = [];
    
    for (let i = 0; i < args.length; i++) {
        const arg = args[i];
        
        if (arg === '-w') {
            write = true;
        } else if (arg === '-t') {
            type = args[++i];
            if (!['blob', 'tree', 'commit', 'tag'].includes(type)) {
                throw new Error(`invalid object type "${type}"`);
            }
        } else if (arg === '--stdin') {
            useStdin = true;
        } else if (!arg.startsWith('-')) {
            files.push(arg);
        }
    }
    
    // Handle stdin
    if (useStdin) {
        // File descriptor 0 is stdin. Reading it without an encoding
        // returns a Buffer, so binary input is hashed byte-for-byte.
        const content = fs.readFileSync(0);
        processContent(content, type, write);
        return;
    }
    
    // Handle files
    if (files.length === 0) {
        throw new Error('no file specified');
    }
    
    for (const file of files) {
        if (!fs.existsSync(file)) {
            throw new Error(`fatal: could not open '${file}' for reading`);
        }
        
        const content = fs.readFileSync(file);
        processContent(content, type, write);
    }
}

function processContent(content, type, write) {
    const { hash, data } = hashObject(content, type);
    
    if (write) {
        const gitDir = findGitDir();
        if (!gitDir) {
            throw new Error('fatal: not a git repository');
        }
        writeObject(gitDir, hash, data);
    }
    
    console.log(hash);
}

module.exports = { execute };
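
The command reads each file with fs.readFileSync(file) and no encoding, so the content arrives as a Buffer. The sketch below illustrates why that matters for binary files; `blobId` is a hypothetical stand-in for hashObject, and the four sample bytes are arbitrary.

```javascript
const crypto = require('crypto');

// Same framing as hashObject: "blob {size}\0{content}"
function blobId(content) {
    const body = Buffer.isBuffer(content) ? content : Buffer.from(content);
    const header = Buffer.from(`blob ${body.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

// Four bytes that are not valid UTF-8 (for example, part of an image file)
const rawBytes = Buffer.from([0xff, 0xfe, 0x00, 0x41]);

// Decoding to a string replaces the invalid bytes with U+FFFD,
// so re-encoding does not give the original bytes back
const decoded = rawBytes.toString('utf8');
const reencoded = Buffer.from(decoded, 'utf8');

console.log(rawBytes.equals(reencoded));            // false: the bytes changed
console.log(blobId(rawBytes) === blobId(decoded));  // false: so does the hash
```

Any step that passes content through a string, such as reading with an explicit 'utf8' encoding, risks producing an ID that real Git would not agree with for non-text files.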

Step 3: Implement cat-file Command

The cat-file command reads objects from the store:
src/commands/catFile.js
const { readObject } = require('../utils/objects');
const { requireGitDir } = require('../utils/paths');

/**
 * cat-file - Provide content or type of repository objects
 * 
 * Usage:
 *   mygit cat-file -t <hash>   # Show object type
 *   mygit cat-file -s <hash>   # Show object size
 *   mygit cat-file -p <hash>   # Pretty-print object content
 *   mygit cat-file <type> <hash>  # Show content, expecting type
 */
function execute(args) {
    if (args.length < 2) {
        throw new Error('usage: mygit cat-file (-t | -s | -p | <type>) <object>');
    }
    
    const gitDir = requireGitDir();
    const option = args[0];
    const hash = resolveHash(gitDir, args[1]);
    
    const { type, size, content } = readObject(gitDir, hash);
    
    switch (option) {
        case '-t':
            // Show type
            console.log(type);
            break;
            
        case '-s':
            // Show size
            console.log(size);
            break;
            
        case '-p':
            // Pretty-print based on type
            prettyPrint(type, content);
            break;
            
        default:
            // Expect specific type
            if (type !== option) {
                throw new Error(`fatal: git cat-file: expected ${option}, got ${type}`);
            }
            process.stdout.write(content);
    }
}

/**
 * Resolve a potentially abbreviated hash to full hash
 */
function resolveHash(gitDir, partialHash) {
    // If it's already 40 chars, return as-is
    if (partialHash.length === 40) {
        return partialHash;
    }
    
    // For abbreviated hashes, we need at least 4 chars
    if (partialHash.length < 4) {
        throw new Error('fatal: too short object hash');
    }
    
    const fs = require('fs');
    const path = require('path');
    
    const prefix = partialHash.slice(0, 2);
    const rest = partialHash.slice(2);
    const objectDir = path.join(gitDir, 'objects', prefix);
    
    if (!fs.existsSync(objectDir)) {
        throw new Error(`fatal: Not a valid object name ${partialHash}`);
    }
    
    const matches = fs.readdirSync(objectDir)
        .filter(name => name.startsWith(rest));
    
    if (matches.length === 0) {
        throw new Error(`fatal: Not a valid object name ${partialHash}`);
    }
    
    if (matches.length > 1) {
        throw new Error(`fatal: ambiguous argument '${partialHash}'`);
    }
    
    return prefix + matches[0];
}

/**
 * Pretty-print object content based on type
 */
function prettyPrint(type, content) {
    switch (type) {
        case 'blob':
            // Blob: just print content
            process.stdout.write(content);
            break;
            
        case 'tree':
            // Tree: formatted entries
            printTree(content);
            break;
            
        case 'commit':
            // Commit: print as-is (it's already text)
            process.stdout.write(content);
            break;
            
        default:
            process.stdout.write(content);
    }
}

/**
 * Parse and print tree entries
 */
function printTree(content) {
    let offset = 0;
    
    while (offset < content.length) {
        // Find space (separates mode from name)
        const spaceIndex = content.indexOf(0x20, offset);
        const mode = content.slice(offset, spaceIndex).toString();
        
        // Find null byte (separates name from hash)
        const nullIndex = content.indexOf(0, spaceIndex);
        const name = content.slice(spaceIndex + 1, nullIndex).toString();
        
        // Next 20 bytes are the SHA-1 hash
        const hashBytes = content.slice(nullIndex + 1, nullIndex + 21);
        const hash = hashBytes.toString('hex');
        
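        // Determine type from mode: 40000 is a directory (tree),
        // 160000 is a submodule entry (commit), anything else is a blob
        const typeStr = mode === '40000' ? 'tree'
            : mode === '160000' ? 'commit'
            : 'blob';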
        
        // Print in git's format: {mode} {type} {hash}\t{name}
        console.log(`${mode.padStart(6, '0')} ${typeStr} ${hash}\t${name}`);
        
        offset = nullIndex + 21;
    }
}

module.exports = { execute };
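
The binary layout that printTree walks is easier to see on a single entry. This is a small sketch under stated assumptions: `parseEntry` is a hypothetical helper, the file name empty.txt is made up, and the referenced blob is the empty blob so its ID can be computed inline.

```javascript
const crypto = require('crypto');

// ID of an empty blob, computed rather than hard-coded
const emptyBlobId = crypto.createHash('sha1').update('blob 0\0').digest('hex');

// One tree entry: "{mode} {name}\0" followed by the 20 raw hash bytes
const entry = Buffer.concat([
    Buffer.from('100644 empty.txt\0'),
    Buffer.from(emptyBlobId, 'hex')
]);

// Parse it back the same way printTree walks the buffer
function parseEntry(buf, offset = 0) {
    const space = buf.indexOf(0x20, offset);
    const nul = buf.indexOf(0, space);
    return {
        mode: buf.subarray(offset, space).toString(),
        name: buf.subarray(space + 1, nul).toString(),
        hash: buf.subarray(nul + 1, nul + 21).toString('hex'),
        next: nul + 21   // offset where the following entry would start
    };
}

const parsed = parseEntry(entry);
console.log(entry.length);  // 37: 17 bytes of "mode name\0" plus 20 hash bytes
console.log(parsed);
```

The hash inside a tree is stored as 20 raw bytes, not 40 hex characters, which is why the parser advances by a fixed 21 bytes past the null byte instead of searching for a delimiter.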

Step 4: Update CLI Entry Point

src/mygit.js
#!/usr/bin/env node

const commands = {
    init: require('./commands/init'),
    'hash-object': require('./commands/hashObject'),
    'cat-file': require('./commands/catFile'),
};

function main() {
    const args = process.argv.slice(2);
    
    if (args.length === 0) {
        console.log('usage: mygit <command> [<args>]');
        console.log('\nAvailable commands:');
        console.log('   init          Initialize a new repository');
        console.log('   hash-object   Compute object ID and optionally create a blob');
        console.log('   cat-file      Provide content or type of repository objects');
        process.exit(1);
    }
    
    const command = args[0];
    const commandArgs = args.slice(1);
    
    if (!commands[command]) {
        console.error(`mygit: '${command}' is not a mygit command.`);
        process.exit(1);
    }
    
    try {
        commands[command].execute(commandArgs);
    } catch (error) {
        console.error(error.message);
        process.exit(1);
    }
}

main();

Testing Your Implementation

Test hash-object

# Create a test file
echo "test content" > test.txt

# Hash without storing
mygit hash-object test.txt
# Expected: d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Hash and store
mygit hash-object -w test.txt

# Verify it was stored
ls .git/objects/d6/
# Should show: 70460b4b4aece5915caf5c68d12f560a9fe3e4

Test cat-file

# Store an object first
mygit hash-object -w test.txt
# Returns: d670460b4b4aece5915caf5c68d12f560a9fe3e4

# Read back the content
mygit cat-file -p d670460b4b4aece5915caf5c68d12f560a9fe3e4
# Expected: test content

# Check the type
mygit cat-file -t d670460b4b4aece5915caf5c68d12f560a9fe3e4
# Expected: blob

# Check the size
mygit cat-file -s d670460b4b4aece5915caf5c68d12f560a9fe3e4
# Expected: 13 (12 characters plus the newline added by echo)

Compare with Real Git

# Use real git
git hash-object test.txt
# Should match your implementation!

Understanding Blob Storage

Let’s trace through what happens when you store “Hello”:
// Input
content = "Hello"

// Step 1: Create header
header = "blob 5\0"  // type + space + size + null byte

// Step 2: Concatenate
fullObject = "blob 5\0Hello"

// Step 3: SHA-1 hash
hash = sha1("blob 5\0Hello") = "5ab2f8a4323abafb10abb68657d46c..."

// Step 4: Compress with zlib
compressed = zlib.deflate("blob 5\0Hello")

// Step 5: Store at path
path = ".git/objects/5a/b2f8a4323abafb10abb68657d46c..."
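Including the type and size in the hashed data has three benefits:
  1. Prevents cross-type collisions: a blob and a tree with byte-identical content still get different hashes, because their headers differ
  2. Enables verification: after decompressing, we can check that the declared size matches the actual content length
  3. Self-describing: the object header tells us what the object is, with no outside metadata needed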
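
The first point is easy to confirm. In the sketch below (with a hypothetical `objectId` helper), the same five bytes are hashed as a blob, as a commit, and with no header at all.

```javascript
const crypto = require('crypto');

// Hash a payload with Git's "{type} {size}\0" header in front
function objectId(type, payload) {
    const body = Buffer.from(payload);
    const header = Buffer.from(`${type} ${body.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

const payload = 'Hello';
const asBlob = objectId('blob', payload);
const asCommit = objectId('commit', payload);

// Plain SHA-1 of the bytes, with no header (what a generic sha1 tool computes)
const bare = crypto.createHash('sha1').update(payload).digest('hex');

console.log(asBlob === asCommit);  // false: the type is part of the hashed data
console.log(asBlob === bare);      // false: a Git ID is not the file's plain SHA-1
```

The second comparison explains a common surprise: running a generic SHA-1 tool on a file does not reproduce the ID that hash-object prints, because the tool never sees the header.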

Exercises

Implement reading from stdin for hash-object:
echo "Hello" | mygit hash-object --stdin
Hint: Use fs.readFileSync(0) to read from stdin (file descriptor 0). Omitting the encoding returns a Buffer, so binary input is hashed byte-for-byte.

Create a utility to build tree objects:
// A tree entry is: "{mode} {name}\0{20-byte-binary-sha}"
// Modes as stored: 100644 (regular file), 100755 (executable), 40000 (directory;
// cat-file -p pads it to 040000 for display)

function createTree(entries) {
    // entries: [{mode, name, hash}, ...]
    // Return the hash of the tree object
}

Enhance cat-file -p to nicely format tree objects:
100644 blob a1b2c3d4... file.txt
040000 tree e5f6g7h8... src
Hint: Parse the binary tree format and detect type from mode.

Key Concepts Review

Content-Addressable

Objects are stored by their SHA-1 hash. Same content = same hash = same storage location.

Immutable Objects

Once written, objects never change. Changing content would change the hash.

Compression

All loose objects are zlib-compressed before they are written to disk, and identical content is stored only once, which keeps the object store compact.

Object Format

Every object: {type} {size}\0{content} - simple and consistent.

What’s Next?

In Chapter 3: Staging & Index, we’ll implement:
  • The index (staging area) file format
  • The add command
  • The status command
