
Build Your Own Git

Target Audience: Students and Junior Engineers
Language: JavaScript (with Java & Go alternatives)
Duration: 2-3 weeks
Difficulty: ⭐⭐⭐☆☆

Why Build Git?

Git is one of the most widely used tools in software development, yet many developers treat it as magic. By building your own Git, you’ll:
  • Understand content-addressable storage — the elegant idea behind Git’s object model
  • Master hashing and cryptography basics — SHA-1 in practice
  • Learn tree data structures — directories as trees, commits as a DAG
  • Build a real CLI tool — practical software engineering
This is NOT a tutorial on using Git. This is about understanding Git’s internals deeply enough to reimplement them.

Git’s Beautiful Architecture

Git Object Model
┌─────────────────────────────────────────────────────────────────────────────┐
│                           GIT OBJECT MODEL                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   BLOB                    TREE                      COMMIT                   │
│   ─────                   ─────                     ──────                   │
│   File contents           Directory listing         Snapshot + metadata      │
│   SHA-1 of header+data    Points to blobs/trees     Points to tree + parent  │
│                                                                              │
│   ┌───────────┐          ┌───────────────────┐     ┌───────────────────┐    │
│   │ Hello     │          │ 100644 hello.txt  │     │ tree abc123       │    │
│   │ World     │          │ 040000 src/       │     │ parent def456     │    │
│   └───────────┘          └───────────────────┘     │ author John       │    │
│        │                        │                   │ message: Initial  │    │
│        └────────────────────────┴───────────────────┘                        │
│                                                                              │
│   ALL OBJECTS ARE CONTENT-ADDRESSED:                                        │
│   SHA-1(type + size + content) → 40-character hex string                    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
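
To make the content-addressing rule in the diagram concrete, here is a minimal sketch of how an object ID can be computed. The helper name objectId is made up for this example, and the header layout assumed is "{type} {size}\0" with the size counted in bytes.

```javascript
const crypto = require('crypto');

// Hypothetical helper: computes the ID Git would assign to an object.
function objectId(type, content) {
    const body = Buffer.isBuffer(content) ? content : Buffer.from(content, 'utf8');
    // Header is "<type> <size in bytes>\0", followed by the raw content
    const header = Buffer.from(`${type} ${body.length}\0`);
    return crypto.createHash('sha1')
        .update(Buffer.concat([header, body]))
        .digest('hex');
}

console.log(objectId('blob', 'hello world\n'));
// → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad

// The ID depends only on type and bytes, never on a file name or timestamp
console.log(objectId('blob', 'hello world\n') === objectId('blob', Buffer.from('hello world\n')));
// → true
```

You can check the first value against real Git with `echo 'hello world' | git hash-object --stdin`. Because the header is part of the hashed bytes, the result differs from a plain SHA-1 of the file.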

What You’ll Build

Core Commands

| Command | Description | Concepts Learned |
|---|---|---|
| init | Initialize repository | File structure, .git directory |
| hash-object | Hash and store a file | SHA-1, zlib compression |
| cat-file | Read object contents | Object types, decompression |
| add | Stage files | Index file format |
| commit | Create commit object | Tree building, commit linking |
| log | Show commit history | DAG traversal |
| status | Show working tree status | Diff against index |
| branch | Manage branches | Refs, symbolic refs |
| checkout | Switch branches | Updating HEAD, working tree |
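
As a starting point for the first command, the following is a minimal sketch of init. The function name initRepo, the default branch name main, and the choice to throw on an existing repository are assumptions made for this example. Real Git creates more files and re-initializes instead of failing.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Minimal sketch of `init`; the name and return value are illustrative.
function initRepo(workDir) {
    const gitDir = path.join(workDir, '.git');
    if (fs.existsSync(gitDir)) {
        throw new Error(`Repository already exists at ${gitDir}`);
    }
    // objects/ holds blobs, trees, and commits; refs/heads/ holds one file per branch
    fs.mkdirSync(path.join(gitDir, 'objects'), { recursive: true });
    fs.mkdirSync(path.join(gitDir, 'refs', 'heads'), { recursive: true });
    // HEAD is a symbolic ref: it names a branch instead of a commit hash
    fs.writeFileSync(path.join(gitDir, 'HEAD'), 'ref: refs/heads/main\n');
    return gitDir;
}

// Try it in a throwaway directory
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'mygit-'));
const demoGitDir = initRepo(demoDir);
console.log(fs.readFileSync(path.join(demoGitDir, 'HEAD'), 'utf8').trim());
// → ref: refs/heads/main
```

The branch file under refs/heads/ does not exist yet at this point. It only appears once the first commit is written, which is why HEAD can point at a branch that has no commits.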

Implementation: JavaScript

Project Structure

mygit/
├── src/
│   ├── commands/
│   │   ├── init.js
│   │   ├── hashObject.js
│   │   ├── catFile.js
│   │   ├── add.js
│   │   ├── commit.js
│   │   ├── log.js
│   │   ├── status.js
│   │   ├── branch.js
│   │   └── checkout.js
│   ├── objects/
│   │   ├── blob.js
│   │   ├── tree.js
│   │   └── commit.js
│   ├── utils/
│   │   ├── hash.js
│   │   ├── compression.js
│   │   ├── index.js
│   │   └── refs.js
│   └── mygit.js
├── package.json
└── README.md
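
One possible shape for the mygit.js entry point in this layout is a small dispatcher that maps the first argument to a command handler. This is only a sketch. The run function and the placeholder handlers are hypothetical, and in the project each handler would be required from src/commands/ instead.

```javascript
// Sketch of a dispatcher for src/mygit.js. The handlers are placeholders;
// in the real project each one would be required from src/commands/.
const commands = new Map([
    ['init', (args) => `init: ${args.length} argument(s)`],
    ['hash-object', (args) => `hash-object: ${args.length} argument(s)`],
    ['cat-file', (args) => `cat-file: ${args.length} argument(s)`],
]);

function run(argv) {
    const [name, ...args] = argv;
    const handler = commands.get(name);
    if (!handler) {
        throw new Error(`mygit: '${name}' is not a known command`);
    }
    return handler(args);
}

// In mygit.js this would be run(process.argv.slice(2))
console.log(run(['hash-object', '-w', 'README.md']));
// → hash-object: 2 argument(s)
```

A Map is used instead of a plain object so that names such as constructor are not accidentally resolved to inherited properties.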

Core Implementation

const crypto = require('crypto');
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

/**
 * Compute SHA-1 hash of content with Git's format
 * Git format: "{type} {size}\0{content}"
 */
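function hashObject(content, type = 'blob') {
    // Normalize to a Buffer first: the size field counts bytes, and
    // String.length would undercount multi-byte UTF-8 characters
    const body = Buffer.isBuffer(content) ? content : Buffer.from(content);
    const header = `${type} ${body.length}\0`;
    const store = Buffer.concat([Buffer.from(header), body]);
    const hash = crypto.createHash('sha1').update(store).digest('hex');
    return { hash, store };
}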

/**
 * Write object to .git/objects/{hash[0:2]}/{hash[2:40]}
 */
function writeObject(gitDir, hash, store) {
    const objectDir = path.join(gitDir, 'objects', hash.slice(0, 2));
    const objectPath = path.join(objectDir, hash.slice(2));
    
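    // `recursive: true` makes this a no-op when the directory already exists
    fs.mkdirSync(objectDir, { recursive: true });

    // The name is derived from the content, so an existing file is already correct
    if (!fs.existsSync(objectPath)) {
        // Git stores objects compressed with zlib
        const compressed = zlib.deflateSync(store);
        fs.writeFileSync(objectPath, compressed);
    }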
    
    return hash;
}

/**
 * Read object from .git/objects
 */
function readObject(gitDir, hash) {
    const objectPath = path.join(
        gitDir, 'objects', 
        hash.slice(0, 2), 
        hash.slice(2)
    );
    
    if (!fs.existsSync(objectPath)) {
        throw new Error(`Object ${hash} not found`);
    }
    
    const compressed = fs.readFileSync(objectPath);
    const store = zlib.inflateSync(compressed);
    
    // Parse header: "{type} {size}\0{content}"
    const nullIndex = store.indexOf(0);
    if (nullIndex === -1) {
        throw new Error(`Object ${hash} is malformed: missing header terminator`);
    }
    const header = store.subarray(0, nullIndex).toString();
    const [type, size] = header.split(' ');
    const content = store.subarray(nullIndex + 1);
    
    return { type, size: parseInt(size, 10), content };
}

module.exports = { hashObject, writeObject, readObject };
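
The same header scheme extends to trees. Below is a minimal sketch of how a tree object could be serialized, assuming each entry is stored as "{mode} {name}\0" followed by the 20 raw bytes of the entry's hash. The function name buildTree and the entry shape are made up for this example, and the sort is simplified: real Git compares directory names as if they ended in "/".

```javascript
const crypto = require('crypto');

// Serialize directory entries into a tree object.
// entries: [{ mode: '100644', name: 'hello.txt', hash: '<40 hex chars>' }, ...]
function buildTree(entries) {
    // Simplified ordering: plain byte-wise sort by name
    const sorted = [...entries].sort((a, b) =>
        Buffer.compare(Buffer.from(a.name), Buffer.from(b.name)));

    // Each entry is "<mode> <name>\0" plus the hash as 20 raw bytes (not hex text)
    const body = Buffer.concat(sorted.map(({ mode, name, hash }) =>
        Buffer.concat([Buffer.from(`${mode} ${name}\0`), Buffer.from(hash, 'hex')])));

    const store = Buffer.concat([Buffer.from(`tree ${body.length}\0`), body]);
    const hash = crypto.createHash('sha1').update(store).digest('hex');
    return { hash, store };
}

// A tree with no entries hashes to Git's well-known empty tree ID
console.log(buildTree([]).hash);
// → 4b825dc642cb6eb9a060e54bf8d69288fbee4904

const tree = buildTree([
    { mode: '100644', name: 'hello.txt', hash: '3b18e512dba79e4c8300dd08aeb37f8e728b8dad' },
]);
console.log(tree.hash.length); // → 40
```

Note that subdirectories are stored with mode 40000. The 040000 form shown in the diagram is how `git ls-tree` prints it.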

Exercises

Level 1: Basic Understanding

  1. Initialize a repository and create a blob manually
  2. Understand how SHA-1 hashing creates content-addressable storage
  3. Create a commit and inspect its structure with cat-file
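
For the third exercise, it helps to know what you should expect to see. The sketch below builds a commit object by hand, assuming the usual text layout: a tree line, zero or more parent lines, author and committer lines, a blank line, then the message. The function name buildCommit and the sample identity are hypothetical.

```javascript
const crypto = require('crypto');

// Build a commit object. `author` and `committer` are full identity strings
// of the form "Name <email> <unix-timestamp> <timezone>".
function buildCommit({ tree, parents = [], author, committer = author, message }) {
    const lines = [`tree ${tree}`];
    for (const parent of parents) {
        lines.push(`parent ${parent}`); // a merge commit has more than one
    }
    lines.push(`author ${author}`);
    lines.push(`committer ${committer}`);

    // A blank line separates the headers from the message
    const body = Buffer.from(`${lines.join('\n')}\n\n${message}\n`);
    const store = Buffer.concat([Buffer.from(`commit ${body.length}\0`), body]);
    const hash = crypto.createHash('sha1').update(store).digest('hex');
    return { hash, store, body };
}

const commit = buildCommit({
    tree: '4b825dc642cb6eb9a060e54bf8d69288fbee4904', // the empty tree
    author: 'Jane Doe <jane@example.com> 1700000000 +0000', // sample identity
    message: 'Initial commit',
});
console.log(commit.body.toString());
```

The printed text is what cat-file should show for a commit. A root commit simply has no parent line.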

Level 2: Core Implementation

  1. Implement the status command (compare index to working tree)
  2. Add support for .gitignore patterns
  3. Implement diff to show changes between commits
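
For the status exercise, one way to approach the comparison is sketched below. It assumes the index has already been parsed into a Map from path to blob hash and that the working tree has been read into a Map from path to file content. Both shapes, and the names blobHash and status, are simplifications chosen for this example. The real index is a binary file with more fields.

```javascript
const crypto = require('crypto');

// Hash content the way a blob would be hashed
function blobHash(content) {
    const body = Buffer.from(content);
    return crypto.createHash('sha1')
        .update(Buffer.concat([Buffer.from(`blob ${body.length}\0`), body]))
        .digest('hex');
}

// index: Map<path, blobHash>; workingTree: Map<path, string | Buffer>
function status(index, workingTree) {
    const result = { modified: [], deleted: [], untracked: [] };

    for (const [file, hash] of index) {
        if (!workingTree.has(file)) {
            result.deleted.push(file);
        } else if (blobHash(workingTree.get(file)) !== hash) {
            result.modified.push(file);
        }
    }
    // Anything on disk that the index does not know about is untracked
    for (const file of workingTree.keys()) {
        if (!index.has(file)) {
            result.untracked.push(file);
        }
    }
    return result;
}

const index = new Map([
    ['a.txt', blobHash('one')],
    ['b.txt', blobHash('two')],
]);
const workingTree = new Map([
    ['a.txt', 'one, edited'],
    ['c.txt', 'brand new'],
]);
console.log(status(index, workingTree));
// → { modified: [ 'a.txt' ], deleted: [ 'b.txt' ], untracked: [ 'c.txt' ] }
```

Rehashing every file is the simplest correct approach. Real Git avoids most of that work by first comparing file metadata stored in the index.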

Level 3: Advanced Features

  1. Implement merge (fast-forward and three-way)
  2. Add remote repository support (fetch, push)
  3. Implement pack files for efficient storage
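
For the merge exercise, the first decision is whether a fast-forward is possible, which comes down to an ancestry check on the commit DAG. The sketch below works on an in-memory Map from commit ID to parent IDs as a stand-in for reading commit objects from disk. The names isAncestor and mergeKind are hypothetical.

```javascript
// commits: Map<commitId, parentIds[]>
// Returns true if `ancestor` is reachable from `descendant` by following parents.
function isAncestor(commits, ancestor, descendant) {
    const seen = new Set();
    const stack = [descendant];
    while (stack.length > 0) {
        const current = stack.pop();
        if (current === ancestor) return true;
        if (seen.has(current)) continue; // merges can reach a commit twice
        seen.add(current);
        stack.push(...(commits.get(current) || []));
    }
    return false;
}

// Decide what merging `theirs` into `ours` requires
function mergeKind(commits, ours, theirs) {
    if (isAncestor(commits, theirs, ours)) return 'already-up-to-date';
    if (isAncestor(commits, ours, theirs)) return 'fast-forward';
    return 'three-way';
}

// History:  a <- b <- c   and   a <- d
const commits = new Map([
    ['a', []],
    ['b', ['a']],
    ['c', ['b']],
    ['d', ['a']],
]);
console.log(mergeKind(commits, 'b', 'c')); // → fast-forward
console.log(mergeKind(commits, 'c', 'b')); // → already-up-to-date
console.log(mergeKind(commits, 'c', 'd')); // → three-way
```

A fast-forward only moves the branch ref. The three-way case additionally needs a merge base, which can be found with a similar traversal from both sides.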

What You’ve Learned

  • Content-addressable storage using SHA-1 hashing
  • Tree data structures for representing directories
  • DAG (Directed Acyclic Graph) for commit history
  • Binary file formats (index file)
  • CLI tool development in JavaScript

Next Steps