# Chapter 2: The Object Model

> Implement Git's content-addressable storage with blobs, trees, and SHA-1 hashing

Git's object model is its most elegant design. In this chapter, we'll implement the foundation of Git: content-addressable storage using SHA-1 hashing.

Here is the key insight to carry with you: in most filesystems, the *name* of a file is arbitrary -- you choose it. In Git's object store, the name *is* the content: a file's address is a fingerprint derived from what is inside it. Think of a library where every book's call number is computed from the text itself -- if two copies of the same book arrive, they get exactly the same call number and are stored only once. This single idea gives Git deduplication, integrity checking, and immutability for free.

**Prerequisites**: Completed [Chapter 1: Setup & Init](/courses/build-your-own-x/git-1-setup)\
**Time**: 2-3 hours\
**Outcome**: Working `hash-object` and `cat-file` commands

***

## The Core Insight

Git stores everything as **objects** in a content-addressable store. The "address" (filename) is the SHA-1 hash of the object's content, prefixed with a short header that records the object's type and size.

```
Content: "Hello, World!"
    ↓
SHA-1("blob 13\0Hello, World!")
    ↓
Hash: 5dd01c177f5d7d1be5346a5bc18a569a7410c2ef
    ↓
Stored at: .git/objects/5d/d01c177f5d7d1be5346a5bc18a569a7410c2ef
```

**Why is this brilliant?**

* Same content = same hash = stored only once (deduplication!)
* Hash acts as a checksum (data integrity)
* Immutable objects (you can't change the content without changing the address)

***

## The Three Object Types

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              GIT OBJECT TYPES                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   BLOB                   TREE                      COMMIT                   │
│   ────                   ────                      ──────                   │
│   Raw file content       Directory snapshot        Project snapshot         │
│                                                                             │
│   ┌─────────────┐        ┌─────────────────┐       ┌──────────────────┐     │
│   │ Hello World │        │ 100644 file.txt │       │ tree d8329fc...  │     │
│   │             │        │   → blob abc123 │       │ parent 5a6f32... │     │
│   └─────────────┘        │ 040000 src/     │       │ author Jane      │     │
│          ↓               │   → tree def456 │       │ committer Jane   │     │
│   SHA-1 of content       └─────────────────┘       │                  │     │
│                                   ↓                │ Initial commit   │     │
│                           SHA-1 of entries         └──────────────────┘     │
│                                                             ↓               │
│                                                     SHA-1 of metadata       │
│                                                                             │
│   ALL OBJECTS: header + content → SHA-1 → stored compressed                 │
└─────────────────────────────────────────────────────────────────────────────┘
```

***

## Implementation

### Step 1: SHA-1 Hashing Utility

Create the core hashing function that all objects will use:

```javascript src/utils/objects.js
const crypto = require('crypto');
const zlib = require('zlib');
const fs = require('fs');
const path = require('path');

/**
 * Compute SHA-1 hash for a Git object.
 *
 * Git object format: "{type} {size}\0{content}"
 *   - type: blob, tree, commit, or tag
 *   - size: content length in bytes (the byte length, not the character count)
 *   - \0: null byte separator -- it marks exactly where the header ends
 *     and the body begins, so the two can never be confused
 *   - content: the actual data
 *
 * Why include type and size in the hash? Two reasons:
 *   1. Type separation: a blob and a tree whose content bytes happen to be
 *      identical still get different hashes, because their headers differ.
 *   2. Verification: when reading back, we can check that the recorded
 *      size matches the actual content length -- catching corruption early.
 *
 * @param {Buffer|string} content - The object content
 * @param {string} type - Object type (blob, tree, commit, or tag)
 * @returns {{hash: string, data: Buffer}} - Hash and full object data
 */
function hashObject(content, type = 'blob') {
  // Convert string to buffer if needed.
  // We use a Buffer because we need the byte length, which can differ
  // from the string length for multi-byte characters (e.g., UTF-8 emoji).
  const contentBuffer = Buffer.isBuffer(content) ? content : Buffer.from(content);

  // Create header: "{type} {size}\0"
  const header = Buffer.from(`${type} ${contentBuffer.length}\0`);

  // Full object = header + content
  const fullObject = Buffer.concat([header, contentBuffer]);

  // Compute SHA-1 hash.
  // The resulting 40-character hex string becomes the object's "address".
  const hash = crypto
    .createHash('sha1')
    .update(fullObject)
    .digest('hex');

  return { hash, data: fullObject };
}

/**
 * Write an object to the Git object store.
 *
 * @param {string} gitDir - Path to .git directory
 * @param {string} hash - 40-character hex hash
 * @param {Buffer} data - Full object data (header + content)
 * @returns {string} - The hash
 */
function writeObject(gitDir, hash, data) {
  // Objects are stored at: .git/objects/{first 2 chars}/{remaining 38 chars}
  // The 2-char prefix creates up to 256 subdirectories -- a fan-out
  // strategy that keeps each directory listing small and fast.
  const objectDir = path.join(gitDir, 'objects', hash.slice(0, 2));
  const objectPath = path.join(objectDir, hash.slice(2));

  // Don't write if the object already exists. Because the hash is derived
  // from the content, an existing file with the same hash holds the same
  // content (barring a SHA-1 collision). This is the deduplication magic of
  // content-addressable storage -- identical files are never stored twice.
  if (fs.existsSync(objectPath)) {
    return hash;
  }

  // Create subdirectory if needed
  if (!fs.existsSync(objectDir)) {
    fs.mkdirSync(objectDir, { recursive: true });
  }

  // Compress with zlib and write.
  // Real Git also uses zlib deflate.
  // How much this saves depends on the content: text usually shrinks a lot,
  // while already-compressed data (images, archives) barely shrinks at all.
  //
  // Debugging tip: if you suspect a corrupt object, you can manually inflate
  // it with: zlib.inflateSync(fs.readFileSync(objectPath)).toString()
  const compressed = zlib.deflateSync(data);
  fs.writeFileSync(objectPath, compressed);

  return hash;
}

/**
 * Read an object from the Git object store.
 *
 * @param {string} gitDir - Path to .git directory
 * @param {string} hash - 40-character hex hash
 * @returns {{type: string, size: number, content: Buffer}}
 */
function readObject(gitDir, hash) {
  const objectPath = path.join(
    gitDir,
    'objects',
    hash.slice(0, 2),
    hash.slice(2)
  );

  if (!fs.existsSync(objectPath)) {
    throw new Error(`fatal: Not a valid object name ${hash}`);
  }

  // Read and decompress. Every object on disk is zlib-deflated.
  const compressed = fs.readFileSync(objectPath);
  const data = zlib.inflateSync(compressed);

  // Find the null byte that separates the header from the content.
  // The header tells us the object type and expected size, which lets us
  // sanity-check the data before trusting it.
  const nullIndex = data.indexOf(0);
  if (nullIndex === -1) {
    throw new Error(`Object ${hash} is corrupted (missing header)`);
  }
  const header = data.subarray(0, nullIndex).toString();
  const content = data.subarray(nullIndex + 1);

  // Parse header: "{type} {size}"
  const [type, sizeStr] = header.split(' ');
  const size = parseInt(sizeStr, 10);

  // Verify size matches. Most on-disk damage already makes
  // zlib.inflateSync throw (the zlib stream carries its own checksum);
  // this check additionally rejects objects whose header disagrees with
  // their content, rather than letting bad data propagate silently
  // through your commit history.
  if (content.length !== size) {
    throw new Error(`Object ${hash} is corrupted`);
  }

  return { type, size, content };
}

module.exports = { hashObject, writeObject, readObject };
```

***

### Step 2: Implement hash-object Command

The `hash-object` command computes the hash of a file and optionally stores it:

```javascript src/commands/hashObject.js
const fs = require('fs');
const { hashObject, writeObject } = require('../utils/objects');
const { findGitDir } = require('../utils/paths');

/**
 * hash-object - Compute object ID and optionally store it
 *
 * Usage:
 *   mygit hash-object <file>            # Just compute hash
 *   mygit hash-object -w <file>         # Compute and write to object store
 *   mygit hash-object -t <type> <file>  # Specify type (default: blob)
 *   mygit hash-object --stdin           # Read from stdin
 */
function execute(args) {
  // Parse options
  let write = false;
  let type = 'blob';
  let useStdin = false;
  const files = [];

  for (let i = 0; i < args.length; i++) {
    const arg = args[i];

    if (arg === '-w') {
      write = true;
    } else if (arg === '-t') {
      type = args[++i];
      if (!['blob', 'tree', 'commit', 'tag'].includes(type)) {
        throw new Error(`invalid object type "${type}"`);
      }
    } else if (arg === '--stdin') {
      useStdin = true;
    } else if (!arg.startsWith('-')) {
      files.push(arg);
    }
  }

  // Handle stdin
  if (useStdin) {
    // Read all of stdin synchronously. File descriptor 0 is stdin;
    // fs.readFileSync accepts a file descriptor and reads until EOF.
    const content = fs.readFileSync(0);
    processContent(content, type, write);
    return;
  }

  // Handle files
  if (files.length === 0) {
    throw new Error('no file specified');
  }

  for (const file of files) {
    if (!fs.existsSync(file)) {
      throw new Error(`fatal: could not open '${file}' for reading`);
    }
    const content = fs.readFileSync(file);
    processContent(content, type, write);
  }
}

function processContent(content, type, write) {
  const {
    hash,
    data,
  } = hashObject(content, type);

  if (write) {
    const gitDir = findGitDir();
    if (!gitDir) {
      throw new Error('fatal: not a git repository');
    }
    writeObject(gitDir, hash, data);
  }

  console.log(hash);
}

module.exports = { execute };
```

***

### Step 3: Implement cat-file Command

The `cat-file` command reads objects from the store:

```javascript src/commands/catFile.js
const { readObject } = require('../utils/objects');
const { requireGitDir } = require('../utils/paths');

/**
 * cat-file - Provide content or type of repository objects
 *
 * Usage:
 *   mygit cat-file -t <object>       # Show object type
 *   mygit cat-file -s <object>       # Show object size
 *   mygit cat-file -p <object>       # Pretty-print object content
 *   mygit cat-file <type> <object>   # Show content, expecting type
 */
function execute(args) {
  if (args.length < 2) {
    throw new Error('usage: mygit cat-file (-t | -s | -p | )