Chapter 2: The Object Model
The object model is the most elegant part of Git’s design. In this chapter, we’ll implement the foundation of Git: content-addressable storage using SHA-1 hashing.
Prerequisites: Completed Chapter 1: Setup & Init
Time: 2-3 hours
Outcome: Working hash-object and cat-file commands
The Core Insight
Git stores all of its data (file contents, directory listings, commits) as objects in a content-addressable store. The “address” (filename) of each object is derived from its content using SHA-1 hashing.
Why is this brilliant?
- Same content = same hash = stored only once (deduplication!)
- Hash acts as a checksum (data integrity)
- Immutable objects (can’t change content without changing address)
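To make the idea concrete, here is a toy in-memory content-addressable store. It is an illustration only and not part of the mygit code: it hashes raw content with SHA-1 (no Git header yet) and keeps objects in a Map. Even so, it shows deduplication and the checksum property together.

```javascript
const crypto = require('crypto');

// Toy content-addressable store: each object's key is the SHA-1 of its bytes.
// Illustration only; real Git also hashes a type/size header and writes to disk.
class ContentStore {
  constructor() {
    this.objects = new Map(); // hex hash -> Buffer
  }

  static address(data) {
    return crypto.createHash('sha1').update(data).digest('hex');
  }

  put(content) {
    const data = Buffer.from(content);
    const key = ContentStore.address(data);
    // Deduplication: identical content yields the same key, so it is stored once.
    if (!this.objects.has(key)) this.objects.set(key, data);
    return key;
  }

  get(key) {
    const data = this.objects.get(key);
    if (data === undefined) throw new Error(`No object ${key}`);
    // Integrity: the address doubles as a checksum of the stored bytes.
    if (ContentStore.address(data) !== key) throw new Error(`Corrupt object ${key}`);
    return data;
  }
}

const store = new ContentStore();
const a = store.put('Hello');
const b = store.put('Hello');
console.log(a === b, store.objects.size); // true 1
```

If even one byte of a stored object changes, get() throws, because the content no longer matches its address.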
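The Three Object Types
- Blob: the contents of a single file (no filename or permissions)
- Tree: a directory listing, where each entry maps a name and mode to a blob or subtree hash
- Commit: a snapshot record that points to one tree, plus parent commit(s), author, committer, and message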
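Blobs, trees, and commits all use the same {type} {size}\0 header. Only the body differs. To show a body other than a blob, here is a sketch that builds the text body of a commit object using Git's layout: a tree line, optional parent lines, author and committer lines, a blank line, then the message. The helper names are hypothetical, and the author identity is invented for the example.

```javascript
const crypto = require('crypto');

// Prefix any object body with the common "{type} {size}\0" header.
function serializeObject(type, body) {
  const content = Buffer.from(body);
  return Buffer.concat([Buffer.from(`${type} ${content.length}\0`), content]);
}

function objectHash(type, body) {
  return crypto.createHash('sha1').update(serializeObject(type, body)).digest('hex');
}

// Hypothetical helper: build the text body of a commit object.
// Identity lines take the form "Name <email> <unix-seconds> <utc-offset>".
function commitBody({ tree, parents = [], author, committer = author, message }) {
  const lines = [`tree ${tree}`];
  for (const parent of parents) lines.push(`parent ${parent}`);
  lines.push(`author ${author}`, `committer ${committer}`, '', message);
  return lines.join('\n') + '\n';
}

const body = commitBody({
  tree: '4b825dc642cb6eb9a060e54bf8d69288fbee4904', // Git's empty tree
  author: 'Ada Lovelace <ada@example.com> 1700000000 +0000',
  message: 'Initial commit',
});
console.log(body);
console.log(objectHash('commit', body));
```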
Implementation
Step 1: SHA-1 Hashing Utility
Create the core hashing function that all objects will use:
src/utils/objects.js
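The chapter's listing for this file is not reproduced here. Below is a minimal sketch of what such a module could contain. It assumes that the repository directory created in Chapter 1 is passed in as gitDir, and its function names are illustrative rather than the tutorial's exact API.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

// Build the exact bytes Git hashes: "{type} {size}\0{content}".
// size is the content length in bytes, not in characters.
function serializeObject(content, type = 'blob') {
  const body = Buffer.isBuffer(content) ? content : Buffer.from(content);
  const header = Buffer.from(`${type} ${body.length}\0`);
  return Buffer.concat([header, body]);
}

// SHA-1 over header + content, returned as a 40-character hex string.
function hashObject(content, type = 'blob') {
  return crypto.createHash('sha1').update(serializeObject(content, type)).digest('hex');
}

// Store the zlib-compressed object at <gitDir>/objects/<first 2 hex>/<remaining 38>.
// Skips the write if the object already exists (deduplication). Returns the hash.
function writeObject(gitDir, content, type = 'blob') {
  const data = serializeObject(content, type);
  const sha = crypto.createHash('sha1').update(data).digest('hex');
  const dir = path.join(gitDir, 'objects', sha.slice(0, 2));
  const file = path.join(dir, sha.slice(2));
  if (!fs.existsSync(file)) {
    fs.mkdirSync(dir, { recursive: true });
    fs.writeFileSync(file, zlib.deflateSync(data));
  }
  return sha;
}

// Read an object back: inflate it, split the header at the first NUL byte,
// and check the declared size against the actual content length.
function readObject(gitDir, sha) {
  const file = path.join(gitDir, 'objects', sha.slice(0, 2), sha.slice(2));
  const data = zlib.inflateSync(fs.readFileSync(file));
  const nul = data.indexOf(0);
  const [type, sizeStr] = data.subarray(0, nul).toString().split(' ');
  const content = data.subarray(nul + 1);
  if (Number(sizeStr) !== content.length) {
    throw new Error(`Corrupt object ${sha}: header says ${sizeStr} bytes, found ${content.length}`);
  }
  return { type, size: content.length, content };
}

module.exports = { serializeObject, hashObject, writeObject, readObject };
```

The hash is split into a 2-character directory and a 38-character filename, which matches Git's loose-object layout.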
Step 2: Implement hash-object Command
The hash-object command computes the hash of a file and optionally stores it:
src/commands/hashObject.js
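A minimal sketch of such a command, under two assumptions: the write flag is spelled -w as in real Git's git hash-object -w, and the caller passes the repository directory as gitDir. hashObjectCommand is a hypothetical name. The sketch includes its own hashing so that it runs standalone.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

// Hypothetical handler for: mygit hash-object [-w] <file>
// -w mirrors real Git's flag; gitDir is assumed to be the repository
// directory created by init in Chapter 1.
function hashObjectCommand(args, gitDir) {
  const write = args.includes('-w');
  const file = args.find((arg) => arg !== '-w');
  if (!file) throw new Error('usage: mygit hash-object [-w] <file>');

  // Read raw bytes (no encoding) so binary files are hashed byte-for-byte.
  const content = fs.readFileSync(file);
  const data = Buffer.concat([Buffer.from(`blob ${content.length}\0`), content]);
  const sha = crypto.createHash('sha1').update(data).digest('hex');

  if (write) {
    const dir = path.join(gitDir, 'objects', sha.slice(0, 2));
    fs.mkdirSync(dir, { recursive: true });
    // Identical content always maps to the same path, so rewriting it is harmless.
    fs.writeFileSync(path.join(dir, sha.slice(2)), zlib.deflateSync(data));
  }
  return sha;
}
```

The CLI would then print the returned hash. Without -w, nothing touches the object store.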
Step 3: Implement cat-file Command
The cat-file command reads objects from the store:
src/commands/catFile.js
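A minimal sketch, assuming the command mirrors real Git's git cat-file flags (-t for type, -s for size, -p for content) and takes the repository directory as gitDir. catFileCommand is a hypothetical name. It returns the text to print instead of printing it, which keeps it easy to test.

```javascript
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

// Hypothetical handler for: mygit cat-file (-t | -s | -p) <hash>
function catFileCommand(args, gitDir) {
  const [flag, sha] = args;
  // Full 40-character hashes only; real Git also resolves abbreviated ones.
  if (!/^[0-9a-f]{40}$/.test(sha || '')) throw new Error(`Not a valid object name: ${sha}`);

  const file = path.join(gitDir, 'objects', sha.slice(0, 2), sha.slice(2));
  const data = zlib.inflateSync(fs.readFileSync(file));
  const nul = data.indexOf(0); // the header ends at the first NUL byte
  const [type, size] = data.subarray(0, nul).toString().split(' ');
  const content = data.subarray(nul + 1);

  switch (flag) {
    case '-t': return type;
    case '-s': return size;
    case '-p': return content.toString(); // fine for blobs; tree bodies are binary
    default: throw new Error('usage: mygit cat-file (-t | -s | -p) <hash>');
  }
}
```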
Step 4: Update CLI Entry Point
src/mygit.js
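A minimal sketch of the dispatch logic, assuming the entry point routes the first argument to a command handler. The command table uses stub handlers so the example runs on its own. In the project, the handlers would be the modules in src/commands.

```javascript
// Hypothetical dispatch table. The stubs stand in for handlers that would
// be required from src/commands/ (hashObject.js, catFile.js).
const commands = {
  'hash-object': (args) => `hash-object ${args.join(' ')}`,
  'cat-file': (args) => `cat-file ${args.join(' ')}`,
};

// Route argv (without "node" and the script path) to a handler.
// Returns an exit code plus the text to print, rather than exiting directly.
function run(argv) {
  const [command, ...args] = argv;
  if (!command) return { code: 1, output: 'usage: mygit <command> [<args>]' };
  // hasOwnProperty avoids matching inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(commands, command)) {
    return { code: 1, output: `mygit: '${command}' is not a mygit command` };
  }
  try {
    return { code: 0, output: String(commands[command](args)) };
  } catch (err) {
    return { code: 1, output: `fatal: ${err.message}` };
  }
}
```

src/mygit.js would then call run(process.argv.slice(2)), print output (to stderr when code is non-zero), and set process.exitCode = code.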
Testing Your Implementation
Test hash-object
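One way to test hash-object without eyeballing its output is to check it against hashes that real Git is known to produce. The two vectors below are the well-known blob IDs for an empty file and for "hello world" followed by a newline. blobHash is a stand-in for whatever function your hash-object uses.

```javascript
const crypto = require('crypto');

// Stand-in for the hashing step of hash-object (type "blob").
function blobHash(content) {
  const body = Buffer.from(content);
  const header = Buffer.from(`blob ${body.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// Known blob IDs produced by git hash-object.
const vectors = [
  ['', 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'],             // empty file
  ['hello world\n', '3b18e512dba79e4c8300dd08aeb37f8e728b8dad'], // note the trailing newline
];

const failures = vectors.filter(([input, expected]) => blobHash(input) !== expected);
console.log(failures.length === 0 ? 'all vectors pass' : failures);
```

The trailing newline matters: the same text saved without it produces a different hash.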
Test cat-file
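For cat-file, a round trip is the natural test: store content, read it back, and compare. This sketch does everything in memory with zlib (no files) and also shows that a tampered size in the header gets caught. pack and unpack are hypothetical helper names.

```javascript
const zlib = require('zlib');

// Compress "{type} {size}\0{content}" the way loose objects are stored.
function pack(type, content) {
  const body = Buffer.from(content);
  return zlib.deflateSync(Buffer.concat([Buffer.from(`${type} ${body.length}\0`), body]));
}

// Inverse of pack(): inflate, split at the first NUL, and verify the size.
function unpack(compressed) {
  const data = zlib.inflateSync(compressed);
  const nul = data.indexOf(0);
  const [type, size] = data.subarray(0, nul).toString().split(' ');
  const content = data.subarray(nul + 1);
  if (Number(size) !== content.length) throw new Error('size mismatch');
  return { type, size: Number(size), content: content.toString() };
}

const obj = unpack(pack('blob', 'Hello'));
console.log(obj); // { type: 'blob', size: 5, content: 'Hello' }

// Tamper with the header: claim 6 bytes while only 5 follow.
const tampered = zlib.deflateSync(Buffer.from('blob 6\0Hello'));
try {
  unpack(tampered);
} catch (err) {
  console.log(err.message); // size mismatch
}
```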
Compare with Real Git
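One way to compare against real Git from Node is to pipe the same bytes into the installed git binary with child_process.execFileSync. This sketch assumes git is on the PATH and reports null for Git's side when it is not. --no-filters tells git hash-object to hash the input as-is, and without -w nothing is written.

```javascript
const crypto = require('crypto');
const { execFileSync } = require('child_process');

function ourBlobHash(content) {
  const body = Buffer.from(content);
  const header = Buffer.from(`blob ${body.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// Ask the installed git for the same hash; null if git cannot be run.
function gitBlobHash(content) {
  try {
    return execFileSync('git', ['hash-object', '--no-filters', '--stdin'], { input: content })
      .toString()
      .trim();
  } catch {
    return null;
  }
}

function compareWithGit(content) {
  const ours = ourBlobHash(content);
  const git = gitBlobHash(content);
  return { ours, git, match: git === null ? null : ours === git };
}

console.log(compareWithGit('hello world\n'));
```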
Understanding Blob Storage
Let’s trace through what happens when you store “Hello”:
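The stages can be traced with a short script. This is an illustrative sketch using Node's crypto and zlib modules. traceBlob is a hypothetical name, and the path it reports is relative to the repository directory.

```javascript
const crypto = require('crypto');
const zlib = require('zlib');

// Walk through each stage of storing a blob and return the intermediate values.
function traceBlob(text) {
  const content = Buffer.from(text);                                  // "Hello" is 5 bytes
  const header = `blob ${content.length}\0`;                          // 1. header "blob 5\0"
  const stored = Buffer.concat([Buffer.from(header), content]);       // 2. header + content
  const sha = crypto.createHash('sha1').update(stored).digest('hex'); // 3. SHA-1 of both
  const objectPath = `objects/${sha.slice(0, 2)}/${sha.slice(2)}`;    // 4. where it lives
  const compressed = zlib.deflateSync(stored);                        // 5. bytes written to disk
  return { header, stored, sha, objectPath, compressed };
}

const t = traceBlob('Hello');
console.log(JSON.stringify(t.header)); // "blob 5\u0000"
console.log(t.stored.length);          // 12
console.log(t.sha);
console.log(t.objectPath);
```

The stored buffer is 12 bytes: 7 for the header (including the NUL) plus 5 for "Hello".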
Why include type and size in the hash?
Including type and size in the hashed data:
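- Prevents cross-type collisions: a blob whose bytes happen to match a tree’s serialized entries still gets a different hash from that tree, because the headers differ
- Enables verification: we can check that the size recorded in the header matches the actual content length
- Self-describing: the header tells us the object’s type without any outside metadata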
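Both the type separation and the byte-based size are easy to see in code. In this sketch (objectHash is an illustrative helper), the same empty content produces Git's well-known empty-blob or empty-tree ID, depending only on the declared type.

```javascript
const crypto = require('crypto');

function objectHash(type, content) {
  const body = Buffer.from(content);
  const store = Buffer.concat([Buffer.from(`${type} ${body.length}\0`), body]);
  return crypto.createHash('sha1').update(store).digest('hex');
}

// Same (empty) bytes, different declared types, different object IDs.
console.log(objectHash('blob', '')); // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
console.log(objectHash('tree', '')); // 4b825dc642cb6eb9a060e54bf8d69288fbee4904

// The header size counts UTF-8 bytes, not JavaScript string characters.
console.log(Buffer.from('\u00e9').length, '\u00e9'.length); // 2 1
```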
Exercises
Exercise 1: Add stdin support
Implement reading from stdin for hash-object:
Hint: Use fs.readFileSync(0) to read all of stdin (file descriptor 0). fs.readFileSync(0, 'utf8') also works for text, but omitting the encoding returns a Buffer, so binary input is hashed byte-for-byte.
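One possible approach, sketched under two assumptions: the flag is spelled --stdin as in real Git, and readInput and blobHash are hypothetical helper names. The stdin reader is passed in as a parameter, so the logic can be tested without a real pipe.

```javascript
const crypto = require('crypto');
const fs = require('fs');

// Pick the input source from the arguments. By default --stdin reads
// everything on file descriptor 0 as a Buffer.
function readInput(args, readStdin = () => fs.readFileSync(0)) {
  if (args.includes('--stdin')) return readStdin();
  const file = args.find((arg) => !arg.startsWith('-'));
  if (!file) throw new Error('usage: mygit hash-object [-w] (--stdin | <file>)');
  return fs.readFileSync(file);
}

function blobHash(content) {
  const header = Buffer.from(`blob ${content.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, content])).digest('hex');
}
```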
Exercise 2: Implement tree objects
Create a utility to build tree objects:
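One possible shape for such a utility, assuming entries arrive as { mode, name, sha } objects with hex hashes (a hypothetical input format). Unlike blobs, tree bodies mix text and binary: each hash is stored as 20 raw bytes rather than 40 hex characters.

```javascript
const crypto = require('crypto');

// Build a tree object from entries like { mode: '100644', name: 'a.txt', sha: '<40 hex>' }.
// Each entry is serialized as "{mode} {name}\0" followed by the 20 raw bytes of its hash.
function buildTree(entries) {
  // Git orders entries by name bytes, comparing directories as if their name ended in "/".
  const sortKey = (e) => Buffer.from(e.mode === '40000' ? `${e.name}/` : e.name);
  const sorted = [...entries].sort((a, b) => Buffer.compare(sortKey(a), sortKey(b)));

  const body = Buffer.concat(
    sorted.map((e) => Buffer.concat([Buffer.from(`${e.mode} ${e.name}\0`), Buffer.from(e.sha, 'hex')])),
  );
  const store = Buffer.concat([Buffer.from(`tree ${body.length}\0`), body]);
  const sha = crypto.createHash('sha1').update(store).digest('hex');
  return { sha, body };
}
```

With no entries, this yields Git's empty-tree ID, 4b825dc642cb6eb9a060e54bf8d69288fbee4904, which makes a handy first check.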
Exercise 3: Pretty-print trees
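The heart of this exercise is walking the binary tree body. Here is a possible starting point. It assumes the body has already been inflated and separated from its header, as cat-file does for blobs. parseTree and formatTree are hypothetical names, and the output layout imitates what git cat-file -p prints for a tree.

```javascript
// Parse a tree body: repeated "{mode} {name}\0" followed by a 20-byte binary hash.
function parseTree(body) {
  const entries = [];
  let i = 0;
  while (i < body.length) {
    const space = body.indexOf(0x20, i);
    const nul = space === -1 ? -1 : body.indexOf(0x00, space);
    if (nul === -1 || nul + 21 > body.length) throw new Error('Malformed tree entry');
    entries.push({
      mode: body.toString('utf8', i, space),
      name: body.toString('utf8', space + 1, nul),
      sha: body.subarray(nul + 1, nul + 21).toString('hex'),
    });
    i = nul + 21;
  }
  return entries;
}

// Infer each entry's type from its mode, then print one line per entry:
// mode zero-padded to six digits, type, hash, a tab, then the name.
function formatTree(entries) {
  const typeFor = (mode) => (mode === '40000' ? 'tree' : mode === '160000' ? 'commit' : 'blob');
  return entries
    .map((e) => `${e.mode.padStart(6, '0')} ${typeFor(e.mode)} ${e.sha}\t${e.name}`)
    .join('\n');
}
```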
Enhance cat-file -p to nicely format tree objects:
Hint: Parse the binary tree format and detect type from mode.
Key Concepts Review
Content-Addressable
Objects are stored by their SHA-1 hash. Same content = same hash = same storage location.
Immutable Objects
Once written, objects never change. Changing content would change the hash.
Compression
All objects are zlib-compressed before they are written to disk. Combined with deduplication, this keeps the object store surprisingly compact.
Object Format
Every object has the form {type} {size}\0{content}, where size is the content length in bytes: simple and consistent.
Further Reading
DSA: Hash Maps
Understand the data structure behind content-addressable storage
Cryptography Basics
Learn more about SHA-1 and cryptographic hashing
What’s Next?
In Chapter 3: Staging & Index, we’ll implement:
- The index (staging area) file format
- The add command
- The status command