Chapter 1: Project Setup & Init
In this chapter, we’ll set up our project and implement the first Git command: init. By the end, you’ll understand exactly what happens when you run git init.
Why start here? Because git init is deceptively simple: it looks like it just creates a folder, but that folder is the skeleton of a content-addressable filesystem. This is the same idea behind content-addressed storage systems such as IPFS, where data is located by a hash of its content rather than by a name. By building init yourself, you lay the foundation that every subsequent command depends on, much like pouring a building’s foundation before raising the walls.
Time: 1-2 hours
Outcome: A working mygit init command
What You’ll Learn
- How Git’s .git directory is structured
- What each file and folder in .git means
- How to build a CLI tool in Node.js
- Content-addressable storage concepts
Understanding Git’s Directory Structure
When you run git init, Git creates a .git directory. With the default templates it contains HEAD, config, description, hooks/, info/, objects/, and refs/, but only three entries are essential: HEAD, objects/, and refs/. Everything else is optional for a minimal implementation. Think of HEAD as “you are here” on a map, objects/ as the warehouse that stores every version of every file, and refs/ as the labelled bookmarks pointing to specific moments in history.
Project Setup
1. Initialize Your Project
2. Create the Project Structure
3. Set Up package.json
Implementation
Step 1: Create the CLI Entry Point
Step 2: Create Path Utilities
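One possible shape for the path utilities, as a sketch with hypothetical names (findRepoRoot and gitPath). findRepoRoot walks up from a starting directory until it finds a .git entry. This is similar in spirit to how Git discovers an enclosing repository, and it is what lets later commands work from any subdirectory. The sketch does not follow .git files that point elsewhere, as used by submodules and worktrees.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Walk upward from startDir until a directory containing ".git" is found.
// Returns the working-tree root, or null once the filesystem root is reached.
function findRepoRoot(startDir = process.cwd()) {
  let dir = path.resolve(startDir);
  for (;;) {
    if (fs.existsSync(path.join(dir, '.git'))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // "/" on POSIX, a drive root on Windows
    dir = parent;
  }
}

// Build a path inside the .git directory of a given repository root.
function gitPath(repoRoot, ...parts) {
  return path.join(repoRoot, '.git', ...parts);
}

module.exports = { findRepoRoot, gitPath };
```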
Step 3: Implement the Init Command
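A minimal init sketch that creates only the three essentials, plus refs/heads and refs/tags. The initialBranch parameter is an assumption added here; config, description, hooks/ and info/ are deliberately skipped. Re-running it on an existing repository never overwrites HEAD or deletes anything, in line with how real Git handles re-initialization.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Create <targetDir>/.git with HEAD, objects/ and refs/ (heads + tags).
// initialBranch is an assumption of this sketch; it defaults to "master".
function init(targetDir = process.cwd(), initialBranch = 'master') {
  const gitDir = path.join(path.resolve(targetDir), '.git');
  const reinit = fs.existsSync(gitDir);

  // recursive: true makes mkdir a no-op for directories that already exist,
  // so re-running init leaves existing objects and refs alone.
  for (const dir of ['objects', path.join('refs', 'heads'), path.join('refs', 'tags')]) {
    fs.mkdirSync(path.join(gitDir, dir), { recursive: true });
  }

  // HEAD is a symbolic ref and must end with "\n". Only write it when it is
  // missing, so an existing repository keeps its current branch.
  const headPath = path.join(gitDir, 'HEAD');
  if (!fs.existsSync(headPath)) {
    fs.writeFileSync(headPath, `ref: refs/heads/${initialBranch}\n`);
  }

  const verb = reinit ? 'Reinitialized existing' : 'Initialized empty';
  console.log(`${verb} Git repository in ${gitDir}${path.sep}`);
  return gitDir;
}

module.exports = { init };
```

The two messages follow the wording real Git prints for a new repository and for a re-initialized one.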
Testing Your Implementation
Make It Executable
Test It!
Compare with Real Git
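One quick comparison is to check that both directories contain the essentials. The hypothetical isGitDirectory helper below is a simplified version of the kind of check Git itself performs when it decides whether a directory is a repository. It only verifies that objects/ and refs/ exist and that HEAD is either a symbolic ref or a 40-hex SHA-1. You could run it on the .git produced by mygit init and on one produced by real git init; both should pass.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Simplified repository check: objects/ and refs/ must be directories and
// HEAD must be a symref ("ref: refs/...") or a detached 40-hex SHA-1.
function isGitDirectory(gitDir) {
  const isDir = (p) => fs.existsSync(p) && fs.statSync(p).isDirectory();
  if (!isDir(path.join(gitDir, 'objects')) || !isDir(path.join(gitDir, 'refs'))) {
    return false;
  }
  let head;
  try {
    head = fs.readFileSync(path.join(gitDir, 'HEAD'), 'utf8');
  } catch {
    return false; // missing or unreadable HEAD
  }
  return /^ref: refs\/\S+\n?$/.test(head) || /^[0-9a-f]{40}\n?$/.test(head);
}

module.exports = { isGitDirectory };
```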
Deep Dive: Understanding HEAD
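The HEAD file is crucial to Git. Let’s understand it: right after init, HEAD contains the single line ref: refs/heads/master. This is a symbolic reference, and it reads as: “I am on the master branch; the current commit is whatever refs/heads/master contains.”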
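To make the indirection concrete, here is a sketch of a hypothetical resolveHead helper that follows HEAD to a commit. It assumes loose refs only (real Git also reads .git/packed-refs). It returns commit: null on an unborn branch, where the branch file does not exist yet, and branch: null when HEAD is detached.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Interpret .git/HEAD:
//   symbolic ref:  "ref: refs/heads/master\n" -> { branch, ref, commit }
//   detached HEAD: "<40-hex sha>\n"           -> { branch: null, ref: null, commit }
function resolveHead(gitDir) {
  const head = fs.readFileSync(path.join(gitDir, 'HEAD'), 'utf8').trim();
  if (head.startsWith('ref: ')) {
    const ref = head.slice('ref: '.length); // e.g. "refs/heads/master"
    const refFile = path.join(gitDir, ...ref.split('/'));
    const commit = fs.existsSync(refFile)
      ? fs.readFileSync(refFile, 'utf8').trim()
      : null; // unborn branch: no commit yet
    return { branch: ref.replace(/^refs\/heads\//, ''), ref, commit };
  }
  return { branch: null, ref: null, commit: head }; // detached HEAD
}

module.exports = { resolveHead };
```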
What happens when you commit?
- Git creates a new commit object
- Reads HEAD to find the current branch: refs/heads/master
- Updates refs/heads/master to point to the new commit
- HEAD still points to refs/heads/master (unchanged)
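The last three steps above can be sketched as a hypothetical advanceHead helper. It assumes the commit object has already been written and its SHA is known. Real Git updates refs through a .lock file and an atomic rename; this sketch writes the file directly for brevity.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Move the current branch to newCommitSha after a commit.
// Symref HEAD: only the branch file changes (it is created on the first commit).
// Detached HEAD: HEAD itself is overwritten with the new SHA.
function advanceHead(gitDir, newCommitSha) {
  const headPath = path.join(gitDir, 'HEAD');
  const head = fs.readFileSync(headPath, 'utf8').trim();
  if (head.startsWith('ref: ')) {
    const ref = head.slice('ref: '.length);
    const refFile = path.join(gitDir, ...ref.split('/'));
    fs.mkdirSync(path.dirname(refFile), { recursive: true });
    fs.writeFileSync(refFile, newCommitSha + '\n');
    return ref; // the ref that moved; HEAD is untouched
  }
  fs.writeFileSync(headPath, newCommitSha + '\n');
  return 'HEAD';
}

module.exports = { advanceHead };
```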
What is 'detached HEAD'?
How do branches work?
A branch is just a file in refs/heads/ containing a commit SHA.
Deep Dive: The Objects Directory
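The objects/ directory is Git’s content-addressable storage: every object is stored under a path derived from the SHA-1 hash of its content, split into a two-character subdirectory and a 38-character file name.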
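As a preview of Chapter 2, the sketch below computes a blob's ID and its loose-object path. The header format (the word blob, a space, the content's byte length, a NUL byte, then the content) is Git's. The helper names blobId and objectPath are hypothetical, and the zlib compression Git applies before writing the file is left for the next chapter.

```javascript
'use strict';
const crypto = require('crypto');
const path = require('path');

// A blob's ID is the SHA-1 of "blob <byte length>\0" followed by the content.
function blobId(content) {
  const body = Buffer.from(content);
  const header = Buffer.from(`blob ${body.length}\0`);
  return crypto.createHash('sha1').update(Buffer.concat([header, body])).digest('hex');
}

// Loose objects live at objects/<first 2 hex chars>/<remaining 38 chars>.
function objectPath(gitDir, sha) {
  return path.join(gitDir, 'objects', sha.slice(0, 2), sha.slice(2));
}

const id = blobId('hello world\n');
console.log(id);                     // 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
console.log(objectPath('.git', id)); // .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad (POSIX)
```

The first value matches what git hash-object reports for a file containing "hello world" and a newline. Identical content always maps to the same path, which is what makes the store content-addressable.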
Common Pitfalls
- Forgetting the trailing newline in HEAD. The HEAD file must end with \n. Without it, some Git tools (and your own future commands) may misparse the ref. Always write 'ref: refs/heads/master\n', not 'ref: refs/heads/master'.
- Path separator issues on Windows. If you’re developing on Windows, path.join produces backslashes. Git expects forward slashes inside its own metadata. For internal Git paths (like ref names), normalize with .split(path.sep).join('/').
- Re-initializing an existing repo by accident. Real Git gracefully handles git init in an existing repo (it prints “Reinitialized…”). Make sure your implementation checks for an existing .git directory before overwriting it.
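The first two pitfalls can be guarded against with small helpers like these. The names are hypothetical, and this is a sketch rather than the chapter's code. One helper converts a filesystem path under .git into a ref name with forward slashes. The other pair always writes the trailing newline and tolerates its absence when reading.

```javascript
'use strict';
const path = require('path');

// path.relative uses the platform separator ("\" on Windows), but ref
// names always use "/", so normalize with split(path.sep).join('/').
function refNameFromPath(gitDir, absPath) {
  return path.relative(gitDir, absPath).split(path.sep).join('/');
}

// Always terminate a symbolic ref with "\n" when writing it.
function formatSymref(ref) {
  return `ref: ${ref}\n`;
}

// Trim trailing whitespace when reading, so a file missing the newline still
// parses. Returns the target ref, or null if the text is not a symref.
function parseSymref(text) {
  const line = text.trimEnd();
  return line.startsWith('ref: ') ? line.slice('ref: '.length) : null;
}

module.exports = { refNameFromPath, formatSymref, parseSymref };
```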
Exercises
Exercise 1: Add --bare flag
Implement mygit init --bare, which creates a bare repository (no working directory).
Exercise 2: Add --initial-branch flag
Implement mygit init --initial-branch=main to set a custom default branch.
Exercise 3: Validate directory
Key Takeaways
Simple Structure
The .git directory is surprisingly simple: just files and folders.
HEAD is King
Branches are Files
Content-Addressable
The objects/ directory stores everything by its content hash.
What’s Next?
In Chapter 2: Object Model, we’ll implement Git’s object storage:
- Create and store blob objects (file content)
- Implement hash-object and cat-file commands
- Understand SHA-1 hashing and zlib compression
Next: Object Model
Interview Deep-Dive
Why does Git split object hashes into a 2-character directory prefix and a 38-character filename? What problem does this solve?
- This is a fan-out strategy borrowed from hash table design. Many filesystems, especially older ones, degrade when a single directory contains hundreds of thousands of entries, because directory lookups become linear scans (or, at best, tree lookups whose cost still grows with entry count). By using the first two hex characters as a subdirectory, Git creates up to 256 buckets, keeping each directory small.
- A repository with 100,000 objects has ~390 files per subdirectory on average (100,000 / 256 ≈ 390). Without fan-out, all 100,000 entries would sit in a single objects/ directory: every readdir() would have to walk all of them, and on filesystems that search directories linearly every stat() or open() would slow down as well.
- The choice of 2 characters (256 buckets) is a pragmatic balance. One character (16 buckets) would still put thousands of files per directory. Three characters (4096 buckets) would waste directory entries on small repos. Two characters work well for repositories up to millions of objects.
- This same pattern appears everywhere in systems design: Redis uses hash slots for cluster sharding, Cassandra uses consistent hashing for partition distribution, and CDNs use URL-based hashing for cache distribution. The underlying principle is always the same: distribute entries across buckets to avoid hotspots.
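The ~390-per-directory figure is easy to check empirically. This sketch hashes 100,000 distinct strings with SHA-1 and counts how many land under each two-character prefix. The strings are arbitrary stand-ins for object contents, not real Git objects. The counting mirrors how loose objects spread across objects/00 through objects/ff.

```javascript
'use strict';
const crypto = require('crypto');

// Count how many of N hashes fall into each two-hex-character bucket,
// i.e. each objects/xx/ directory.
const N = 100000;
const buckets = new Map();
for (let i = 0; i < N; i++) {
  const sha = crypto.createHash('sha1').update(`object ${i}`).digest('hex');
  const prefix = sha.slice(0, 2);
  buckets.set(prefix, (buckets.get(prefix) || 0) + 1);
}

const counts = [...buckets.values()];
console.log('buckets used:', buckets.size); // out of 256 possible prefixes
console.log('average per bucket:', N / buckets.size);
console.log('min/max:', Math.min(...counts), Math.max(...counts));
```

Because SHA-1 output is close to uniform, the per-bucket counts cluster tightly around the average. That uniformity is why a fixed prefix length is enough and no rebalancing is ever needed.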
Loose objects do not stay loose forever: Git consolidates them into a .pack file with an accompanying .idx index. Instead of 100,000 individual files in the fan-out directory structure, Git compresses them into one (or a few) pack files. The index makes lookup by hash fast: a 256-entry fan-out table keyed on the first byte narrows the range, and a binary search over the sorted hashes then finds the entry in O(log n) time. git gc triggers packing, and git repack can be run manually. Pack files also use delta compression: similar objects are stored as a base plus a binary diff, which is why pack files are dramatically smaller than the sum of loose objects. The fan-out directories are still used for newly created objects (loose objects); packing happens periodically as maintenance.
What is a symbolic reference in Git, and why does HEAD use this indirection instead of directly storing a commit hash?
- A symbolic reference (symref) is a reference that points to another reference rather than directly to an object. HEAD normally contains ref: refs/heads/main, which means “follow refs/heads/main to find my value.” This level of indirection is what makes branch advancement automatic.
- When you commit, Git reads HEAD, sees it is a symref pointing to refs/heads/main, reads the current commit hash from that file, creates a new commit with that hash as the parent, then writes the new commit’s hash back to refs/heads/main. HEAD does not change; only the branch file changes. This is how branches “grow”: the tip pointer moves forward one commit at a time.
- Without this indirection, Git would have to update HEAD on every commit, and there would be no concept of “being on a branch.” Every state would be a detached HEAD. The symref is what connects the concept of “current branch” to the commit graph.
- The practical consequence is that commands like git status can resolve HEAD to a branch name and display “On branch main” rather than just a raw hash. It also enables reflogs to track branch history separately from HEAD history, which is essential for recovery operations like git reflog after an accidental git reset --hard.
During a rebase, Git records the branch being rebased in .git/rebase-merge/ or .git/rebase-apply/ and works on a detached HEAD, updating the branch only when the rebase finishes. During a merge, HEAD stays a symref: the merge commit is created with two (or more) parents, and the branch pointer advances to the merge commit. HEAD becomes a raw hash (detached) when you check out something that is not a branch, such as git checkout <commit>, and during the intermediate steps of a rebase, where Git replays commits one at a time.
If you initialize a Git repository and immediately check .git/refs/heads/, it is empty. Where is the 'master' branch?
- It does not exist yet. HEAD contains ref: refs/heads/master, but the file .git/refs/heads/master is not created until the first commit. This is an intentional design: a branch file only exists when it has a commit to point to. Before the first commit, you are on an “unborn branch.”
- This is why git status on a fresh repository says “No commits yet” and why git log fails with “does not have any commits yet.” The branch is conceptually created by HEAD’s symref, but it is not materialized until there is a commit hash to write into the branch file.
- This design avoids a special “null commit” sentinel value. Instead of storing a special value meaning “no commits,” Git simply does not create the file. Code that reads branch refs handles the “file not found” case as “unborn branch,” which is cleaner than checking for a magic value.
- A practical implication: if you try to create a branch with git branch feature before the first commit, Git fails because it cannot resolve HEAD to a commit hash. You must make at least one commit before git branch can create a new branch. This trips up new Git users who try to create branches immediately after git init.
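Below is a sketch of how code that reads branch refs might treat a missing file as an unborn branch rather than an error. It uses two hypothetical helpers, readBranchTip and createBranch. It only looks at loose refs under refs/heads/ (real Git also consults packed-refs), and the error message is illustrative rather than Git's exact wording.

```javascript
'use strict';
const fs = require('fs');
const path = require('path');

// Read a branch tip; a missing ref file means "unborn branch" (null).
function readBranchTip(gitDir, branch) {
  try {
    return fs.readFileSync(path.join(gitDir, 'refs', 'heads', ...branch.split('/')), 'utf8').trim();
  } catch (err) {
    if (err.code === 'ENOENT') return null; // unborn: no commit yet
    throw err;
  }
}

// Creating a branch needs a commit to point at, so it fails on an unborn branch.
function createBranch(gitDir, currentBranch, newBranch) {
  const tip = readBranchTip(gitDir, currentBranch);
  if (tip === null) {
    throw new Error(`cannot create '${newBranch}': '${currentBranch}' has no commits yet`);
  }
  const file = path.join(gitDir, 'refs', 'heads', ...newBranch.split('/'));
  fs.mkdirSync(path.dirname(file), { recursive: true }); // allows names like feature/x
  fs.writeFileSync(file, tip + '\n');
}

module.exports = { readBranchTip, createBranch };
```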
How does git init --initial-branch=main differ from git init followed by git branch -m main?
git init --initial-branch=main writes ref: refs/heads/main\n to HEAD instead of the default ref: refs/heads/master\n. No branch file is created in either case; the flag only changes the symref target in HEAD. In modern Git, running git branch -m main right after git init reaches the same state. Because the branch is unborn, there is no branch file to rename, so Git simply rewrites HEAD; Git's own hint after init suggests git branch -m <name> for exactly this case. Older Git versions refused to rename an unborn branch, which made --initial-branch the only reliable way to choose the name before the first commit. The init.defaultBranch configuration option applies the same choice to every new repository without passing a flag.