# Chapter 1: Project Setup & Init

> Set up your Git clone project and implement the init command to understand repository structure

In this chapter, we'll set up our project and implement the first Git command: `init`. By the end, you'll understand exactly what happens when you run `git init`.

Why start here? Because `git init` is deceptively simple -- it looks like it just creates a folder. But that folder is the skeleton of a content-addressable filesystem, the same idea that systems such as IPFS use to organize data by content hash. By building `init` yourself, you lay the foundation that every subsequent command depends on, just like pouring the foundation of a building before raising walls.

<Info>
  **Prerequisites**: Basic JavaScript knowledge, Node.js installed\
  **Time**: 1-2 hours\
  **Outcome**: A working `mygit init` command
</Info>

***

## What You'll Learn

* How Git's `.git` directory is structured
* What each file and folder in `.git` means
* How to build a CLI tool in Node.js
* Content-addressable storage concepts

***

## Understanding Git's Directory Structure

When you run `git init`, Git creates a `.git` directory with this structure:

```
.git/
├── HEAD              # Points to current branch (ref: refs/heads/master)
├── config            # Repository-specific configuration
├── description       # Used by GitWeb (we'll skip this)
├── objects/          # Object database (blobs, trees, commits)
│   ├── info/
│   └── pack/
├── refs/             # Branch and tag pointers
│   ├── heads/        # Branch refs (e.g., refs/heads/master)
│   └── tags/         # Tag refs
├── hooks/            # Scripts triggered by Git events
└── info/
    └── exclude       # Local gitignore (not committed)
```

<Note>
  The most important parts are `HEAD`, `objects/`, and `refs/`. Everything else is optional for a minimal implementation. Think of `HEAD` as "you are here" on a map, `objects/` as the warehouse that stores every version of every file, and `refs/` as the labelled bookmarks pointing to specific moments in history.
</Note>

***

## Project Setup

### 1. Initialize Your Project

```bash theme={null}
mkdir mygit
cd mygit
npm init -y
```

### 2. Create the Project Structure

```
mygit/
├── src/
│   ├── commands/         # Individual command implementations
│   │   └── init.js
│   ├── utils/            # Shared utilities
│   │   └── paths.js
│   └── mygit.js          # Main CLI entry point
├── package.json
└── README.md
```

### 3. Set Up package.json

```json package.json theme={null}
{
  "name": "mygit",
  "version": "1.0.0",
  "description": "A Git implementation for learning",
  "main": "src/mygit.js",
  "bin": {
    "mygit": "./src/mygit.js"
  },
  "scripts": {
    "test": "node test/test.js"
  },
  "keywords": ["git", "vcs", "learning"],
  "license": "MIT"
}
```

***

## Implementation

### Step 1: Create the CLI Entry Point

```javascript src/mygit.js theme={null}
#!/usr/bin/env node

/**
 * mygit - A Git implementation for learning
 * 
 * Usage: mygit <command> [options]
 * 
 * This file is the "front door" of our CLI tool. Its only job is to
 * figure out which command the user wants and hand off to the right module.
 * Keeping this file thin means each command can evolve independently --
 * the same separation-of-concerns principle behind real Git's source layout.
 */

const commands = {
    init: require('./commands/init'),
    // We'll add more commands later
};

function main() {
    // process.argv: [node, script, ...userArgs]
    // We slice off the first two to get the user-supplied arguments.
    const args = process.argv.slice(2);
    
    if (args.length === 0) {
        console.log('usage: mygit <command> [<args>]');
        console.log('\nAvailable commands:');
        console.log('   init       Initialize a new repository');
        process.exit(1);
    }
    
    const command = args[0];
    const commandArgs = args.slice(1);
    
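    // Own-property check so names like "constructor" are not treated as commands.
    if (!Object.prototype.hasOwnProperty.call(commands, command)) {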
        console.error(`mygit: '${command}' is not a mygit command.`);
        process.exit(1);
    }
    
    try {
        commands[command].execute(commandArgs);
    } catch (error) {
        console.error(`error: ${error.message}`);
        process.exit(1);
    }
}

main();
```

### Step 2: Create Path Utilities

```javascript src/utils/paths.js theme={null}
const path = require('path');
const fs = require('fs');

/**
 * Find the .git directory by walking up from the current directory.
 *
 * Why walk upward? Because Git commands work from any subdirectory
 * inside a repository. When you run `git status` from `src/utils/`,
 * Git doesn't give up -- it climbs the directory tree until it finds
 * the `.git` folder. This is the same behavior we're implementing here.
 *
 * Debugging tip: if this function returns null when you expect a repo,
 * check that you're calling it from a directory that actually lives
 * inside an initialized repository. A common mistake is running
 * `mygit init` in one terminal directory and testing in another.
 */
function findGitDir(startDir = process.cwd()) {
    // Resolve to an absolute path so a relative startDir cannot loop forever.
    let currentDir = path.resolve(startDir);
    
    // Walk up, checking every level including the filesystem root
    // (e.g., "/" or "C:\").
    while (true) {
        const gitDir = path.join(currentDir, '.git');
        if (fs.existsSync(gitDir) && fs.statSync(gitDir).isDirectory()) {
            return gitDir;
        }
        const parentDir = path.dirname(currentDir);
        if (parentDir === currentDir) {
            return null; // reached the root without finding .git
        }
        currentDir = parentDir;
    }
}

/**
 * Get the repository root (parent of .git)
 */
function getRepoRoot(gitDir) {
    return path.dirname(gitDir);
}

/**
 * Ensure we're in a Git repository
 */
function requireGitDir() {
    const gitDir = findGitDir();
    if (!gitDir) {
        throw new Error('not a git repository (or any of the parent directories): .git');
    }
    return gitDir;
}

module.exports = {
    findGitDir,
    getRepoRoot,
    requireGitDir
};
```

### Step 3: Implement the Init Command

```javascript src/commands/init.js theme={null}
const fs = require('fs');
const path = require('path');

/**
 * Initialize a new Git repository
 * 
 * Creates the .git directory structure:
 * - HEAD: Points to refs/heads/master
 * - config: Repository configuration
 * - objects/: Object database
 * - refs/heads/: Branch refs
 * - refs/tags/: Tag refs
 */
function execute(args) {
    // Parse arguments
    const directory = args[0] || '.';
    const repoPath = path.resolve(directory);
    const gitDir = path.join(repoPath, '.git');
    
    // Check if already a repository
    if (fs.existsSync(gitDir)) {
        console.log(`Reinitialized existing Git repository in ${gitDir}`);
        return;
    }
    
    // Create the directory structure.
    // Each of these directories serves a specific purpose:
    //   objects/      -> the database of all file contents, trees, and commits
    //   objects/info/ -> auxiliary info for dumb-protocol transfers (rarely used)
    //   objects/pack/ -> packed objects for storage efficiency (we'll skip packing)
    //   refs/heads/   -> one file per branch, each containing a commit hash
    //   refs/tags/    -> one file per tag, pointing to a tagged object
    //   info/         -> repo-level metadata like local exclude patterns
    const directories = [
        gitDir,
        path.join(gitDir, 'objects'),
        path.join(gitDir, 'objects', 'info'),
        path.join(gitDir, 'objects', 'pack'),
        path.join(gitDir, 'refs'),
        path.join(gitDir, 'refs', 'heads'),
        path.join(gitDir, 'refs', 'tags'),
        path.join(gitDir, 'info'),
    ];
    
    directories.forEach(dir => {
        fs.mkdirSync(dir, { recursive: true });
    });
    
    // Create HEAD file - points to the master branch.
    // This is a "symbolic reference" (symref). HEAD is the answer to
    // "where am I right now?" -- it almost always points to a branch name,
    // which in turn points to a commit hash. This level of indirection is
    // what lets commits advance a branch automatically.
    const headContent = 'ref: refs/heads/master\n';
    fs.writeFileSync(path.join(gitDir, 'HEAD'), headContent);
    
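    // Create config file with minimal settings.
    // Note: filemode = false tells Git to ignore executable-bit changes;
    // real Git probes the filesystem and usually writes true on Unix.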
    const configContent = `[core]
\trepositoryformatversion = 0
\tfilemode = false
\tbare = false
`;
    fs.writeFileSync(path.join(gitDir, 'config'), configContent);
    
    // Create description file (used by GitWeb)
    const descContent = 'Unnamed repository; edit this file to name the repository.\n';
    fs.writeFileSync(path.join(gitDir, 'description'), descContent);
    
    // Create info/exclude (local gitignore)
    const excludeContent = '# git ls-files --others --exclude-from=.git/info/exclude\n';
    fs.writeFileSync(path.join(gitDir, 'info', 'exclude'), excludeContent);
    
    console.log(`Initialized empty Git repository in ${gitDir}`);
}

module.exports = { execute };
```

***

## Testing Your Implementation

### Make It Executable

```bash theme={null}
# Make the script executable (on Unix)
chmod +x src/mygit.js

# Link it globally for testing
npm link
```

### Test It!

```bash theme={null}
# Create a test directory
mkdir test-repo
cd test-repo

# Initialize with your implementation
mygit init

# Verify the structure
ls -la .git/
# Should show: HEAD, config, description, info/, objects/, refs/
```
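
The manual `ls` check can also be scripted. Below is a minimal sketch, assuming a hypothetical helper named `findMissingEntries` (it is not part of the chapter's code) that reports which of the entries created by our `init` are absent from a given `.git` directory.

```javascript
const fs = require('fs');
const path = require('path');

// Entries our init command is expected to create, relative to .git/.
// A trailing slash marks a directory; everything else is a file.
const EXPECTED_ENTRIES = [
    'HEAD',
    'config',
    'description',
    'objects/info/',
    'objects/pack/',
    'refs/heads/',
    'refs/tags/',
    'info/exclude',
];

// Hypothetical helper: returns the expected entries that are missing
// (or have the wrong type) under the given .git directory.
function findMissingEntries(gitDir) {
    return EXPECTED_ENTRIES.filter(entry => {
        const wantsDir = entry.endsWith('/');
        const parts = entry.split('/').filter(Boolean);
        const fullPath = path.join(gitDir, ...parts);
        if (!fs.existsSync(fullPath)) return true;
        return fs.statSync(fullPath).isDirectory() !== wantsDir;
    });
}
```

Calling `findMissingEntries('.git')` from inside `test-repo` should return an empty array when `mygit init` created everything listed above. A check like this could later grow into the `test/test.js` script that `package.json` refers to.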

### Compare with Real Git

```bash theme={null}
# Go back up and create a sibling directory next to test-repo
cd ..
mkdir git-repo
cd git-repo
git init

# Compare the structures (expect differences such as hooks/ and extra config keys)
diff -r ../test-repo/.git .git
```

***

## Deep Dive: Understanding HEAD

The `HEAD` file is crucial to Git. Let's understand it:

```
ref: refs/heads/master
```

This is a **symbolic reference** (symref). It says "I'm pointing to whatever `refs/heads/master` contains."

<AccordionGroup>
  <Accordion title="What happens when you commit?" icon="code-commit">
    1. Git creates a new commit object
    2. Reads HEAD to find current branch: `refs/heads/master`
    3. Updates `refs/heads/master` to point to new commit
    4. HEAD still points to `refs/heads/master` (unchanged)
  </Accordion>

  <Accordion title="What is 'detached HEAD'?" icon="link-slash">
    When HEAD contains a commit SHA instead of a ref:

    ```
    abc123def456...  (not "ref: refs/heads/...")
    ```

    This means you're not on any branch!
  </Accordion>

  <Accordion title="How do branches work?" icon="code-branch">
    A branch is just a file in `refs/heads/` containing a commit SHA:

    ```
    $ cat .git/refs/heads/master
    abc123def456789...
    ```

    That's it! Branches are just pointers to commits.
  </Accordion>
</AccordionGroup>
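
The three cases above (symbolic ref, detached HEAD, and a branch file holding a commit SHA) can be sketched as a single lookup. This is a minimal sketch, assuming a hypothetical helper named `resolveHead`; it only reads loose ref files and ignores packed refs.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: describes what HEAD currently points to.
// Only loose ref files are handled; packed refs are ignored here.
function resolveHead(gitDir) {
    const head = fs.readFileSync(path.join(gitDir, 'HEAD'), 'utf8').trim();

    if (!head.startsWith('ref: ')) {
        // Detached HEAD: the file holds a commit hash directly.
        return { detached: true, ref: null, commit: head };
    }

    const ref = head.slice('ref: '.length); // e.g. refs/heads/master
    const refFile = path.join(gitDir, ...ref.split('/'));

    // Before the first commit the branch file does not exist yet.
    const commit = fs.existsSync(refFile)
        ? fs.readFileSync(refFile, 'utf8').trim()
        : null;

    return { detached: false, ref, commit };
}
```

On a repository created by our `init`, this returns `{ detached: false, ref: 'refs/heads/master', commit: null }`, because no branch file exists until the first commit.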

***

## Deep Dive: The Objects Directory

The `objects/` directory is Git's content-addressable storage:

```
objects/
├── ab/                    # First 2 characters of SHA
│   └── cdef123456...     # Remaining 38 characters
├── pack/                  # Packed objects (for efficiency)
└── info/                  # Additional info
```

<Note>
  **Why split the hash?**
  A directory with millions of files is slow on most filesystems because lookups degrade as directory entries grow. By using the first 2 hex characters as subdirectory names, Git creates up to 256 "buckets" (00-ff), each holding far fewer files. This is essentially the same trick a hash table uses -- distribute entries across buckets to keep each bucket small. Real-world repos with hundreds of thousands of objects stay fast thanks to this simple fan-out.
</Note>
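
To make the split concrete, here is a minimal sketch of the path calculation, assuming a hypothetical helper named `looseObjectPath` and 40-character SHA-1 hashes as described above.

```javascript
const path = require('path');

// Hypothetical helper: maps a 40-character SHA-1 hex string to the
// location of its loose object file under .git/objects/.
function looseObjectPath(gitDir, sha) {
    if (!/^[0-9a-f]{40}$/.test(sha)) {
        throw new Error(`not a 40-character hex hash: ${sha}`);
    }
    const bucket = sha.slice(0, 2); // one of 256 directories (00-ff)
    const rest = sha.slice(2);      // remaining 38 characters
    return path.join(gitDir, 'objects', bucket, rest);
}
```

A hash that starts with `ab` therefore always lands in `objects/ab/`, whatever the rest of the hash is.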

We'll implement the object storage in the next chapter!

***

## Common Pitfalls

* **Forgetting the trailing newline in HEAD.** The `HEAD` file must end with `\n`. Without it, some Git tools (and your own future commands) may misparse the ref. Always write `'ref: refs/heads/master\n'`, not `'ref: refs/heads/master'`.
* **Path separator issues on Windows.** If you're developing on Windows, `path.join` produces backslashes. Git expects forward slashes inside its own metadata. For internal Git paths (like ref names), normalize with `.split(path.sep).join('/')`.
* **Re-initializing an existing repo by accident.** Real Git gracefully handles `git init` in an existing repo (it prints "Reinitialized..."). Make sure your implementation checks for an existing `.git` directory before overwriting it.

***

## Exercises

<Accordion title="Exercise 1: Add --bare flag" icon="terminal">
  Implement `mygit init --bare` which creates a bare repository (no working directory):

  ```javascript theme={null}
  // Bare repos differ from normal repos in that:
  // - There is no working directory
  // - The repository directory itself holds HEAD, objects/, refs/ (no nested .git)
  // - HEAD is still a symbolic ref to a branch, just like in a normal repo

  // Hint: Check for --bare in args, then:
  // 1. Don't create .git subdirectory, use current directory
  // 2. Set bare = true in config
  ```

  **Solution outline:**

  ```javascript theme={null}
  const isBare = args.includes('--bare');
  const repoDir = isBare ? repoPath : path.join(repoPath, '.git');
  // ... rest of implementation
  ```
</Accordion>

<Accordion title="Exercise 2: Add --initial-branch flag" icon="code-branch">
  Implement `mygit init --initial-branch=main` to set a custom default branch:

  ```javascript theme={null}
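  // Many hosting platforms now default to 'main'; Git itself has supported
  // --initial-branch (and the init.defaultBranch setting) since version 2.28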
  // Parse the --initial-branch=NAME argument

  // Hint: Update HEAD content:
  // ref: refs/heads/main
  ```
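
  **Solution outline:** one possible way to parse the flag, as a sketch only. `parseInitialBranch` is a hypothetical helper name, and the validation is deliberately looser than Git's real ref-name rules.

  ```javascript
  // Hypothetical helper: extracts NAME from --initial-branch=NAME,
  // falling back to 'master' when the flag is absent.
  function parseInitialBranch(args, fallback = 'master') {
      const prefix = '--initial-branch=';
      const flag = args.find(arg => arg.startsWith(prefix));
      if (!flag) return fallback;

      const name = flag.slice(prefix.length);
      // Loose sanity check only; real Git applies stricter ref-name rules.
      if (name === '' || /\s/.test(name) || name.includes('..')) {
          throw new Error(`invalid branch name: '${name}'`);
      }
      return name;
  }

  const branch = parseInitialBranch(['--initial-branch=main']);
  const headContent = `ref: refs/heads/${branch}\n`;
  ```

  Note that `execute` currently treats `args[0]` as the target directory, so flags need to be filtered out before that lookup.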
</Accordion>

<Accordion title="Exercise 3: Validate directory" icon="folder">
  Add validation to check if the target directory is writable:

  ```javascript theme={null}
  // Before creating .git, check:
  // 1. Parent directory exists (or can be created)
  // 2. We have write permissions
  // 3. Not trying to init inside another .git

  // Use: fs.accessSync(dir, fs.constants.W_OK)
  ```
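
  **Solution outline:** a sketch of the three checks, assuming a hypothetical helper named `validateTarget`. The `.git` check here is a simple path-segment test and may be stricter than what real Git does.

  ```javascript
  const fs = require('fs');
  const path = require('path');

  // Hypothetical helper: throws if a repository should not be created
  // at targetDir; otherwise returns the resolved absolute path.
  function validateTarget(targetDir) {
      const resolved = path.resolve(targetDir);

      // Check 3: refuse to init inside an existing .git directory.
      if (resolved.split(path.sep).includes('.git')) {
          throw new Error(`refusing to init inside a .git directory: ${resolved}`);
      }

      // Check 1: find the nearest ancestor that already exists.
      let existing = resolved;
      while (!fs.existsSync(existing)) {
          const parent = path.dirname(existing);
          if (parent === existing) {
              throw new Error(`no existing parent directory for: ${resolved}`);
          }
          existing = parent;
      }
      if (!fs.statSync(existing).isDirectory()) {
          throw new Error(`not a directory: ${existing}`);
      }

      // Check 2: that ancestor must be writable so .git can be created.
      try {
          fs.accessSync(existing, fs.constants.W_OK);
      } catch {
          throw new Error(`no write permission: ${existing}`);
      }
      return resolved;
  }
  ```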
</Accordion>

***

## Key Takeaways

<CardGroup cols={2}>
  <Card title="Simple Structure" icon="folder-tree">
    Git's `.git` directory is surprisingly simple: just files and folders
  </Card>

  <Card title="HEAD is King" icon="crown">
    HEAD always tells you where you are: which branch or which commit
  </Card>

  <Card title="Branches are Files" icon="file">
    A branch is just a file containing 40 hex characters (a SHA-1 hash)
  </Card>

  <Card title="Content-Addressable" icon="fingerprint">
    The `objects/` directory stores everything by its content hash
  </Card>
</CardGroup>

***

## What's Next?

In [Chapter 2: Object Model](/courses/build-your-own-x/git-2-objects), we'll implement Git's object storage:

* Create and store blob objects (file content)
* Implement `hash-object` and `cat-file` commands
* Understand SHA-1 hashing and zlib compression

<Card title="Next: Object Model" icon="arrow-right" href="/courses/build-your-own-x/git-2-objects">
  Learn how Git stores files as content-addressed blobs
</Card>

***

## Interview Deep-Dive

<AccordionGroup>
  <Accordion title="Why does Git split object hashes into a 2-character directory prefix and a 38-character filename? What problem does this solve?">
    **Strong Answer:**

    * This is a fan-out strategy borrowed from hash table design. Most filesystems degrade when a single directory contains hundreds of thousands of entries because directory lookups become linear scans (or at best, B-tree lookups that grow with entry count). By using the first two hex characters as a subdirectory, Git creates up to 256 buckets, keeping each directory small.
    * A repository with 100,000 objects has \~390 files per subdirectory on average (100,000 / 256). Without fan-out, that is 100,000 entries in a single `objects/` directory, which makes directory listings and per-file lookups noticeably slower on many filesystems, particularly those without indexed directories.
    * The choice of 2 characters (256 buckets) is a pragmatic balance. One character (16 buckets) would still put thousands of files per directory. Three characters (4096 buckets) would waste directory entries on small repos. Two characters work well for repositories up to millions of objects.
    * This same pattern appears everywhere in systems design: Redis uses hash slots for cluster sharding, Cassandra uses consistent hashing for partition distribution, and CDNs use URL-based hashing for cache distribution. The underlying principle is always the same: distribute entries across buckets to avoid hotspots.

    **Follow-up: What are pack files, and how do they change this storage model?**

    Pack files consolidate many loose objects into a single `.pack` file with an accompanying `.idx` index. Instead of 100,000 individual files in the fan-out directory structure, Git stores them in one (or a few) pack files. The index makes lookup by hash fast: a 256-entry fan-out table keyed on the first byte of the hash narrows the search range, and a binary search over the sorted entries in that range finds the object in O(log n) time. `git gc` triggers packing, and `git repack` can be run manually. Pack files also use delta compression: similar objects are stored as a base plus a binary diff, which is why pack files are dramatically smaller than the sum of loose objects. The fan-out directories are still used for newly created objects (loose objects); packing happens periodically as maintenance.
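
    A pack index keeps its object hashes sorted and adds a 256-entry fan-out table keyed on the first byte. The sketch below models that lookup in memory. It is a simplified illustration with hypothetical function names, not the real `.idx` byte layout.

    ```javascript
    // Simplified in-memory model of a pack index lookup (NOT the real
    // .idx format). `sortedHashes` is an ascending array of hex hashes.
    function buildFanout(sortedHashes) {
        // After the second loop, fanout[b] is the number of hashes
        // whose first byte is <= b.
        const fanout = new Array(256).fill(0);
        for (const hash of sortedHashes) {
            fanout[parseInt(hash.slice(0, 2), 16)] += 1;
        }
        for (let b = 1; b < 256; b++) {
            fanout[b] += fanout[b - 1];
        }
        return fanout;
    }

    // Returns the position of `target` in sortedHashes, or -1 if absent.
    function findInIndex(sortedHashes, fanout, target) {
        const firstByte = parseInt(target.slice(0, 2), 16);
        let lo = firstByte === 0 ? 0 : fanout[firstByte - 1];
        let hi = fanout[firstByte]; // exclusive upper bound
        while (lo < hi) {
            const mid = (lo + hi) >>> 1;
            if (sortedHashes[mid] === target) return mid;
            if (sortedHashes[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        return -1;
    }
    ```

    The fan-out table plays the same role as the two-character directories for loose objects: it cuts the search space by the first byte before any comparison of full hashes happens.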
  </Accordion>

  <Accordion title="What is a symbolic reference in Git, and why does HEAD use this indirection instead of directly storing a commit hash?">
    **Strong Answer:**

    * A symbolic reference (symref) is a reference that points to another reference rather than directly to an object. HEAD normally contains `ref: refs/heads/main`, which means "follow refs/heads/main to find my value." This level of indirection is what makes branch advancement automatic.
    * When you commit, Git reads HEAD, sees it is a symref pointing to `refs/heads/main`, reads the current commit hash from that file, creates a new commit with that hash as the parent, then writes the new commit's hash back to `refs/heads/main`. HEAD does not change -- only the branch file changes. This is how branches "grow": the tip pointer moves forward one commit at a time.
    * Without this indirection, Git would have to update HEAD on every commit, and there would be no concept of "being on a branch." Every state would be a detached HEAD. The symref is what connects the concept of "current branch" to the commit graph.
    * The practical consequence is that `git status` can resolve HEAD to a branch name and display "On branch main" rather than just a raw hash. It also enables reflogs to track branch history separately from HEAD history, which is essential for recovery operations like `git reflog` after an accidental `git reset --hard`.

    **Follow-up: What happens to HEAD during a rebase or merge? Does it stay as a symref?**

    During a rebase, HEAD does not stay a symref. Git detaches HEAD onto the new base, replays each commit on the detached HEAD, and records the original branch name in `.git/rebase-merge/` or `.git/rebase-apply/`. Only when the rebase finishes does Git update the branch ref to the final commit and point HEAD back at the branch; if a conflict pauses the rebase, HEAD stays detached until you continue or abort. During a merge, HEAD stays a symref: the merge commit is created with two (or more) parents, and the branch pointer advances to the merge commit. Other operations that detach HEAD include `git checkout <commit>` and `git bisect`.
  </Accordion>

  <Accordion title="If you initialize a Git repository and immediately check .git/refs/heads/, it is empty. Where is the 'master' branch?">
    **Strong Answer:**

    * It does not exist yet. HEAD contains `ref: refs/heads/master`, but the file `.git/refs/heads/master` is not created until the first commit. This is an intentional design: a branch file only exists when it has a commit to point to. Before the first commit, you are on an "unborn branch."
    * This is why `git status` on a fresh repository says "No commits yet" and why `git log` fails with "does not have any commits yet." The branch is conceptually created by HEAD's symref, but it is not materialized until there is a commit hash to write into the branch file.
    * This design avoids a special "null commit" sentinel value. Instead of storing a special value meaning "no commits," Git simply does not create the file. Code that reads branch refs handles the "file not found" case as "unborn branch," which is cleaner than checking for a magic value.
    * A practical implication: if you try to create a branch with `git branch feature` before the first commit, Git fails because it cannot resolve HEAD to a commit hash. You must make at least one commit before branching. This trips up new Git users who try to create branches immediately after `git init`.

    **Follow-up: How does `git init --initial-branch=main` differ from `git init` followed by `git branch -m main`?**

    `git init --initial-branch=main` writes `ref: refs/heads/main\n` to HEAD instead of the default `ref: refs/heads/master\n`. No branch file is created in either case; only the symref target in HEAD differs. In current Git versions, `git branch -m main` right after `git init` reaches the same state: because the current branch is unborn, there is no branch file to rename, so Git simply rewrites the symref in HEAD (the hint printed by `git init` suggests exactly this command). The difference is one of convenience: `--initial-branch` sets the name in a single step, and the `init.defaultBranch` configuration option applies it to every new repository so you do not have to pass the flag each time.
  </Accordion>
</AccordionGroup>