Neo4j Architecture & Native Graph Storage

Module Duration: 6-8 hours
Learning Style: Deep Technical + Implementation Details + Performance Analysis
Outcome: Understand how Neo4j achieves O(1) traversals and ACID guarantees at the storage layer

Introduction: What Makes Neo4j “Native”?

Most databases were built for tables, then retrofitted for graphs. Neo4j was designed from day one for graphs.
Key Difference: Index-free adjacency
Traditional (PostgreSQL with graph extension):
MATCH (a:Person)-[:KNOWS]->(b)
→ Index scan for Person nodes
→ Index scan for KNOWS relationships
→ JOIN on IDs
→ O(log N) per hop

Neo4j (Native Graph):
MATCH (a:Person)-[:KNOWS]->(b)
→ Follow pointer from node to relationship to node
→ O(1) per hop (constant time, regardless of graph size!)
Performance: often 10-1000x faster for multi-hop traversals. This module reveals how Neo4j achieves this at the storage level.
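The gap can be sanity-checked with a back-of-the-envelope cost model. The sketch below is illustrative only — the functions and constants are assumptions, not Neo4j internals: it counts B-tree index lookups for a join-based store versus pointer hops for index-free adjacency.

```python
import math

def relational_hops(n_nodes: int, degree: int, hops: int) -> int:
    """Estimated index operations: each hop pays an O(log N) B-tree
    lookup per frontier edge (simplified cost model)."""
    return sum(degree ** h * math.ceil(math.log2(n_nodes))
               for h in range(1, hops + 1))

def neo4j_hops(degree: int, hops: int) -> int:
    """Estimated pointer dereferences: O(1) per edge followed."""
    return sum(degree ** h for h in range(1, hops + 1))

# 1M nodes, average degree 50, 3-hop traversal
print(relational_hops(10**6, 50, 3))  # 2551000 index operations
print(neo4j_hops(50, 3))              # 127550 pointer hops
```

Even this crude model shows a ~20x gap at 3 hops, and the relational cost keeps a log N factor that grows with the graph.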

Part 1: High-Level Architecture

System Layering

┌─────────────────────────────────────────────────────┐
│            Client Applications                       │
│  (Browser, Drivers: Java, Python, Go, JS)          │
└────────────────────┬────────────────────────────────┘
                     │ Bolt Protocol (Binary)
┌────────────────────▼────────────────────────────────┐
│               Query Engine                           │
│  ┌─────────────────────────────────────────┐       │
│  │  Cypher Parser                           │       │
│  │  (Convert Cypher → AST)                  │       │
│  └─────────────────┬───────────────────────┘       │
│                    │                                 │
│  ┌─────────────────▼───────────────────────┐       │
│  │  Query Planner                           │       │
│  │  (Optimize: cost-based, rule-based)      │       │
│  └─────────────────┬───────────────────────┘       │
│                    │                                 │
│  ┌─────────────────▼───────────────────────┐       │
│  │  Execution Engine                        │       │
│  │  (Execute plan: operators, pipelining)   │       │
│  └─────────────────┬───────────────────────┘       │
└────────────────────┼────────────────────────────────┘
                     │ Kernel API
┌────────────────────▼────────────────────────────────┐
│              Transaction Manager                     │
│  • ACID guarantees                                   │
│  • Write-Ahead Logging (WAL)                         │
│  • Locking (pessimistic, read/write locks)          │
└────────────────────┬────────────────────────────────┘

┌────────────────────▼────────────────────────────────┐
│               Storage Engine                         │
│  ┌──────────────────────────────────────────┐      │
│  │  Page Cache (in-memory buffer)           │      │
│  │  • LRU eviction                           │      │
│  │  • Memory-mapped files                    │      │
│  └──────────────────┬───────────────────────┘      │
│                     │                                │
│  ┌──────────────────▼───────────────────────┐      │
│  │  Store Files (disk)                      │      │
│  │  • neostore.nodestore.db                 │      │
│  │  • neostore.relationshipstore.db         │      │
│  │  • neostore.propertystore.db             │      │
│  │  • neostore.labeltokenstore.db           │      │
│  └──────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────┘

Core Components

1. Cypher Parser
  • Converts Cypher query string → Abstract Syntax Tree (AST)
  • Validates syntax
2. Query Planner
  • Generates execution plan (like SQL EXPLAIN)
  • Cost-based optimization (estimates row counts, cardinality)
  • Rule-based optimization (predicate push-down, etc.)
3. Execution Engine
  • Executes query plan using operators (Scan, Filter, Expand, etc.)
  • Pipelined execution (streaming results)
4. Transaction Manager
  • Ensures ACID properties
  • Write-Ahead Log (WAL) for durability
  • Locking for isolation
5. Storage Engine
  • Native graph storage (nodes, relationships, properties)
  • Page cache (in-memory buffer pool)
  • Indexes (for lookups)

Part 2: Storage Layout

Store Files

Neo4j stores graph data in multiple fixed-size record stores:
# In data/databases/neo4j/
neostore.nodestore.db              # Node records
neostore.relationshipstore.db      # Relationship records
neostore.propertystore.db          # Property records
neostore.propertystore.db.strings  # String values
neostore.propertystore.db.arrays   # Array values
neostore.relationshipgroupstore.db # Relationship type groups (dense nodes)
neostore.labeltokenstore.db        # Label names
neostore.relationshiptypestore.db  # Relationship type names
neostore.schemastore.db            # Indexes, constraints
Key Principle: Fixed-size records enable O(1) access via record ID.

Node Store

Node Record Format (15 bytes in Neo4j 4.x):
┌────────┬────────────┬────────────┬────────────┬────────┐
│ In Use │ Next Rel   │ Next Prop  │ Labels     │ Extra  │
│ 1 bit  │ 35 bits    │ 36 bits    │ 5 bytes    │ ...    │
└────────┴────────────┴────────────┴────────────┴────────┘
Fields:
  • In Use (1 bit): Is this record active? (0 = deleted, 1 = active)
  • Next Rel (35 bits): Pointer to first relationship (ID)
  • Next Prop (36 bits): Pointer to first property (ID)
  • Labels (5 bytes): Inline label storage or pointer to label array
Example: Node with ID = 42:
Record at offset: 42 × 15 bytes = 630 bytes into neostore.nodestore.db

┌─────┬─────────┬─────────┬──────────────┐
│  1  │  10050  │  20010  │  0x0001      │  (Person label)
└─────┴─────────┴─────────┴──────────────┘
Accessing Node 42:
// Pseudocode
long nodeId = 42;
long offset = nodeId * NODE_RECORD_SIZE;  // 42 * 15 = 630
byte[] record = readFromFile("neostore.nodestore.db", offset, 15);

boolean inUse = getBit(record, 0);
long firstRelId = getBits(record, 1, 35);
long firstPropId = getBits(record, 36, 36);
Time Complexity: O(1)—direct calculation, no scanning!
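To make the fixed-size-record idea concrete, here is a runnable toy decoder for the simplified 15-byte layout above. The field offsets follow this module's diagram; real Neo4j packs the high bits differently, so treat this as a sketch, not the actual format.

```python
NODE_RECORD_SIZE = 15  # bytes per node record (simplified layout)

def read_node_record(store: bytes, node_id: int) -> dict:
    """O(1) access: offset = node_id * record_size, then decode bits."""
    off = node_id * NODE_RECORD_SIZE
    rec = int.from_bytes(store[off:off + NODE_RECORD_SIZE], "big")
    total_bits = NODE_RECORD_SIZE * 8  # 120 bits

    def bits(start: int, width: int) -> int:
        # Extract `width` bits starting `start` bits from the MSB
        return (rec >> (total_bits - start - width)) & ((1 << width) - 1)

    return {
        "in_use": bits(0, 1),        # 1 bit
        "first_rel": bits(1, 35),    # 35 bits
        "first_prop": bits(36, 36),  # 36 bits
        "labels": bits(72, 40),      # 5 bytes
    }

# Fake store: node 42 in use, first rel 10050, first prop 20010, label 0x0001
rec = (1 << 119) | (10050 << 84) | (20010 << 48) | (0x0001 << 8)
store = b"\x00" * (42 * NODE_RECORD_SIZE) + rec.to_bytes(NODE_RECORD_SIZE, "big")
print(read_node_record(store, 42))
# {'in_use': 1, 'first_rel': 10050, 'first_prop': 20010, 'labels': 1}
```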

Relationship Store

Relationship Record Format (34 bytes in Neo4j 4.x):
┌────────┬────────┬────────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────┐
│ In Use │ First  │ Second │ Rel     │ First    │ First    │ Second   │ Second   │ Next   │
│        │ Node   │ Node   │ Type    │ Prev Rel │ Next Rel │ Prev Rel │ Next Rel │ Prop   │
│ 1 bit  │ 35 bits│ 35 bits│ 16 bits │ 35 bits  │ 35 bits  │ 35 bits  │ 35 bits  │ 36 bits│
└────────┴────────┴────────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────┘
Fields:
  • First Node: Source node ID
  • Second Node: Target node ID
  • Rel Type: Relationship type ID (index into relationship type store)
  • First Prev/Next Rel: Doubly-linked list for source node’s relationships
  • Second Prev/Next Rel: Doubly-linked list for target node’s relationships
  • Next Prop: Pointer to first property
Why Doubly-Linked Lists? Each node maintains a linked list of its relationships:
Node A (ID=10):
  ↓ Next Rel = 100
Relationship 100: (A)-[:KNOWS]->(B)
  ├─ First Node = 10 (A)
  ├─ Second Node = 20 (B)
  ├─ First Next Rel = 101  ← Next relationship for A
  └─ Second Next Rel = 200 ← Next relationship for B

Relationship 101: (A)-[:LIKES]->(C)
  ├─ First Node = 10 (A)
  ├─ Second Node = 30 (C)
  └─ First Next Rel = NULL  ← End of A's relationships
Traversal: To find all relationships for Node A:
  1. Read Node A’s record → get Next Rel = 100
  2. Read Relationship 100 → check if First Node == A or Second Node == A
  3. Follow appropriate Next Rel pointer (100 → 101 → NULL)
Time Complexity: O(D), where D = degree of node (number of relationships)
Critical Insight: Independent of total graph size!
  • In relational DB: O(log N) per relationship (index scan across all edges)
  • In Neo4j: O(D) (only scan node’s own relationships)
Example:
  • Graph with 1 billion relationships
  • Node A has 10 relationships
  • Relational: 10 relationships × O(log 10⁹) ≈ 10 × 30 = 300 operations
  • Neo4j: O(10) = 10 operations
Speed-up: ~30x here, and the gap widens as the graph grows!
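The chain walk above is easy to model. In this toy version, relationship records live in a Python dict keyed by record ID, with the fields sketched earlier (not Neo4j's exact layout):

```python
# Toy relationship records mimicking the doubly-linked chains above.
rels = {
    100: {"first_node": 10, "second_node": 20, "type": "KNOWS",
          "first_next": 101, "second_next": 200},  # node 20's chain continues at 200 (not modeled)
    101: {"first_node": 10, "second_node": 30, "type": "LIKES",
          "first_next": None, "second_next": None},
}

def relationships_of(node_id: int, first_rel_id):
    """Walk one node's relationship chain: O(degree), independent of
    how many relationships exist in the whole graph."""
    out, rel_id = [], first_rel_id
    while rel_id is not None:
        rel = rels[rel_id]
        out.append((rel_id, rel["type"]))
        # Follow the chain that belongs to *this* node
        if rel["first_node"] == node_id:
            rel_id = rel["first_next"]
        else:
            rel_id = rel["second_next"]
    return out

print(relationships_of(10, 100))  # [(100, 'KNOWS'), (101, 'LIKES')]
```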

Property Store

Property Record Format (simplified; actual Neo4j 4.x property records are 41 bytes holding up to four property blocks):
┌────────┬─────────┬─────────┬──────────┬───────────┐
│ In Use │ Type    │ Key ID  │ Value    │ Next Prop │
│ 1 bit  │ 4 bits  │ 24 bits │ 64 bits  │ 36 bits   │
└────────┴─────────┴─────────┴──────────┴───────────┘
Property Types:
| Type         | Value Storage                        |
|--------------|--------------------------------------|
| bool         | Inline (1 bit in value field)        |
| int          | Inline (up to 64 bits)               |
| float        | Inline (IEEE 754, 64 bits)           |
| short string | Inline (up to 12 characters encoded) |
| long string  | Pointer to string store              |
| array        | Pointer to array store               |
Property Chain: Properties form a singly-linked list:
Node A:
  Next Prop = 500

Property 500: {name: "Alice"}
  ├─ Key ID = 1 (points to "name" in property key store)
  ├─ Value = pointer to string store → "Alice"
  └─ Next Prop = 501

Property 501: {age: 30}
  ├─ Key ID = 2 ("age")
  ├─ Value = 30 (inline int)
  └─ Next Prop = NULL
Reading All Properties:
// Pseudocode
long propId = node.nextProp;  // e.g. 500
List<Property> props = new ArrayList<>();

while (propId != NO_PROPERTY) {  // NO_PROPERTY: sentinel for "end of chain"
    PropertyRecord prop = readProperty(propId);
    props.add(new Property(
        getKeyName(prop.keyId),             // resolve key ID → "name", "age"
        decodeValue(prop.type, prop.value)  // inline value, or pointer into string/array store
    ));
    propId = prop.nextProp;                 // follow the chain
}

String Store

Stores strings > 12 characters.
String Record (128 bytes: metadata + 120 chars of data):
┌────────┬──────────┬────────────────────────┬─────────────┐
│ In Use │ Length   │ Data (120 chars)       │ Next Record │
│ 1 byte │ 4 bytes  │ 120 bytes              │ 3 bytes     │
└────────┴──────────┴────────────────────────┴─────────────┘
Long Strings (> 120 chars): Stored across multiple records forming a linked list:
String: "This is a very long string..." (200 chars)

Record 1: chars 0-119   + Next Record = 5001
Record 2: chars 120-199 + Next Record = NULL
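A minimal sketch of that chunking, assuming a flat 120-character data field per record (the `next` field here is just a list index standing in for a record ID):

```python
CHUNK = 120  # data characters per string record, per the layout above

def to_records(s: str):
    """Split a long string into linked 120-char records."""
    chunks = [s[i:i + CHUNK] for i in range(0, len(s), CHUNK)] or [""]
    # Each record carries its data plus a pointer to the next record
    return [{"data": c, "next": i + 1 if i + 1 < len(chunks) else None}
            for i, c in enumerate(chunks)]

recs = to_records("x" * 200)  # a 200-char string → two linked records
print(len(recs), len(recs[0]["data"]), len(recs[1]["data"]))  # 2 120 80
```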

Dense Node Optimization

Problem: Nodes with many relationships (high degree) slow down traversals. Example: Celebrity node with 1M followers
Without optimization:
MATCH (celeb)-[:FOLLOWED_BY]->(follower)
→ Scan 1M relationship records to find FOLLOWED_BY relationships!
Solution: Relationship Group Store (introduced in Neo4j 2.1)
When a node exceeds a threshold (default: 50 relationships), Neo4j creates relationship type groups:
Node (Celebrity):
  Next Rel = NULL
  Next Rel Group = 7000  ← Points to group store

Relationship Group 7000:
  ├─ Type = FOLLOWED_BY
  ├─ First Outgoing = 8000
  ├─ First Incoming = 9000
  ├─ First Loop = NULL
  └─ Next Group = 7001

Relationship Group 7001:
  ├─ Type = FOLLOWS
  ├─ First Outgoing = 10000
  ├─ ...
Traversal:
MATCH (celeb)-[:FOLLOWED_BY]->(follower)
Process:
  1. Read celeb node → Get Next Rel Group = 7000
  2. Scan relationship groups for FOLLOWED_BY (typically < 10 types)
  3. Follow First Outgoing pointer
  4. Scan only FOLLOWED_BY relationships (not all 1M!)
Performance: From O(1M) to O(types) + O(followers of type) = much faster!
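The group scan can be sketched the same way: a short chain of per-type groups, each pointing at its own relationship chain (toy data, with the hypothetical IDs from the example above):

```python
# Toy relationship-group chain for a dense node: one group per type.
groups = {
    7000: {"type": "FOLLOWED_BY", "first_out": 8000, "first_in": 9000,
           "next_group": 7001},
    7001: {"type": "FOLLOWS", "first_out": 10000, "first_in": None,
           "next_group": None},
}

def first_rel_of_type(first_group_id, rel_type, direction="out"):
    """Scan the (short) group chain for the wanted type, then jump
    straight to that type's relationship chain — no 1M-record scan."""
    gid = first_group_id
    while gid is not None:
        g = groups[gid]
        if g["type"] == rel_type:
            return g["first_out"] if direction == "out" else g["first_in"]
        gid = g["next_group"]
    return None

print(first_rel_of_type(7000, "FOLLOWED_BY"))  # 8000
print(first_rel_of_type(7000, "FOLLOWS"))      # 10000
```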

Part 3: Indexes

Label Indexes (Default)

Neo4j automatically maintains a node label index (the label scan store), so filtering by label never requires a full node scan:
CREATE (p:Person {name: "Alice"})
Without Index:
MATCH (p:Person {name: "Alice"})
Scan ALL nodes (O(N))
With Label Index:
MATCH (p:Person {name: "Alice"})
Scan only Person nodes (O(P), where P = # of Person nodes)
Storage: Labels are stored in the Node Record (5 bytes for labels):
Node 42:
  Labels = 0x0001  → Label ID 1 = "Person"
Label IDs map to names in neostore.labeltokenstore.db.

Property Indexes

Create Index:
CREATE INDEX person_name FOR (p:Person) ON (p.name)
Index Structure (Neo4j 4.x): native index (a B+tree variant)
Index Files: schema/index/native-btree-1.0/<id>/
B+Tree Structure:
                Root
               /    \
          [Alice]  [Charlie]
           /   \      /    \
     [Alice] [Bob] [Charlie] [David]
       ↓      ↓       ↓        ↓
    Node 10 Node 20 Node 30  Node 40
Query:
MATCH (p:Person {name: "Bob"})
Process:
  1. Look up “Bob” in index → O(log P)
  2. Get node ID (20)
  3. Read node 20 → O(1)
Total: O(log P) << O(P) (full scan)
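The seek itself behaves like a binary search over sorted keys. A minimal stand-in using Python's `bisect` (not Neo4j's actual B+tree code; the entries are the toy tree above):

```python
import bisect

# Sorted (key, node_id) pairs standing in for B+tree leaf entries.
entries = [("Alice", 10), ("Bob", 20), ("Charlie", 30), ("David", 40)]
keys = [k for k, _ in entries]

def index_seek(name: str):
    """O(log P) lookup, like NodeIndexSeek over a sorted structure."""
    i = bisect.bisect_left(keys, name)
    if i < len(keys) and keys[i] == name:
        return entries[i][1]  # node ID; fetching the node itself is O(1)
    return None

print(index_seek("Bob"))  # 20
```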

Composite Indexes

Create:
CREATE INDEX person_name_age FOR (p:Person) ON (p.name, p.age)
Use Case:
MATCH (p:Person {name: "Alice", age: 30})
Index Key: Composite key (name, age) → ("Alice", 30)
Benefit: Single index lookup instead of two separate lookups + intersection.

Full-Text Indexes

Create:
CREATE FULLTEXT INDEX person_description FOR (p:Person) ON EACH [p.name, p.bio]
Query:
CALL db.index.fulltext.queryNodes("person_description", "software engineer")
YIELD node, score
RETURN node.name, score
Backend: Apache Lucene (inverted index)
Use Cases:
  • Fuzzy search
  • Tokenization (split “software engineer” → [“software”, “engineer”])
  • Relevance scoring (TF-IDF)

Vector Indexes (Neo4j 5.x)

Create:
CREATE VECTOR INDEX user_embedding FOR (u:User) ON (u.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 256,
  `vector.similarity_function`: 'cosine'
}}
Query (nearest-neighbor search):
CALL db.index.vector.queryNodes('user_embedding', 10, $query_vector)
YIELD node, score
RETURN node.name, score
Backend: HNSW (Hierarchical Navigable Small World) graph for approximate nearest neighbor (ANN) search
Use Cases:
  • Recommendation systems (user/item embeddings)
  • Semantic search (document embeddings)
  • Image similarity (CNN features)
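For intuition, here is the exact (brute-force) version of the query that HNSW approximates: score every vector by cosine similarity and keep the top k. Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query, vectors, k=2):
    """Exact nearest neighbours by cosine similarity. HNSW returns
    (approximately) the same answer without scoring every vector."""
    scored = sorted(vectors.items(),
                    key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

vectors = {"u1": (1.0, 0.0), "u2": (0.9, 0.1), "u3": (0.0, 1.0)}
print(top_k((1.0, 0.05), vectors))  # ['u1', 'u2']
```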

Part 4: Transaction Management

ACID Guarantees

Neo4j provides full ACID transactions:
  • Atomicity: All-or-nothing (commit or rollback)
  • Consistency: Constraints enforced (uniqueness, existence)
  • Isolation: Transactions don’t interfere (locking)
  • Durability: Committed transactions survive crashes (WAL)

Write-Ahead Log (WAL)

Purpose: Durability (survive crashes)
Location: neostore.transaction.db.0, neostore.transaction.db.1, …
Process:
1. Transaction begins

2. Modifications stored in memory

3. Transaction commits

4. Write transaction log entry to WAL (fsync to disk)

5. Respond "SUCCESS" to client

6. Apply changes to store files (asynchronous)
Log Entry Format:
┌──────────┬─────────┬──────────────────────────────┐
│ TX ID    │ Command │ Data                         │
│ 8 bytes  │ 1 byte  │ Variable                     │
└──────────┴─────────┴──────────────────────────────┘
Commands:
  • CREATE_NODE: Node ID, labels, properties
  • CREATE_RELATIONSHIP: Rel ID, type, start/end nodes, properties
  • SET_PROPERTY: Node/Rel ID, key, value
  • DELETE_NODE: Node ID
Crash Recovery:
On startup:
1. Read last committed transaction ID from store
2. Replay WAL entries from that point forward
3. Restore to consistent state
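A replay sketch using the command set listed above (the store is modeled as a plain dict; the command names and payload shapes are assumptions for illustration, not Neo4j's log format):

```python
def replay(store: dict, last_applied_tx: int, wal: list) -> dict:
    """Recovery sketch: re-apply every logged command whose TX ID is
    newer than the last one already reflected in the store files."""
    for tx_id, command, data in wal:
        if tx_id <= last_applied_tx:
            continue  # already in the store files
        if command == "CREATE_NODE":
            store[data["id"]] = {"labels": data["labels"], "props": {}}
        elif command == "SET_PROPERTY":
            store[data["id"]]["props"][data["key"]] = data["value"]
        elif command == "DELETE_NODE":
            del store[data["id"]]
    return store

wal = [
    (1001, "CREATE_NODE", {"id": 1, "labels": ["Person"]}),
    (1002, "SET_PROPERTY", {"id": 1, "key": "name", "value": "Alice"}),
]
# Store checkpointed at TX 1000, so both entries are replayed:
print(replay({}, 1000, wal))
```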
Checkpointing: Periodically, Neo4j flushes in-memory changes to store files and truncates WAL:
┌─────────────────────────────────────────────────────┐
│                  Timeline                            │
├─────────────────────────────────────────────────────┤
│ t=0:   Last checkpoint (TX ID = 1000)               │
│ t=10:  TX 1001, 1002, 1003 (in WAL)                 │
│ t=20:  TX 1004, 1005 (in WAL)                       │
│ t=30:  Checkpoint! (flush 1001-1005 to store)       │
│        → Truncate WAL (safe to discard 1001-1005)   │
│ t=40:  TX 1006, 1007 (new WAL)                      │
└─────────────────────────────────────────────────────┘
Configuration:
# neo4j.conf
dbms.checkpoint.interval.time=15m   # Checkpoint every 15 minutes (Neo4j 4.x name)
dbms.checkpoint.interval.tx=100000  # Or every 100K transactions

Locking

Neo4j uses pessimistic locking (lock before modify). Lock Types:
  1. Read Locks (Shared locks):
    • Multiple transactions can hold read locks simultaneously
    • Prevents writes while reading
  2. Write Locks (Exclusive locks):
    • Only one transaction can hold a write lock
    • Blocks reads and writes
Lock Granularity: Neo4j locks at the node/relationship level:
// Transaction 1:
MATCH (p:Person {name: "Alice"})
SET p.age = 31  // Acquires WRITE lock on Node(Alice)

// Transaction 2 (concurrent):
MATCH (p:Person {name: "Alice"})
RETURN p.age  // Blocks until TX1 commits and releases the lock
Deadlock Detection:
// Transaction 1:
MATCH (a:Person {name: "Alice"})
MATCH (b:Person {name: "Bob"})
SET a.age = 31, b.age = 32

// Transaction 2 (concurrent):
MATCH (b:Person {name: "Bob"})
MATCH (a:Person {name: "Alice"})
SET b.age = 33, a.age = 34
Deadlock:
  • TX1 locks Alice, waits for Bob
  • TX2 locks Bob, waits for Alice
  • Deadlock!
Resolution: Neo4j detects deadlock and aborts one transaction:
Neo4jException: DeadlockDetected
Transaction aborted. Please retry.
Best Practice: Always acquire locks in consistent order:
// Both transactions lock in order: Alice → Bob
MATCH (a:Person {name: "Alice"})
MATCH (b:Person {name: "Bob"})
SET a.age = ..., b.age = ...
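The same discipline in runnable form: both "transactions" sort the node names before locking, so neither can hold one lock while waiting for the other (Python threading locks standing in for Neo4j's per-node write locks):

```python
import threading

lock_table = {"Alice": threading.Lock(), "Bob": threading.Lock()}

def update_ages(names_and_ages: dict):
    """Acquire node locks in a globally consistent (sorted) order, so
    two concurrent transactions can never wait on each other in a cycle."""
    ordered = sorted(names_and_ages)  # always Alice before Bob
    for name in ordered:
        lock_table[name].acquire()
    try:
        for name, age in names_and_ages.items():
            pass  # the SET n.age = age write would happen here
    finally:
        for name in reversed(ordered):
            lock_table[name].release()

# Both of these lock Alice first, then Bob — regardless of argument order
update_ages({"Bob": 32, "Alice": 31})
update_ages({"Alice": 33, "Bob": 34})
print("no deadlock")
```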

Isolation Levels

Neo4j provides Read Committed isolation by default.
Behavior:
  • Reads see committed data only (no dirty reads)
  • Phantom reads possible (another TX inserts/deletes between reads)
Example:
// Transaction 1:
BEGIN
MATCH (p:Person) RETURN count(p)  → 100

// Transaction 2 (concurrent):
CREATE (p:Person {name: "New"})
COMMIT

// Transaction 1 (continues):
MATCH (p:Person) RETURN count(p)  → 101 (phantom read!)
COMMIT
Higher Isolation (Serializable): Neo4j doesn’t support serializable isolation natively. Use explicit locking:
// Pessimistic locking (manual):
MATCH (counter:Counter {id: 1})
SET counter.value = counter.value + 1  // Acquires write lock
RETURN counter.value

Part 5: Page Cache

Memory Architecture

Page Cache = Neo4j’s in-memory buffer pool (like PostgreSQL’s shared buffers). Purpose:
  • Cache frequently accessed pages (node, relationship, property records)
  • Reduce disk I/O
Configuration:
# neo4j.conf
dbms.memory.pagecache.size=4g  # 4GB for page cache (Neo4j 4.x; server.memory.pagecache.size in 5.x)
Sizing Rule (starting point): pagecache.size ≈ 0.5 × (RAM - heap - OS) Example (64GB RAM):
Heap (JVM): 8GB
OS: 8GB
Available for page cache: 64 - 8 - 8 = 48GB
Recommended: start around 24GB; increase if the hit rate is low

Page Cache Structure

Page Size: 8KB (fixed) Cache Entry:
┌─────────────────────────────────────────┐
│  File: neostore.nodestore.db            │
│  Offset: 630 (Node ID 42)               │
│  Data: [15 bytes of node record]        │
│  LRU: Last accessed timestamp           │
└─────────────────────────────────────────┘
Page Eviction: LRU (Least Recently Used)
When cache is full:
  1. Find least recently used page
  2. If dirty (modified), flush to disk
  3. Evict page
  4. Load new page into cache
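Steps 1-4 map neatly onto an ordered dictionary. A minimal LRU page cache sketch (no dirty-page flushing; `load_from_disk` is a stand-in for a real page read):

```python
from collections import OrderedDict

class PageCache:
    """Minimal LRU page cache: hits move a page to the MRU end,
    misses evict the LRU page once capacity is reached."""
    def __init__(self, capacity: int):
        self.capacity, self.pages = capacity, OrderedDict()
        self.hits = self.faults = 0

    def get(self, page_id, load_from_disk):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)    # mark most recently used
        else:
            self.faults += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict LRU (flush if dirty)
            self.pages[page_id] = load_from_disk(page_id)
        return self.pages[page_id]

cache = PageCache(2)
for pid in [1, 2, 1, 3, 1]:       # page 2 is evicted when page 3 loads
    cache.get(pid, lambda p: f"page-{p}")
print(cache.hits, cache.faults)   # 2 3
```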

Monitoring Page Cache

CALL dbms.queryJmx("org.neo4j:instance=kernel#0,name=Page cache")
YIELD attributes
RETURN attributes.hits, attributes.faults, attributes.evictions
Metrics:
  • Hits: Requests served from cache (fast!)
  • Faults: Requests requiring disk read (slow)
  • Evictions: Pages evicted to make room
Hit Rate: hits / (hits + faults)
Target: > 95% hit rate
Example:
Hits: 950,000
Faults: 50,000
Hit Rate: 950,000 / 1,000,000 = 95%  ✓ Good!
If hit rate < 90%: Increase page cache size!

Part 6: Query Execution

Explain and Profile

Explain (planning only, no execution):
EXPLAIN
MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend)
RETURN friend.name
Output:
+---------------------------+
| Operator         | Rows  |
+---------------------------+
| ProduceResults   | 10    |
| Projection       | 10    |
| Expand(All)      | 10    |
| NodeIndexSeek    | 1     |
+---------------------------+
Profile (execute and collect statistics):
PROFILE
MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend)
RETURN friend.name
Output:
+----------------------------------------+
| Operator         | Rows | DB Hits     |
+----------------------------------------+
| ProduceResults   | 10   | 0           |
| Projection       | 10   | 20          |
| Expand(All)      | 10   | 21          |
| NodeIndexSeek    | 1    | 2           |
+----------------------------------------+
Total DB Hits: 43
DB Hits: Number of page cache accesses (lower = better!)

Query Operators

NodeIndexSeek:
Look up node by indexed property
Cost: O(log N)
DB Hits: 1-3 (index lookup + node fetch)
NodeByLabelScan:
Scan all nodes with a specific label
Cost: O(L), where L = # of labeled nodes
DB Hits: L
Expand(All):
Traverse relationships from a node
Cost: O(D), where D = degree
DB Hits: D (read relationship records)
Filter:
Apply predicate (WHERE clause)
Cost: O(R), where R = input rows
DB Hits: R × P (P = properties checked per row)
Sort:
Sort results (ORDER BY)
Cost: O(R log R)
DB Hits: 0 (in-memory sort)
Aggregation (COUNT, SUM, etc.):
Group and aggregate
Cost: O(R)
DB Hits: 0 (in-memory)

Example Query Analysis

Query:
MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
WHERE friend.age > 25
RETURN friend.name, friend.age
ORDER BY friend.age DESC
Plan:
ProduceResults

Sort (friend.age DESC)

Projection (friend.name, friend.age)

Filter (friend.age > 25)

Expand(All) [:KNOWS]

NodeIndexSeek (Person.name = "Alice")
Step-by-Step:
  1. NodeIndexSeek: Look up Alice in Person.name index
    • Cost: O(log P)
    • DB Hits: 2
    • Rows: 1
  2. Expand: Traverse :KNOWS relationships
    • Cost: O(D), D = Alice’s degree (e.g., 10 friends)
    • DB Hits: 10 (relationship records)
    • Rows: 10
  3. Filter: Check friend.age > 25
    • Cost: O(10)
    • DB Hits: 10 (read age property)
    • Rows: 6 (assume 6 friends > 25)
  4. Projection: Extract name and age
    • Cost: O(6)
    • DB Hits: 6 (read name property)
    • Rows: 6
  5. Sort: Order by age DESC
    • Cost: O(6 log 6) ≈ 15
    • DB Hits: 0 (in-memory)
    • Rows: 6
  6. ProduceResults: Return to client
    • Rows: 6
Total DB Hits: 2 + 10 + 10 + 6 = 28

Optimization Tips

1. Use Indexes:
// Bad (full scan)
MATCH (p:Person)
WHERE p.name = "Alice"

// Good (index seek)
CREATE INDEX FOR (p:Person) ON (p.name)
MATCH (p:Person {name: "Alice"})
2. Filter Early:
// Bad (expand, then filter every friend)
MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend)
WHERE friend.age > 25

// Good (label on friend lets the planner filter during expansion)
MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend:Person)
WHERE friend.age > 25
3. Limit Results:
// Add LIMIT to avoid processing unnecessary rows
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN friend.name
LIMIT 10
4. Use EXPLAIN/PROFILE: Always check query plans for:
  • Missing indexes (NodeByLabelScan instead of NodeIndexSeek)
  • High DB Hits
  • Cartesian products (avoid!)

Part 7: Performance Characteristics

Time Complexity

| Operation                       | Neo4j    | Relational DB |
|---------------------------------|----------|---------------|
| Find node by ID                 | O(1)     | O(1)          |
| Find node by indexed property   | O(log N) | O(log N)      |
| Find node by unindexed property | O(N)     | O(N)          |
| Get node's relationships        | O(D)     | O(log N + D)  |
| Traverse 2 hops                 | O(D²)    | O(D² log N)   |
| Traverse 3 hops                 | O(D³)    | O(D³ log N)   |
| Shortest path (BFS, depth L)    | O(D^L)   | O(D^L log N)  |
Typical Values:
  • N = 1,000,000 (total nodes)
  • D = 50 (average degree)
3-hop traversal:
  • Neo4j: 50³ = 125,000 pointer hops
  • Relational: 50³ × log₂(10⁶) ≈ 125,000 × 20 = 2,500,000 index operations, plus join materialization overhead

Benchmark: Social Network Queries

Setup:
  • 1M users
  • 50M friendships (average 50 friends/user)
Query 1: Direct friends
MATCH (u:User {id: 12345})-[:FRIEND]->(friend)
RETURN friend.name
| Database   | Time |
|------------|------|
| Neo4j      | 5ms  |
| PostgreSQL | 20ms |
Query 2: Friends of friends
MATCH (u:User {id: 12345})-[:FRIEND*2]->(fof)
RETURN fof.name
| Database   | Time    |
|------------|---------|
| Neo4j      | 50ms    |
| PostgreSQL | 5,000ms |
Query 3: Friends up to 3 hops
MATCH (u:User {id: 12345})-[:FRIEND*1..3]->(friend)
RETURN DISTINCT friend.name
| Database   | Time             |
|------------|------------------|
| Neo4j      | 200ms            |
| PostgreSQL | Timeout (> 60s)  |
Key Takeaway: Neo4j shines for deep traversals (2+ hops).

Part 8: Hands-On Exercises

Exercise 1: Explore Store Files

Task: Inspect Neo4j store files and calculate sizes
# Navigate to Neo4j data directory
cd <NEO4J_HOME>/data/databases/neo4j/

# List store files
ls -lh neostore.*

# Example output:
# -rw-r--r-- neostore.nodestore.db          150M
# -rw-r--r-- neostore.relationshipstore.db  2.1G
# -rw-r--r-- neostore.propertystore.db      500M
Questions:
  1. How many nodes? 150M / 15 bytes = 10M nodes
  2. How many relationships? 2.1G / 34 bytes ≈ 61.7M relationships
  3. Average degree? 61.7M × 2 / 10M ≈ 12.3
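The answers above are just integer arithmetic over the record sizes. A quick script to reproduce them (file sizes are the example values; real stores also contain some header/reserved and deleted records, so these counts are upper bounds):

```python
NODE_REC, REL_REC = 15, 34  # bytes per record (Neo4j 4.x layout above)

def store_stats(node_store_bytes: int, rel_store_bytes: int):
    """Estimate node count, relationship count, and average degree
    from raw store-file sizes."""
    nodes = node_store_bytes // NODE_REC
    rels = rel_store_bytes // REL_REC
    # Each relationship touches two nodes, so degree sums count it twice
    avg_degree = 2 * rels / nodes
    return nodes, rels, round(avg_degree, 1)

# 150MB node store, 2.1GB relationship store (decimal bytes)
print(store_stats(150_000_000, 2_100_000_000))  # (10000000, 61764705, 12.4)
```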

Exercise 2: Analyze Query Plans

Query:
PROFILE
MATCH (p:Person {name: "Alice"})-[:KNOWS*2]->(fof)
RETURN fof.name
Tasks:
  1. Identify bottleneck operator (highest DB Hits)
  2. Check if index is used (NodeIndexSeek vs NodeByLabelScan)
  3. Optimize query (add indexes, limit results)

Exercise 3: Measure Page Cache Hit Rate

// Run query multiple times
MATCH (p:Person)-[:KNOWS]->(friend)
RETURN count(friend)

// Check hit rate
CALL dbms.queryJmx("org.neo4j:instance=kernel#0,name=Page cache")
YIELD attributes
RETURN attributes.hits, attributes.faults,
       (toFloat(attributes.hits) / (attributes.hits + attributes.faults)) AS hit_rate
Expected: Hit rate increases on subsequent runs (data cached)

Summary

Native Graph Storage:
  • Fixed-size records (O(1) access by ID)
  • Pointers for relationships (no JOINs!)
  • Index-free adjacency (O(D) traversal, not O(N))
Indexes:
  • Label indexes (filter by node type)
  • Property indexes (B+tree, O(log N))
  • Full-text indexes (Lucene)
  • Vector indexes (HNSW for ANN)
Transactions:
  • Full ACID guarantees
  • Write-Ahead Log (durability)
  • Pessimistic locking (read/write locks)
  • Deadlock detection
Performance:
  • 10-1000x faster than relational for traversals
  • Scales with degree (D), not graph size (N)
  • Page cache critical for performance (aim > 95% hit rate)

What’s Next?

Module 4: Cypher Query Language Mastery

Master Cypher syntax, pattern matching, aggregations, and advanced query techniques