Most databases were built for tables, then retrofitted for graphs. Neo4j was designed from day one for graphs.Key Difference: Index-free adjacency
Copy
Traditional (PostgreSQL with graph extension):MATCH (a:Person)-[:KNOWS]->(b)→ Index scan for Person nodes→ Index scan for KNOWS relationships→ JOIN on IDs→ O(log N) per hopNeo4j (Native Graph):MATCH (a:Person)-[:KNOWS]->(b)→ Follow pointer from node to relationship to node→ O(1) per hop (constant time, regardless of graph size!)
Performance: 1000x faster for multi-hop traversals!This module reveals how Neo4j achieves this at the storage level.
┌────────┬────────────┬────────────┬────────────┬────────┐│ In Use │ Next Rel │ Next Prop │ Labels │ Extra ││ 1 bit │ 35 bits │ 36 bits │ 5 bytes │ ... │└────────┴────────────┴────────────┴────────────┴────────┘
Fields:
In Use (1 bit): Is this record active? (0 = deleted, 1 = active)
Next Rel (35 bits): Pointer to first relationship (ID)
Next Prop (36 bits): Pointer to first property (ID)
Labels (5 bytes): Inline label storage or pointer to label array
Example:Node with ID = 42:
Copy
Record at offset: 42 × 15 bytes = 630 bytes into neostore.nodestore.db┌─────┬─────────┬─────────┬──────────────┐│ 1 │ 10050 │ 20010 │ 0x0001 │ (Person label)└─────┴─────────┴─────────┴──────────────┘
Relationship Record Format (34 bytes in Neo4j 4.x):
Copy
┌────────┬────────┬────────┬─────────┬──────────┬──────────┬──────────┬──────────┬────────┐│ In Use │ First │ Second │ Rel │ First │ First │ Second │ Second │ Next ││ │ Node │ Node │ Type │ Prev Rel │ Next Rel │ Prev Rel │ Next Rel │ Prop ││ 1 bit │ 35 bits│ 35 bits│ 16 bits │ 35 bits │ 35 bits │ 35 bits │ 35 bits │ 36 bits│└────────┴────────┴────────┴─────────┴──────────┴──────────┴──────────┴──────────┴────────┘
Fields:
First Node: Source node ID
Second Node: Target node ID
Rel Type: Relationship type ID (index into relationship type store)
First Prev/Next Rel: Doubly-linked list for source node’s relationships
Second Prev/Next Rel: Doubly-linked list for target node’s relationships
Next Prop: Pointer to first property
Why Doubly-Linked Lists?Each node maintains a linked list of its relationships:
Copy
Node A (ID=10): ↓ Next Rel = 100Relationship 100: (A)-[:KNOWS]->(B) ├─ First Node = 10 (A) ├─ Second Node = 20 (B) ├─ First Next Rel = 101 ← Next relationship for A └─ Second Next Rel = 200 ← Next relationship for BRelationship 101: (A)-[:LIKES]->(C) ├─ First Node = 10 (A) ├─ Second Node = 30 (C) └─ First Next Rel = NULL ← End of A's relationships
Traversal:To find all relationships for Node A:
Read Node A’s record → get Next Rel = 100
Read Relationship 100 → check if First Node == A or Second Node == A
Follow appropriate Next Rel pointer (100 → 101 → NULL)
Time Complexity: O(D), where D = degree of node (number of relationships)Critical Insight: Independent of total graph size!
In relational DB: O(log N) per relationship (index scan across all edges)
In Neo4j: O(D) (only scan node’s own relationships)
┌────────┬─────────┬─────────┬──────────┬───────────┐│ In Use │ Type │ Key ID │ Value │ Next Prop ││ 1 bit │ 4 bits │ 24 bits │ 64 bits │ 36 bits │└────────┴─────────┴─────────┴──────────┴───────────┘
Property Types:
Type
Value Storage
bool
Inline (1 bit in value field)
int
Inline (up to 64 bits)
float
Inline (IEEE 754, 64 bits)
short string
Inline (up to 12 characters encoded)
long string
Pointer to string store
array
Pointer to array store
Property Chain:Properties form a singly-linked list:
Copy
Node A: Next Prop = 500Property 500: {name: "Alice"} ├─ Key ID = 1 (points to "name" in property key store) ├─ Value = pointer to string store → "Alice" └─ Next Prop = 501Property 501: {age: 30} ├─ Key ID = 2 ("age") ├─ Value = 30 (inline int) └─ Next Prop = NULL
Problem: Nodes with many relationships (high degree) slow down traversals.Example: Celebrity node with 1M followers
Copy
Without optimization:MATCH (celeb)-[:FOLLOWED_BY]->(follower)→ Scan 1M relationship records to find FOLLOWED_BY relationships!
Solution: Relationship Group Store (introduced Neo4j 2.1)When a node exceeds a threshold (default: 50 relationships), Neo4j creates relationship type groups:
Copy
Node (Celebrity): Next Rel = NULL Next Rel Group = 7000 ← Points to group storeRelationship Group 7000: ├─ Type = FOLLOWED_BY ├─ First Outgoing = 8000 ├─ First Incoming = 9000 ├─ First Loop = NULL └─ Next Group = 7001Relationship Group 7001: ├─ Type = FOLLOWS ├─ First Outgoing = 10000 ├─ ...
Traversal:
Copy
MATCH (celeb)-[:FOLLOWED_BY]->(follower)
Process:
Read celeb node → Get Next Rel Group = 7000
Scan relationship groups for FOLLOWED_BY (typically < 10 types)
Follow First Outgoing pointer
Scan only FOLLOWED_BY relationships (not all 1M!)
Performance: From O(1M) to O(types) + O(followers of type) = much faster!
CREATE INDEX person_name FOR (p:Person) ON (p.name)
Index Structure (Neo4j 4.x): Native Index (custom B+tree variant)Index File: schema/index/lucene-*/1/ (despite “lucene” in path, it’s native in 4.x+)B+Tree Structure:
CREATE VECTOR INDEX user_embedding FOR (u:User) ON (u.embedding)OPTIONS {dimension: 256, similarity: 'cosine'}
Query (Nearest neighbor search):
Copy
MATCH (u:User)WHERE u.embedding SIMILAR TO $query_vectorRETURN u.name, vector.similarity(u.embedding, $query_vector) AS scoreORDER BY score DESCLIMIT 10
Backend: HNSW (Hierarchical Navigable Small World) graph for approximate nearest neighbor (ANN) searchUse Cases:
-- Bad (full scan)MATCH (p:Person)WHERE p.name = "Alice"-- Good (index seek)CREATE INDEX FOR (p:Person) ON (p.name)MATCH (p:Person {name: "Alice"})
2. Filter Early:
Copy
-- Bad (expand then filter)MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend)WHERE friend.age > 25-- Good (filter during expand, if possible)MATCH (p:Person {name: "Alice"})-[:KNOWS]->(friend:Person)WHERE friend.age > 25
Task: Inspect Neo4j store files and calculate sizes
Copy
# Navigate to Neo4j data directorycd <NEO4J_HOME>/data/databases/neo4j/# List store filesls -lh neostore.*# Example output:# -rw-r--r-- neostore.nodestore.db 150M# -rw-r--r-- neostore.relationshipstore.db 2.1G# -rw-r--r-- neostore.propertystore.db 500M
Questions:
How many nodes? 150M / 15 bytes = 10M nodes
How many relationships? 2.1G / 34 bytes ≈ 61.7M relationships