# Graph Data Modeling Best Practices

> Design efficient graph schemas, handle many-to-many relationships, refactor from relational models, and optimize for query patterns

<Info>
  **Module Duration**: 5-6 hours
  **Learning Style**: Pattern-Based + Refactoring Examples + Real-World Schemas
  **Outcome**: Design graph models that match query patterns and perform at scale
</Info>

## The Golden Rule

**In relational databases**: Normalize first, query later
**In graph databases**: **Design for your queries**

Graph modeling is **query-driven**. Start with:

1. List the questions the application needs to answer
2. Design the graph structure so those questions can be answered efficiently

***

## Part 1: From Relational to Graph

### Example: Social Network

**Relational Schema**:

```sql theme={null}
users(id, name, email, city)
friendships(user_id, friend_id, since)
```

**Problem**: Finding friends-of-friends requires a self-join on `friendships`, and every additional hop adds another join:

```sql theme={null}
SELECT u2.name
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN users u2 ON f2.friend_id = u2.id
WHERE f1.user_id = 123;
```

**Graph Model**:

```cypher theme={null}
(:User {id, name, email, city})-[:FRIENDS_WITH {since}]->(:User)
```

**Query**:

```cypher theme={null}
MATCH (user:User {id: 123})-[:FRIENDS_WITH*2]->(fof)
RETURN fof.name
```

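**Benefit**: No JOINs and a natural traversal. Each hop follows stored relationships instead of joining tables, so the advantage over the relational query tends to grow with traversal depth and data volume.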
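
The two-hop pattern above can also return the starting user and people who are already direct friends. A minimal sketch of a stricter variant, assuming friendships should be read in both directions and that `id` is unique per user. The constraint name `user_id_unique` is hypothetical, and the constraint syntax shown is the Neo4j 5 form:

```cypher theme={null}
// Assumption: every user has a unique id; the constraint also provides a backing index
CREATE CONSTRAINT user_id_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.id IS UNIQUE;

// Friends-of-friends, ignoring direction, excluding me and my direct friends
MATCH (me:User {id: 123})-[:FRIENDS_WITH]-(friend)-[:FRIENDS_WITH]-(fof)
WHERE fof <> me
  AND NOT (me)-[:FRIENDS_WITH]-(fof)
RETURN DISTINCT fof.name;
```

`DISTINCT` matters here because the same person can be reached through several mutual friends.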

***

## Part 2: Modeling Patterns

### Pattern 1: Entities as Nodes

**Rule**: Domain entities become nodes

**Example**: E-commerce

```cypher theme={null}
(user:User {id, name, email})
(product:Product {id, name, price})
(order:Order {id, date, total})
```

**Anti-Pattern**: Storing lists as properties

```cypher theme={null}
// Bad: Storing friends as array
(:User {name: "Alice", friends: ["Bob", "Charlie"]})

// Good: Relationships
(:User {name: "Alice"})-[:KNOWS]->(:User {name: "Bob"})
```
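
If data already exists in the array form, it can be migrated in one pass. This is a sketch only, and it assumes that a user's `name` identifies them uniquely, which real data often does not guarantee:

```cypher theme={null}
// Turn each entry of the friends array into a KNOWS relationship
MATCH (u:User)
WHERE u.friends IS NOT NULL
UNWIND u.friends AS friendName
MERGE (f:User {name: friendName})   // Assumption: name is a unique key
MERGE (u)-[:KNOWS]->(f)
// Collapse back to one row per user before dropping the old property
WITH DISTINCT u
REMOVE u.friends;
```

`MERGE` keeps the migration idempotent: running it twice does not duplicate users or relationships.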

### Pattern 2: Relationships Capture Connections

**Rule**: Relationships represent actions, associations, or hierarchies

**Examples**:

```cypher theme={null}
(user)-[:PURCHASED]->(product)
(employee)-[:WORKS_FOR]->(company)
(post)-[:TAGGED_WITH]->(tag)
(directory)-[:PARENT_OF]->(subdirectory)
```

**When to use relationships vs properties**:

| Use Relationship      | Use Property            |
| --------------------- | ----------------------- |
| Connects two entities | Describes single entity |
| Many-to-many          | One-to-one attribute    |
| Traversable           | Filterable value        |
| Example: KNOWS        | Example: name, age      |

### Pattern 3: Relationship Properties

**Use Case**: Metadata about connections

```cypher theme={null}
(alice)-[:KNOWS {since: 2015, strength: 0.8}]->(bob)
(user)-[:RATED {score: 5, timestamp: datetime()}]->(movie)
(employee)-[:WORKS_AT {role: "Engineer", start_date: date()}]->(company)
```
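
Relationship properties can be filtered and aggregated like node properties. A small sketch using the `RATED` relationship above; the `:Movie` label and `title` property are assumptions not defined elsewhere in this module:

```cypher theme={null}
// Average score per movie, counting only ratings of 4 or higher
MATCH (:User)-[r:RATED]->(m:Movie)
WHERE r.score >= 4
RETURN m.title AS movie, avg(r.score) AS avg_score, count(r) AS ratings
ORDER BY avg_score DESC;
```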

### Pattern 4: Intermediate Nodes

**Problem**: A relationship connects exactly two nodes. It can carry properties, but it cannot be the start or end of another relationship.

**Example**: User enrolls in course on a specific date, gets a grade

```cypher theme={null}
// Works only until the enrollment itself must link to something else (instructor, classroom)
(user)-[:ENROLLED {date, grade}]->(course)

// Solution: Intermediate node
(user)-[:ENROLLED]->(enrollment:Enrollment {date, grade})-[:IN_COURSE]->(course)
```

**Real-World Example**: Order line items

```cypher theme={null}
(user)-[:PLACED]->(order:Order)-[:CONTAINS]->(lineItem:LineItem {quantity, price})-[:FOR_PRODUCT]->(product)
```
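
With line items modeled as nodes, per-line and per-order totals fall out of a single traversal. A sketch assuming the order `id` value `1001` (hypothetical) and that `price` on the line item is the unit price:

```cypher theme={null}
// Line-by-line breakdown of one order
MATCH (o:Order {id: 1001})-[:CONTAINS]->(li:LineItem)-[:FOR_PRODUCT]->(p:Product)
RETURN p.name AS product,
       li.quantity AS quantity,
       li.quantity * li.price AS line_total;

// Order total computed from its line items
MATCH (o:Order {id: 1001})-[:CONTAINS]->(li:LineItem)
RETURN sum(li.quantity * li.price) AS order_total;
```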

### Pattern 5: Multiple Labels

**Use Case**: Entity belongs to multiple categories

```cypher theme={null}
CREATE (alice:Person:Customer:VIP {name: "Alice"})
CREATE (bob:Person:Employee:Manager {name: "Bob"})
```

**Query by specific role**:

```cypher theme={null}
MATCH (vip:VIP)
RETURN vip.name
```

**Benefits**:

* Fine-grained querying
* Index optimization (indexes per label)
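
Labels can be added after creation and combined in patterns. The following sketch illustrates both, plus a label-scoped index; the index name `vip_name` is hypothetical:

```cypher theme={null}
// Promote an existing customer to VIP by adding a label
MATCH (c:Customer {name: "Alice"})
SET c:VIP;

// Combine labels to narrow a match: employees who are not managers
MATCH (p:Person:Employee)
WHERE NOT p:Manager
RETURN p.name;

// Index that covers only nodes carrying the VIP label
CREATE INDEX vip_name IF NOT EXISTS FOR (v:VIP) ON (v.name);
```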

***

## Part 3: Time-Series and Versioning

### Pattern 6: Time-Series Events

**Example**: User actions timeline

```cypher theme={null}
(user:User)-[:PERFORMED]->(event:Event {type: "login", timestamp: datetime()})
(user)-[:PERFORMED]->(event2:Event {type: "purchase", timestamp: datetime(), amount: 99.99})
```

**Query**: Recent events

```cypher theme={null}
MATCH (user:User {id: 123})-[:PERFORMED]->(event:Event)
WHERE event.timestamp > datetime() - duration({days: 7})
RETURN event
ORDER BY event.timestamp DESC
```

### Pattern 7: Versioning (Bi-Temporal Model)

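**Scenario**: Track changes over time (audit trail). The example below records valid time only (`valid_from` / `valid_to`); a fully bi-temporal model would also record transaction time, that is, when each version was written to the database.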

```cypher theme={null}
// Version 1 (original)
(product:Product {id: 123})-[:VERSION {valid_from: date("2020-01-01"), valid_to: date("2021-01-01")}]->
  (v1:ProductVersion {price: 10.00, name: "Widget"})

// Version 2 (price change)
(product)-[:VERSION {valid_from: date("2021-01-01"), valid_to: null}]->
  (v2:ProductVersion {price: 12.00, name: "Widget"})
```

**Query**: Price at specific date

```cypher theme={null}
MATCH (product:Product {id: 123})-[v:VERSION]->(version)
WHERE date("2020-06-15") >= v.valid_from AND (v.valid_to IS NULL OR date("2020-06-15") < v.valid_to)
RETURN version.price
```
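
Writing a new version means closing the currently open one and attaching the next. A minimal sketch, assuming exactly one open version per product (the one with no `valid_to`); the date and price are hypothetical values:

```cypher theme={null}
// Close the open version
MATCH (p:Product {id: 123})-[cur:VERSION]->(:ProductVersion)
WHERE cur.valid_to IS NULL
SET cur.valid_to = date("2022-01-01")
// Guard against creating several new versions if more than one was open
WITH DISTINCT p
// Attach the new version; leaving out valid_to keeps it open (null)
CREATE (p)-[:VERSION {valid_from: date("2022-01-01")}]->
       (:ProductVersion {price: 14.00, name: "Widget"});
```

Doing both steps in one statement keeps the validity intervals contiguous, since the close and the create happen in the same transaction.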

***

## Part 4: Handling Hierarchies

### Pattern 8: Tree Structures

**Example**: File system

```cypher theme={null}
(root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})
(home)-[:CONTAINS]->(user:Folder {name: "alice"})
(user)-[:CONTAINS]->(file:File {name: "document.txt"})
```

**Query**: All files under /home

```cypher theme={null}
MATCH (root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})-[:CONTAINS*]->(item:File)
RETURN item.name
```
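
Binding the whole path to a variable makes it possible to reconstruct the folder chain and depth of an item. A sketch over the same file system example:

```cypher theme={null}
// Folder chain and depth for a given file, starting from the root
MATCH path = (root:Folder {name: "/"})-[:CONTAINS*]->(f:File {name: "document.txt"})
RETURN [n IN nodes(path) | n.name] AS path_names,
       length(path) AS depth;
```

`length(path)` counts relationships, so a file directly inside the root has depth 1.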

### Pattern 9: Hierarchies with Shortcuts

**Problem**: Deep hierarchies slow down queries

**Solution**: Add shortcut relationships

```cypher theme={null}
// Full path
(company)-[:CHILD]->(division)-[:CHILD]->(dept)-[:CHILD]->(team)-[:CHILD]->(employee)

// Add shortcuts
(company)-[:ALL_EMPLOYEES]->(employee)
```

**Query**:

```cypher theme={null}
// Fast: Direct relationship
MATCH (company:Company {name: "Acme"})-[:ALL_EMPLOYEES]->(emp)
RETURN count(emp)

// Slow: Deep traversal
MATCH (company:Company {name: "Acme"})-[:CHILD*]->(emp:Employee)
RETURN count(emp)
```
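
Shortcut relationships are derived data, so they have to be built and kept in sync with the hierarchy. One way to materialize them, sketched under the assumption that employees carry the `:Employee` label:

```cypher theme={null}
// Build (or repair) the shortcut from the full hierarchy
MATCH (c:Company {name: "Acme"})-[:CHILD*]->(e:Employee)
MERGE (c)-[:ALL_EMPLOYEES]->(e);
```

Because `MERGE` only creates missing relationships, this can be re-run after reorganizations. It does not remove shortcuts to employees who have left the hierarchy; that needs a separate cleanup step.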

***

## Part 5: Many-to-Many Relationships

### Pattern 10: Tags and Categories

**Example**: Blog posts with tags

```cypher theme={null}
(post:Post {title: "Graph Databases 101"})-[:TAGGED]->(tag:Tag {name: "databases"})
(post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
```

**Query**: Posts with specific tag

```cypher theme={null}
MATCH (post:Post)-[:TAGGED]->(tag:Tag {name: "databases"})
RETURN post.title
```

**Query**: Posts with multiple tags (AND)

```cypher theme={null}
MATCH (post:Post)-[:TAGGED]->(tag1:Tag {name: "databases"})
MATCH (post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
RETURN post.title
```
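
Chaining one `MATCH` per tag gets unwieldy beyond two or three tags. A sketch that generalizes the AND query to any list of tag names, assuming the list contains no duplicates:

```cypher theme={null}
// Posts that carry every tag in the list
WITH ["databases", "nosql"] AS wanted
MATCH (post:Post)-[:TAGGED]->(tag:Tag)
WHERE tag.name IN wanted
WITH post, wanted, count(DISTINCT tag) AS matched
WHERE matched = size(wanted)
RETURN post.title;
```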

### Pattern 11: User Roles and Permissions

```cypher theme={null}
(user:User)-[:HAS_ROLE]->(role:Role {name: "Admin"})
(role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
```

**Query**: Check if user has permission

```cypher theme={null}
MATCH (user:User {id: 123})-[:HAS_ROLE]->(:Role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
RETURN count(perm) > 0 AS has_permission
```
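
The same traversal can list everything a user is allowed to do, along with the roles that grant each permission. A short sketch over the model above:

```cypher theme={null}
// All permissions for a user, with the roles that grant each one
MATCH (:User {id: 123})-[:HAS_ROLE]->(r:Role)-[:HAS_PERMISSION]->(p:Permission)
RETURN p.name AS permission, collect(DISTINCT r.name) AS granted_by
ORDER BY permission;
```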

***

## Part 6: Modeling Anti-Patterns

### Anti-Pattern 1: Dense Nodes

**Problem**: Node with millions of relationships (celebrity with 10M followers)

**Issue**: Any traversal that passes through the node may have to scan its very large relationship list, which is slow

**Solution**:

1. **Fan-out to intermediate nodes**:

```cypher theme={null}
(celebrity)-[:HAS_FOLLOWERS]->(bucket1:FollowerBucket)-[:CONTAINS]->(follower1)
(celebrity)-[:HAS_FOLLOWERS]->(bucket2:FollowerBucket)-[:CONTAINS]->(follower2)
```

2. **Use properties for aggregates**:

```cypher theme={null}
(celebrity {follower_count: 10000000})
```
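
An aggregate property is only useful if it stays accurate. A sketch of keeping `follower_count` in step with new follows, assuming a direct `FOLLOWS` relationship between `:User` nodes; both `id` values are hypothetical:

```cypher theme={null}
// Add the follow once and bump the counter only when the relationship is new
MATCH (f:User {id: 456}), (c:User {id: 789})
MERGE (f)-[r:FOLLOWS]->(c)
ON CREATE SET r.since = date(),
              c.follower_count = coalesce(c.follower_count, 0) + 1;
```

`ON CREATE` prevents double counting when the same follow request is replayed.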

### Anti-Pattern 2: Redundant Relationships

**Problem**: Same information as properties

**Example**:

```cypher theme={null}
// Bad: Redundant
(user {city: "NYC"})-[:LIVES_IN]->(city:City {name: "NYC"})

// Good: Pick one approach
// Option A: relationship, if the city has other properties or is queried on its own
(user)-[:LIVES_IN]->(city:City {name: "NYC"})
// Option B: property, if the city is just a string
(user {city: "NYC"})
```

### Anti-Pattern 3: Property Explosion

**Problem**: Too many properties on a node

```cypher theme={null}
// Bad
(product {name, price, weight, height, width, color, material, manufacturer, ...})  // 50+ properties

// Good: Use labels and relationships
(product:Product {name, price})
(product)-[:HAS_DIMENSIONS]->(dims:Dimensions {height, width, weight})
(product)-[:MANUFACTURED_BY]->(company:Company)
```

***

## Part 7: Real-World Examples

### Example 1: Social Media Platform

**Requirements**:

* Users post content
* Users follow each other
* Posts have likes and comments
* Posts are tagged

**Model**:

```cypher theme={null}
(user:User {id, name, email, joined: date()})
(post:Post {id, content, timestamp: datetime()})
(comment:Comment {id, text, timestamp: datetime()})
(tag:Tag {name})

(user)-[:POSTED]->(post)
(user)-[:FOLLOWS {since: date()}]->(user2:User)
(user)-[:LIKED {timestamp: datetime()}]->(post)
(user)-[:COMMENTED]->(comment)-[:ON]->(post)
(post)-[:TAGGED_WITH]->(tag)
```

**Queries**:

1. **Newsfeed**: Posts from people I follow

```cypher theme={null}
MATCH (me:User {id: 123})-[:FOLLOWS]->(friend)-[:POSTED]->(post)
RETURN post
ORDER BY post.timestamp DESC
LIMIT 20
```

2. **Popular posts**: Most liked

```cypher theme={null}
MATCH (post:Post)<-[like:LIKED]-()
WITH post, count(like) AS likes
ORDER BY likes DESC
LIMIT 10
RETURN post.content, likes
```

3. **Trending tags**: Most used in last 24 hours

```cypher theme={null}
MATCH (post:Post)-[:TAGGED_WITH]->(tag:Tag)
WHERE post.timestamp > datetime() - duration({hours: 24})
WITH tag, count(post) AS usage
ORDER BY usage DESC
LIMIT 10
RETURN tag.name, usage
```

### Example 2: Recommendation Engine

**Model**:

```cypher theme={null}
(user:User)-[:PURCHASED]->(product:Product)
(product)-[:IN_CATEGORY]->(category:Category)
(product)-[:SIMILAR_TO {score: 0.85}]->(product2:Product)
```

**Query**: Recommend products

```cypher theme={null}
// Collaborative filtering: Users like you also bought
MATCH (me:User {id: 123})-[:PURCHASED]->(product:Product)
MATCH (product)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
WITH rec, count(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 10
RETURN rec.name, score
```
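
The model also defines `SIMILAR_TO`, which supports a content-based variant. A sketch that ranks unpurchased products by their best similarity score to anything the user already bought:

```cypher theme={null}
// Content-based: products similar to what I already purchased
MATCH (me:User {id: 123})-[:PURCHASED]->(:Product)-[s:SIMILAR_TO]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
RETURN rec.name, max(s.score) AS similarity
ORDER BY similarity DESC
LIMIT 10;
```

This version still produces results for users whose purchases overlap with nobody else's, where the collaborative query returns nothing.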

### Example 3: Knowledge Graph

**Model**:

```cypher theme={null}
(entity:Entity {name, type})
(entity)-[:RELATED_TO {relationship: "founded", date: date()}]->(entity2:Entity)
```

**Example Data** (the generic model made concrete with specific labels and relationship types):

```cypher theme={null}
CREATE (steve:Person {name: "Steve Jobs"})
CREATE (apple:Company {name: "Apple"})
CREATE (iphone:Product {name: "iPhone"})
CREATE (steve)-[:FOUNDED {year: 1976}]->(apple)
CREATE (steve)-[:INVENTED]->(iphone)
CREATE (apple)-[:PRODUCES]->(iphone)
```

**Query**: What is Steve Jobs directly connected to, and how?

```cypher theme={null}
MATCH (steve:Person {name: "Steve Jobs"})-[r]->(thing)
RETURN type(r) AS relationship, thing.name
```

***

## Part 8: Refactoring Strategies

### Refactoring 1: From Properties to Relationships

**Before**:

```cypher theme={null}
(user:User {city: "NYC", country: "USA"})
```

**After** (if cities have more data):

```cypher theme={null}
(user:User)-[:LIVES_IN]->(city:City {name: "NYC"})-[:IN_COUNTRY]->(country:Country {name: "USA"})
```

**When to refactor**: When you need to query by location, aggregate by city, or add city-specific properties.
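
A possible migration for existing data is sketched below. It assumes city names are unique across countries, which is not true in general (a production migration would key cities on name plus country):

```cypher theme={null}
// Promote city/country properties to nodes and relationships
MATCH (u:User)
WHERE u.city IS NOT NULL AND u.country IS NOT NULL
MERGE (country:Country {name: u.country})
MERGE (city:City {name: u.city})      // Assumption: city name is a unique key
MERGE (city)-[:IN_COUNTRY]->(country)
MERGE (u)-[:LIVES_IN]->(city)
REMOVE u.city, u.country;
```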

### Refactoring 2: Adding Intermediate Nodes

**Before**:

```cypher theme={null}
(user)-[:ENROLLED {grade: "A", semester: "Fall 2023"}]->(course)
```

**After**:

```cypher theme={null}
(user)-[:ENROLLED]->(enrollment:Enrollment {grade: "A", semester: "Fall 2023"})-[:IN_COURSE]->(course)
```

**Benefits**: Can add relationships to enrollment (teacher, classroom, etc.)
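
Existing relationships can be converted in place. A sketch assuming users and courses carry `:User` and `:Course` labels; the `:Course` label keeps the match from picking up the newly created `ENROLLED` relationships, which point at `:Enrollment` nodes:

```cypher theme={null}
// Replace each direct ENROLLED relationship with an Enrollment node
MATCH (u:User)-[old:ENROLLED]->(c:Course)
CREATE (u)-[:ENROLLED]->(e:Enrollment {grade: old.grade, semester: old.semester})
       -[:IN_COURSE]->(c)
DELETE old;
```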

### Refactoring 3: Denormalization for Performance

**Before** (normalized):

```cypher theme={null}
MATCH (post:Post)<-[:LIKED]-(user)
RETURN post.title, count(user) AS likes  // Count at query time (slow!)
```

**After** (denormalized):

```cypher theme={null}
// Update like_count on every like
MATCH (post:Post {id: 123})
SET post.like_count = coalesce(post.like_count, 0) + 1  // coalesce handles posts with no counter yet

// Query (fast!)
MATCH (post:Post)
RETURN post.title, post.like_count
ORDER BY post.like_count DESC
```

**Trade-off**: Faster reads, slower writes (acceptable for read-heavy workloads)
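
Denormalized counters can drift or may not exist yet for older posts. A sketch of a one-off backfill that recomputes `like_count` from the actual `LIKED` relationships:

```cypher theme={null}
// Recompute like_count for every post, including posts with zero likes
MATCH (post:Post)
OPTIONAL MATCH (post)<-[l:LIKED]-()
WITH post, count(l) AS likes
SET post.like_count = likes;
```

`OPTIONAL MATCH` keeps posts without likes in the result, so they get `like_count = 0` rather than being skipped.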

***

## Summary

**Design Principles**:

1. **Query-driven**: Start with questions, design graph to answer them
2. **Entities → Nodes**: Domain objects become nodes
3. **Connections → Relationships**: Actions, associations, hierarchies
4. **Intermediate nodes**: When relationships need relationships
5. **Denormalize**: Duplicate data for read performance

**Anti-Patterns to Avoid**:

* Dense nodes (millions of relationships)
* Redundant relationships
* Property explosion

**Next**: Apply these patterns to graph algorithms!

***

## What's Next?

<Card title="Module 6: Graph Algorithms" icon="project-diagram" href="/distributed-systems-tools/neo4j-graph-algorithms">
  Implement PageRank, community detection, shortest paths, and centrality algorithms at scale
</Card>
