Graph Data Modeling Best Practices
The Golden Rule
Part 1: From Relational to Graph
Example: Social Network
Part 2: Modeling Patterns
Pattern 1: Entities as Nodes
Pattern 2: Relationships Capture Connections
Pattern 3: Relationship Properties
Pattern 4: Intermediate Nodes
Pattern 5: Multiple Labels
Part 3: Time-Series and Versioning
Pattern 6: Time-Series Events
Pattern 7: Versioning (Bi-Temporal Model)
Part 4: Handling Hierarchies
Pattern 8: Tree Structures
Pattern 9: Hierarchies with Shortcuts
Part 5: Many-to-Many Relationships
Pattern 10: Tags and Categories
Pattern 11: User Roles and Permissions
Part 6: Modeling Anti-Patterns
Anti-Pattern 1: Dense Nodes
Anti-Pattern 2: Redundant Relationships
Anti-Pattern 3: Property Explosion
Part 7: Real-World Examples
Example 1: Social Media Platform
Example 2: Recommendation Engine
Example 3: Knowledge Graph
Part 8: Refactoring Strategies
Refactoring 1: From Properties to Relationships
Refactoring 2: Adding Intermediate Nodes
Refactoring 3: Denormalization for Performance
Summary
What’s Next?

Graph Data Modeling Best Practices

Module Duration: 5-6 hours
Learning Style: Pattern-Based + Refactoring Examples + Real-World Schemas
Outcome: Design graph models that match query patterns and perform at scale

The Golden Rule

In relational databases: Normalize first, query later
In graph databases: Design for your queries

Graph modeling is query-driven. Start with:

What questions do I need to answer?
Design graph structure to answer them efficiently

Part 1: From Relational to Graph

Example: Social Network

Relational Schema:

users(id, name, email, city)
friendships(user_id, friend_id, since)

Problem: Finding friends-of-friends requires a self-join:

SELECT u2.name
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN users u2 ON f2.friend_id = u2.id
WHERE f1.user_id = 123;

Graph Model:

(:User {id, name, email, city})-[:FRIENDS_WITH {since}]->(:User)

Query:

MATCH (user:User {id: 123})-[:FRIENDS_WITH*2]->(fof)
RETURN fof.name

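Benefit: No explicit JOINs. The query follows stored relationships directly, so multi-hop lookups usually stay fast as depth grows; the actual speedup over SQL depends on data size, indexing, and traversal depth.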
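The two-hop pattern can also return the starting user (when friendships are stored in both directions) and people who are already direct friends. A minimal sketch of a stricter variant, assuming the same `User` label and `FRIENDS_WITH` direction as the model above:

```cypher
// Friends-of-friends, excluding the user and existing direct friends.
MATCH (user:User {id: 123})-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(fof)
WHERE fof <> user
  AND NOT (user)-[:FRIENDS_WITH]->(fof)
RETURN DISTINCT fof.name
```

`DISTINCT` collapses people who are reachable through more than one mutual friend.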

Part 2: Modeling Patterns

Pattern 1: Entities as Nodes

Rule: Domain entities become nodes
Example: E-commerce

(user:User {id, name, email})
(product:Product {id, name, price})
(order:Order {id, date, total})

Anti-Pattern: Storing lists as properties

// Bad: Storing friends as an array
(:User {name: "Alice", friends: ["Bob", "Charlie"]})

// Good: Relationships
(:User {name: "Alice"})-[:KNOWS]->(:User {name: "Bob"})
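As a hedged sketch of how the relationship version could be written so that it is safe to run more than once, `MERGE` can be used for both the nodes and the relationship. It assumes that `name` identifies a user, which a real schema would more likely do with an `id`:

```cypher
// Create Alice and Bob only if they do not exist yet, then connect them.
MERGE (alice:User {name: "Alice"})
MERGE (bob:User {name: "Bob"})
MERGE (alice)-[:KNOWS]->(bob)
```

Merging the nodes first and the relationship last avoids creating duplicate users when only one of them already exists.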

Pattern 2: Relationships Capture Connections

Rule: Relationships represent actions, associations, or hierarchies
Examples:

(user)-[:PURCHASED]->(product)
(employee)-[:WORKS_FOR]->(company)
(post)-[:TAGGED_WITH]->(tag)
(directory)-[:PARENT_OF]->(subdirectory)

When to use relationships vs properties:

Use Relationship	Use Property
Connects two entities	Describes single entity
Many-to-many	One-to-one attribute
Traversable	Filterable value
Example: KNOWS	Example: name, age

Pattern 3: Relationship Properties

Use Case: Metadata about connections

(alice)-[:KNOWS {since: 2015, strength: 0.8}]->(bob)
(user)-[:RATED {score: 5, timestamp: datetime()}]->(movie)
(employee)-[:WORKS_AT {role: "Engineer", start_date: date()}]->(company)
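Relationship properties can be used in filters and ordering just like node properties. A small sketch, assuming the nodes carry a `User` label (the examples above leave the label out) and that `since` holds a year:

```cypher
// Long-standing, strong connections of Alice, strongest first.
MATCH (alice:User {name: "Alice"})-[k:KNOWS]->(friend)
WHERE k.since <= 2015 AND k.strength >= 0.5
RETURN friend.name, k.since, k.strength
ORDER BY k.strength DESC
```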

Pattern 4: Intermediate Nodes

Problem: Relationships can’t have relationships!
Example: A user enrolls in a course on a specific date and gets a grade

// Limitation: a relationship can carry properties, but it cannot connect to a third node (e.g., an instructor)
(user)-[:ENROLLED {date, grade}]->(course)

// Solution: Intermediate node
(user)-[:ENROLLED]->(enrollment:Enrollment {date, grade})-[:IN_COURSE]->(course)

Real-World Example: Order line items

(user)-[:PLACED]->(order:Order)-[:CONTAINS]->(lineItem:LineItem {quantity, price})-[:FOR_PRODUCT]->(product)
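One payoff of the `LineItem` node is that per-line values live in one place and can be aggregated. A minimal sketch, assuming the labels from the model above; the order id `1001` is a made-up value:

```cypher
// Per-product totals for one order, largest line first.
MATCH (order:Order {id: 1001})-[:CONTAINS]->(item:LineItem)-[:FOR_PRODUCT]->(product:Product)
RETURN product.name AS product,
       item.quantity AS quantity,
       item.quantity * item.price AS line_total
ORDER BY line_total DESC
```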

Pattern 5: Multiple Labels

Use Case: Entity belongs to multiple categories

CREATE (alice:Person:Customer:VIP {name: "Alice"})
CREATE (bob:Person:Employee:Manager {name: "Bob"})

Query by specific role:

MATCH (vip:VIP)
RETURN vip.name

Benefits:

Fine-grained querying
Index optimization (indexes per label)
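A sketch of both benefits, assuming the index syntax of recent Neo4j versions (other Cypher engines may differ); the index name `person_name` is hypothetical:

```cypher
// An index is scoped to one label, so it only covers :Person nodes.
CREATE INDEX person_name IF NOT EXISTS
FOR (p:Person) ON (p.name);

// Labels can be added later without recreating the node.
MATCH (bob:Employee {name: "Bob"})
SET bob:Customer
RETURN labels(bob) AS labels;
```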

Part 3: Time-Series and Versioning

Pattern 6: Time-Series Events

Example: User actions timeline

(user:User)-[:PERFORMED]->(event:Event {type: "login", timestamp: datetime()})
(user)-[:PERFORMED]->(event2:Event {type: "purchase", timestamp: datetime(), amount: 99.99})

Query: Recent events

MATCH (user:User {id: 123})-[:PERFORMED]->(event:Event)
WHERE event.timestamp > datetime() - duration({days: 7})
RETURN event
ORDER BY event.timestamp DESC
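The same time window can feed an aggregation instead of a raw listing. A minimal sketch that reuses the model above:

```cypher
// How many events of each type did the user perform in the last 7 days?
MATCH (user:User {id: 123})-[:PERFORMED]->(event:Event)
WHERE event.timestamp > datetime() - duration({days: 7})
RETURN event.type AS type, count(*) AS occurrences
ORDER BY occurrences DESC
```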

Pattern 7: Versioning (Bi-Temporal Model)

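Scenario: Track changes over time (audit trail). The example below records valid time only (valid_from / valid_to); a fully bi-temporal model would also record transaction time, i.e. when each version was written to the database.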

// Version 1 (original)
(product:Product {id: 123})-[:VERSION {valid_from: date("2020-01-01"), valid_to: date("2021-01-01")}]->
  (v1:ProductVersion {price: 10.00, name: "Widget"})

// Version 2 (price change)
(product)-[:VERSION {valid_from: date("2021-01-01"), valid_to: null}]->
  (v2:ProductVersion {price: 12.00, name: "Widget"})

Query: Price at specific date

MATCH (product:Product {id: 123})-[v:VERSION]->(version)
WHERE date("2020-06-15") >= v.valid_from AND (v.valid_to IS NULL OR date("2020-06-15") < v.valid_to)
RETURN version.price
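Writing a new version means closing the open one and attaching its successor. A hedged sketch, assuming at most one open version per product; the date and price are made-up values:

```cypher
// Close the currently open version, then attach the next one.
MATCH (product:Product {id: 123})-[current:VERSION]->(:ProductVersion)
WHERE current.valid_to IS NULL
SET current.valid_to = date("2022-01-01")
CREATE (product)-[:VERSION {valid_from: date("2022-01-01")}]->
       (:ProductVersion {price: 13.50, name: "Widget"})
```

If the product has no open version, the `MATCH` returns no rows and nothing is created, so a first version needs its own statement.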

Part 4: Handling Hierarchies

Pattern 8: Tree Structures

Example: File system

(root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})
(home)-[:CONTAINS]->(user:Folder {name: "alice"})
(user)-[:CONTAINS]->(file:File {name: "document.txt"})

Query: All files under /home

MATCH (root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})-[:CONTAINS*]->(item:File)
RETURN item.name
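A tree model also lets the full path be rebuilt from the traversal itself. A minimal sketch, assuming the sample folders above and that the file name is unique in the tree:

```cypher
// Rebuild the path from the root down to the file.
MATCH path = (root:Folder {name: "/"})-[:CONTAINS*]->(file:File {name: "document.txt"})
RETURN reduce(p = "", n IN nodes(path)[1..] | p + "/" + n.name) AS full_path
```

The slice `[1..]` skips the root node, whose name is already `/`. With the sample data this yields `/home/alice/document.txt`.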

Pattern 9: Hierarchies with Shortcuts

Problem: Deep hierarchies slow down queries
Solution: Add shortcut relationships (which must then be kept in sync when the hierarchy changes)

// Full path
(company)-[:CHILD]->(division)-[:CHILD]->(dept)-[:CHILD]->(team)-[:CHILD]->(employee)

// Add shortcuts
(company)-[:ALL_EMPLOYEES]->(employee)

Query:

// Fast: Direct relationship
MATCH (company:Company {name: "Acme"})-[:ALL_EMPLOYEES]->(emp)
RETURN count(emp)

// Slow: Deep traversal
MATCH (company:Company {name: "Acme"})-[:CHILD*]->(emp:Employee)
RETURN count(emp)
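The shortcut relationships have to come from somewhere. One possible way to build or refresh them, sketched under the assumption that the `CHILD` hierarchy is the source of truth:

```cypher
// Derive ALL_EMPLOYEES from the hierarchy; MERGE keeps it idempotent.
MATCH (company:Company {name: "Acme"})-[:CHILD*]->(emp:Employee)
MERGE (company)-[:ALL_EMPLOYEES]->(emp)
```

This pays the deep traversal once at write time. Shortcuts for employees who leave the hierarchy still need to be deleted separately.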

Part 5: Many-to-Many Relationships

Pattern 10: Tags and Categories

Example: Blog posts with tags

(post:Post {title: "Graph Databases 101"})-[:TAGGED]->(tag:Tag {name: "databases"})
(post)-[:TAGGED]->(tag2:Tag {name: "nosql"})

Query: Posts with specific tag

MATCH (post:Post)-[:TAGGED]->(tag:Tag {name: "databases"})
RETURN post.title

Query: Posts with multiple tags (AND)

MATCH (post:Post)-[:TAGGED]->(tag1:Tag {name: "databases"})
MATCH (post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
RETURN post.title
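Chaining one `MATCH` per tag gets unwieldy as the list grows. A hedged sketch of a variant that takes the tags as a list, assuming tag names are unique:

```cypher
// Posts that carry every tag in the list.
WITH ["databases", "nosql"] AS wanted
MATCH (post:Post)-[:TAGGED]->(tag:Tag)
WHERE tag.name IN wanted
WITH post, wanted, count(DISTINCT tag) AS matched
WHERE matched = size(wanted)
RETURN post.title
```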

Pattern 11: User Roles and Permissions

(user:User)-[:HAS_ROLE]->(role:Role {name: "Admin"})
(role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})

Query: Check if user has permission

MATCH (user:User {id: 123})-[:HAS_ROLE]->(:Role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
RETURN count(perm) > 0 AS has_permission
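The same path can list everything a user is allowed to do, along with the roles that grant it. A minimal sketch using the model above:

```cypher
// All permissions of a user and the roles each one comes from.
MATCH (user:User {id: 123})-[:HAS_ROLE]->(role:Role)-[:HAS_PERMISSION]->(perm:Permission)
RETURN perm.name AS permission, collect(DISTINCT role.name) AS granted_by
ORDER BY permission
```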

Part 6: Modeling Anti-Patterns

Anti-Pattern 1: Dense Nodes

Problem: Node with millions of relationships (celebrity with 10M followers)
Issue: Traversing all relationships is slow
Solution:

Fan-out to intermediate nodes:

(celebrity)-[:HAS_FOLLOWERS]->(bucket1:FollowerBucket)-[:CONTAINS]->(follower1)
(celebrity)-[:HAS_FOLLOWERS]->(bucket2:FollowerBucket)-[:CONTAINS]->(follower2)

Use properties for aggregates:

(celebrity {follower_count: 10000000})
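One possible way to assign followers to buckets is to derive the bucket from the follower's id. This is only a sketch: the `shard` property, the bucket count of 16, the `User` label, and the ids are all assumptions, not part of the model above:

```cypher
// Put the follower into one of 16 buckets chosen by id.
MATCH (celebrity:User {id: 42}), (fan:User {id: 123})
MERGE (celebrity)-[:HAS_FOLLOWERS]->(bucket:FollowerBucket {shard: fan.id % 16})
MERGE (bucket)-[:CONTAINS]->(fan)
```

Because the bucket is computed from the id, checking whether someone follows the celebrity only has to look inside a single bucket.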

Anti-Pattern 2: Redundant Relationships

Problem: The same information is stored both as a property and as a relationship
Example:

// Bad: Redundant
(user {city: "NYC"})-[:LIVES_IN]->(city:City {name: "NYC"})

// Good: Pick one approach
(user)-[:LIVES_IN]->(city:City {name: "NYC"})  // If city has other properties
// OR
(user {city: "NYC"})  // If city is just a string

Anti-Pattern 3: Property Explosion

Problem: Too many properties on a node

// Bad
(product {name, price, weight, height, width, color, material, manufacturer, ...})  // 50+ properties

// Good: Use labels and relationships
(product:Product {name, price})
(product)-[:HAS_DIMENSIONS]->(dims:Dimensions {height, width, weight})
(product)-[:MANUFACTURED_BY]->(company:Company)

Part 7: Real-World Examples

Example 1: Social Media Platform

Requirements:

Users post content
Users follow each other
Posts have likes and comments
Posts are tagged

Model:

(user:User {id, name, email, joined: date()})
(post:Post {id, content, timestamp: datetime()})
(comment:Comment {id, text, timestamp: datetime()})
(tag:Tag {name})

(user)-[:POSTED]->(post)
(user)-[:FOLLOWS {since: date()}]->(user2:User)
(user)-[:LIKED {timestamp: datetime()}]->(post)
(user)-[:COMMENTED]->(comment)-[:ON]->(post)
(post)-[:TAGGED_WITH]->(tag)

Queries:

Newsfeed: Posts from people I follow

MATCH (me:User {id: 123})-[:FOLLOWS]->(friend)-[:POSTED]->(post)
RETURN post
ORDER BY post.timestamp DESC
LIMIT 20

Popular posts: Most liked

MATCH (post:Post)<-[like:LIKED]-()
WITH post, count(like) AS likes
ORDER BY likes DESC
LIMIT 10
RETURN post.content, likes

Trending tags: Most used in last 24 hours

MATCH (post:Post)-[:TAGGED_WITH]->(tag:Tag)
WHERE post.timestamp > datetime() - duration({hours: 24})
WITH tag, count(post) AS usage
ORDER BY usage DESC
LIMIT 10
RETURN tag.name, usage

Example 2: Recommendation Engine

Model:

(user:User)-[:PURCHASED]->(product:Product)
(product)-[:IN_CATEGORY]->(category:Category)
(product)-[:SIMILAR_TO {score: 0.85}]->(product2:Product)

Query: Recommend products

// Collaborative filtering: Users like you also bought
MATCH (me:User {id: 123})-[:PURCHASED]->(product:Product)
MATCH (product)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
WITH rec, count(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 10
RETURN rec.name, score
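The model also stores `SIMILAR_TO` scores, which the collaborative query does not use. A hedged sketch of a content-based variant built on them:

```cypher
// Content-based: products similar to what I already bought.
MATCH (me:User {id: 123})-[:PURCHASED]->(:Product)-[s:SIMILAR_TO]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
RETURN rec.name, max(s.score) AS similarity
ORDER BY similarity DESC
LIMIT 10
```

Taking `max(s.score)` keeps one row per recommended product when several purchases point to it.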

Example 3: Knowledge Graph

Model:

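(entity:Entity {name, type})
(entity)-[:RELATED_TO {relationship: "founded", date: date()}]->(entity2:Entity)
// Alternative, used in the example data below: specific labels (Person, Company) and relationship types (FOUNDED, INVENTED)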

Example Data:

CREATE (steve:Person {name: "Steve Jobs"})
CREATE (apple:Company {name: "Apple"})
CREATE (iphone:Product {name: "iPhone"})
CREATE (steve)-[:FOUNDED {year: 1976}]->(apple)
CREATE (steve)-[:INVENTED]->(iphone)
CREATE (apple)-[:PRODUCES]->(iphone)

Query: What did Steve Jobs create?

MATCH (steve:Person {name: "Steve Jobs"})-[r]->(thing)
RETURN type(r) AS relationship, thing.name
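The query above returns every outgoing relationship. To narrow it to creation-like facts, the relationship types can be listed in the pattern. A minimal sketch using the types from the example data:

```cypher
// Only follow FOUNDED and INVENTED relationships.
MATCH (steve:Person {name: "Steve Jobs"})-[r:FOUNDED|INVENTED]->(thing)
RETURN type(r) AS relationship, thing.name
```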

Part 8: Refactoring Strategies

Refactoring 1: From Properties to Relationships

Before:

(user:User {city: "NYC", country: "USA"})

After (if cities have more data):

(user:User)-[:LIVES_IN]->(city:City {name: "NYC"})-[:IN_COUNTRY]->(country:Country {name: "USA"})

When to refactor: When you need to query by location, aggregate by city, or add city-specific properties.
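A sketch of how the migration itself might look, assuming every user to migrate has both properties set. It identifies cities by name alone, which is a simplification, since the same city name can exist in several countries:

```cypher
// Turn city/country properties into nodes and relationships.
MATCH (user:User)
WHERE user.city IS NOT NULL AND user.country IS NOT NULL
MERGE (country:Country {name: user.country})
MERGE (city:City {name: user.city})
MERGE (city)-[:IN_COUNTRY]->(country)
MERGE (user)-[:LIVES_IN]->(city)
REMOVE user.city, user.country
```

On a large graph this would normally be run in batches rather than as one transaction.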

Refactoring 2: Adding Intermediate Nodes

Before:

(user)-[:ENROLLED {grade: "A", semester: "Fall 2023"}]->(course)

After:

(user)-[:ENROLLED]->(enrollment:Enrollment {grade: "A", semester: "Fall 2023"})-[:IN_COURSE]->(course)

Benefits: Can add relationships to enrollment (teacher, classroom, etc.)
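One possible migration from the old shape to the new one, assuming `User` and `Course` labels (the patterns above leave them out):

```cypher
// Replace each direct ENROLLED relationship with an Enrollment node.
MATCH (user:User)-[old:ENROLLED]->(course:Course)
CREATE (user)-[:ENROLLED]->(:Enrollment {grade: old.grade, semester: old.semester})-[:IN_COURSE]->(course)
DELETE old
```

The `MATCH` only picks up `ENROLLED` relationships that end at a `Course`, so the new relationships, which end at an `Enrollment`, are not migrated a second time.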

Refactoring 3: Denormalization for Performance

Before (normalized):

MATCH (post:Post)<-[:LIKED]-(user)
RETURN post.title, count(user) AS likes  // Count at query time (slow!)

After (denormalized):

// Update like_count on every like (coalesce handles posts that have no like_count yet)
MATCH (post:Post {id: 123})
SET post.like_count = coalesce(post.like_count, 0) + 1

// Query (fast!)
MATCH (post:Post)
RETURN post.title, post.like_count
ORDER BY post.like_count DESC

Trade-off: Faster reads, slower writes, and the counter must be kept in sync with the LIKED relationships (acceptable for read-heavy workloads)
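A stored counter can drift from the real number of likes, for example after a failed write or a bulk import. A minimal sketch of a periodic reconciliation job that recomputes it from the relationships:

```cypher
// Recompute like_count from the actual LIKED relationships.
MATCH (post:Post)
OPTIONAL MATCH (post)<-[like:LIKED]-()
WITH post, count(like) AS likes
SET post.like_count = likes
```

`OPTIONAL MATCH` keeps posts with no likes in the result, so their counter is set to 0 rather than left untouched.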

Summary

Design Principles:

Query-driven: Start with questions, design graph to answer them
Entities → Nodes: Domain objects become nodes
Connections → Relationships: Actions, associations, hierarchies
Intermediate nodes: When relationships need relationships
Denormalize: Duplicate data for read performance

Anti-Patterns to Avoid:

Dense nodes (millions of relationships)
Redundant relationships
Property explosion

Next: Apply these patterns to graph algorithms!

What’s Next?

Module 6: Graph Algorithms

Implement PageRank, community detection, shortest paths, and centrality algorithms at scale
