Graph Data Modeling Best Practices

Module Duration: 5-6 hours
Learning Style: Pattern-Based + Refactoring Examples + Real-World Schemas
Outcome: Design graph models that match query patterns and perform at scale

The Golden Rule

In relational databases: Normalize first, query later.
In graph databases: Design for your queries.

Graph modeling is query-driven. Start with:
  1. What questions do I need to answer?
  2. Design graph structure to answer them efficiently

Part 1: From Relational to Graph

Example: Social Network

Relational Schema:
users(id, name, email, city)
friendships(user_id, friend_id, since)
Problem: Finding friends-of-friends requires a self-join on the friendships table:
SELECT u2.name
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN users u2 ON f2.friend_id = u2.id
WHERE f1.user_id = 123;
Graph Model:
(:User {id, name, email, city})-[:FRIENDS_WITH {since}]->(:User)
Query:
MATCH (user:User {id: 123})-[:FRIENDS_WITH*2]->(fof)
RETURN fof.name
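Benefit: No JOINs. The traversal follows relationships directly from the starting node, so its cost depends on the size of the neighborhood explored rather than on the total number of rows in a join table.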
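To make the traversal idea concrete, here is a minimal in-memory sketch in Python of the two-hop lookup. It is not a graph database API: the adjacency dictionary, the sample edges, and the function name are all hypothetical, and unlike the Cypher query above it returns each friend-of-friend only once.

```python
from collections import defaultdict

# Hypothetical sample data: directed FRIENDS_WITH edges keyed by user id.
friends_with = defaultdict(set)
for src, dst in [(123, 1), (123, 2), (1, 3), (2, 3), (2, 4), (4, 5)]:
    friends_with[src].add(dst)


def friends_of_friends(user_id):
    """Return the ids reachable in exactly two FRIENDS_WITH hops."""
    result = set()
    for friend in friends_with.get(user_id, ()):
        # Each hop is a direct lookup from the current node; no table scan.
        for fof in friends_with.get(friend, ()):
            result.add(fof)
    return result
```

The work done is proportional to the number of edges actually followed from user 123, which is the property the graph model relies on.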

Part 2: Modeling Patterns

Pattern 1: Entities as Nodes

Rule: Domain entities become nodes.
Example: E-commerce
(user:User {id, name, email})
(product:Product {id, name, price})
(order:Order {id, date, total})
Anti-Pattern: Storing lists as properties
// Bad: Storing friends as an array property
(:User {name: "Alice", friends: ["Bob", "Charlie"]})

// Good: Relationships
(:User {name: "Alice"})-[:KNOWS]->(:User {name: "Bob"})

Pattern 2: Relationships Capture Connections

Rule: Relationships represent actions, associations, or hierarchies.
Examples:
(user)-[:PURCHASED]->(product)
(employee)-[:WORKS_FOR]->(company)
(post)-[:TAGGED_WITH]->(tag)
(directory)-[:PARENT_OF]->(subdirectory)
When to use relationships vs properties:
Use Relationship        | Use Property
Connects two entities   | Describes single entity
Many-to-many            | One-to-one attribute
Traversable             | Filterable value
Example: KNOWS          | Example: name, age

Pattern 3: Relationship Properties

Use Case: Metadata about connections
(alice)-[:KNOWS {since: 2015, strength: 0.8}]->(bob)
(user)-[:RATED {score: 5, timestamp: datetime()}]->(movie)
(employee)-[:WORKS_AT {role: "Engineer", start_date: date()}]->(company)

Pattern 4: Intermediate Nodes

Problem: Relationships can’t have relationships!
Example: A user enrolls in a course on a specific date and gets a grade.
// Limited: date and grade fit as relationship properties, but the
// ENROLLED relationship itself cannot connect to another node
(user)-[:ENROLLED {date, grade}]->(course)

// Solution: Intermediate node
(user)-[:ENROLLED]->(enrollment:Enrollment {date, grade})-[:IN_COURSE]->(course)
Real-World Example: Order line items
(user)-[:PLACED]->(order:Order)-[:CONTAINS]->(lineItem:LineItem {quantity, price})-[:FOR_PRODUCT]->(product)
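As a rough illustration of why the line item earns its own node, the Python sketch below models it as an object that sits between an order and a product and carries the per-order data. The class names mirror the labels above, but the fields, sample values, and the total() helper are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str


@dataclass
class LineItem:
    # Intermediate node: data about one product within one order.
    product: Product
    quantity: int
    price: float


@dataclass
class Order:
    items: list = field(default_factory=list)

    def total(self):
        """Sum quantity * price over the order's line items."""
        return sum(item.quantity * item.price for item in self.items)


# Hypothetical sample data.
widget = Product("Widget")
gadget = Product("Gadget")
order = Order([LineItem(widget, 2, 10.0), LineItem(gadget, 1, 5.5)])
```

Because quantity and price live on the line item rather than on the product, the same product can appear in many orders at different prices.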

Pattern 5: Multiple Labels

Use Case: Entity belongs to multiple categories
CREATE (alice:Person:Customer:VIP {name: "Alice"})
CREATE (bob:Person:Employee:Manager {name: "Bob"})
Query by specific role:
MATCH (vip:VIP)
RETURN vip.name
Benefits:
  • Fine-grained querying
  • Index optimization (indexes per label)

Part 3: Time-Series and Versioning

Pattern 6: Time-Series Events

Example: User actions timeline
(user:User)-[:PERFORMED]->(event:Event {type: "login", timestamp: datetime()})
(user)-[:PERFORMED]->(event2:Event {type: "purchase", timestamp: datetime(), amount: 99.99})
Query: Recent events
MATCH (user:User {id: 123})-[:PERFORMED]->(event:Event)
WHERE event.timestamp > datetime() - duration({days: 7})
RETURN event
ORDER BY event.timestamp DESC
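The same seven-day window can be sketched in plain Python. This is only an in-memory illustration: the event dictionaries, the fixed "now" value, and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def recent_events(events, now, days=7):
    """Return events newer than now - days, most recent first."""
    cutoff = now - timedelta(days=days)
    recent = [e for e in events if e["timestamp"] > cutoff]
    return sorted(recent, key=lambda e: e["timestamp"], reverse=True)


# Hypothetical sample data with a fixed clock so results are repeatable.
now = datetime(2024, 1, 10, 12, 0)
events = [
    {"type": "login", "timestamp": datetime(2024, 1, 9, 8, 0)},
    {"type": "purchase", "timestamp": datetime(2024, 1, 1, 8, 0)},
    {"type": "login", "timestamp": datetime(2024, 1, 5, 8, 0)},
]
```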

Pattern 7: Versioning (Valid-Time Model)

Scenario: Track changes over time (audit trail)
// Version 1 (original)
(product:Product {id: 123})-[:VERSION {valid_from: date("2020-01-01"), valid_to: date("2021-01-01")}]->
  (v1:ProductVersion {price: 10.00, name: "Widget"})

// Version 2 (price change)
(product)-[:VERSION {valid_from: date("2021-01-01"), valid_to: null}]->
  (v2:ProductVersion {price: 12.00, name: "Widget"})
Query: Price at specific date
MATCH (product:Product {id: 123})-[v:VERSION]->(version)
WHERE date("2020-06-15") >= v.valid_from AND (v.valid_to IS NULL OR date("2020-06-15") < v.valid_to)
RETURN version.price
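The date check in the query treats each version as a half-open interval: valid_from is inclusive, valid_to is exclusive, and a missing valid_to means "still current". A minimal Python sketch of that rule, using hypothetical records that mirror the two versions above:

```python
from datetime import date

# Hypothetical version records; valid_to=None marks the open-ended version.
versions = [
    {"valid_from": date(2020, 1, 1), "valid_to": date(2021, 1, 1), "price": 10.00},
    {"valid_from": date(2021, 1, 1), "valid_to": None, "price": 12.00},
]


def price_at(versions, day):
    """Return the price valid on the given day, or None if no version applies."""
    for v in versions:
        starts_ok = v["valid_from"] <= day
        ends_ok = v["valid_to"] is None or day < v["valid_to"]
        if starts_ok and ends_ok:
            return v["price"]
    return None
```

With half-open intervals the boundary date 2021-01-01 belongs to exactly one version, so adjacent versions never overlap.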

Part 4: Handling Hierarchies

Pattern 8: Tree Structures

Example: File system
(root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})
(home)-[:CONTAINS]->(user:Folder {name: "alice"})
(user)-[:CONTAINS]->(file:File {name: "document.txt"})
Query: All files under /home
MATCH (root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})-[:CONTAINS*]->(item:File)
RETURN item.name
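What a variable-length CONTAINS traversal does can be sketched in Python as an iterative walk over child lists. The dictionary below is a hypothetical stand-in for the file-system graph and, for brevity, assumes names are unique, which a real file system would not guarantee.

```python
# Hypothetical CONTAINS edges: parent name -> list of child names.
contains = {
    "/": ["home"],
    "home": ["alice"],
    "alice": ["document.txt"],
}
# Names that carry the File label in this toy data set.
files = {"document.txt"}


def descendants(node):
    """Return every item reachable through one or more CONTAINS hops."""
    stack = list(contains.get(node, []))
    found = []
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(contains.get(current, []))
    return found


def files_under(node):
    """Keep only the descendants that are files."""
    return [name for name in descendants(node) if name in files]
```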

Pattern 9: Hierarchies with Shortcuts

Problem: Deep hierarchies slow down queries.
Solution: Add shortcut relationships, and keep them in sync whenever the hierarchy changes.
// Full path
(company)-[:CHILD]->(division)-[:CHILD]->(dept)-[:CHILD]->(team)-[:CHILD]->(employee)

// Add shortcuts
(company)-[:ALL_EMPLOYEES]->(employee)
Query:
// Fast: Direct relationship
MATCH (company:Company {name: "Acme"})-[:ALL_EMPLOYEES]->(emp)
RETURN count(emp)

// Slow: Deep traversal
MATCH (company:Company {name: "Acme"})-[:CHILD*]->(emp:Employee)
RETURN count(emp)
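The cost of a shortcut is that it has to be maintained on every write. The Python sketch below shows one way that bookkeeping could look; the OrgChart class and its method names are hypothetical, and it only handles additions, not moves or deletions.

```python
class OrgChart:
    """Toy hierarchy with CHILD edges plus an ALL_EMPLOYEES shortcut."""

    def __init__(self):
        self.children = {}       # CHILD edges: parent -> list of children
        self.parent = {}         # reverse lookup: child -> parent
        self.employees = set()   # nodes carrying the Employee label
        self.all_employees = {}  # shortcut: company -> set of employees

    def add_child(self, parent, child):
        self.children.setdefault(parent, []).append(child)
        self.parent[child] = parent

    def add_employee(self, team, employee):
        self.add_child(team, employee)
        self.employees.add(employee)
        # Walk up to the root company and record the shortcut there.
        root = team
        while root in self.parent:
            root = self.parent[root]
        self.all_employees.setdefault(root, set()).add(employee)

    def count_deep(self, company):
        """Count employees by traversing the whole hierarchy."""
        count, stack = 0, [company]
        while stack:
            node = stack.pop()
            for child in self.children.get(node, []):
                if child in self.employees:
                    count += 1
                stack.append(child)
        return count

    def count_shortcut(self, company):
        """Count employees using only the shortcut set."""
        return len(self.all_employees.get(company, ()))


# Hypothetical sample data.
org = OrgChart()
org.add_child("Acme", "Engineering")
org.add_child("Engineering", "Platform")
org.add_child("Platform", "Core Team")
org.add_employee("Core Team", "alice")
org.add_employee("Core Team", "bob")
```

Both counts agree only as long as every write path updates the shortcut, which is the discipline this pattern demands.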

Part 5: Many-to-Many Relationships

Pattern 10: Tags and Categories

Example: Blog posts with tags
(post:Post {title: "Graph Databases 101"})-[:TAGGED]->(tag:Tag {name: "databases"})
(post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
Query: Posts with specific tag
MATCH (post:Post)-[:TAGGED]->(tag:Tag {name: "databases"})
RETURN post.title
Query: Posts with multiple tags (AND)
MATCH (post:Post)-[:TAGGED]->(tag1:Tag {name: "databases"})
MATCH (post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
RETURN post.title
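The AND query amounts to a subset test: a post qualifies when its tag set contains every required tag. A small Python sketch with hypothetical post titles and tags:

```python
# Hypothetical TAGGED edges: post title -> set of tag names.
tagged = {
    "Graph Databases 101": {"databases", "nosql"},
    "SQL Tuning": {"databases"},
    "Key-Value Stores": {"nosql"},
}


def posts_with_all_tags(required):
    """Return titles of posts that carry every tag in `required`."""
    required = set(required)
    return sorted(title for title, tags in tagged.items() if required <= tags)
```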

Pattern 11: User Roles and Permissions

(user:User)-[:HAS_ROLE]->(role:Role {name: "Admin"})
(role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
Query: Check if user has permission
MATCH (user:User {id: 123})-[:HAS_ROLE]->(:Role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
RETURN count(perm) > 0 AS has_permission
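The permission check is a two-hop lookup: user to roles, then roles to permissions. The following Python sketch shows the same logic over plain dictionaries; the user ids, role names, and extra permissions are made up for illustration.

```python
# Hypothetical HAS_ROLE and HAS_PERMISSION edges.
user_roles = {123: {"Admin"}, 456: {"Viewer"}}
role_permissions = {
    "Admin": {"DELETE_USER", "VIEW_USER"},
    "Viewer": {"VIEW_USER"},
}


def has_permission(user_id, permission):
    """True if any of the user's roles grants the permission."""
    return any(
        permission in role_permissions.get(role, set())
        for role in user_roles.get(user_id, set())
    )
```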

Part 6: Modeling Anti-Patterns

Anti-Pattern 1: Dense Nodes

Problem: Node with millions of relationships (e.g., a celebrity with 10M followers)
Issue: Traversing all relationships is slow
Solution:
  1. Fan-out to intermediate nodes:
(celebrity)-[:HAS_FOLLOWERS]->(bucket1:FollowerBucket)-[:CONTAINS]->(follower1)
(celebrity)-[:HAS_FOLLOWERS]->(bucket2:FollowerBucket)-[:CONTAINS]->(follower2)
  2. Use properties for aggregates:
(celebrity {follower_count: 10000000})
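The text does not say how followers are assigned to buckets. One possible approach, assumed here purely for illustration, is to hash the follower id so that a membership check touches a single bucket. The sketch also keeps the follower_count aggregate in step with the buckets. The bucket count, class, and ids are hypothetical.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical; a real system would size this to its data


def bucket_for(follower_id):
    """Pick a bucket index; crc32 is stable across runs, unlike hash() on str."""
    return zlib.crc32(follower_id.encode("utf-8")) % NUM_BUCKETS


class Celebrity:
    def __init__(self):
        self.buckets = [set() for _ in range(NUM_BUCKETS)]  # FollowerBucket nodes
        self.follower_count = 0                             # aggregate property

    def add_follower(self, follower_id):
        bucket = self.buckets[bucket_for(follower_id)]
        if follower_id not in bucket:
            bucket.add(follower_id)
            self.follower_count += 1

    def is_followed_by(self, follower_id):
        # Only one bucket is inspected, not the full follower set.
        return follower_id in self.buckets[bucket_for(follower_id)]


# Hypothetical sample data.
celebrity = Celebrity()
for i in range(1000):
    celebrity.add_follower(f"user{i}")
celebrity.add_follower("user0")  # duplicate follow is ignored
```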

Anti-Pattern 2: Redundant Relationships

Problem: The same information is stored both as a property and as a relationship.
Example:
// Bad: Redundant
(user {city: "NYC"})-[:LIVES_IN]->(city:City {name: "NYC"})

// Good: Pick one approach
(user)-[:LIVES_IN]->(city:City {name: "NYC"})  // If city has other properties
// OR
(user {city: "NYC"})  // If city is just a string

Anti-Pattern 3: Property Explosion

Problem: Too many properties on a node
// Bad
(product {name, price, weight, height, width, color, material, manufacturer, ...})  // 50+ properties

// Good: Use labels and relationships
(product:Product {name, price})
(product)-[:HAS_DIMENSIONS]->(dims:Dimensions {height, width, weight})
(product)-[:MANUFACTURED_BY]->(company:Company)

Part 7: Real-World Examples

Example 1: Social Media Platform

Requirements:
  • Users post content
  • Users follow each other
  • Posts have likes and comments
  • Posts are tagged
Model:
(user:User {id, name, email, joined: date()})
(post:Post {id, content, timestamp: datetime()})
(comment:Comment {id, text, timestamp: datetime()})
(tag:Tag {name})

(user)-[:POSTED]->(post)
(user)-[:FOLLOWS {since: date()}]->(user2:User)
(user)-[:LIKED {timestamp: datetime()}]->(post)
(user)-[:COMMENTED]->(comment)-[:ON]->(post)
(post)-[:TAGGED_WITH]->(tag)
Queries:
  1. Newsfeed: Posts from people I follow
MATCH (me:User {id: 123})-[:FOLLOWS]->(friend)-[:POSTED]->(post)
RETURN post
ORDER BY post.timestamp DESC
LIMIT 20
  2. Popular posts: Most liked
MATCH (post:Post)<-[like:LIKED]-()
WITH post, count(like) AS likes
ORDER BY likes DESC
LIMIT 10
RETURN post.content, likes
  3. Trending tags: Most used in last 24 hours
MATCH (post:Post)-[:TAGGED_WITH]->(tag:Tag)
WHERE post.timestamp > datetime() - duration({hours: 24})
WITH tag, count(post) AS usage
ORDER BY usage DESC
LIMIT 10
RETURN tag.name, usage

Example 2: Recommendation Engine

Model:
(user:User)-[:PURCHASED]->(product:Product)
(product)-[:IN_CATEGORY]->(category:Category)
(product)-[:SIMILAR_TO {score: 0.85}]->(product2:Product)
Query: Recommend products
// Collaborative filtering: Users like you also bought
MATCH (me:User {id: 123})-[:PURCHASED]->(product:Product)
MATCH (product)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
WITH rec, count(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 10
RETURN rec.name, score
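The scoring in this query counts, for each candidate product, how many distinct other users share at least one purchase with me and also bought that product. A Python sketch of the same counting over hypothetical purchase sets:

```python
from collections import Counter

# Hypothetical PURCHASED edges: user id -> set of product names.
purchases = {
    123: {"laptop", "mouse"},
    1: {"laptop", "keyboard"},
    2: {"mouse", "keyboard", "monitor"},
    3: {"desk"},
}


def recommend(user_id, limit=10):
    """Rank products bought by users who share a purchase with user_id."""
    mine = purchases.get(user_id, set())
    scores = Counter()
    for other, theirs in purchases.items():
        # Skip myself and anyone with no purchase in common.
        if other == user_id or not (mine & theirs):
            continue
        # Each overlapping user votes once per product I do not own yet.
        for product in theirs - mine:
            scores[product] += 1
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

User 3 contributes nothing because there is no shared purchase, which matches the role of the second MATCH in the query.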

Example 3: Knowledge Graph

Model:
(entity:Entity {name, type})
(entity)-[:RELATED_TO {relationship: "founded", date: date()}]->(entity2:Entity)
Example Data:
CREATE (steve:Person {name: "Steve Jobs"})
CREATE (apple:Company {name: "Apple"})
CREATE (iphone:Product {name: "iPhone"})
CREATE (steve)-[:FOUNDED {year: 1976}]->(apple)
CREATE (steve)-[:INVENTED]->(iphone)
CREATE (apple)-[:PRODUCES]->(iphone)
Query: What did Steve Jobs create?
MATCH (steve:Person {name: "Steve Jobs"})-[r]->(thing)
RETURN type(r) AS relationship, thing.name

Part 8: Refactoring Strategies

Refactoring 1: From Properties to Relationships

Before:
(user:User {city: "NYC", country: "USA"})
After (if cities have more data):
(user:User)-[:LIVES_IN]->(city:City {name: "NYC"})-[:IN_COUNTRY]->(country:Country {name: "USA"})
When to refactor: When you need to query by location, aggregate by city, or add city-specific properties.

Refactoring 2: Adding Intermediate Nodes

Before:
(user)-[:ENROLLED {grade: "A", semester: "Fall 2023"}]->(course)
After:
(user)-[:ENROLLED]->(enrollment:Enrollment {grade: "A", semester: "Fall 2023"})-[:IN_COURSE]->(course)
Benefit: The enrollment node can now have its own relationships (teacher, classroom, etc.).

Refactoring 3: Denormalization for Performance

Before (normalized):
MATCH (post:Post)<-[:LIKED]-(user)
RETURN post.title, count(user) AS likes  // Count at query time (slow!)
After (denormalized):
// Update like_count on every like (coalesce handles posts that have no count yet)
MATCH (post:Post {id: 123})
SET post.like_count = coalesce(post.like_count, 0) + 1

// Query (fast!)
MATCH (post:Post)
RETURN post.title, post.like_count
ORDER BY post.like_count DESC
Trade-off: Faster reads, slower writes (acceptable for read-heavy workloads)
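A denormalized counter is only useful if it stays consistent with the relationships it summarizes. The Python sketch below updates both in one step and ignores duplicate likes. The Post class and its fields are hypothetical, and a real database would need the two updates to happen in the same transaction.

```python
class Post:
    def __init__(self, title):
        self.title = title
        self.liked_by = set()  # stands in for incoming LIKED relationships
        self.like_count = 0    # denormalized copy of len(liked_by)

    def like(self, user_id):
        """Record a like; return False if this user already liked the post."""
        if user_id in self.liked_by:
            return False
        self.liked_by.add(user_id)
        self.like_count += 1
        return True


# Hypothetical sample data.
post = Post("Graph Databases 101")
post.like("alice")
post.like("bob")
post.like("alice")  # duplicate, does not change the count
```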

Summary

Design Principles:
  1. Query-driven: Start with questions, design graph to answer them
  2. Entities → Nodes: Domain objects become nodes
  3. Connections → Relationships: Actions, associations, hierarchies
  4. Intermediate nodes: When relationships need relationships
  5. Denormalize: Duplicate data for read performance
Anti-Patterns to Avoid:
  • Dense nodes (millions of relationships)
  • Redundant relationships
  • Property explosion
Next: Apply these patterns to graph algorithms!

What’s Next?

Module 6: Graph Algorithms

Implement PageRank, community detection, shortest paths, and centrality algorithms at scale