Graph Data Modeling Best Practices
Module Duration: 5-6 hours
Learning Style: Pattern-Based + Refactoring Examples + Real-World Schemas
Outcome: Design graph models that match query patterns and perform at scale
The Golden Rule
In relational databases: Normalize first, query later
In graph databases: Design for your queries
Graph modeling is query-driven. Start with:
What questions do I need to answer?
Design graph structure to answer them efficiently
Part 1: From Relational to Graph
Example: Social Network
Relational Schema:
users(id, name, email, city)
friendships(user_id, friend_id, since)
Problem: Finding friends-of-friends requires a self-join on the friendships table:
SELECT u2.name
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN users u2 ON f2.friend_id = u2.id
WHERE f1.user_id = 123;
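Graph Model:
(:User {id, name, email, city})-[:FRIENDS_WITH {since}]->(:User)
Query:
MATCH (user:User {id: 123})-[:FRIENDS_WITH*2]->(fof)
RETURN fof.name
Benefit: No JOINs. The traversal follows relationships directly from the starting node, so its cost tends to grow with the number of relationships visited rather than with the size of the whole friendships table.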
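To make the two-hop traversal concrete, here is a minimal plain-Python sketch of the same idea. It is not how a graph database executes the query; the adjacency dictionary `FRIENDS_WITH` and the helper `friends_of_friends` are hypothetical names with made-up sample data.

```python
from typing import Dict, Set

# Hypothetical sample data: user id -> ids reached by outgoing FRIENDS_WITH edges.
FRIENDS_WITH: Dict[int, Set[int]] = {
    123: {1, 2},
    1: {2, 3},
    2: {4, 123},
    3: set(),
    4: set(),
}


def friends_of_friends(graph: Dict[int, Set[int]], user_id: int) -> Set[int]:
    """Return every node reachable in exactly two outgoing hops."""
    result: Set[int] = set()
    for friend in graph.get(user_id, set()):
        # Second hop: follow each friend's own outgoing edges.
        result |= graph.get(friend, set())
    return result


print(sorted(friends_of_friends(FRIENDS_WITH, 123)))
```

Like the Cypher query above, this sketch does not filter out the starting user or direct friends; add an explicit exclusion if the application needs one. The work done depends only on the edges touched during the two hops.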
Part 2: Modeling Patterns
Pattern 1: Entities as Nodes
Rule: Domain entities become nodes
Example: E-commerce
(user:User {id, name, email})
(product:Product {id, name, price})
(order:Order {id, date, total})
Anti-Pattern: Storing lists as properties
// Bad: Storing friends as an array
(:User {name: "Alice", friends: ["Bob", "Charlie"]})
// Good: Relationships
(:User {name: "Alice"})-[:KNOWS]->(:User {name: "Bob"})
Pattern 2: Relationships Capture Connections
Rule: Relationships represent actions, associations, or hierarchies
Examples:
(user)-[:PURCHASED]->(product)
(employee)-[:WORKS_FOR]->(company)
(post)-[:TAGGED_WITH]->(tag)
(directory)-[:PARENT_OF]->(subdirectory)
When to use relationships vs properties:
| Use Relationship      | Use Property            |
|-----------------------|-------------------------|
| Connects two entities | Describes single entity |
| Many-to-many          | One-to-one attribute    |
| Traversable           | Filterable value        |
| Example: KNOWS        | Example: name, age      |
Pattern 3: Relationship Properties
Use Case: Metadata about connections
(alice)-[:KNOWS {since: 2015, strength: 0.8}]->(bob)
(user)-[:RATED {score: 5, timestamp: datetime()}]->(movie)
(employee)-[:WORKS_AT {role: "Engineer", start_date: date()}]->(company)
Pattern 4: Intermediate Nodes
Problem: Relationships can’t have relationships!
Example: User enrolls in course on a specific date, gets a grade
// Can't do: properties on a relationship cannot link to another node
(user)-[:ENROLLED {date, grade}]->(course)
// Solution: Intermediate node
(user)-[:ENROLLED]->(enrollment:Enrollment {date, grade})-[:IN_COURSE]->(course)
Real-World Example: Order line items
(user)-[:PLACED]->(order:Order)-[:CONTAINS]->(lineItem:LineItem {quantity, price})-[:FOR_PRODUCT]->(product)
Pattern 5: Multiple Labels
Use Case: Entity belongs to multiple categories
CREATE (alice:Person:Customer:VIP {name: "Alice"})
CREATE (bob:Person:Employee:Manager {name: "Bob"})
Query by specific role:
MATCH (vip:VIP)
RETURN vip.name
Benefits:
Fine-grained querying
Index optimization (indexes per label)
Part 3: Time-Series and Versioning
Pattern 6: Time-Series Events
Example: User actions timeline
(user:User)-[:PERFORMED]->(event:Event {type: "login", timestamp: datetime()})
(user)-[:PERFORMED]->(event2:Event {type: "purchase", timestamp: datetime(), amount: 99.99})
Query: Recent events
MATCH (user:User {id: 123})-[:PERFORMED]->(event:Event)
WHERE event.timestamp > datetime() - duration({days: 7})
RETURN event
ORDER BY event.timestamp DESC
Pattern 7: Versioning (Bi-Temporal Model)
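Scenario: Track changes over time (audit trail). The example below records valid time only (valid_from / valid_to); a fully bi-temporal model would also record when each version was written to the database.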
// Version 1 (original)
(product:Product {id: 123})-[:VERSION {valid_from: date("2020-01-01"), valid_to: date("2021-01-01")}]->
(v1:ProductVersion {price: 10.00, name: "Widget"})
// Version 2 (price change)
(product)-[:VERSION {valid_from: date("2021-01-01"), valid_to: null}]->
(v2:ProductVersion {price: 12.00, name: "Widget"})
Query: Price at specific date
MATCH (product:Product {id: 123})-[v:VERSION]->(version)
WHERE date("2020-06-15") >= v.valid_from AND (v.valid_to IS NULL OR date("2020-06-15") < v.valid_to)
RETURN version.price
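The date filter in the query above treats each version as a half-open interval: valid_from is inclusive, valid_to is exclusive, and a missing valid_to means "still current". A small Python sketch of that rule, assuming a hypothetical `Version` record and `price_on` helper that mirror the two versions shown:

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Version(NamedTuple):
    valid_from: date
    valid_to: Optional[date]  # None means the version is still current
    price: float


# Hypothetical history mirroring the two versions above.
VERSIONS: List[Version] = [
    Version(date(2020, 1, 1), date(2021, 1, 1), 10.00),
    Version(date(2021, 1, 1), None, 12.00),
]


def price_on(versions: List[Version], day: date) -> Optional[float]:
    """Return the price valid on `day`, using half-open [valid_from, valid_to)."""
    for v in versions:
        if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
            return v.price
    return None  # no version covers this date


print(price_on(VERSIONS, date(2020, 6, 15)))
```

Because the upper bound is exclusive, the changeover date 2021-01-01 belongs to version 2 only, so adjacent versions never both match.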
Part 4: Handling Hierarchies
Pattern 8: Tree Structures
Example: File system
(root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})
(home)-[:CONTAINS]->(user:Folder {name: "alice"})
(user)-[:CONTAINS]->(file:File {name: "document.txt"})
Query: All files under /home
MATCH (root:Folder {name: "/"})-[:CONTAINS]->(home:Folder {name: "home"})-[:CONTAINS*]->(item:File)
RETURN item.name
Pattern 9: Hierarchies with Shortcuts
Problem: Deep hierarchies slow down queries
Solution: Add shortcut relationships
// Full path
(company)-[:CHILD]->(division)-[:CHILD]->(dept)-[:CHILD]->(team)-[:CHILD]->(employee)
// Add shortcuts
(company)-[:ALL_EMPLOYEES]->(employee)
Query:
// Fast: Direct relationship
MATCH (company:Company {name: "Acme"})-[:ALL_EMPLOYEES]->(emp)
RETURN count(emp)
// Slow: Deep traversal
MATCH (company:Company {name: "Acme"})-[:CHILD*]->(emp:Employee)
RETURN count(emp)
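One way to picture the shortcut is as a precomputed result of the deep traversal. The Python sketch below is an illustration only: `CHILD`, `EMPLOYEES`, `descendants` and `build_shortcuts` are hypothetical names, and the hierarchy is made-up sample data.

```python
from typing import Dict, List, Set

# Hypothetical CHILD edges: parent -> children.
CHILD: Dict[str, List[str]] = {
    "Acme": ["Division A"],
    "Division A": ["Dept X"],
    "Dept X": ["Team 1", "Team 2"],
    "Team 1": ["alice", "bob"],
    "Team 2": ["carol"],
}
# Hypothetical set of nodes carrying the Employee label.
EMPLOYEES: Set[str] = {"alice", "bob", "carol"}


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Walk CHILD edges to any depth (the slow, deep traversal)."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def build_shortcuts(children: Dict[str, List[str]], root: str,
                    employees: Set[str]) -> Set[str]:
    """Materialize ALL_EMPLOYEES once so later reads skip the traversal."""
    return descendants(children, root) & employees


ALL_EMPLOYEES = build_shortcuts(CHILD, "Acme", EMPLOYEES)
print(len(ALL_EMPLOYEES))
```

The shortcut is derived data, so it has to be updated whenever an employee joins, leaves, or moves in the hierarchy; otherwise the fast and slow queries will disagree.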
Part 5: Many-to-Many Relationships
Example: Blog posts with tags
(post:Post {title: "Graph Databases 101"})-[:TAGGED]->(tag:Tag {name: "databases"})
(post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
Query: Posts with specific tag
MATCH (post:Post)-[:TAGGED]->(tag:Tag {name: "databases"})
RETURN post.title
Query: Posts with multiple tags (AND)
MATCH (post:Post)-[:TAGGED]->(tag1:Tag {name: "databases"})
MATCH (post)-[:TAGGED]->(tag2:Tag {name: "nosql"})
RETURN post.title
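The AND query amounts to intersecting the sets of posts attached to each tag. A minimal Python sketch of that reading, assuming a hypothetical `POSTS_BY_TAG` index and made-up post titles:

```python
from typing import Dict, Iterable, Set

# Hypothetical TAGGED edges, indexed by tag name: tag -> post titles.
POSTS_BY_TAG: Dict[str, Set[str]] = {
    "databases": {"Graph Databases 101", "SQL Tuning"},
    "nosql": {"Graph Databases 101", "Document Stores"},
}


def posts_with_all_tags(index: Dict[str, Set[str]],
                        tags: Iterable[str]) -> Set[str]:
    """Return the posts that carry every tag in `tags` (set intersection)."""
    tag_list = list(tags)
    if not tag_list:
        return set()
    result = set(index.get(tag_list[0], set()))
    for tag in tag_list[1:]:
        # Each additional tag can only narrow the candidate set.
        result &= index.get(tag, set())
    return result


print(posts_with_all_tags(POSTS_BY_TAG, ["databases", "nosql"]))
```

Starting from the tag with the fewest posts keeps the intermediate set small, which is the same reason selective tags make a good starting point for the graph query.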
Pattern 11: User Roles and Permissions
(user:User)-[:HAS_ROLE]->(role:Role {name: "Admin"})
(role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
Query: Check if user has permission
MATCH (user:User {id: 123})-[:HAS_ROLE]->(:Role)-[:HAS_PERMISSION]->(perm:Permission {name: "DELETE_USER"})
RETURN count(perm) > 0 AS has_permission
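The permission check is a two-hop lookup: user to roles, then roles to permissions. As a rough Python analogue, with hypothetical `USER_ROLES` and `ROLE_PERMISSIONS` mappings standing in for the HAS_ROLE and HAS_PERMISSION relationships:

```python
from typing import Dict, Set

# Hypothetical HAS_ROLE edges: user id -> role names.
USER_ROLES: Dict[int, Set[str]] = {
    123: {"Admin"},
    456: {"Viewer"},
}
# Hypothetical HAS_PERMISSION edges: role name -> permission names.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "Admin": {"DELETE_USER", "READ_USER"},
    "Viewer": {"READ_USER"},
}


def has_permission(user_id: int, permission: str) -> bool:
    """True if any of the user's roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user_id, set())
    )


print(has_permission(123, "DELETE_USER"))
```

Because permissions hang off roles rather than users, granting a new permission to a role takes effect for every user holding that role without touching the user nodes.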
Part 6: Modeling Anti-Patterns
Anti-Pattern 1: Dense Nodes
Problem: Node with millions of relationships (celebrity with 10M followers)
Issue: Traversing all relationships is slow
Solution:
Fan-out to intermediate nodes:
(celebrity)-[:HAS_FOLLOWERS]->(bucket1:FollowerBucket)-[:CONTAINS]->(follower1)
(celebrity)-[:HAS_FOLLOWERS]->(bucket2:FollowerBucket)-[:CONTAINS]->(follower2)
Use properties for aggregates:
(celebrity {follower_count: 10000000})
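The two mitigations can be combined: spread followers over buckets and keep a running count on the celebrity. The Python sketch below shows one possible bucketing scheme; hashing the follower id with CRC32, the bucket count of 4, and the `Celebrity` class are all assumptions made for illustration, not something the pattern prescribes.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Set

NUM_BUCKETS = 4  # assumption: a real system would size this to the workload


def bucket_for(follower_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Pick a bucket deterministically from the follower id."""
    return zlib.crc32(follower_id.encode("utf-8")) % num_buckets


class Celebrity:
    """Hypothetical dense node with bucketed followers and a cached count."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.follower_count = 0  # aggregate stored as a property
        self.buckets: DefaultDict[int, Set[str]] = defaultdict(set)

    def add_follower(self, follower_id: str) -> None:
        bucket = self.buckets[bucket_for(follower_id)]
        if follower_id not in bucket:
            bucket.add(follower_id)
            self.follower_count += 1  # keep the aggregate in sync

    def is_followed_by(self, follower_id: str) -> bool:
        # Only one bucket has to be inspected, not every follower.
        return follower_id in self.buckets.get(bucket_for(follower_id), set())


star = Celebrity("star")
for i in range(100):
    star.add_follower(f"user{i}")
print(star.follower_count)
```

With a deterministic bucket choice, a membership check touches a single bucket, and the follower total is read from the property instead of being counted.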
Anti-Pattern 2: Redundant Relationships
Problem: The same information is stored both as a property and as a relationship
Example:
// Bad: Redundant
(user {city: "NYC"})-[:LIVES_IN]->(city:City {name: "NYC"})
// Good: Pick one approach
(user)-[:LIVES_IN]->(city:City {name: "NYC"}) // If city has other properties
// OR
(user {city: "NYC"}) // If city is just a string
Anti-Pattern 3: Property Explosion
Problem: Too many properties on a node
// Bad
(product {name, price, weight, height, width, color, material, manufacturer, ...}) // 50+ properties
// Good: Use labels and relationships
(product:Product {name, price})
(product)-[:HAS_DIMENSIONS]->(dims:Dimensions {height, width, weight})
(product)-[:MANUFACTURED_BY]->(company:Company)
Part 7: Real-World Examples
Example 1: Social Network
Requirements:
Users post content
Users follow each other
Posts have likes and comments
Posts are tagged
Model:
(user:User {id, name, email, joined: date()})
(post:Post {id, content, timestamp: datetime()})
(comment:Comment {id, text, timestamp: datetime()})
(tag:Tag {name})
(user)-[:POSTED]->(post)
(user)-[:FOLLOWS {since: date()}]->(user2:User)
(user)-[:LIKED {timestamp: datetime()}]->(post)
(user)-[:COMMENTED]->(comment)-[:ON]->(post)
(post)-[:TAGGED_WITH]->(tag)
Queries:
Newsfeed: Posts from people I follow
MATCH (me:User {id: 123})-[:FOLLOWS]->(friend)-[:POSTED]->(post)
RETURN post
ORDER BY post.timestamp DESC
LIMIT 20
Popular posts: Most liked
MATCH (post:Post)<-[like:LIKED]-()
WITH post, count(like) AS likes
ORDER BY likes DESC
LIMIT 10
RETURN post.content, likes
Trending tags: Most used in last 24 hours
MATCH (post:Post)-[:TAGGED_WITH]->(tag:Tag)
WHERE post.timestamp > datetime() - duration({hours: 24})
WITH tag, count(post) AS usage
ORDER BY usage DESC
LIMIT 10
RETURN tag.name, usage
Example 2: Recommendation Engine
Model:
(user:User)-[:PURCHASED]->(product:Product)
(product)-[:IN_CATEGORY]->(category:Category)
(product)-[:SIMILAR_TO {score: 0.85}]->(product2:Product)
Query: Recommend products
// Collaborative filtering: Users like you also bought
MATCH (me:User {id: 123})-[:PURCHASED]->(product:Product)
MATCH (product)<-[:PURCHASED]-(other:User)-[:PURCHASED]->(rec:Product)
WHERE NOT (me)-[:PURCHASED]->(rec)
WITH rec, count(DISTINCT other) AS score
ORDER BY score DESC
LIMIT 10
RETURN rec.name, score
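The scoring rule in the query above is: for each product I have not bought, count the distinct users who share at least one purchase with me and also bought it. A Python sketch of that rule, assuming a hypothetical `PURCHASED` mapping with made-up users and products; breaking ties alphabetically is an extra assumption, since the query leaves tie order unspecified.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical PURCHASED edges: user -> products bought.
PURCHASED: Dict[str, Set[str]] = {
    "me": {"laptop", "mouse"},
    "u1": {"laptop", "keyboard"},
    "u2": {"mouse", "keyboard", "monitor"},
    "u3": {"desk"},
}


def recommend(purchases: Dict[str, Set[str]], user: str,
              limit: int = 10) -> List[Tuple[str, int]]:
    """Score unseen products by the number of distinct co-purchasers."""
    mine = purchases.get(user, set())
    scores: Counter = Counter()
    for other, theirs in purchases.items():
        # Skip myself and anyone with no purchase in common with me.
        if other == user or not (mine & theirs):
            continue
        for product in theirs - mine:
            scores[product] += 1  # one vote per distinct other user
    # Highest score first; ties broken by name (an assumption).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


print(recommend(PURCHASED, "me"))
```

In this sample data, u3 shares no purchase with "me", so "desk" is never recommended, which matches the path constraint in the graph query.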
Example 3: Knowledge Graph
Model:
(entity:Entity {name, type})
(entity)-[:RELATED_TO {relationship: "founded", date: date()}]->(entity2:Entity)
Example Data:
CREATE (steve:Person {name: "Steve Jobs"})
CREATE (apple:Company {name: "Apple"})
CREATE (iphone:Product {name: "iPhone"})
CREATE (steve)-[:FOUNDED {year: 1976}]->(apple)
CREATE (steve)-[:INVENTED]->(iphone)
CREATE (apple)-[:PRODUCES]->(iphone)
Query: What did Steve Jobs create?
MATCH (steve:Person {name: "Steve Jobs"})-[r]->(thing)
RETURN type(r) AS relationship, thing.name
Part 8: Refactoring Strategies
Refactoring 1: From Properties to Relationships
Before:
(user:User {city: "NYC", country: "USA"})
After (if cities have more data):
(user:User)-[:LIVES_IN]->(city:City {name: "NYC"})-[:IN_COUNTRY]->(country:Country {name: "USA"})
When to refactor: When you need to query by location, aggregate by city, or add city-specific properties.
Refactoring 2: From Relationship Properties to Intermediate Nodes
Before:
(user)-[:ENROLLED {grade: "A", semester: "Fall 2023"}]->(course)
After:
(user)-[:ENROLLED]->(enrollment:Enrollment {grade: "A", semester: "Fall 2023"})-[:IN_COURSE]->(course)
Benefits: Can add relationships to enrollment (teacher, classroom, etc.)
Refactoring 3: Denormalizing for Read Performance
Before (normalized):
MATCH (post:Post)<-[:LIKED]-(user)
RETURN post.title, count(user) AS likes // Count at query time (slow!)
After (denormalized):
// Update like_count on every like (coalesce treats a missing counter as 0)
MATCH (post:Post {id: 123})
SET post.like_count = coalesce(post.like_count, 0) + 1
// Query (fast!)
MATCH (post:Post)
RETURN post.title, post.like_count
ORDER BY post.like_count DESC
Trade-off: Faster reads, slower writes (acceptable for read-heavy workloads). The counter must be updated in the same write as the LIKED relationship, or the two will drift apart.
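The main risk of a denormalized counter is drift between the counter and the relationships it summarizes. A minimal Python sketch of keeping the two in step, assuming a hypothetical `Post` class where `liked_by` stands in for the LIKED relationships:

```python
from typing import Set


class Post:
    """Hypothetical post with LIKED edges plus a cached like_count."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.liked_by: Set[int] = set()  # stands in for LIKED relationships
        self.like_count = 0              # denormalized aggregate

    def like(self, user_id: int) -> bool:
        """Add a like; the counter changes only if the edge is new."""
        if user_id in self.liked_by:
            return False
        self.liked_by.add(user_id)
        self.like_count += 1
        return True

    def unlike(self, user_id: int) -> bool:
        """Remove a like; the counter changes only if the edge existed."""
        if user_id not in self.liked_by:
            return False
        self.liked_by.remove(user_id)
        self.like_count -= 1
        return True


post = Post("Graph Databases 101")
post.like(1)
post.like(2)
post.like(1)  # duplicate like is ignored
print(post.like_count)
```

Updating the edge and the counter in one operation is what keeps reads cheap without making them wrong; in a database this would typically mean doing both in a single transaction.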
Summary
Design Principles:
Query-driven: Start with questions, design graph to answer them
Entities → Nodes: Domain objects become nodes
Connections → Relationships: Actions, associations, hierarchies
Intermediate nodes: When relationships need relationships
Denormalize: Duplicate data for read performance
Anti-Patterns to Avoid:
Dense nodes (millions of relationships)
Redundant relationships
Property explosion
Next: Apply these patterns to graph algorithms!
What’s Next?
Module 6: Graph Algorithms
Implement PageRank, community detection, shortest paths, and centrality algorithms at scale