Graph Algorithms at Scale
Introduction: Why Graph Algorithms?
Part 1: Neo4j GDS Setup
Installation
Creating a Graph Projection
Listing and Dropping Projections
Part 2: Pathfinding Algorithms
1. Shortest Path (Dijkstra)
2. All Shortest Paths
3. Single-Source Shortest Path
Part 3: Centrality Algorithms
1. Degree Centrality
2. PageRank
3. Betweenness Centrality
4. Closeness Centrality
Part 4: Community Detection
1. Louvain (Modularity-Based)
2. Label Propagation
3. Weakly Connected Components
Part 5: Similarity Algorithms
1. Node Similarity (Jaccard)
2. Cosine Similarity (Vector-Based)
Part 6: Link Prediction
1. Adamic Adar
2. Common Neighbors
Part 7: Graph Embeddings
Node2Vec
Part 8: Real-World Use Cases
Use Case 1: Fraud Detection
Use Case 2: Supply Chain Analysis
Use Case 3: Knowledge Graph Reasoning
Part 9: Performance Tips
1. Use Graph Projections
2. Estimate Memory Usage
3. Limit Iterations
4. Filter Projections
Summary
What’s Next?

Graph Algorithms at Scale

Module Duration: 6-8 hours Learning Style: Algorithm Theory + Implementation + Real-World Applications Outcome: Apply graph algorithms to solve real problems: recommendations, fraud detection, network analysis

Introduction: Why Graph Algorithms?

Graph algorithms solve problems that are intractable in relational databases:

Shortest Path: GPS navigation, network routing
PageRank: Search engine ranking, influence scoring
Community Detection: Social groups, customer segmentation
Centrality: Find influencers, critical infrastructure
Link Prediction: Recommendations, fraud detection

Neo4j provides these via the Graph Data Science (GDS) library.

Part 1: Neo4j GDS Setup

Installation

// Check if GDS is installed
CALL gds.version()

If not installed, download from: https://neo4j.com/download-center/#algorithms

Creating a Graph Projection

Problem: Running algorithms on live data is slow and risky. Solution: Create an in-memory graph projection (read-only snapshot):

CALL gds.graph.project(
  'my-graph',              // Graph name
  'Person',                // Node labels
  'KNOWS'                  // Relationship types
)
YIELD graphName, nodeCount, relationshipCount

With properties:

CALL gds.graph.project(
  'social-network',
  'Person',
  'KNOWS',
  {
    nodeProperties: ['age', 'city'],
    relationshipProperties: ['since']
  }
)

Cypher projection (more flexible):

CALL gds.graph.project.cypher(
  'my-graph',
  'MATCH (n:Person) RETURN id(n) AS id',
  'MATCH (a:Person)-[r:KNOWS]->(b:Person) RETURN id(a) AS source, id(b) AS target'
)

Listing and Dropping Projections

// List all projections
CALL gds.graph.list()

// Drop projection
CALL gds.graph.drop('my-graph')

Part 2: Pathfinding Algorithms

1. Shortest Path (Dijkstra)

Use Case: Find minimum-cost path between two nodes Example: Shortest route between cities

// Create graph with weighted relationships
MATCH (source:City {name: 'San Francisco'})
MATCH (target:City {name: 'New York'})
CALL gds.shortestPath.dijkstra.stream('city-network', {
  sourceNode: source,
  targetNode: target,
  relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN
  gds.util.asNode(sourceNode).name AS from,
  gds.util.asNode(targetNode).name AS to,
  totalCost AS distance,
  [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS route

Output:

from: San Francisco
to: New York
distance: 2900
route: ["San Francisco", "Denver", "Chicago", "New York"]

2. All Shortest Paths

Find all shortest paths (multiple paths with same cost):

MATCH (source:Person {name: 'Alice'})
MATCH (target:Person {name: 'Bob'})
CALL gds.shortestPath.dijkstra.stream('social-network', {
  sourceNode: source,
  targetNode: target
})
YIELD path
RETURN [node IN nodes(path) | node.name] AS path

3. Single-Source Shortest Path

Find shortest paths from one node to all others:

MATCH (source:Person {name: 'Alice'})
CALL gds.allShortestPaths.delta.stream('social-network', {
  sourceNode: source
})
YIELD targetNode, distance
RETURN gds.util.asNode(targetNode).name AS person, distance
ORDER BY distance ASC
LIMIT 10

Part 3: Centrality Algorithms

1. Degree Centrality

Measures: Number of connections (simplest centrality) Use Case: Find most connected people

CALL gds.degree.stream('social-network')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score AS connections
ORDER BY score DESC
LIMIT 10

Write back to graph:

CALL gds.degree.write('social-network', {
  writeProperty: 'degree'
})
YIELD nodePropertiesWritten, computeMillis

2. PageRank

Measures: Importance based on connections (Google’s algorithm) Formula:

PR(A) = (1-d) + d × Σ(PR(T_i) / C(T_i))

Where:

d = damping factor (0.85)
T_i = nodes linking to A
C(T_i) = out-degree of T_i

Use Case: Rank influencers, important nodes

CALL gds.pageRank.stream('social-network', {
  maxIterations: 20,
  dampingFactor: 0.85
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10

Real-World Example: Twitter influence

// High PageRank = influential (many followers who are also influential)
CALL gds.graph.project(
  'twitter',
  'User',
  {FOLLOWS: {orientation: 'NATURAL'}}
)

CALL gds.pageRank.write('twitter', {
  writeProperty: 'influence_score'
})
YIELD nodePropertiesWritten

3. Betweenness Centrality

Measures: How often a node lies on shortest paths between others Use Case: Find bridges, bottlenecks

CALL gds.betweenness.stream('social-network')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score AS betweenness
ORDER BY score DESC
LIMIT 10

Interpretation: High betweenness = broker, gatekeeper

4. Closeness Centrality

Measures: Average distance to all other nodes Use Case: Find nodes that can spread information quickly

CALL gds.closeness.stream('social-network')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, score
ORDER BY score DESC
LIMIT 10

Interpretation: High closeness = well-connected, central position

Part 4: Community Detection

1. Louvain (Modularity-Based)

Finds: Groups with more internal connections than external Use Case: Customer segmentation, fraud rings

CALL gds.louvain.stream('social-network')
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId).name) AS members
RETURN communityId, size(members) AS size, members
ORDER BY size DESC

Write communities back:

CALL gds.louvain.write('social-network', {
  writeProperty: 'community'
})
YIELD communityCount, modularity

Query by community:

MATCH (p:Person {community: 5})
RETURN p.name

2. Label Propagation

Faster than Louvain, less accurate Algorithm: Nodes adopt the most common label among neighbors

CALL gds.labelPropagation.stream('social-network')
YIELD nodeId, communityId
RETURN gds.util.asNode(nodeId).name AS person, communityId

3. Weakly Connected Components

Finds: Disconnected subgraphs Use Case: Find isolated clusters

CALL gds.wcc.stream('social-network')
YIELD nodeId, componentId
WITH componentId, collect(gds.util.asNode(nodeId).name) AS members
WHERE size(members) > 1
RETURN componentId, size(members) AS size, members
ORDER BY size DESC

Part 5: Similarity Algorithms

1. Node Similarity (Jaccard)

Measures: Overlap of neighbors Use Case: Recommend similar users

CALL gds.nodeSimilarity.stream('social-network', {
  topK: 10  // Top 10 most similar per node
})
YIELD node1, node2, similarity
RETURN
  gds.util.asNode(node1).name AS person1,
  gds.util.asNode(node2).name AS person2,
  similarity
ORDER BY similarity DESC

Write similarities as relationships:

CALL gds.nodeSimilarity.write('social-network', {
  writeRelationshipType: 'SIMILAR_TO',
  writeProperty: 'score',
  topK: 10
})

2. Cosine Similarity (Vector-Based)

Use Case: Compare embeddings, feature vectors

// Assume nodes have 'embedding' property (array of floats)
CALL gds.graph.project(
  'embeddings-graph',
  {Person: {properties: 'embedding'}},
  '*'
)

CALL gds.similarity.cosine.stream('embeddings-graph', {
  nodeProperties: ['embedding']
})
YIELD node1, node2, similarity
RETURN
  gds.util.asNode(node1).name,
  gds.util.asNode(node2).name,
  similarity
ORDER BY similarity DESC

Part 6: Link Prediction

1. Adamic Adar

Predicts: Likelihood of future connection Formula: Sum of 1/log(degree) for common neighbors Use Case: Friend recommendations, missing links

MATCH (p1:Person {name: 'Alice'})
MATCH (p2:Person {name: 'Bob'})
WHERE NOT (p1)-[:KNOWS]-(p2)  // Not already connected
RETURN gds.alpha.linkprediction.adamicAdar(p1, p2) AS score

Find top recommendations:

MATCH (alice:Person {name: 'Alice'})
MATCH (candidate:Person)
WHERE NOT (alice)-[:KNOWS]-(candidate) AND alice <> candidate
WITH alice, candidate,
     gds.alpha.linkprediction.adamicAdar(alice, candidate) AS score
WHERE score > 0
RETURN candidate.name, score
ORDER BY score DESC
LIMIT 10

2. Common Neighbors

Simpler: Count shared neighbors

MATCH (alice:Person {name: 'Alice'})
MATCH (candidate:Person)
WHERE NOT (alice)-[:KNOWS]-(candidate) AND alice <> candidate
MATCH (alice)-[:KNOWS]-(mutual)-[:KNOWS]-(candidate)
WITH candidate, count(DISTINCT mutual) AS common_friends
RETURN candidate.name, common_friends
ORDER BY common_friends DESC
LIMIT 10

Part 7: Graph Embeddings

Node2Vec

Generates: Vector representation of nodes Use Case: Machine learning features, clustering

CALL gds.node2vec.stream('social-network', {
  embeddingDimension: 128,
  iterations: 10
})
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS person, embedding

Write embeddings:

CALL gds.node2vec.write('social-network', {
  embeddingDimension: 128,
  writeProperty: 'embedding'
})

Use embeddings for similarity:

MATCH (p1:Person {name: 'Alice'})
MATCH (p2:Person)
WITH p1, p2, gds.similarity.cosine(p1.embedding, p2.embedding) AS similarity
WHERE similarity > 0.8 AND p1 <> p2
RETURN p2.name, similarity
ORDER BY similarity DESC

Part 8: Real-World Use Cases

Use Case 1: Fraud Detection

Goal: Find fraud rings (groups of accounts committing fraud together) Model:

(account:Account)-[:SHARED_IP]->(ip:IP)
(account)-[:SHARED_DEVICE]->(device:Device)
(account)-[:SHARED_EMAIL]->(email:Email)

Algorithm: Louvain community detection

CALL gds.graph.project('fraud-network', 'Account', ['SHARED_IP', 'SHARED_DEVICE', 'SHARED_EMAIL'])

CALL gds.louvain.write('fraud-network', {
  writeProperty: 'fraud_community'
})

// Find suspicious communities (high connectivity)
MATCH (a:Account)
WITH a.fraud_community AS community, collect(a) AS accounts
WHERE size(accounts) > 5  // Rings with 5+ accounts
RETURN community, size(accounts) AS ring_size, accounts
ORDER BY ring_size DESC

Use Case 2: Supply Chain Analysis

Goal: Find critical suppliers (bottlenecks) Algorithm: Betweenness centrality

CALL gds.graph.project('supply-chain', ['Supplier', 'Manufacturer', 'Distributor'], 'SUPPLIES')

CALL gds.betweenness.write('supply-chain', {
  writeProperty: 'criticality'
})

// Find critical nodes
MATCH (n)
WHERE n.criticality > 1000
RETURN labels(n) AS type, n.name, n.criticality
ORDER BY n.criticality DESC

Use Case 3: Knowledge Graph Reasoning

Goal: Infer missing relationships Example: Predict who might know whom

// Find people in same company who don't know each other yet
MATCH (p1:Person)-[:WORKS_AT]->(company:Company)<-[:WORKS_AT]-(p2:Person)
WHERE NOT (p1)-[:KNOWS]-(p2) AND p1 <> p2
WITH p1, p2, gds.alpha.linkprediction.adamicAdar(p1, p2) AS score
WHERE score > 2
RETURN p1.name, p2.name, score
ORDER BY score DESC

Part 9: Performance Tips

1. Use Graph Projections

Always project before running algorithms:

// Bad: Direct algorithm on database (slow, locks data)
CALL gds.pageRank.stream({nodeQuery: 'MATCH (n:Person) RETURN id(n)', ...})

// Good: Project first (fast, safe)
CALL gds.graph.project('my-graph', 'Person', 'KNOWS')
CALL gds.pageRank.stream('my-graph')

2. Estimate Memory Usage

CALL gds.degree.estimate('my-graph')
YIELD requiredMemory, nodeCount, relationshipCount

3. Limit Iterations

CALL gds.pageRank.stream('my-graph', {
  maxIterations: 20,  // Stop after 20 iterations
  tolerance: 0.0001   // Or when change < 0.01%
})

4. Filter Projections

Don’t load entire graph if you only need subset:

CALL gds.graph.project.cypher(
  'active-users',
  'MATCH (n:User) WHERE n.active = true RETURN id(n) AS id',
  'MATCH (a:User)-[r:FOLLOWS]->(b:User) RETURN id(a) AS source, id(b) AS target'
)

Summary

Pathfinding: Shortest path, Dijkstra for routing Centrality: Degree, PageRank, Betweenness for influence Community Detection: Louvain, Label Propagation for clustering Similarity: Node similarity, Cosine for recommendations Link Prediction: Adamic Adar for missing connections Embeddings: Node2Vec for ML features Best Practices:

Always use graph projections
Estimate memory before running
Write results back for reuse
Filter projections for performance

What’s Next?

Module 7: Production Deployment & Operations

Deploy Neo4j clusters, configure high availability, monitor performance, and operate at scale

5. Data Modeling 7. Production & Ops

Overview

Testing & Code Quality

Crash Courses

AI Engineering

Math for ML - Understanding Linear Algebra

Probability & Statistics for ML

Math for ML - Understanding Calculus

ML Mastery

Deep Learning Mastery

NestJS Mastery

Microservices Mastery

Low Level Design

OOP Concepts

SOLID Principles

Design Patterns

LLD Case Studies

System Design (HLD)

Senior Level (L5+/Staff)

HLD Case Studies

Engineering Fundamentals

DevOps & Operations

Azure Cloud Engineering

AWS Cloud

AWS Monitoring & Observability

AWS Security Services

AWS Serverless

AWS Operations

AWS Advanced

AWS Case Studies

GCP Cloud Engineering

DevOps Tools

Database Engineering

HIPAA Compliance Mastery

Operating Systems

Linux Internals

Distributed Systems

Networking Mastery

Build Your Own X

Go Lang Mastery

C Programming

Classic Research Papers

Distributed System Tools

​Graph Algorithms at Scale

​Introduction: Why Graph Algorithms?

​Part 1: Neo4j GDS Setup

​Installation

​Creating a Graph Projection

​Listing and Dropping Projections

​Part 2: Pathfinding Algorithms

​1. Shortest Path (Dijkstra)

​2. All Shortest Paths

​3. Single-Source Shortest Path

​Part 3: Centrality Algorithms

​1. Degree Centrality

​2. PageRank

​3. Betweenness Centrality

​4. Closeness Centrality

​Part 4: Community Detection

​1. Louvain (Modularity-Based)

​2. Label Propagation

​3. Weakly Connected Components

​Part 5: Similarity Algorithms

​1. Node Similarity (Jaccard)

​2. Cosine Similarity (Vector-Based)

​Part 6: Link Prediction

​1. Adamic Adar

​2. Common Neighbors

​Part 7: Graph Embeddings

​Node2Vec

​Part 8: Real-World Use Cases

​Use Case 1: Fraud Detection

​Use Case 2: Supply Chain Analysis

​Use Case 3: Knowledge Graph Reasoning

​Part 9: Performance Tips

​1. Use Graph Projections

​2. Estimate Memory Usage

​3. Limit Iterations

​4. Filter Projections

​Summary

​What’s Next?

Graph Algorithms at Scale

Introduction: Why Graph Algorithms?

Part 1: Neo4j GDS Setup

Installation

Creating a Graph Projection

Listing and Dropping Projections

Part 2: Pathfinding Algorithms

1. Shortest Path (Dijkstra)

2. All Shortest Paths

3. Single-Source Shortest Path

Part 3: Centrality Algorithms

1. Degree Centrality

2. PageRank

3. Betweenness Centrality

4. Closeness Centrality

Part 4: Community Detection

1. Louvain (Modularity-Based)

2. Label Propagation

3. Weakly Connected Components

Part 5: Similarity Algorithms

1. Node Similarity (Jaccard)

2. Cosine Similarity (Vector-Based)

Part 6: Link Prediction

1. Adamic Adar

2. Common Neighbors

Part 7: Graph Embeddings

Node2Vec

Part 8: Real-World Use Cases

Use Case 1: Fraud Detection

Use Case 2: Supply Chain Analysis

Use Case 3: Knowledge Graph Reasoning

Part 9: Performance Tips

1. Use Graph Projections

2. Estimate Memory Usage

3. Limit Iterations

4. Filter Projections

Summary

What’s Next?