Load Balancing Deep Dive
Load balancing is critical for distributing traffic across service instances efficiently and reliably.
Learning Objectives:
- Understand client-side vs server-side load balancing
- Master load balancing algorithms
- Implement health checking strategies
- Build intelligent load balancing with Node.js
Client-Side vs Server-Side Load Balancing
┌─────────────────────────────────────────────────────────────────────────────┐
│ LOAD BALANCING APPROACHES │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ SERVER-SIDE LOAD BALANCING │
│ ────────────────────────────── │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Client │────────▶│ Load Balancer│────────▶│ Service A-1 │ │
│ └──────────┘ │ (nginx/HAP) │ ├──────────────┤ │
│ └──────────────┘────────▶│ Service A-2 │ │
│ └───────▶│ Service A-3 │ │
│ └──────────────┘ │
│ │
│ Pros: Simple for clients, centralized control │
│ Cons: Single point of failure, extra hop, limited to L4/L7 │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ CLIENT-SIDE LOAD BALANCING │
│ ──────────────────────────── │
│ │
│ ┌──────────┐ ┌──────────────┐ │
│ │ Client │◀────────│ Service │ ┌──────────────┐ │
│ │ + LB │ │ Registry │────────▶│ Service A-1 │ │
│ │ Logic │────────────────────────────────▶│ Service A-2 │ │
│ └──────────┘ └──────────────┘└───────▶│ Service A-3 │ │
│ └──────────────┘ │
│ │
│ Pros: No extra hop, distributed (no SPOF), more intelligent │
│ Cons: Complex clients, language-specific implementations │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Server-Side: NGINX Configuration
# nginx.conf - Production-ready load balancing
upstream user_service {
    # Load balancing algorithm
    least_conn;  # Send to the server with the fewest active connections

    # Backend servers with weights and failure thresholds
    server user-1.internal:3000 weight=5 max_fails=3 fail_timeout=30s;
    server user-2.internal:3000 weight=5 max_fails=3 fail_timeout=30s;
    server user-3.internal:3000 weight=3 backup;  # Used only when the others are down

    # Keepalive connections to backends
    keepalive 32;
    keepalive_timeout 60s;
}

upstream order_service {
    # IP hash - session affinity
    ip_hash;
    server order-1.internal:3001;
    server order-2.internal:3001;
    server order-3.internal:3001;
}

server {
    listen 80;

    location /api/users {
        proxy_pass http://user_service;

        # Failover: retry the next upstream on errors and timeouts
        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 3;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;

        # Upstream keepalive requires HTTP/1.1 and a cleared Connection header
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }

    location /api/orders {
        proxy_pass http://order_service;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
Client-Side: Node.js Implementation
// client-side-lb.js - Client-side load balancing with service discovery
const EventEmitter = require('events');
const axios = require('axios');

class ClientSideLoadBalancer extends EventEmitter {
  constructor(options = {}) {
    super();
    this.serviceName = options.serviceName;
    this.registry = options.registry;
    this.algorithm = options.algorithm || 'round-robin';
    this.healthCheckInterval = options.healthCheckInterval || 10000;
    this.instances = [];
    this.currentIndex = 0;
    this.healthStatus = new Map();
    this.initializeDiscovery();
    this.startHealthChecks();
  }

  async initializeDiscovery() {
    // Initial fetch from the service registry
    await this.refreshInstances();

    // Subscribe to registry updates
    this.registry.on('instances-changed', (serviceName) => {
      if (serviceName === this.serviceName) {
        this.refreshInstances();
      }
    });
  }

  async refreshInstances() {
    try {
      const instances = await this.registry.getInstances(this.serviceName);
      this.instances = instances.map(instance => ({
        ...instance,
        weight: instance.weight || 1,
        activeConnections: 0,
        responseTime: 0,
        consecutiveFailures: 0
      }));
      this.emit('instances-updated', this.instances);
    } catch (error) {
      this.emit('error', error);
    }
  }

  startHealthChecks() {
    this.healthCheckTimer = setInterval(async () => {
      for (const instance of this.instances) {
        try {
          const start = Date.now();
          // fetch has no `timeout` option; use an AbortSignal instead
          const response = await fetch(`http://${instance.host}:${instance.port}/health`, {
            signal: AbortSignal.timeout(5000)
          });
          const latency = Date.now() - start;
          this.healthStatus.set(instance.id, {
            healthy: response.ok,
            latency,
            lastCheck: Date.now()
          });
          instance.responseTime = latency;
          instance.consecutiveFailures = 0;
        } catch (error) {
          const status = this.healthStatus.get(instance.id) || {};
          instance.consecutiveFailures++;
          this.healthStatus.set(instance.id, {
            ...status,
            healthy: false,
            lastCheck: Date.now(),
            error: error.message
          });
        }
      }
    }, this.healthCheckInterval);
  }

  stop() {
    clearInterval(this.healthCheckTimer);
  }

  // Get the next available instance based on the configured algorithm
  getNextInstance() {
    const healthyInstances = this.instances.filter(
      i => (this.healthStatus.get(i.id)?.healthy !== false) &&
           i.consecutiveFailures < 3
    );
    if (healthyInstances.length === 0) {
      throw new Error(`No healthy instances available for ${this.serviceName}`);
    }
    switch (this.algorithm) {
      case 'round-robin':
        return this.roundRobin(healthyInstances);
      case 'weighted-round-robin':
        return this.weightedRoundRobin(healthyInstances);
      case 'least-connections':
        return this.leastConnections(healthyInstances);
      case 'least-response-time':
        return this.leastResponseTime(healthyInstances);
      case 'random':
        return this.random(healthyInstances);
      default:
        return this.roundRobin(healthyInstances);
    }
  }

  roundRobin(instances) {
    const instance = instances[this.currentIndex % instances.length];
    this.currentIndex++;
    return instance;
  }

  weightedRoundRobin(instances) {
    // Expand each instance into the list once per unit of weight
    const weighted = [];
    for (const instance of instances) {
      for (let i = 0; i < instance.weight; i++) {
        weighted.push(instance);
      }
    }
    const instance = weighted[this.currentIndex % weighted.length];
    this.currentIndex++;
    return instance;
  }

  leastConnections(instances) {
    return instances.reduce((min, current) =>
      current.activeConnections < min.activeConnections ? current : min
    );
  }

  leastResponseTime(instances) {
    // Blend active connections and observed latency into a single score
    return instances.reduce((best, current) => {
      const currentScore = current.activeConnections * 0.5 + current.responseTime * 0.5;
      const bestScore = best.activeConnections * 0.5 + best.responseTime * 0.5;
      return currentScore < bestScore ? current : best;
    });
  }

  random(instances) {
    return instances[Math.floor(Math.random() * instances.length)];
  }

  // Execute a request with automatic failover and exponential backoff
  async execute(requestFn, options = {}) {
    const maxRetries = options.retries || 3;
    const retryDelay = options.retryDelay || 100;
    let lastError;
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      const instance = this.getNextInstance();
      try {
        instance.activeConnections++;
        const start = Date.now();
        const result = await requestFn(instance);
        instance.responseTime = Date.now() - start;
        instance.activeConnections--;
        return result;
      } catch (error) {
        instance.activeConnections--;
        instance.consecutiveFailures++;
        lastError = error;
        this.emit('request-failed', { instance, error, attempt });
        if (attempt < maxRetries - 1) {
          await this.delay(retryDelay * Math.pow(2, attempt));
        }
      }
    }
    throw lastError;
  }

  delay(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Usage example (assumes a `serviceRegistry` from the service discovery chapter)
const userServiceLB = new ClientSideLoadBalancer({
  serviceName: 'user-service',
  registry: serviceRegistry,
  algorithm: 'least-connections'
});

// Make requests through the load balancer
async function getUser(userId) {
  return userServiceLB.execute(async (instance) => {
    const response = await axios.get(
      `http://${instance.host}:${instance.port}/users/${userId}`,
      { timeout: 5000 }
    );
    return response.data;
  });
}

module.exports = { ClientSideLoadBalancer };
Load Balancing Algorithms
┌─────────────────────────────────────────────────────────────────────────────┐
│ LOAD BALANCING ALGORITHMS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ROUND ROBIN WEIGHTED ROUND ROBIN │
│ ───────────────── ───────────────────── │
│ │
│ Request 1 → Server A Request 1 → Server A (w=5) │
│ Request 2 → Server B Request 2 → Server A │
│ Request 3 → Server C Request 3 → Server A │
│ Request 4 → Server A Request 4 → Server B (w=3) │
│ ... Request 5 → Server B │
│ Request 6 → Server B │
│ Simple, equal distribution Request 7 → Server C (w=2) │
│ ... │
│ Accounts for server capacity │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ LEAST CONNECTIONS LEAST RESPONSE TIME │
│ ────────────────────── ───────────────────── │
│ │
│ Server A: 5 conn ←──── Server A: 50ms avg ←──── │
│ Server B: 8 conn Server B: 75ms avg │
│ Server C: 3 conn Server C: 45ms avg │
│ ↑ ↑ │
│ Next → Server C Next → Server C (fastest) │
│ │
│ Best for long-lived connections Best for varying server loads │
│ │
│ ═══════════════════════════════════════════════════════════════════════ │
│ │
│ IP HASH CONSISTENT HASHING │
│ ──────────── ─────────────────── │
│ │
│ hash(client_ip) → Server B hash(request) → Ring position │
│ Minimal redistribution on change │
│ Same client → same server │
│ Session affinity Good for caching, stateful services │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Advanced Algorithms Implementation
// advanced-lb-algorithms.js
class LoadBalancingAlgorithms {
  // Weighted Round Robin (interleaved, GCD-based)
  static createWeightedRoundRobin(instances) {
    let currentWeight = 0;
    const maxWeight = Math.max(...instances.map(i => i.weight));
    const gcdWeight = instances.reduce((a, b) => gcd(a, b.weight), instances[0].weight);
    let currentIndex = -1;

    function gcd(a, b) {
      return b === 0 ? a : gcd(b, a % b);
    }

    return function getNext() {
      while (true) {
        currentIndex = (currentIndex + 1) % instances.length;
        if (currentIndex === 0) {
          currentWeight -= gcdWeight;
          if (currentWeight <= 0) {
            currentWeight = maxWeight;
          }
        }
        if (instances[currentIndex].weight >= currentWeight) {
          return instances[currentIndex];
        }
      }
    };
  }

  // Consistent Hashing with virtual nodes
  static createConsistentHash(instances, virtualNodes = 150) {
    const ring = new Map();
    const sortedKeys = [];

    function hash(str) {
      let h = 0;
      for (let i = 0; i < str.length; i++) {
        h = ((h << 5) - h) + str.charCodeAt(i);
        h = h & h; // Force 32-bit integer
      }
      return Math.abs(h);
    }

    // Add virtual nodes for each instance
    for (const instance of instances) {
      for (let i = 0; i < virtualNodes; i++) {
        const key = hash(`${instance.id}-${i}`);
        ring.set(key, instance);
        sortedKeys.push(key);
      }
    }
    sortedKeys.sort((a, b) => a - b);

    return function getNode(key) {
      const keyHash = hash(key);
      // Binary search for the first ring key >= keyHash
      let low = 0, high = sortedKeys.length - 1;
      while (low < high) {
        const mid = Math.floor((low + high) / 2);
        if (sortedKeys[mid] < keyHash) {
          low = mid + 1;
        } else {
          high = mid;
        }
      }
      // Wrap around if the key is larger than all ring positions
      const index = sortedKeys[low] >= keyHash ? low : 0;
      return ring.get(sortedKeys[index]);
    };
  }

  // Power of Two Choices (P2C)
  // Pick 2 random servers, choose the one with fewer connections
  static createP2C(instances) {
    return function getNext() {
      if (instances.length === 1) return instances[0];
      // Pick two distinct random instances
      const i1 = Math.floor(Math.random() * instances.length);
      let i2 = Math.floor(Math.random() * instances.length);
      while (i2 === i1) {
        i2 = Math.floor(Math.random() * instances.length);
      }
      // Choose the one with fewer active connections
      return instances[i1].activeConnections <= instances[i2].activeConnections
        ? instances[i1]
        : instances[i2];
    };
  }

  // Adaptive load balancing (based on real-time metrics)
  static createAdaptive(instances) {
    function calculateScore(instance) {
      // Lower score = better; the 0.001 keeps 1/score finite for an idle instance
      const connectionScore = instance.activeConnections * 0.3;
      const latencyScore = instance.avgResponseTime * 0.4;
      const errorScore = instance.errorRate * 100 * 0.3;
      return connectionScore + latencyScore + errorScore + 0.001;
    }

    return function getNext() {
      // Score every instance on connections, latency, and error rate
      const scored = instances.map(instance => ({
        instance,
        score: calculateScore(instance)
      }));
      // Sort by score (lower is better)
      scored.sort((a, b) => a.score - b.score);
      // Weighted random pick from the top 3 to avoid herding on one server
      const topN = scored.slice(0, Math.min(3, scored.length));
      const totalWeight = topN.reduce((sum, s) => sum + (1 / s.score), 0);
      let random = Math.random() * totalWeight;
      for (const { instance, score } of topN) {
        random -= (1 / score);
        if (random <= 0) return instance;
      }
      return topN[0].instance;
    };
  }
}

module.exports = { LoadBalancingAlgorithms };
Health Checking Strategies
┌─────────────────────────────────────────────────────────────────────────────┐
│ HEALTH CHECK PATTERNS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ LIVENESS vs READINESS │
│ ───────────────────────── │
│ │
│ LIVENESS: Is the process running? │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GET /healthz │ │
│ │ → 200 OK: Process is alive │ │
│ │ → 5xx: Process is dead, restart it │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ READINESS: Can it handle traffic? │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GET /ready │ │
│ │ → 200 OK: Ready to receive traffic │ │
│ │ → 503: Not ready (warming up, dependencies down) │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ DEEP HEALTH CHECK (with dependencies) │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ GET /health/deep │ │
│ │ { │ │
│ │ "status": "degraded", │ │
│ │ "checks": { │ │
│ │ "database": { "status": "healthy", "latency": "5ms" }, │ │
│ │ "redis": { "status": "healthy", "latency": "2ms" }, │ │
│ │ "external-api": { "status": "unhealthy", "error": "timeout" } │ │
│ │ } │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Comprehensive Health Check Service
// health-check-service.js
const express = require('express');

class HealthCheckService {
  constructor() {
    this.checks = new Map();
    this.status = 'starting';
    this.startTime = Date.now();
  }

  // Register a health check
  registerCheck(name, checkFn, options = {}) {
    this.checks.set(name, {
      fn: checkFn,
      critical: options.critical !== false, // Default to critical
      timeout: options.timeout || 5000,
      interval: options.interval || 30000,
      lastResult: null,
      lastCheck: null
    });
    // Start periodic checking if an interval was requested
    if (options.interval) {
      setInterval(() => this.runCheck(name), options.interval);
    }
  }

  async runCheck(name) {
    const check = this.checks.get(name);
    if (!check) return null;
    const startTime = Date.now();
    try {
      const result = await Promise.race([
        check.fn(),
        new Promise((_, reject) =>
          setTimeout(() => reject(new Error('Health check timeout')), check.timeout)
        )
      ]);
      check.lastResult = {
        status: 'healthy',
        latency: Date.now() - startTime,
        ...result
      };
    } catch (error) {
      check.lastResult = {
        status: 'unhealthy',
        error: error.message,
        latency: Date.now() - startTime
      };
    }
    check.lastCheck = Date.now();
    return check.lastResult;
  }

  async runAllChecks() {
    const results = {};
    for (const [name, check] of this.checks) {
      results[name] = await this.runCheck(name);
    }
    return results;
  }

  getOverallStatus(checkResults) {
    const criticalChecks = Array.from(this.checks.entries())
      .filter(([_, check]) => check.critical)
      .map(([name]) => name);
    const hasUnhealthyCritical = criticalChecks.some(
      name => checkResults[name]?.status === 'unhealthy'
    );
    const hasAnyUnhealthy = Object.values(checkResults).some(
      r => r?.status === 'unhealthy'
    );
    if (hasUnhealthyCritical) return 'unhealthy';
    if (hasAnyUnhealthy) return 'degraded';
    return 'healthy';
  }

  // Express router exposing the health endpoints
  createRouter() {
    const router = express.Router();

    // Liveness probe - is the process running?
    router.get('/healthz', (req, res) => {
      res.status(200).json({
        status: 'alive',
        uptime: Date.now() - this.startTime
      });
    });

    // Readiness probe - can we handle traffic?
    router.get('/ready', async (req, res) => {
      if (this.status !== 'ready') {
        return res.status(503).json({
          status: this.status,
          message: 'Service not ready'
        });
      }
      // Quick check of critical dependencies using cached results
      const criticalResults = {};
      for (const [name, check] of this.checks) {
        if (check.critical) {
          criticalResults[name] = check.lastResult;
        }
      }
      const hasUnhealthy = Object.values(criticalResults).some(
        r => r?.status === 'unhealthy'
      );
      if (hasUnhealthy) {
        return res.status(503).json({
          status: 'not_ready',
          checks: criticalResults
        });
      }
      res.status(200).json({ status: 'ready' });
    });

    // Deep health check - detailed status of all dependencies
    router.get('/health', async (req, res) => {
      const checkResults = await this.runAllChecks();
      const overallStatus = this.getOverallStatus(checkResults);
      // "degraded" still returns 200 so the LB keeps the instance in rotation
      const statusCode = overallStatus === 'unhealthy' ? 503 : 200;
      res.status(statusCode).json({
        status: overallStatus,
        timestamp: new Date().toISOString(),
        uptime: Date.now() - this.startTime,
        version: process.env.APP_VERSION || 'unknown',
        checks: checkResults
      });
    });

    return router;
  }

  setReady() {
    this.status = 'ready';
  }

  setNotReady(reason) {
    this.status = reason || 'not_ready';
  }
}

// Usage example (assumes an Express `app` plus `pool` and `redis` clients)
const healthService = new HealthCheckService();

// Register database check
healthService.registerCheck('database', async () => {
  const start = Date.now();
  await pool.query('SELECT 1');
  return { latency: Date.now() - start };
}, { critical: true, interval: 30000 });

// Register Redis check
healthService.registerCheck('redis', async () => {
  const start = Date.now();
  await redis.ping();
  return { latency: Date.now() - start };
}, { critical: true, interval: 30000 });

// Register external API check (fetch has no `timeout` option; use an AbortSignal)
healthService.registerCheck('payment-api', async () => {
  const response = await fetch('https://api.stripe.com/v1/health', {
    signal: AbortSignal.timeout(5000)
  });
  return { status: response.ok ? 'reachable' : 'unreachable' };
}, { critical: false, interval: 60000 });

// Mount health routes
app.use(healthService.createRouter());

// Mark the service as ready once initialization completes
(async () => {
  await initializeDatabase();
  await warmUpCaches();
  healthService.setReady();
})();

module.exports = { HealthCheckService };
Load Balancer Patterns
Kubernetes Service Load Balancing
# kubernetes-lb.yaml
# ClusterIP Service (internal load balancing)
apiVersion: v1
kind: Service
metadata:
  name: user-service
spec:
  type: ClusterIP
  selector:
    app: user-service
  ports:
    - port: 80
      targetPort: 3000
  sessionAffinity: None  # or ClientIP for sticky sessions
---
# Headless Service (for client-side LB with service discovery)
apiVersion: v1
kind: Service
metadata:
  name: user-service-headless
spec:
  clusterIP: None
  selector:
    app: user-service
  ports:
    - port: 3000
---
# Deployment with health checks
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: user-service:latest
          ports:
            - containerPort: 3000
          # Liveness probe - restart the container if it fails
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 3
          # Readiness probe - remove from the LB if it fails
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          # Startup probe - for slow-starting containers
          startupProbe:
            httpGet:
              path: /healthz
              port: 3000
            failureThreshold: 30
            periodSeconds: 10
Envoy Proxy Configuration
# envoy-lb.yaml - Advanced L7 load balancing
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8080
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match:
                            prefix: "/api/users"
                          route:
                            cluster: user_service
                            timeout: 30s
                            retry_policy:
                              retry_on: "5xx,reset,connect-failure"
                              num_retries: 3
                              per_try_timeout: 10s
                # Router filter (required as the last HTTP filter)
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
    - name: user_service
      connect_timeout: 5s
      type: STRICT_DNS
      lb_policy: LEAST_REQUEST  # Least connections
      # Circuit breaker
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 1000
            max_pending_requests: 1000
            max_requests: 1000
            max_retries: 3
      # Active health checking
      health_checks:
        - timeout: 5s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: "/health"
      # Outlier detection (automatic ejection of unhealthy hosts)
      outlier_detection:
        consecutive_5xx: 5
        interval: 10s
        base_ejection_time: 30s
        max_ejection_percent: 50
      load_assignment:
        cluster_name: user_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: user-1.internal
                      port_value: 3000
              - endpoint:
                  address:
                    socket_address:
                      address: user-2.internal
                      port_value: 3000
              - endpoint:
                  address:
                    socket_address:
                      address: user-3.internal
                      port_value: 3000
Interview Questions
Q1: Client-side vs Server-side load balancing?
Answer:
Server-side (e.g., NGINX, HAProxy):
- Single point for routing
- Simple clients
- Extra network hop
- Centralized control

Client-side (LB logic in each service):
- Client decides which server
- No extra hop
- More complex clients
- Better for microservices

When to use which:
- Server-side: External traffic, legacy clients
- Client-side: Service-to-service within cluster
- Hybrid: Edge LB + client-side internally
Q2: When would you use Consistent Hashing?
Answer:
Use cases:
- Cache servers (minimize cache misses when scaling)
- Session affinity without IP hash
- Partitioned data (same key → same server)

How it works:
- Servers and keys mapped to the same hash ring
- Each key routed to the next server clockwise
- Adding/removing a server affects only its neighbors

Virtual nodes:
- Multiple ring positions per server for better balance
- Typically 100-200 virtual nodes per physical server
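The "minimal redistribution" property is easy to demonstrate. The sketch below is illustrative only: it is independent of the chapter's `createConsistentHash` implementation and uses a throwaway FNV-style hash. It builds a ring with virtual nodes, adds a fourth server, and counts how many of 10,000 keys change owners:

```javascript
// consistent-hash-redistribution.js - demo of minimal key movement (illustrative)

function hash(str) {
  // FNV-1a-style string hash; fine for a demo, not for production use
  let h = 2166136261;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0; // unsigned 32-bit
}

function buildRing(servers, virtualNodes = 150) {
  const points = [];
  for (const s of servers) {
    for (let v = 0; v < virtualNodes; v++) {
      points.push({ pos: hash(`${s}-${v}`), server: s });
    }
  }
  points.sort((a, b) => a.pos - b.pos);
  // Route a key to the first ring point clockwise from its hash (wrap to 0)
  return (key) => {
    const k = hash(key);
    const p = points.find((pt) => pt.pos >= k) || points[0];
    return p.server;
  };
}

const keys = Array.from({ length: 10000 }, (_, i) => `user:${i}`);

const before = buildRing(['A', 'B', 'C']);
const after = buildRing(['A', 'B', 'C', 'D']); // scale out by one server

let moved = 0;
for (const k of keys) if (before(k) !== after(k)) moved++;

console.log(`keys moved: ${((100 * moved) / keys.length).toFixed(1)}%`);
```

Roughly a quarter of the keys move, all of them to the new server D; with modulo hashing (`hash % serverCount`), going from 3 to 4 servers would remap about three quarters of all keys.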
Q3: What's the difference between liveness and readiness probes?
Answer:
Liveness:
- “Is the process stuck?”
- Failure → container restart
- Should be simple (no dependency checks)
- Example: Can the HTTP server respond?

Readiness:
- “Can it handle traffic?”
- Failure → remove from load balancer
- Can check dependencies
- Example: Is the database connection ready?
Q4: How does Power of Two Choices (P2C) work?
Answer:
Simple but effective algorithm:
- Pick 2 random servers
- Choose the one with fewer active connections

Why it works:
- Avoids herd behavior (all clients picking the same “best” server)
- O(1) complexity (no sorting)
- Statistical guarantee: max load exceeds the average by only ~log(log(n))
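The load-spreading claim can be checked with a quick simulation. This sketch is illustrative: it uses total assigned requests as a stand-in for active connections, sends 100,000 requests to 10 servers under pure random selection and under P2C, and compares the busiest server in each case:

```javascript
// p2c-vs-random.js - compare max load under random pick vs Power of Two Choices

function simulate(pick, servers = 10, requests = 100000) {
  const load = new Array(servers).fill(0);
  for (let r = 0; r < requests; r++) load[pick(load)]++;
  return Math.max(...load); // load on the busiest server
}

const rand = (n) => Math.floor(Math.random() * n);

// Pure random: pick one server blindly
const randomPick = (load) => rand(load.length);

// P2C: pick two distinct servers at random, send to the less-loaded one
const p2cPick = (load) => {
  const a = rand(load.length);
  let b = rand(load.length);
  while (b === a) b = rand(load.length);
  return load[a] <= load[b] ? a : b;
};

console.log('random max load:', simulate(randomPick));
console.log('p2c    max load:', simulate(p2cPick));
// The ideal is 10000 per server; P2C typically lands within a few requests
// of it, while pure random leaves the busiest server noticeably above it.
```

One extra random probe per request is all it takes to collapse the imbalance, which is why LEAST_REQUEST in Envoy defaults to exactly this scheme.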
Chapter Summary
Key Takeaways:
- Server-side LB for external traffic, client-side for internal
- Algorithm choice depends on workload: Round-robin for simple, Least-connections for varying load
- Consistent hashing for caching and stateful services
- Implement both liveness and readiness probes
- Health checks should have appropriate timeouts
- Use circuit breakers with load balancing for resilience