As applications scale, a single server can no longer handle the load. Load balancers distribute traffic across multiple servers, while proxies act as intermediaries for various purposes. This module covers both in depth.

Think of a load balancer like a host at a busy restaurant. When customers (requests) arrive, the host does not send everyone to the same waiter (server). Instead, the host distributes customers across all available waiters based on who is least busy, who can handle the largest parties, or simply in rotation. Without the host, all customers would crowd around one waiter while the others stood idle.
Server A (weight 3): Gets 3 out of 6 requests
Server B (weight 2): Gets 2 out of 6 requests
Server C (weight 1): Gets 1 out of 6 requests
Sequence: A, A, A, B, B, C, A, A, A, B, B, C...
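The weighted rotation above can be sketched in a few lines of Python. This is a minimal illustration, assuming the simplest possible scheme (repeat each server `weight` times per cycle); the function name `weighted_round_robin` is made up for this example, and real load balancers may interleave picks differently.

```python
from itertools import cycle, islice

def weighted_round_robin(weights):
    """Return an endless iterator that repeats each server `weight` times per cycle."""
    # Expand {"A": 3, "B": 2, "C": 1} into ["A", "A", "A", "B", "B", "C"].
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(schedule)

servers = {"A": 3, "B": 2, "C": 1}
first_twelve = list(islice(weighted_round_robin(servers), 12))
print(", ".join(first_twelve))  # A, A, A, B, B, C, A, A, A, B, B, C
```

Note that this naive version sends short bursts to the heaviest server; a variant that spreads picks more evenly keeps the same 3:2:1 ratio but changes the order.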
Operates at the TCP/UDP level and sees only IP addresses and ports. It does not terminate the TCP connection or inspect the payload; it simply forwards packets based on IP and port information.
Client ──TCP──→ Load Balancer ──TCP──→ Server
                     │
                     Sees: Source IP, Dest IP, Source Port, Dest Port
                     Cannot see: HTTP headers, URLs, cookies
Pros: Very fast (no payload inspection), simple, handles any TCP/UDP protocol (not just HTTP)
Cons: Limited routing options, no content-based decisions, cannot do SSL termination

Analogy: A Layer 4 LB is like a highway toll booth that reads license plates and directs cars to different lanes, but never looks inside the car.
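To make "routes on IP and port only" concrete, here is a minimal sketch of one way a Layer 4 balancer could pick a backend: hashing the connection's address/port tuple so that every packet of the same connection lands on the same server. Hashing is an assumption for illustration (the text above does not prescribe a selection method), and the backend addresses and the `pick_backend` name are hypothetical.

```python
import hashlib

# Illustrative backend addresses, not taken from any real deployment.
BACKENDS = ["10.0.1.1:8080", "10.0.1.2:8080", "10.0.1.3:8080"]

def pick_backend(src_ip, src_port, dst_ip, dst_port, backends=BACKENDS):
    """Choose a backend using only the fields a Layer 4 balancer can see."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the hash as an integer and map it onto the pool.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]

first = pick_backend("203.0.113.7", 51234, "198.51.100.10", 443)
again = pick_backend("203.0.113.7", 51234, "198.51.100.10", 443)
print(first == again)  # True: the same connection tuple always maps to the same backend
```

Nothing in this function touches a URL, header, or cookie, which is exactly the limitation listed under Cons.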
Operates at the HTTP/HTTPS level. The load balancer terminates the client’s TCP connection, reads the full HTTP request, makes a routing decision, and then opens a new TCP connection to the backend server.
Pros: Smart routing (route /api/* to API servers, /static/* to CDN), SSL termination, caching, compression, header injection (X-Forwarded-For), A/B testing
Cons: More processing overhead, higher latency (must parse HTTP), only works for HTTP/HTTPS

Analogy: A Layer 7 LB is like a hotel concierge who reads your request, understands what you need, and directs you to the right department. They actually open and read the letter.
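The content-based routing and header injection described above can be sketched as a small routing function. This is an illustrative sketch only: the pool names (`api-servers`, `static-servers`, `web-servers`) and the `route` function are hypothetical, and a real Layer 7 balancer would do this on parsed requests inside its proxy engine.

```python
# Hypothetical prefix-to-pool table; the first matching prefix wins.
ROUTES = [
    ("/api/", "api-servers"),
    ("/static/", "static-servers"),
]
DEFAULT_POOL = "web-servers"

def route(path, client_ip, headers):
    """Return (pool, headers_for_backend) for an already-parsed HTTP request."""
    pool = DEFAULT_POOL
    for prefix, name in ROUTES:
        if path.startswith(prefix):
            pool = name
            break
    # Header injection: record the original client IP for the backend,
    # appending to any X-Forwarded-For value set by an earlier proxy.
    forwarded = dict(headers)
    prior = forwarded.get("X-Forwarded-For")
    forwarded["X-Forwarded-For"] = f"{prior}, {client_ip}" if prior else client_ip
    return pool, forwarded

pool, out_headers = route("/api/users", "203.0.113.7", {"Host": "example.com"})
print(pool, out_headers["X-Forwarded-For"])  # api-servers 203.0.113.7
```

Because the backend sees a connection from the balancer rather than from the client, the `X-Forwarded-For` header is how the original client address survives the hop.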
Which should you use? Default to Layer 7 for web applications — the routing flexibility and SSL termination are worth the small overhead. Use Layer 4 for non-HTTP protocols (databases, game servers, custom TCP services) or when you need maximum throughput and do not need content-based routing.
Encryption is CPU-intensive. Every server doing SSL:
Client ──HTTPS──► LB ──HTTPS──► Server 1 (decrypts)
                     ──HTTPS──► Server 2 (decrypts)
                     ──HTTPS──► Server 3 (decrypts)

Each server needs:
- SSL certificates
- CPU for encryption/decryption
PASSTHROUGH (L4):
Client ──HTTPS──► LB ──HTTPS──► Server
                  (just forwards, doesn't decrypt)

TERMINATION:
Client ──HTTPS──► LB ──HTTP──► Server
                  (decrypts, internal is plain)

RE-ENCRYPTION:
Client ──HTTPS──► LB ──HTTPS──► Server
                  (decrypts, re-encrypts for backend)
When to use which: Passthrough is for when compliance requires end-to-end encryption and the LB must not see plaintext — but you lose L7 routing. Termination is the most common choice: it lets the LB do smart routing and offloads CPU from backends, and internal traffic is usually on a trusted private network. Re-encryption is the compromise: the LB can still inspect and route traffic, but internal traffic is also encrypted — required by some compliance regimes (PCI DSS, HIPAA) even within private networks.
Troubleshooting load balancer issues: If your LB returns 502 (Bad Gateway), the LB is healthy but cannot reach the backend — check backend health, security groups, and whether the backend is listening on the expected port. If it returns 503 (Service Unavailable), all backends have failed health checks. If you see uneven traffic distribution, verify your health check configuration — a backend that is slow but technically “healthy” will keep receiving traffic. Use curl -H "Host: your-domain.com" http://LB-IP to test directly against the LB, bypassing DNS.
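The same checks can be scripted. The sketch below is a rough Python equivalent of the curl command, assuming plain HTTP on port 80; the function names `probe` and `interpret_status` are made up for this example, and the hints simply restate the guidance above.

```python
import http.client

def interpret_status(status):
    """Map a load balancer status code to the first thing worth checking."""
    if status == 502:
        return "LB is up but cannot reach a backend: check backend health, security groups, and the listening port"
    if status == 503:
        return "no healthy backends: check why health checks are failing"
    return "no load-balancer-level error indicated by this status"

def probe(lb_ip, host, port=80, timeout=5):
    """Send GET / straight to the LB's IP with an explicit Host header, bypassing DNS."""
    conn = http.client.HTTPConnection(lb_ip, port, timeout=timeout)
    try:
        conn.request("GET", "/", headers={"Host": host})
        return conn.getresponse().status
    finally:
        conn.close()

print(interpret_status(502))
```

A call such as `interpret_status(probe("LB-IP", "your-domain.com"))` would combine the two, with `LB-IP` replaced by the balancer's actual address.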
All servers share sessions — any server can handle any request. This is the preferred pattern because it makes your backend truly stateless and horizontally scalable. If a server dies, no sessions are lost.
Best practice: Design your applications to be stateless from the start. Store sessions in Redis, DynamoDB, or a similar external store. This eliminates the need for sticky sessions entirely and makes scaling, deployment, and failover dramatically simpler. Sticky sessions are a crutch for stateful applications — they work, but they limit your ability to scale and create uneven load distribution.
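A small sketch shows why an external session store removes the need for sticky sessions. The `SessionStore` class below is an in-memory stand-in for something like Redis (its interface is hypothetical, not the Redis API), and `AppServer` is an illustrative stateless server.

```python
import uuid

class SessionStore:
    """In-memory stand-in for an external store such as Redis (hypothetical interface)."""
    def __init__(self):
        self._data = {}

    def create(self, payload):
        session_id = uuid.uuid4().hex
        self._data[session_id] = dict(payload)
        return session_id

    def get(self, session_id):
        return self._data.get(session_id)

class AppServer:
    """A stateless server: it keeps no session data of its own."""
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create({"user": user})

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None

store = SessionStore()
server_a = AppServer("A", store)
server_b = AppServer("B", store)

sid = server_a.login("alice")   # session created via server A
del server_a                    # server A "dies"
print(server_b.whoami(sid))     # alice: server B can still serve the session
```

The session survives the loss of server A because it never lived there, so the load balancer is free to send the next request to any healthy server.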
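# Nginx example (active health checks require NGINX Plus)
upstream backend {
    zone backend 64k;  # shared memory zone, required for active health checks
    server 10.0.1.1:8080;
    server 10.0.1.2:8080;
    server 10.0.1.3:8080;
}

server {
    location / {
        proxy_pass http://backend;
        # Health check parameters (the directive belongs in the location, not the upstream)
        health_check interval=5s fails=3 passes=2;
    }
}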
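The `fails=3 passes=2` parameters describe a small state machine: a backend is marked unhealthy after three consecutive failed probes and healthy again after two consecutive successes. The sketch below models that logic in Python as an illustration of the idea; it is not Nginx's implementation, and the `BackendHealth` class is hypothetical.

```python
class BackendHealth:
    """Track one backend's health from a stream of probe results."""
    def __init__(self, fails=3, passes=2):
        self.fails = fails      # consecutive failures needed to mark unhealthy
        self.passes = passes    # consecutive successes needed to mark healthy
        self.healthy = True
        self._streak = 0        # consecutive results that contradict the current state

    def record(self, probe_ok):
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0    # result agrees with current state: reset the streak
            return self.healthy
        self._streak += 1
        threshold = self.fails if self.healthy else self.passes
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy

backend = BackendHealth(fails=3, passes=2)
states = [backend.record(ok) for ok in [False, False, False, True, True]]
print(states)  # [True, True, False, False, True]
```

Requiring several consecutive results in each direction keeps a single dropped probe from pulling a server out of rotation, and a single lucky success from putting a broken one back in.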
Your application should expose health endpoints. Think of these like a doctor’s checkup — each endpoint tests something different:
/health       -> Simple "I'm alive" (heartbeat -- am I running at all?)
/health/ready -> "I'm ready to serve traffic" (can I handle requests? DB connected? Cache warm?)
/health/live  -> "I'm running (not deadlocked)" (is my process responsive, even if dependencies are down?)
The distinction between liveness and readiness matters in Kubernetes especially. A pod that fails the liveness check gets restarted (the process is stuck). A pod that fails the readiness check stops receiving traffic but is not restarted (it might be warming up or waiting for a dependency). Confusing these two causes either unnecessary restarts or traffic being sent to pods that cannot handle it.
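The liveness/readiness split can be sketched with Python's standard library. This is a minimal illustration, assuming two hypothetical readiness flags (`db_connected`, `cache_warm`) in a `STATE` dictionary; a real service would probe its actual dependencies instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical readiness flags; a real service would check its own dependencies.
STATE = {"db_connected": False, "cache_warm": False}

def health_status(path, state):
    """Return the HTTP status code for a health endpoint path."""
    if path in ("/health", "/health/live"):
        return 200  # the process answered, so it is alive and responsive
    if path == "/health/ready":
        ready = state["db_connected"] and state["cache_warm"]
        return 200 if ready else 503  # alive, but not ready for traffic yet
    return 404

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(health_status(self.path, STATE))
        self.end_headers()

def serve(port=8080):
    """Run the health endpoints until interrupted (not called in this sketch)."""
    HTTPServer(("", port), HealthHandler).serve_forever()

print(health_status("/health/ready", STATE))  # 503
STATE.update(db_connected=True, cache_warm=True)
print(health_status("/health/ready", STATE))  # 200
```

Note that `/health/live` deliberately ignores the dependency flags: a pod whose database is down should stop receiving traffic, not be restarted.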
Route based on user attributes:
User in test group A → Server with feature X
User in test group B → Server without feature X
Analyze which performs better
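One common way to decide which group a user belongs to is to hash a stable user attribute, so the same user always sees the same variant. The sketch below assumes that approach; the experiment name, pool names, and the `assign_group` and `route_user` functions are all hypothetical.

```python
import hashlib

def assign_group(user_id, experiment="feature-x", percent_in_a=50):
    """Deterministically place a user in test group A or B."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return "A" if bucket < percent_in_a else "B"

# Hypothetical server pools for each test group.
POOLS = {"A": "servers-with-feature-x", "B": "servers-without-feature-x"}

def route_user(user_id):
    """Return the pool that should serve this user's requests."""
    return POOLS[assign_group(user_id)]

print(route_user("user-42") == route_user("user-42"))  # True: assignment is sticky per user
```

Including the experiment name in the hash means a user's group in one test does not predict their group in the next, which keeps experiments independent of each other.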