Service Discovery
In a microservices architecture, services need to find and communicate with each other dynamically. Service discovery solves the problem of locating services in a constantly changing environment.
Learning Objectives:
- Understand service discovery patterns
- Implement Consul-based discovery
- Use DNS-based service discovery
- Configure Kubernetes native discovery
- Build health-aware load balancing
Why Service Discovery?
```text
THE PROBLEM: DYNAMIC INFRASTRUCTURE
═══════════════════════════════════

STATIC CONFIGURATION (doesn't work):
────────────────────────────────────

  Order Service config:
    payment_service:   http://192.168.1.10:3000   ◀── What if the IP changes?
    inventory_service: http://192.168.1.11:3001   ◀── What if it scales?


DYNAMIC CHALLENGES:
───────────────────

  ┌───────────────┐       ┌─────────────────────────────────────────┐
  │ Payment       │ ───▶  │ Multiple instances, any can handle      │
  │ Service       │       │ Instance 1: 10.0.0.5:3000               │
  │ (3 instances) │       │ Instance 2: 10.0.0.6:3000 (new)         │
  └───────────────┘       │ Instance 3: 10.0.0.7:3000 (new)         │
                          │ Instance 4: 10.0.0.4:3000 (terminated)  │
                          └─────────────────────────────────────────┘

  • Instances come and go
  • Auto-scaling adds/removes instances
  • Containers get new IPs on restart
  • Deployments create new instances
```
Service Discovery Patterns
```text
SERVICE DISCOVERY PATTERNS
══════════════════════════

1. CLIENT-SIDE DISCOVERY
────────────────────────

  ┌─────────┐    ┌──────────────┐    ┌─────────────────────┐
  │ Order   │───▶│   Service    │───▶│ Payment instances   │
  │ Service │    │   Registry   │    │ ├─ 10.0.0.5:3000    │
  └────┬────┘    └──────────────┘    │ ├─ 10.0.0.6:3000    │
       │                             │ └─ 10.0.0.7:3000    │
       │                             └─────────────────────┘
       │                             ┌─────────────────────┐
       └────────────────────────────▶│ Selected instance   │
                                     │ (client picks one)  │
                                     └─────────────────────┘

  Pros: Client has full control, can implement smart routing
  Cons: Client complexity, language-specific


2. SERVER-SIDE DISCOVERY
────────────────────────

  ┌─────────┐    ┌───────────────┐    ┌─────────────────────┐
  │ Order   │───▶│ Load Balancer │───▶│ Payment instances   │
  │ Service │    │   / Router    │    │ ├─ 10.0.0.5:3000    │
  └─────────┘    └───────┬───────┘    │ ├─ 10.0.0.6:3000    │
                         │            │ └─ 10.0.0.7:3000    │
                         ▼            └─────────────────────┘
                 ┌───────────────┐
                 │    Service    │
                 │   Registry    │
                 └───────────────┘

  Pros: Client simplicity, centralized control
  Cons: Additional hop, potential bottleneck
```
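To make the client-side pattern concrete, here is a toy, in-process registry. It is a hypothetical sketch, not Consul's API: instances register and send heartbeats, lookups drop any instance that has missed heartbeats for longer than a TTL, and the caller then picks an instance itself. The injectable `now` clock is an assumption added so that expiry can be tested deterministically.

```javascript
// Hypothetical in-memory registry illustrating client-side discovery:
// register -> heartbeat -> lookup (filtered by TTL) -> client picks an instance.
class InMemoryRegistry {
  constructor({ ttlMs = 10000, now = Date.now } = {}) {
    this.ttlMs = ttlMs;          // instances silent for longer than this are hidden
    this.now = now;              // injectable clock (assumption, for testing)
    this.services = new Map();   // serviceName -> Map(instanceId -> instance)
  }

  register(serviceName, id, address, port) {
    if (!this.services.has(serviceName)) {
      this.services.set(serviceName, new Map());
    }
    this.services.get(serviceName).set(id, { id, address, port, lastSeen: this.now() });
  }

  // Returns false if the instance is unknown (it must register again)
  heartbeat(serviceName, id) {
    const instance = this.services.get(serviceName)?.get(id);
    if (instance) {
      instance.lastSeen = this.now();
    }
    return Boolean(instance);
  }

  deregister(serviceName, id) {
    this.services.get(serviceName)?.delete(id);
  }

  // Only instances whose last heartbeat falls inside the TTL window
  lookup(serviceName) {
    const instances = this.services.get(serviceName);
    if (!instances) {
      return [];
    }
    const cutoff = this.now() - this.ttlMs;
    return [...instances.values()].filter(i => i.lastSeen >= cutoff);
  }
}

// Client side: the caller, not the registry, chooses the instance
function pickRoundRobin(instances, counter) {
  if (instances.length === 0) {
    throw new Error('No instances available');
  }
  return instances[counter % instances.length];
}
```

A production registry is a separate, replicated service (such as Consul, below); only the register/heartbeat/lookup shape carries over from this sketch.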
Consul-Based Discovery
Consul is a full-featured service discovery solution with built-in health checking.
Setting Up Consul
```yaml
# docker-compose.yml
version: '3.8'
services:
  consul:
    # The Docker Hub "consul" image is deprecated; HashiCorp publishes hashicorp/consul
    image: hashicorp/consul:latest
    container_name: consul
    ports:
      - "8500:8500"       # HTTP API/UI
      - "8600:8600/udp"   # DNS
    command: agent -server -bootstrap-expect=1 -ui -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

  order-service:
    build: ./order-service
    environment:
      - CONSUL_HOST=consul
      - SERVICE_NAME=order-service
      - SERVICE_PORT=3000
    depends_on:
      - consul

  payment-service:
    build: ./payment-service
    environment:
      - CONSUL_HOST=consul
      - SERVICE_NAME=payment-service
      - SERVICE_PORT=3000
    depends_on:
      - consul
    deploy:
      replicas: 3

volumes:
  consul-data:
```
Service Registration
```javascript
// discovery/ConsulRegistry.js
const os = require('os');
const Consul = require('consul'); // "consul" npm package; v1.x methods return promises

class ConsulRegistry {
  constructor(options = {}) {
    this.consul = new Consul({
      host: options.host || process.env.CONSUL_HOST || 'localhost',
      port: options.port || 8500
    });
    this.serviceName = options.serviceName;
    // Unique per instance (in Docker, HOSTNAME is the container ID)
    this.serviceId = `${options.serviceName}-${process.env.HOSTNAME || os.hostname()}`;
    this.servicePort = options.servicePort;
    this.checkInterval = options.checkInterval || '10s';
    this.deregisterAfter = options.deregisterAfter || '1m';
  }

  async register() {
    const address = this.getServiceAddress();
    const registration = {
      id: this.serviceId,
      name: this.serviceName,
      address,
      port: this.servicePort,
      tags: ['node', 'api'],
      check: {
        // Consul polls this URL: 2xx = passing, 429 = warning, anything else = critical
        http: `http://${address}:${this.servicePort}/health`,
        interval: this.checkInterval,
        // Option name as documented by the consul npm package
        deregistercriticalserviceafter: this.deregisterAfter
      }
    };
    try {
      await this.consul.agent.service.register(registration);
      console.log(`Service registered: ${this.serviceId}`);
      // Deregister on shutdown so callers stop routing here immediately
      this.setupGracefulShutdown();
    } catch (error) {
      console.error('Failed to register service:', error);
      throw error;
    }
  }

  async deregister() {
    try {
      await this.consul.agent.service.deregister(this.serviceId);
      console.log(`Service deregistered: ${this.serviceId}`);
    } catch (error) {
      console.error('Failed to deregister service:', error);
    }
  }

  getServiceAddress() {
    // When DOCKER is set, advertise the container hostname instead of an IP
    if (process.env.DOCKER) {
      return process.env.HOSTNAME;
    }
    // Otherwise use the first non-internal IPv4 address
    const interfaces = os.networkInterfaces();
    for (const name of Object.keys(interfaces)) {
      for (const iface of interfaces[name]) {
        // Some Node 18.x releases report family as the number 4
        if ((iface.family === 'IPv4' || iface.family === 4) && !iface.internal) {
          return iface.address;
        }
      }
    }
    return 'localhost';
  }

  setupGracefulShutdown() {
    const signals = ['SIGINT', 'SIGTERM', 'SIGQUIT'];
    signals.forEach(signal => {
      process.once(signal, async () => {
        console.log(`Received ${signal}, deregistering service...`);
        await this.deregister();
        process.exit(0);
      });
    });
  }
}

// Usage (CommonJS has no top-level await, so handle the promise explicitly)
const registry = new ConsulRegistry({
  serviceName: 'order-service',
  servicePort: 3000
});
registry.register().catch(() => process.exit(1));
```
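The registration above points Consul's HTTP check at `/health`, so every service has to serve that path. Below is a minimal sketch using only Node's `http` module; `createHealthServer` and its `isReady` callback are hypothetical names standing in for whatever dependency checks your service performs. Because Consul treats any 2xx as passing and anything other than 2xx or 429 as critical, answering 503 while the service is not ready marks the instance critical.

```javascript
// health.js: hypothetical /health endpoint for the Consul HTTP check
const http = require('http');

function createHealthServer(isReady = () => true) {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/health') {
      const ok = isReady();
      // 200 -> passing in Consul; 503 -> critical
      res.writeHead(ok ? 200 : 503, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: ok ? 'ok' : 'unavailable' }));
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

module.exports = { createHealthServer };
```

In a real service this route would sit on the same server and port as the API, since the check URL is built from `servicePort`. The Kubernetes probes later in this chapter poll the same path (they treat 200 to 399 as success).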
Service Discovery Client
```javascript
// discovery/ConsulDiscovery.js
// `consul` is a client from the "consul" npm package, e.g. new Consul({ host: 'consul' })
class ConsulDiscovery {
  constructor(consul, options = {}) {
    this.consul = consul;
    this.cacheTtl = options.cacheTtl || 5000; // ms
    this.cache = new Map();
    this.watchers = new Map();
  }

  // Map a /health/service entry to an instance. Consul leaves Service.Address
  // empty when none was registered; the node's address applies in that case.
  static toInstance(entry) {
    return {
      id: entry.Service.ID,
      address: entry.Service.Address || entry.Node.Address,
      port: entry.Service.Port,
      tags: entry.Service.Tags,
      status: entry.Checks.every(c => c.Status === 'passing') ? 'healthy' : 'unhealthy'
    };
  }

  async discover(serviceName, options = {}) {
    const { healthy = true, cached = true } = options;
    // Serve from cache while it is fresh
    const cacheEntry = this.cache.get(serviceName);
    if (cached && cacheEntry && Date.now() - cacheEntry.timestamp < this.cacheTtl) {
      return cacheEntry.instances;
    }
    try {
      const services = await this.consul.health.service({
        service: serviceName,
        passing: healthy // only instances whose checks all pass
      });
      const instances = services.map(ConsulDiscovery.toInstance);
      this.cache.set(serviceName, { instances, timestamp: Date.now() });
      return instances;
    } catch (error) {
      console.error(`Failed to discover ${serviceName}:`, error);
      // Registry unreachable: fall back to the last known (possibly stale) list
      return cacheEntry ? cacheEntry.instances : [];
    }
  }

  watch(serviceName, callback) {
    if (this.watchers.has(serviceName)) {
      return;
    }
    // consul.watch() runs blocking queries and emits 'change' when the result changes
    const watch = this.consul.watch({
      method: this.consul.health.service,
      options: { service: serviceName, passing: true }
    });
    watch.on('change', (data) => {
      const instances = data.map(ConsulDiscovery.toInstance);
      this.cache.set(serviceName, { instances, timestamp: Date.now() });
      callback(instances);
    });
    watch.on('error', (error) => {
      console.error(`Watch error for ${serviceName}:`, error);
    });
    this.watchers.set(serviceName, watch);
  }

  stopWatch(serviceName) {
    const watcher = this.watchers.get(serviceName);
    if (watcher) {
      watcher.end();
      this.watchers.delete(serviceName);
    }
  }
}

module.exports = ConsulDiscovery;
```
Load Balancer with Discovery
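```javascript
// discovery/ServiceClient.js
// Needs Node.js 18+ for the global fetch() and AbortSignal.timeout()
const FAILURE_THRESHOLD = 3;     // consecutive failures that open an instance's circuit
const RESET_TIMEOUT_MS = 30000;  // how long an open circuit waits before trial requests
const REQUEST_TIMEOUT_MS = 5000;

class ServiceClient {
  constructor(discovery, serviceName, options = {}) {
    this.discovery = discovery;
    this.serviceName = serviceName;
    this.strategy = options.strategy || 'round-robin';
    this.instances = [];
    this.currentIndex = 0;
    this.circuitBreakers = new Map();
    this.activeRequests = new Map(); // instanceId -> in-flight request count
    // Start watching for changes
    this.discovery.watch(serviceName, (instances) => {
      console.log(`Service ${serviceName} updated: ${instances.length} instances`);
      this.instances = instances;
    });
    // Initial load; request() waits for it to finish
    this.ready = this.refresh();
  }

  async refresh() {
    this.instances = await this.discovery.discover(this.serviceName);
  }

  selectInstance() {
    // Skip instances whose circuit is open
    const available = this.instances.filter(i => !this.isCircuitOpen(i.id));
    if (available.length === 0) {
      throw new Error(`No healthy instances of ${this.serviceName}`);
    }
    switch (this.strategy) {
      case 'round-robin':
        return this.roundRobin(available);
      case 'random':
        return this.random(available);
      case 'least-connections':
        return this.leastConnections(available);
      default:
        return available[0];
    }
  }

  roundRobin(instances) {
    const instance = instances[this.currentIndex % instances.length];
    this.currentIndex++;
    return instance;
  }

  random(instances) {
    const index = Math.floor(Math.random() * instances.length);
    return instances[index];
  }

  leastConnections(instances) {
    // The instance with the fewest in-flight requests from this client wins
    const load = (i) => this.activeRequests.get(i.id) || 0;
    return instances.reduce((best, i) => (load(i) < load(best) ? i : best));
  }

  async request(path, options = {}) {
    await this.ready;
    const instance = this.selectInstance();
    const url = `http://${instance.address}:${instance.port}${path}`;
    this.activeRequests.set(instance.id, (this.activeRequests.get(instance.id) || 0) + 1);
    try {
      // fetch() has no `timeout` option; abort through a signal instead
      const response = await fetch(url, {
        ...options,
        signal: AbortSignal.timeout(REQUEST_TIMEOUT_MS)
      });
      // fetch() only rejects on network errors, so count 5xx responses as failures
      if (response.status >= 500) {
        this.recordFailure(instance.id);
      } else {
        this.recordSuccess(instance.id);
      }
      return response;
    } catch (error) {
      this.recordFailure(instance.id);
      throw error;
    } finally {
      this.activeRequests.set(instance.id, this.activeRequests.get(instance.id) - 1);
    }
  }

  // Circuit breaker integration (one breaker per instance)
  isCircuitOpen(instanceId) {
    const cb = this.circuitBreakers.get(instanceId);
    if (!cb || cb.state !== 'OPEN') {
      return false;
    }
    // Once the reset timeout has passed, go HALF_OPEN and allow trial requests
    if (Date.now() - cb.openTime >= RESET_TIMEOUT_MS) {
      cb.state = 'HALF_OPEN';
      return false;
    }
    return true;
  }

  recordSuccess(instanceId) {
    const cb = this.getOrCreateCircuitBreaker(instanceId);
    cb.failures = 0;
    cb.state = 'CLOSED';
  }

  recordFailure(instanceId) {
    const cb = this.getOrCreateCircuitBreaker(instanceId);
    cb.failures++;
    // A failed trial request in HALF_OPEN reopens the circuit immediately
    if (cb.state === 'HALF_OPEN' || cb.failures >= FAILURE_THRESHOLD) {
      cb.state = 'OPEN';
      cb.openTime = Date.now();
    }
  }

  getOrCreateCircuitBreaker(instanceId) {
    if (!this.circuitBreakers.has(instanceId)) {
      this.circuitBreakers.set(instanceId, { state: 'CLOSED', failures: 0 });
    }
    return this.circuitBreakers.get(instanceId);
  }
}

// Usage (assumes ConsulDiscovery.js exports its class via module.exports)
const Consul = require('consul');
const ConsulDiscovery = require('./ConsulDiscovery');

async function main() {
  const consul = new Consul({ host: process.env.CONSUL_HOST || 'localhost' });
  const discovery = new ConsulDiscovery(consul);
  const paymentClient = new ServiceClient(discovery, 'payment-service', {
    strategy: 'round-robin'
  });
  const response = await paymentClient.request('/api/payments', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ amount: 100 })
  });
  console.log(response.status);
}

main().catch(console.error);
```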
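The per-instance circuit breaker inside `ServiceClient` is easier to reason about, and to test, when it is pulled out into its own class with an injectable clock. The sketch below is a standalone version of the same CLOSED, OPEN, HALF_OPEN cycle with the same defaults (3 failures, 30 seconds); the class name and the `now` option are illustrative assumptions, not part of any library.

```javascript
// Standalone circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED (or back to OPEN)
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;          // injectable clock (assumption, for testing)
    this.state = 'CLOSED';
    this.failures = 0;
    this.openedAt = 0;
  }

  // May a request be sent to this instance right now?
  allowRequest() {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'HALF_OPEN'; // reset timeout elapsed: allow trial traffic
    }
    return this.state !== 'OPEN';
  }

  onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failures++;
    // Trip on a failed trial, or once consecutive failures reach the threshold
    if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = this.now();
    }
  }
}
```

Checking elapsed time in `allowRequest()`, rather than scheduling a `setTimeout` per failure, avoids stacking timers when several in-flight requests fail at once.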
DNS-Based Discovery
A simpler alternative is to use DNS for service discovery: clients resolve a service name, and the registry keeps the DNS records up to date.
```text
DNS-BASED DISCOVERY
═══════════════════

  ┌────────────┐         ┌─────────────┐         ┌──────────────────┐
  │ Order      │──DNS───▶│ DNS         │────────▶│ payment.service  │
  │ Service    │ lookup  │ Server      │         │  → 10.0.0.5      │
  └────────────┘         └─────────────┘         │  → 10.0.0.6      │
                                ▲                │  → 10.0.0.7      │
                                │                └──────────────────┘
                      ┌─────────┴─────────┐
                      │ Service Registry  │
                      │   updates DNS     │
                      └───────────────────┘

  A records:
    payment.service.consul        → 10.0.0.5
    payment.service.consul        → 10.0.0.6
    payment.service.consul        → 10.0.0.7

  SRV records (with ports):
    _payment._tcp.service.consul  → 10.0.0.5:3000
    _payment._tcp.service.consul  → 10.0.0.6:3000
```
DNS Discovery Implementation
```javascript
// discovery/DNSDiscovery.js
const dns = require('dns').promises;

class DNSDiscovery {
  constructor(options = {}) {
    this.suffix = options.suffix || '.service.consul';
    // setServers() only accepts IP addresses, not host names
    this.dnsServer = options.dnsServer || '127.0.0.1';
    this.dnsPort = options.dnsPort || 8600;
    this.defaultPort = options.defaultPort || 3000; // A records carry no port
    this.cache = new Map();
    this.cacheTtl = options.cacheTtl || 10000; // ms
    // A private resolver leaves the process-wide DNS settings untouched
    this.resolver = new dns.Resolver();
    this.resolver.setServers([`${this.dnsServer}:${this.dnsPort}`]);
  }

  async discover(serviceName) {
    const hostname = `${serviceName}${this.suffix}`;
    // Check cache
    const cached = this.cache.get(hostname);
    if (cached && Date.now() - cached.timestamp < this.cacheTtl) {
      return cached.instances;
    }
    try {
      // A record lookup; Consul omits instances whose health checks are failing
      const addresses = await this.resolver.resolve4(hostname);
      const instances = addresses.map(address => ({ address, port: this.defaultPort }));
      this.cache.set(hostname, { instances, timestamp: Date.now() });
      return instances;
    } catch (error) {
      if (error.code === 'ENOTFOUND' || error.code === 'ENODATA') {
        console.warn(`Service not found: ${serviceName}`);
        return [];
      }
      throw error;
    }
  }

  async discoverWithPort(serviceName) {
    // SRV record lookup (RFC 2782 naming) for address AND port
    const hostname = `_${serviceName}._tcp${this.suffix}`;
    try {
      const records = await this.resolver.resolveSrv(hostname);
      // SRV targets are host names under .consul, which the OS resolver usually
      // cannot resolve, so resolve them through Consul's DNS as well
      return await Promise.all(records.map(async record => ({
        address: (await this.resolver.resolve4(record.name))[0],
        port: record.port,
        priority: record.priority,
        weight: record.weight
      })));
    } catch (error) {
      console.error('SRV lookup failed:', error);
      return this.discover(serviceName); // Fallback to A records + default port
    }
  }
}

// Usage: look up the Consul container's IP first, since setServers() needs an IP
async function main() {
  const { address } = await dns.lookup('consul');
  const dnsDiscovery = new DNSDiscovery({ dnsServer: address, dnsPort: 8600 });
  const paymentInstances = await dnsDiscovery.discover('payment-service');
  // e.g. [{ address: '10.0.0.5', port: 3000 }, { address: '10.0.0.6', port: 3000 }]
  console.log(paymentInstances);
}
main().catch(console.error);
```
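`discoverWithPort()` returns each SRV record's `priority` and `weight` but leaves them unused. RFC 2782 tells clients to try targets with the lowest priority value first and, among those, to choose in proportion to weight. A sketch of that selection follows; `selectSrvTarget` is a hypothetical helper, `random` is injectable so the choice is testable, and it simplifies the RFC by never choosing a zero-weight target when a non-zero one exists at the same priority.

```javascript
// Pick one SRV target: lowest priority value wins, then weighted random choice.
function selectSrvTarget(records, random = Math.random) {
  if (records.length === 0) {
    return null;
  }
  const minPriority = Math.min(...records.map(r => r.priority));
  const candidates = records.filter(r => r.priority === minPriority);
  const totalWeight = candidates.reduce((sum, r) => sum + r.weight, 0);

  // All weights zero: no preference, so choose uniformly
  if (totalWeight === 0) {
    return candidates[Math.floor(random() * candidates.length)];
  }

  // Walk the cumulative weights until the random threshold is crossed
  let threshold = random() * totalWeight;
  for (const record of candidates) {
    threshold -= record.weight;
    if (threshold < 0) {
      return record;
    }
  }
  return candidates[candidates.length - 1]; // guard against floating-point edge cases
}
```

With weights 1 and 3 at the same priority, the second target receives roughly three quarters of the picks, and a priority-20 target is only used once every lower-priority record has been removed from the list.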
Kubernetes Service Discovery
Kubernetes provides built-in service discovery through Services and cluster DNS.
```text
KUBERNETES SERVICE DISCOVERY
════════════════════════════

Inside the Kubernetes cluster:

  ┌─────────────┐     ┌──────────────┐     ┌─────────────────┐
  │ Order       │────▶│ payment      │────▶│ Payment Pods    │
  │ Pod         │     │ (Service)    │     │ ├── pod-1       │
  └─────────────┘     │              │     │ ├── pod-2       │
                      │ ClusterIP    │     │ └── pod-3       │
                      │ 10.96.0.100  │     └─────────────────┘
                      └──────────────┘
                             ▲
                             │
                       ┌─────┴─────┐
                       │  CoreDNS  │
                       └───────────┘

  DNS resolution:
    payment.default.svc.cluster.local → 10.96.0.100
```
Kubernetes Service Configuration
```yaml
# k8s/payment-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: payment
  namespace: default
spec:
  selector:
    app: payment         # traffic goes to pods carrying this label
  ports:
    - port: 80           # port clients use: http://payment
      targetPort: 3000   # port the container listens on
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment
  template:
    metadata:
      labels:
        app: payment
    spec:
      containers:
        - name: payment
          image: payment-service:latest
          ports:
            - containerPort: 3000
          readinessProbe:      # failing pods are removed from the Service's endpoints
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:       # failing pods are restarted
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
```
Using Kubernetes DNS
```javascript
// In Kubernetes, just use the service name!
const paymentUrl = process.env.PAYMENT_URL || 'http://payment';
// Or use the full DNS name
const fullDns = 'http://payment.default.svc.cluster.local';
// Cross-namespace
const orderUrl = 'http://order.other-namespace.svc.cluster.local';
```
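The short name `payment` works because, with the default pod DNS policy, each pod's resolver searches `<namespace>.svc.cluster.local`, then `svc.cluster.local` and `cluster.local`, before giving up. When you want the fully qualified form, a small helper keeps the `<service>.<namespace>.svc.<cluster-domain>` pattern in one place. `serviceUrl` is a hypothetical name, and `cluster.local` is only the common default cluster domain, so it is a parameter here.

```javascript
// Hypothetical helper: build a Service URL from its parts
function serviceUrl(service, {
  namespace = 'default',
  clusterDomain = 'cluster.local', // usual default; clusters can configure another
  port,                            // omit to use the scheme's default (80 for http)
  scheme = 'http'
} = {}) {
  const host = `${service}.${namespace}.svc.${clusterDomain}`;
  return port ? `${scheme}://${host}:${port}` : `${scheme}://${host}`;
}
```

Combined with the environment override above, this becomes `process.env.PAYMENT_URL || serviceUrl('payment')`.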
Headless Services for Direct Pod Access
```yaml
# k8s/payment-headless.yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-headless
spec:
  clusterIP: None   # Headless service: DNS returns pod IPs, not a single ClusterIP
  selector:
    app: payment
  ports:
    - port: 3000
```
```javascript
// discovery/K8sDiscovery.js
const dns = require('dns').promises;

class K8sDiscovery {
  async discoverPods(serviceName, namespace = 'default', port = 3000) {
    // A headless Service's DNS name resolves to the IPs of all ready pods
    const hostname = `${serviceName}-headless.${namespace}.svc.cluster.local`;
    try {
      const addresses = await dns.resolve4(hostname);
      return addresses.map(ip => ({ address: ip, port }));
    } catch (error) {
      console.error(`Failed to discover ${serviceName}:`, error);
      return [];
    }
  }
}

// Get individual pod IPs for custom load balancing
async function main() {
  const discovery = new K8sDiscovery();
  const pods = await discovery.discoverPods('payment');
  // e.g. [{ address: '10.244.0.5', port: 3000 }, { address: '10.244.0.6', port: 3000 }]
  console.log(pods);
}
main();
```
Kubernetes Client for Service Discovery
```javascript
// discovery/K8sClient.js
// Written against the 0.x releases of @kubernetes/client-node (positional arguments,
// results under response.body); 1.x takes a single object argument and returns the
// object directly.
const k8s = require('@kubernetes/client-node');

class K8sServiceDiscovery {
  constructor() {
    // Kept on the instance because watchEndpoints() needs it
    this.kc = new k8s.KubeConfig();
    this.kc.loadFromCluster();     // When running in cluster (needs RBAC to get/watch endpoints)
    // this.kc.loadFromDefault();  // For local development
    this.coreApi = this.kc.makeApiClient(k8s.CoreV1Api);
  }

  async getEndpoints(serviceName, namespace = 'default') {
    try {
      const response = await this.coreApi.readNamespacedEndpoints(serviceName, namespace);
      const endpoints = [];
      for (const subset of response.body.subsets || []) {
        // Ready addresses receive Service traffic
        for (const address of subset.addresses || []) {
          for (const port of subset.ports || []) {
            endpoints.push({ ip: address.ip, port: port.port, nodeName: address.nodeName, ready: true });
          }
        }
        // Include not-ready endpoints (e.g. failing readiness probes) if needed
        for (const address of subset.notReadyAddresses || []) {
          for (const port of subset.ports || []) {
            endpoints.push({ ip: address.ip, port: port.port, nodeName: address.nodeName, ready: false });
          }
        }
      }
      return endpoints;
    } catch (error) {
      console.error('Failed to get endpoints:', error);
      return [];
    }
  }

  async watchEndpoints(serviceName, namespace, callback) {
    const watch = new k8s.Watch(this.kc);
    await watch.watch(
      `/api/v1/namespaces/${namespace}/endpoints`,
      { fieldSelector: `metadata.name=${serviceName}` },
      (type, endpoint) => {
        // type is e.g. ADDED, MODIFIED or DELETED
        console.log(`Endpoint ${type}:`, endpoint.metadata.name);
        callback(type, endpoint);
      },
      (err) => {
        // Called whenever the watch ends, on error or on a normal server-side timeout
        if (err) {
          console.error('Watch error:', err);
        }
        // Reconnect
        setTimeout(() => this.watchEndpoints(serviceName, namespace, callback), 5000);
      }
    );
  }
}

module.exports = K8sServiceDiscovery;
```
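`watchEndpoints()` hands each raw Endpoints object to its callback, while `ServiceClient` works with `{ id, address, port }` instances. The sketch below bridges the two; both function names and the `portName` option (for multi-port Services) are hypothetical. It keeps only ready addresses and treats each ADDED or MODIFIED event as a full replacement, since every event carries the complete object.

```javascript
// Convert a Kubernetes Endpoints object into ServiceClient-style instances.
// Only ready addresses are kept; portName defaults to the subset's first port.
function endpointsToInstances(endpoints, portName) {
  const instances = [];
  for (const subset of (endpoints && endpoints.subsets) || []) {
    const ports = subset.ports || [];
    const port = portName === undefined ? ports[0] : ports.find(p => p.name === portName);
    if (!port) {
      continue;
    }
    for (const address of subset.addresses || []) {
      instances.push({ id: `${address.ip}:${port.port}`, address: address.ip, port: port.port });
    }
  }
  return instances;
}

// Fold one watch event into the current instance list
function applyEndpointEvent(current, type, endpoints, portName) {
  switch (type) {
    case 'ADDED':
    case 'MODIFIED':
      return endpointsToInstances(endpoints, portName); // full object in every event
    case 'DELETED':
      return [];                                         // Endpoints object removed
    default:
      return current;                                    // e.g. BOOKMARK or ERROR: keep list
  }
}
```

A possible wiring: `discovery.watchEndpoints('payment', 'default', (type, ep) => { instances = applyEndpointEvent(instances, type, ep); })`.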
Service Mesh Discovery
For complex scenarios, consider a service mesh such as Istio or Linkerd.
```text
SERVICE MESH (ISTIO)
════════════════════

  Control plane (istiod)
    ┌───────────┐  ┌───────────┐  ┌───────────┐
    │   Pilot   │  │  Citadel  │  │  Galley   │
    │ (routing) │  │  (certs)  │  │ (config)  │
    └─────┬─────┘  └───────────┘  └───────────┘
          │
          ▼
  Data plane
    ┌────────────────┐               ┌────────────────┐
    │ Order Pod      │               │ Payment Pod    │
    │ ┌────────────┐ │    mTLS +     │ ┌────────────┐ │
    │ │   Envoy    │◀├───────────────┤▶│   Envoy    │ │
    │ │   Proxy    │ │   discovery   │ │   Proxy    │ │
    │ └────────────┘ │               │ └────────────┘ │
    │ ┌────────────┐ │               │ ┌────────────┐ │
    │ │   Order    │ │               │ │  Payment   │ │
    │ │  Service   │ │               │ │  Service   │ │
    │ └────────────┘ │               │ └────────────┘ │
    └────────────────┘               └────────────────┘

  • Service discovery is handled by the Envoy sidecars
  • No application code changes needed
  • Automatic mTLS, retries, circuit breaking
  • Since Istio 1.5, Pilot, Citadel and Galley run together as the single
    istiod binary; the separate Mixer telemetry component has been removed,
    and the Envoy proxies report telemetry directly
```
Comparison
| Aspect | Consul | DNS-Based | Kubernetes | Service Mesh |
|---|---|---|---|---|
| Setup | Medium | Simple | Built-in | Complex |
| Health Checks | Yes | Limited | Probes | Yes |
| Load Balancing | Client | DNS round-robin | kube-proxy | Envoy |
| Dynamic Updates | Real-time | TTL-based | Real-time | Real-time |
| Multi-cluster | Yes | Limited | With tools | Yes |
| Code Changes | Some | Minimal | Minimal | None |
Interview Questions
Q1: Explain client-side vs server-side discovery
Answer:
Client-Side Discovery:
- Client queries the registry directly
- Client implements the load-balancing logic
- More control, but more complexity in every client
- Examples: Netflix Ribbon, custom implementations
Server-Side Discovery:
- Client calls a load balancer/router
- Router queries the registry and forwards the request
- Simpler clients, centralized control
- Examples: AWS ALB, Kubernetes Services, Nginx
Trade-offs:
- Client-side: Better for smart routing and custom logic
- Server-side: Simpler clients, but an extra hop and a potential bottleneck
Q2: How does Kubernetes service discovery work?
Answer:
Components:
- Service - Stable virtual IP (ClusterIP) for a group of pods
- CoreDNS - Resolves Service names to ClusterIPs
- kube-proxy - Maintains iptables/IPVS rules that forward ClusterIP traffic to pods
- Endpoints - Track the IPs of ready pods behind each Service
Flow:
- Pod calls payment.default.svc.cluster.local
- CoreDNS resolves the name to the Service ClusterIP
- kube-proxy's rules route the connection to a ready pod IP
- Readiness probes ensure only healthy pods receive traffic
Headless Services (clusterIP: None):
- DNS returns individual pod IPs
- Used for stateful apps or custom load balancing
Q3: What is service registration, and how do you handle failures?
Answer:
Registration Methods:
- Self-registration - Service registers itself
- Third-party - Orchestrator registers the service
- Sidecar - Proxy handles registration
Failure Handling:
- Health checks - Registry removes unhealthy instances
- TTL/Heartbeat - Auto-deregister on missed heartbeats
- Graceful shutdown - Deregister before terminating
- Circuit breakers - Client-side protection
Best Practices:
- Always implement health endpoints
- Use graceful shutdown hooks
- Cache discovery results with a TTL
- Have a fallback for registry unavailability
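The last two practices can be combined in one small wrapper: serve cached results while they are fresh, refresh after the TTL, and keep serving the last known list if the registry is unreachable. This is a sketch; `DiscoveryCache` is a hypothetical class, `loader` stands for any async function that queries your registry (such as `discovery.discover`), and the injectable `now` clock exists only to make the behaviour testable.

```javascript
// TTL cache with a stale-on-error fallback for discovery results
class DiscoveryCache {
  constructor(loader, { ttlMs = 5000, now = Date.now } = {}) {
    this.loader = loader;      // async (serviceName) => instances
    this.ttlMs = ttlMs;
    this.now = now;            // injectable clock (assumption, for testing)
    this.entries = new Map();  // serviceName -> { value, fetchedAt }
  }

  async get(serviceName) {
    const entry = this.entries.get(serviceName);
    if (entry && this.now() - entry.fetchedAt < this.ttlMs) {
      return entry.value; // fresh hit: no registry call
    }
    try {
      const value = await this.loader(serviceName);
      this.entries.set(serviceName, { value, fetchedAt: this.now() });
      return value;
    } catch (error) {
      if (entry) {
        return entry.value; // registry down: fall back to stale data
      }
      throw error; // nothing cached yet, so surface the failure
    }
  }
}
```

Serving stale data keeps traffic flowing during a registry outage; the health checks and circuit breakers above then limit the damage if some of those cached instances have since gone away.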
Summary
Key Takeaways
- Service discovery enables dynamic routing
- Client-side vs server-side trade-offs
- Consul provides full-featured discovery
- Kubernetes has built-in discovery
- Health checks ensure traffic goes to healthy instances
Next Steps
In the next chapter, we’ll explore Observability - logging, metrics, and distributed tracing.