Service Mesh
A service mesh provides infrastructure-layer functionality for service-to-service communication, taking networking concerns out of your application code.
Learning Objectives:
- Understand service mesh architecture and benefits
- Implement Istio for traffic management
- Configure mTLS automation
- Set up advanced traffic patterns (canary, A/B testing)
- Compare Istio vs Linkerd
What is a Service Mesh?
The Problem
As microservices grow, managing cross-cutting concerns becomes complex:
┌─────────────────────────────────────────────────────────────────────────────┐
│ WITHOUT SERVICE MESH │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Service A │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Application Code │ │ │
│ │ ├─────────────────────────────────────────────────┤ │ │
│ │ │ + Retry Logic │ │ │
│ │ │ + Circuit Breaker │ │ │
│ │ │ + Load Balancing │ │ │
│ │ │ + TLS/mTLS │ │ │
│ │ │ + Metrics Collection │ │ │
│ │ │ + Tracing │ │ │
│ │ │ + Rate Limiting │ │ │
│ │ │ + Service Discovery │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ⚠️ Problems: │
│ • Every service needs same networking code │
│ • Different languages = different implementations │
│ • Hard to update consistently │
│ • Application developers handle infrastructure │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│ WITH SERVICE MESH │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Service A │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Application Code │ │ │
│ │ │ (Pure Business Logic Only) │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ Sidecar Proxy (Envoy) │ │ │
│ │ │ ✓ Retry, Circuit Breaker, Load Balancing │ │ │
│ │ │ ✓ mTLS, Auth, Rate Limiting │ │ │
│ │ │ ✓ Metrics, Tracing, Logging │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ✅ Benefits: │
│ • Networking handled by infrastructure │
│ • Language agnostic │
│ • Centralized policy management │
│ • Consistent across all services │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Service Mesh Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ SERVICE MESH ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────┐ │
│ │ CONTROL PLANE │ │
│ │ ┌────────────────┐ │ │
│ │ │ Pilot/Istiod │ │ ← Configuration │
│ │ │ (Config) │ │ │
│ │ ├────────────────┤ │ │
│ │ │ Citadel │ │ ← Certificate Mgmt │
│ │ │ (Security) │ │ │
│ │ ├────────────────┤ │ │
│ │ │ Galley │ │ ← Validation │
│ │ │ (Validation) │ │ │
│ │ └────────────────┘ │ │
│ └──────────┬───────────┘ │
│ │ │
│ ┌─────────────────────┼─────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │
│ │ Service A │ │ Service B │ │ Service C │ │
│ │ ┌─────────────┐ │ │ ┌─────────────┐ │ │ ┌─────────────┐ │ │
│ │ │ App │ │ │ │ App │ │ │ │ App │ │ │
│ │ └──────┬──────┘ │ │ └──────┬──────┘ │ │ └──────┬──────┘ │ │
│ │ │ │ │ │ │ │ │ │ │
│ │ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │ │ ┌──────▼──────┐ │ │
│ │ │ Envoy │◀─┼─┼─▶│ Envoy │◀─┼─┼─▶│ Envoy │ │ │
│ │ │ (Sidecar) │ │ │ │ (Sidecar) │ │ │ │ (Sidecar) │ │ │
│ │ └─────────────┘ │ │ └─────────────┘ │ │ └─────────────┘ │ │
│ └───────────────────┘ └───────────────────┘ └───────────────────┘ │
│ │ │ │ │
│ └─────────────────────┴─────────────────────┘ │
│ DATA PLANE │
│ (Sidecar Proxies - Envoy) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Istio Deep Dive
Installing Istio
# Download Istio (fetches the latest release; set ISTIO_VERSION to pin one)
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.20.0   # directory name matches the downloaded version
export PATH=$PWD/bin:$PATH
# Install with demo profile (includes all features)
istioctl install --set profile=demo -y
# Enable automatic sidecar injection for default namespace
kubectl label namespace default istio-injection=enabled
# Verify installation
kubectl get pods -n istio-system
Deploying a Service with Istio
# deployment.yaml - No changes needed, Istio injects the sidecar automatically
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
      version: v1
  template:
    metadata:
      labels:
        app: order-service
        version: v1
    spec:
      containers:
        - name: order-service
          image: myregistry/order-service:v1
          ports:
            - containerPort: 3000
          env:
            - name: PORT
              value: "3000"
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  ports:
    - port: 80
      targetPort: 3000
      name: http
  selector:
    app: order-service
Traffic Management
Virtual Services
Control how requests are routed:
# virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
    - order-service
  http:
    # Route based on headers
    - match:
        - headers:
            x-user-type:
              exact: premium
      route:
        - destination:
            host: order-service
            subset: v2   # Premium users get v2
    # Default route
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90   # 90% to v1
        - destination:
            host: order-service
            subset: v2
          weight: 10   # 10% canary to v2
      timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 3s
        retryOn: 5xx,reset,connect-failure
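Conceptually, weight-based routing picks a destination in proportion to the configured weights. The sketch below is illustrative only (not Envoy's actual algorithm), and `pickSubset` is a hypothetical helper name:

```javascript
// Illustrative sketch of weight-based subset selection, as configured
// in the VirtualService above. Not Envoy's actual implementation.
function pickSubset(routes, rand = Math.random()) {
  // routes: [{ subset, weight }]; weights are expected to sum to 100
  const total = routes.reduce((sum, r) => sum + r.weight, 0);
  let point = rand * total;
  for (const r of routes) {
    point -= r.weight;
    if (point < 0) return r.subset;
  }
  return routes[routes.length - 1].subset;
}

const routes = [
  { subset: 'v1', weight: 90 },
  { subset: 'v2', weight: 10 }
];

// rand below 0.9 lands in v1's 90% share; at or above lands in v2's 10%
console.log(pickSubset(routes, 0.5));  // v1
console.log(pickSubset(routes, 0.95)); // v2
```

Over many requests the selections converge to the configured 90/10 split, which is what makes gradual canary shifts statistically meaningful.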
Destination Rules
Define subsets and policies:
# destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    loadBalancer:
      simple: LEAST_CONN   # or ROUND_ROBIN, RANDOM, PASSTHROUGH
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
    - name: v1
      labels:
        version: v1
      trafficPolicy:
        connectionPool:
          http:
            http1MaxPendingRequests: 50
    - name: v2
      labels:
        version: v2
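The idea behind `LEAST_CONN` can be sketched as "send the request to the endpoint with the fewest active connections." This is a simplification for illustration (Envoy actually samples a pair of random hosts and picks the less loaded one), and `leastConn` is a hypothetical helper name:

```javascript
// Illustrative sketch of least-connection load balancing: pick the
// endpoint currently serving the fewest active connections.
function leastConn(endpoints) {
  return endpoints.reduce((best, ep) =>
    ep.active < best.active ? ep : best
  );
}

const endpoints = [
  { name: 'pod-a', active: 12 },
  { name: 'pod-b', active: 3 },
  { name: 'pod-c', active: 7 }
];

console.log(leastConn(endpoints).name); // pod-b
```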
Canary Deployments with Istio
# Step 1: Deploy v2 alongside v1
# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-v2
spec:
  replicas: 1   # Start with 1 replica
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service
        version: v2
    spec:
      containers:
        - name: order-service
          image: myregistry/order-service:v2
          ports:
            - containerPort: 3000
---
# Step 2: Route 10% of traffic to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service-canary
spec:
  hosts:
    - order-service
  http:
    - route:
        - destination:
            host: order-service
            subset: v1
          weight: 90
        - destination:
            host: order-service
            subset: v2
          weight: 10
// scripts/canary-rollout.js
// Requires Node 18+ (built-in fetch) and @kubernetes/client-node
const k8s = require('@kubernetes/client-node');

class CanaryRollout {
  constructor(serviceName, namespace = 'default') {
    this.serviceName = serviceName;
    this.namespace = namespace;
    const kc = new k8s.KubeConfig();
    kc.loadFromDefault();
    this.customApi = kc.makeApiClient(k8s.CustomObjectsApi);
  }

  async updateTrafficSplit(v1Weight, v2Weight) {
    const virtualService = {
      apiVersion: 'networking.istio.io/v1beta1',
      kind: 'VirtualService',
      metadata: {
        name: `${this.serviceName}-canary`,
        namespace: this.namespace
      },
      spec: {
        hosts: [this.serviceName],
        http: [{
          route: [
            {
              destination: { host: this.serviceName, subset: 'v1' },
              weight: v1Weight
            },
            {
              destination: { host: this.serviceName, subset: 'v2' },
              weight: v2Weight
            }
          ]
        }]
      }
    };

    try {
      await this.customApi.patchNamespacedCustomObject(
        'networking.istio.io',
        'v1beta1',
        this.namespace,
        'virtualservices',
        `${this.serviceName}-canary`,
        virtualService,
        undefined,
        undefined,
        undefined,
        { headers: { 'Content-Type': 'application/merge-patch+json' } }
      );
      console.log(`Traffic split updated: v1=${v1Weight}%, v2=${v2Weight}%`);
    } catch (error) {
      console.error('Failed to update traffic split:', error);
      throw error;
    }
  }

  async progressiveRollout(steps = [10, 25, 50, 75, 100], intervalMinutes = 5) {
    for (const v2Percentage of steps) {
      const v1Percentage = 100 - v2Percentage;
      console.log(`\n🚀 Rolling out: ${v2Percentage}% to v2`);
      await this.updateTrafficSplit(v1Percentage, v2Percentage);

      if (v2Percentage < 100) {
        // Wait and monitor
        console.log(`⏳ Waiting ${intervalMinutes} minutes before next step...`);
        console.log('📊 Monitor metrics at: http://localhost:3000/grafana');
        await this.waitAndMonitor(intervalMinutes);

        // Check error rates before continuing
        const healthy = await this.checkHealth();
        if (!healthy) {
          console.log('❌ Health check failed! Rolling back...');
          await this.rollback();
          return false;
        }
      }
    }
    console.log('\n✅ Canary rollout complete! v2 receiving 100% traffic');
    return true;
  }

  async rollback() {
    console.log('🔄 Rolling back to v1...');
    await this.updateTrafficSplit(100, 0);
    console.log('✅ Rollback complete. All traffic to v1');
  }

  async waitAndMonitor(minutes) {
    const ms = minutes * 60 * 1000;
    await new Promise(resolve => setTimeout(resolve, ms));
  }

  async checkHealth() {
    // Query Prometheus for the v2 error rate (5xx responses / total requests)
    try {
      const query =
        `sum(rate(istio_requests_total{destination_service="${this.serviceName}",` +
        `response_code=~"5.*",destination_version="v2"}[5m]))` +
        ` / sum(rate(istio_requests_total{destination_service="${this.serviceName}",` +
        `destination_version="v2"}[5m]))`;
      const response = await fetch(
        `http://prometheus:9090/api/v1/query?query=${encodeURIComponent(query)}`
      );
      const data = await response.json();
      const errorRate = parseFloat(data.data.result[0]?.value[1] ?? '0');
      console.log(`📈 v2 error rate: ${(errorRate * 100).toFixed(2)}%`);
      return errorRate < 0.01; // Less than 1% error rate
    } catch (error) {
      console.log('⚠️ Could not fetch metrics, continuing...');
      return true;
    }
  }
}

// Usage
const canary = new CanaryRollout('order-service');
canary.progressiveRollout([10, 25, 50, 75, 100], 5).catch(console.error);
mTLS (Mutual TLS)
Automatic encryption between services:
# peer-authentication.yaml - Enable strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT   # PERMISSIVE allows plaintext, STRICT requires mTLS
---
# authorization-policy.yaml - Service-to-service authorization
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
  namespace: default
spec:
  selector:
    matchLabels:
      app: order-service
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - "cluster.local/ns/default/sa/api-gateway"
              - "cluster.local/ns/default/sa/payment-service"
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/api/v1/orders/*"]
    - from:
        - source:
            principals:
              - "cluster.local/ns/monitoring/sa/prometheus"
      to:
        - operation:
            methods: ["GET"]
            paths: ["/metrics"]
Circuit Breaking with Istio
# circuit-breaker.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service-circuit-breaker
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100            # Max TCP connections
      http:
        http1MaxPendingRequests: 100   # Queue size
        http2MaxRequests: 1000         # Max concurrent requests
        maxRequestsPerConnection: 10   # Connection reuse
        maxRetries: 3                  # Max retries
    outlierDetection:
      # Circuit breaker configuration
      consecutive5xxErrors: 5          # Eject after 5 consecutive 5xx
      consecutiveGatewayErrors: 5      # Eject after 5 gateway errors
      interval: 10s                    # Detection interval
      baseEjectionTime: 30s            # Base ejection time
      maxEjectionPercent: 50           # Max % of hosts to eject
      minHealthPercent: 30             # Min healthy hosts before breaking
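The core of `consecutive5xxErrors` ejection can be sketched as a per-host counter that increments on 5xx responses and resets on success; once it reaches the threshold, the host is ejected from the load-balancing pool. `OutlierTracker` is an illustrative name, not part of Istio or Envoy:

```javascript
// Sketch of the consecutive5xxErrors logic configured above: after 5
// consecutive 5xx responses a host is ejected from the pool.
class OutlierTracker {
  constructor(threshold = 5) {
    this.threshold = threshold;
    this.consecutive5xx = 0;
    this.ejected = false;
  }

  record(statusCode) {
    if (statusCode >= 500) {
      this.consecutive5xx += 1;
      if (this.consecutive5xx >= this.threshold) this.ejected = true;
    } else {
      // any non-5xx response resets the consecutive counter
      this.consecutive5xx = 0;
    }
    return this.ejected;
  }
}

const host = new OutlierTracker(5);
[500, 503, 200, 500, 500, 500, 502, 500].forEach(code => host.record(code));
console.log(host.ejected); // true -- the last five responses were all 5xx
```

In the real DestinationRule, the ejected host is re-admitted after `baseEjectionTime` (growing with repeated ejections), and `maxEjectionPercent` caps how much of the pool can be ejected at once.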
Rate Limiting
# rate-limit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: api-rate-limit
  namespace: default
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: "envoy.filters.network.http_connection_manager"
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 1000
                tokens_per_fill: 100
                fill_interval: 1s
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              response_headers_to_add:
                - append: false
                  header:
                    key: x-rate-limit-remaining
                    value: "%DYNAMIC_METADATA(envoy.filters.http.local_ratelimit:remaining_tokens)%"
Istio vs Linkerd Comparison
┌─────────────────────────────────────────────────────────────────────────────┐
│ ISTIO vs LINKERD COMPARISON │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Feature │ Istio │ Linkerd │
│ ─────────────────────┼────────────────────────┼────────────────────────────│
│ Proxy │ Envoy (C++) │ linkerd2-proxy (Rust) │
│ Resource Usage │ Higher │ Lower (~10x less) │
│ Latency Overhead │ ~2-3ms │ ~1ms │
│ Complexity │ High │ Low │
│ Learning Curve │ Steep │ Gentle │
│ Features │ Very extensive │ Focused, simpler │
│ mTLS │ Yes │ Yes (default on) │
│ Traffic Management │ Advanced │ Basic │
│ Multi-cluster │ Yes │ Yes │
│ Web UI │ Kiali (separate) │ Built-in dashboard │
│ Best For │ Large enterprises │ Kubernetes-native teams │
│ │
│ Choose Istio if: │
│ • Need advanced traffic management │
│ • Already using Envoy │
│ • Enterprise with complex requirements │
│ • Need extensive customization │
│ │
│ Choose Linkerd if: │
│ • Want simplicity and low overhead │
│ • Kubernetes-only environment │
│ • Need quick implementation │
│ • Resource-constrained clusters │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Linkerd Quick Start
# Install Linkerd CLI
curl -fsL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
# Validate cluster
linkerd check --pre
# Install Linkerd (on Linkerd 2.12+, install the CRDs first:
# linkerd install --crds | kubectl apply -f -)
linkerd install | kubectl apply -f -
linkerd check
# Inject sidecar into deployment
kubectl get deploy order-service -o yaml | linkerd inject - | kubectl apply -f -
# Or enable auto-injection for the namespace (applies to newly created pods)
kubectl annotate namespace default linkerd.io/inject=enabled
# Access dashboard
linkerd viz install | kubectl apply -f -
linkerd viz dashboard &
Observability with Service Mesh
Automatic Metrics
Istio automatically generates request metrics for every service. Typical Prometheus queries:
# Request rate by service
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service_name)
# Error rate
sum(rate(istio_requests_total{reporter="destination",response_code=~"5.*"}[5m]))
by (destination_service_name)
/
sum(rate(istio_requests_total{reporter="destination"}[5m]))
by (destination_service_name)
# P99 latency
histogram_quantile(0.99,
sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
by (destination_service_name, le)
)
# Traffic between services
sum(rate(istio_requests_total{reporter="source"}[5m]))
by (source_workload, destination_service_name)
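To make the error-rate query above concrete: `rate()` computes the per-second increase of a counter over the window, and the division yields the fraction of requests that were 5xx. A sketch with made-up sample values (all numbers are illustrative, not real metrics):

```javascript
// What rate(counter[5m]) computes: counter increase over the window,
// divided by the window length in seconds.
function rate(counterThen, counterNow, windowSeconds) {
  return (counterNow - counterThen) / windowSeconds;
}

const windowSeconds = 300; // the [5m] range in the queries above

// Hypothetical istio_requests_total samples at the start/end of the window
const totalThen = 120000, totalNow = 150000;
const errorsThen = 900, errorsNow = 1050;

const requestRate = rate(totalThen, totalNow, windowSeconds);
const errorRate = rate(errorsThen, errorsNow, windowSeconds) / requestRate;

console.log(requestRate);                        // 100 requests/second
console.log((errorRate * 100).toFixed(2) + '%'); // 0.50%
```

This is the same calculation the canary script's `checkHealth` relies on when deciding whether to continue a rollout.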
Distributed Tracing Integration
# Enable tracing with Jaeger
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-config
spec:
  meshConfig:
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 100.0   # 100% sampling for dev, reduce in prod
        zipkin:
          address: jaeger-collector.istio-system.svc:9411
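The `sampling` value is a head-based sampling percentage: each new trace is kept with probability `sampling / 100`, and that decision propagates to all spans in the trace. A one-line sketch of the decision (`shouldSample` is an illustrative name):

```javascript
// Head-based trace sampling: keep a new trace with probability
// samplingPercent / 100 (the decision then propagates downstream).
function shouldSample(samplingPercent, rand = Math.random()) {
  return rand * 100 < samplingPercent;
}

console.log(shouldSample(100.0, 0.999)); // true  -- 100% keeps every trace
console.log(shouldSample(1.0, 0.5));     // false -- 1% drops most traces
```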
Service Mesh Interview Questions
Q1: What is a service mesh and when would you use one?
Answer: A service mesh is an infrastructure layer that handles service-to-service communication, providing:
- Traffic management: Load balancing, routing, retries
- Security: mTLS, authorization
- Observability: Metrics, tracing, logging
Use one when you have:
- 10+ microservices
- A need for consistent security policies
- Multiple languages/frameworks
- Complex traffic patterns
Skip it when you have:
- A small number of services
- A simple architecture
- Resource constraints
- A team unfamiliar with Kubernetes
Q2: Explain the sidecar pattern
Answer: The sidecar pattern deploys a helper container alongside the main application container in the same pod:

┌─────────── Pod ───────────┐
│ ┌────────┐ ┌─────────┐ │
│ │ App │◀▶│ Sidecar │ │
│ │ │ │ (Envoy) │ │
│ └────────┘ └─────────┘ │
└───────────────────────────┘
Benefits:
- App doesn't need networking code
- Language agnostic
- Updated independently
- Consistent behavior
Trade-offs:
- Latency overhead (1-3ms)
- Resource consumption
- Complexity
Q3: How does mTLS work in a service mesh?
Answer: The mesh handles mTLS entirely in the sidecars:
- A Certificate Authority (CA) generates the root certificate
- Each service gets a unique certificate (SPIFFE identity)
- Sidecars automatically handle the TLS handshake
- Both client and server verify each other's certificates

Service A ──────[mTLS]────── Service B
    │                            │
    ├─ Client Certificate        ├─ Server Certificate
    └─ Verify Server Cert        └─ Verify Client Cert
Benefits:
- Zero code changes
- Automatic rotation
- Service identity verification
- Encrypted in transit
Q4: How do you implement canary deployments with Istio?
Answer: The rollout proceeds in stages:
- Deploy the new version alongside the old
- Create a VirtualService with weight-based routing
- Gradually shift traffic (10% → 25% → 50% → 100%)
- Monitor error rates at each step
- Roll back if issues are detected

spec:
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
          weight: 90
        - destination:
            host: my-service
            subset: v2
          weight: 10
Key metrics to watch:
- Error rates (5xx responses)
- Latency percentiles
- Business metrics
Q5: What's the difference between Istio and Linkerd?
Answer:
| Aspect | Istio | Linkerd |
|---|---|---|
| Complexity | High | Low |
| Resource usage | Higher | Lower |
| Latency | ~2-3ms | ~1ms |
| Features | Extensive | Focused |
| Proxy | Envoy (C++) | Rust-based |
Choose Istio for complex requirements, multi-cluster setups, and extensive customization.
Choose Linkerd for simplicity, Kubernetes-only environments, and resource-constrained clusters.
Best Practices
Start Simple
Begin with basic features (mTLS, observability), add complexity gradually
Test Thoroughly
Service mesh adds latency - load test before production
Monitor Resources
Sidecar proxies consume CPU/memory - size appropriately
Plan for Failures
Service mesh itself can fail - have runbooks ready
Chapter Summary
Key Takeaways:
- Service mesh moves networking from app code to infrastructure
- Istio provides advanced traffic management and security
- mTLS ensures encrypted, authenticated service communication
- Canary deployments enable safe progressive rollouts
- Choose between Istio and Linkerd based on complexity needs