Service Mesh

A service mesh provides infrastructure-layer functionality for service-to-service communication, taking networking concerns out of your application code.
Learning Objectives:
  • Understand service mesh architecture and benefits
  • Implement Istio for traffic management
  • Configure mTLS automation
  • Set up advanced traffic patterns (canary, A/B testing)
  • Compare Istio vs Linkerd

What is a Service Mesh?

The Problem

As microservices grow, managing cross-cutting concerns becomes complex:
┌─────────────────────────────────────────────────────────────────────────────┐
│                    WITHOUT SERVICE MESH                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌───────────────────────────────────────────────────────┐                  │
│  │                    Service A                          │                  │
│  │  ┌─────────────────────────────────────────────────┐  │                  │
│  │  │              Application Code                   │  │                  │
│  │  ├─────────────────────────────────────────────────┤  │                  │
│  │  │  + Retry Logic                                  │  │                  │
│  │  │  + Circuit Breaker                              │  │                  │
│  │  │  + Load Balancing                               │  │                  │
│  │  │  + TLS/mTLS                                     │  │                  │
│  │  │  + Metrics Collection                           │  │                  │
│  │  │  + Tracing                                      │  │                  │
│  │  │  + Rate Limiting                                │  │                  │
│  │  │  + Service Discovery                            │  │                  │
│  │  └─────────────────────────────────────────────────┘  │                  │
│  └───────────────────────────────────────────────────────┘                  │
│                                                                              │
│  ⚠️ Problems:                                                                │
│  • Every service needs same networking code                                 │
│  • Different languages = different implementations                          │
│  • Hard to update consistently                                              │
│  • Application developers handle infrastructure                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────────────┐
│                      WITH SERVICE MESH                                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌───────────────────────────────────────────────────────┐                  │
│  │                    Service A                          │                  │
│  │  ┌─────────────────────────────────────────────────┐  │                  │
│  │  │              Application Code                   │  │                  │
│  │  │         (Pure Business Logic Only)              │  │                  │
│  │  └─────────────────────────────────────────────────┘  │                  │
│  │  ┌─────────────────────────────────────────────────┐  │                  │
│  │  │              Sidecar Proxy (Envoy)              │  │                  │
│  │  │  ✓ Retry, Circuit Breaker, Load Balancing      │  │                  │
│  │  │  ✓ mTLS, Auth, Rate Limiting                   │  │                  │
│  │  │  ✓ Metrics, Tracing, Logging                   │  │                  │
│  │  └─────────────────────────────────────────────────┘  │                  │
│  └───────────────────────────────────────────────────────┘                  │
│                                                                              │
│  ✅ Benefits:                                                                │
│  • Networking handled by infrastructure                                     │
│  • Language agnostic                                                        │
│  • Centralized policy management                                            │
│  • Consistent across all services                                           │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Service Mesh Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                     SERVICE MESH ARCHITECTURE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│                         ┌──────────────────────┐                            │
│                         │    CONTROL PLANE     │                            │
│                         │  ┌────────────────┐  │                            │
│                         │  │   Pilot/Istiod │  │ ← Configuration           │
│                         │  │   (Config)     │  │                            │
│                         │  ├────────────────┤  │                            │
│                         │  │    Citadel     │  │ ← Certificate Mgmt        │
│                         │  │   (Security)   │  │                            │
│                         │  ├────────────────┤  │                            │
│                         │  │    Galley      │  │ ← Validation              │
│                         │  │  (Validation)  │  │                            │
│                         │  └────────────────┘  │                            │
│                         └──────────┬───────────┘                            │
│                                    │                                         │
│              ┌─────────────────────┼─────────────────────┐                  │
│              │                     │                     │                  │
│              ▼                     ▼                     ▼                  │
│  ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐         │
│  │    Service A      │ │    Service B      │ │    Service C      │         │
│  │  ┌─────────────┐  │ │  ┌─────────────┐  │ │  ┌─────────────┐  │         │
│  │  │    App      │  │ │  │    App      │  │ │  │    App      │  │         │
│  │  └──────┬──────┘  │ │  └──────┬──────┘  │ │  └──────┬──────┘  │         │
│  │         │         │ │         │         │ │         │         │         │
│  │  ┌──────▼──────┐  │ │  ┌──────▼──────┐  │ │  ┌──────▼──────┐  │         │
│  │  │   Envoy     │◀─┼─┼─▶│   Envoy     │◀─┼─┼─▶│   Envoy     │  │         │
│  │  │  (Sidecar)  │  │ │  │  (Sidecar)  │  │ │  │  (Sidecar)  │  │         │
│  │  └─────────────┘  │ │  └─────────────┘  │ │  └─────────────┘  │         │
│  └───────────────────┘ └───────────────────┘ └───────────────────┘         │
│           │                     │                     │                     │
│           └─────────────────────┴─────────────────────┘                     │
│                          DATA PLANE                                         │
│                  (Sidecar Proxies - Envoy)                                  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
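
Note that since Istio 1.5 the Pilot, Citadel, and Galley functions shown above are packaged into a single control-plane binary, istiod; the diagram reflects their logical responsibilities. To see what the control plane has pushed to the data plane, istioctl can inspect any sidecar (the pod name below is hypothetical):
# Is every sidecar in sync with istiod?
istioctl proxy-status

# Dump the listeners, routes, and clusters istiod pushed to one sidecar
istioctl proxy-config listener order-service-7d9f8c6b5-abcde
istioctl proxy-config route order-service-7d9f8c6b5-abcde
istioctl proxy-config cluster order-service-7d9f8c6b5-abcde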

Istio Deep Dive

Installing Istio

# Download Istio
curl -L https://istio.io/downloadIstio | sh -
cd istio-1.20.0  # directory name matches the downloaded version
export PATH=$PWD/bin:$PATH

# Install with demo profile (includes all features)
istioctl install --set profile=demo -y

# Enable automatic sidecar injection for default namespace
kubectl label namespace default istio-injection=enabled

# Verify installation
kubectl get pods -n istio-system

Deploying a Service with Istio

# deployment.yaml - No changes needed, Istio injects sidecar automatically
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
      version: v1
  template:
    metadata:
      labels:
        app: order-service
        version: v1
    spec:
      containers:
      - name: order-service
        image: myregistry/order-service:v1
        ports:
        - containerPort: 3000
        env:
        - name: PORT
          value: "3000"
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  labels:
    app: order-service
spec:
  ports:
  - port: 80
    targetPort: 3000
    name: http
  selector:
    app: order-service
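
After applying the manifests, confirm that the sidecar was actually injected; each pod should report two containers (the app plus istio-proxy) and show up in the mesh:
# 2/2 READY means app + istio-proxy sidecar
kubectl get pods -l app=order-service

# The sidecar should also be registered with the control plane
istioctl proxy-status | grep order-service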

Traffic Management

Virtual Services

Control how requests are routed:
# virtual-service.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
  - order-service
  http:
  # Route based on headers
  - match:
    - headers:
        x-user-type:
          exact: premium
    route:
    - destination:
        host: order-service
        subset: v2  # Premium users get v2
  # Default route
  - route:
    - destination:
        host: order-service
        subset: v1
      weight: 90  # 90% to v1
    - destination:
        host: order-service
        subset: v2
      weight: 10  # 10% canary to v2
    timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 3s
      retryOn: 5xx,reset,connect-failure
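
A quick way to exercise these routes once the manifest is applied (the v1/v2 subsets are defined by the DestinationRule in the next section; the sleep test pod and the /api/v1/orders path are assumptions for illustration):
# Premium header should land on the v2 subset
kubectl exec deploy/sleep -- curl -s -H "x-user-type: premium" http://order-service/api/v1/orders

# Unmatched traffic follows the 90/10 weighted split
kubectl exec deploy/sleep -- curl -s http://order-service/api/v1/orders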

Destination Rules

Define subsets and policies:
# destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: order-service
spec:
  host: order-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    loadBalancer:
      simple: LEAST_CONN  # ROUND_ROBIN, RANDOM, PASSTHROUGH
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
  - name: v1
    labels:
      version: v1
    trafficPolicy:
      connectionPool:
        http:
          http1MaxPendingRequests: 50
  - name: v2
    labels:
      version: v2
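
Apply the DestinationRule before (or together with) any VirtualService that references its subsets; routing to a subset that does not exist yet surfaces as 503 errors. A minimal apply-and-check sequence:
kubectl apply -f destination-rule.yaml
kubectl apply -f virtual-service.yaml

# Validate the mesh configuration and confirm both objects exist
istioctl analyze
kubectl get destinationrule,virtualservice order-service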

Canary Deployments with Istio

# Step 1: Deploy v2 alongside v1
# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service-v2
spec:
  replicas: 1  # Start with 1 replica
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service
        version: v2
    spec:
      containers:
      - name: order-service
        image: myregistry/order-service:v2
        ports:
        - containerPort: 3000
---
# Step 2: Route 10% traffic to v2
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service-canary
spec:
  hosts:
  - order-service
  http:
  - route:
    - destination:
        host: order-service
        subset: v1
      weight: 90
    - destination:
        host: order-service
        subset: v2
      weight: 10

Progressive rollout script:
// scripts/canary-rollout.js
const k8s = require('@kubernetes/client-node');

class CanaryRollout {
  constructor(serviceName, namespace = 'default') {
    this.serviceName = serviceName;
    this.namespace = namespace;
    
    const kc = new k8s.KubeConfig();
    kc.loadFromDefault();
    this.customApi = kc.makeApiClient(k8s.CustomObjectsApi);
  }

  async updateTrafficSplit(v1Weight, v2Weight) {
    const virtualService = {
      apiVersion: 'networking.istio.io/v1beta1',
      kind: 'VirtualService',
      metadata: {
        name: `${this.serviceName}-canary`,
        namespace: this.namespace
      },
      spec: {
        hosts: [this.serviceName],
        http: [{
          route: [
            {
              destination: {
                host: this.serviceName,
                subset: 'v1'
              },
              weight: v1Weight
            },
            {
              destination: {
                host: this.serviceName,
                subset: 'v2'
              },
              weight: v2Weight
            }
          ]
        }]
      }
    };

    try {
      await this.customApi.patchNamespacedCustomObject(
        'networking.istio.io',
        'v1beta1',
        this.namespace,
        'virtualservices',
        `${this.serviceName}-canary`,
        virtualService,
        undefined,
        undefined,
        undefined,
        { headers: { 'Content-Type': 'application/merge-patch+json' } }
      );
      console.log(`Traffic split updated: v1=${v1Weight}%, v2=${v2Weight}%`);
    } catch (error) {
      console.error('Failed to update traffic split:', error);
      throw error;
    }
  }

  async progressiveRollout(steps = [10, 25, 50, 75, 100], intervalMinutes = 5) {
    for (const v2Percentage of steps) {
      const v1Percentage = 100 - v2Percentage;
      
      console.log(`\n🚀 Rolling out: ${v2Percentage}% to v2`);
      await this.updateTrafficSplit(v1Percentage, v2Percentage);
      
      // Wait and monitor
      console.log(`⏳ Waiting ${intervalMinutes} minutes before next step...`);
      console.log('📊 Monitor metrics at: http://localhost:3000/grafana');
      
      if (v2Percentage < 100) {
        await this.waitAndMonitor(intervalMinutes);
        
        // Check error rates before continuing
        const healthy = await this.checkHealth();
        if (!healthy) {
          console.log('❌ Health check failed! Rolling back...');
          await this.rollback();
          return false;
        }
      }
    }
    
    console.log('\n✅ Canary rollout complete! v2 receiving 100% traffic');
    return true;
  }

  async rollback() {
    console.log('🔄 Rolling back to v1...');
    await this.updateTrafficSplit(100, 0);
    console.log('✅ Rollback complete. All traffic to v1');
  }

  async waitAndMonitor(minutes) {
    const ms = minutes * 60 * 1000;
    await new Promise(resolve => setTimeout(resolve, ms));
  }

  async checkHealth() {
    // Query Prometheus for error rates
    try {
      // Encode the PromQL expression so special characters survive the query string
      const query =
        `sum(rate(istio_requests_total{destination_service="${this.serviceName}",` +
        `response_code=~"5.*",destination_version="v2"}[5m]))` +
        ` / sum(rate(istio_requests_total{destination_service="${this.serviceName}",` +
        `destination_version="v2"}[5m]))`;
      const response = await fetch(
        `http://prometheus:9090/api/v1/query?query=${encodeURIComponent(query)}`
      );
      
      const data = await response.json();
      const errorRate = parseFloat(data.data.result[0]?.value[1] || 0);
      
      console.log(`📈 v2 error rate: ${(errorRate * 100).toFixed(2)}%`);
      return errorRate < 0.01; // Less than 1% error rate
    } catch (error) {
      console.log('⚠️ Could not fetch metrics, continuing...');
      return true;
    }
  }
}

// Usage
const canary = new CanaryRollout('order-service');
canary.progressiveRollout([10, 25, 50, 75, 100], 5);

mTLS (Mutual TLS)

Automatic encryption between services:
# peer-authentication.yaml - Enable strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: default
spec:
  mtls:
    mode: STRICT  # PERMISSIVE allows plaintext, STRICT requires mTLS
---
# authorization-policy.yaml - Service-to-service authorization
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: order-service-policy
  namespace: default
spec:
  selector:
    matchLabels:
      app: order-service
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        - "cluster.local/ns/default/sa/api-gateway"
        - "cluster.local/ns/default/sa/payment-service"
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/v1/orders/*"]
  - from:
    - source:
        principals:
        - "cluster.local/ns/monitoring/sa/prometheus"
    to:
    - operation:
        methods: ["GET"]
        paths: ["/metrics"]

Circuit Breaking with Istio

# circuit-breaker.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: payment-service-circuit-breaker
spec:
  host: payment-service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100  # Max TCP connections
      http:
        http1MaxPendingRequests: 100  # Queue size
        http2MaxRequests: 1000  # Max concurrent requests
        maxRequestsPerConnection: 10  # Connection reuse
        maxRetries: 3  # Max retries
    outlierDetection:
      # Circuit breaker configuration
      consecutive5xxErrors: 5  # Eject after 5 consecutive 5xx
      consecutiveGatewayErrors: 5  # Eject after 5 gateway errors
      interval: 10s  # Detection interval
      baseEjectionTime: 30s  # Base ejection time
      maxEjectionPercent: 50  # Max % of hosts to eject
      minHealthPercent: 30  # Min healthy hosts before breaking
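
To watch the breaker trip, push concurrency well past the configured connection and pending limits and then read the sidecar's overflow counters. The sketch below assumes the fortio load-generator sample is deployed as deploy/fortio with a container named fortio; names and URL are illustrative:
# Drive ~200 concurrent requests against limits of 100 connections / 100 pending
kubectl exec deploy/fortio -c fortio -- \
  fortio load -c 200 -qps 0 -n 2000 http://payment-service/

# Requests short-circuited by the breaker show up as pending overflow stats
kubectl exec deploy/fortio -c istio-proxy -- \
  pilot-agent request GET stats | grep payment-service | grep pending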

Rate Limiting

# rate-limit.yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: api-rate-limit
  namespace: default
spec:
  workloadSelector:
    labels:
      app: api-gateway
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            token_bucket:
              max_tokens: 1000
              tokens_per_fill: 100
              fill_interval: 1s
            filter_enabled:
              runtime_key: local_rate_limit_enabled
              default_value:
                numerator: 100
                denominator: HUNDRED
            filter_enforced:
              runtime_key: local_rate_limit_enforced
              default_value:
                numerator: 100
                denominator: HUNDRED
            response_headers_to_add:
            - append: false
              header:
                key: x-rate-limit-remaining
                value: "%DYNAMIC_METADATA(envoy.filters.http.local_ratelimit:remaining_tokens)%"

Istio vs Linkerd Comparison

┌─────────────────────────────────────────────────────────────────────────────┐
│                    ISTIO vs LINKERD COMPARISON                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Feature              │ Istio                  │ Linkerd                    │
│  ─────────────────────┼────────────────────────┼────────────────────────────│
│  Proxy                │ Envoy (C++)            │ linkerd2-proxy (Rust)      │
│  Resource Usage       │ Higher                 │ Lower (~10x less)          │
│  Latency Overhead     │ ~2-3ms                 │ ~1ms                       │
│  Complexity           │ High                   │ Low                        │
│  Learning Curve       │ Steep                  │ Gentle                     │
│  Features             │ Very extensive         │ Focused, simpler           │
│  mTLS                 │ Yes                    │ Yes (default on)           │
│  Traffic Management   │ Advanced               │ Basic                      │
│  Multi-cluster        │ Yes                    │ Yes                        │
│  Web UI               │ Kiali (separate)       │ Built-in dashboard         │
│  Best For             │ Large enterprises      │ Kubernetes-native teams    │
│                                                                              │
│  Choose Istio if:                                                           │
│  • Need advanced traffic management                                         │
│  • Already using Envoy                                                      │
│  • Enterprise with complex requirements                                     │
│  • Need extensive customization                                             │
│                                                                              │
│  Choose Linkerd if:                                                         │
│  • Want simplicity and low overhead                                         │
│  • Kubernetes-only environment                                              │
│  • Need quick implementation                                                │
│  • Resource-constrained clusters                                            │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Linkerd Quick Start

# Install Linkerd CLI
curl -fsL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin

# Validate cluster
linkerd check --pre

# Install Linkerd (Linkerd 2.12+ requires installing the CRDs first)
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check

# Inject sidecar into deployment
kubectl get deploy order-service -o yaml | linkerd inject - | kubectl apply -f -

# Or enable auto-injection for namespace
kubectl annotate namespace default linkerd.io/inject=enabled

# Access dashboard
linkerd viz install | kubectl apply -f -
linkerd viz dashboard &

Observability with Service Mesh

Automatic Metrics

Istio automatically generates standard traffic metrics for every service in the mesh; useful PromQL queries include:
# Request rate by service
sum(rate(istio_requests_total{reporter="destination"}[5m])) by (destination_service_name)

# Error rate
sum(rate(istio_requests_total{reporter="destination",response_code=~"5.*"}[5m])) 
  by (destination_service_name)
/ 
sum(rate(istio_requests_total{reporter="destination"}[5m])) 
  by (destination_service_name)

# P99 latency
histogram_quantile(0.99, 
  sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m])) 
  by (destination_service_name, le)
)

# Traffic between services
sum(rate(istio_requests_total{reporter="source"}[5m])) 
  by (source_workload, destination_service_name)
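
These queries can be run in the bundled observability addons, which ship with the Istio release under samples/addons (installing them is assumed here; they are not part of the demo profile itself):
# From the Istio release directory
kubectl apply -f samples/addons/prometheus.yaml
kubectl apply -f samples/addons/grafana.yaml
kubectl apply -f samples/addons/kiali.yaml

istioctl dashboard prometheus   # run the PromQL queries above
istioctl dashboard grafana      # pre-built Istio service dashboards
istioctl dashboard kiali        # live service graph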

Distributed Tracing Integration

# Enable tracing with Jaeger
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-config
spec:
  meshConfig:
    enableTracing: true
    defaultConfig:
      tracing:
        sampling: 100.0  # 100% sampling for dev, reduce in prod
        zipkin:
          address: jaeger-collector.istio-system.svc:9411
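
With tracing enabled, the Jaeger addon from the same samples/addons directory collects the spans and exposes a UI:
kubectl apply -f samples/addons/jaeger.yaml
istioctl dashboard jaeger   # browse traces across services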

Service Mesh Interview Questions

Q: What is a service mesh, and when should you use one?
Answer: A service mesh is an infrastructure layer that handles service-to-service communication, providing:
  • Traffic management: Load balancing, routing, retries
  • Security: mTLS, authorization
  • Observability: Metrics, tracing, logging
Use when:
  • 10+ microservices
  • Need consistent security policies
  • Multiple languages/frameworks
  • Complex traffic patterns
Avoid when:
  • Small number of services
  • Simple architecture
  • Resource constraints
  • Team unfamiliar with Kubernetes

Q: Explain the sidecar pattern. What are its benefits and drawbacks?
Answer: The sidecar pattern deploys a helper container alongside the main application container in the same pod:
┌─────────── Pod ───────────┐
│  ┌────────┐  ┌─────────┐  │
│  │  App   │◀▶│ Sidecar │  │
│  │        │  │ (Envoy) │  │
│  └────────┘  └─────────┘  │
└───────────────────────────┘
Benefits:
  • App doesn’t need networking code
  • Language agnostic
  • Updated independently
  • Consistent behavior
Drawbacks:
  • Latency overhead (1-3ms)
  • Resource consumption
  • Complexity

Q: How does mTLS work in a service mesh?
Answer:
  1. Certificate Authority (CA) generates root certificate
  2. Each service gets unique certificate (SPIFFE identity)
  3. Sidecars automatically handle TLS handshake
  4. Both client and server verify each other’s certificates
Service A ──────[mTLS]────── Service B
    │                           │
    ├─ Client Certificate       ├─ Server Certificate
    └─ Verify Server Cert       └─ Verify Client Cert
Benefits:
  • Zero code changes
  • Automatic rotation
  • Service identity verification
  • Encrypted in transit

Q: How would you run a canary deployment with a service mesh?
Answer:
  1. Deploy new version alongside old
  2. Create VirtualService with weight-based routing
  3. Gradually shift traffic (10% → 25% → 50% → 100%)
  4. Monitor error rates at each step
  5. Rollback if issues detected
Example VirtualService weight configuration:
spec:
  http:
  - route:
    - destination:
        host: my-service
        subset: v1
      weight: 90
    - destination:
        host: my-service
        subset: v2
      weight: 10
Key metrics to watch:
  • Error rates (5xx responses)
  • Latency percentiles
  • Business metrics

Q: How do Istio and Linkerd compare, and when would you choose each?
Answer:
  Aspect          │ Istio        │ Linkerd
  Complexity      │ High         │ Low
  Resource usage  │ Higher       │ Lower
  Latency         │ ~2-3ms       │ ~1ms
  Features        │ Extensive    │ Focused
  Proxy           │ Envoy (C++)  │ Rust-based (linkerd2-proxy)
Choose Istio: complex requirements, multi-cluster, extensive customization.
Choose Linkerd: simplicity, Kubernetes-only environments, resource constraints.

Best Practices

Start Simple

Begin with basic features (mTLS, observability), add complexity gradually

Test Thoroughly

Service mesh adds latency - load test before production

Monitor Resources

Sidecar proxies consume CPU/memory - size appropriately
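
With Istio, the sidecar can be sized per workload through pod annotations; a minimal sketch (values are examples, tune them to observed traffic):
kubectl patch deployment order-service --type merge -p '{
  "spec": {"template": {"metadata": {"annotations": {
    "sidecar.istio.io/proxyCPU": "100m",
    "sidecar.istio.io/proxyMemory": "128Mi",
    "sidecar.istio.io/proxyCPULimit": "500m",
    "sidecar.istio.io/proxyMemoryLimit": "256Mi"}}}}}'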

Plan for Failures

Service mesh itself can fail - have runbooks ready

Chapter Summary

Key Takeaways:
  • Service mesh moves networking from app code to infrastructure
  • Istio provides advanced traffic management and security
  • mTLS ensures encrypted, authenticated service communication
  • Canary deployments enable safe progressive rollouts
  • Choose between Istio and Linkerd based on complexity needs
Next Chapter: Configuration Management - Centralized configuration for microservices.