Service Discovery

In a microservices architecture, services need to find and communicate with each other dynamically. Service discovery solves the problem of locating services in a constantly changing environment.
Learning Objectives:
  • Understand service discovery patterns
  • Implement Consul-based discovery
  • Use DNS-based service discovery
  • Configure Kubernetes native discovery
  • Build health-aware load balancing

Why Service Discovery?

In a monolith, components talk to each other via in-process function calls. In a microservices system, components are separate processes, often on separate machines, sometimes in separate data centers. Traditionally you would hardcode IP addresses or hostnames into configuration files, but this falls apart the moment your infrastructure becomes dynamic. Containers are ephemeral. Auto-scalers spin instances up and down. Cloud providers reschedule workloads onto new VMs without warning. A service that was at 192.168.1.10 this morning might be at 10.0.4.77 this afternoon, and the IP you cached an hour ago now points to nothing. Service discovery is the mechanism that lets a caller ask “where is the payment service right now?” and get back a fresh, healthy answer. It decouples logical service identity (the name “payment-service”) from physical network location (IP and port). Without it, every deployment, scaling event, or node failure becomes a config-change emergency.
┌─────────────────────────────────────────────────────────────────────────────┐
│                    THE PROBLEM: DYNAMIC INFRASTRUCTURE                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STATIC CONFIGURATION (doesn't work):                                        │
│  ────────────────────────────────────                                       │
│                                                                              │
│  Order Service config:                                                       │
│    payment_service: http://192.168.1.10:3000  ◀── What if IP changes?       │
│    inventory_service: http://192.168.1.11:3001 ◀── What if it scales?       │
│                                                                              │
│                                                                              │
│  DYNAMIC CHALLENGES:                                                         │
│  ───────────────────                                                         │
│                                                                              │
│  ┌─────────────┐       ┌─────────────────────────────────────────┐          │
│  │ Payment     │ ───▶  │ Multiple instances, any can handle     │          │
│  │ Service     │       │ Instance 1: 10.0.0.5:3000              │          │
│  │(3 instances)│       │ Instance 2: 10.0.0.6:3000 (new)        │          │
│  └─────────────┘       │ Instance 3: 10.0.0.7:3000 (new)        │          │
│                        │ Instance 4: 10.0.0.4:3000 (terminated) │          │
│                        └─────────────────────────────────────────┘          │
│                                                                              │
│  • Instances come and go                                                     │
│  • Auto-scaling adds/removes instances                                      │
│  • Containers get new IPs on restart                                        │
│  • Deployments create new instances                                         │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
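
To make the decoupling of name and location concrete, here is a minimal in-memory sketch of what a registry does. The `ServiceRegistry` class and its method names are hypothetical, chosen only for illustration; a real registry such as Consul adds health checking, replication, and a network API on top of this idea.

```javascript
// Hypothetical in-memory registry: maps a logical service name to its live instances.
class ServiceRegistry {
  constructor() {
    this.services = new Map(); // name -> Map(instanceId -> { address, port })
  }

  register(name, id, address, port) {
    if (!this.services.has(name)) {
      this.services.set(name, new Map());
    }
    this.services.get(name).set(id, { address, port });
  }

  deregister(name, id) {
    const instances = this.services.get(name);
    if (instances) {
      instances.delete(id);
    }
  }

  // Callers ask by name and get whatever locations are current right now.
  lookup(name) {
    const instances = this.services.get(name);
    return instances ? [...instances.values()] : [];
  }
}

const registry = new ServiceRegistry();
registry.register('payment-service', 'payment-1', '192.168.1.10', 3000);

// The instance is rescheduled onto a new IP; callers keep using the same name.
registry.deregister('payment-service', 'payment-1');
registry.register('payment-service', 'payment-2', '10.0.4.77', 3000);

console.log(registry.lookup('payment-service'));
```

The caller's code never changes when the address does; only the registry contents move.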

Service Discovery Patterns

There are two fundamental architectures for how a caller locates a callee: client-side discovery and server-side discovery. They represent a classic tradeoff between client complexity and infrastructure complexity.

In client-side discovery, each service embeds a discovery library that queries a registry, picks an instance, and makes the call directly. The client gets total control (custom load-balancing, latency-aware routing, circuit breaking per instance) but every language you use needs a compatible client library, and the logic lives in every service you ship.

In server-side discovery, clients call a single well-known address (usually a load balancer or ingress), and that intermediary is responsible for looking up healthy instances and forwarding the request. Clients become dumb; they do not even need to know the registry exists. The tradeoff is an extra network hop and a potential bottleneck, but you gain uniform cross-language behavior and simpler app code. Kubernetes Services, AWS ALBs, and Nginx Plus all implement server-side discovery. Most modern teams on Kubernetes start with server-side and only reach for client-side when they have a specific need (gRPC load balancing, smart routing, cross-cluster traffic).
┌─────────────────────────────────────────────────────────────────────────────┐
│                    SERVICE DISCOVERY PATTERNS                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  1. CLIENT-SIDE DISCOVERY                                                    │
│  ─────────────────────────                                                  │
│                                                                              │
│  ┌────────┐    ┌──────────────┐    ┌─────────────────────┐                 │
│  │ Order  │───▶│   Service    │───▶│ Payment instances   │                 │
│  │Service │    │   Registry   │    │ ├─ 10.0.0.5:3000   │                 │
│  └────────┘    └──────────────┘    │ ├─ 10.0.0.6:3000   │                 │
│       │                             │ └─ 10.0.0.7:3000   │                 │
│       │                             └─────────────────────┘                 │
│       │            ┌─────────────────────┐                                  │
│       └───────────▶│ Selected instance   │                                  │
│                    │ (client picks one)  │                                  │
│                    └─────────────────────┘                                  │
│                                                                              │
│  Pros: Client has full control, can implement smart routing                 │
│  Cons: Client complexity, language-specific                                 │
│                                                                              │
│                                                                              │
│  2. SERVER-SIDE DISCOVERY                                                    │
│  ─────────────────────────                                                  │
│                                                                              │
│  ┌────────┐    ┌───────────────┐    ┌─────────────────────┐                │
│  │ Order  │───▶│  Load Balancer│───▶│ Payment instances   │                │
│  │Service │    │  / Router     │    │ ├─ 10.0.0.5:3000   │                │
│  └────────┘    └───────────────┘    │ ├─ 10.0.0.6:3000   │                │
│                       │              │ └─ 10.0.0.7:3000   │                │
│                       ▼              └─────────────────────┘                │
│                ┌──────────────┐                                             │
│                │   Service    │                                             │
│                │   Registry   │                                             │
│                └──────────────┘                                             │
│                                                                              │
│  Pros: Client simplicity, centralized control                               │
│  Cons: Additional hop, potential bottleneck                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Consul-Based Discovery

Consul is a full-featured service discovery solution with health checking.

Setting Up Consul

Consul is HashiCorp’s distributed service registry. It solves two problems at once: it stores a live catalog of every service instance (address, port, tags, health status) and it actively probes those instances with health checks so unhealthy ones are automatically hidden from lookups. You run a small Consul cluster (three or five servers for production) and each application host or container runs a lightweight Consul agent. This is the classic third-party registry pattern, and it works equally well outside Kubernetes (VMs, bare-metal, mixed environments) where Kubernetes-native discovery is not an option.
# docker-compose.yml
version: '3.8'

services:
  consul:
    image: hashicorp/consul:latest  # the older Docker Hub "consul" image is deprecated
    container_name: consul
    ports:
      - "8500:8500"   # HTTP API/UI
      - "8600:8600/udp" # DNS
    command: agent -server -bootstrap-expect=1 -ui -client=0.0.0.0
    volumes:
      - consul-data:/consul/data

  order-service:
    build: ./order-service
    environment:
      - CONSUL_HOST=consul
      - SERVICE_NAME=order-service
      - SERVICE_PORT=3000
    depends_on:
      - consul

  payment-service:
    build: ./payment-service
    environment:
      - CONSUL_HOST=consul
      - SERVICE_NAME=payment-service
      - SERVICE_PORT=3000
    depends_on:
      - consul
    deploy:
      replicas: 3

volumes:
  consul-data:

Service Registration

Registration is how an instance announces itself to the world. At startup, the service calls Consul’s agent API with its name, address, port, and a description of how Consul should check its health. Consul immediately begins polling that health endpoint (usually every 10 seconds) and marks the instance as passing, warning, or critical accordingly. Critical instances are hidden from DNS lookups by default, and HTTP health queries can be filtered to passing instances only.

The critical detail most teams miss is deregistration on shutdown. If your pod receives SIGTERM and exits without telling Consul, your instance stays in the registry as “healthy” until the next health check fails, typically 10-30 seconds later. During that window, callers send real requests to a dead instance and get connection refused errors. The fix is a graceful shutdown hook that explicitly deregisters before the process dies. Here is the pattern in Node.js.
// discovery/ConsulRegistry.js
const Consul = require('consul');

class ConsulRegistry {
  constructor(options = {}) {
    this.consul = new Consul({
      host: options.host || process.env.CONSUL_HOST || 'localhost',
      port: options.port || 8500
    });

    this.serviceName = options.serviceName;
    this.serviceId = `${options.serviceName}-${process.env.HOSTNAME || require('os').hostname()}`;
    this.servicePort = options.servicePort;
    this.checkInterval = options.checkInterval || '10s';
    this.deregisterAfter = options.deregisterAfter || '1m';
  }

  async register() {
    const registration = {
      id: this.serviceId,
      name: this.serviceName,
      address: this.getServiceAddress(),
      port: this.servicePort,
      tags: ['node', 'api'],
      check: {
        http: `http://${this.getServiceAddress()}:${this.servicePort}/health`,
        interval: this.checkInterval,
        deregisterCriticalServiceAfter: this.deregisterAfter
      }
    };

    try {
      await this.consul.agent.service.register(registration);
      console.log(`Service registered: ${this.serviceId}`);

      // Handle graceful shutdown
      this.setupGracefulShutdown();
    } catch (error) {
      console.error('Failed to register service:', error);
      throw error;
    }
  }

  async deregister() {
    try {
      await this.consul.agent.service.deregister(this.serviceId);
      console.log(`Service deregistered: ${this.serviceId}`);
    } catch (error) {
      console.error('Failed to deregister service:', error);
    }
  }

  getServiceAddress() {
    // In Docker, use container hostname
    if (process.env.DOCKER) {
      return process.env.HOSTNAME;
    }

    // Get local IP
    const interfaces = require('os').networkInterfaces();
    for (const name of Object.keys(interfaces)) {
      for (const iface of interfaces[name]) {
        if (iface.family === 'IPv4' && !iface.internal) {
          return iface.address;
        }
      }
    }
    return 'localhost';
  }

  setupGracefulShutdown() {
    const signals = ['SIGINT', 'SIGTERM', 'SIGQUIT'];

    signals.forEach(signal => {
      process.on(signal, async () => {
        console.log(`Received ${signal}, deregistering service...`);
        await this.deregister();
        process.exit(0);
      });
    });
  }
}

// Usage
const registry = new ConsulRegistry({
  serviceName: 'order-service',
  servicePort: 3000
});

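// CommonJS modules have no top-level await, so handle the promise explicitly
registry.register().catch((error) => {
  console.error('Startup registration failed:', error);
  process.exit(1);
});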
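
The shutdown handler in the class above deregisters and exits immediately. A fuller sequence would deregister first, then let in-flight requests drain before exiting. The following is a minimal sketch of that ordering; `gracefulShutdown`, its parameters, and the 10-second default are illustrative assumptions, and the dependencies are injected so the order can be checked without a real Consul agent.

```javascript
// Sketch of an ordered shutdown: deregister first, then drain, then return.
// `deregister` and `server` are injected; in a real service they would be
// something like registry.deregister and an http.Server instance.
async function gracefulShutdown({ deregister, server, drainTimeoutMs = 10000 }) {
  // 1. Leave the registry so no new callers discover this instance.
  await deregister();

  // 2. Stop accepting connections and wait for in-flight requests to finish,
  //    but never longer than drainTimeoutMs.
  let timer;
  const drained = new Promise((resolve) => {
    server.close(() => resolve('drained'));
  });
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve('timeout'), drainTimeoutMs);
  });

  const outcome = await Promise.race([drained, timedOut]);
  clearTimeout(timer);
  return outcome; // 'drained' or 'timeout'
}
```

A SIGTERM handler would call this and then `process.exit(0)`. The timeout matters because keep-alive connections can otherwise hold `server.close` open indefinitely.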

Service Discovery Client

Registration is only half the story. The other half is discovery: callers need a way to ask “give me a list of healthy payment-service instances” and get a fresh answer. A naive implementation would hit the Consul API on every outbound request, but that turns Consul into a per-request bottleneck and adds tens of milliseconds of latency. The standard fix is a local cache with a short TTL (5-10 seconds) combined with a background watch that streams changes from Consul in real time. When an instance goes unhealthy or a new one registers, Consul pushes the update and the client cache refreshes within a second. This gives you near-real-time visibility without hammering the registry. A well-designed discovery client also degrades gracefully. If Consul itself is unreachable, return the last-known-good cached list rather than failing. It is almost always better to serve a stale instance list than to take the entire caller offline because the registry had a hiccup.
// discovery/ConsulDiscovery.js
class ConsulDiscovery {
  constructor(consul) {
    this.consul = consul;
    this.cache = new Map();
    this.watchers = new Map();
  }

  async discover(serviceName, options = {}) {
    const { healthy = true, cached = true } = options;

    // Check cache first
    if (cached && this.cache.has(serviceName)) {
      const cacheEntry = this.cache.get(serviceName);
      if (Date.now() - cacheEntry.timestamp < 5000) {
        return cacheEntry.instances;
      }
    }

    try {
      const services = await this.consul.health.service({
        service: serviceName,
        passing: healthy
      });

      const instances = services.map(entry => ({
        id: entry.Service.ID,
        address: entry.Service.Address,
        port: entry.Service.Port,
        tags: entry.Service.Tags,
        status: entry.Checks.every(c => c.Status === 'passing') ? 'healthy' : 'unhealthy'
      }));

      // Update cache
      this.cache.set(serviceName, {
        instances,
        timestamp: Date.now()
      });

      return instances;
    } catch (error) {
      console.error(`Failed to discover ${serviceName}:`, error);

      // Return cached if available
      if (this.cache.has(serviceName)) {
        return this.cache.get(serviceName).instances;
      }

      return [];
    }
  }

  async watch(serviceName, callback) {
    if (this.watchers.has(serviceName)) {
      return;
    }

    const watch = this.consul.watch({
      method: this.consul.health.service,
      options: {
        service: serviceName,
        passing: true
      }
    });

    watch.on('change', (data) => {
      const instances = data.map(entry => ({
        id: entry.Service.ID,
        address: entry.Service.Address,
        port: entry.Service.Port
      }));

      this.cache.set(serviceName, {
        instances,
        timestamp: Date.now()
      });

      callback(instances);
    });

    watch.on('error', (error) => {
      console.error(`Watch error for ${serviceName}:`, error);
    });

    this.watchers.set(serviceName, watch);
  }

  stopWatch(serviceName) {
    const watcher = this.watchers.get(serviceName);
    if (watcher) {
      watcher.end();
      this.watchers.delete(serviceName);
    }
  }
}

Load Balancer with Discovery

Once you have a live list of healthy instances, you still need to decide which one to call. This is client-side load balancing. The simplest strategy is round-robin (cycle through the list), which works fine for stateless requests of similar cost. Random selection is similar but avoids the “thundering herd” problem where all clients sync up on the same rotation. Least-connections is better when request costs vary wildly; weighted strategies help with heterogeneous instances (some pods with more CPU than others). A production-grade service client wraps the discovery cache with a circuit breaker per instance. If a specific instance fails three requests in a row, mark it as “open” and skip it for the next 30 seconds, even if Consul still reports it as healthy. This handles the case where an instance is technically responding to health checks but failing real business requests (a slow dependency, a deadlock, a full disk). The discovery layer gives you the possible instances; the circuit breaker layer decides which to actually trust.
// discovery/ServiceClient.js
class ServiceClient {
  constructor(discovery, serviceName, options = {}) {
    this.discovery = discovery;
    this.serviceName = serviceName;
    this.strategy = options.strategy || 'round-robin';
    this.instances = [];
    this.currentIndex = 0;
    this.circuitBreakers = new Map();

    // Start watching for changes
    this.discovery.watch(serviceName, (instances) => {
      console.log(`Service ${serviceName} updated:`, instances.length, 'instances');
      this.instances = instances;
    });

    // Initial load
    this.refresh();
  }

  async refresh() {
    this.instances = await this.discovery.discover(this.serviceName);
  }

  selectInstance() {
    const healthyInstances = this.instances.filter(i =>
      !this.isCircuitOpen(i.id)
    );

    if (healthyInstances.length === 0) {
      throw new Error(`No healthy instances of ${this.serviceName}`);
    }

    switch (this.strategy) {
      case 'round-robin':
        return this.roundRobin(healthyInstances);
      case 'random':
        return this.random(healthyInstances);
      case 'least-connections':
        return this.leastConnections(healthyInstances);
      default:
        return healthyInstances[0];
    }
  }

  roundRobin(instances) {
    const instance = instances[this.currentIndex % instances.length];
    this.currentIndex++;
    return instance;
  }

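  random(instances) {
    const index = Math.floor(Math.random() * instances.length);
    return instances[index];
  }

  // Pick the instance with the fewest requests in flight from this client
  leastConnections(instances) {
    const counts = this.inFlight || new Map();
    return instances.reduce((best, i) =>
      (counts.get(i.id) || 0) < (counts.get(best.id) || 0) ? i : best
    );
  }

  async request(path, options = {}) {
    const instance = this.selectInstance();
    const url = `http://${instance.address}:${instance.port}${path}`;

    // Track in-flight requests per instance for the least-connections strategy
    this.inFlight = this.inFlight || new Map();
    this.inFlight.set(instance.id, (this.inFlight.get(instance.id) || 0) + 1);

    try {
      const response = await fetch(url, {
        ...options,
        // The standard fetch API has no `timeout` option; use an abort signal (Node 17.3+)
        signal: AbortSignal.timeout(5000)
      });

      // fetch resolves on HTTP errors, so treat 5xx as a failure as well
      if (response.status >= 500) this.recordFailure(instance.id);
      else this.recordSuccess(instance.id);
      return response;
    } catch (error) {
      this.recordFailure(instance.id);
      throw error;
    } finally {
      this.inFlight.set(instance.id, this.inFlight.get(instance.id) - 1);
    }
  }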

  // Circuit breaker integration
  isCircuitOpen(instanceId) {
    const cb = this.circuitBreakers.get(instanceId);
    return cb && cb.state === 'OPEN';
  }

  recordSuccess(instanceId) {
    const cb = this.getOrCreateCircuitBreaker(instanceId);
    cb.failures = 0;
    cb.state = 'CLOSED'; // a success while HALF_OPEN closes the circuit again
  }

  recordFailure(instanceId) {
    const cb = this.getOrCreateCircuitBreaker(instanceId);
    cb.failures++;

    if (cb.failures >= 3) {
      cb.state = 'OPEN';
      cb.openTime = Date.now();

      // Auto-reset after 30 seconds
      setTimeout(() => {
        cb.state = 'HALF_OPEN';
      }, 30000);
    }
  }

  getOrCreateCircuitBreaker(instanceId) {
    if (!this.circuitBreakers.has(instanceId)) {
      this.circuitBreakers.set(instanceId, {
        state: 'CLOSED',
        failures: 0
      });
    }
    return this.circuitBreakers.get(instanceId);
  }
}

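// Usage (run inside an async function; `consul` is a `new Consul(...)` client instance)
const discovery = new ConsulDiscovery(consul);
const paymentClient = new ServiceClient(discovery, 'payment-service', {
  strategy: 'round-robin'
});

// The constructor's initial load is asynchronous; wait for it before sending traffic
await paymentClient.refresh();

const response = await paymentClient.request('/api/payments', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ amount: 100 })
});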
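
The weighted strategy mentioned above is not implemented in `ServiceClient`. One way it could look is sketched below. The `weight` field is a hypothetical per-instance property (it might come from a Consul tag or from the pod's CPU allocation); nothing in the classes above populates it, and instances without it are treated as weight 1.

```javascript
// Weighted random selection: an instance with weight 3 is picked roughly
// three times as often as one with weight 1.
// `random` is injectable so the selection can be tested deterministically.
function pickWeighted(instances, random = Math.random) {
  if (instances.length === 0) {
    throw new Error('No instances to choose from');
  }
  const total = instances.reduce((sum, i) => sum + (i.weight || 1), 0);
  let threshold = random() * total;

  for (const instance of instances) {
    threshold -= instance.weight || 1;
    if (threshold < 0) {
      return instance;
    }
  }
  // Guards against floating-point edge cases at the top of the range.
  return instances[instances.length - 1];
}
```

It could be wired in as another `case 'weighted'` branch in `selectInstance`, operating on the same filtered list the other strategies receive.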

Consul-Based Discovery Caveats and Interview Deep-Dive

Consul-based discovery traps practitioners fall into:
  1. Instances stuck as “healthy” after a crash. A pod dies without deregistering (SIGKILL, OOM, node failure). Consul’s health check polls every 10 seconds; the instance remains in the registry for up to 30 seconds. Callers send real requests to a dead instance and get connection refused.
  2. Health check that only checks “is HTTP server up”. The /health endpoint returns 200 the moment the HTTP listener starts, before the database connection pool is initialized or the cache is warm. Consul marks the instance healthy; traffic arrives; every request fails because the service is not actually ready.
  3. Discovery client without graceful degradation. Consul itself has a brief blip. The discovery client throws an exception. Every caller depending on it returns 503. A registry hiccup becomes a fleet-wide outage.
  4. Aggressive per-request registry queries. Naive client hits Consul’s HTTP API on every outbound request. Adds 10-50 ms of latency per call. Consul cluster gets hammered; becomes its own bottleneck.
Solutions and patterns:
  • Graceful shutdown with explicit deregistration. SIGTERM handler calls consul.agent.service.deregister(id) before stopping the HTTP server, then waits 5-10 seconds for in-flight requests to drain. Without this, every rolling restart produces 10-30 seconds of stale-endpoint errors.
  • Deep health checks, not shallow ones. /health verifies database reachability, critical cache, and downstream connectivity. Return 503 when any critical dependency is degraded. Use deregisterCriticalServiceAfter so Consul removes instances that fail checks for a configured duration.
  • Local cache with long-polling watch. Discovery client caches results with a short TTL (5-10 seconds) and also subscribes to a Consul watch (long-poll blocking query). When topology changes, cache refreshes within a second. Normal reads hit the cache; registry is only queried on change.
  • Degrade to cached data on Consul failure. If Consul is unreachable, return the last-known-good instance list rather than failing. Stale data beats no data. Log the event so ops knows the registry had a blip.
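
The deep health check described above can be sketched as a function that runs a set of dependency probes and maps the result to an HTTP status. The probe names and the 500 ms timeout are assumptions for illustration; each probe would be supplied by the service (a database ping, a cache ping, and so on).

```javascript
// Sketch of a deep health check. Each entry in `checks` is a hypothetical
// async probe supplied by the service.
async function runHealthChecks(checks, timeoutMs = 500) {
  const results = {};
  let healthy = true;

  await Promise.all(Object.entries(checks).map(async ([name, probe]) => {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error('timed out')), timeoutMs);
    });
    try {
      // A probe that hangs counts as a failure once the timeout fires.
      await Promise.race([probe(), timeout]);
      results[name] = 'ok';
    } catch (error) {
      results[name] = `failed: ${error.message}`;
      healthy = false;
    } finally {
      clearTimeout(timer);
    }
  }));

  // A non-2xx status makes a Consul HTTP check fail; 503 signals "not ready".
  return { statusCode: healthy ? 200 : 503, body: { healthy, checks: results } };
}
```

The `/health` route handler would await this function, write `statusCode`, and send `body` as JSON, so an operator can see which dependency failed.
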
Strong Answer Framework (when health checks pass but real requests fail):
  1. Distinguish between “is registered” and “is serving correctly”. Registration means Consul’s health check passes. That check is usually shallow — HTTP returns 200. Serving correctly means the service can actually handle real business requests. These are not the same.
  2. Improve the health check. Replace shallow /health (HTTP up) with deep /health that verifies every critical dependency: database connection pool has a healthy connection; cache is reachable; authentication provider is reachable. Make the check honest.
  3. Use separate liveness and readiness signals (if on k8s). Liveness: “is this process alive?” (restart on fail). Readiness: “is this pod serving correctly?” (remove from load balancer on fail but do not restart). Consul’s equivalent: passing vs warning state with your load balancer treating warning as “do not route.”
  4. Instrument the discovery layer. Every failed request logs the instance ID that handled it. After the incident, correlate failures by instance — is one instance much worse than the others? If yes, Consul’s health check did not catch it. Add a business-level metric (success rate) to the health check.
  5. Add client-side per-instance circuit breakers. Even with perfect Consul health checks, an instance can degrade between check cycles. The discovery client tracks failures per instance and skips instances that have recently failed, even if Consul still reports them healthy.
  6. Consider ejection via outlier detection. Istio / Envoy outlier detection automatically removes instances that return high error rates from the load-balancing pool. This is discovery that reacts to actual request outcomes, not just health check outcomes.
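
Steps 4 to 6 share one idea: judge instances by real request outcomes. A minimal client-side sketch of that idea follows. The class name, window size, and thresholds are illustrative assumptions; this is not Envoy's actual algorithm or configuration.

```javascript
// Sketch of client-side outlier detection: eject an instance when its recent
// error rate crosses a threshold.
class OutlierDetector {
  constructor({ windowSize = 20, maxErrorRate = 0.5, minRequests = 5 } = {}) {
    this.windowSize = windowSize;
    this.maxErrorRate = maxErrorRate;
    this.minRequests = minRequests;
    this.outcomes = new Map(); // instanceId -> array of booleans (true = success)
  }

  record(instanceId, success) {
    const window = this.outcomes.get(instanceId) || [];
    window.push(success);
    if (window.length > this.windowSize) {
      window.shift(); // keep only the most recent outcomes
    }
    this.outcomes.set(instanceId, window);
  }

  errorRate(instanceId) {
    const window = this.outcomes.get(instanceId) || [];
    if (window.length === 0) return 0;
    return window.filter((ok) => !ok).length / window.length;
  }

  isEjected(instanceId) {
    const window = this.outcomes.get(instanceId) || [];
    // Require a minimum sample so one early failure does not eject an instance.
    return window.length >= this.minRequests &&
      this.errorRate(instanceId) > this.maxErrorRate;
  }

  // Filter a discovered instance list down to the ones still trusted.
  filter(instances) {
    return instances.filter((i) => !this.isEjected(i.id));
  }
}
```

Because the window slides, an ejected instance is only readmitted once new successful outcomes are recorded for it, so a real implementation would also need a way to send it occasional probe traffic.
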
Real-World Example: In 2017, a well-known travel booking company had exactly this failure. Their /health endpoint returned 200 as long as the Spring Boot actuator was alive. The Postgres connection pool would silently exhaust during traffic spikes; health checks kept passing, but 30 percent of real requests hit database timeouts. Fix: a new /health that ran SELECT 1 against the primary database and failed the check if it took over 500 ms. It was deployed on a Friday; that Saturday morning’s traffic spike showed the new health check correctly removing degraded pods within 20 seconds.
Senior Follow-up Questions:
  1. “Why not just check the downstream dependencies directly from the load balancer?” You can, but it creates cascading failures. If the dependency is slow for everyone, the load balancer thinks every instance is unhealthy and removes all of them. Now you have zero capacity. Health checks must be local (does this instance work?) not global (is the downstream up?).
  2. “How do you handle a partial outage where instance A can reach dependency X but instance B cannot?” Per-instance deep health checks catch this. Instance B’s health check fails because B cannot reach X; Consul marks B unhealthy; discovery routes only to A. Instance A keeps serving. Without per-instance checks, you would route uniformly and 50 percent of requests fail.
  3. “What’s the trade-off of making health checks expensive (deep checks)?” Deep checks query the database, cache, and dependencies, which adds load. Consul polls every 10 seconds per instance; with 100 instances, that is 10 DB queries per second just for health. Mitigate by (a) caching health state for a few seconds within the service, and (b) using a dedicated lightweight query that does not impact production load.
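
Mitigation (a) from the last answer, caching health state inside the service, might look like the sketch below. The function name and the 3-second TTL are assumptions. The cache pays off mainly when several pollers (Consul, a load balancer, Kubernetes probes) hit the same instance, since each would otherwise trigger its own dependency queries.

```javascript
// Memoize an expensive health probe for a short TTL so that frequent polls
// share one result. `probe` and `now` are injected; both names are illustrative.
function cachedHealth(probe, ttlMs = 3000, now = Date.now) {
  let cached = null; // { promise, expiresAt }

  return function check() {
    if (!cached || now() >= cached.expiresAt) {
      // Cache the promise itself so concurrent callers share a single probe.
      cached = { promise: probe(), expiresAt: now() + ttlMs };
    }
    return cached.promise;
  };
}
```

Note that a failed probe is cached for the same TTL as a successful one, so recovery is reported up to `ttlMs` late; that delay is the price of the reduced load.
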
Common Wrong Answers:
  1. “Add more replicas to compensate for the 30 percent failure rate.” Ignores the root cause. If the health check lies, adding more instances means more lying instances, same failure rate. Throwing capacity at a symptom does not fix the diagnosis problem.
  2. “Lower the Consul check interval so we detect failures faster.” Helps marginally but the shallow-check problem remains. A faster shallow check still does not catch database degradation. The check needs to be deeper, not just more frequent.
Further Reading:
  • HashiCorp Consul documentation, “Service Health Checks.”
  • Kubernetes documentation, “Configure Liveness, Readiness and Startup Probes.”
  • Envoy documentation, “Outlier detection” for per-host ejection.
Strong Answer Framework (when Consul itself is unavailable):
  1. Decide what “unavailable” means. Total outage (no response), slow response, or stale data? Each has a different fix.
  2. Cache aggressively. Every discovery client maintains a local cache with TTL. When Consul is reachable, cache is refreshed via watch. When Consul is unreachable, serve from cache — stale data is better than no data.
  3. Last-known-good fallback. Cache entries do not simply expire; they are marked stale after TTL but kept as fallback. If a fresh lookup fails, return the stale cache. This means instances that went away during the Consul blip may still be in the routing list — handle that at the call layer with circuit breakers.
  4. Failover to secondary registry or DNS. Some teams run a Consul datacenter as primary and a DNS-based backup as fallback. When Consul is unreachable, discovery falls back to DNS. The topology changes are less fresh but service continues.
  5. Do not abort existing connections. Existing TCP connections to already-discovered instances should continue to work. Consul outage affects new connections, not open ones. As long as you have connection reuse / HTTP keep-alive, the immediate blast radius is small.
  6. Alert on Consul staleness. A metric “seconds since last successful Consul watch” crossing a threshold (say, 30 seconds) pages on-call. Silent staleness is the failure mode to avoid.
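
Steps 2, 3, and 6 can be combined in one small cache, sketched below. `fetchInstances` stands in for a registry lookup (for example a wrapper around `consul.health.service`); the class, its fields, and the 5-second TTL are hypothetical.

```javascript
// Sketch of a discovery cache whose entries go stale but are never dropped.
class LastKnownGoodCache {
  constructor(fetchInstances, { ttlMs = 5000, now = Date.now } = {}) {
    this.fetchInstances = fetchInstances;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // serviceName -> { instances, fetchedAt }
  }

  async get(serviceName) {
    const entry = this.entries.get(serviceName);
    if (entry && this.now() - entry.fetchedAt < this.ttlMs) {
      return { instances: entry.instances, stale: false };
    }
    try {
      const instances = await this.fetchInstances(serviceName);
      this.entries.set(serviceName, { instances, fetchedAt: this.now() });
      return { instances, stale: false };
    } catch (error) {
      if (entry) {
        // Registry unreachable: serve the stale list rather than failing.
        return { instances: entry.instances, stale: true };
      }
      throw error; // nothing cached yet, so there is no fallback
    }
  }

  // Feeds a "seconds since last successful lookup" alert.
  secondsSinceRefresh(serviceName) {
    const entry = this.entries.get(serviceName);
    return entry ? (this.now() - entry.fetchedAt) / 1000 : Infinity;
  }
}
```

Callers can log or count responses where `stale` is true, and the call layer's circuit breakers handle any instances in the stale list that have since gone away.
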
Real-World Example: Airbnb’s SmartStack (predecessor to many service-mesh patterns) used an architecture where every host ran a local Synapse process that watched ZooKeeper and wrote to a local HAProxy config. Applications always called localhost:<port>. ZooKeeper outages meant configs stopped updating, but HAProxy kept routing to the last-known-good set. Applications never experienced the outage.
Senior Follow-up Questions:
  1. “What if Consul is up but is returning wrong data (corrupted state)?” Rarer but possible. Mitigation: client-side sanity checks. If a discovery response contains zero instances of a service that was at 50 instances moments ago, treat it as suspicious and keep the previous list. Emit a metric.
  2. “How do you bootstrap if Consul is down when your pod starts?” First choice: retry Consul for up to N seconds during startup. Fail readiness until you have an initial roster. Second choice: ship a default endpoint list with the container (environment variable) as a last-resort bootstrap. Accept that fresh deployments during a Consul outage will fail to start, and consider this cost when designing.
  3. “Does a service mesh solve this?” Partially. Istio and Linkerd each run their own control plane (istiod, which absorbed Pilot, in Istio; the destination service in Linkerd). If the control plane is down, sidecars continue routing with the last-known config. The data plane continues independently of the control plane, by design. This is a similar resilience pattern to the cached local proxy.
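
The client-side sanity check from the first follow-up question could be as small as the function below. The name and both thresholds are illustrative assumptions; the right values depend on how quickly a fleet legitimately scales down.

```javascript
// Sketch of a roster sanity check: refuse an update in which most of a
// sizeable fleet vanishes at once, and keep the previous list instead.
function acceptRosterUpdate(previous, next, { minPrevious = 5, maxDropRatio = 0.9 } = {}) {
  if (previous.length < minPrevious) {
    // Small fleets can legitimately drop to zero; do not second-guess them.
    return { accepted: true, instances: next };
  }
  const dropped = (previous.length - next.length) / previous.length;
  if (dropped >= maxDropRatio) {
    // Suspicious update: the caller should emit a metric and keep the old list.
    return { accepted: false, instances: previous };
  }
  return { accepted: true, instances: next };
}
```

A rejected update should not be held forever: if the smaller roster keeps arriving for longer than some grace period, it is probably real and should be accepted.
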
Common Wrong Answers:
  1. “Fail requests when Consul is unreachable.” Turns a registry hiccup into a user-facing outage. The whole point of a registry is to decouple identity from address; if the registry being down takes down the app, you got the coupling wrong.
  2. “Hard-code instance IPs as fallback.” Defeats the purpose of dynamic discovery. Those IPs will be wrong before you need them. Cache + last-known-good is the right pattern.
Further Reading:
  • HashiCorp Consul documentation, “Outage Recovery.”
  • Airbnb Engineering Blog, “SmartStack: Service Discovery in the Cloud.”
  • Istio documentation, “Pilot and data plane resilience.”

DNS-Based Discovery

DNS is the oldest service discovery mechanism in existence. Instead of running a dedicated registry client in every service, you rely on DNS, which every programming language and network library already speaks natively. The registry (Consul, CoreDNS, Route 53) exposes a DNS interface, and when you resolve payment.service.consul, you get back one or more A records (IP addresses) for healthy instances. If the registry also provides SRV records, you can even get the port along with the IP in a single lookup.

The appeal of DNS-based discovery is that it works with zero code changes in most cases. Any language, any HTTP client, any ORM, any legacy tool can consume it. The downside is DNS’s weakest feature: caching behavior. Clients (and especially OS resolvers and JVM runtimes) aggressively cache DNS results based on TTL. If you set TTL to 60 seconds to reduce DNS query load, your client keeps hitting a dead IP for up to 60 seconds after the instance dies. If you set TTL to 1 second to stay fresh, you multiply DNS query volume by 60x. DNS also cannot express rich health information (only “in the list” or “not in the list”) and gives only round-robin load balancing. It is simple and universal, but it is a coarse tool.
┌─────────────────────────────────────────────────────────────────────────────┐
│                      DNS-BASED DISCOVERY                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌────────────┐        ┌─────────────┐        ┌──────────────────┐          │
│  │   Order    │──DNS──▶│    DNS      │───────▶│ payment.service  │          │
│  │  Service   │ lookup │   Server    │        │ → 10.0.0.5       │          │
│  └────────────┘        └─────────────┘        │ → 10.0.0.6       │          │
│                              ▲                 │ → 10.0.0.7       │          │
│                              │                 └──────────────────┘          │
│                     ┌────────┴────────┐                                      │
│                     │ Service Registry│                                      │
│                     │ updates DNS     │                                      │
│                     └─────────────────┘                                      │
│                                                                              │
│  A Records:                                                                  │
│  payment.service.consul  →  10.0.0.5                                        │
│  payment.service.consul  →  10.0.0.6                                        │
│  payment.service.consul  →  10.0.0.7                                        │
│                                                                              │
│  SRV Records (with ports):                                                   │
│  _payment._tcp.service.consul  →  10.0.0.5:3000                             │
│  _payment._tcp.service.consul  →  10.0.0.6:3000                             │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

DNS Discovery Implementation

When you implement a DNS-based discovery client, you still want a small in-process cache to avoid hammering the resolver on every request, but the cache TTL should be short (5-10 seconds) to stay responsive to topology changes. SRV records are strictly better than A records when your registry supports them because they deliver both the address and the port, which matters when instances of the same service listen on different ports.
// discovery/DNSDiscovery.js
const dns = require('dns').promises;

class DNSDiscovery {
  constructor(options = {}) {
    this.suffix = options.suffix || '.service.consul';
    this.dnsServer = options.dnsServer || '127.0.0.1';
    this.dnsPort = options.dnsPort || 8600;
    this.cache = new Map();
    this.cacheTtl = options.cacheTtl || 10000;

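    // Configure DNS resolver. setServers() accepts IP addresses only (not
    // hostnames) and changes the resolver for the whole process.
    dns.setServers([`${this.dnsServer}:${this.dnsPort}`]);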
  }

  async discover(serviceName) {
    const hostname = `${serviceName}${this.suffix}`;

    // Check cache
    const cached = this.cache.get(hostname);
    if (cached && Date.now() - cached.timestamp < this.cacheTtl) {
      return cached.instances;
    }

    try {
      // A record lookup for IP addresses
      const addresses = await dns.resolve4(hostname);

      const instances = addresses.map(address => ({
        address,
        port: 3000 // Default port, or use SRV records
      }));

      this.cache.set(hostname, {
        instances,
        timestamp: Date.now()
      });

      return instances;
    } catch (error) {
      if (error.code === 'ENOTFOUND') {
        console.warn(`Service not found: ${serviceName}`);
        return [];
      }
      throw error;
    }
  }

  async discoverWithPort(serviceName) {
    // SRV record lookup for address AND port
    const hostname = `_${serviceName}._tcp${this.suffix}`;

    try {
      const records = await dns.resolveSrv(hostname);

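      return records.map(record => ({
        // The SRV target is a hostname, not an IP; resolve it before
        // connecting if your client needs a literal address.
        address: record.name,
        port: record.port,
        priority: record.priority,
        weight: record.weight
      }));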
    } catch (error) {
      console.error('SRV lookup failed:', error);
      return this.discover(serviceName); // Fallback to A records
    }
  }
}

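// Usage (wrapped in an async function because CommonJS has no top-level await)
async function main() {
  const dnsDiscovery = new DNSDiscovery({
    dnsServer: '127.0.0.1', // Must be an IP; dns.setServers() rejects hostnames like 'consul'
    dnsPort: 8600
  });

  const paymentInstances = await dnsDiscovery.discover('payment');
  // e.g. [{ address: '10.0.0.5', port: 3000 }, { address: '10.0.0.6', port: 3000 }]
  return paymentInstances;
}

main().catch(console.error);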

DNS-Based Discovery Caveats and Interview Deep-Dive

DNS-based discovery traps practitioners fall into:
  1. DNS TTL caching causing stale endpoints. You set TTL=60 seconds to reduce DNS traffic. An instance dies; its IP keeps getting returned for up to 60 seconds. Every caller routes to the dead IP for that window. Worse: some clients ignore the TTL entirely. The JVM caches successful lookups forever when a security manager is installed (and for a fixed, implementation-specific period otherwise, 30 seconds in OpenJDK), and a local caching daemon such as nscd adds its own layer on glibc systems.
  2. OS resolver cache on top of application cache. Your app caches DNS responses for 5 seconds. The OS caches for another 30. Worst-case staleness is up to 35 seconds, not 5, because the layers stack. Topology changes take much longer to propagate than you think.
  3. DNS cannot express rich health. A record for payment.service.consul is either present or not. A slow-but-up instance looks identical to a fast-and-up instance. No preference routing, no weighted distribution, no connection draining.
  4. Round-robin only, and poorly randomized. OS resolvers typically return A records in rotation. Multiple clients hitting the same resolver tend to pick the same instance first, creating load imbalance. For gRPC / HTTP/2 with long-lived connections, a single round-robin pick means all traffic pins to one instance.
Solutions and patterns:
  • Set TTL to match topology change velocity. Static infrastructure: 300 seconds is fine. Dynamic (autoscaling, rolling deploys): 5-10 seconds. Below 5 seconds, query volume overwhelms most DNS infrastructure.
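  • Configure JVM DNS caching. Set networkaddress.cache.ttl explicitly to something sane (5-30 seconds) instead of relying on the default. A default of forever (which applies when a security manager is installed) is a production bug waiting to happen.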
  • Use a modern resolver in the application. Languages with sane defaults (Go, Rust, Node with dns.resolve* APIs) respect TTL. Languages with inherited C-library resolvers (JVM, some Python paths) often do not.
  • Prefer SRV records when available. SRV records carry priority, weight, port, and target in a single lookup. This gives you structured load balancing, not just round-robin.
  • Layer discovery on top, not just DNS. For fleets where topology changes matter (autoscaling, chaos testing, rolling deploys), DNS alone is not enough. Add a service mesh or a client library with watch-based updates. Use DNS as a fallback, not the primary.
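To make "structured load balancing" from SRV records concrete, here is a simplified sketch of priority-then-weight selection over the record shape that Node's dns.resolveSrv() returns (name, port, priority, weight). pickSrvTarget is a hypothetical helper, and it skips some details of the full RFC 2782 selection algorithm (such as its special ordering of zero-weight records).

```javascript
// Pick one SRV record: lowest priority value wins, then weighted random
// choice among the records that share that priority.
// `random` is injectable so the behavior can be tested deterministically.
function pickSrvTarget(records, random = Math.random) {
  if (records.length === 0) return null;

  const lowest = Math.min(...records.map(r => r.priority));
  const candidates = records.filter(r => r.priority === lowest);

  const totalWeight = candidates.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight === 0) {
    // All weights are zero: fall back to a uniform pick
    return candidates[Math.floor(random() * candidates.length)];
  }

  // Walk the candidates, subtracting weights until the threshold is crossed
  let threshold = random() * totalWeight;
  for (const record of candidates) {
    threshold -= record.weight;
    if (threshold < 0) return record;
  }
  return candidates[candidates.length - 1];
}
```

With weights 3 and 1 at the same priority, the first target receives roughly three quarters of the picks; records at a higher priority value are only used if you filter out the lower ones yourself after they fail.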
Strong Answer Framework:
  1. Almost certainly DNS TTL caching. During the rolling deploy, a pod is terminated, its IP is reassigned. DNS updates take effect immediately in the registry, but clients cached the old IP. Until the cache expires, they keep trying the dead IP.
  2. Check the TTL. What does the DNS authoritative server return? What does the caller’s OS or runtime cache actually honor? These may differ.
  3. Check for JVM-style infinite caching. If any callers run on the JVM with default settings, their caches may be indefinite. Single biggest DNS discovery landmine in multi-language environments.
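  4. Short-term mitigation: pre-stop hook with drain. A Kubernetes preStop hook sleeps before shutdown begins; Kubernetes sends SIGTERM to the app only after the hook returns. During the sleep, the pod is removed from Service endpoints but keeps serving existing connections, which gives DNS caches time to expire before the pod actually dies. A 30-60 second sleep requires raising terminationGracePeriodSeconds above its 30-second default, or the pod is killed before it can drain.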
  5. Medium-term: lower TTL. Move from 60 seconds to 5 seconds. Trade more DNS load for faster topology propagation. Monitor DNS server CPU.
  6. Long-term: move beyond DNS for this service. If topology changes happen often (autoscaling, blue-green), DNS is the wrong primitive. Adopt a watch-based discovery (Consul watch, gRPC name resolver, service mesh). Real-time updates with zero TTL.
Real-World Example: In 2018, a prominent ride-sharing platform traced a class of 503 errors during deploys to DNS caching. Root cause: their Java service used the default JVM DNS cache (effectively infinite). Every restart picked up stale IPs from the prior deploy until the JVM was itself restarted. Fix: set networkaddress.cache.ttl=10 platform-wide via a JVM agent. Deploy-time 503s dropped by 95 percent.
Senior Follow-up Questions:
  1. “Why not just set TTL to 1 second?” DNS query volume scales with 1/TTL. At 1 second, a service with 1000 client pods generates 1000 DNS queries per second just for that one name. Multiply by all services. CoreDNS / dnsmasq / unbound start hurting. 5-10 seconds is a practical lower bound.
  2. “What’s the alternative to DNS for fast-changing topologies?” Either (a) a sidecar-based service mesh that watches the registry and reroutes sub-second (Istio, Linkerd), or (b) a smart client library that maintains a watch on the registry (gRPC with Consul resolver). Both eliminate the TTL caching problem because the client is subscribed to changes, not polling.
  3. “How do you test that DNS caching actually works the way you think?” Run a tool like dig +short before and after a known topology change, and time when the new IP appears. Or set up a test service that returns its own IP; hit it continuously during a deploy and observe when the IP changes. Teams are surprised how often the observed propagation time is 3-10x the configured TTL.
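The TTL trade-off above is plain arithmetic, and it can help to write it down. The two helpers below are illustrative only: they assume every client keeps its own cache and re-resolves a name once per TTL, and that cache layers stack in the worst case.

```javascript
// Steady-state DNS queries per second for ONE name, assuming each client
// re-resolves once every `ttlSeconds`.
function dnsQueriesPerSecond(clientCount, ttlSeconds) {
  return clientCount / ttlSeconds;
}

// Worst-case staleness when caches are layered (app cache, OS cache, ...):
// a change can wait out every layer in turn.
function worstCaseStalenessSeconds(cacheTtlsSeconds) {
  return cacheTtlsSeconds.reduce((total, ttl) => total + ttl, 0);
}

console.log(dnsQueriesPerSecond(1000, 1));        // 1000 queries/s at TTL = 1 s
console.log(dnsQueriesPerSecond(1000, 5));        // 200 queries/s at TTL = 5 s
console.log(worstCaseStalenessSeconds([5, 30]));  // 35 s with a 5 s app cache and a 30 s OS cache
```

Multiply the first number by the count of distinct service names each client resolves to estimate total resolver load.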
Common Wrong Answers:
  1. “Disable DNS caching entirely.” Impractical; DNS caching is built into every resolver layer. You cannot turn it all off. You can only tune TTL and pick resolvers that respect it.
  2. “Increase retries on the client to mask the stale IPs.” Piles retries on top of dead endpoints. Each retry wastes a full connection-timeout. Total latency grows; root cause untouched.
Further Reading:
  • Julia Evans, “Learning DNS in 10 commands” — excellent primer on what’s actually happening.
  • Go runtime DNS resolver documentation, which explains how Go respects TTL (unlike many JVM configs).
  • AWS Route 53 documentation on TTL trade-offs at scale.

Kubernetes Service Discovery

Kubernetes bakes service discovery into the platform, and this is one of the main reasons teams adopt it. You do not install anything, you do not run a registry, you do not write registration code. You declare a Service object that selects pods by label, and Kubernetes assigns it a stable virtual IP (the ClusterIP). CoreDNS inside the cluster exposes that Service as a DNS name like payment.default.svc.cluster.local. Behind the ClusterIP, kube-proxy programs iptables or IPVS rules on every node so that traffic to that IP is transparently load-balanced across the healthy pods backing the service. Pods go up, pods go down, IPs churn, and the Service abstraction keeps the consumer’s view stable. The tradeoff is that this is L4 load balancing (random or round-robin at the TCP connection level), which is great for stateless HTTP/1.1 but a problem for HTTP/2 and gRPC because those multiplex many requests over a single long-lived connection. For gRPC you either use a headless Service (discussed below) with client-side load balancing, or you introduce a service mesh like Istio that does L7-aware balancing.
┌─────────────────────────────────────────────────────────────────────────────┐
│                   KUBERNETES SERVICE DISCOVERY                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                         KUBERNETES CLUSTER                           │    │
│  │                                                                      │    │
│  │  ┌─────────────┐     ┌─────────────┐     ┌─────────────────────┐   │    │
│  │  │   Order     │────▶│  payment    │────▶│  Payment Pods       │   │    │
│  │  │    Pod      │     │  (Service)  │     │  ├── pod-1          │   │    │
│  │  └─────────────┘     │             │     │  ├── pod-2          │   │    │
│  │                      │  ClusterIP  │     │  └── pod-3          │   │    │
│  │                      │10.96.0.100  │     └─────────────────────┘   │    │
│  │                      └─────────────┘                                │    │
│  │                            ▲                                        │    │
│  │                            │                                        │    │
│  │                      ┌─────┴─────┐                                  │    │
│  │                      │  CoreDNS  │                                  │    │
│  │                      └───────────┘                                  │    │
│  │                                                                      │    │
│  │  DNS Resolution:                                                     │    │
│  │  payment.default.svc.cluster.local → 10.96.0.100                    │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Kubernetes Service Configuration

# k8s/payment-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: payment
  namespace: default
spec:
  selector:
    app: payment
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment
  template:
    metadata:
      labels:
        app: payment
    spec:
      containers:
        - name: payment
          image: payment-service:latest
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20

Using Kubernetes DNS

Once you have a Service in Kubernetes, application code becomes trivially simple. You just call the service name as if it were a hostname. No registry client, no discovery library, no health-check logic. The platform does all of it for you.
// In Kubernetes, just use the service name!
const paymentUrl = process.env.PAYMENT_URL || 'http://payment';

// Or use full DNS name
const fullDns = 'http://payment.default.svc.cluster.local';

// Cross-namespace
const orderUrl = 'http://order.other-namespace.svc.cluster.local';

Headless Services for Direct Pod Access

Sometimes you do not want Kubernetes’s ClusterIP load balancing. You want the raw list of pod IPs so your client can load-balance on its own terms. This is especially important for gRPC (as mentioned above), for stateful workloads like Cassandra or Kafka where each client needs to address specific members, and for any custom routing logic. A headless Service (clusterIP: None) skips the virtual IP entirely. DNS lookups of the service name return an A record per pod instead of the single ClusterIP. Your client receives the full list and chooses.
# k8s/payment-headless.yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-headless
spec:
  clusterIP: None  # Headless service
  selector:
    app: payment
  ports:
    - port: 3000
// discovery/K8sDiscovery.js
const dns = require('dns').promises;

class K8sDiscovery {
  async discoverPods(serviceName, namespace = 'default') {
    // Headless service DNS returns all pod IPs
    const hostname = `${serviceName}-headless.${namespace}.svc.cluster.local`;

    try {
      const addresses = await dns.resolve4(hostname);
      return addresses.map(ip => ({
        address: ip,
        port: 3000
      }));
    } catch (error) {
      console.error(`Failed to discover ${serviceName}:`, error);
      return [];
    }
  }
}

// Get individual pod IPs for custom load balancing
const discovery = new K8sDiscovery();
const pods = await discovery.discoverPods('payment');
// [{ address: '10.244.0.5', port: 3000 }, { address: '10.244.0.6', port: 3000 }]
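
Once you hold the raw pod list, "your client chooses" needs some policy. A minimal sketch of client-side round-robin over the result of discoverPods follows; RoundRobinPicker is a hypothetical class, and a real client would also re-run discovery periodically and call update() with the fresh list.

```javascript
// Rotate through a list of instances, one per call to next().
class RoundRobinPicker {
  constructor(instances = []) {
    this.instances = instances;
    this.index = 0;
  }

  // Swap in a freshly discovered list (for example after a DNS re-resolve)
  update(instances) {
    this.instances = instances;
    this.index = 0;
  }

  next() {
    if (this.instances.length === 0) return null;
    const instance = this.instances[this.index];
    this.index = (this.index + 1) % this.instances.length;
    return instance;
  }
}

const picker = new RoundRobinPicker([
  { address: '10.244.0.5', port: 3000 },
  { address: '10.244.0.6', port: 3000 }
]);
console.log(picker.next().address); // 10.244.0.5
console.log(picker.next().address); // 10.244.0.6
console.log(picker.next().address); // 10.244.0.5
```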

Kubernetes Client for Service Discovery

If DNS is not rich enough (you need pod labels, node placement, readiness state, or to watch for changes in real time), you can talk directly to the Kubernetes API. This is how service meshes and ingress controllers do it internally. Reading the Endpoints object for a service gives you the full list of backing pod IPs plus which ones are currently marked ready and which are not. Watching the endpoints stream lets you react to pod lifecycle events within milliseconds, avoiding DNS caching delays entirely. This approach is more powerful but more complex. You take on operational responsibility for kubeconfig handling, service account permissions (RBAC for reading endpoints), and reconnection logic on API server hiccups. Use it when you need real-time awareness or custom routing; otherwise DNS is fine.
// discovery/K8sClient.js
const k8s = require('@kubernetes/client-node');

class K8sServiceDiscovery {
  constructor() {
    const kc = new k8s.KubeConfig();
    kc.loadFromCluster(); // When running in cluster
    // kc.loadFromDefault(); // For local development

    this.kc = kc; // Kept so watchEndpoints() can build a Watch from it
    this.coreApi = kc.makeApiClient(k8s.CoreV1Api);
  }

  async getEndpoints(serviceName, namespace = 'default') {
    try {
      const response = await this.coreApi.readNamespacedEndpoints(
        serviceName,
        namespace
      );

      const endpoints = [];

      for (const subset of response.body.subsets || []) {
        for (const address of subset.addresses || []) {
          for (const port of subset.ports || []) {
            endpoints.push({
              ip: address.ip,
              port: port.port,
              nodeName: address.nodeName,
              ready: true
            });
          }
        }

        // Include not-ready endpoints if needed
        for (const address of subset.notReadyAddresses || []) {
          for (const port of subset.ports || []) {
            endpoints.push({
              ip: address.ip,
              port: port.port,
              nodeName: address.nodeName,
              ready: false
            });
          }
        }
      }

      return endpoints;
    } catch (error) {
      console.error('Failed to get endpoints:', error);
      return [];
    }
  }

  async watchEndpoints(serviceName, namespace, callback) {
    const watch = new k8s.Watch(this.kc);

    await watch.watch(
      `/api/v1/namespaces/${namespace}/endpoints`,
      { fieldSelector: `metadata.name=${serviceName}` },
      (type, endpoint) => {
        console.log(`Endpoint ${type}:`, endpoint.metadata.name);
        callback(type, endpoint);
      },
      (err) => {
        console.error('Watch error:', err);
        // Reconnect
        setTimeout(() => this.watchEndpoints(serviceName, namespace, callback), 5000);
      }
    );
  }
}
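
The watch callback above hands you a whole Endpoints object on every event, so the consumer still has to work out what actually changed. One way to do that, sketched with hypothetical helper names and the same subsets / addresses / ports shape the code above reads, is to diff the ready targets before and after each event.

```javascript
// Flatten an Endpoints object into "ip:port" strings for ready addresses only.
function readyTargets(endpoints) {
  const targets = [];
  for (const subset of endpoints.subsets || []) {
    for (const address of subset.addresses || []) {
      for (const port of subset.ports || []) {
        targets.push(`${address.ip}:${port.port}`);
      }
    }
  }
  return targets;
}

// Apply one watch event ('ADDED', 'MODIFIED', 'DELETED') to the current list
// and report which targets appeared or disappeared.
function applyEndpointEvent(current, type, endpoints) {
  const next = type === 'DELETED' ? [] : readyTargets(endpoints);
  const before = new Set(current);
  const after = new Set(next);
  return {
    targets: next,
    added: next.filter(target => !before.has(target)),
    removed: current.filter(target => !after.has(target))
  };
}
```

A caller would keep targets as its routing table and use added / removed to open or close connections, for example inside the callback passed to watchEndpoints.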

Kubernetes Discovery Caveats and Interview Deep-Dive

Kubernetes service discovery traps practitioners fall into:
  1. Liveness vs readiness probe confusion. Using liveness where readiness is needed means Kubernetes restarts a pod that was just warming up. Using readiness where liveness is needed means a deadlocked pod is removed from the endpoint list but never restarted: Kubernetes treats it as “not ready” and leaves it running, so the stuck pod occupies a replica slot indefinitely. Swap the two semantics and the behavior becomes subtly broken.
  2. Shallow liveness probe. livenessProbe: httpGet: /health returns 200 whenever the Go HTTP server is up. A deadlock inside the request handler does not stop the server from responding to /health, so the pod stays hung in a state that Kubernetes never notices and never restarts.
  3. Graceful shutdown race with endpoint update. Pod gets SIGTERM, exits immediately. Kubernetes removes the pod from Service endpoints eventually (kube-proxy on every node has to catch up). For the few seconds between SIGTERM and full endpoint propagation, other pods route traffic to the dead pod.
  4. Headless Service misuse with HTTP/2 or gRPC. HTTP/2 multiplexes many requests over one long-lived connection. Client picks one pod at DNS-resolution time; every request for the lifetime of the connection goes to that pod. A 3-pod headless Service with 1 client means 100 percent of traffic to pod 1, 0 percent to the others.
Solutions and patterns:
  • Liveness probe: process is alive; readiness probe: can serve. Liveness should be lightweight (the process is running, main loop hasn’t deadlocked). Readiness should be meaningful (dependencies reachable, caches warm, connection pools established).
  • Separate probe endpoints. /live and /ready with different logic. Liveness checks should never fail due to a downstream being slow; readiness can.
  • Pre-stop hook with sleep + graceful drain. preStop: exec: command: ["sh", "-c", "sleep 15"]. The sleep gives kube-proxy time to remove the pod from endpoints before the app begins shutdown. Kubernetes sends SIGTERM itself once the hook returns, so the hook does not need to signal the process.
  • Graceful shutdown in the application. SIGTERM handler stops accepting new requests, finishes in-flight work with a timeout, cleanly closes connections. Process exit happens only when drained.
  • For gRPC / HTTP/2: use a service mesh or client-side LB. Sidecar (Envoy via Istio / Linkerd) does L7 request-level load balancing across persistent connections. Or: headless Service + gRPC’s built-in resolver + round_robin pick policy at the client.
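A minimal Node.js sketch of the separate /live and /ready endpoints plus a SIGTERM drain is shown below. The function names, the shape of the state object, and the 10-second drain timeout are assumptions made for illustration; the dependency checks are whatever async functions you supply.

```javascript
const http = require('http');

// Liveness: "is the process responsive?" It never touches dependencies.
function livenessStatus() {
  return 200;
}

// Readiness: fails while draining or when any dependency check throws.
async function readinessStatus(state) {
  if (state.draining) return 503;
  for (const check of state.checks) {
    try {
      await check();
    } catch {
      return 503;
    }
  }
  return 200;
}

function createHealthServer(state) {
  return http.createServer(async (req, res) => {
    if (req.url === '/live') {
      res.statusCode = livenessStatus();
    } else if (req.url === '/ready') {
      res.statusCode = await readinessStatus(state);
    } else {
      res.statusCode = 404;
    }
    res.end();
  });
}

// On SIGTERM: fail readiness first, stop accepting new connections,
// and exit once in-flight requests finish (or the timeout fires).
function installGracefulShutdown(server, state, { timeoutMs = 10000 } = {}) {
  process.on('SIGTERM', () => {
    state.draining = true;
    server.close(() => process.exit(0));
    setTimeout(() => process.exit(1), timeoutMs).unref();
  });
}
```

Typical wiring would be const state = { draining: false, checks: [pingDatabase] }, then createHealthServer(state).listen(3000) and installGracefulShutdown(server, state), where pingDatabase is your own check.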
Strong Answer Framework:
  1. Define the two distinct semantics. Liveness: “is this process alive enough to be worth keeping?” If no, restart. Readiness: “is this pod ready to serve traffic right now?” If no, remove from endpoints but leave running.
  2. Different questions, different checks. Liveness checks internal state of the process: main event loop responsive, no deadlock, no OOM. Readiness checks external dependencies: database connection pool has capacity, downstream reachable, warm caches populated.
  3. Failure modes differ. Failing liveness restarts the pod, which is disruptive. Use liveness only for conditions where a restart actually helps (deadlock, stuck GC, corrupted in-memory state). Do not fail liveness because a downstream is slow — restarting your pod does not fix the downstream.
  4. Readiness toggles traffic routing. When a pod’s database pool exhausts temporarily, failing readiness removes it from endpoints; kube-proxy redirects traffic to other pods; the affected pod’s pool recovers and it rejoins. No restart, no data loss, fast recovery.
  5. Startup probes for slow-starting apps. Kubernetes added startupProbe (alpha in 1.16, stable since 1.20) for containers that take a long time to initialize. Use it for JVM services with long warmup periods. The liveness and readiness probes do not activate until the startup probe passes.
  6. Implement business-level health, not just HTTP 200. /ready issues a SELECT 1 against the DB, pings the cache, checks a downstream. Fails if any critical dependency is unreachable. This is what catches “healthy but degraded.”
Real-World Example: In 2019, a video streaming platform had all pods in their encoder service passing /health while 40 percent of encoding requests failed. Root cause: /health returned 200 because the HTTP server was up. The real issue was a hung FFmpeg subprocess pool, so new encode requests queued forever. Fix: /ready was rewritten to check “can I start a new encode within 2 seconds?” Pods with hung pools failed readiness, were removed from endpoints, and were auto-recycled by a separate cronjob that restarted pods not-ready for over 5 minutes.
Senior Follow-up Questions:
  1. “How aggressive should readiness checks be? What if the downstream is flaky?” Too aggressive and every downstream hiccup removes all pods from endpoints, cascading failure. Solution: check the dependency but count recent successes; only fail readiness if recent success rate drops below a threshold. Think of readiness as “I can probably serve” not “every dependency is perfect.”
  2. “What about checking readiness in the sidecar rather than the app?” With Istio, the kubelet’s probe requests can be routed through the sidecar agent, and the sidecar exposes its own readiness endpoint as well. This means the sidecar can vote on readiness independently of the app (e.g., fail readiness if the sidecar cannot reach the mesh control plane). Trade-off: more moving parts, slightly more complex to debug.
  3. “How do you handle migrations or warm-up work during startup?” Separate liveness (process alive) from readiness (service ready). Set readiness to false until warm-up completes. Kubernetes will leave the pod running but route no traffic to it. Once warm, readiness flips true and kube-proxy starts sending requests. For multi-stage startup, the startupProbe is designed exactly for this.
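The "count recent successes" idea from the first follow-up can be sketched as a small sliding window. DependencyHealthWindow is a hypothetical class, and the window size and threshold are placeholders to tune, not recommendations.

```javascript
// Track the last `size` dependency check results and report ready only while
// the recent success rate stays at or above `minSuccessRate`.
class DependencyHealthWindow {
  constructor({ size = 20, minSuccessRate = 0.5 } = {}) {
    this.size = size;
    this.minSuccessRate = minSuccessRate;
    this.results = [];
  }

  record(success) {
    this.results.push(Boolean(success));
    if (this.results.length > this.size) {
      this.results.shift(); // Drop the oldest result
    }
  }

  successRate() {
    if (this.results.length === 0) return 1; // No evidence of failure yet
    const successes = this.results.filter(Boolean).length;
    return successes / this.results.length;
  }

  isReady() {
    return this.successRate() >= this.minSuccessRate;
  }
}
```

A /ready handler would call record() after each dependency probe and answer 200 or 503 from isReady(), so a single failed ping does not pull the pod out of rotation.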
Common Wrong Answers:
  1. “Use the same endpoint for liveness and readiness; it is simpler.” Simpler until a slow downstream causes every pod to restart simultaneously because liveness is failing. Simpler is a false economy.
  2. “The app always returns 200 on /health if the server is up; everything else is orchestration’s problem.” Guarantees that degraded pods will stay in the load balancer. The whole point of Kubernetes health checks is that the app participates honestly in its own health reporting.
Further Reading:
  • Kubernetes documentation, “Configure Liveness, Readiness and Startup Probes.”
  • Tim Hockin’s talks on Kubernetes networking and pod lifecycle.
  • CNCF blog posts on health check anti-patterns.
Strong Answer Framework:
  1. Diagnose the cause. gRPC uses HTTP/2, which multiplexes many requests over one long-lived TCP connection. Kubernetes L4 (iptables, IPVS) load balancing picks a target when the TCP connection is established, not per request. Once a client connects to pod A, every subsequent request over that connection also goes to pod A.
  2. Explain why this is a Kubernetes Services limitation. ClusterIP does L4 random or round-robin at connection time. For HTTP/1.1 with short connections, this works fine — each request effectively picks a new target. For HTTP/2 with persistent connections, it fails.
  3. Fix Option 1: client-side load balancing with a headless Service. Set clusterIP: None. Client resolves the Service name and gets all pod IPs. gRPC’s built-in round_robin load balancer distributes requests across the pod connections. Your client needs to be gRPC-aware: in Go, dial a dns:/// target and pass grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`) (a backtick raw string, since Go has no single-quoted strings), with similar options in Java and Python.
  4. Fix Option 2: service mesh. Istio or Linkerd injects an Envoy / linkerd-proxy sidecar. The sidecar does L7 request-level load balancing across persistent connections. No client code changes. Cost: added complexity of running a mesh.
  5. Fix Option 3: L7 ingress. If traffic is inbound from outside the cluster, an L7 ingress (Contour, Ambassador, Emissary, ingress-nginx with HTTP/2 upstream) handles this. For pod-to-pod, this is less relevant.
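A toy simulation makes the diagnosis in step 1 visible. It is not a model of kube-proxy internals: it only contrasts "pick a pod once per connection" with "pick a pod per request", using hypothetical names and a deterministic rotation instead of random choice.

```javascript
// Count how many requests each pod receives under the two balancing modes.
function simulateTraffic({ pods, clients, requestsPerClient, perRequest }) {
  const counts = Object.fromEntries(pods.map(pod => [pod, 0]));

  for (let client = 0; client < clients; client++) {
    // Connection-level (L4) balancing: the target is fixed when the
    // long-lived connection is opened.
    const pinned = pods[client % pods.length];

    for (let request = 0; request < requestsPerClient; request++) {
      // Request-level (L7 or client-side) balancing: rotate on every request.
      const target = perRequest ? pods[(client + request) % pods.length] : pinned;
      counts[target] += 1;
    }
  }
  return counts;
}

const pods = ['pod-1', 'pod-2', 'pod-3'];
console.log(simulateTraffic({ pods, clients: 1, requestsPerClient: 300, perRequest: false }));
// { 'pod-1': 300, 'pod-2': 0, 'pod-3': 0 }
console.log(simulateTraffic({ pods, clients: 1, requestsPerClient: 300, perRequest: true }));
// { 'pod-1': 100, 'pod-2': 100, 'pod-3': 100 }
```

With many clients the pinned mode evens out across connections, which is why the skew is most painful for a small number of heavy gRPC callers.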
Real-World Example: When gRPC started seeing widespread production adoption around 2017-2019, exactly this issue bit many teams. The Kubernetes blog post “gRPC Load Balancing on Kubernetes without Tears” explicitly documents the problem. Istio adoption at companies like Lyft and eBay was partly driven by the desire to fix gRPC load balancing without requiring every client library to implement it.
Senior Follow-up Questions:
  1. “Why does HTTP/1.1 not have this problem?” HTTP/1.1 clients often open a new connection per request (or close after a few), so each connection goes through the Kubernetes LB again, getting a new target. HTTP/2 keeps the connection open indefinitely; one connection = one target for its lifetime.
  2. “If I use client-side round-robin, what happens when a pod is added or removed?” gRPC’s resolver can re-resolve DNS on connection errors or periodically. On change, the client opens new subchannels and load-balances across the new set. There is still a brief window where the client has not seen the change; during that window, traffic may skew. Service mesh sidecars have lower propagation latency than DNS-based resolution.
  3. “What’s the overhead of a service mesh at high RPS?” Envoy sidecar adds roughly 0.5-2 ms of latency per hop in most measurements, plus ~50-150 MB of memory per pod. At thousands of RPS per pod, CPU impact is typically 5-15 percent. Significant but usually acceptable given what you gain. High-frequency trading or similar latency-critical workloads may find it unacceptable.
Common Wrong Answers:
  1. “Open a new gRPC connection for every request.” Defeats the point of HTTP/2. TCP + TLS handshake overhead per request is 10-100 ms. Never do this.
  2. “Restart pods to reset connections.” Temporary fix that hides the problem. New connections during the restart period might re-balance, but as soon as traffic settles the same problem returns. Does not scale.
Further Reading:
  • Kubernetes Blog, “gRPC Load Balancing on Kubernetes without Tears.”
  • gRPC documentation, “Load Balancing Policies.”
  • Istio Blog, “How Istio Solves the Long-Lived Connection Load Balancing Problem.”
Strong Answer Framework:
  1. Quantify the overhead. Envoy sidecar adds roughly 0.5-2 ms p50 latency per call, 1-5 ms p99. Memory: 50-150 MB per pod. CPU: 5-15 percent overhead on request-heavy workloads. Numbers vary by config (mTLS on/off, observability features, RBAC).
  2. Identify when the overhead is significant. For an internal call that takes 50 ms, a 1 ms sidecar hop is noise (2 percent). For a call that takes 2 ms, a 1 ms hop is 50 percent overhead — and you have two sidecars (source and destination) so 2 ms total. Latency-critical paths (HFT, real-time bidding, ad serving sub-10ms) feel the pain.
  3. Articulate the benefit to justify the cost. What you get: automatic mTLS, zero-code-change traffic shifting (canary, A/B), per-pod circuit breaking and outlier ejection, L7 load balancing for gRPC, observability (golden metrics per call) without library work, retry/timeout policies in config rather than code.
  4. Decide based on team size and operational maturity. Small teams: mesh operational cost exceeds the benefit. Large orgs with hundreds of services: mesh pays for itself by standardizing what would otherwise be dozens of ad-hoc retry/timeout/TLS implementations.
  5. Consider mesh alternatives for specific use cases. gRPC’s built-in load balancing + client-side retries + mTLS via cert-manager can cover much of the value without the sidecar. App-side circuit breakers exist in every language.
Real-World Example: In 2021, a large fintech ran a carefully-measured experiment comparing their service-to-service traffic with and without Istio. Latency overhead p99 was 2.1 ms with mTLS, 0.9 ms without. For their average 120 ms API latency, this was acceptable. For their internal trading path (8 ms end-to-end), it was not, and they kept Istio out of those services. Hybrid mesh adoption is common and often correct.
Senior Follow-up Questions:
  1. “How do you decide when to add a mesh to a service vs. keep it out?” Rule of thumb: if sidecar overhead is under 10 percent of end-to-end latency, cost is acceptable. Below 10 ms of baseline latency, the math gets tight; you need to measure specifically.
  2. “What’s the difference between a sidecar mesh (Istio, Linkerd) and a sidecarless mesh (Cilium, Istio Ambient Mesh)?” Sidecar injects a proxy per pod; sidecarless runs at the node level (eBPF or a shared per-node proxy). Sidecarless reduces memory overhead significantly (no per-pod proxy) and often reduces latency. Trade-off: less per-pod isolation, node-level failure domain. Ambient Mesh and Cilium service mesh are the current evolution.
  3. “What if you only need mTLS and metrics, not the full mesh?” Use a lighter solution: SPIFFE/SPIRE for identity + mTLS, Prometheus exporters in-process for metrics. Many teams discover that their “mesh needs” are 20 percent of what Istio offers, and they can assemble a simpler stack.
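The 10 percent rule of thumb is easy to turn into a quick check. The helpers below are a back-of-the-envelope sketch with hypothetical names; the inputs are your own measured baseline latency and sidecar overhead, and the threshold is the rule of thumb from above rather than a hard limit.

```javascript
// Added latency and its share of the baseline for a call that crosses
// `hops` sidecars (two by default: source and destination).
function sidecarOverhead(baselineMs, perHopMs, hops = 2) {
  const addedMs = perHopMs * hops;
  return { addedMs, fraction: addedMs / baselineMs };
}

// Rule of thumb: acceptable if the mesh adds under `maxFraction` of baseline.
function meshIsAcceptable(baselineMs, perHopMs, { hops = 2, maxFraction = 0.1 } = {}) {
  return sidecarOverhead(baselineMs, perHopMs, hops).fraction < maxFraction;
}

console.log(sidecarOverhead(50, 1, 1)); // { addedMs: 1, fraction: 0.02 }
console.log(sidecarOverhead(2, 1));     // { addedMs: 2, fraction: 1 }
console.log(meshIsAcceptable(120, 2.1, { hops: 1 })); // true
console.log(meshIsAcceptable(8, 2.1, { hops: 1 }));   // false
```

The last two lines reuse the figures from the fintech example, treating the 2.1 ms p99 overhead as the total for the call.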
Common Wrong Answers:
  1. “Install Istio; it is the standard.” Skips the cost-benefit analysis. Plenty of teams have deployed Istio and regretted it because their workload did not need it and operational cost was high.
  2. “Mesh overhead is negligible at any scale.” Simply wrong for latency-critical paths and for teams at the scale where mesh control plane CPU / memory starts to become a noticeable budget line.
Further Reading:
  • Linkerd’s published performance benchmarks (they explicitly focus on low overhead).
  • Istio Ambient Mesh announcement and architecture.
  • Cilium Service Mesh documentation for the sidecarless / eBPF approach.

Service Mesh Discovery

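For scenarios that need mTLS, traffic policy, or L7 routing on top of discovery, use a service mesh like Istio or Linkerd. The diagram below shows Istio's original component split; since Istio 1.5, Pilot, Citadel, and Galley ship as a single istiod binary, and Mixer has since been removed, with telemetry generated in the Envoy proxies.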
┌─────────────────────────────────────────────────────────────────────────────┐
│                    SERVICE MESH (ISTIO)                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                              Istio Control Plane                       │  │
│  │  ┌─────────┐   ┌──────────┐   ┌───────────┐   ┌─────────────┐        │  │
│  │  │  Pilot  │   │ Citadel  │   │  Galley   │   │   Mixer     │        │  │
│  │  │(routing)│   │ (certs)  │   │ (config)  │   │ (telemetry) │        │  │
│  │  └────┬────┘   └──────────┘   └───────────┘   └─────────────┘        │  │
│  └───────┼───────────────────────────────────────────────────────────────┘  │
│          │                                                                   │
│  ┌───────┼───────────────────────────────────────────────────────────────┐  │
│  │       ▼                    Data Plane                                  │  │
│  │  ┌────────────────┐              ┌────────────────┐                   │  │
│  │  │  Order Pod     │              │  Payment Pod   │                   │  │
│  │  │ ┌────────────┐ │              │ ┌────────────┐ │                   │  │
│  │  │ │   Envoy    │◀├──────────────┤▶│   Envoy    │ │                   │  │
│  │  │ │   Proxy    │ │  mTLS +      │ │   Proxy    │ │                   │  │
│  │  │ └────────────┘ │  Discovery   │ └────────────┘ │                   │  │
│  │  │ ┌────────────┐ │              │ ┌────────────┐ │                   │  │
│  │  │ │   Order    │ │              │ │  Payment   │ │                   │  │
│  │  │ │  Service   │ │              │ │  Service   │ │                   │  │
│  │  │ └────────────┘ │              │ └────────────┘ │                   │  │
│  │  └────────────────┘              └────────────────┘                   │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
│  • Service discovery handled by Envoy sidecars                              │
│  • No code changes needed                                                   │
│  • Automatic mTLS, retries, circuit breaking                               │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Comparison

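| Aspect          | Consul    | DNS-Based       | Kubernetes | Service Mesh |
|-----------------|-----------|-----------------|------------|--------------|
| Setup           | Medium    | Simple          | Built-in   | Complex      |
| Health Checks   | Yes       | Limited         | Probes     | Yes          |
| Load Balancing  | Client    | DNS round-robin | kube-proxy | Envoy        |
| Dynamic Updates | Real-time | TTL-based       | Real-time  | Real-time    |
| Multi-cluster   | Yes       | Limited         | With tools | Yes          |
| Code Changes    | Some      | Minimal         | Minimal    | None         |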

Interview Questions

Answer:
Client-Side Discovery:
  • Client queries registry directly
  • Client implements load balancing logic
  • More control, but more complexity per client
  • Examples: Netflix Ribbon, custom implementations
Server-Side Discovery:
  • Client calls load balancer/router
  • Router queries registry and forwards request
  • Simpler clients, centralized control
  • Examples: AWS ALB, Kubernetes Services, Nginx
Trade-offs:
  • Client-side: Better for smart routing, custom logic
  • Server-side: Simpler clients, potential bottleneck
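The difference between the two patterns is easiest to see from the caller's side. In this minimal sketch, `lookup` is a hypothetical stand-in for a registry query (Consul, DNS, or similar) that returns a list of `{ host, port }` objects; it is not a real library API.

```javascript
// Client-side discovery: the caller queries the registry and picks an
// instance itself, here with plain round-robin.
class RoundRobinClient {
  constructor(lookup) {
    this.lookup = lookup; // hypothetical registry query function
    this.next = 0;
  }

  pick(serviceName) {
    const instances = this.lookup(serviceName);
    if (instances.length === 0) {
      throw new Error(`no instances registered for ${serviceName}`);
    }
    const chosen = instances[this.next % instances.length];
    this.next += 1;
    return `http://${chosen.host}:${chosen.port}`;
  }
}

// Server-side discovery: the caller only knows one stable name, and the
// router or load balancer behind that name performs the lookup.
function serverSideTarget(serviceName) {
  return `http://${serviceName}`;
}
```

The client-side version owns the selection logic, so it can be swapped for something smarter, but every client in every language has to carry that logic.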
Answer:
Components:
  1. Service - Stable IP (ClusterIP) for pod group
  2. CoreDNS - Resolves service names to IPs
  3. kube-proxy - Maintains iptables/IPVS rules
  4. Endpoints - Tracks healthy pod IPs
Flow:
  1. Pod calls payment.default.svc.cluster.local
  2. CoreDNS resolves to Service ClusterIP
  3. kube-proxy routes to healthy pod IP
  4. Readiness probes ensure only healthy pods receive traffic
Headless Services:
  • clusterIP: None
  • DNS returns individual pod IPs
  • For stateful apps or custom load balancing
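The DNS names in the flow above follow a fixed pattern, sketched below. The helper names are hypothetical; `cluster.local` is the usual default cluster domain, but a cluster can be configured with a different one, and the per-pod form assumes a StatefulSet backed by a headless Service.

```javascript
// In-cluster DNS name for a Service: <service>.<namespace>.svc.<cluster-domain>
function serviceDnsName(service, namespace = 'default', clusterDomain = 'cluster.local') {
  return `${service}.${namespace}.svc.${clusterDomain}`;
}

// Per-pod DNS name for a StatefulSet pod behind a headless Service:
// <pod-name>.<service>.<namespace>.svc.<cluster-domain>
function statefulPodDnsName(podName, service, namespace = 'default', clusterDomain = 'cluster.local') {
  return `${podName}.${serviceDnsName(service, namespace, clusterDomain)}`;
}

console.log(serviceDnsName('payment'));
// payment.default.svc.cluster.local

console.log(statefulPodDnsName('db-0', 'db', 'data'));
// db-0.db.data.svc.cluster.local
```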
Answer:
Registration Methods:
  1. Self-registration - Service registers itself
  2. Third-party - Orchestrator registers service
  3. Sidecar - Proxy handles registration
Failure Handling:
  1. Health checks - Registry removes unhealthy instances
  2. TTL/Heartbeat - Auto-deregister on missed beats
  3. Graceful shutdown - Deregister before terminating
  4. Circuit breakers - Client-side protection
Best Practices:
  • Always implement health endpoints
  • Use graceful shutdown hooks
  • Cache discovery results with TTL
  • Have fallback for registry unavailability
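The TTL/heartbeat and graceful-deregistration ideas can be illustrated with a small in-memory registry. This is a teaching sketch, not how Consul or any real registry is implemented; the class name, method names, and the injected `now` clock are all assumptions made for the example.

```javascript
// In-memory registry where an instance stays listed only while it keeps
// sending heartbeats within the TTL.
class TtlRegistry {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, so expiry can be tested
    this.instances = new Map(); // instanceId -> { service, address, lastBeat }
  }

  // Self-registration: the instance announces itself on startup.
  register(instanceId, service, address) {
    this.instances.set(instanceId, { service, address, lastBeat: this.now() });
  }

  // Returns false if the instance is unknown or already expired,
  // signalling that the caller should register again.
  heartbeat(instanceId) {
    const entry = this.instances.get(instanceId);
    if (!entry) return false;
    entry.lastBeat = this.now();
    return true;
  }

  // Graceful shutdown path: remove the instance before it stops.
  deregister(instanceId) {
    this.instances.delete(instanceId);
  }

  // Crash path: instances that missed their heartbeat window are dropped
  // lazily at lookup time.
  lookup(service) {
    const cutoff = this.now() - this.ttlMs;
    const live = [];
    for (const [id, entry] of this.instances) {
      if (entry.lastBeat < cutoff) {
        this.instances.delete(id);
        continue;
      }
      if (entry.service === service) live.push(entry.address);
    }
    return live;
  }
}
```

An instance that deregisters on shutdown disappears immediately, while one that crashes lingers for up to one TTL, which is the window in which callers see failed requests.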

Summary

Key Takeaways

  • Service discovery enables dynamic routing
  • Client-side vs server-side trade-offs
  • Consul provides full-featured discovery
  • Kubernetes has built-in discovery
  • Health checks ensure traffic goes to healthy instances

Next Steps

In the next chapter, we’ll explore Observability - logging, metrics, and distributed tracing.

Interview Deep-Dive

Strong Answer:
This is a readiness probe misconfiguration combined with a service discovery timing issue. The core problem is that Kubernetes (or your service registry) is sending traffic to pods before the application is ready to handle requests.
The immediate fix is proper readiness probes. The readiness probe should not return 200 until the service has completed all initialization: database connection pool is established, configuration is loaded, any warmup caches are populated, and the health check endpoint confirms the service can process requests end-to-end. I have seen teams use a simple “can you reach port 3000?” TCP probe, which returns healthy the moment the HTTP server starts listening, before the database connection is established. A proper readiness probe hits an endpoint like /ready that verifies all dependencies.
The second fix is the pre-stop lifecycle hook. When Kubernetes terminates a pod during rolling update, there is a race condition: the pod is removed from the Service endpoints, but in-flight requests and cached DNS might still route to it. I add a pre-stop hook that sleeps for 5-10 seconds, giving time for the endpoints update to propagate to all kube-proxy instances and ingress controllers. Only after that sleep does the pod begin shutting down.
The third fix is graceful shutdown in the application. When the pod receives SIGTERM, it should stop accepting new requests, finish processing in-flight requests (up to a timeout), close database connections cleanly, and then exit. This is often missed in Node.js applications where process.on('SIGTERM', () => process.exit(0)) kills in-flight requests immediately.
In Consul-based discovery, the equivalent problem is the deregister delay. When a service instance shuts down, it must deregister from Consul before stopping. If it crashes without deregistering, Consul relies on health check failures to remove it, which can take 30-60 seconds of failed requests.
Follow-up: “How do you handle the cold start problem where a newly deployed instance is slower for its first few hundred requests?”
This is the JVM warmup problem (or in Node.js, V8’s JIT compilation warmup). I use two strategies. First, I configure the rolling update to use a slow startup probe: Kubernetes 1.20+ supports startupProbe with a longer interval and more retries than the readiness probe, giving the container time to warm up. Second, I add a warmup step to the container startup: before marking itself as ready, the service sends synthetic traffic to itself (a few hundred representative requests) to warm up connection pools, JIT-compile hot paths, and populate any in-process caches. Only after warmup completes does the readiness probe return healthy.
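The readiness and graceful-shutdown logic described in this answer can be sketched as two small, framework-free pieces. The class names and the shape of the `checks` object are hypothetical; in a real service the `/ready` handler would return `status().code`, and the SIGTERM handler would call `beginShutdown()`, stop accepting connections, and exit once the tracker reports idle or a timeout fires.

```javascript
// Readiness gate: reports 200 only when every dependency check passes
// and the process is not shutting down.
class ReadinessGate {
  constructor(checks) {
    this.checks = checks; // e.g. { db: () => pool.isConnected() } (hypothetical)
    this.shuttingDown = false;
  }

  status() {
    if (this.shuttingDown) return { code: 503, failing: ['shutting-down'] };
    const failing = Object.entries(this.checks)
      .filter(([, check]) => !check())
      .map(([name]) => name);
    return { code: failing.length === 0 ? 200 : 503, failing };
  }

  // Called first on SIGTERM so the probe starts failing immediately.
  beginShutdown() {
    this.shuttingDown = true;
  }
}

// Counts in-flight requests so shutdown can wait for them to drain
// instead of calling process.exit(0) straight away.
class InFlightTracker {
  constructor() {
    this.count = 0;
    this.onIdle = null;
  }

  start() {
    this.count += 1;
  }

  finish() {
    this.count -= 1;
    if (this.count === 0 && this.onIdle) this.onIdle();
  }

  // Runs the callback now if nothing is in flight, otherwise when the
  // last request finishes.
  whenIdle(callback) {
    if (this.count === 0) callback();
    else this.onIdle = callback;
  }
}
```

Keeping the gate separate from the HTTP layer means the same object can back a startup warmup step: the checks simply include a flag that flips once warmup has finished.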
Strong Answer:
In a Kubernetes environment, the default answer is server-side discovery via Kubernetes Services and CoreDNS, and for most teams this is the right choice. Kubernetes Services give you a stable ClusterIP that load-balances across healthy pods via kube-proxy (iptables or IPVS rules). Your application code just calls http://payment-service and Kubernetes handles the rest. Zero client-side complexity.
I would switch to client-side discovery in three specific scenarios. First, when you need smarter load balancing than round-robin. Kubernetes Services do basic round-robin (or random, depending on kube-proxy mode), but if you need least-connections, latency-based routing, or weighted distribution, you need client-side logic. A headless Kubernetes Service (clusterIP: None) returns all pod IPs via DNS, and your client library implements the routing algorithm.
Second, when you are using gRPC. Because gRPC multiplexes many requests over a single long-lived HTTP/2 connection, Kubernetes L4 load balancing sends all requests from one client to one pod. The other pods sit idle. You need either client-side load balancing (gRPC’s DNS resolver against a headless Service, with the round_robin policy instead of the default pick_first) or a service mesh (Envoy in Istio does L7 gRPC-aware load balancing).
Third, when you need cross-cluster service discovery. If Payment Service runs in cluster A and Order Service runs in cluster B, Kubernetes Services do not span clusters natively. You need a service mesh (Istio multi-cluster) or an external registry (Consul) that spans both clusters, with client-side resolution.
For most microservices teams running in a single Kubernetes cluster with REST APIs, server-side discovery via Kubernetes Services is sufficient and dramatically simpler. Do not adopt Consul or custom client-side discovery unless you have a specific requirement that Kubernetes-native discovery cannot meet.
Follow-up: “What happens if CoreDNS goes down in your Kubernetes cluster?”
CoreDNS is a critical component. If it goes down, no new DNS resolutions work, meaning services cannot discover each other for new connections. However, existing TCP connections continue to function, and kube-proxy’s iptables rules (which are independent of DNS) still route traffic sent to a ClusterIP. The practical impact depends on how frequently your services open new connections. With HTTP keep-alive and connection pooling, most services continue functioning for minutes. But newly started pods (scale-ups, deployments) cannot resolve the services they depend on, and any client that needs a fresh lookup fails. CoreDNS runs as a Deployment with multiple replicas, so a single pod failure is handled by the remaining replicas. A full CoreDNS outage is a cluster-level incident that requires immediate response.
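As an illustration of the first scenario, here is a minimal least-connections balancer over the pod addresses a headless Service would return. The class and method names are hypothetical, and the DNS lookup itself is left out: the sketch assumes the caller passes in the address list and refreshes it periodically.

```javascript
// Least-connections selection over a set of backend addresses.
class LeastConnectionsBalancer {
  constructor(addresses) {
    // address -> number of requests currently in flight
    this.active = new Map(addresses.map((address) => [address, 0]));
  }

  // Picks the backend with the fewest in-flight requests and counts the
  // new request against it. Ties go to the earliest-listed address.
  acquire() {
    let best = null;
    for (const [address, count] of this.active) {
      if (best === null || count < this.active.get(best)) best = address;
    }
    if (best === null) throw new Error('no backends available');
    this.active.set(best, this.active.get(best) + 1);
    return best;
  }

  // Must be called when the request completes, success or failure.
  release(address) {
    const count = this.active.get(address);
    if (count > 0) this.active.set(address, count - 1);
  }

  // Applies a fresh address list (for example after re-resolving DNS),
  // keeping in-flight counts for backends that are still present.
  update(addresses) {
    const next = new Map();
    for (const address of addresses) next.set(address, this.active.get(address) ?? 0);
    this.active = next;
  }
}
```

This kind of per-request choice is exactly what a ClusterIP cannot offer, because kube-proxy balances connections and has no view of how many requests each connection carries.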
Strong Answer:
The key is routing control at the discovery layer. During a blue-green deployment, both versions are running simultaneously, but only one should receive production traffic at any given time.
In Kubernetes, I use label selectors. The Service’s selector points to app: payment, color: blue. When I deploy the green version, it runs alongside blue but the Service does not route to it because the label does not match. After green passes health checks and smoke tests, I update the Service selector to color: green. Traffic shifts atomically. If green has issues, I revert the selector to color: blue in seconds.
For Consul-based discovery, I register both versions with different tags: payment-blue and payment-green. The service client or API gateway queries for the active tag, which is stored as a configuration value in Consul’s KV store. Switching production traffic means updating one KV value, which all clients pick up within their next watch cycle (typically under a second).
The tricky part is database schema compatibility. Both blue and green versions must work with the same database schema, which means database migrations must be backward-compatible. I use the expand-and-contract pattern: the green deployment adds new columns/tables but does not remove old ones. After green is confirmed stable and blue is decommissioned, a follow-up migration removes the deprecated columns.
Follow-up: “How do you test the green environment with real production traffic before switching over completely?”
I use canary testing within the blue-green framework. Instead of switching 100% of traffic at once, I configure the load balancer or service mesh to send 5% of traffic to green while blue handles 95%. I monitor error rates, latency percentiles, and business metrics (conversion rate, payment success rate) for both versions. If green’s metrics are comparable to blue’s after 15-30 minutes, I increase to 25%, then 50%, then 100%. If any metric degrades, I route 100% back to blue. Istio’s VirtualService makes this trivial with weighted routing rules.
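The canary progression in this answer (5%, 25%, 50%, 100%, with full rollback on any regression) can be expressed as a small state function plus a weighted pick. This is a sketch of the decision logic only, with hypothetical names; in practice the weight would be written to the load balancer or mesh routing config rather than evaluated per request in application code.

```javascript
// Traffic percentages for green, in the order described in the answer.
const CANARY_STEPS = [5, 25, 50, 100];

// Returns the next green weight: advance one step if metrics look
// healthy, otherwise send everything back to blue.
function nextGreenWeight(currentWeight, metricsHealthy) {
  if (!metricsHealthy) return 0;
  const next = CANARY_STEPS.find((step) => step > currentWeight);
  return next === undefined ? 100 : next;
}

// Weighted choice between the two versions. `random` is injectable so
// the split can be tested deterministically.
function pickVersion(greenWeight, random = Math.random) {
  return random() * 100 < greenWeight ? 'green' : 'blue';
}

console.log(nextGreenWeight(0, true));   // 5
console.log(nextGreenWeight(25, true));  // 50
console.log(nextGreenWeight(50, false)); // 0
```

Rolling back to 0 rather than to the previous step matches the rule stated above: any degraded metric sends all traffic back to blue.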