Service Discovery
In a microservices architecture, services need to find and communicate with each other dynamically. Service discovery solves the problem of locating services in a constantly changing environment.
- Understand service discovery patterns
- Implement Consul-based discovery
- Use DNS-based service discovery
- Configure Kubernetes native discovery
- Build health-aware load balancing
Why Service Discovery?
In a monolith, components talk to each other via in-process function calls. In a microservices system, components are separate processes, often on separate machines, sometimes in separate data centers. Traditionally you would hardcode IP addresses or hostnames into configuration files, but this falls apart the moment your infrastructure becomes dynamic. Containers are ephemeral. Auto-scalers spin instances up and down. Cloud providers reschedule workloads onto new VMs without warning. A service that was at 192.168.1.10 this morning might be at 10.0.4.77 this afternoon, and the IP you cached an hour ago now points to nothing.
Service discovery is the mechanism that lets a caller ask “where is the payment service right now?” and get back a fresh, healthy answer. It decouples logical service identity (the name “payment-service”) from physical network location (IP and port). Without it, every deployment, scaling event, or node failure becomes a config-change emergency.
Service Discovery Patterns
There are two fundamental architectures for how a caller locates a callee: client-side discovery and server-side discovery. They represent a classic tradeoff between client complexity and infrastructure complexity.
In client-side discovery, each service embeds a discovery library that queries a registry, picks an instance, and makes the call directly. The client gets total control (custom load-balancing, latency-aware routing, circuit breaking per instance) but every language you use needs a compatible client library, and the logic lives in every service you ship.
In server-side discovery, clients call a single well-known address (usually a load balancer or ingress), and that intermediary is responsible for looking up healthy instances and forwarding the request. Clients become dumb; they do not even need to know the registry exists. The tradeoff is an extra network hop and a potential bottleneck, but you gain uniform cross-language behavior and simpler app code. Kubernetes Services, AWS ALBs, and Nginx Plus all implement server-side discovery. Most modern teams on Kubernetes start with server-side and only reach for client-side when they have a specific need (gRPC load balancing, smart routing, cross-cluster traffic).
Consul-Based Discovery
Consul is a full-featured service discovery solution with health checking.
Setting Up Consul
Consul is HashiCorp’s distributed service registry. It solves two problems at once: it stores a live catalog of every service instance (address, port, tags, health status) and it actively probes those instances with health checks so unhealthy ones are automatically hidden from lookups. You run a small Consul cluster (three or five servers for production) and each application host or container runs a lightweight Consul agent. This is the classic third-party registry pattern, and it works equally well outside Kubernetes (VMs, bare-metal, mixed environments) where Kubernetes-native discovery is not an option.
Service Registration
Registration is how an instance announces itself to the world. At startup, the service calls Consul’s agent API with its name, address, port, and a description of how Consul should check its health. Consul immediately begins polling that health endpoint (usually every 10 seconds) and marks the instance as passing, warning, or critical accordingly. Only passing instances show up in discovery queries by default.
The critical detail most teams miss is deregistration on shutdown. If your pod receives SIGTERM and exits without telling Consul, your instance stays in the registry as “healthy” until the next health check fails, typically 10-30 seconds later. During that window, callers send real requests to a dead instance and get connection refused errors. The fix is a graceful shutdown hook that explicitly deregisters before the process dies. Here is the pattern in both Node.js and Python; in Python we use FastAPI’s lifespan context manager, which is the idiomatic place to put “run at startup, run at shutdown” code.
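The register-then-deregister pattern described above can be sketched with the standard library alone. This is a minimal sketch, not a definitive implementation: it assumes a Consul agent listening on 127.0.0.1:8500 and a /health path on the service, and the helper names (`registration_payload`, `consul_put`, `registered`) are hypothetical. The two agent endpoints it calls are Consul's service register and deregister endpoints.

```python
import json
import urllib.request
from contextlib import contextmanager

CONSUL_AGENT = "http://127.0.0.1:8500"  # assumption: local Consul agent


def registration_payload(name, service_id, address, port):
    """Build the body for Consul's /v1/agent/service/register endpoint."""
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            "HTTP": f"http://{address}:{port}/health",  # assumed health path
            "Interval": "10s",
            "Timeout": "2s",
            # Safety net: Consul drops the instance if it stays critical.
            "DeregisterCriticalServiceAfter": "1m",
        },
    }


def consul_put(path, body=None):
    """Send a PUT to the local Consul agent (no retries in this sketch)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(CONSUL_AGENT + path, data=data, method="PUT")
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.status


@contextmanager
def registered(name, service_id, address, port, put=consul_put):
    """Register on entry and always deregister on exit, even on errors."""
    put("/v1/agent/service/register",
        registration_payload(name, service_id, address, port))
    try:
        yield
    finally:
        put(f"/v1/agent/service/deregister/{service_id}")
```

Wrapping the server loop in `with registered("payment-service", "payment-1", "10.0.4.77", 8080): ...` gives the same startup and shutdown shape as a FastAPI lifespan function. Note that Python does not run `finally` blocks on an unhandled SIGTERM, so a handler such as `signal.signal(signal.SIGTERM, lambda *_: sys.exit(0))` is needed for the deregistration to run when the pod is terminated.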
Service Discovery Client
Registration is only half the story. The other half is discovery: callers need a way to ask “give me a list of healthy payment-service instances” and get a fresh answer. A naive implementation would hit the Consul API on every outbound request, but that turns Consul into a per-request bottleneck and adds tens of milliseconds of latency. The standard fix is a local cache with a short TTL (5-10 seconds) combined with a background watch that streams changes from Consul in real time. When an instance goes unhealthy or a new one registers, Consul pushes the update and the client cache refreshes within a second. This gives you near-real-time visibility without hammering the registry.
A well-designed discovery client also degrades gracefully. If Consul itself is unreachable, return the last-known-good cached list rather than failing. It is almost always better to serve a stale instance list than to take the entire caller offline because the registry had a hiccup.
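The cache-with-fallback behavior can be sketched as a small wrapper around any registry lookup. This is a minimal sketch under stated assumptions: `fetch` is a hypothetical callable (for example, one that queries Consul's health API for passing instances), the class name is illustrative, and the background watch is left out so that only the TTL and stale-on-error logic is shown.

```python
import time


class CachedDiscovery:
    """TTL cache around a registry lookup, serving stale data on failure."""

    def __init__(self, fetch, ttl=5.0, clock=time.monotonic):
        self._fetch = fetch      # callable: service name -> list of instances
        self._ttl = ttl
        self._clock = clock
        self._cache = {}         # name -> (fetched_at, instances)

    def instances(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached and now - cached[0] < self._ttl:
            return cached[1]     # fresh enough, skip the registry
        try:
            fresh = self._fetch(name)
        except Exception:
            if cached:
                return cached[1]  # registry down: last-known-good list
            raise                 # nothing cached yet, surface the error
        self._cache[name] = (now, fresh)
        return fresh
```

The clock is injectable so the expiry logic can be tested without sleeping. A production client would likely also log when it serves stale data, since a long registry outage is worth alerting on.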
Load Balancer with Discovery
Once you have a live list of healthy instances, you still need to decide which one to call. This is client-side load balancing. The simplest strategy is round-robin (cycle through the list), which works fine for stateless requests of similar cost. Random selection is similar but avoids the “thundering herd” problem where all clients sync up on the same rotation. Least-connections is better when request costs vary wildly; weighted strategies help with heterogeneous instances (some pods with more CPU than others).
A production-grade service client wraps the discovery cache with a circuit breaker per instance. If a specific instance fails three requests in a row, mark it as “open” and skip it for the next 30 seconds, even if Consul still reports it as healthy. This handles the case where an instance is technically responding to health checks but failing real business requests (a slow dependency, a deadlock, a full disk). The discovery layer gives you the possible instances; the circuit breaker layer decides which to actually trust.
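A round-robin picker with a per-instance breaker might look like the following. It is a sketch, assuming the thresholds from the text (three consecutive failures, 30-second cooldown); the class and method names are hypothetical, and half-open probing is omitted for brevity.

```python
import time


class BreakerBalancer:
    """Round-robin over instances, skipping ones with an open breaker."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self._threshold = threshold
        self._cooldown = cooldown
        self._clock = clock
        self._failures = {}     # instance -> consecutive failure count
        self._open_until = {}   # instance -> time the breaker closes again
        self._next = 0

    def pick(self, instances):
        """Return the next instance whose breaker is closed."""
        now = self._clock()
        for _ in range(len(instances)):
            candidate = instances[self._next % len(instances)]
            self._next += 1
            if self._open_until.get(candidate, 0.0) <= now:
                return candidate
        raise RuntimeError("no instance with a closed breaker")

    def record(self, instance, ok):
        """Report the outcome of a real request against an instance."""
        if ok:
            self._failures[instance] = 0
            return
        self._failures[instance] = self._failures.get(instance, 0) + 1
        if self._failures[instance] >= self._threshold:
            # Open the breaker: skip this instance for the cooldown period.
            self._open_until[instance] = self._clock() + self._cooldown
            self._failures[instance] = 0
```

The caller passes in whatever list the discovery cache currently holds, then calls `record` after each request, so the breaker state survives registry refreshes.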
Consul-Based Discovery Caveats and Interview Deep-Dive
Your service registers as healthy in Consul but 30 percent of requests fail. How does your discovery layer help or hurt?
- Distinguish between “is registered” and “is serving correctly”. Registration means Consul’s health check passes. That check is usually shallow — HTTP returns 200. Serving correctly means the service can actually handle real business requests. These are not the same.
- Improve the health check. Replace the shallow /health (HTTP up) with a deep /health that verifies every critical dependency: database connection pool has a healthy connection; cache is reachable; authentication provider is reachable. Make the check honest.
- Use separate liveness and readiness signals (if on k8s). Liveness: “is this process alive?” (restart on fail). Readiness: “is this pod serving correctly?” (remove from load balancer on fail but do not restart). Consul’s equivalent: passing vs warning state with your load balancer treating warning as “do not route.”
- Instrument the discovery layer. Every failed request logs the instance ID that handled it. After the incident, correlate failures by instance. Is one instance much worse than the others? If yes, Consul’s health check did not catch it. Add a business-level metric (success rate) to the health check.
- Add client-side per-instance circuit breakers. Even with perfect Consul health checks, an instance can degrade between check cycles. The discovery client tracks failures per instance and skips instances that have recently failed, even if Consul still reports them healthy.
- Consider ejection via outlier detection. Istio / Envoy outlier detection automatically removes instances that return high error rates from the load-balancing pool. This is discovery that reacts to actual request outcomes, not just health check outcomes.
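A deep health check of the kind described in the list above can be sketched as a function that runs each dependency check and enforces a latency budget. The function name, the `checks` mapping, and the 500 ms default are assumptions for illustration; each check callable is expected to carry its own timeout, because this sketch measures slowness but does not interrupt a hung call.

```python
import time


def deep_health(checks, budget_s=0.5, clock=time.monotonic):
    """Run each dependency check; fail if any raises or is too slow.

    `checks` maps a dependency name to a zero-argument callable that raises
    on failure (for example, one that runs SELECT 1 on a pooled connection).
    """
    results = {}
    for name, check in checks.items():
        started = clock()
        try:
            check()
            elapsed = clock() - started
            results[name] = "ok" if elapsed <= budget_s else "slow"
        except Exception as exc:
            results[name] = f"error: {exc}"
    healthy = all(status == "ok" for status in results.values())
    # Map to the HTTP status a /health handler would return.
    return (200 if healthy else 503), results
```

Returning the per-dependency results alongside the status code makes the response body useful during an incident: the registry only needs the status, but a human reading the endpoint sees which dependency failed.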
/health endpoint returned 200 as long as the Spring Boot actuator was alive. The Postgres connection pool would silently exhaust during traffic spikes; health checks kept passing, but 30 percent of real requests hit database timeout. Fix: new /health that ran SELECT 1 against the primary database and failed the check if it took over 500 ms. Deployed on a Friday; the weekend’s Saturday-morning traffic spike showed the new health check correctly removing degraded pods within 20 seconds.
Senior Follow-up Questions:
- “Why not just check the downstream dependencies directly from the load balancer?” You can, but it creates cascading failures. If the dependency is slow for everyone, the load balancer thinks every instance is unhealthy and removes all of them. Now you have zero capacity. Health checks must be local (does this instance work?) not global (is the downstream up?).
- “How do you handle a partial outage where instance A can reach dependency X but instance B cannot?” Per-instance deep health checks catch this. Instance B’s health check fails because B cannot reach X; Consul marks B unhealthy; discovery routes only to A. Instance A keeps serving. Without per-instance checks, you would route uniformly and 50 percent of requests fail.
- “What’s the trade-off of making health checks expensive (deep checks)?” Deep checks query the database, cache, and dependencies, which adds load. Consul polls every 10 seconds per instance; with 100 instances, that is 10 DB queries per second just for health. Mitigate by (a) caching health state for a few seconds within the service, and (b) using a dedicated lightweight query that does not impact production load.
- “Add more replicas to compensate for the 30 percent failure rate.” Ignores the root cause. If the health check lies, adding more instances means more lying instances, same failure rate. Throwing capacity at a symptom does not fix the diagnosis problem.
- “Lower the Consul check interval so we detect failures faster.” Helps marginally but the shallow-check problem remains. A faster shallow check still does not catch database degradation. The check needs to be deeper, not just more frequent.
- HashiCorp Consul documentation, “Service Health Checks.”
- Kubernetes documentation, “Configure Liveness, Readiness and Startup Probes.”
- Envoy documentation, “Outlier detection” for per-host ejection.
Explain how your discovery layer should handle Consul itself being unavailable.
DNS-Based Discovery
DNS is the oldest service discovery mechanism in existence. Instead of running a dedicated registry client in every service, you rely on DNS, which every programming language and network library already speaks natively. The registry (Consul, CoreDNS, Route 53) exposes a DNS interface, and when you resolve payment.service.consul, you get back one or more A records (IP addresses) for healthy instances. If the registry also provides SRV records, you can even get the port along with the IP in a single lookup.
The appeal of DNS-based discovery is that it works with zero code changes in most cases. Any language, any HTTP client, any ORM, any legacy tool can consume it. The downside is DNS’s weakest feature: caching behavior. Clients (and especially OS resolvers and JVM runtimes) aggressively cache DNS results based on TTL. If you set TTL to 60 seconds to reduce DNS query load, your client keeps hitting a dead IP for up to 60 seconds after the instance dies. If you set TTL to 1 second to stay fresh, you multiply DNS query volume by 60x. DNS also cannot express rich health information (only “in the list” or “not in the list”) and gives only round-robin load balancing. It is simple and universal, but it is a coarse tool.
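The TTL tradeoff is easy to put numbers on. The sketch below assumes a simplified model in which every client re-resolves the name exactly once per TTL and honors the TTL faithfully; the function name and the figure of 1000 clients are illustrative only.

```python
def dns_tradeoff(client_count, ttl_seconds):
    """Return (queries per second for one name, worst-case stale seconds).

    Assumes every client re-resolves exactly once per TTL and honors the TTL.
    """
    return client_count / ttl_seconds, ttl_seconds


for ttl in (60, 5, 1):
    qps, stale = dns_tradeoff(client_count=1000, ttl_seconds=ttl)
    print(f"TTL {ttl:>2}s: ~{qps:.0f} queries/s, stale for up to {stale}s")
```

Real resolvers add further caching layers, so observed staleness can exceed the configured TTL; the model gives a lower bound on staleness and an upper bound on query load.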
DNS Discovery Implementation
When you implement a DNS-based discovery client, you still want a small in-process cache to avoid hammering the resolver on every request, but the cache TTL should be short (5-10 seconds) to stay responsive to topology changes. SRV records are strictly better than A records when your registry supports them because they deliver both the address and the port, which matters when instances of the same service listen on different ports.
DNS-Based Discovery Caveats and Interview Deep-Dive
Your service uses DNS-based discovery. During a deploy, you see 503 errors for 45 seconds. What's the diagnosis?
- Almost certainly DNS TTL caching. During the rolling deploy, a pod is terminated, its IP is reassigned. DNS updates take effect immediately in the registry, but clients cached the old IP. Until the cache expires, they keep trying the dead IP.
- Check the TTL. What does the DNS authoritative server return? What does the caller’s OS or runtime cache actually honor? These may differ.
- Check for JVM-style infinite caching. If any callers run on the JVM with default settings, their caches may be indefinite. Single biggest DNS discovery landmine in multi-language environments.
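- Short-term mitigation: pre-stop hook with drain. A Kubernetes pre-stop hook sleeps 30-60 seconds; only after it returns does the kubelet send SIGTERM to the app. During that time, the pod is removed from Service endpoints, but it continues accepting existing connections. This gives the DNS cache time to expire before the pod actually dies. Raise terminationGracePeriodSeconds (default 30) above the sleep duration, because the grace period also covers the time spent in the hook.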
- Medium-term: lower TTL. Move from 60 seconds to 5 seconds. Trade more DNS load for faster topology propagation. Monitor DNS server CPU.
- Long-term: move beyond DNS for this service. If topology changes happen often (autoscaling, blue-green), DNS is the wrong primitive. Adopt a watch-based discovery (Consul watch, gRPC name resolver, service mesh). Real-time updates with zero TTL.
networkaddress.cache.ttl=10 platform-wide via a JVM agent. Deploy-time 503s dropped by 95 percent.
Senior Follow-up Questions:
- “Why not just set TTL to 1 second?” DNS query volume scales with 1/TTL. At 1 second, a service with 1000 client pods generates 1000 DNS queries per second just for that one name. Multiply by all services. CoreDNS / dnsmasq / unbound start hurting. 5-10 seconds is a practical lower bound.
- “What’s the alternative to DNS for fast-changing topologies?” Either (a) a sidecar-based service mesh that watches the registry and reroutes sub-second (Istio, Linkerd), or (b) a smart client library that maintains a watch on the registry (gRPC with Consul resolver). Both eliminate the TTL caching problem because the client is subscribed to changes, not polling.
- “How do you test that DNS caching actually works the way you think?” Run a tool like dig +short before and after a known topology change, and time when the new IP appears. Or set up a test service that returns its own IP; hit it continuously during a deploy and observe when the IP changes. Teams are surprised how often the observed propagation time is 3-10x the configured TTL.
- “Disable DNS caching entirely.” Impractical; DNS caching is built into every resolver layer. You cannot turn it all off. You can only tune TTL and pick resolvers that respect it.
- “Increase retries on the client to mask the stale IPs.” Piles retries on top of dead endpoints. Each retry wastes a full connection-timeout. Total latency grows; root cause untouched.
- Julia Evans, “Learning DNS in 10 commands” — excellent primer on what’s actually happening.
- Go runtime DNS resolver documentation, which explains how Go respects TTL (unlike many JVM configs).
- AWS Route 53 documentation on TTL trade-offs at scale.
Kubernetes Service Discovery
Kubernetes bakes service discovery into the platform, and this is one of the main reasons teams adopt it. You do not install anything, you do not run a registry, you do not write registration code. You declare a Service object that selects pods by label, and Kubernetes assigns it a stable virtual IP (the ClusterIP). CoreDNS inside the cluster exposes that Service as a DNS name like payment.default.svc.cluster.local. Behind the ClusterIP, kube-proxy programs iptables or IPVS rules on every node so that traffic to that IP is transparently load-balanced across the healthy pods backing the service. Pods go up, pods go down, IPs churn, and the Service abstraction keeps the consumer’s view stable.
The tradeoff is that this is L4 load balancing (random or round-robin at the TCP connection level), which is great for stateless HTTP/1.1 but a problem for HTTP/2 and gRPC because those multiplex many requests over a single long-lived connection. For gRPC you either use a headless Service (discussed below) with client-side load balancing, or you introduce a service mesh like Istio that does L7-aware balancing.
Kubernetes Service Configuration
Using Kubernetes DNS
Once you have a Service in Kubernetes, application code becomes trivially simple. You just call the service name as if it were a hostname. No registry client, no discovery library, no health-check logic. The platform does all of it for you.
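In code, this amounts to building a URL from the Service name. The sketch below assumes the default cluster domain (cluster.local) and plain HTTP; the helper names, the port, and the path are hypothetical.

```python
import urllib.request


def service_url(name, namespace="default", port=80, path="/"):
    """Build the cluster-internal URL for a Kubernetes Service."""
    host = f"{name}.{namespace}.svc.cluster.local"  # default cluster domain
    return f"http://{host}:{port}{path}"


def call_service(name, path, **kwargs):
    """Plain HTTP GET; DNS and kube-proxy handle discovery and balancing."""
    url = service_url(name, path=path, **kwargs)
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read()
```

Within the same namespace the short name (`http://payment`) also resolves, but the fully qualified form avoids surprises when callers live in other namespaces.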
Headless Services for Direct Pod Access
Sometimes you do not want Kubernetes’s ClusterIP load balancing. You want the raw list of pod IPs so your client can load-balance on its own terms. This is especially important for gRPC (as mentioned above), for stateful workloads like Cassandra or Kafka where each client needs to address specific members, and for any custom routing logic. A headless Service (clusterIP: None) skips the virtual IP entirely. DNS lookups of the service name return an A record per pod instead of the single ClusterIP. Your client receives the full list and chooses.
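Resolving a headless Service from application code can be sketched with the standard resolver. This is an illustration under assumptions: the function names are hypothetical, only IPv4 A records are considered, and the resolver is injectable so the logic can be exercised outside a cluster.

```python
import random
import socket


def pod_addresses(service_host, port, getaddrinfo=socket.getaddrinfo):
    """Return every IPv4 address behind a DNS name, de-duplicated and sorted."""
    infos = getaddrinfo(service_host, port, family=socket.AF_INET,
                        type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (ip, port)).
    return sorted({sockaddr[0] for *_, sockaddr in infos})


def pick_pod(service_host, port, getaddrinfo=socket.getaddrinfo):
    """Pick one pod at random; a real client might keep a connection per pod."""
    return random.choice(pod_addresses(service_host, port, getaddrinfo))
```

Against a ClusterIP Service the same call returns a single virtual IP; against a headless Service it returns one address per ready pod, which is what lets the client apply its own balancing policy.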
Kubernetes Client for Service Discovery
If DNS is not rich enough (you need pod labels, node placement, readiness state, or to watch for changes in real time), you can talk directly to the Kubernetes API. This is how service meshes and ingress controllers do it internally. Reading the Endpoints object for a service gives you the full list of backing pod IPs plus which ones are currently marked ready and which are not. Watching the endpoints stream lets you react to pod lifecycle events within milliseconds, avoiding DNS caching delays entirely.
This approach is more powerful but more complex. You take on operational responsibility for kubeconfig handling, service account permissions (RBAC for reading endpoints), and reconnection logic on API server hiccups. Use it when you need real-time awareness or custom routing; otherwise DNS is fine.
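Once an Endpoints object has been fetched (through a client library or a plain API call), separating ready from not-ready addresses is a small parsing step. The sketch below works on the object as a plain dict, using the `subsets`, `addresses`, `notReadyAddresses`, and `ports` fields of the v1 Endpoints schema; the function name and the sample data are hypothetical, and fetching, RBAC, and watch reconnection are deliberately left out.

```python
def split_endpoints(endpoints):
    """Split a v1 Endpoints object (as a dict) into ready and not-ready targets."""
    ready, not_ready = [], []
    for subset in endpoints.get("subsets") or []:
        ports = [p["port"] for p in subset.get("ports") or []]
        for addr in subset.get("addresses") or []:
            ready += [(addr["ip"], port) for port in ports]
        for addr in subset.get("notReadyAddresses") or []:
            not_ready += [(addr["ip"], port) for port in ports]
    return ready, not_ready


# Hypothetical sample, shaped like the JSON the API server returns.
sample = {
    "kind": "Endpoints",
    "metadata": {"name": "payment", "namespace": "default"},
    "subsets": [{
        "addresses": [{"ip": "10.1.0.4"}, {"ip": "10.1.0.5"}],
        "notReadyAddresses": [{"ip": "10.1.0.9"}],
        "ports": [{"name": "http", "port": 8080}],
    }],
}
ready, not_ready = split_endpoints(sample)
```

A watch-based client would run the same function on every event it receives and swap the result into its local instance list.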
Kubernetes Discovery Caveats and Interview Deep-Dive
A pod passes its health check but is actually degraded. How do you distinguish liveness from readiness, and why does it matter?
- Define the two distinct semantics. Liveness: “is this process alive enough to be worth keeping?” If no, restart. Readiness: “is this pod ready to serve traffic right now?” If no, remove from endpoints but leave running.
- Different questions, different checks. Liveness checks internal state of the process: main event loop responsive, no deadlock, no OOM. Readiness checks external dependencies: database connection pool has capacity, downstream reachable, warm caches populated.
- Failure modes differ. Failing liveness restarts the pod, which is disruptive. Use liveness only for conditions where a restart actually helps (deadlock, stuck GC, corrupted in-memory state). Do not fail liveness because a downstream is slow — restarting your pod does not fix the downstream.
- Readiness toggles traffic routing. When a pod’s database pool exhausts temporarily, failing readiness removes it from endpoints; kube-proxy redirects traffic to other pods; the affected pod’s pool recovers and it rejoins. No restart, no data loss, fast recovery.
- Startup probes for slow-starting apps. Kubernetes provides startupProbe (stable since 1.20) for containers that take a long time to initialize. Use it for JVM services with long warmup periods. The liveness probe does not activate until the startup probe passes.
- Implement business-level health, not just HTTP 200. /ready issues a SELECT 1 against the DB, pings the cache, checks a downstream. Fails if any critical dependency is unreachable. This is what catches “healthy but degraded.”
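The split between the two probes can be sketched as one small object with two decision methods. Everything here is an assumption for illustration: the class name, the 20-request window, and the 80 percent success threshold are not from any particular framework, and the dependency checks are hypothetical callables returning True or False.

```python
from collections import deque


class Probes:
    """Separate liveness and readiness decisions for one process."""

    def __init__(self, dependency_checks, window=20, min_success_rate=0.8):
        self._checks = dependency_checks        # name -> callable returning bool
        self._outcomes = deque(maxlen=window)   # recent request results
        self._min_success_rate = min_success_rate

    def record_request(self, ok):
        """Called by request handlers with the outcome of real traffic."""
        self._outcomes.append(bool(ok))

    def liveness(self):
        # Reaching this code means the process can still run handlers;
        # downstream trouble must never trigger a restart.
        return 200

    def readiness(self):
        deps_ok = all(check() for check in self._checks.values())
        rate = (sum(self._outcomes) / len(self._outcomes)
                if self._outcomes else 1.0)
        ready = deps_ok and rate >= self._min_success_rate
        return 200 if ready else 503
```

The liveness method is intentionally trivial: it answers only "can this process still execute a handler", while readiness folds in both dependency state and the recent success rate of real requests.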
/health while 40 percent of encoding requests failed. Root cause: /health returned 200 because the HTTP server was up. The real issue was a hung FFmpeg subprocess pool, so new encode requests queued forever. Fix: /ready was rewritten to check “can I start a new encode within 2 seconds?” Pods with hung pools failed readiness, were removed from endpoints, and were auto-recycled by a separate cronjob that restarted pods not-ready for over 5 minutes.
Senior Follow-up Questions:
- “How aggressive should readiness checks be? What if the downstream is flaky?” Too aggressive and every downstream hiccup removes all pods from endpoints, cascading failure. Solution: check the dependency but count recent successes; only fail readiness if recent success rate drops below a threshold. Think of readiness as “I can probably serve” not “every dependency is perfect.”
- “What about checking readiness in the sidecar rather than the app?” Istio’s Pilot checks readiness via the sidecar’s probe endpoints. This means the sidecar can vote on readiness independently of the app (e.g., fail readiness if the sidecar cannot reach the mesh control plane). Trade-off: more moving parts, slightly more complex to debug.
- “How do you handle migrations or warm-up work during startup?” Separate liveness (process alive) from readiness (service ready). Set readiness to false until warm-up completes. Kubernetes will leave the pod running but route no traffic to it. Once warm, readiness flips true and kube-proxy starts sending requests. For multi-stage startup, the startupProbe is designed exactly for this.
- “Use the same endpoint for liveness and readiness; it is simpler.” Simpler until a slow downstream causes every pod to restart simultaneously because liveness is failing. Simpler is a false economy.
- “The app always returns 200 on /health if the server is up; everything else is orchestration’s problem.” Guarantees that degraded pods will stay in the load balancer. The whole point of Kubernetes health checks is that the app participates honestly in its own health reporting.
- Kubernetes documentation, “Configure Liveness, Readiness and Startup Probes.”
- Tim Hockin’s talks on Kubernetes networking and pod lifecycle.
- CNCF blog posts on health check anti-patterns.
Your gRPC service is deployed in Kubernetes. Traffic is unbalanced: one pod has 80 percent of requests. Why and how to fix it?
- Diagnose the cause. gRPC uses HTTP/2, which multiplexes many requests over one long-lived TCP connection. Kubernetes L4 (iptables, IPVS) load balancing picks a target when the TCP connection is established, not per request. Once a client connects to pod A, every subsequent request over that connection also goes to pod A.
- Explain why this is a Kubernetes Services limitation. ClusterIP does L4 random or round-robin at connection time. For HTTP/1.1 with short connections, this works fine — each request effectively picks a new target. For HTTP/2 with persistent connections, it fails.
- Fix Option 1: client-side load balancing with a headless Service. Set clusterIP: None. Client resolves the Service name and gets all pod IPs. gRPC’s built-in round_robin load balancer distributes requests across the pod connections. Your client needs to be gRPC-aware: grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`) in Go, similar options in Java and Python.
- Fix Option 2: service mesh. Istio or Linkerd injects an Envoy / linkerd-proxy sidecar. The sidecar does L7 request-level load balancing across persistent connections. No client code changes. Cost: added complexity of running a mesh.
- Fix Option 3: L7 ingress. If traffic is inbound from outside the cluster, an L7 ingress (Contour, Ambassador, Emissary, ingress-nginx with HTTP/2 upstream) handles this. For pod-to-pod, this is less relevant.
- “Why does HTTP/1.1 not have this problem?” HTTP/1.1 clients often open a new connection per request (or close after a few), so each connection goes through the Kubernetes LB again, getting a new target. HTTP/2 keeps the connection open indefinitely; one connection = one target for its lifetime.
- “If I use client-side round-robin, what happens when a pod is added or removed?” gRPC’s resolver can re-resolve DNS on connection errors or periodically. On change, the client opens new subchannels and load-balances across the new set. There is still a brief window where the client has not seen the change; during that window, traffic may skew. Service mesh sidecars have lower propagation latency than DNS-based resolution.
- “What’s the overhead of a service mesh at high RPS?” Envoy sidecar adds roughly 0.5-2 ms of latency per hop in most measurements, plus ~50-150 MB of memory per pod. At thousands of RPS per pod, CPU impact is typically 5-15 percent. Significant but usually acceptable given what you gain. High-frequency trading or similar latency-critical workloads may find it unacceptable.
- “Open a new gRPC connection for every request.” Defeats the point of HTTP/2. TCP + TLS handshake overhead per request is 10-100 ms. Never do this.
- “Restart pods to reset connections.” Temporary fix that hides the problem. New connections during the restart period might re-balance, but as soon as traffic settles the same problem returns. Does not scale.
- Kubernetes Blog, “gRPC Load Balancing on Kubernetes.”
- gRPC documentation, “Load Balancing Policies.”
- Istio Blog, “How Istio Solves the Long-Lived Connection Load Balancing Problem.”
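The connection-pinning effect behind the unbalanced gRPC traffic can be reproduced with a toy simulation. This sketch models no real networking: the pod names, client counts, and seed are arbitrary assumptions, and the two functions simply contrast "choose a pod once per connection" with "choose a pod per request".

```python
import random
from collections import Counter


def connection_level(pods, clients, requests_per_client, rng):
    """L4 behaviour: each client picks a pod once and reuses the connection."""
    load = Counter({pod: 0 for pod in pods})
    for _ in range(clients):
        pod = rng.choice(pods)            # chosen at TCP connect time
        load[pod] += requests_per_client  # every request follows the connection
    return load


def request_level(pods, clients, requests_per_client):
    """L7 or client-side round_robin: each request may go to a different pod."""
    load = Counter({pod: 0 for pod in pods})
    for i in range(clients * requests_per_client):
        load[pods[i % len(pods)]] += 1
    return load


pods = ["pod-a", "pod-b", "pod-c", "pod-d"]
rng = random.Random(7)
print("connection-level:", dict(connection_level(pods, 3, 1000, rng)))
print("request-level:   ", dict(request_level(pods, 3, 1000)))
```

With fewer long-lived client connections than pods, connection-level balancing always leaves at least one pod idle, while request-level balancing spreads the same 3000 requests evenly. That is the skew the headless Service and mesh options are meant to remove.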
How does a service mesh sidecar affect latency and throughput at high RPS, and when is the cost not worth it?
- Quantify the overhead. Envoy sidecar adds roughly 0.5-2 ms p50 latency per call, 1-5 ms p99. Memory: 50-150 MB per pod. CPU: 5-15 percent overhead on request-heavy workloads. Numbers vary by config (mTLS on/off, observability features, RBAC).
- Identify when the overhead is significant. For an internal call that takes 50 ms, a 1 ms sidecar hop is noise (2 percent). For a call that takes 2 ms, a 1 ms hop is 50 percent overhead — and you have two sidecars (source and destination) so 2 ms total. Latency-critical paths (HFT, real-time bidding, ad serving sub-10ms) feel the pain.
- Articulate the benefit to justify the cost. What you get: automatic mTLS, zero-code-change traffic shifting (canary, A/B), per-pod circuit breaking and outlier ejection, L7 load balancing for gRPC, observability (golden metrics per call) without library work, retry/timeout policies in config rather than code.
- Decide based on team size and operational maturity. Small teams: mesh operational cost exceeds the benefit. Large orgs with hundreds of services: mesh pays for itself by standardizing what would otherwise be dozens of ad-hoc retry/timeout/TLS implementations.
- Consider mesh alternatives for specific use cases. gRPC’s built-in load balancing + client-side retries + mTLS via cert-manager can cover much of the value without the sidecar. App-side circuit breakers exist in every language.
- “How do you decide when to add a mesh to a service vs. keep it out?” Rule of thumb: if sidecar overhead is under 10 percent of end-to-end latency, cost is acceptable. Below 10 ms of baseline latency, the math gets tight; you need to measure specifically.
- “What’s the difference between a sidecar mesh (Istio, Linkerd) and a sidecarless mesh (Cilium, Istio Ambient Mesh)?” Sidecar injects a proxy per pod; sidecarless runs at the node level (eBPF or a shared per-node proxy). Sidecarless reduces memory overhead significantly (no per-pod proxy) and often reduces latency. Trade-off: less per-pod isolation, node-level failure domain. Ambient Mesh and Cilium service mesh are the current evolution.
- “What if you only need mTLS and metrics, not the full mesh?” Use a lighter solution: SPIFFE/SPIRE for identity + mTLS, Prometheus exporters in-process for metrics. Many teams discover that their “mesh needs” are 20 percent of what Istio offers, and they can assemble a simpler stack.
- “Install Istio; it is the standard.” Skips the cost-benefit analysis. Plenty of teams have deployed Istio and regretted it because their workload did not need it and operational cost was high.
- “Mesh overhead is negligible at any scale.” Simply wrong for latency-critical paths and for teams at the scale where mesh control plane CPU / memory starts to become a noticeable budget line.
- Linkerd’s published performance benchmarks (they explicitly focus on low overhead).
- Istio Ambient Mesh announcement and architecture.
- Cilium Service Mesh documentation for the sidecarless / eBPF approach.
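The sidecar overhead arithmetic from the question above can be written as a two-line calculation. The function names are hypothetical, the 1 ms per-hop default is the midpoint figure used in the text rather than a measurement, and the 10 percent budget is the stated rule of thumb.

```python
def sidecar_overhead(baseline_ms, per_hop_ms=1.0, hops=2):
    """Fraction of extra latency added by sidecar hops on one call."""
    return (per_hop_ms * hops) / baseline_ms


def within_budget(baseline_ms, per_hop_ms=1.0, hops=2, budget=0.10):
    """Apply the 10 percent rule of thumb to a call's baseline latency."""
    return sidecar_overhead(baseline_ms, per_hop_ms, hops) <= budget


for baseline in (50, 10, 2):
    pct = sidecar_overhead(baseline) * 100
    print(f"{baseline:>2} ms call: +{pct:.0f}% with two sidecar hops")
```

Counting both the source and destination sidecars (`hops=2`) doubles the single-hop figures, which is why calls with a baseline under roughly 10-20 ms need to be measured rather than estimated.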
Service Mesh Discovery
For complex scenarios, use a service mesh like Istio or Linkerd.
Comparison
| Aspect | Consul | DNS-Based | Kubernetes | Service Mesh |
|---|---|---|---|---|
| Setup | Medium | Simple | Built-in | Complex |
| Health Checks | Yes | Limited | Probes | Yes |
| Load Balancing | Client | DNS round-robin | kube-proxy | Envoy |
| Dynamic Updates | Real-time | TTL-based | Real-time | Real-time |
| Multi-cluster | Yes | Limited | With tools | Yes |
| Code Changes | Some | Minimal | Minimal | None |
Interview Questions
Q1: Explain client-side vs server-side discovery
Client-side discovery:
- Client queries registry directly
- Client implements load balancing logic
- More control, but more complexity per client
- Examples: Netflix Ribbon, custom implementations
Server-side discovery:
- Client calls load balancer/router
- Router queries registry and forwards request
- Simpler clients, centralized control
- Examples: AWS ALB, Kubernetes Services, Nginx
When to use each:
- Client-side: Better for smart routing, custom logic
- Server-side: Simpler clients, potential bottleneck
Q2: How does Kubernetes service discovery work?
Components:
- Service - Stable IP (ClusterIP) for pod group
- CoreDNS - Resolves service names to IPs
- kube-proxy - Maintains iptables/IPVS rules
- Endpoints - Tracks healthy pod IPs
Request flow:
- Pod calls payment.default.svc.cluster.local
- CoreDNS resolves to Service ClusterIP
- kube-proxy routes to healthy pod IP
- Readiness probes ensure only healthy pods receive traffic
Headless Service (clusterIP: None):
- DNS returns individual pod IPs
- For stateful apps or custom load balancing
Q3: What is service registration and how to handle failures?
Registration patterns:
- Self-registration - Service registers itself
- Third-party - Orchestrator registers service
- Sidecar - Proxy handles registration
Failure handling:
- Health checks - Registry removes unhealthy instances
- TTL/Heartbeat - Auto-deregister on missed beats
- Graceful shutdown - Deregister before terminating
- Circuit breakers - Client-side protection
Best practices:
- Always implement health endpoints
- Use graceful shutdown hooks
- Cache discovery results with TTL
- Have fallback for registry unavailability
Summary
Key Takeaways
- Service discovery enables dynamic routing
- Client-side and server-side discovery trade client complexity against infrastructure complexity
- Consul provides full-featured discovery
- Kubernetes has built-in discovery
- Health checks ensure traffic goes to healthy instances
Interview Deep-Dive
'You deploy a new version of Payment Service. During the rolling update, some requests fail because they are routed to pods that are starting up but not yet ready. How do you prevent this?'
/ready that verifies all dependencies.
The second fix is the pre-stop lifecycle hook. When Kubernetes terminates a pod during rolling update, there is a race condition: the pod is removed from the Service endpoints, but in-flight requests and cached DNS might still route to it. I add a pre-stop hook that sleeps for 5-10 seconds, giving time for the endpoints update to propagate to all kube-proxy instances and ingress controllers. Only after that sleep does the pod begin shutting down.
The third fix is graceful shutdown in the application. When the pod receives SIGTERM, it should stop accepting new requests, finish processing in-flight requests (up to a timeout), close database connections cleanly, and then exit. This is often missed in Node.js applications where process.on('SIGTERM', () => process.exit(0)) kills in-flight requests immediately.
In Consul-based discovery, the equivalent problem is the deregister delay. When a service instance shuts down, it must deregister from Consul before stopping. If it crashes without deregistering, Consul relies on health check failures to remove it, which can take 30-60 seconds of failed requests.
Follow-up: “How do you handle the cold start problem where a newly deployed instance is slower for its first few hundred requests?”
This is the JVM warmup problem (or in Node.js, V8’s JIT compilation warmup). I use two strategies. First, I configure the rolling update to use a slow startup probe: Kubernetes 1.20+ supports startupProbe with a longer interval and more retries than the readiness probe, giving the container time to warm up. Second, I add a warmup step to the container startup: before marking itself as ready, the service sends synthetic traffic to itself (a few hundred representative requests) to warm up connection pools, JIT-compile hot paths, and populate any in-process caches. Only after warmup completes does the readiness probe return healthy.
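The readiness and graceful-shutdown behaviour from the rolling-update answer (report ready only after warmup, stop accepting new requests on shutdown, drain in-flight work up to a timeout) can be sketched as a small state holder in Python. `Lifecycle` and its method names are hypothetical; a real service would call `shutdown` from its SIGTERM handling path and back its /ready endpoint with `is_ready`.

```python
import threading


class Lifecycle:
    """Tracks readiness and in-flight requests for one service instance."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._ready = False
        self._shutting_down = False

    def mark_ready(self):
        # Call only after warmup has finished.
        with self._cond:
            self._ready = True

    def is_ready(self):
        # A readiness endpoint would return success only when this is True.
        with self._cond:
            return self._ready and not self._shutting_down

    def try_begin_request(self):
        # New work is refused as soon as shutdown has started.
        with self._cond:
            if self._shutting_down:
                return False
            self._in_flight += 1
            return True

    def end_request(self):
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def shutdown(self, drain_timeout):
        """Stop accepting work, then wait for in-flight requests to finish.

        Returns True if everything drained before the timeout.
        """
        with self._cond:
            self._shutting_down = True
            return self._cond.wait_for(
                lambda: self._in_flight == 0, timeout=drain_timeout
            )


lifecycle = Lifecycle()
lifecycle.mark_ready()
accepted = lifecycle.try_begin_request()
lifecycle.end_request()
drained = lifecycle.shutdown(drain_timeout=1.0)
```

The return value of `shutdown` lets the caller decide what to do when the drain times out, for example logging the number of abandoned requests before exiting.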
'Compare client-side and server-side service discovery. When would you choose one over the other in a Kubernetes environment?'
http://payment-service and Kubernetes handles the rest. Zero client-side complexity.
I would switch to client-side discovery in three specific scenarios. First, when you need smarter load balancing than round-robin. Kubernetes Services do basic round-robin (or random, depending on kube-proxy mode), but if you need least-connections, latency-based routing, or weighted distribution, you need client-side logic. A headless Kubernetes Service (clusterIP: None) returns all pod IPs via DNS, and your client library implements the routing algorithm.
Second, when you are using gRPC. Because gRPC multiplexes many requests over a single long-lived HTTP/2 connection, Kubernetes L4 load balancing sends all requests on that connection to one pod. The other pods sit idle. You need either client-side load balancing (gRPC’s built-in round_robin policy instead of the default pick_first) or a service mesh (Envoy in Istio does L7 gRPC-aware load balancing).
Third, when you need cross-cluster service discovery. If Payment Service runs in cluster A and Order Service runs in cluster B, Kubernetes Services do not span clusters natively. You need a service mesh (Istio multi-cluster) or an external registry (Consul) that spans both clusters, with client-side resolution.
For most microservices teams running in a single Kubernetes cluster with REST APIs, server-side discovery via Kubernetes Services is sufficient and dramatically simpler. Do not adopt Consul or custom client-side discovery unless you have a specific requirement that Kubernetes-native discovery cannot meet.
Follow-up: “What happens if CoreDNS goes down in your Kubernetes cluster?”
CoreDNS is a critical component: if it goes down, no new DNS resolutions work, meaning services cannot discover each other for new connections. However, existing TCP connections continue to function, and kube-proxy’s iptables rules (which are independent of DNS) still route traffic sent to a ClusterIP. The practical impact depends on how frequently your services open new connections. With HTTP keep-alive and connection pooling, most services continue functioning for minutes. But anything that needs a fresh lookup fails: newly started pods (scale-ups, deployments) cannot resolve their dependencies. CoreDNS runs as a Deployment with multiple replicas, so a single pod failure is handled by the remaining replicas. A full CoreDNS outage is a cluster-level incident that requires immediate response.
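The gRPC point made earlier in this answer is easiest to see with numbers. The Python sketch below is a simplified simulation, not gRPC or kube-proxy code: the pod IPs are made up, and backend selection is modelled as a plain cycle. It compares choosing a backend once per connection with choosing one per request.

```python
from collections import Counter
import itertools

# Hypothetical pod IPs behind one Service.
PODS = ["10.244.1.5", "10.244.2.9", "10.244.3.4"]


def connection_balanced(requests, connections=1):
    """L4 model: a backend is chosen once, when each connection is opened."""
    backends = itertools.cycle(PODS)
    conn_backend = [next(backends) for _ in range(connections)]
    hits = Counter()
    for i in range(requests):
        # Every request reuses one of the already-open connections.
        hits[conn_backend[i % connections]] += 1
    return hits


def request_balanced(requests):
    """Client-side or L7 model: a backend is chosen for every request."""
    backends = itertools.cycle(PODS)
    return Counter(next(backends) for _ in range(requests))


single_connection = connection_balanced(300, connections=1)
per_request = request_balanced(300)
```

With one long-lived connection, all 300 requests land on a single pod. With per-request selection, each of the three pods receives 100.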
'How do you handle service discovery during a blue-green deployment where both the old and new versions need to coexist temporarily?'
app: payment, color: blue. When I deploy the green version, it runs alongside blue but the Service does not route to it because the label does not match. After green passes health checks and smoke tests, I update the Service selector to color: green. Traffic shifts atomically. If green has issues, I revert the selector to color: blue in seconds.
For Consul-based discovery, I register both versions with different tags: payment-blue and payment-green. The service client or API gateway queries for the active tag, which is stored as a configuration value in Consul’s KV store. Switching production traffic means updating one KV value, which all clients pick up within their next watch cycle (typically under a second).
The tricky part is database schema compatibility. Both blue and green versions must work with the same database schema, which means database migrations must be backward-compatible. I use the expand-and-contract pattern: the green deployment adds new columns/tables but does not remove old ones. After green is confirmed stable and blue is decommissioned, a follow-up migration removes the deprecated columns.
Follow-up: “How do you test the green environment with real production traffic before switching over completely?”
I use canary testing within the blue-green framework. Instead of switching 100% of traffic at once, I configure the load balancer or service mesh to send 5% of traffic to green while blue handles 95%. I monitor error rates, latency percentiles, and business metrics (conversion rate, payment success rate) for both versions. If green’s metrics are comparable to blue’s after 15-30 minutes, I increase to 25%, then 50%, then 100%. If any metric degrades, I route 100% back to blue. Istio’s VirtualService makes this trivial with weighted routing rules.
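The canary progression described in the follow-up (5%, 25%, 50%, 100%, with a full rollback on degradation) can be sketched as a promotion rule plus a weighted split. This is an illustrative Python sketch only: `next_weight` and `route` are hypothetical helpers, the error-rate comparison stands in for the fuller set of metrics mentioned above, and the `tolerance` value is an assumption rather than a recommendation.

```python
STEPS = [5, 25, 50, 100]  # percent of traffic sent to green, as in the text


def next_weight(current, blue_error_rate, green_error_rate, tolerance=0.005):
    """Return the next green weight: promote one step, or roll back to 0.

    `tolerance` is an assumed allowance for noise between the two versions.
    """
    if green_error_rate > blue_error_rate + tolerance:
        return 0  # route 100% back to blue
    for step in STEPS:
        if step > current:
            return step
    return 100


def route(request_id, green_weight):
    """Deterministic weighted split: bucket requests by id modulo 100."""
    return "green" if request_id % 100 < green_weight else "blue"


first_step = next_weight(0, blue_error_rate=0.010, green_error_rate=0.010)
promoted = next_weight(5, blue_error_rate=0.010, green_error_rate=0.011)
rolled_back = next_weight(25, blue_error_rate=0.010, green_error_rate=0.050)
green_share = sum(route(i, green_weight=5) == "green" for i in range(1000))
```

In a service mesh the split itself is handled by routing configuration; the part worth owning in code or automation is the promotion rule, since that encodes when a rollout is allowed to continue.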