Synchronous Communication
When services need immediate responses, synchronous communication is the way to go. This chapter covers REST and gRPC, the two dominant patterns. Think of synchronous communication like a phone call: you dial, the other party picks up, you exchange information, and you both hang up. It is simple and immediate, but the caller is blocked until the conversation finishes. If the other party is slow to answer or puts you on hold, you are stuck waiting. This blocking nature is both the strength (you get an immediate answer) and the weakness (you inherit the other party’s latency and availability problems) of synchronous patterns.
The fundamental question to ask before choosing synchronous communication is: “Does the caller genuinely need the response right now, or am I coupling two services tighter than necessary?” Every synchronous call creates a runtime dependency: if the downstream service is down, your service is effectively down too. Every synchronous call chain also compounds failure, because the availabilities of the services in the chain multiply. Three services each at 99.9% availability, called in sequence, give you about 99.7%, which is roughly triple the downtime. This is why senior engineers reach for async patterns whenever the business semantics allow, and reserve sync for cases where the user is literally waiting on a screen for confirmation.
In this chapter you will:
- Design effective REST APIs for microservices
- Implement gRPC for high-performance communication
- Choose between REST and gRPC for different scenarios
- Handle failures in synchronous communication
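The availability arithmetic from the opening paragraph can be checked with a few lines of Python. This sketch assumes failures are independent and that a request succeeds only if every service in the sequential chain succeeds; real outages are often correlated, so treat the result as an estimate rather than a prediction.

```python
def chain_availability(availabilities):
    """Availability of a request that needs every service in a sequential chain."""
    result = 1.0
    for a in availabilities:
        result *= a  # independent failures: availabilities multiply
    return result


HOURS_PER_YEAR = 24 * 365


def downtime_hours_per_year(availability):
    return (1 - availability) * HOURS_PER_YEAR


single = 0.999
chained = chain_availability([single] * 3)
print(f"chained availability: {chained:.4%}")
print(f"one service:       {downtime_hours_per_year(single):.2f} h/yr down")
print(f"three in sequence: {downtime_hours_per_year(chained):.2f} h/yr down")
```

With these inputs the chain comes out at about 99.70% availability, roughly 26 hours of downtime a year versus about 8.8 hours for a single service.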
REST API Design
RESTful Principles for Microservices
Before jumping into code, let us be clear about what “RESTful” actually means in the microservices context. REST is not merely “HTTP endpoints that return JSON”; it is a disciplined architectural style built around resources (nouns) and a uniform interface (the HTTP verbs). The motivation is simple: if every service in your organization uses the same conventions, new engineers can onboard quickly, tooling (API gateways, documentation generators, test clients) works across all services, and the cognitive load of moving between services stays low. When teams reinvent their own conventions per service, that cost grows with every service you add.
The alternative, RPC-style HTTP endpoints like /getUserById?id=123 or /createNewUser, works technically, but it leaks implementation details, resists caching (since URLs do not represent stable resources), and forces clients to learn a new vocabulary for every service. RESTful resource modeling gives you HTTP-level caching for free, plays nicely with proxies and CDNs, and turns your URL space into a self-documenting map of your domain.
A key tradeoff to be aware of: strict REST purism (HATEOAS, fully discoverable APIs) is rare in practice because most internal services are consumed by known clients, not arbitrary crawlers. Most teams land on “pragmatic REST” — resource URLs, proper verbs, meaningful status codes — without going full HATEOAS. The _links pattern you will see below is a lightweight compromise.
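A minimal Python sketch of the pragmatic-REST shape described above: nouns in the path, the HTTP verb selecting the operation, and a representation that carries a small _links object instead of full HATEOAS. The route table, the /v1/users paths, and the resolve and user_resource helpers are hypothetical; a real framework would also answer 405 for a known path with the wrong verb.

```python
import re

# (HTTP method, path pattern) -> handler name. Resources are nouns; verbs come from HTTP.
ROUTES = {
    ("GET", r"^/v1/users$"): "list_users",
    ("POST", r"^/v1/users$"): "create_user",
    ("GET", r"^/v1/users/(?P<user_id>[^/]+)$"): "get_user",
    ("PATCH", r"^/v1/users/(?P<user_id>[^/]+)$"): "update_user",
    ("DELETE", r"^/v1/users/(?P<user_id>[^/]+)$"): "delete_user",
    ("GET", r"^/v1/users/(?P<user_id>[^/]+)/orders$"): "list_user_orders",
}


def resolve(method, path):
    """Return (handler_name, path_params), or None when nothing matches."""
    for (route_method, pattern), handler in ROUTES.items():
        match = re.match(pattern, path)
        if match and route_method == method:
            return handler, match.groupdict()
    return None


def user_resource(user):
    """Representation of a user with lightweight _links rather than full HATEOAS."""
    base = f"/v1/users/{user['id']}"
    return {
        "id": user["id"],
        "name": user["name"],
        "_links": {
            "self": {"href": base},
            "orders": {"href": f"{base}/orders"},
        },
    }
```

Note that an RPC-style path such as /getUserById simply does not resolve: the URL space only contains resources.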
HTTP Status Codes
Status codes are the first thing clients inspect, often before parsing the body at all. Proxies, load balancers, monitoring systems, and retry libraries all make decisions purely on the status code. This means using the wrong code (for example, returning 200 with {"error": "..."} in the body) silently breaks every retry library, every metric dashboard, and every alerting rule. Getting status codes right is not a stylistic choice; it is a correctness issue.
The broad rule is: 2xx means success, 4xx means “you did something wrong” (do not retry without changes; 408 and 429 are the notable exceptions), and 5xx means “I did something wrong” (a retry might help). Inside those ranges, be precise. A 401 means the caller did not authenticate at all; a 403 means the caller is authenticated but not allowed to access a resource that exists; a 404 means the resource does not exist. Mixing these up confuses clients and can leak information: returning 403 for a record the caller should not even know about tells an attacker that the record exists, which is why some APIs deliberately answer 404 in that case. The error body format should also be consistent across your entire organization; when every service returns errors in the same shape, client-side error handling becomes trivial.
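A minimal sketch of one organization-wide error shape, assuming a hypothetical exception hierarchy and a body of the form {"error": {"code", "message", "request_id"}}. Unexpected exceptions become a generic 500 so internals never leak, and the optional hide_existence flag answers 404 instead of 403 when a caller should not learn that a resource exists.

```python
class DomainError(Exception):
    """Base class for errors that surface to callers with a specific status."""
    status, code = 500, "INTERNAL"


class Unauthenticated(DomainError):
    status, code = 401, "UNAUTHENTICATED"  # no valid credentials at all


class Forbidden(DomainError):
    status, code = 403, "FORBIDDEN"  # authenticated, but not allowed


class NotFound(DomainError):
    status, code = 404, "NOT_FOUND"


class Conflict(DomainError):
    status, code = 409, "CONFLICT"


class ValidationFailed(DomainError):
    status, code = 422, "VALIDATION_FAILED"


def error_response(exc, request_id, hide_existence=True):
    """Map any exception to (status, body) using one shared error shape."""
    if not isinstance(exc, DomainError):
        # Unexpected failure: our fault, so 5xx, and never leak internals.
        status, code, message = 500, "INTERNAL", "Internal server error"
    elif isinstance(exc, Forbidden) and hide_existence:
        # Do not confirm that the resource exists to a caller who may not see it.
        status, code, message = 404, "NOT_FOUND", "Resource not found"
    else:
        status, code, message = exc.status, exc.code, str(exc) or exc.code
    body = {"error": {"code": code, "message": message, "request_id": request_id}}
    return status, body
```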
API Versioning
Versioning is the single most underestimated concern in microservices REST design. It sounds boring, but every major API outage I have witnessed involved someone shipping a “small backward-incompatible change” to a v1 endpoint that was still consumed by forgotten clients. The motivation is simple: once an endpoint is public (and “public” in microservices means “called by any other service”), you cannot change its contract without risking breakage somewhere.
There are three mainstream approaches. URL path versioning (/v1/users) is the most explicit and debuggable: you can see the version in every log line and curl command, and it is trivial to route different versions to different codebases via the API gateway. Header versioning (Accept: application/vnd.company.v2+json) keeps the URL stable, which some argue is “more REST,” but it makes debugging harder because the version is invisible in URLs. Query parameter versioning (?version=2) exists but is generally considered an anti-pattern, because it invites clients to forget the parameter and silently fall back to v1.
My strong recommendation: URL path versioning for internal microservices. The debugging benefits dwarf the theoretical purity gains.
The deeper insight is that versioning is a governance problem, not a technical one. Supporting v1 and v2 simultaneously is easy; deciding when to deprecate v1 and actually migrating every consumer is hard. Build deprecation warnings (e.g., a Deprecation header) into your v1 responses from day one, and instrument which clients still call v1 so you know when it is safe to remove.
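The following sketch combines URL path versioning with the deprecation signals described above. The handlers, dates, and client IDs are hypothetical; the header formats follow our reading of RFC 9745 (Deprecation as a structured-field date, "@" plus Unix seconds) and RFC 8594 (Sunset as an HTTP-date). The Counter keyed by (version, client) is the per-client telemetry that tells you when v1 traffic has actually reached zero.

```python
from collections import Counter
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical dates: when v1 was deprecated and when it will be removed.
V1_DEPRECATED_AT = datetime(2025, 1, 1, tzinfo=timezone.utc)
V1_SUNSET_AT = datetime(2025, 7, 1, tzinfo=timezone.utc)

# Per-(version, client) call counts: v1 can go once its counts stop growing.
calls_by_version_and_client = Counter()


def get_user_v1(user_id):
    return {"id": user_id, "name": "Ada Lovelace"}


def get_user_v2(user_id):
    # v2 changed the shape of "name", which is why it needed a new version.
    return {"id": user_id, "name": {"given": "Ada", "family": "Lovelace"}}


HANDLERS = {"v1": get_user_v1, "v2": get_user_v2}


def handle(path, client_id):
    """Route /{version}/users/{id}; return (status, headers, body)."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] not in HANDLERS or parts[1] != "users":
        return 404, {}, {"error": {"code": "NOT_FOUND"}}
    version, _, user_id = parts
    calls_by_version_and_client[(version, client_id)] += 1
    headers = {}
    if version == "v1":
        # RFC 9745: structured-field date, "@" followed by Unix seconds.
        headers["Deprecation"] = f"@{int(V1_DEPRECATED_AT.timestamp())}"
        # RFC 8594: an HTTP-date.
        headers["Sunset"] = format_datetime(V1_SUNSET_AT, usegmt=True)
    return 200, headers, HANDLERS[version](user_id)
```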
Your team is shipping a new internal service. A staff engineer wants gRPC; a product engineer wants REST. How do you decide, and what does the contract look like on day one?
- Start with consumers. If the service is called by browsers, third parties, or low-volume internal tools, REST wins on debuggability and ecosystem. If it is a hot path between backend services with millions of calls per minute, gRPC wins on payload size and type safety.
- Evaluate tooling readiness. gRPC needs a proto registry, CI-enforced backward compatibility, L7 load balancing, and grpcurl-style tooling. If none of that exists, the first gRPC service will pay the cost of building it.
- Lock the contract before the first line of code. Either OpenAPI with contract tests, or a .proto file with buf breaking checks in CI. Contract drift is what kills both REST and gRPC integrations.
- Version from day one: a /v1/ URL prefix or package user.v1 in proto. Design the first endpoint as if you already have three v2 proposals.
- Document the failure model in the contract itself: which error codes mean “retry,” which mean “do not retry,” and what the error body shape is.
curl endpoints and paste examples into Slack; internally, they still use RPC-style service-to-service calls. Meanwhile, Google’s internal services have been gRPC-native since Stubby (the precursor to gRPC) in the mid-2000s, because the call volume between services justifies the tooling investment.
Senior Follow-up Questions:
Follow-up 1: “When does GraphQL make more sense than either REST or gRPC?” When you have many clients with widely different field needs and over-fetching is your primary latency problem, typically at the BFF layer for mobile apps with constrained bandwidth. GraphQL is a bad fit for internal service-to-service calls because the resolver model hides fan-out, and because gRPC’s strict contracts are a better fit for machine-to-machine communication.
- “gRPC is always faster, so we should use it everywhere.” Fails because the wins only materialize above a certain call volume, and debuggability matters more than raw throughput for 80% of endpoints.
- “REST is simpler, so let us just use REST.” Fails because “simple REST” without versioning, contract tests, or error conventions is usually just undocumented HTTP endpoints that will break on the first schema change.
- Martin Fowler’s “Richardson Maturity Model” essay for understanding what REST actually requires.
- The buf.build documentation on proto contract management at scale.
- Phil Sturgeon’s “Build APIs You Won’t Hate” for pragmatic REST design.
You inherit a service with 40 REST endpoints, no versioning, inconsistent error shapes, and five consumers. How do you stabilize the contract without breaking anyone?
- Inventory every consumer. Logs, API gateway traffic, and OAuth client IDs tell you who is calling what. You cannot stabilize a contract you do not know the shape of.
- Freeze the existing behavior. Snapshot current responses under contract tests so any accidental change is caught before it ships.
- Introduce /v1/ as an alias for the existing endpoints. Both paths return identical responses. This creates a stable anchor for future migration.
- Ship a /v2/ that fixes the error shape, adds proper status codes, and tightens the schema. Announce a deprecation date for v1 with a Deprecation header and per-client telemetry.
- Migrate consumers one at a time, using the per-client telemetry to confirm zero v1 traffic before removal.
Follow-up 1: “What does the Deprecation header actually do?” It is advisory: RFC 9745 specifies a Deprecation response header that clients can surface in logs or monitoring. It does not enforce anything by itself; its value is that client teams can alert on it and prioritize migration. Pair it with a removal date in the Sunset header (RFC 8594).
Follow-up 2: “How do you detect which consumers are still using v1 if they do not identify themselves?” Require authentication on every endpoint, tag requests by client identity at the gateway, and emit one metric per (version, client) pair. If a caller is anonymous, that is a separate problem worth solving first.
Follow-up 3: “What if a consumer refuses to migrate?” Escalate to the business owner with cost data: keeping v1 running imposes an ongoing maintenance tax and blocks features that depend on v2 semantics. If the consumer is external, a hard cutover date with weeks of advance warning is fair; if internal, the platform team usually owns the migration for laggards.
- “Just add /v2/ and email everyone.” Fails because consumers never read migration emails, and you have no way to confirm the cutover is safe.
- “Rewrite the whole service and cut over in one weekend.” Fails because a big-bang migration has no rollback path and no way to catch consumer-specific regressions.
- RFC 8594 (the Sunset HTTP header) and RFC 9745 (the Deprecation header).
- Stripe’s engineering blog on “API versioning at Stripe.”
- Jessie Frazelle’s “Versioning APIs” talks from Velocity.
Service-to-Service HTTP Clients
Building a Robust HTTP Client
A naive HTTP client, one that just calls fetch() or requests.get(), is a production incident waiting to happen. In microservices, your callers face three reliability problems that single-process applications never encounter: transient network failures (packet loss, DNS hiccups, TCP resets), cascading failures (if Service B is slow, Service A’s requests pile up and exhaust its own resources), and thundering herd problems (when a downstream recovers, all upstream callers retry simultaneously and kill it again). A robust client addresses each of these explicitly.
The three patterns you always want: timeouts ensure one slow downstream does not consume your request budget; circuit breakers stop calling a service that is obviously broken, giving it room to recover; retries with backoff and jitter handle transient failures without amplifying load. Without these, a single downstream hiccup can cascade into a full outage in minutes. With them, the same hiccup degrades gracefully to “that one feature is temporarily unavailable.”
The tradeoff to be aware of: every retry amplifies load. If your retry policy is “3 retries on any 5xx,” a downstream that fails 100% of requests sees 4x its normal traffic from you alone. In a chain of four services where each caller retries three times, one failure at the bottom reaches the last service as 4 × 4 × 4 = 64x the original traffic. This is why retry budgets (a cap on retries as a percentage of all calls, not per call) and circuit breakers are essential as defense in depth.
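A minimal Python sketch of two of the three patterns (timeouts and a circuit breaker), assuming a hypothetical transport callable that accepts a timeout keyword, the way requests.request(method, url, timeout=...) does. The breaker opens after a run of consecutive failures, fails fast while open, and lets trial calls through once the cooldown has elapsed. It is not thread-safe and does not limit concurrent half-open trials, both of which a production breaker should handle.

```python
import time


class CircuitOpenError(Exception):
    """Raised without calling the downstream while the breaker is open."""


class CircuitBreaker:
    """Opens after N consecutive failures; allows trial calls after a cooldown."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cooldown over: trial calls may probe the downstream
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("downstream marked unhealthy; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) and restart the cooldown
            raise
        self.failures = 0
        self.opened_at = None  # any success closes the circuit
        return result


class ServiceClient:
    """Generic client: every call carries a timeout and passes through a breaker."""

    def __init__(self, transport, timeout=2.0, breaker=None):
        # transport(method, path, timeout=...) is a hypothetical callable, e.g. a
        # thin wrapper around requests.request(method, url, timeout=timeout).
        self.transport = transport
        self.timeout = timeout
        self.breaker = breaker or CircuitBreaker()

    def get(self, path):
        return self.breaker.call(self.transport, "GET", path, timeout=self.timeout)
```

If you add retries on top, put them outside breaker.call: each attempt then counts toward the failure threshold, and an open breaker stops the retry loop immediately with CircuitOpenError.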
User Service Client Example
The pattern below, a service-specific client class that wraps the generic ServiceClient, is what production codebases actually look like. You do not want every caller in your codebase constructing raw HTTP requests with hardcoded URLs; that creates dozens of places to update when the downstream service moves or changes its API. A single UserServiceClient class centralizes knowledge about the user service: its URL, its endpoints, its idempotency semantics, its retry policy. If the user service moves from REST to gRPC in v2, you change one class and every caller benefits.
Notice how getUser translates a 404 into null (not every caller of a “find” function wants to handle a not-found exception), while createUser explicitly retries on specific status codes. This is the kind of domain-aware logic that does not belong in a generic HTTP client — a 404 from the auth service might mean something very different than a 404 from the user service. Service-specific wrappers are where you encode these semantics cleanly.
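A Python sketch of a service-specific wrapper in the spirit of UserServiceClient. The send transport, the /v1/users paths, the retryable status set, and the Idempotency-Key header are assumptions; the header only protects you if the user service actually deduplicates on it. Backoff between attempts is omitted to keep the focus on domain semantics: get_user maps 404 to None, and create_user reuses one idempotency key across all attempts so a retried POST cannot create two users.

```python
import uuid


class HttpError(Exception):
    def __init__(self, status, body=None):
        super().__init__(f"HTTP {status}")
        self.status = status
        self.body = body


class UserServiceClient:
    """Single home for everything callers should not need to know about the user service."""

    BASE_PATH = "/v1/users"
    RETRYABLE = {429, 502, 503, 504}  # assumption: transient failures only

    def __init__(self, send, max_attempts=3):
        # send(method, path, headers=..., json=...) -> (status, body) is a hypothetical
        # transport, e.g. a thin wrapper over a generic HTTP client.
        self.send = send
        self.max_attempts = max_attempts

    def get_user(self, user_id):
        status, body = self.send("GET", f"{self.BASE_PATH}/{user_id}", headers={}, json=None)
        if status == 404:
            return None  # "find" semantics: a missing user is a normal result here
        if status >= 400:
            raise HttpError(status, body)
        return body

    def create_user(self, payload):
        # One key for every attempt, so the server can deduplicate retried POSTs.
        headers = {"Idempotency-Key": str(uuid.uuid4())}
        for attempt in range(1, self.max_attempts + 1):
            status, body = self.send("POST", self.BASE_PATH, headers=headers, json=payload)
            if status < 400:
                return body
            if status not in self.RETRYABLE or attempt == self.max_attempts:
                raise HttpError(status, body)
            # A real client would sleep with jittered backoff here.
```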
Your checkout service calls six downstream services synchronously. p99 latency is 4 seconds during peak traffic and rising. What do you do -- both in the next 24 hours and in the next quarter?
- Instrument first. Open the distributed trace for a p99 request; you will almost always find one downstream dominates the budget, not six equally. Identify whether the tail is one service, one dependency (a database, a cache, a third party), or a specific query pattern.
- Tighten timeouts and add per-downstream circuit breakers in the next deploy. The goal is to fail fast on the slow path rather than letting it consume the whole budget.
- Parallelize independent reads. Fraud check, shipping estimate, tax calc, and inventory lookup rarely depend on each other. Promise.all or asyncio.gather drops latency from the sum of the calls to the max.
- Move non-critical work off the sync path. Analytics, email, and audit logging can be fire-and-forget via a queue. Fraud scoring can often run async with a default-allow and post-hoc reversal.
- For the next quarter, redesign the API to return 202 Accepted with an order ID, and finish the work in a saga. The user sees “Order placed, payment processing” within 200ms and the heavy lifting happens in the background.
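The parallelization step above can be sketched with asyncio.gather. The three downstream functions are hypothetical stand-ins that sleep 100ms each: run sequentially they would take about 300ms, run concurrently about 100ms. Each call is also wrapped in asyncio.wait_for so one slow dependency is cut off at its own timeout instead of consuming the whole budget.

```python
import asyncio
import time


# Hypothetical downstream calls; each sleep stands in for a network round trip.
async def fraud_check(order):
    await asyncio.sleep(0.10)
    return {"fraud": "ok"}


async def shipping_estimate(order):
    await asyncio.sleep(0.10)
    return {"shipping_days": 3}


async def tax_calc(order):
    await asyncio.sleep(0.10)
    return {"tax": 4.20}


async def checkout_reads(order, per_call_timeout=0.5):
    """Run independent reads concurrently: latency is ~max of the calls, not their sum."""
    calls = [fraud_check(order), shipping_estimate(order), tax_calc(order)]
    results = await asyncio.gather(
        *(asyncio.wait_for(call, per_call_timeout) for call in calls)
    )
    merged = {}
    for result in results:
        merged.update(result)
    return merged


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(checkout_reads({"id": "o-1"})))
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

With gather's default return_exceptions=False, the first failure (including a timeout) propagates immediately; pass return_exceptions=True if you would rather degrade per dependency.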
- “Add more retries so the flaky calls succeed.” Fails because it amplifies load on an already stressed downstream, often turning a brownout into a blackout.
- “Cache everything.” Fails for checkout because inventory and pricing are exactly the data you cannot serve stale. Caching solves read-heavy data that tolerates staleness; checkout is neither.
- Marc Brooker’s “Timeouts, retries, and backoff with jitter” in the Amazon Builders’ Library.
- Michael Nygard’s Release It! chapters on Stability Patterns and Capacity Patterns.
- Google SRE book, chapter on “Handling Overload.”
Your team ships a REST client library used by 15 services. A junior engineer adds a retry-on-any-5xx policy. Two weeks later, a minor downstream blip causes a full outage. Walk through the postmortem.
- Describe the amplification. If 15 services each retry three times on 5xx, the downstream sees 4x traffic. If each of those 15 services has its own upstream that also retries, multiply again. In deep call graphs, load amplification is exponential in chain depth.
- Identify the missing guardrails: no retry budget, no jitter on backoff, no circuit breaker, no differentiation between retryable (timeouts, 503) and non-retryable (400, 404, 409) errors.
- Fix in layers: introduce token-bucket retry budgets that cap retry traffic at around 10% of base traffic, add full-jitter backoff, circuit-break at the pool level, and only retry on specific error codes.
- Bake guardrails into the shared library so no individual service can opt out accidentally. Expose metrics per (caller, callee, error class) so the next amplification is visible before it is catastrophic.
- Write a runbook: when metrics show retry ratio over 20%, oncall turns the retry budget down globally and investigates before turning it back up.
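The guardrails from this postmortem can be sketched in Python as two pieces: a classifier that only retries transient failures on requests that are safe to repeat, and a token-bucket retry budget where each success earns a fraction of a retry and each retry spends a whole one. Both are hypothetical helpers, not a specific library's API. The classifier is deliberately conservative: it refuses to retry any non-idempotent request that lacks an idempotency key, even on a 503. Tokens are tracked in integer thousandths to avoid floating-point drift.

```python
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


def is_retryable(method, status=None, connection_error=False, has_idempotency_key=False):
    """Retry only transient failures, and only for requests that are safe to repeat.
    400, 401, 403, 404, 409, 410 and 422 are never in RETRYABLE_STATUSES."""
    safe_to_repeat = method.upper() in IDEMPOTENT_METHODS or has_idempotency_key
    if not safe_to_repeat:
        return False  # a repeated non-idempotent POST may be applied twice
    if connection_error:
        return True
    return status in RETRYABLE_STATUSES


class RetryBudget:
    """Token bucket per (caller, callee): successes earn a fraction of a retry,
    each retry spends a whole one. Stored in integer thousandths of a token."""

    SCALE = 1000

    def __init__(self, ratio=0.1, max_tokens=10):
        self.earn_per_success = round(ratio * self.SCALE)
        self.max_units = round(max_tokens * self.SCALE)
        self.units = self.max_units  # start full so a cold client can still retry

    def record_success(self):
        self.units = min(self.max_units, self.units + self.earn_per_success)

    def try_spend(self):
        """Consume one retry token if available; False means 'do not retry'."""
        if self.units >= self.SCALE:
            self.units -= self.SCALE
            return True
        return False


def should_retry(budget, method, **failure):
    # Classify first so non-retryable failures never spend tokens.
    return is_retryable(method, **failure) and budget.try_spend()
```

A caller retries only when should_retry returns True and calls budget.record_success() on every success. With ratio=0.1, sustained retry traffic cannot exceed roughly 10% of successful traffic once the initial max_tokens burst is spent.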
Retry-After), 500, 502, 503, 504. Never retry on 400, 401, 403, 404, 409, 410, 422: the server has told you the request is wrong, not that it failed transiently. For idempotent methods only, retry on connection errors; for POST without an idempotency key, do not.
Follow-up 2: “How does Retry-After interact with exponential backoff?” Retry-After wins. The server is explicitly telling you when to try again; ignoring it is what causes thundering herds on recovery. If the header is not present, fall back to exponential backoff with full jitter.
Follow-up 3: “What is a retry budget, concretely?” A token bucket per (caller, callee) where each retry consumes a token and tokens refill at, say, 10% of the success rate. When the bucket is empty, retries are disabled entirely until healthy traffic refills it. Envoy supports this natively through the retry_budget setting in its cluster circuit breakers, and gRPC offers a similar token-based mechanism through the retryThrottling policy in its service config.
- “The downstream should scale up to handle retries.” Fails because retries during a brownout are the wrong signal; the caller should back off, not push harder.
- “Turn off retries entirely.” Fails because transient failures do exist, and without retries your error rate on flaky networks is worse than necessary.
- “Exponential Backoff and Jitter” on the AWS architecture blog.
- Envoy documentation on retry policies and retry budgets.
- Google SRE book, chapter on “Addressing Cascading Failures.”
gRPC Communication
gRPC offers high-performance, type-safe communication between services. Before diving into .proto files, let us establish what gRPC actually gives you that REST does not. gRPC is built on three ideas: contracts defined in .proto files (not in English prose or OpenAPI YAML), binary serialization via Protocol Buffers (smaller and faster than JSON), and HTTP/2 as the transport (multiplexing, streaming, header compression). The combined effect is typically 3-7x smaller payloads and 5-10x higher throughput for small, high-frequency RPCs, although the exact gains depend heavily on payload shape and should be measured on your own traffic. The type safety is equally important: your client and server are generated from the same .proto file, so a field name mismatch is a compile-time error, not a production incident.
The tradeoff is debuggability and ecosystem reach. You cannot curl a gRPC endpoint; you need grpcurl or similar. Browsers cannot speak gRPC natively without a proxy (gRPC-Web). CDNs and standard HTTP caches do not understand it. The upshot: gRPC is ideal for internal, high-volume, polyglot service-to-service communication where the performance wins are real. It is a poor fit for public APIs, browser clients, or any endpoint you want casual users to be able to inspect with standard tools.
A pattern that many mature organizations converge on: REST at the edge (public APIs, mobile, web), gRPC internally (service-to-service). This gives you the debuggability and broad compatibility of REST where those matter, and the performance and type safety of gRPC where high call volumes dominate.
Protocol Buffers Definition
The .proto file is the canonical source of truth for your service contract. Everything else (client stubs, server stubs, documentation, even test fixtures) is derived from this file. This is a much stronger contract than “we document our REST API in Confluence”; the compiler enforces it. When you change a .proto file, both client and server know immediately.
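Key discipline: never reuse field numbers, never change field types, and always add new fields as optional. Field numbers are permanent because they, not field names, are what gets serialized on the wire. Renaming field 3 from email to email_address is therefore harmless on the wire (it only breaks code and JSON mappings that use the name). Reusing a number is not: if you delete email (field 3) and later give number 3 to a new phone field, old clients still send their email as field 3 and your server silently reads it as phone. Mark removed field numbers and names as reserved so the compiler rejects accidental reuse. Protobuf is designed for backward compatibility, but only if you follow the rules rigorously.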
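To see why field numbers, not names, are the contract, here is a small pure-Python encoder and decoder for length-delimited protobuf fields. Each field on the wire starts with a varint key of (field_number << 3) | wire_type, followed by a length and the raw bytes; the field name never appears. This only handles wire type 2 (strings, bytes, nested messages) and is for illustration; real code should use the generated protobuf classes.

```python
WIRE_LEN = 2  # wire type 2: length-delimited (strings, bytes, nested messages)


def encode_varint(value):
    """Base-128 varint: 7 bits per byte, high bit set when more bytes follow."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf, pos):
    """Decode a varint starting at buf[pos]; return (value, next_position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    """Key = (field_number << 3) | wire_type, then the length, then the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | WIRE_LEN) + encode_varint(len(data)) + data


def decode_fields(buf):
    """Return [(field_number, raw_bytes)] for a message of length-delimited fields."""
    pos, fields = 0, []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != WIRE_LEN:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = read_varint(buf, pos)
        fields.append((field_number, buf[pos:pos + length]))
        pos += length
    return fields
```

A decoder whose schema says field 3 is phone would accept b"a@b.c" without complaint, which is exactly the reused-number hazard; declaring retired numbers as reserved in the .proto file makes protoc reject that reuse.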
gRPC Server Implementation
gRPC servers differ from REST servers in a few subtle ways that matter in production. First, errors are reported as gRPC status codes, not HTTP status codes: in Node you pass an error whose code is grpc.status.NOT_FOUND to the callback instead of responding with a 404. Second, streaming is first-class: you write to a stream object and call end() when done, rather than constructing a full response body in memory. Third, the server is typically stateful in a way REST servers are not: connections are long-lived HTTP/2 connections carrying many multiplexed streams, so server shutdown needs care (drain in-flight RPCs before closing).
The four RPC types — unary, server-streaming, client-streaming, bidirectional-streaming — are not equivalent alternatives; each fits a different problem. Unary is for “give me one answer” (most CRUD). Server-streaming fits “give me all matching rows” without loading them into memory (pagination at scale). Client-streaming fits “I have a batch of inputs, tell me the result” (batch writes, telemetry). Bidirectional is for real-time interaction (chat, trading feeds). Choosing the wrong type — e.g., using unary with a huge array instead of server-streaming — can cost you orders of magnitude in memory.
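A dependency-free Python sketch of the four RPC shapes. In grpcio's Python API, servicer methods receive either a single request or a request_iterator (plus a context), and streaming responses are produced by yielding; the sketch keeps those shapes but drops the generated stubs and the context argument so it runs on its own. The handlers and the in-memory USERS list are hypothetical.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical in-memory stand-in for the users table.
USERS = [{"id": str(i), "name": f"user-{i}"} for i in range(1, 6)]


def get_user(request: dict) -> Optional[dict]:
    """Unary: one request in, one response out (most CRUD)."""
    return next((u for u in USERS if u["id"] == request["id"]), None)


def list_users(request: dict) -> Iterator[dict]:
    """Server-streaming: one request in, a stream out.
    Rows are yielded one at a time instead of being built into one big list."""
    for user in USERS:
        if user["name"].startswith(request.get("prefix", "")):
            yield user


def import_users(request_iterator: Iterable[dict]) -> dict:
    """Client-streaming: a stream in, one summary out (batch writes, telemetry)."""
    created = 0
    for user in request_iterator:
        USERS.append(user)
        created += 1
    return {"created": created}


def chat(request_iterator: Iterable[dict]) -> Iterator[dict]:
    """Bidirectional streaming: replies can flow while the client is still sending."""
    for message in request_iterator:
        yield {"reply": message["text"].upper()}
```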
gRPC Client Implementation
On the client side, the most common footgun is failing to reuse the channel. A gRPC channel manages the HTTP/2 connection pool to the server; creating a new channel per request defeats all the connection-reuse benefits and adds handshake latency to every call. The right pattern is to construct the channel once (per target address) at application startup, share it across the application, and close it only on shutdown. The keepalive parameters you see below are what keep the connection healthy across idle periods; without them, load balancers often terminate idle HTTP/2 connections silently, causing the next request to fail with a cryptic “connection reset.”
Error handling deserves careful attention. gRPC uses status codes (NOT_FOUND, ALREADY_EXISTS, DEADLINE_EXCEEDED) that map loosely — but not identically — to HTTP status codes. Translating them into your existing error taxonomy at the client boundary means your calling code does not have to learn two parallel vocabularies.
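A Python sketch of the two client-side habits described above: one shared channel per target, and a single place that translates gRPC status codes into your own error types. The keepalive option names are real grpcio channel arguments, but the values are illustrative; the server's keepalive enforcement policy must allow that ping rate or it will close the connection. The factory is injected so the sketch runs without grpcio. In a real service it would be something like lambda t: grpc.insecure_channel(t, options=KEEPALIVE_OPTIONS), and the status name would come from rpc_error.code().name. The domain exception classes are hypothetical.

```python
import threading

# Real grpcio channel-argument names; the values are illustrative only.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 30_000),          # ping after 30s without activity
    ("grpc.keepalive_timeout_ms", 10_000),       # drop the connection if no ack within 10s
    ("grpc.keepalive_permit_without_calls", 1),  # keep pinging even with no active RPCs
]

_channels = {}
_channels_lock = threading.Lock()


def get_channel(target, factory):
    """Return the one shared channel for `target`, creating it on first use only."""
    with _channels_lock:
        if target not in _channels:
            _channels[target] = factory(target)
        return _channels[target]


# Hypothetical domain error taxonomy used by the rest of the codebase.
class NotFoundError(Exception): ...
class AlreadyExistsError(Exception): ...
class UpstreamUnavailableError(Exception): ...
class UpstreamTimeoutError(Exception): ...


GRPC_TO_DOMAIN = {
    "NOT_FOUND": NotFoundError,
    "ALREADY_EXISTS": AlreadyExistsError,
    "UNAVAILABLE": UpstreamUnavailableError,
    "DEADLINE_EXCEEDED": UpstreamTimeoutError,
}


def translate(status_name, details=""):
    """Map a gRPC status name (e.g. rpc_error.code().name) to a domain exception."""
    error_cls = GRPC_TO_DOMAIN.get(status_name, RuntimeError)
    return error_cls(f"{status_name}: {details}")
```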
Your team has 30 services using gRPC. A senior engineer proposes renaming a field in the User proto 'because the name is confusing.' What do you say in code review, and what is your governance model?
- Clarify whether this is a wire-breaking change. Renaming a field with the same number is technically wire-compatible in proto3, but it breaks source code that references the old name. If the number also changes, it is wire-breaking and not acceptable in place.
- Block the PR if buf breaking would flag it. Even source-only breaks cause churn across 30 service repos and deploy pipelines.
- Propose the additive pattern: add the new name as a new field, deprecate the old one, migrate consumers over a release cycle, then remove it and reserve its number. This is slower but eliminates coordination risk.
- Articulate the governance model: a proto registry (Buf Schema Registry or a Git monorepo), CI breaking-change detection, a deprecation policy with a minimum bake time, and a paved-path for major version bumps.
- If this pattern comes up often, propose an RFC process for proto changes with explicit owner sign-off on breaking migrations.
tag, wire_type, value triples. However, JSON transcoding via grpc-gateway does care about names, so if you expose a REST interface, a rename is breaking there.
Follow-up 2: “How do you handle a truly breaking change like changing a field type?” Version the service: user.v1.UserService stays stable, user.v2.UserService introduces the new type. Clients migrate on their own schedule. You can run both services behind a single binary if the code path is small.
Follow-up 3: “What goes into a proto registry?” A registry enforces ownership (who can change user.proto), version history (what the proto looked like at every release), and consumer tracking (who depends on which version). Buf Schema Registry gives you all three plus auto-generated clients for Go, Python, Node, Java, and others.
- “It is just a name change, ship it.” Fails because downstream codegen breaks and the coordination cost across 30 services is enormous.
- “Use a global rename script to update all consumers at once.” Fails because not every consumer is under your control, and atomic-across-all-services is not achievable in a real deploy pipeline.
- Google’s “Proto Best Practices” documentation.
- Buf Schema Registry documentation and the buf breaking rule set.
- Uber Engineering’s blog on gRPC migration and contract governance.
You deploy gRPC with AWS NLB in front of a Kubernetes service with 20 pods. One pod is getting 80% of traffic. Why, and how do you fix it?
- Diagnose: gRPC uses long-lived HTTP/2 connections. L4 load balancers (NLB, classic ELB) balance connections, not requests. A client that opens one connection sends all its requests to one pod until the connection dies.
- Short-term mitigation: enable connection max-age or max-requests on the server so connections naturally rotate (MaxConnectionAge in gRPC-Go, or max_connection_duration in Envoy).
- Real fix: move to L7 load balancing. Either a sidecar proxy (Envoy via Istio, or Linkerd’s own proxy) or client-side load balancing via gRPC’s built-in round_robin policy with a headless Kubernetes service. An AWS ALB with a gRPC target group also balances individual requests rather than connections, but only for traffic that actually passes through it.
- Verify with metrics: request count per pod should be within 5-10% of the mean. If the distribution is still skewed, check client connection pooling (one channel per client is fine; one channel shared across all clients is not).
- Document the pattern in your service template so new gRPC services do not repeat the mistake.
clusterIP: None) skips the kube-proxy load balancer and returns the full list of pod IPs via DNS. With the round_robin load-balancing policy enabled (gRPC’s default policy, pick_first, sends every request to a single address), gRPC’s DNS resolver then opens a connection to each pod and round-robins across them on the client side. This works well for small-to-medium fleets; above a few hundred pods, the connection count gets expensive and you want a mesh instead.
Follow-up 3: “When is connection-level load balancing acceptable?” When you have many more clients than server pods and each client has short-lived connections. For REST traffic, this is often true and L4 works fine. For gRPC with long-lived connections and few high-volume clients, it is almost never true.
- “Scale up the hot pod.” Fails because the hot pod is a symptom; the load imbalance continues and scaling just masks it.
- “Just switch to ALB.” Incomplete: an ALB with a gRPC target group does balance per request, but it adds a hop and cost, and it does nothing for pod-to-pod calls that never pass through it, so client-side or mesh load balancing is still the general fix.
- Linkerd’s “gRPC load balancing on Kubernetes without tears” blog post.
- Kubernetes documentation on headless services.
- Envoy’s documentation on load balancing policies.
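The diagnosis in the NLB scenario can be made concrete with a small simulation. With connection-level balancing, each client's single long-lived connection is pinned to one pod, so one heavy client concentrates its whole load there; request-level balancing spreads the same requests evenly. The client names and request counts are made up to mirror the 80% skew in the question.

```python
import itertools
from collections import Counter


def connection_level(requests_per_client, pods):
    """L4-style: each client's one long-lived connection is pinned to one pod."""
    load = Counter()
    next_pod = itertools.cycle(pods)
    for requests in requests_per_client.values():
        pod = next(next_pod)   # chosen once, when the connection is opened
        load[pod] += requests  # every request then rides that connection
    return load


def request_level(total_requests, pods):
    """L7-style (mesh, L7 balancer, or client-side round_robin): each request picks a pod."""
    load = Counter()
    next_pod = itertools.cycle(pods)
    for _ in range(total_requests):
        load[next(next_pod)] += 1
    return load


pods = [f"pod-{i}" for i in range(20)]
# Hypothetical traffic: one heavy client sends 80% of all requests.
clients = {"checkout": 8000, "search": 500, "cart": 500, "admin": 500, "batch": 500}
l4 = connection_level(clients, pods)
l7 = request_level(sum(clients.values()), pods)
```

In a real deployment the request-level behavior comes from a mesh sidecar, an L7 balancer, or gRPC's client-side round_robin policy, for example a service config of {"loadBalancingConfig": [{"round_robin": {}}]} pointed at a headless Service's DNS name.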
REST vs gRPC Comparison
When to Use What
The real trade-off is debuggability and ecosystem (REST) vs. performance and type safety (gRPC). Many teams use both: REST at the edge (public APIs, webhooks) and gRPC internally (service-to-service). This is the pattern Netflix, Google, and Uber converged on independently, which is a strong signal that it works.
Production pitfall: Do not adopt gRPC purely for performance gains if your bottleneck is database queries. If your service spends 200ms in Postgres and 2ms serializing JSON, switching to gRPC saves you 1.5ms, a rounding error. Profile first, then decide.
Use REST:
- Public APIs consumed by browsers
- Third-party integrations
- Simple CRUD operations
- When human readability matters
- Mobile apps (better tooling)
- When HTTP caching is valuable
Example use cases:
- Customer-facing API
- Webhook integrations
- Admin dashboards
- Simple service communication
Handling Failures
Timeout Configuration
Timeouts are the single most important reliability setting in any distributed system, and they are also the one most teams get wrong. The default behavior of most HTTP libraries is “no timeout” or “5 minutes,” both of which are catastrophic in microservices. Without a proper timeout, a slow downstream does not just make one request slow; it occupies a worker thread, a database connection, and memory for the entire wait. Under load, a downstream that goes from 200ms to 30 seconds does not just “slow down your service”; it removes your service from production because every worker is stuck waiting.
The conceptual model is timeout budgets. If your user-facing API has a 3-second SLA and you call three downstream services sequentially, an even split gives each downstream about 1 second. If one downstream takes 2 seconds, the others need to fit in 1 second combined. This means downstream timeouts must be strictly less than upstream timeouts, with room for network overhead. A timeout hierarchy like “API Gateway: 10s, Service A: 9s, Service B: 8s” is far better than “everyone has 30s”: when something goes wrong, it fails at the right layer with a clear attribution.
The anti-pattern to avoid: one global timeout for every downstream call. A cache lookup should time out at 100ms, but a payment processor might legitimately need 15 seconds. Timeouts should be calibrated per operation, based on the p99 latency of that operation under normal load.
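A minimal sketch of timeout budgets in Python: one Deadline per inbound request, and every downstream call gets the smaller of its own per-operation cap and what is left of the request budget, minus a small reserve for your own work. The clock is injectable for testing; the operation names and caps in OP_TIMEOUTS are hypothetical.

```python
import time


class Deadline:
    """Absolute deadline for one inbound request, shared by all its downstream calls."""

    def __init__(self, budget_s, clock=time.monotonic):
        self.clock = clock
        self.expires_at = clock() + budget_s

    def remaining(self):
        return max(0.0, self.expires_at - self.clock())

    def timeout_for(self, op_timeout_s, reserve_s=0.05):
        """The operation's own cap, but never more than the remaining budget minus a
        reserve for this service's own work and network overhead."""
        available = self.remaining() - reserve_s
        if available <= 0:
            raise TimeoutError("request budget exhausted; failing fast instead of calling")
        return min(op_timeout_s, available)


# Per-operation caps calibrated from each operation's normal p99 (hypothetical values).
OP_TIMEOUTS = {"cache.get": 0.1, "users.get": 0.8, "payments.charge": 15.0}
```

To make downstream services respect the same budget, send the remaining time along with each call; gRPC carries deadlines on the wire (the grpc-timeout header), and several gRPC implementations propagate them when you reuse the incoming request context, while for REST you would need a custom header. Note that payments.charge, capped at 15 seconds, still gets at most whatever remains of a 3-second budget; an operation whose normal p99 does not fit inside its caller's budget is usually a sign it belongs off the synchronous path.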
Retry Strategies
Retries feel like a no-brainer, but they are a double-edged sword: they improve the success rate of any individual request at the cost of multiplying load during outages. The math is unforgiving. If every caller retries 3 times on failure, a downstream that is failing every request sees 4x its normal traffic, exactly when it is already overwhelmed; even at a 50% success rate it sees nearly 2x (1 + 0.5 + 0.25 + 0.125 = 1.875 attempts per request). Compound this across three layers of a call chain and a fully failing service can see 64x (4 × 4 × 4) its normal load.
The two non-negotiable rules: only retry idempotent operations (GET is safe; POST without an idempotency key is not) and always include jitter in your backoff. Without jitter, 1000 clients that all fail at the same moment will all retry at exactly 200ms, then 400ms, then 800ms: a synchronized thundering herd that crushes the recovering downstream. With jitter, retries spread across time and the service can recover gracefully.
Exponential backoff (doubling the delay each attempt) is the default because it is biased toward giving the downstream breathing room. Fixed or linear delays can suit rate-limited operations, and when the downstream tells you exactly when to come back (a Retry-After header), use that delay instead of your own schedule. Immediate retry is almost never right unless you know the first failure was a truly transient network glitch (and even then, jittered exponential is safer).
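A minimal Python sketch of exponential backoff with full jitter, the variant described in the AWS "Exponential Backoff and Jitter" post: sleep a uniform random time between 0 and min(cap, base * 2**attempt). When the failure carries a Retry-After value, that delay is used instead of the computed one. RetryableError, its retry_after field, and the default base and cap values are assumptions; sleep and rng are injectable so the behavior can be tested deterministically.

```python
import random
import time


def full_jitter_delay(attempt, base=0.2, cap=10.0, rng=random.random):
    """Uniform random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


class RetryableError(Exception):
    def __init__(self, message, retry_after=None):
        super().__init__(message)
        self.retry_after = retry_after  # seconds parsed from a Retry-After header, if any


def call_with_retries(fn, max_attempts=4, sleep=time.sleep, rng=random.random):
    """Call fn(); on RetryableError, back off and try again, up to max_attempts in total."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            if err.retry_after is not None:
                delay = err.retry_after  # the server said when; do not second-guess it
            else:
                delay = full_jitter_delay(attempt, rng=rng)
            sleep(delay)
```

Because call_with_retries repeats fn() as-is, wrap only idempotent operations, or requests that carry an idempotency key.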
Fallback Strategies
When a downstream fails, the naive response is to propagate the failure upward: “User service is down, so I return 503 to my caller.” This is sometimes the right answer, but usually it is a lazy one. The better question is: does my caller actually need fresh data from the user service to make progress, or can I degrade gracefully? Graceful degradation (returning partial, stale, or default data instead of failing) is often the difference between a minor incident and a company-wide outage. The canonical fallback hierarchy, from best to worst, is: primary service -> cache -> stale cache (beyond its normal TTL) -> partial data from a different source -> hardcoded defaults -> failure. Each step down gives worse data but better availability. The tradeoff: stale data can cause wrong business outcomes, so fallbacks must be conscious choices, not silent defaults. Always mark fallback responses (for example, with _fromCache: true) so the caller knows to treat them differently, and log fallback usage so you know when you are silently operating in degraded mode.
A subtle pattern worth knowing: “cache-aside with stale-while-revalidate.” The primary call succeeds -> populate the cache with a 1-hour TTL. The primary call fails -> return cached data even if older than the TTL, while kicking off a background refresh. This gives you the latest data when possible and the freshest-available data when the primary is unavailable.
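The primary-first, stale-on-failure pattern described above can be sketched as a small in-memory wrapper. This is a hypothetical illustration: `fetch` stands in for the call to the primary service, the `_stale` flag is our addition alongside `_fromCache`, and entries are deliberately kept past their TTL (a production cache would also bound memory, for example with LRU eviction).

```python
import logging
import threading
import time
from typing import Any, Callable, Dict, Tuple

log = logging.getLogger("fallback")


class StaleFallbackCache:
    """Call the primary first; on failure, serve the cached copy even past its
    TTL and refresh it in the background."""

    def __init__(self, fetch: Callable[[str], Any], ttl_s: float = 3600.0):
        self._fetch = fetch  # call to the primary service (hypothetical)
        self._ttl_s = ttl_s
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)
        self._lock = threading.Lock()

    def get(self, key: str) -> Dict[str, Any]:
        try:
            value = self._fetch(key)
        except Exception:
            with self._lock:
                cached = self._entries.get(key)
            if cached is None:
                raise  # nothing to degrade to, so propagate the failure
            value, stored_at = cached
            stale = time.time() - stored_at > self._ttl_s
            # Log every fallback so degraded mode is never silent.
            log.warning("primary failed for %s; serving cached copy (stale=%s)", key, stale)
            threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
            return {"data": value, "_fromCache": True, "_stale": stale}
        self._store(key, value)
        return {"data": value, "_fromCache": False}

    def _store(self, key: str, value: Any) -> None:
        with self._lock:
            self._entries[key] = (value, time.time())

    def _refresh(self, key: str) -> None:
        try:
            self._store(key, self._fetch(key))
        except Exception:
            pass  # primary still down; keep serving the old copy
```

Callers branch on `_fromCache` and `_stale`: for example, a profile page can render stale data with a banner, while a checkout flow might refuse to price an order from a stale copy.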
Request Correlation & Tracing
Distributed systems are impossible to debug without correlation. A single user action such as “place order” can fan out to a dozen service calls, database queries, and async events. If each log line stands alone, you cannot answer basic questions like “what happened with order X?” without scanning every service’s logs and manually correlating timestamps. A correlation ID threads a single UUID through every service involved in a request, so one search in your log aggregator returns the full story. The implementation relies on context propagation: generate the ID at the edge (API gateway), put it in a header (X-Correlation-ID), and propagate it on every downstream call. On the server side, pull the ID out of the incoming request and attach it to every log line automatically. In Node.js, async_hooks (usually through AsyncLocalStorage) threads this context through async calls without plumbing it through every function signature; in Python, contextvars does the same job more cleanly.
A critical subtlety: correlation context must also cross async message queues. If your service publishes a Kafka event in response to an HTTP request, the event handler (possibly in another service, possibly running minutes later) should inherit the same correlation ID. Without that, the async side of your architecture becomes a debugging black hole.
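A minimal Python sketch of the contextvars approach, assuming your web framework or message consumer lets you run a hook at the start of each request or message. The function names are hypothetical; the same `begin_request` hook works for an HTTP request and for a consumed Kafka message, as long as the publisher copied `propagation_headers()` into the message headers.

```python
import contextvars
import logging
import uuid
from typing import Dict, Mapping, Optional

HEADER = "X-Correlation-ID"

# Each asyncio Task and each thread gets its own context, so requests handled
# concurrently in separate tasks or threads do not overwrite each other's ID.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive lookup, since HTTP header names are case-insensitive."""
    return next((v for k, v in headers.items() if k.lower() == name.lower()), None)


def begin_request(headers: Mapping[str, str]) -> str:
    """Call at the start of every HTTP request or consumed message: reuse the
    caller's ID, or mint a new one if this service is the edge."""
    cid = _header(headers, HEADER) or str(uuid.uuid4())
    correlation_id.set(cid)
    return cid


def propagation_headers() -> Dict[str, str]:
    """Attach to every outgoing HTTP call and every published event."""
    return {HEADER: correlation_id.get()}


class CorrelationIdFilter(logging.Filter):
    """Stamps the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(CorrelationIdFilter())
handler.setFormatter(logging.Formatter("%(levelname)s [%(correlation_id)s] %(message)s"))
logging.getLogger().addHandler(handler)
```

Because the filter reads the context variable at log time, no function between the handler and the log call needs a `correlation_id` parameter.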
A downstream service you call is in a slow brownout: p99 jumped from 200ms to 4 seconds. Your circuit breaker has not tripped because errors are under 10%. What is happening and what do you do?
- Recognize the pattern: slow is worse than broken. An error-rate circuit breaker misses latency brownouts completely, so worker threads pile up waiting on slow responses while the breaker stays closed.
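- Add a latency-aware breaker: trip when slow calls dominate, for example when latency stays above 2x the normal baseline for 30 seconds. Resilience4j supports this directly through its slow-call-rate threshold; with a breaker that only counts errors, pair it with a tight timeout so slow calls are recorded as failures.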
- Drop the per-call timeout below the p99 of the brownout. If normal p99 is 200ms and brownout p99 is 4 seconds, setting the timeout to 500ms means you fail fast on brownouts and succeed on normal traffic.
- Consider hedged requests for idempotent reads: send the request to a second instance after, say, 300ms if the first has not returned. This trades slightly more load for dramatically better tail latency.
- Post-brownout, look at root cause (GC, noisy neighbor, upstream dependency of theirs) and push for a fix with the owning team.
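A latency-aware breaker can be sketched as follows. This is our own simplified illustration of the slow-call-rate idea (the concept behind Resilience4j's slowCallRateThreshold and slowCallDurationThreshold settings), not Resilience4j's implementation. It tracks the share of slow or failed calls in a sliding window rather than a literal p99, and all names and defaults are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, TypeVar

T = TypeVar("T")


class BreakerOpenError(Exception):
    pass


class SlowCallBreaker:
    """Counts a call as "bad" if it failed OR took longer than slow_threshold_s,
    so a latency brownout opens the breaker even while the error rate stays low."""

    def __init__(self, slow_threshold_s: float, max_bad_ratio: float = 0.5,
                 window: int = 20, min_calls: int = 10, open_for_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.slow_threshold_s = slow_threshold_s
        self.max_bad_ratio = max_bad_ratio
        self.min_calls = min_calls
        self.open_for_s = open_for_s
        self.clock = clock
        self.outcomes: Deque[bool] = deque(maxlen=window)  # True = slow or failed
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.open_for_s:
            # Simplification: close and re-measure from a clean window
            # instead of a strict half-open state with limited probes.
            self.opened_at = None
            self.outcomes.clear()
            return True
        return False

    def record(self, duration_s: float, failed: bool = False) -> None:
        self.outcomes.append(failed or duration_s > self.slow_threshold_s)
        if len(self.outcomes) >= self.min_calls:
            if sum(self.outcomes) / len(self.outcomes) >= self.max_bad_ratio:
                self.opened_at = self.clock()

    def call(self, fn: Callable[[], T]) -> T:
        if not self.allow():
            raise BreakerOpenError("slow-call breaker is open; failing fast")
        start = self.clock()
        try:
            result = fn()
        except Exception:
            self.record(self.clock() - start, failed=True)
            raise
        self.record(self.clock() - start)
        return result
```

With a 200ms baseline, `slow_threshold_s=0.4` encodes the “2x baseline” rule; still combine it with a per-call timeout so that a single call cannot hang indefinitely.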
Common wrong answers:
- “The breaker will eventually trip when errors climb.” Fails because during a pure latency brownout, errors stay low: requests just pile up waiting on slow responses.
- “Increase our worker pool to handle the slower responses.” Fails because you are burning resources to wait on a broken dependency instead of failing fast.
Further reading:
- Resilience4j documentation on slow call rate thresholds.
- “The Tail at Scale” paper by Dean and Barroso (Google, 2013) on hedged requests.
- Netflix Tech Blog on Hystrix and its retirement in favor of Resilience4j.
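The hedged-request technique from this answer can be sketched with asyncio. This is a hypothetical helper, safe only for idempotent reads: `make_call(attempt)` starts one attempt and receives 0 or 1, so the caller can route the hedge to a different instance. The first successful response wins and the other attempt is cancelled.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def hedged(make_call: Callable[[int], Awaitable[T]], hedge_after_s: float = 0.3) -> T:
    """Send one attempt; if it has not finished after hedge_after_s, send a
    second one and return whichever succeeds first. Idempotent reads only."""
    first = asyncio.ensure_future(make_call(0))
    done, _ = await asyncio.wait({first}, timeout=hedge_after_s)
    if done:
        return first.result()  # fast path: no hedge needed
    second = asyncio.ensure_future(make_call(1))
    pending = {first, second}
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if task.exception() is None:
                    return task.result()  # first successful attempt wins
        return first.result()  # both failed: re-raise the original attempt's error
    finally:
        for task in (first, second):
            task.cancel()  # stop the loser; cancel() is a no-op on finished tasks
```

Setting `hedge_after_s` near the endpoint's normal p95 or p99 means only the slowest few percent of requests are duplicated, which keeps the extra load small while cutting the tail.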
Your service calls an idempotent read endpoint. The network times out after 3 seconds. Your retry succeeds. Later, you find out both requests actually reached the server. What went wrong and what is the fix?
- Diagnose: the timeout fired before the server’s response arrived, but the server still completed the request. This is normal: a timeout only means the caller stopped waiting, and the server is never told that it did.
- For reads, this is usually benign, but it produces duplicate load and double-counts those requests in server-side metrics. For writes, it is a correctness problem.
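- The fix for writes: idempotency keys. The caller generates a UUID, sends it in a header (Idempotency-Key: ...), and the server stores the result keyed by (client_id, idempotency_key). Subsequent retries return the stored result instead of re-executing the operation.
- For reads, accept the duplicate load, and make it rarer by calibrating the timeout against the endpoint’s real p99 so it seldom fires on requests that were about to succeed. Hedging can also cut read tail latency, but it sends duplicates on purpose, trading extra load for speed.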
- Document the pattern and make it the default in the HTTP client library so no individual service has to remember.
Common wrong answers:
- “Only retry on connection errors, not timeouts.” Fails because timeouts are the most common retry trigger and this leaves you with no retry safety on the most important case.
- “Use transaction semantics on the server to detect duplicates.” Fails because the server cannot distinguish “retry of same operation” from “new operation with same data” without a client-supplied token.
Further reading:
- Stripe’s blog post “Designing robust and predictable APIs with idempotency.”
- RFC 7231 on idempotency semantics (now obsoleted by RFC 9110).
- Adrian Cockcroft’s talks on retry amplification.
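The idempotency-key fix can be sketched as an in-memory store. The class and function names are hypothetical. A real service would persist the stored result (with an expiry) in the same transaction as the write itself, so a crash cannot record one without the other.

```python
import threading
import uuid
from typing import Any, Callable, Dict, Tuple


def new_idempotency_key() -> str:
    """Client side: one key per logical operation, reused on every retry of it."""
    return str(uuid.uuid4())


class IdempotencyStore:
    """Server side: remembers the result for each (client_id, idempotency_key)
    so a retried write returns the original result instead of running again."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, str], Any] = {}
        # One lock keeps concurrent retries of the same key from both executing,
        # at the cost of serializing all writes; a real service would lock per key.
        self._lock = threading.Lock()

    def execute(self, client_id: str, key: str, operation: Callable[[], Any]) -> Any:
        with self._lock:
            if (client_id, key) in self._results:
                return self._results[(client_id, key)]
            # If operation() raises, nothing is stored, so a retry re-executes it.
            result = operation()
            self._results[(client_id, key)] = result
            return result
```

Scoping the key by `client_id` keeps two clients that happen to reuse the same key from seeing each other's results.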
Interview Questions
Q1: How do you handle partial failures in synchronous communication?
- Circuit Breaker: Stop calling failing services temporarily
- Timeouts: Fail fast rather than waiting forever
- Retries with Backoff: Retry transient failures with exponential backoff
- Fallbacks: Return cached/default data when service is down
- Graceful Degradation: Partial functionality is better than complete failure
Q2: When would you choose gRPC over REST?
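Choose gRPC when:
- Internal service-to-service communication
- Performance is critical (compact binary Protobuf payloads over multiplexed HTTP/2 connections; actual gains depend on payload size and workload)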
- Need streaming (bidirectional)
- Strict API contracts required
- Polyglot services (auto-generated clients)
Choose REST when:
- Public-facing APIs
- Browser clients (without gRPC-Web)
- Simple CRUD operations
- Need HTTP caching
- Debugging simplicity matters
Q3: How do you version APIs in microservices?
- URL Path: /api/v1/users (most common)
- Query Parameter: /api/users?version=1
- Header: Accept: application/vnd.api.v1+json
- Content Negotiation: Custom media types
Best practices:
- Support 2-3 versions simultaneously
- Deprecate gracefully with warnings
- Document migration paths
- Use semantic versioning
- Breaking changes = major version
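To make the strategies above concrete, here is a hypothetical resolver showing where the version lives in each scheme. In practice a service standardizes on one scheme; checking all three, and the precedence order used here, are assumptions for illustration only. SUPPORTED_VERSIONS reflects the “2-3 live versions” guideline.

```python
import re
from typing import Mapping

SUPPORTED_VERSIONS = {1, 2}  # hypothetical: the versions currently kept alive
_PATH_RE = re.compile(r"^/api/v(\d+)/")
_MEDIA_RE = re.compile(r"application/vnd\.api\.v(\d+)\+json")


def resolve_version(path: str, query: Mapping[str, str],
                    headers: Mapping[str, str], default: int = 1) -> int:
    """Find the requested API version in the URL path, query, or Accept header."""
    m = _PATH_RE.match(path)
    if m:
        version = int(m.group(1))                      # /api/v1/users
    elif "version" in query:
        version = int(query["version"])                # /api/users?version=1
    else:
        accept = next((v for k, v in headers.items() if k.lower() == "accept"), "")
        m = _MEDIA_RE.search(accept)                   # application/vnd.api.v1+json
        version = int(m.group(1)) if m else default
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported API version {version}")
    return version
```

Rejecting retired versions explicitly, instead of silently serving the default, gives clients a clear signal to follow the documented migration path.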
Q4: How do you implement request tracing across services?
- Correlation ID: Unique ID that flows through all services for one request
- Request ID: Unique per service call (for individual tracing)
- Propagation: Headers like X-Correlation-ID, X-Request-ID
- Distributed Tracing: Tools like Jaeger, Zipkin, or OpenTelemetry

Header conventions:
- X-Correlation-ID: Same for entire request chain
- X-Request-ID: Unique per hop
- X-Parent-Request-ID: Previous service’s request ID
- X-B3-TraceId: Zipkin trace ID
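A hypothetical helper shows how these headers relate from one hop to the next. The correlation ID is copied unchanged, a fresh X-Request-ID is minted for each downstream call, and the incoming request ID becomes the parent. In practice a Zipkin or OpenTelemetry instrumentation library manages the B3 headers (including span IDs); this sketch only passes the trace ID through to show that it stays constant.

```python
import uuid
from typing import Dict, Mapping, Optional


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive lookup, since HTTP header names are case-insensitive."""
    return next((v for k, v in headers.items() if k.lower() == name.lower()), None)


def headers_for_downstream(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Build tracing headers for one outgoing call from the incoming request's headers."""
    out = {
        # Same for the whole chain; minted here only if we are the edge.
        "X-Correlation-ID": _header(incoming, "X-Correlation-ID") or str(uuid.uuid4()),
        # New for every hop.
        "X-Request-ID": str(uuid.uuid4()),
    }
    parent = _header(incoming, "X-Request-ID")
    if parent:
        out["X-Parent-Request-ID"] = parent  # the hop that called us
    trace_id = _header(incoming, "X-B3-TraceId")
    if trace_id:
        out["X-B3-TraceId"] = trace_id  # passed through unchanged
    return out
```

Following X-Parent-Request-ID links from any log line reconstructs the call tree for one correlation ID, which is a poor man's version of what a tracing backend draws for you.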
Summary
Key Takeaways
- REST for public and browser-facing APIs, gRPC for internal, high-performance communication
- Always implement timeouts, retries, and fallbacks
- Use correlation IDs for tracing
- Version your APIs properly
- Circuit breakers prevent cascade failures
Interview Deep-Dive
'Your Order Service makes synchronous calls to User Service, Inventory Service, and Payment Service during order creation. On Black Friday, latency spikes from 200ms to 8 seconds and orders start timing out. What do you do -- both immediately and long-term?'
Propagate an X-Request-Deadline header with the absolute timestamp when the request expires. Each downstream service checks this header and fast-fails if the deadline has already passed, rather than starting work that will be discarded anyway. gRPC has this built in with its deadline propagation mechanism, which is one reason I prefer gRPC for internal service communication.
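The X-Request-Deadline check described in this answer might look like the minimal sketch below. The header's value format (Unix epoch milliseconds) and the helper names are assumptions, since the answer does not specify them. Outgoing calls should forward the same header unchanged and use the remaining budget as their own timeout.

```python
import time
from typing import Mapping, Optional

DEADLINE_HEADER = "X-Request-Deadline"  # assumed format: Unix epoch milliseconds


class DeadlineExceeded(Exception):
    pass


def remaining_budget_s(headers: Mapping[str, str], now_s: Optional[float] = None) -> Optional[float]:
    """Seconds left before the caller's deadline, or None if no deadline was sent."""
    raw = next((v for k, v in headers.items() if k.lower() == DEADLINE_HEADER.lower()), None)
    if raw is None:
        return None
    now_s = time.time() if now_s is None else now_s
    return int(raw) / 1000.0 - now_s


def check_deadline(headers: Mapping[str, str], now_s: Optional[float] = None) -> Optional[float]:
    """Call before doing any work: fail fast if the deadline already passed."""
    remaining = remaining_budget_s(headers, now_s)
    if remaining is not None and remaining <= 0:
        raise DeadlineExceeded(f"deadline passed {-remaining:.3f}s ago; skipping work")
    return remaining
```

An absolute timestamp assumes host clocks are reasonably synchronized. gRPC sidesteps clock skew by sending a relative timeout on the wire (the grpc-timeout header), which each hop converts into a local deadline.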
'When would you choose gRPC over REST for internal service communication, and what are the gotchas that teams hit when adopting gRPC?'
- Debugging is harder. You cannot curl a gRPC endpoint. You need grpcurl or Evans or a custom tool. When something breaks at 2 AM, the on-call engineer who is used to “curl the endpoint and read the JSON” is going to be frustrated. Invest in tooling upfront — grpcui provides a web interface that helps.
- Load balancing is tricky. Because gRPC uses long-lived HTTP/2 connections, traditional L4 load balancers (like AWS NLB) do not distribute traffic well — one connection goes to one backend and stays there. You need L7 load balancing (Envoy, Istio, or client-side load balancing with gRPC’s built-in resolver). This catches almost every team on their first gRPC deployment.
- Proto file management becomes a real problem at scale. When you have 30 services, each with their own .proto files, versioning and distribution become a mini-infrastructure project. You need a proto registry (like Buf) and CI checks that validate backward compatibility on every PR.
- Browser support requires gRPC-Web. If any client is a browser, you need a gRPC-Web proxy (such as Envoy), or you must maintain a REST gateway alongside gRPC. This dual-protocol setup adds operational complexity.
In CI, every PR runs buf breaking against the previous version, and if it detects a breaking change, the PR is blocked. For legitimate breaking changes (rare), we version the service (user.v1.UserService and user.v2.UserService) and run both simultaneously during migration.
'Explain the difference between a Correlation ID and a Trace ID. Why do you need both, and what happens if you only implement one?'