REST & Microservices Interview Questions (55+ Deep-Dive Q&A)
1. REST API Design
1. Six Constraints of REST
- Client-Server — Separation of concerns. The client handles UI/UX, the server handles data storage and business logic. They evolve independently. Example: your React SPA can be rewritten to React Native without touching the API.
- Stateless — Every request from client to server must contain all the information needed to understand it. The server stores no session state between requests. This is what makes horizontal scaling possible — any server can handle any request because there is no “sticky” state. In practice, this means auth tokens (JWTs) travel with every request instead of relying on server-side sessions.
- Cacheable: Responses must implicitly or explicitly label themselves as cacheable or non-cacheable. This enables clients, CDNs, and intermediary proxies to cache responses and reduce server load. Headers like `Cache-Control`, `ETag`, and `Last-Modified` drive this.
- Uniform Interface: The most fundamental constraint. It requires: resource identification via URIs, manipulation through representations (JSON/XML), self-descriptive messages (content-type headers), and HATEOAS (hypermedia links in responses). This is what distinguishes REST from plain HTTP RPC.
- Layered System — The client cannot tell whether it is connected to the end server or an intermediary. Load balancers, CDNs, API gateways, and WAFs can sit between client and server transparently. This enables security policies, caching layers, and load distribution without client awareness.
- Code on Demand (Optional) — Servers can temporarily extend client functionality by transferring executable code (e.g., JavaScript). This is the only optional constraint and is rarely cited in API design.
`POST /carts` creates a cart resource with an ID. The client sends that cart ID with subsequent requests. The server is still stateless: it does not “remember” the client between requests. The cart data lives in the database, not in server memory. Multi-step workflows work the same way: each step creates or updates a resource, and the client tracks its own progress.
2. HTTP Verbs & Idempotency
| Verb | Purpose | Safe | Idempotent | Request Body |
|---|---|---|---|---|
| GET | Retrieve resource | Yes | Yes | No |
| POST | Create new resource | No | No | Yes |
| PUT | Full replacement | No | Yes | Yes |
| PATCH | Partial update | No | It depends | Yes |
| DELETE | Remove resource | No | Yes | Optional |
| HEAD | Like GET, headers only | Yes | Yes | No |
| OPTIONS | Capabilities/CORS preflight | Yes | Yes | No |
- Safe means the method does not modify server state. GET, HEAD, and OPTIONS are safe. You should never design a `GET /deleteUser/123` endpoint: it violates safety and will be triggered by crawlers and prefetch.
- Idempotent means making the same request N times produces the same result as making it once. `DELETE /users/123` returns 200 the first time and 404 after, but the server state is the same (user is gone). That is still idempotent.
- PATCH is tricky: A JSON Merge Patch (`{"email": "new@example.com"}`) is idempotent. But a JSON Patch operation like `[{"op": "add", "path": "/tags/-", "value": "new"}]` appends an element each time, so it is NOT idempotent. The idempotency of PATCH depends entirely on the patch format used.
- POST is not idempotent by default, which is why payment APIs (Stripe, PayPal) support idempotency key headers such as `Idempotency-Key`: they make POST behave idempotently for critical operations.
In Stripe's API, a `POST /v1/charges` request can carry an `Idempotency-Key`. If the client retries (network timeout), Stripe returns the saved response from the first attempt. Without this, a flaky network could charge a customer twice. Stripe keeps idempotency keys for at least 24 hours.

Red flag answer: “PUT and PATCH are basically the same thing” or “Idempotent means the response is always the same” (it means the server state is the same, not the response code).

Follow-up questions:

Q: A mobile client sends a POST to create an order but gets a network timeout. It retries. How do you prevent duplicate orders without idempotency keys?
You could use a unique constraint on a business-level identifier (e.g., a client-generated order reference), but this is fragile. The robust solution is idempotency keys: the client generates a UUID before the first attempt and sends it with every retry. The server checks a key-value store (Redis with a TTL of 24-48 hours): if the key exists, return the stored response. If not, process the request and store the result keyed by that UUID. This pattern is used by Stripe, Square, and Amazon Pay. The key expiry should match your business context; Stripe uses 24 hours, which covers any reasonable retry window.

Q: Can a GET request ever modify server state? When might this be acceptable?
Per the HTTP spec, a GET should never be used to modify state. In practice, tracking endpoints like `GET /emails/track.png` in marketing emails update a “read” counter; the side effect is non-destructive and the response (a 1x1 pixel) is always the same. Analytics logging on GET is another common case. The principle: incidental side effects (logging, metrics) are acceptable, but the GET should never be the primary mechanism for state mutation.

Q: You are designing an API for a banking transfer. Which verb do you use and why?
`POST /transfers` with an idempotency key. Not PUT because we are creating a new resource, not replacing an existing one. The idempotency key is non-negotiable for financial operations. The response should include the transfer ID and status. The endpoint should validate funds, create the transfer record, and enqueue the actual money movement asynchronously. Return `201 Created` on success with a `Location` header pointing to the transfer resource.
3. Status Codes
`200 OK` with `{"error": "Something broke"}` in the body.

Answer:

2xx (Success):
- `200 OK`: Standard success. Use for GET, PUT, PATCH responses with a body.
- `201 Created`: Resource created. Should include a `Location` header pointing to the new resource. Use after successful POST.
- `202 Accepted`: Request received but not yet processed. Used for async operations (batch jobs, long-running tasks). Client should poll or receive a webhook.
- `204 No Content`: Success with no response body. Typical for DELETE or PUT when you do not need to return the updated resource.

3xx (Redirection):
- `301 Moved Permanently`: Resource permanently moved. Browsers and clients cache this. Use when you deprecate an endpoint permanently.
- `302 Found`: Temporary redirect. Used in OAuth flows to redirect to the authorization server.
- `304 Not Modified`: Client’s cached version is still valid. Server checks `If-None-Match` (ETag) or `If-Modified-Since` headers and skips sending the body. Saves bandwidth, which is critical for mobile APIs.

4xx (Client errors):
- `400 Bad Request`: Malformed request (invalid JSON, missing required fields). Include a structured error body with field-level details.
- `401 Unauthorized`: Actually means unauthenticated. No valid credentials provided. The naming is historically misleading.
- `403 Forbidden`: Authenticated but not authorized. You know who they are, but they lack permission.
- `404 Not Found`: Resource does not exist. Also commonly used to hide the existence of resources from unauthorized users (security through ambiguity).
- `409 Conflict`: State conflict. Example: trying to create a user with an email that already exists, or updating a resource that has been modified since you last read it (optimistic concurrency).
- `422 Unprocessable Entity`: Request is syntactically valid JSON but semantically wrong (e.g., `"age": -5`). Rails and many modern APIs prefer this over 400 for validation errors.
- `429 Too Many Requests`: Rate limit exceeded. Should include a `Retry-After` header. Clients that respect this are good citizens.

5xx (Server errors):
- `500 Internal Server Error`: Unhandled exception. Your monitoring should alert on these. Never leak stack traces to clients in production.
- `502 Bad Gateway`: The gateway/proxy received an invalid response from upstream. Common when a microservice behind an API gateway crashes.
- `503 Service Unavailable`: Server is overloaded or in maintenance. Include `Retry-After`. Used during deployments or circuit breaker activations.
- `504 Gateway Timeout`: Upstream service did not respond in time. Common in microservices when a downstream dependency is slow.
`200 OK` for every response and put the actual status in the JSON body as `{"status": "error", "code": 500}`. Their monitoring showed 0% error rate because it only checked HTTP status codes. When a database outage hit, they had no alerts for 45 minutes. Proper status codes are not just for clients: they drive your entire observability pipeline (Datadog, Grafana dashboards, alerting rules, SLO tracking).

Red flag answer: “I just use 200 for success and 500 for errors” or confusing 401 with 403.

Follow-up questions:

Q: When should you return 404 vs 403 for a resource the user cannot access?
It depends on whether revealing the resource’s existence is a security concern. For a banking API, if user A requests user B’s account, return 404 and do not confirm the account exists. For an internal admin panel, 403 is more helpful for debugging. GitHub does this: if you request a private repo you do not have access to, you get 404, not 403. This prevents enumeration attacks where an attacker probes for valid resource IDs.

Q: Your API returns 400 for every validation error. A frontend developer complains they cannot tell malformed JSON from a missing required field. How do you fix this?
Adopt a structured error response format. RFC 7807 (Problem Details for HTTP APIs, since superseded by RFC 9457) is the standard. Return `{"type": "validation_error", "title": "Invalid request", "status": 422, "errors": [{"field": "email", "message": "must be a valid email address", "code": "invalid_format"}]}`. Use 400 for truly malformed requests (unparseable JSON) and 422 for semantically invalid data. This separation lets frontend teams build field-level error highlighting without parsing error message strings.

Q: How do status codes interact with circuit breakers in a microservices architecture?
Circuit breakers (Resilience4j, Polly, Hystrix) typically count 5xx responses and timeouts toward the failure threshold. A spike of 502s or 503s from a downstream service will trip the breaker, causing the calling service to fail fast with its own error (typically 503 with a fallback response). The key configuration decision is: should 4xx errors count toward the circuit breaker threshold? Usually no, because 4xx means the client sent a bad request, not that the downstream service is unhealthy. But watch out for 429s at scale: if your downstream aggressively rate-limits you, a flood of 429s might warrant a brief back-off that resembles circuit-breaking behavior.
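The 400-versus-422 split from the validation follow-up above might look something like this. It is a sketch under assumptions: `validate_signup` and its two fields are hypothetical, and the error body only loosely follows the problem-details shape quoted in the answer.

```python
import json


def validate_signup(raw_body: str) -> tuple[int, dict]:
    """Return (status, body): 400 for unparseable JSON, 422 for invalid fields, 201 otherwise."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"type": "malformed_request", "title": "Body is not valid JSON", "status": 400}
    if not isinstance(data, dict):
        return 400, {"type": "malformed_request", "title": "Body must be a JSON object", "status": 400}

    errors = []
    email = data.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append({"field": "email", "message": "must be a valid email address", "code": "invalid_format"})
    age = data.get("age")
    if not isinstance(age, int) or age < 0:
        errors.append({"field": "age", "message": "must be a non-negative integer", "code": "out_of_range"})

    if errors:
        return 422, {"type": "validation_error", "title": "Invalid request", "status": 422, "errors": errors}
    return 201, {"email": email, "age": age}
```

Because each entry in `errors` names a field and a machine-readable code, a frontend can highlight the offending input without parsing message strings.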
4. REST vs SOAP
| Aspect | REST | SOAP |
|---|---|---|
| Protocol | HTTP (architecturally bound) | Transport-independent (HTTP, SMTP, JMS, TCP) |
| Data Format | JSON, XML, or any format | XML only (with strict envelope structure) |
| Contract | Informal (OpenAPI is optional) | Formal WSDL (auto-generates clients) |
| Error Handling | HTTP status codes | SOAP Faults (structured XML errors) |
| Security | HTTPS + OAuth/JWT | WS-Security (message-level encryption, signing) |
| Statefulness | Stateless by design | Can be stateful (WS-ReliableMessaging) |
| Transactions | No built-in support | WS-AtomicTransaction, WS-Coordination |
| Performance | Lightweight, fast (JSON is 30-50% smaller than XML) | Heavier (XML parsing, SOAP envelope overhead) |
| Tooling | Manual or code-gen from OpenAPI | Auto-generated strongly-typed clients from WSDL |
- Financial services — WS-Security provides message-level encryption (the message is encrypted even if TLS terminates at a load balancer). Banks processing SWIFT transfers use SOAP.
- Healthcare — HL7 FHIR is moving to REST, but legacy hospital systems use SOAP for compliance.
- Enterprise integration — When you need guaranteed delivery (WS-ReliableMessaging) and distributed transactions (WS-AtomicTransaction) across organizational boundaries.
- Formal contracts — WSDL allows auto-generation of strongly-typed client libraries. This matters when your API has 500+ operations and multiple consumer teams.
5. HATEOAS
"shipped", the cancel link would disappear: the client knows cancellation is no longer possible without any client-side business logic.

Why it matters in theory:
- Decoupling: URLs can change without breaking clients (clients follow links, not hardcoded paths).
- Discoverability — A new client can explore the API starting from just the root URL.
- Server-driven workflows — The server controls what transitions are valid, reducing client-side state management.
- Most frontend teams prefer explicit documentation (Swagger UI) over dynamically discovering endpoints.
- Adding link generation to every response increases response size by 20-40% for simple resources.
- Client SDKs are typically auto-generated from OpenAPI specs, making URL hardcoding a non-issue.
- The industry standard (GitHub, Stripe, Twilio) is partial HATEOAS: pagination links and Location headers, but not full hypermedia-driven navigation.
- Long-lived APIs with many consumer teams (banking, government) where URLs may evolve over decades.
- Workflow-driven APIs where valid next actions depend on current state (order fulfillment, insurance claims processing).
- HAL, JSON:API, Siren are established media types that standardize how links are embedded.
`_links` and `_embedded` to JSON responses. Great for simple APIs. JSON:API is a full specification that standardizes filtering, sorting, pagination, sparse fieldsets, and relationships. It is opinionated and reduces bikeshedding but adds payload overhead. Siren goes furthest: it includes actions (forms) with field definitions, so the client knows what data to send for each action. Siren is closest to true HATEOAS but is the most complex. For most teams, HAL or simple link objects (like GitHub uses) are the pragmatic choice.
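To make the state-dependent links in this section concrete, here is a small sketch of a HAL-style order representation. The function name, the URL layout, and the `pending`/`shipped` states are illustrative assumptions, not part of any specification.

```python
def order_representation(order_id: int, status: str) -> dict:
    """Build a HAL-style order document whose links depend on the order's state."""
    links = {"self": {"href": f"/orders/{order_id}"}}
    if status == "pending":
        # Cancellation is only advertised while it is still a valid transition.
        links["cancel"] = {"href": f"/orders/{order_id}/cancel"}
    elif status == "shipped":
        links["track"] = {"href": f"/orders/{order_id}/tracking"}
    return {"id": order_id, "status": status, "_links": links}
```

A client that renders a "Cancel" button only when `_links` contains `cancel` needs no knowledge of the order state machine itself.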
6. Pagination Strategies
- The database executes `SELECT * FROM users ORDER BY id LIMIT 20 OFFSET 1000`, which means the DB reads and discards 1000 rows before returning 20.
- Performance degrades linearly with depth. Page 1 is fast. Page 500 (offset 9,980 at 20 rows per page) forces the DB to scan roughly 10,000 rows.
- Supports “jump to page 50” and total count (`SELECT COUNT(*)`).
- Data drift problem: If items are inserted or deleted between page requests, users see duplicates or miss items. If a new user is inserted at position 5 after you have loaded page 1, user #20 from page 1 shifts down and appears again on page 2.
- The cursor is typically a Base64-encoded pointer (e.g., the last seen `id` or `created_at` value).
- The database executes `SELECT * FROM users WHERE id > 42 ORDER BY id LIMIT 20`, which uses an index seek and is consistently fast regardless of depth.
- No page jumping: you can only go forward (or backward with a `before` cursor). No total count without a separate query.
- Stable under mutations: inserting or deleting rows does not cause drift because the cursor is anchored to a specific record.
- A variant of cursor pagination using timestamps. Works well for event streams and audit logs.
- Gotcha: If multiple events share the same timestamp, you can miss records. Solution: use composite cursors (timestamp + tie-breaking ID).
- Offset — Internal admin dashboards where users expect “Page 3 of 50” and datasets are small (< 100K rows).
- Cursor — Public APIs, mobile feeds, any endpoint where deep pagination is expected or data mutates frequently. Used by Facebook, Twitter, Slack, and Stripe.
- Seek/Time — Event sourcing, audit logs, changelog endpoints.
`SELECT COUNT(*)` on large tables is expensive (full table scan on InnoDB, though PostgreSQL is slightly better with its visibility map). Options: (1) Cache the count with a short TTL (30 seconds) and accept slight inaccuracy. (2) Use a `COUNT(*)` estimate from EXPLAIN (Postgres: `SELECT reltuples FROM pg_class WHERE relname = 'users'`, which is fast but approximate). (3) Display “Showing 1-20 of ~14.5K” with an approximate count. (4) Track the count in a separate materialized counter that updates asynchronously. Slack’s API returns `"has_more": true` instead of a total count, a UX-friendly alternative.

Q: How does cursor pagination work when sorting by a non-unique field like created_at?
If two records have the same `created_at`, a cursor pointing to that timestamp is ambiguous. Solution: use a compound cursor that includes both the sort field and a unique tie-breaker (typically the primary key). The query becomes `WHERE (created_at, id) > ('2024-01-15 10:30:00', 42) ORDER BY created_at, id LIMIT 20`. This guarantees deterministic ordering. The cursor encodes both values: `base64({"created_at": "2024-01-15T10:30:00Z", "id": 42})`.

Q: A client paginates through a dataset that is being actively written to. How do you guarantee consistency?
With cursor pagination, new records inserted after the cursor position will appear on subsequent pages (which is usually desired for feeds). Records inserted before the cursor are naturally excluded. If the client needs a frozen snapshot, you can use a versioned cursor that includes a timestamp filter: `WHERE created_at <= :snapshot_time AND (sort_key, id) > (:cursor_sort, :cursor_id)`. Alternatively, use database-level snapshot reads (PostgreSQL’s REPEATABLE READ isolation) for short-lived pagination sessions.
7. Versioning Strategies
- Most common in practice (used by Stripe, Twilio, Google).
- Simple, explicit, cacheable (CDNs can cache `/v1/` and `/v2/` separately).
- Downside: pollutes the URI space. Every version is essentially a new API surface to maintain.
- Keeps URLs clean. GitHub uses media type versioning (`application/vnd.github.v3+json`).
- Harder to test (cannot just paste a URL in a browser). Requires client education.
- Easy to implement and test. Used by some Google APIs.
- Pollutes query strings. Can cause caching issues (CDNs might not key on query params by default).
- Never remove or rename fields. Only add new ones.
- Used by LinkedIn (Restli) and some internal APIs.
- Requires rigorous discipline: every change must be backward-compatible.
- Works best when you control most clients or have a small number of consumers.
- Non-breaking changes (add a field, add an endpoint): no version bump. Ever.
- Breaking changes (remove a field, change a field type, rename a field): version bump.
- Stripe’s approach: version pinning. Each API key is pinned to a version. Stripe maintains every version since 2011. They use an internal compatibility layer that transforms requests/responses between versions. This is engineering-expensive but customer-friendly.
`/v1/` in the URL” without discussing when to actually bump versions, how to deprecate, or the cost of maintaining multiple versions.

Follow-up questions:

Q: Your API has 200 consumers and you need to make a breaking change. Walk me through your strategy.
(1) Ship the new version alongside the old one with a minimum 6-month overlap. (2) Add `Sunset` (RFC 8594) and `Deprecation` response headers to v1 responses. (3) Notify consumers via email, developer portal announcement, and dashboard warnings. (4) Track v1 usage per consumer in your analytics (Datadog/Mixpanel). (5) Offer a migration guide and ideally a compatibility shim. (6) As the sunset date approaches, personally reach out to the top 10 consumers still on v1. (7) Return `410 Gone` after the sunset date. Stripe gives 12+ months and personally helps large customers migrate.

Q: How does versioning interact with your CI/CD pipeline? How do you test multiple versions?
Each version needs its own contract tests. Use consumer-driven contract testing (Pact) so that every consumer’s expected contract is verified against the current provider. In CI, run the test suite for every supported version. Maintain version-specific fixtures/snapshots. API version translation logic (if using Stripe-style version pinning) should have its own unit tests. Integration environments should support routing by version header or URL path.
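One way to picture the version-pinning compatibility layer mentioned in this section is a chain of "downgrade" transforms applied to the latest response shape. The sketch below is only an illustration of that idea; the version dates, field names, and function names are all hypothetical, and it does not describe Stripe's actual internals.

```python
def _undo_name_split(resp: dict) -> dict:
    """Older clients expect a single `name` field instead of first/last."""
    resp = dict(resp)
    resp["name"] = f'{resp.pop("first_name")} {resp.pop("last_name")}'
    return resp


def _undo_amount_object(resp: dict) -> dict:
    """Older clients expect `amount` as a bare integer."""
    resp = dict(resp)
    resp["amount"] = resp["amount"]["value"]
    return resp


# (version that introduced the breaking change, transform that undoes it), oldest first.
CHANGES = [
    ("2024-03-01", _undo_name_split),
    ("2024-09-01", _undo_amount_object),
]


def render_for_version(latest_response: dict, pinned_version: str) -> dict:
    """Walk back from the newest change and undo every change the pinned client predates."""
    resp = latest_response
    for introduced_in, downgrade in reversed(CHANGES):
        if pinned_version < introduced_in:  # ISO dates compare correctly as strings
            resp = downgrade(resp)
    return resp


LATEST = {"first_name": "Ada", "last_name": "Lovelace", "amount": {"value": 500, "currency": "usd"}}
```

Each transform is a small pure function, which is what makes the per-version unit tests suggested in the CI/CD answer practical.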
8. Content Negotiation
- Client sends `Accept: application/json` in the request header.
- Server checks if it can produce JSON. If yes, responds with `Content-Type: application/json`.
- If the server cannot produce the requested format, it returns `406 Not Acceptable`.
- If the client sends no `Accept` header, the server uses its default format (typically JSON).
Accept header features:
The `q` parameter (quality factor, 0 to 1) indicates client preference. The server should try to satisfy the highest-quality preference it supports. Most developers never use this, but load testing tools and enterprise integrations sometimes do.

Content-Type for requests:
The `Content-Type` header on the request tells the server what format the body is in. If a client sends `Content-Type: application/xml` but the server only accepts JSON, return `415 Unsupported Media Type`.

Real-world patterns:
- Format suffix: `GET /users/42.json` or `GET /users/42.csv`. Used by Rails and some legacy APIs. Generally discouraged in modern API design because it mixes resource identity with representation.
- Vendor media types: `Accept: application/vnd.mycompany.user.v2+json` combines versioning and content negotiation in one header. Used by GitHub.
- API response envelope: Some APIs always return JSON but allow `?format=csv` as a query parameter for data export endpoints. This is not standard content negotiation but is pragmatic for download features.
`Content-Type` but not `Accept`, or not knowing what a 406 or 415 status code means.

Follow-up questions:

Q: Your API needs to support JSON for the web app and Protocol Buffers for internal gRPC clients. How do you handle this?
Use content negotiation at the gateway level. The API gateway (Kong, Envoy) inspects the `Accept` header. For `application/json`, route to the REST handler. For `application/protobuf`, either route to a gRPC backend directly, or have the REST handler serialize the same domain model using a Protobuf serializer. Frameworks like gRPC-Gateway (Go) can automatically serve both REST/JSON and gRPC/Protobuf from the same service definition. The key is to keep the business logic format-agnostic and serialize at the edge.

Q: A client sends a request with no Accept header. What should your API do?
Return your default format (JSON for most modern APIs) with the appropriate `Content-Type` header. Do not return an error. The HTTP spec says the absence of an `Accept` header means the client accepts any media type. However, document your default clearly. Some older APIs default to XML, which surprises modern clients.
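The negotiation rules in this section (quality factors, 406 when nothing matches, a default when `Accept` is absent) can be sketched as below. This is a simplified illustration: `parse_accept` and `negotiate` are hypothetical helpers, and the matching ignores the specificity ordering and media-type parameters that a full HTTP implementation would consider.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media = pieces[0].lower()
        if not media:
            continue
        q = 1.0  # q defaults to 1 when omitted
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def negotiate(accept_header, supported, default="application/json"):
    """Return (status, content_type); a missing Accept header means any type is fine."""
    if not accept_header:
        return 200, default
    for media, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means "explicitly not acceptable"
        for candidate in supported:
            wildcard = media.endswith("/*") and candidate.startswith(media[:-1])
            if media == "*/*" or media == candidate or wildcard:
                return 200, candidate
    return 406, None
```

Keeping negotiation in one function like this fits the advice above to keep business logic format-agnostic and choose the serializer at the edge.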
9. PATCH vs PUT
`role`, the server should set it to null/default: you sent the full representation, so missing fields mean “intentionally absent.” The server replaces the entire resource with what you sent.

PATCH (partial modification):
`email` field is updated. All other fields remain unchanged.

Two PATCH standards:
- JSON Merge Patch (RFC 7396, which obsoletes RFC 7386): Simple. Send a partial JSON object. Fields present are updated, fields absent are untouched, fields set to `null` are deleted. The gotcha: because `null` means “delete this field,” you cannot use a merge patch to store an explicit null value.
- JSON Patch (RFC 6902): Operation-based. A sequence of operations:

The `test` operation enables optimistic concurrency: with `{"op": "test", "path": "/version", "value": 5}`, the patch fails if the version has changed.

Concurrency implications:
- Two clients simultaneously PUT the full user object. Client A reads version 3, Client B reads version 3. Client A PUTs with updated email. Client B PUTs with updated name. Client B’s PUT overwrites Client A’s email change (lost update). Solution: `If-Match` header with ETags.
- With PATCH, the same scenario is safer if both clients only send the fields they changed. Client A patches email, Client B patches name, and there is no conflict. But this is not guaranteed safe for all operations.
`ETag: "abc123"` (a hash of the resource state). On PUT, the client sends `If-Match: "abc123"`. The server checks if the current ETag matches. If yes, process the update and return the new ETag. If no, return `412 Precondition Failed`: the resource was modified between the client’s read and write. The client must re-read, merge changes, and retry. This is the standard HTTP mechanism for preventing lost updates and is used by S3, CosmosDB, and Google Cloud Storage.

Q: When would you choose JSON Patch over JSON Merge Patch?
JSON Patch when you need: (1) Array manipulation (add/remove specific elements by index). (2) Atomic multi-step operations (move a field from one location to another). (3) The `test` operation for server-side precondition checking. (4) Distinguishing between “set to null” and “do not modify.” JSON Merge Patch when: you have a simple flat object structure and want minimal complexity. For most CRUD APIs, JSON Merge Patch is sufficient.
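As a rough illustration of the two mechanisms discussed in this section, the sketch below applies a JSON Merge Patch and guards the update with an `If-Match` style ETag check. The hashing scheme in `etag_of` and the `conditional_patch` helper are assumptions made for the example; real servers choose their own ETag strategy.

```python
import hashlib
import json


def merge_patch(target, patch):
    """Apply a JSON Merge Patch: null deletes, objects merge recursively, anything else replaces."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def etag_of(resource: dict) -> str:
    """Derive a quoted ETag from the canonical JSON form (an illustrative choice)."""
    body = json.dumps(resource, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:12] + '"'


def conditional_patch(resource: dict, patch: dict, if_match: str):
    """Return (status, resource): 412 when the caller's ETag no longer matches."""
    if if_match != etag_of(resource):
        return 412, resource
    return 200, merge_patch(resource, patch)


user = {"name": "Ada", "email": "old@example.com", "role": "admin"}
tag = etag_of(user)
status, updated = conditional_patch(user, {"email": "new@example.com", "role": None}, tag)
stale_status, _ = conditional_patch(updated, {"name": "Grace"}, tag)  # reuses the old ETag
```

The second call stands in for a client that read the resource before the first update landed: its ETag is stale, so it has to re-read and retry rather than silently overwrite.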
10. Handling Long Running Operations
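- Client sends `POST /reports/generate` with parameters.
- Server validates the request, enqueues the work, and immediately returns `202 Accepted` with a task ID (for example `abc-123`).
- Client polls `GET /tasks/abc-123` for the task status.
- When complete, the poll response reports the finished state.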
- Webhooks — Server POSTs the result to a client-provided callback URL when done. More efficient than polling but requires the client to expose an endpoint and handle retries/failures.
- Server-Sent Events (SSE) — Client opens a long-lived HTTP connection and receives status updates as a stream. Good for dashboards and progress bars.
- WebSockets — Bidirectional. Overkill for simple task status but useful if the client needs to send cancellation requests or adjust parameters mid-processing.
- Idempotency — The initial POST should be idempotent (use an idempotency key). If the client retries, return the existing task ID, not create a duplicate.
- Task expiry: Tasks should have a TTL. Do not store completed task results forever. Return `410 Gone` after the TTL expires.
- Cancellation: Expose `POST /tasks/abc-123/cancel`. The worker must check for cancellation at reasonable intervals.
- Error handling: If the task fails, the status should include error details: `{"status": "failed", "error": {"code": "TIMEOUT", "message": "Upstream payment processor timed out"}}`.
- Rate limiting on polls: Return `Retry-After` headers to prevent aggressive polling. If a client polls every 100ms, throttle them.
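The submit, poll, and cancel flow with the considerations listed above can be sketched with an in-memory task store. Everything named here (`submit_report`, `run_task`, `get_task`, `cancel_task`, the result URL, the 5-second `Retry-After`) is a hypothetical choice for illustration; a real system would use a queue and a persistent store.

```python
import uuid

TASKS: dict[str, dict] = {}        # hypothetical in-memory task store
TASK_BY_KEY: dict[str, str] = {}   # idempotency key -> task id


def submit_report(idempotency_key: str, params: dict):
    """Enqueue work and answer immediately with 202 plus a status URL."""
    task_id = TASK_BY_KEY.get(idempotency_key)
    if task_id is None:  # a retry with the same key reuses the existing task
        task_id = str(uuid.uuid4())
        TASKS[task_id] = {"status": "pending", "params": params, "result": None}
        TASK_BY_KEY[idempotency_key] = task_id
    body = {"task_id": task_id, "status": TASKS[task_id]["status"]}
    return 202, {"Location": f"/tasks/{task_id}"}, body


def run_task(task_id: str) -> None:
    """Stand-in for a background worker; skips tasks that were cancelled."""
    task = TASKS[task_id]
    if task["status"] == "cancelled":
        return
    task["status"] = "completed"
    task["result"] = {"report_url": f"/reports/{task_id}.pdf"}


def get_task(task_id: str):
    """Poll endpoint: pending tasks carry a Retry-After hint."""
    task = TASKS.get(task_id)
    if task is None:
        return 404, {}, {}
    headers = {"Retry-After": "5"} if task["status"] == "pending" else {}
    return 200, headers, {"status": task["status"], "result": task["result"]}


def cancel_task(task_id: str) -> int:
    task = TASKS.get(task_id)
    if task is None:
        return 404
    if task["status"] != "pending":
        return 409  # too late to cancel
    task["status"] = "cancelled"
    return 200
```

Task expiry is left out to keep the sketch short; a TTL sweep that deletes old entries and answers `410 Gone` afterwards would slot into `get_task`.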
Operation resource schemas.

Red flag answer: “Just make the endpoint synchronous with a longer timeout.” This blocks threads, wastes resources, and fails at scale. Or designing the async pattern without considering cancellation, expiry, or error states.

Follow-up questions:

Q: You have a report generation endpoint that takes 2-30 minutes. Some users call it 50 times a day. How do you design for efficiency?
(1) Deduplicate identical requests: hash the input parameters and return a cached result if the same report was generated recently. (2) Queue with priority: use separate high/low priority queues (for example in SQS, which has no built-in message priority) or RabbitMQ priority queues, so that small reports do not get blocked behind large ones. (3) Progressive delivery: if possible, stream partial results as they are available. (4) Resource limits: cap concurrent tasks per user (e.g., 3 concurrent report generations) and return 429 with a clear message. (5) Pre-computation: for popular reports, generate them on a schedule (cron) and serve cached results.

Q: The polling pattern wastes bandwidth. How would you design a push-based alternative?
Implement webhooks with a retry policy. The client registers a callback URL: `POST /webhooks {"url": "https://client.com/callback", "events": ["task.completed", "task.failed"]}`. When the task completes, your system POSTs the result to the callback URL. Implement exponential backoff retries (1s, 2s, 4s, 8s… up to 5 attempts) with a dead letter queue for permanently failed deliveries. Sign the webhook payload with HMAC and send the signature in a header (`X-Webhook-Signature`) so the client can verify authenticity. Stripe, GitHub, and Shopify all use this pattern.

2. Microservices Architecture
11. Monolith vs Microservices
| Aspect | Monolith | Microservices |
|---|---|---|
| Deployment | Single artifact (WAR/JAR/binary) | Independent per-service deployments |
| Scaling | Vertical or clone the whole app | Scale individual services independently |
| Team structure | One team, shared codebase | Team-per-service (Conway’s Law) |
| Data | Single shared database | Database-per-service |
| Complexity | In the code (big codebase) | In the infrastructure (networking, orchestration) |
| Debugging | Stack trace, single process | Distributed tracing across services |
| Transactions | ACID (simple) | Sagas, eventual consistency (complex) |
| Latency | In-process function calls (nanoseconds) | Network calls (milliseconds) |
| Ideal for | Small teams (< 10 devs), new products, startups | Large organizations, high-scale systems, independent team scaling |
- Start monolith, extract later. Shopify is a 3M+ line Ruby monolith serving $200B+ in GMV. Basecamp runs on a monolith. The “microservices first” approach has killed startups by drowning small teams in operational complexity before finding product-market fit.
- The modular monolith is often the right middle ground — a single deployment unit with strict module boundaries, separate databases per module, and well-defined internal APIs. When a module needs independent scaling, extract it.
- Microservices are an organizational scaling strategy, not a technical one. You adopt them when you have enough teams that deploy independence and service ownership reduce coordination costs. Amazon’s “two-pizza team” rule was the driver, not a technical requirement.
- The hidden cost: Microservices require: service mesh or API gateway, distributed tracing (Jaeger/Zipkin), centralized logging (ELK/Datadog), container orchestration (Kubernetes), CI/CD per service, contract testing, saga orchestration. This is easily 2-3 dedicated platform engineers just to maintain the infrastructure.
12. Database per Service
- Loose coupling — Services can change their schema without coordinating with other teams. If the User service switches from PostgreSQL to MongoDB, no other service is affected.
- Independent scaling — The Search service can use Elasticsearch while the Order service uses PostgreSQL. Each database is sized for its own workload.
- Encapsulation — The service controls its data invariants. No other service can put the data in an inconsistent state by writing directly.
- Cross-service queries / Reporting:
  - You cannot `JOIN users ON orders.user_id = users.id` across two databases.
  - Solutions: (a) API Composition: the caller queries both services and joins in memory. Works for simple cases but creates latency and coupling. (b) CQRS read store: publish events to build a denormalized read model (Elasticsearch, data warehouse) that contains pre-joined data. (c) Data lake / ETL: for analytics, replicate data into a warehouse (Snowflake, BigQuery) where analysts can query freely.
- Distributed transactions:
  - You cannot use a single database transaction across two services.
  - Solution: Sagas (covered in Q15). Accept eventual consistency where possible.
- Data duplication:
  - The Order service needs the user’s name for the invoice. Should it call the User service on every request?
  - Pragmatic approach: Store a denormalized copy of frequently needed data (user name, email) in the Order service, updated via events. Accept that it may be slightly stale (eventual consistency).
- Referential integrity:
  - No foreign keys across services. If the User service deletes a user, the Order service still has orders referencing that user ID. Handle with soft deletes, event-driven cleanup, or orphan detection jobs.
PENDING state and publishes an OrderCreated event. The Inventory service reserves the items and publishes InventoryReserved. The Order service then confirms the order. If the Inventory service fails, it publishes InventoryReservationFailed, and the Order service executes a compensating transaction (cancels the order). Use the Outbox pattern (write the event to a local outbox table in the same transaction as the state change, then publish from the outbox) to avoid the dual-write problem where the DB write succeeds but the event publish fails.
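The Outbox pattern can be sketched with a single local database. This is a minimal illustration using SQLite, not a production relay: the table layout, `create_order`, `relay_outbox`, and the `publish` callback are all hypothetical names chosen for the example.

```python
import json
import sqlite3


def create_order(conn, order_id):
    """Insert the order and its OrderCreated event in ONE local transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PENDING')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderCreated", json.dumps({"order_id": order_id})),
        )


def relay_outbox(conn, publish):
    """Publish unsent outbox rows; mark a row sent only after publish returns."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, the row stays unsent
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "event_type TEXT, payload TEXT, sent INTEGER DEFAULT 0)"
)

published = []  # stands in for a message broker in this sketch
create_order(conn, "o-1")
relay_outbox(conn, lambda event_type, payload: published.append((event_type, payload)))
```

If the relay crashes after publishing but before marking the row as sent, the event is published again on the next pass, so this gives at-least-once delivery and consumers still need to be idempotent.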
13. API Gateway Pattern
- Request routing — Routes `/users/*` to the User service, `/orders/*` to the Order service. Can route based on path, headers, or query parameters.
- Authentication & Authorization — Validates JWT tokens, API keys, or OAuth tokens before the request reaches any backend service. Centralizes auth logic.
- Rate limiting & Throttling — Enforces per-client, per-endpoint, or per-tier rate limits. Returns `429` with `Retry-After`.
- SSL/TLS termination — Handles HTTPS at the edge. Internal traffic can use plain HTTP or mTLS.
- Request/Response transformation — Translates between external and internal formats (e.g., REST to gRPC, JSON to Protobuf).
- Request aggregation — Combines multiple backend calls into a single client response (reduces mobile network round-trips).
- Caching — Caches GET responses at the gateway layer. Reduces load on backend services.
- Observability — Logs all requests, adds trace IDs, emits metrics (latency, error rates, request counts per endpoint).
- Kong — Open-source, plugin-based, Lua/Go. Strong plugin ecosystem.
- AWS API Gateway — Managed, serverless-friendly, integrates with Lambda.
- Envoy — High-performance C++ proxy, foundation of Istio service mesh.
- Traefik — Cloud-native, auto-discovers services in Docker/K8s.
- NGINX / HAProxy — Traditional reverse proxies that can function as lightweight gateways.
- Apigee (Google) — Enterprise API management with developer portal, monetization.
if (userType === 'premium') in the gateway, that logic belongs in a backend service. (2) Horizontal scaling — gateways should be stateless so you can run 10+ instances behind a load balancer. (3) Async where possible — use non-blocking I/O (Envoy, Kong, NGINX are all event-loop based). (4) Caching — cache auth token validation results (short TTL) and GET responses. (5) Monitor gateway latency as a first-class SLI — if p99 gateway overhead exceeds 10ms, investigate.
14. Service Registry & Discovery
- Each service registers itself with a Service Registry (Consul, Eureka, etcd, ZooKeeper) on startup with its IP, port, and health check URL.
- When Service A needs to call Service B, it queries the registry for healthy instances of Service B.
- Service A’s client-side load balancer (Ribbon in Spring Cloud) picks an instance and makes the call directly.
- Pro: No extra network hop through a load balancer. Lower latency.
- Con: Every service needs a discovery client library. Language-coupling (Java-centric in Spring Cloud ecosystem).
- Services register with the registry (or are automatically registered).
- Service A calls a load balancer / reverse proxy (NGINX, Envoy, AWS ALB).
- The load balancer queries the registry and forwards the request to a healthy instance.
- Pro: Services are dumb — they just make HTTP calls to a known address. Language-agnostic.
- Con: Extra network hop. Load balancer is a potential bottleneck.
- Kubernetes provides built-in server-side discovery. Each Service object gets a DNS name: `user-service.default.svc.cluster.local`.
- kube-proxy handles load balancing across healthy pods.
- No external registry needed. This is why Consul/Eureka usage has declined in K8s-native environments.
- Gotcha: K8s DNS uses TTLs. If a pod dies and a new one starts, DNS may briefly resolve to the dead pod. Solutions: shorter DNS TTL, readiness probes, or using headless services with client-side load balancing.
/health or /readyz). (3) The registry deregisters instances that miss multiple heartbeats. Consul uses both TTL-based and script-based health checks.
Red flag answer: “Just hardcode the IP addresses” or not knowing how Kubernetes handles service discovery.
Follow-up questions:
Q: Your service calls another service via Consul discovery. The target service deploys a new version, and for 30 seconds, some requests fail. What happened?
During deployment, old instances deregister and new instances register. If the new instances have not passed their health checks yet (startup time + initial health check interval), there is a window where the registry has fewer healthy instances — or none, causing failures. Solutions: (1) Use rolling deployments with readiness probes — new pods only receive traffic after health checks pass. (2) Increase the minimum number of healthy instances during deployment. (3) Use blue/green deployment so the old version keeps serving until the new version is fully healthy. (4) Client-side retry with a different instance on failure.
Q: How does service discovery interact with a service mesh like Istio?
In Istio, the sidecar proxy (Envoy) handles all service discovery transparently. Your application code does not need a discovery client — it makes calls to http://user-service:8080 and the Envoy sidecar intercepts the call, resolves the destination via Istio’s control plane (istiod, formerly Pilot, which watches the K8s API server for endpoint changes), applies load balancing, retries, circuit breaking, and mTLS — all without any application code changes. This is the “service mesh” approach to discovery: push all networking concerns into the infrastructure layer.
15. Saga Pattern (Distributed Transactions)
- Order Service: Create order (status: PENDING)
- Payment Service: Charge customer’s credit card
- Inventory Service: Reserve items
- Shipping Service: Schedule delivery
- Compensate step 2: Refund the credit card
- Compensate step 1: Cancel the order (status: CANCELLED)
- Each service publishes events after completing its step. The next service reacts to that event.
- Order Service publishes `OrderCreated` -> Payment Service listens, charges card, publishes `PaymentCompleted` -> Inventory Service listens, reserves items, publishes `InventoryReserved` -> etc.
- Pro: No single coordinator. Services are loosely coupled. Simple for 3-4 step sagas.
- Con: Hard to track the overall saga state. Debugging requires correlating events across services. Difficult to reason about as the number of steps grows. Cyclic event dependencies can emerge.
- A Saga Orchestrator (dedicated service or state machine) tells each service what to do and handles responses.
- Orchestrator sends `ChargeCard` command to Payment Service, waits for response, then sends `ReserveInventory` to Inventory Service, etc.
- Pro: Clear flow. Easy to debug and monitor. Handles complex branching logic (if payment type is X, skip step Y).
- Con: Orchestrator is a single point of failure and a coupling point. Can become a “god service” if not carefully scoped.
- Idempotency of compensations — Compensating transactions must be idempotent. If the “refund” message is delivered twice (at-least-once delivery), it should not refund twice.
- The Outbox Pattern — Write the event/command to a local `outbox` table in the same database transaction as the state change. A separate process (Debezium, polling publisher) reads the outbox and publishes to the message broker. This prevents the dual-write problem (DB commits but message publish fails).
- Semantic vs. technical compensation — Refunding a charge is not the same as “undoing” it. The customer sees two transactions on their statement. Some operations are not reversible (email sent, physical goods shipped). Design compensations for what is business-meaningful, not technically perfect.
- Timeout handling — If a service does not respond, the orchestrator must decide: retry, compensate, or alert for manual intervention. Use a dead letter queue for permanently failed steps.
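The orchestration flow and its compensations can be sketched in a few lines. This is a simplified, in-process illustration under stated assumptions: `run_saga`, the step names, and the no-op lambdas are hypothetical, and a real orchestrator would persist saga state and talk to remote services.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    On the first failure, run the compensations of already-completed
    steps in reverse order and report the saga as failed.
    """
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()  # compensations should be idempotent in a real system
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log


def reserve_items():
    raise RuntimeError("out of stock")  # simulated failure at step 3


steps = [
    ("create_order", lambda: None, lambda: None),
    ("charge_card", lambda: None, lambda: None),
    ("reserve_items", reserve_items, lambda: None),
    ("schedule_delivery", lambda: None, lambda: None),
]
ok, log = run_saga(steps)
```

With the simulated inventory failure, the log records the charge being compensated before the order, which mirrors the "compensate step 2, then step 1" sequence listed earlier.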
16. CQRS (Command Query Responsibility Segregation)
- Write side — Receives commands (`CreateOrder`, `UpdateAddress`). Validates business rules. Writes to a normalized, consistent database (PostgreSQL, DynamoDB).
- Sync mechanism — Changes are propagated to the read side via events (Kafka, RabbitMQ) or Change Data Capture (Debezium watching the write DB’s WAL).
- Read side — Maintains denormalized, query-optimized projections. Could be Elasticsearch (full-text search), Redis (fast lookups), a materialized view in PostgreSQL, or a dedicated read replica.
- Read/write ratio is heavily skewed — Most apps are 90%+ reads. Optimize the read model for query patterns without compromising write consistency.
- Different read patterns — The search API needs Elasticsearch, the dashboard needs pre-aggregated analytics, the detail page needs a document store. Each is a different “projection” of the same data.
- High-scale systems — Scale read replicas independently of the write database. Netflix serves billions of reads from cached/denormalized stores while writes go to a consistent master.
- Complex domain models — When the write model (rich domain objects with business rules) is very different from what clients need to read (flat DTOs).
- Simple CRUD apps with basic query patterns.
- Small teams where the operational complexity of maintaining two models and a sync mechanism is not justified.
- When strong consistency is required everywhere (no tolerance for eventual consistency on reads).
17. Event Sourcing
account_balance = 500), you store every event that ever happened to it: 0 + 1000 - 300 - 200 = 500.
Key characteristics:
- Events are immutable — Once written, never modified or deleted. This is an append-only log.
- Complete audit trail — You know not just the current state but every change that ever happened, when, and why.
- Temporal queries — “What was the account balance at 3 PM yesterday?” Replay events up to that timestamp.
- Event replay — If you add a new projection (e.g., a fraud detection model), you can replay all historical events through it to backfill the data.
- Event schema evolution — Events are stored forever. What happens when you need to add a field to `MoneyDeposited`? You need event versioning and upcasting (transforming old events to new schemas on read).
- Replay performance — An entity with 1 million events takes a long time to replay. Solution: Snapshots — periodically store the current state (e.g., every 100 events). Replay only events after the snapshot.
- Eventual consistency — Projections (read models) are rebuilt from the event stream and may lag behind the write model.
- Complexity — Developers must think in events, not state. Debugging requires replaying event sequences. This is a fundamental mindset shift.
- Storage growth — Events accumulate forever. For high-volume systems, this can be significant. Use event archiving strategies (move old events to cold storage).
- Financial systems — Regulatory requirement for full audit trails. Banks, trading platforms, insurance.
- Collaborative editing — Google Docs stores operations (events), not document state. Enables undo/redo and conflict resolution.
- Gaming — Replay systems (watch a game replay) are event sourcing.
- Compliance-heavy domains — Healthcare, legal, government where you must prove what happened and when.
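Replay and snapshots can be illustrated with the account example (0 + 1000 - 300 - 200 = 500). This is a minimal sketch: the `Event` class, the `"deposited"`/`"withdrawn"` kinds, and the `(balance, version)` snapshot tuple are assumptions made for the example, not a prescribed event store design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: events are immutable once created
class Event:
    kind: str    # "deposited" or "withdrawn" (hypothetical event kinds)
    amount: int


def apply(balance, event):
    """Fold a single event into the current balance."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, snapshot=(0, 0)):
    """Rebuild state from events.

    snapshot is (balance, number_of_events_already_applied); only the
    events after that position are replayed.
    """
    balance, version = snapshot
    for event in events[version:]:
        balance = apply(balance, event)
    return balance


events = [Event("deposited", 1000), Event("withdrawn", 300), Event("withdrawn", 200)]
full = replay(events)                    # replays all three events
snapshot = (replay(events[:2]), 2)       # state captured after two events
from_snapshot = replay(events, snapshot) # replays only the third event
```

Both paths arrive at the same balance; the snapshot only shortens how many events need to be read.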
DepositCorrected { original_event_id: "xyz", corrected_amount: 500, reason: "Data entry error" }. The projection logic handles the correction. You NEVER delete or modify existing events — that would break the audit trail and any downstream consumers that have already processed the original event. In extreme cases (GDPR right to erasure), use “crypto-shredding” — encrypt events with a per-user key and destroy the key when erasure is required, making the events unreadable.
18. Circuit Breaker
- Closed (normal) — Requests flow through. Failures are counted. If failure count exceeds threshold within a time window (e.g., 5 failures in 10 seconds), transition to Open.
- Open (tripped) — All requests immediately fail without calling the downstream service. Returns a fallback response or error. After a timeout period (e.g., 30 seconds), transition to Half-Open.
- Half-Open (testing) — Allow a limited number of requests through (e.g., 1-3). If they succeed, transition back to Closed. If they fail, transition back to Open.
- 5xx responses from downstream
- Connection timeouts (TCP connect timeout, typically 1-3 seconds)
- Read timeouts (no response within expected time, typically 5-30 seconds)
- NOT 4xx responses — those indicate client errors, not downstream health issues (exception: 429 may warrant circuit breaking)
- Return cached data (stale but available)
- Return a default/degraded response (e.g., show a generic product recommendation instead of personalized ones)
- Return an error with a meaningful message (`503 Service Unavailable`, please retry)
- Queue the request for later processing (if the operation is not time-sensitive)
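The three states and the fallback behavior can be sketched as a small state machine. This is a simplified illustration, not a library API: `CircuitBreaker` and its parameters are hypothetical, it counts consecutive failures instead of failures within a time window, and it allows a single trial call in the half-open state.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()   # fail fast: the downstream is not called
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = self.clock()
            return fallback()
        # Success closes the breaker and resets the failure count.
        self.state = "closed"
        self.failures = 0
        return result
```

A caller would wrap each downstream request, for example `breaker.call(fetch_recommendations, lambda: DEFAULT_RECOMMENDATIONS)`, where both names are placeholders for your own functions and data.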
19. Strangler Fig Pattern
- Place a facade (API Gateway/Reverse Proxy) in front of the monolith. All traffic routes through this facade. Initially, 100% of requests go to the monolith. This is zero-risk — behavior does not change.
- Identify the first module to extract. Choose based on: independent business domain, high change frequency, distinct scaling needs, clean data boundaries. Usually a peripheral feature (notifications, search) rather than core domain logic.
- Build the new service. Implement the same functionality in the new microservice. Write the same API contract (or an improved version with backwards compatibility).
-
Route traffic gradually.
- Start with 1% canary traffic (shadow mode: send to both, compare responses).
- Increase to 10%, 50%, 100% as confidence grows.
- Use feature flags or gateway routing rules to control traffic splitting.
- Decommission the old module. Once 100% of traffic goes to the new service and the old code path has been inactive for a safe period (2-4 weeks), remove the old code from the monolith.
- Repeat for the next module.
- Parallel run / Shadow testing — Run both old and new implementations simultaneously, compare outputs, and log discrepancies. Do this before routing real traffic to the new service.
- Shared data migration — The hardest part. If the old module read/wrote to the monolith’s shared database, the new service needs its own database. Options: dual-write during transition (risky), event-based sync, or database replication with a cutover.
- Strangling the database — Often harder than strangling the code. Use views or database triggers to redirect reads/writes during the transition.
- Timeline — Real migrations take 12-24 months for a medium monolith. Amazon’s migration took years. Be patient.
20. Bulkhead Pattern
-
Thread pool isolation:
- Instead of one shared thread pool for all outgoing HTTP calls, create separate pools per downstream service.
- Service A gets 20 threads, Service B gets 30 threads, Service C gets 10 threads.
- If Service A is slow and saturates its 20 threads, Services B and C are unaffected — their pools are independent.
- Netflix Hystrix popularized this approach.
-
Connection pool isolation:
- Separate database connection pools for different operations (reads vs writes, critical vs non-critical).
- If a batch reporting query exhausts its connection pool, the user-facing API still has connections available.
-
Semaphore isolation:
- Lighter than thread pools. Uses a counter to limit concurrent calls to a resource.
- If the semaphore limit is 10 and 10 calls are in-flight, the 11th call fails immediately.
- Lower overhead than thread pools (no context switching) but no timeout control.
-
Infrastructure-level isolation:
- Deploy critical services on dedicated nodes (Kubernetes node affinity/taints).
- Separate Kubernetes namespaces with resource quotas (CPU, memory limits).
- Separate database instances for critical vs non-critical services.
- Multi-AZ deployment with per-AZ isolation.
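Semaphore isolation, as described in the list above, is the easiest variant to show in code. The sketch below is an assumption-laden illustration: `Bulkhead` is a hypothetical class name, and it rejects excess calls immediately instead of queueing them.

```python
import threading


class Bulkhead:
    """Limit concurrent calls to one downstream dependency."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args):
        # Non-blocking acquire: when all slots are taken, fail immediately
        # instead of tying up the calling thread.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full")
        try:
            return fn(*args)
        finally:
            self._slots.release()
```

One `Bulkhead` instance per downstream service keeps a slow dependency from consuming capacity reserved for the others. As noted above, a semaphore gives no timeout control on its own, so the wrapped call still needs its own timeout.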
pool_size = (requests_per_second * p99_latency_seconds) + buffer. Example: if Service A receives 100 RPS and p99 latency is 200ms, you need 100 * 0.2 = 20 threads + 30% buffer = 26 threads.
Production example: An e-commerce platform had a shared thread pool of 200 threads for all downstream calls. The payment service had a transient issue causing 5-second timeouts. All 200 threads were blocked waiting for payment responses, meaning product search, inventory checks, and user authentication all failed. After implementing bulkheads — 50 threads for payment, 50 for product, 50 for inventory, 50 for auth — a payment service issue only affected checkout, while browsing continued normally.
Red flag answer: Describing bulkheads only as “separate thread pools” without discussing connection pools, infrastructure isolation, or how to size the pools.
Follow-up questions:
Q: How do bulkheads interact with circuit breakers?
They are complementary. The bulkhead limits resource consumption (prevents thread exhaustion). The circuit breaker prevents making futile calls (stops calling a known-failing service). Together: the bulkhead ensures that even if the circuit breaker is not yet tripped (failure threshold not reached), the slow service cannot consume more than its allocated resources. Think of bulkheads as preventive (limit blast radius) and circuit breakers as reactive (stop bleeding when failure is detected).
Q: You have a Kubernetes cluster with 10 services. How do you apply the bulkhead pattern at the infrastructure level?
(1) Set resource requests and limits on every pod (resources.requests.cpu: 500m, resources.limits.cpu: 1000m). This prevents one service from consuming all cluster CPU. (2) Use ResourceQuotas per namespace to cap total resource consumption per team. (3) Use PodDisruptionBudgets to ensure minimum availability during node maintenance. (4) For critical services, use node taints and tolerations to dedicate nodes. (5) Use PriorityClasses so that critical pods evict non-critical pods under resource pressure, not the other way around.
3. Communication Protocols
21. gRPC vs REST
| Aspect | gRPC | REST |
|---|---|---|
| Transport | HTTP/2 (required) | HTTP/1.1 or HTTP/2 |
| Serialization | Protocol Buffers (binary) | JSON (text), XML |
| Schema | Strict .proto contract (required) | Optional (OpenAPI) |
| Streaming | Bidirectional, server, client streaming | Request/response only (SSE for server push) |
| Code generation | Auto-generated clients/servers in 10+ languages | Manual or OpenAPI codegen |
| Browser support | Limited (requires gRPC-Web proxy like Envoy) | Native |
| Performance | 5-10x faster serialization, 30-50% smaller payloads | Human-readable, easier to debug |
| Load balancing | Requires L7 LB (Envoy) — HTTP/2 multiplexes on one TCP connection | Standard L4/L7 LB works |
| Debugging | Requires special tools (grpcurl, BloomRPC, Postman) | curl, browser, any HTTP client |
- Internal service-to-service communication where performance matters and both sides are controlled by your org.
- Streaming use cases — real-time data feeds, event streams, chat backends.
- Polyglot environments — auto-generated clients eliminate hand-written HTTP client code in each language.
- High-throughput systems — at thousands of RPS, the serialization savings compound. Google reports saving significant CPU by using Protobuf internally.
- Public/External APIs — every developer knows HTTP + JSON. No learning curve.
- Browser-facing — gRPC does not work natively in browsers (gRPC-Web requires a proxy).
- Simple CRUD — the overhead of Protobuf schemas, code generation, and HTTP/2 infrastructure is not justified.
.proto files. Use gRPC-Gateway (Go), grpc-spring (Java), or similar libraries to auto-generate a REST/JSON reverse proxy that translates HTTP/JSON requests into gRPC calls. The business logic lives in the gRPC service implementation. External clients hit the REST gateway, internal services call gRPC directly. Google Cloud APIs use this pattern — every API has both REST and gRPC interfaces generated from the same .proto definitions.
Q: You switch from REST to gRPC for an internal service and latency increases. What went wrong?
Most likely causes: (1) TLS handshake overhead — gRPC runs over HTTP/2 and is usually deployed with TLS (plaintext HTTP/2 is possible but less common). If you were using unencrypted HTTP/1.1 internally and enabled TLS as part of the switch, the handshake adds latency to the first connection. Solution: use connection pooling and keep-alives. (2) Protobuf serialization of very small payloads may be comparable to JSON. The performance benefit shows at scale, not for individual small messages. (3) DNS resolution overhead — gRPC’s built-in name resolver may behave differently than your HTTP client’s DNS caching. (4) Missing connection reuse — the gRPC client is not reusing channels (connections), establishing a new TCP + TLS + HTTP/2 handshake per call.
22. GraphQL vs REST
| Aspect | GraphQL | REST |
|---|---|---|
| Endpoint | Single (/graphql) | Multiple (/users, /orders) |
| Data fetching | Client specifies exact fields needed | Server determines response shape |
| Over-fetching | Eliminated (get only what you asked for) | Common (fixed response includes all fields) |
| Under-fetching | Eliminated (one query can traverse relationships) | Common (need multiple requests for related data) |
| Versioning | Typically no versioning (deprecate fields) | URL or header versioning |
| Caching | Complex (POST requests, dynamic queries) | Simple (HTTP caching on GET URLs) |
| File uploads | Not natively supported (multipart workaround) | Standard multipart form data |
| Error handling | Always returns 200 with errors array | HTTP status codes |
| N+1 problem | DataLoader pattern required | Does not apply (server controls queries) |
| Rate limiting | Difficult (query complexity varies wildly) | Simple (per-endpoint limits) |
- Mobile apps with constrained bandwidth — fetch exactly what the screen needs in one request.
- Aggregation layers (BFF) — one GraphQL server in front of multiple REST microservices.
- Rapid frontend iteration — frontend teams can change data requirements without backend changes.
- Complex, interconnected data — social graphs, e-commerce catalogs with deep relationships.
- Caching — REST endpoints are trivially cacheable (CDN caches `GET /users/42`). GraphQL uses POST for queries, making CDN caching hard. Solutions: persisted queries (hash the query, cache by hash), Apollo cache.
- Security — A malicious query can request deeply nested data: `{ users { posts { comments { author { posts { comments } } } } } }`. Without query depth limiting and cost analysis, this can DoS your database. Tools: graphql-depth-limit, graphql-cost-analysis.
- N+1 queries — A naive GraphQL resolver fetches related data one record at a time. If a query returns 100 users and each needs their orders, that is 1 query for users + 100 queries for orders = 101 DB queries. DataLoader (Facebook’s library) batches these into 2 queries.
- Monitoring/Observability — All requests go to `POST /graphql` with status 200. Traditional HTTP monitoring (error rates by endpoint) does not work. You need GraphQL-aware monitoring (Apollo Studio, GraphQL Hive).
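The N+1 problem and the batching idea behind DataLoader can be shown with a counting stub. This is not the DataLoader API: `FakeDB`, `resolve_naive`, and `resolve_batched` are hypothetical names, and the sketch counts only the order lookups (the initial users query is left out).

```python
class FakeDB:
    """In-memory stand-in for a database that counts queries."""

    def __init__(self, orders_by_user):
        self.orders_by_user = orders_by_user
        self.queries = 0

    def orders_for_user(self, user_id):
        self.queries += 1  # one query per user
        return self.orders_by_user.get(user_id, [])

    def orders_for_users(self, user_ids):
        self.queries += 1  # one "WHERE user_id IN (...)" style query
        return {uid: self.orders_by_user.get(uid, []) for uid in user_ids}


def resolve_naive(db, user_ids):
    # N lookups: what a per-record resolver does.
    return {uid: db.orders_for_user(uid) for uid in user_ids}


def resolve_batched(db, user_ids):
    # Collect the keys first, deduplicate, then issue a single lookup.
    unique_ids = list(dict.fromkeys(user_ids))
    return db.orders_for_users(unique_ids)
```

For 100 users the naive resolver issues 100 order queries and the batched one issues 1, which together with the users query gives the 101 versus 2 figures above.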
23. Webhooks
- Registration — Consumer registers a callback URL and specifies event types:
- Event occurrence — An order is created on the provider’s side.
- Delivery — Provider sends:
- Acknowledgment — Consumer returns `200 OK` within a timeout (typically 5-30 seconds).
- Security (HMAC signatures) — Sign the payload with a shared secret. Consumer verifies the signature before processing. Without this, anyone can POST to the callback URL. Stripe uses `HMAC-SHA256`; GitHub uses `HMAC-SHA256` with a configurable secret.
- Retry policy — If the consumer returns a non-2xx or times out, retry with exponential backoff: 1 min, 5 min, 30 min, 2 hours, 24 hours. After max retries (typically 5-10), disable the webhook and notify the consumer.
- Idempotency — Webhooks may be delivered more than once (retry after timeout, network glitch). Include a unique event ID (`X-Webhook-ID`). Consumers must check for duplicate event IDs and skip already-processed events.
- Ordering — Webhook delivery order is NOT guaranteed. Events may arrive out of order, especially with retries. Include a timestamp or sequence number. Consumers should use the timestamp or sequence to resolve conflicts, not assume ordering.
- Timeout — If the consumer takes 60 seconds to process, the provider’s HTTP client times out and retries, causing duplicate processing. Best practice: consumer immediately returns `200 OK` and processes the event asynchronously (enqueue to a local job queue).
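Signature verification and duplicate detection on the consumer side can be sketched with the standard library. The `sign`, `verify`, and `WebhookConsumer` names are hypothetical, the signature is assumed to be a hex-encoded HMAC-SHA256 of the raw request body, and the in-memory set stands in for a durable store of processed event IDs.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body, as the provider would compute it."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison avoids leaking how many characters matched."""
    return hmac.compare_digest(sign(secret, body), received_signature)


class WebhookConsumer:
    def __init__(self, secret: bytes):
        self.secret = secret
        self.seen_event_ids = set()  # in production: a durable store with a TTL
        self.processed = []

    def handle(self, event_id: str, body: bytes, signature: str) -> int:
        if not verify(self.secret, body, signature):
            return 401               # reject unsigned or tampered payloads
        if event_id in self.seen_event_ids:
            return 200               # duplicate delivery: acknowledge, skip work
        self.seen_event_ids.add(event_id)
        self.processed.append(body)  # real code would enqueue for async processing
        return 200
```

Verification must run against the exact bytes received; re-serializing parsed JSON before hashing usually changes the bytes and breaks the signature check.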
GET /events?since=2024-01-15T04:00:00Z. (5) The provider stores events for a retention window (7-30 days) to support replay. Stripe and Shopify both provide event replay APIs for exactly this scenario.
Q: How do you prevent a slow webhook consumer from affecting your system’s throughput?
(1) Use a separate delivery queue (SQS, RabbitMQ) per consumer or per consumer tier. (2) Set aggressive timeouts (5-10 seconds) on the delivery HTTP call. (3) Implement per-consumer rate limiting on delivery attempts. (4) Use a circuit breaker on each consumer’s endpoint — if a consumer consistently fails, stop attempting delivery and notify them. (5) Never deliver webhooks synchronously in the main request path — always enqueue for async delivery.
24. Server-Sent Events (SSE)
- Client opens a connection: `GET /events` with `Accept: text/event-stream`.
- Server keeps the connection open and sends events as they occur:
- The `EventSource` browser API auto-reconnects on disconnection and sends `Last-Event-ID` header to resume from where it left off.
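The wire format the server writes onto the open connection is plain text: optional `id:` and `event:` fields, one or more `data:` lines, and a blank line to end each event. The helper below is a small sketch of that framing; `format_sse` is a hypothetical function name, not part of any framework.

```python
def format_sse(data, event=None, event_id=None):
    """Frame one Server-Sent Event as text for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")    # echoed back as Last-Event-ID on reconnect
    if event is not None:
        lines.append(f"event: {event}")    # event name the client listens for
    for line in data.split("\n"):
        lines.append(f"data: {line}")      # multi-line payloads need one data: line each
    return "\n".join(lines) + "\n\n"       # blank line terminates the event


frame = format_sse('{"price": 101.5}', event="tick", event_id=7)
```

A streaming handler would write each returned string to the response and flush it; how that is done depends on the web framework in use.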
| Aspect | SSE | WebSockets |
|---|---|---|
| Direction | Unidirectional (server to client) | Bidirectional |
| Protocol | Standard HTTP | Custom protocol (ws://) |
| Reconnection | Automatic (built into EventSource API) | Manual (must implement) |
| Data format | Text (UTF-8) only | Text or binary |
| HTTP/2 compatibility | Excellent (multiplexed with other requests) | Separate TCP connection |
| Load balancer / Proxy support | Works through standard HTTP infrastructure | Requires WebSocket-aware proxies |
| Browser support | All modern browsers (no IE) | All modern browsers |
| Max connections | ~6 per domain in HTTP/1.1; in HTTP/2, bounded by the negotiated concurrent-stream limit (commonly around 100) | No comparable per-domain limit |
- Live feeds — stock prices, news, social media feeds, notifications. Server pushes, client displays.
- Progress updates — long-running task progress, build logs, deployment status.
- Dashboard updates — metrics, monitoring data, real-time charts.
- The key signal: if the client only needs to receive data and never sends data through the same channel, SSE is simpler and more robust than WebSockets.
- Chat / Collaborative editing — requires bidirectional communication.
- Gaming — requires low-latency bidirectional binary data.
- Binary data — SSE is text-only (though you can Base64-encode, at a size penalty).
Last-Event-ID resumption — if the client disconnects (mobile network switch), it automatically reconnects and catches up on missed events. WebSockets would require implementing reconnection, resumption, and message buffering manually. SSE also works through corporate proxies and CDNs that may block WebSocket upgrades. The only reason to choose WebSockets here would be if the client also needs to send acknowledgments (read receipts) through the same channel.
Q: How does SSE scale to 100,000 concurrent connections?
Each SSE connection holds an open HTTP connection, consuming server resources (file descriptor, small memory per connection). At 100K connections: (1) Use an event-loop based server (Node.js, Go, Nginx push module) that handles many concurrent connections on few threads. (2) Fan out events via a pub/sub system (Redis Pub/Sub, Kafka) — the SSE server subscribes to event channels and pushes to connected clients. (3) Horizontally scale SSE servers behind a load balancer with sticky sessions or connection-aware routing. (4) With HTTP/2, multiple SSE streams multiplex over a single TCP connection, reducing resource usage. (5) Consider a managed service (AWS AppSync, Pusher, Ably) for very high connection counts.
25. WebSockets
ws:// or wss://).
Lifecycle:
- Handshake — Client sends HTTP `GET` with `Upgrade: websocket` header. Server responds with `101 Switching Protocols`.
- Communication — Both sides can send text or binary frames at any time. No request/response pattern. Extremely low overhead per message (2-14 bytes header vs ~700+ bytes for HTTP headers).
- Keep-alive — Ping/pong frames maintain the connection. If no pong received, connection is considered dead.
- Close — Either side sends a close frame. Clean shutdown with status codes.
- Real-time chat — bidirectional, low-latency message exchange.
- Multiplayer gaming — sub-10ms latency for game state updates.
- Collaborative editing — Google Docs-style real-time cursor and text sync.
- Financial trading — order book updates and trade execution.
- Live sports / auction — bidirectional interaction (bidding) with real-time updates.
- Scaling — Each WebSocket connection is stateful (tied to a specific server). If you have 10 servers and a user connects to server 3, their messages go through server 3. If server 3 crashes, the connection is lost. Solutions: Redis Pub/Sub or Kafka for cross-server message broadcasting, sticky sessions with graceful reconnection.
- Load balancing — Standard HTTP load balancers round-robin each request. WebSocket connections are long-lived, so the initial connection routing determines which server handles it for the duration. Use session affinity or an L7 load balancer that understands WebSocket upgrade.
- Authentication — WebSockets do not natively support headers after the handshake. Options: (a) authenticate during the HTTP upgrade handshake (check cookie/token), (b) send an auth message as the first frame after connection.
- Reconnection — Connections will drop (mobile network switches, server deploys). The client must implement reconnection logic with exponential backoff and state recovery (what messages did I miss?).
- Proxies and firewalls — Some corporate proxies terminate long-lived connections. `wss://` (WebSocket Secure) usually passes through because it looks like HTTPS.
26. Protocol Buffers (Protobuf)
- Define a schema in a `.proto` file:
- Run the `protoc` compiler to generate code in your target language (Go, Java, Python, C++, etc.).
- Serialize/deserialize using the generated code. The binary format uses field numbers (1, 2, 3) not field names — this is why renaming a field is free but changing its number breaks compatibility.
- Size — Protobuf messages are 30-80% smaller than equivalent JSON. Field names are not transmitted (only numbers). Integers use variable-length encoding (small numbers use fewer bytes).
- Speed — Binary parsing is 5-100x faster than JSON text parsing. No string-to-number conversion, no quote parsing, no whitespace handling.
- Schema enforcement — The `.proto` file is the contract. Both sides must agree on the schema. Type mismatches are caught at compile time, not runtime.
- Code generation — Strongly-typed serialization/deserialization code generated for every supported language. No hand-written JSON parsing.
- Safe: Add a new field (with a new field number). Old clients ignore it. New clients see default value from old messages.
- Safe: Remove a field (mark it `reserved` so the number is not reused). Old clients sending this field are fine — new code ignores it.
- Safe on the wire: Rename a field (field numbers are used in the binary format, not names). Generated code and the JSON mapping use field names, so callers and JSON clients still need updating.
- Dangerous: Change a field’s type (e.g., `int32` to `string`). Causes deserialization errors.
- Dangerous: Reuse a field number for a different purpose. Old data with that field number will be misinterpreted.
- Avro — Schema is sent with the data (or fetched from a schema registry). Better for big data pipelines (Kafka + Schema Registry). No field numbers needed.
- MessagePack — JSON-like but binary. No schema required. Simpler but less type-safe.
- FlatBuffers — Zero-copy deserialization (read fields directly from the buffer without parsing). Used in gaming and performance-critical mobile apps.
kafkacat with schema registry integration help). For low-volume internal messaging or rapid prototyping, JSON with a JSON Schema registry is simpler.
27. Message Queues (Async Communication)
- One message is consumed by exactly one consumer (competing consumers pattern).
- Once consumed, the message is removed from the queue.
- Tools: RabbitMQ, Amazon SQS, ActiveMQ.
- Use case: Task distribution — 10 worker pods each pulling jobs from the queue.
- Messages are appended to a durable, ordered log. Multiple consumer groups each read all messages independently.
- Messages persist for a configurable retention period (not removed on consumption).
- Tools: Apache Kafka, Amazon Kinesis, Redis Streams, Pulsar.
- Use case: Event bus — Order service publishes `OrderCreated`, and Payment, Inventory, Notifications, and Analytics services all consume it independently.
| Guarantee | Meaning | Implementation | Use case |
|---|---|---|---|
| At-most-once | Fire and forget. Message may be lost. | No acknowledgment. | Metrics, logs (acceptable loss) |
| At-least-once | Message delivered one or more times. Duplicates possible. | Consumer acknowledges after processing. Broker redelivers unacknowledged. | Most business use cases (with idempotent consumers) |
| Exactly-once | Message processed exactly once. | Transactional outbox + idempotent consumer, or Kafka’s transactional API. | Financial transactions, inventory |
- RabbitMQ — Smart broker, dumb consumer. Broker tracks which messages are delivered and acknowledged. Supports complex routing (topic exchange, headers exchange, dead letter exchange). Lower throughput (~50K messages/sec) but more flexible routing. Best for task queues, RPC-style messaging.
- Kafka — Dumb broker, smart consumer. Consumers track their offset (position in the log). Messages persist for days/weeks. Massive throughput (millions of messages/sec per cluster). Best for event streaming, log aggregation, real-time analytics.
- Dead Letter Queue (DLQ) — Messages that fail processing after max retries go to a DLQ for manual investigation. Without this, poison messages (that always fail) block the queue forever.
- Backpressure — When consumers cannot keep up, the queue grows. Monitor queue depth and consumer lag as key metrics. Alert when lag exceeds SLA.
- Message ordering — RabbitMQ guarantees ordering per queue. Kafka guarantees ordering per partition. To maintain order for a specific entity (all events for user 42), use a consistent partition key.
2 * expected_max_consumers partitions.
Q: A consumer processes a message and crashes before acknowledging it. What happens?
With at-least-once delivery (the standard): the broker (RabbitMQ) redelivers the message to another consumer, or (Kafka) the consumer group rebalances and another consumer picks up from the last committed offset. The message is processed again — potentially a duplicate. This is why consumers must be idempotent: use a unique message ID to detect duplicates, or design operations to be naturally idempotent (set a value, not increment it). In Kafka, the un-acknowledged messages between the last committed offset and the crash point (“uncommitted offset gap”) will all be reprocessed.
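The idempotent-consumer and dead-letter ideas can be sketched without any broker. The class below is an illustrative assumption, not a RabbitMQ or Kafka API: the in-memory set stands in for a durable dedup store, the list stands in for a DLQ, and the boolean return value stands in for ack/nack.

```python
class IdempotentConsumer:
    """Wraps a handler so that redelivered messages do not repeat side effects."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.processed_ids = set()  # stand-in for a durable dedup store
        self.attempts = {}          # failure count per message ID
        self.dead_letters = []      # stand-in for a dead letter queue

    def on_message(self, message_id, payload):
        """Return True to acknowledge the message, False to request redelivery."""
        if message_id in self.processed_ids:
            return True  # duplicate delivery: acknowledge without re-running the handler
        try:
            self.handler(payload)
        except Exception:
            self.attempts[message_id] = self.attempts.get(message_id, 0) + 1
            if self.attempts[message_id] >= self.max_attempts:
                # Poison message: park it and acknowledge so it stops blocking the queue.
                self.dead_letters.append((message_id, payload))
                return True
            return False
        self.processed_ids.add(message_id)
        return True
```

In a real system the dedup record and the handler's side effect should be committed together (for example in one database transaction). Otherwise a crash between the two steps reopens the duplicate window.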
28. HTTP/2 vs HTTP/1.1
- Multiplexing — Multiple requests and responses are interleaved over a single TCP connection. In HTTP/1.1, browsers open 6-8 parallel TCP connections per domain because each connection handles only one request at a time. HTTP/2 eliminates this — one connection carries hundreds of concurrent streams. This eliminates head-of-line blocking at the HTTP layer (but not at the TCP layer — see HTTP/3).
- Header Compression (HPACK) — HTTP headers are repetitive (same `Cookie`, `User-Agent`, `Accept` on every request). HPACK compresses headers using a static dictionary (common headers), dynamic table (previously seen headers), and Huffman encoding. Reduces header overhead by 30-90%. Critical for mobile where every byte matters.
- Server Push — Server proactively sends resources it knows the client will need (e.g., push CSS and JS along with the HTML response). In practice, this feature was rarely used correctly and is being deprecated in some contexts (Chrome removed support in 2022). Early hints (`103 Early Hints`) is the modern replacement.
- Stream Prioritization — Clients can assign priority to streams (e.g., CSS is more important than images). Servers can use this to optimize resource delivery order. In practice, browser implementations and server support vary, making this unreliable.
- Binary framing — HTTP/2 messages are split into binary frames, making parsing more efficient and eliminating the ambiguities of HTTP/1.1 text parsing.
- HTTP/2 typically reduces page load times by 15-30% for asset-heavy web pages.
- API-to-API communication sees less dramatic improvement unless making many parallel requests (gRPC benefits significantly).
- Connection setup is faster because one TCP + TLS handshake serves all requests (vs 6 handshakes for 6 connections).
29. HTTP/3 (QUIC)
- Faster handshake — TCP + TLS requires 2-3 round trips before data can flow. QUIC combines the transport and crypto handshake into 1 round trip. For repeat connections, 0-RTT resumption sends data in the very first packet (at the cost of replay attack risk for the 0-RTT data).
- Connection migration — QUIC connections are identified by a Connection ID, not the 4-tuple (source IP, source port, dest IP, dest port). When your phone switches from WiFi to cellular, the IP address changes, but the QUIC connection survives. TCP connections break on IP change.
- Built-in encryption — QUIC always encrypts (TLS 1.3 is mandatory). Even the packet headers are partially encrypted, making it resistant to middlebox interference.
- Userspace implementation — QUIC is implemented in user space (not the OS kernel like TCP). This enables faster iteration and deployment of improvements without waiting for OS updates.
- Google services (YouTube, Search) serve ~30% of traffic over QUIC.
- Cloudflare, Fastly, and Akamai support HTTP/3 on their CDNs.
- Chrome, Firefox, Safari, and Edge all support HTTP/3.
- NGINX and Caddy support HTTP/3. HAProxy support is experimental.
- Some corporate firewalls block UDP (they only allow TCP 80/443). HTTP/3 implementations fall back to HTTP/2 over TCP when UDP is blocked.
- Debugging is harder — existing TCP analysis tools (tcpdump, Wireshark) need QUIC-specific dissectors.
- CPU usage can be higher than TCP due to userspace processing and per-stream encryption.
30. Idempotency Keys
- Receive request. Extract `Idempotency-Key`.
- Check key-value store (Redis, DynamoDB) for the key.
- Key not found — Process the request. Store `{key: "550e...", response: {status: 200, body: ...}, created_at: now}` with a TTL (Stripe uses 24 hours). Return the response.
- Key found, request still processing — Return `409 Conflict` or `425 Too Early` to prevent concurrent duplicate execution.
- Key found, request completed — Return the stored response verbatim. No re-processing.
- Key storage must be atomic. Use `SET key value NX EX 86400` in Redis (set-if-not-exists with TTL). This prevents race conditions where two concurrent retries both see “key not found.”
- Store the full response, not just “processed.” The client expects the same response on retry, including the created resource ID.
- TTL decision: 24 hours covers any reasonable retry window. Shorter TTLs risk the client retrying after key expiry and creating a duplicate. Longer TTLs waste storage.
- Key scope: The key should be scoped to the API key or user. Two different users sending the same key should be treated as different requests.
- What happens if the request partially succeeds? Example: payment is charged but the response never reaches the client. On retry, the server sees the key with a stored response (success) and returns it. The client gets the confirmation it missed. Without idempotency keys, the client would retry and the payment would be charged twice.
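The flow above can be sketched with an in-memory store that mimics the set-if-not-exists behavior. Everything here is an illustrative assumption: `IdempotencyStore` and `handle` are hypothetical names, the lock-protected dict stands in for Redis `SET ... NX`, and TTL handling is left out.

```python
import threading


class IdempotencyStore:
    """In-memory stand-in for an atomic set-if-not-exists store (no TTL in this sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}

    def claim(self, key):
        """Atomically insert the key if absent. Only the first caller gets True."""
        with self._lock:
            if key in self._entries:
                return False
            self._entries[key] = {"state": "processing", "response": None}
            return True

    def complete(self, key, response):
        with self._lock:
            self._entries[key] = {"state": "done", "response": response}

    def get(self, key):
        with self._lock:
            return self._entries.get(key)


def handle(store, user_id, idempotency_key, process):
    """Run process() at most once per (user, key); replay the stored response on retries."""
    key = f"{user_id}:{idempotency_key}"  # scope the key to the caller
    if store.claim(key):
        response = process()
        store.complete(key, response)  # store the full response, not just a flag
        return response
    entry = store.get(key)
    if entry["state"] == "processing":
        return {"status": 409, "body": "request in progress"}
    return entry["response"]
```

One gap worth noting: if `process()` raises, the key stays in the `processing` state. A production version would need to release or expire the claim so the client can retry.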
`SETNX` (or `SET ... NX`) ensures only one instance processes a given key. All instances read/write the same Redis cluster. For higher availability, use DynamoDB with conditional writes (`attribute_not_exists(idempotency_key)`) or PostgreSQL with `INSERT ... ON CONFLICT DO NOTHING`. The choice depends on your existing infrastructure and latency requirements.
4. Security & Scaling
31. OAuth 2.0 Flows
- Authorization Code Flow (Server-side apps):
  - The most secure flow for applications with a backend server.
  - User is redirected to the auth server, logs in, approves, and the auth server redirects back with a short-lived `code`.
  - The backend server exchanges the `code` + `client_secret` for tokens (server-to-server, the secret never touches the browser).
  - Use for: traditional web apps (Rails, Django, Express).
- Authorization Code + PKCE (Mobile/SPA):
  - Same as above but the `client_secret` is replaced by a dynamically generated `code_verifier` + `code_challenge` (SHA-256 hash).
  - PKCE (Proof Key for Code Exchange) prevents authorization code interception attacks.
  - The OAuth 2.1 draft recommends PKCE for ALL clients, even server-side.
  - Use for: mobile apps, single-page applications, desktop apps — any public client that cannot safely store a secret.
- Client Credentials (Machine-to-Machine):
  - No user involved. The application authenticates itself with `client_id` + `client_secret`.
  - Returns an access token directly (no user interaction).
  - Use for: service-to-service communication, cron jobs, internal APIs.
- Device Authorization (Smart TVs, IoT):
  - Device displays a code. User goes to a URL on their phone, enters the code, and approves.
  - Device polls the auth server until approval.
  - Use for: devices without a browser or keyboard.
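As a small illustration of the PKCE step in the list above, the sketch below derives a `code_challenge` from a `code_verifier` using the S256 method (base64url of the SHA-256 digest, without padding). The function names are made up for this example. A real client would also send `code_challenge_method=S256` in the authorization request and the verifier in the token request.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes=32):
    """Random, URL-safe verifier; 32 random bytes become 43 base64url characters."""
    return base64.urlsafe_b64encode(secrets.token_bytes(n_bytes)).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier):
    """S256 challenge: base64url(SHA-256(verifier)) with the '=' padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client keeps the verifier private and sends only the challenge up front. An attacker who intercepts the authorization code cannot redeem it without the verifier.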
- Implicit — Returned tokens directly in the URL fragment. Vulnerable to token leakage via browser history and referrer headers. Replaced by Auth Code + PKCE.
- Resource Owner Password Credentials (ROPC) — Client collects username/password directly. Only for first-party apps migrating from legacy. Avoid.
- Access token — Short-lived (15 min - 1 hour). Sent with every API request.
- Refresh token — Long-lived (days to months). Used to get new access tokens without re-authenticating. Store securely (httpOnly cookie or secure storage).
32. OpenID Connect (OIDC)
- ID Token — A JWT that contains user identity claims (who they are).
- UserInfo endpoint — `GET /userinfo` with the access token returns additional user profile information.
- Standard scopes — `openid` (required), `profile` (name, picture), `email`, `address`, `phone`.
- Discovery — `GET /.well-known/openid-configuration` returns the provider’s endpoints, supported scopes, signing algorithms, etc. This enables auto-configuration of OIDC clients.
- Verify the signature (using the provider’s public key from JWKS endpoint).
- Check `iss` (issuer) matches the expected provider.
- Check `aud` (audience) matches your client ID.
- Check `exp` (expiration) is in the future.
- Check `iat` (issued at) is not too far in the past.
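The claim checks in this list could look roughly like the sketch below. It assumes the signature has already been verified by a JWT library and that `claims` is the decoded payload as a dict. The function name, the 10-minute `max_age_seconds`, and the 60-second `leeway` are illustrative choices, not values from any specification.

```python
import time


def validate_id_token_claims(claims, expected_issuer, client_id,
                             now=None, max_age_seconds=600, leeway=60):
    """Return a list of problems found in an already signature-verified ID token payload."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != expected_issuer:
        problems.append("iss mismatch")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]  # aud may be a string or a list
    if client_id not in audiences:
        problems.append("aud mismatch")
    if claims.get("exp", 0) <= now - leeway:
        problems.append("expired")
    if claims.get("iat", 0) < now - max_age_seconds:
        problems.append("iat too old")
    return problems
```

An empty list means the claims passed these checks. Signature verification against the JWKS keys is deliberately left to a dedicated library.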
aud claim is your client ID, not the API. The API would need to accept your client ID as a valid audience, which breaks the security model. (2) ID Tokens may contain sensitive PII (email, name) that should not be transmitted to every API. (3) The token lifetimes and validation rules are different. Always use access tokens for API calls.
33. API Rate Limiting
- Fixed Window — Count requests per fixed time window (e.g., 100 requests per minute, window resets at :00, :01, :02…). Simple but has a burst problem: a client can send 100 requests at 0:59 and 100 at 1:00 — 200 requests in 2 seconds.
- Sliding Window Log — Track the timestamp of each request. Count requests in the last 60 seconds from now. Smooth but expensive (stores every timestamp).
- Sliding Window Counter — Hybrid: combine the current fixed window count with a weighted previous window count. `rate = prev_count * (1 - elapsed/window_size) + current_count`. Good balance of accuracy and memory.
- Token Bucket — A bucket holds up to N tokens. Tokens are added at a fixed rate (e.g., 10/sec). Each request consumes a token. If the bucket is empty, the request is rejected. Allows controlled bursts (bucket fills up during idle periods). Used by AWS API Gateway and most CDNs.
- Leaky Bucket — Requests enter a queue (bucket) processed at a fixed rate. Excess requests overflow (rejected). Smooths bursty traffic into a constant rate. Used in network traffic shaping.
- Centralized counter (Redis) — `INCR` + `EXPIRE` on a key like `rate:user_123:2024-01-15T10:30`. All servers check Redis. Adds 1-2ms latency per request. Use Lua scripts for atomic check-and-increment.
- Local counter with sync — Each server maintains its own counter, periodically syncing with a central store. Allows brief over-limit bursts between syncs.
- Cell-based / Approximate — Each server gets `total_limit / num_servers`. Simple but brittle when servers have uneven traffic.
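Of the algorithms above, the token bucket is compact enough to sketch in full. This is a single-process illustration with a hypothetical `TokenBucket` class and an injectable clock so it can be tested. A distributed limiter would keep the same state in a shared store and update it atomically.

```python
import time


class TokenBucket:
    """Single-process token bucket: refills continuously, allows bursts up to capacity."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start full so an idle client may burst
        self.updated_at = clock()

    def allow(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call avoids a background timer: the bucket only needs the timestamp of the last update and the current token count.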
34. Backend for Frontend (BFF)
- Response shaping — Mobile BFF returns low-res image URLs. Web BFF returns high-res.
- Aggregation — Mobile BFF combines 3 microservice calls into 1 response (saves round trips). Web BFF can afford 3 parallel calls.
- Auth translation — Mobile BFF handles token refresh transparently. Web BFF manages session cookies. Public API uses API keys.
- Caching strategy — Mobile BFF caches aggressively (users tolerate slightly stale data). Web BFF uses shorter cache TTLs.
- Error handling — Mobile BFF returns simplified error codes (offline-friendly). Web BFF returns detailed error messages.
- Single API — 1-2 client types with similar needs, small team.
- BFF — 3+ client types with different needs, separate frontend teams, mobile + web + third-party.
35. Service Mesh (Istio/Linkerd)
- mTLS — Mutual TLS between every service pair. Automatic certificate rotation. Zero-trust networking without application code changes.
- Traffic management — Retries, timeouts, circuit breaking, load balancing algorithms — all configured via YAML, not code.
- Observability — The proxy emits metrics (latency, error rates, request counts), traces (distributed tracing spans), and access logs for every request. No instrumentation code needed.
- Traffic splitting — Route 5% of traffic to v2 (canary deployment), 95% to v1. A/B testing by header value. Blue/green deployments.
- Access policy — “Service A can call Service B but not Service C.” Enforced at the mesh level.
- Data plane — The sidecar proxies (Envoy). Handles actual traffic. Runs on every pod.
- Control plane — The management layer (Istio’s istiod, which since Istio 1.5 bundles the former Pilot, Citadel, and Galley components; or Linkerd’s control plane). Configures the proxies, manages certificates, collects telemetry.
| Aspect | Istio | Linkerd |
|---|---|---|
| Proxy | Envoy (C++, feature-rich) | linkerd2-proxy (Rust, lightweight) |
| Complexity | High (many components, steep learning curve) | Lower (simpler architecture, easier setup) |
| Resource overhead | Higher (~50-100MB per sidecar) | Lower (~10-20MB per sidecar) |
| Features | More extensive (Wasm plugins, multi-cluster) | Focused on core features (simpler but sufficient) |
| Community | Larger, CNCF graduated | Smaller, CNCF graduated |
- 20+ microservices where managing retries, mTLS, and observability per-service is unsustainable.
- Zero-trust security requirements (mTLS everywhere).
- Complex deployment strategies (canary, traffic mirroring) needed regularly.
- Polyglot environment where you cannot mandate a single language’s resilience library.
- Fewer than 10 services — the operational overhead is not justified.
- Performance-sensitive workloads where the sidecar’s ~1-2ms latency overhead per hop matters (it adds up across a 5-service call chain).
- Teams without Kubernetes expertise (service meshes are deeply tied to K8s).
istio_request_duration_milliseconds). (3) Using distributed tracing to see time spent in the proxy vs the application. Linkerd’s Rust-based proxy typically adds less latency than Istio’s Envoy proxy.
Q: Your team is debating between implementing resilience patterns (retries, circuit breakers) in application code vs the service mesh. What do you recommend?
Use the mesh for infrastructure-level resilience (connection-level retries, TCP circuit breaking, mTLS) and application code for business-level resilience (custom fallback responses, per-endpoint timeout tuning, semantic retry conditions like “retry on 503 but not 409”). The mesh gives you baseline protection without code changes across all services. Application-level libraries (Resilience4j) give you fine-grained control where business logic matters. They are complementary, not competing.
36. API Security Top 10 (OWASP)
- Broken Object Level Authorization (BOLA / IDOR) — The #1 API vulnerability. `GET /api/users/42/orders` — can user 43 access user 42’s orders by changing the ID? Every endpoint that takes an object ID must verify the requesting user has permission to access that specific object. Not just “is the user authenticated?” but “does this user own this resource?”
- Broken Authentication — Weak password policies, missing brute-force protection, tokens in URLs, missing token expiry. Fix: use established auth frameworks (OAuth2, OIDC), enforce MFA for sensitive operations, rate-limit login attempts.
- Broken Object Property Level Authorization — The API returns fields the user should not see (e.g., `is_admin: true` in the user profile response) or accepts fields they should not set (mass assignment: `POST /users {"role": "admin"}`). Fix: explicit allowlists for input and output fields. Never blindly serialize entire database rows.
- Unrestricted Resource Consumption — No rate limiting, no pagination limits, no payload size limits. An attacker sends `GET /users?limit=1000000` and your server OOMs. Fix: enforce max page sizes, rate limits, request body size limits, and timeout all operations.
- Broken Function Level Authorization — Regular user accessing admin endpoints (`DELETE /admin/users/42`). Fix: role-based access control enforced at every endpoint, not just the UI.
- Unrestricted Access to Sensitive Business Flows — Automated attacks on business logic: bot buying limited stock, automated account creation for spam. Fix: CAPTCHA, rate limiting on business-critical flows, device fingerprinting.
- Server Side Request Forgery (SSRF) — API accepts a URL parameter and the server fetches it: `POST /fetch {"url": "http://169.254.169.254/metadata"}` — accesses the cloud metadata endpoint, potentially leaking IAM credentials. Fix: URL allowlists, block private IP ranges, use a sandboxed fetch service.
- Security Misconfiguration — Missing CORS restrictions, verbose error messages in production (stack traces), default credentials, unnecessary HTTP methods enabled.
- Improper Inventory Management — Old API versions running unpatched, undocumented debug endpoints still accessible in production.
- Unsafe Consumption of APIs — Your API calls a third-party API and blindly trusts the response without validation. The third party gets compromised, and your system starts processing malicious data.
WHERE tenant_id = :current_tenant AND id = :requested_id. Never rely solely on the object ID. (2) Create a middleware/interceptor that extracts the tenant from the auth token and injects it into every query. (3) Use database row-level security (PostgreSQL RLS) as a defense-in-depth layer. (4) Write integration tests that specifically test cross-tenant access: “user in tenant A requests resource in tenant B, expect 404.” (5) Code review checklist: “Does this endpoint verify object ownership?” Stripe’s API famously caught a BOLA vulnerability in 2019 through automated testing.
Q: A security audit found that your API returns is_admin and password_hash in user responses. How do you fix this systematically?
(1) Never serialize database models directly to API responses. Create explicit response DTOs (Data Transfer Objects) with only the fields clients need. (2) Use a serialization allowlist, not a blocklist. An allowlist (include: [:id, :name, :email]) is safer than a blocklist (exclude: [:password_hash]) because new sensitive fields are hidden by default. (3) Add automated tests that snapshot API responses and fail if unexpected fields appear. (4) For GraphQL, use field-level authorization that checks permissions before resolving each field.
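A minimal sketch of the allowlist idea, assuming a user row represented as a plain dict. `PUBLIC_USER_FIELDS`, `to_public_user`, and `assert_no_unexpected_fields` are hypothetical names for this example, not part of any framework.

```python
PUBLIC_USER_FIELDS = ("id", "name", "email")  # hypothetical allowlist for the user resource


def to_public_user(row):
    """Build the response from the allowlist, so new columns stay hidden by default."""
    return {field: row[field] for field in PUBLIC_USER_FIELDS if field in row}


def assert_no_unexpected_fields(response, allowed=PUBLIC_USER_FIELDS):
    """Test helper: fail loudly if a response carries any field outside the allowlist."""
    unexpected = set(response) - set(allowed)
    if unexpected:
        raise AssertionError(f"unexpected fields in response: {sorted(unexpected)}")
```

If someone later adds a `password_hash` or `is_admin` column, `to_public_user` keeps omitting it, and the test helper catches any endpoint that bypasses the serializer.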
37. JWT Vulnerabilities
- `"alg": "none"` attack — The JWT header specifies the signing algorithm. If the server accepts `"alg": "none"`, an attacker can forge tokens by removing the signature entirely. Fix: always validate the algorithm on the server side. Maintain an explicit allowlist of acceptable algorithms (e.g., only `RS256` or `ES256`). Never rely on the token’s self-declared algorithm.
- Algorithm confusion (RS256 to HS256) — Server uses RSA (asymmetric: private key signs, public key verifies). Attacker changes the header to `HS256` (symmetric: same key signs and verifies). If the server blindly trusts the algorithm, it uses the RSA public key (which is publicly known) as the HMAC secret to verify. The attacker signs the forged token with the public key, and verification passes. Fix: the server must enforce the expected algorithm, not read it from the token.
- Weak signing secrets — HMAC secrets like `"secret"` or `"password123"` can be brute-forced. Tools like `jwt-cracker` and `hashcat` can crack weak secrets in minutes. Fix: use a minimum 256-bit random secret for HS256. Better: use asymmetric keys (RS256, ES256) where the private key is never exposed.
- Missing expiration / Replay attacks — Tokens without `exp` (expiry) are valid forever. Stolen tokens can be replayed indefinitely. Fix: set short `exp` (15 min for access tokens). Use `jti` (JWT ID) for one-time-use tokens. For sensitive operations, also check `nbf` (not before) and `iat` (issued at).
- Token stored in localStorage — XSS attacks can read localStorage, stealing the JWT. Fix: store tokens in httpOnly cookies (inaccessible to JavaScript) or in memory (lost on refresh but most secure). Use the BFF pattern for SPAs.
- No revocation mechanism — JWTs are stateless — once issued, they are valid until expiry. You cannot “log out” a JWT. Fix: (a) Short expiry + refresh tokens. (b) Token blocklist in Redis (checked on every request — adds statefulness). (c) Token versioning: store a `token_version` per user in the database; increment on logout; reject tokens with old version.
- Overly permissive claims — Tokens containing sensitive data (SSN, credit card) in the payload. JWT payloads are Base64-encoded, not encrypted (anyone can decode them). Fix: keep claims minimal (user ID, roles, expiry). Encrypt the JWT if sensitive data must be included (JWE — JSON Web Encryption).
token_version (integer) per user in the database. Include it as a claim in the JWT. On password change, increment token_version. On every request, compare the token’s version with the database. Mismatch means the token is invalidated. (2) Alternative: add the token’s jti to a Redis blocklist on password change. Set the blocklist entry’s TTL to match the token’s remaining lifetime. (3) If using refresh tokens: revoke all refresh tokens for the user. Existing access tokens remain valid until they expire (keep access token lifetime short — 15 minutes).
Q: Should you use symmetric (HS256) or asymmetric (RS256/ES256) JWT signing? What are the trade-offs?
Asymmetric (RS256/ES256) for almost all production use cases. The private key stays on the auth server; the public key is distributed to all services (via JWKS endpoint). Services can verify tokens without having the private key. This means a compromised microservice cannot forge tokens. Symmetric (HS256) means every service that needs to verify a token must have the shared secret — if any one is compromised, the attacker can forge tokens for all users. HS256 is acceptable only in single-service architectures or when the auth server is the only token verifier.
38. CORS (Cross-Origin Resource Sharing)
* to the Access-Control-Allow-Origin header) and understand the security implications.
Answer:
CORS is a browser security mechanism that controls which origins (domains) can make requests to your API. It exists because of the Same-Origin Policy: by default, JavaScript running on app.example.com cannot make requests to api.different.com.
How CORS works:
Simple requests (GET, HEAD, POST with standard content types):
- Browser sends the request with an `Origin: https://app.example.com` header.
- Server responds with `Access-Control-Allow-Origin: https://app.example.com`.
- Browser checks the header. If it matches, JavaScript can access the response. If not, the browser blocks it.
- Browser sends an `OPTIONS` request (preflight) with:
  - `Origin: https://app.example.com`
  - `Access-Control-Request-Method: DELETE`
  - `Access-Control-Request-Headers: Authorization, Content-Type`
- Server responds with:
  - `Access-Control-Allow-Origin: https://app.example.com`
  - `Access-Control-Allow-Methods: GET, POST, PUT, DELETE`
  - `Access-Control-Allow-Headers: Authorization, Content-Type`
  - `Access-Control-Max-Age: 86400` (cache preflight for 24 hours)
- If the preflight succeeds, the browser sends the actual request.
- `Access-Control-Allow-Origin` — Which origins are allowed. Never use `*` with credentials.
- `Access-Control-Allow-Credentials: true` — Allows cookies/auth headers. When set, `Allow-Origin` MUST be a specific origin, not `*`.
- `Access-Control-Expose-Headers` — By default, JavaScript can only read a few “simple” response headers. This header exposes additional ones (e.g., `X-RateLimit-Remaining`).
- `Access-Control-Max-Age` — How long the browser caches the preflight response. Set to 86400 (24h) to reduce OPTIONS request overhead.
- `Access-Control-Allow-Origin: *` with `Access-Control-Allow-Credentials: true` — browsers reject this combination. You must specify the exact origin.
- Reflecting the `Origin` header directly into `Access-Control-Allow-Origin` without validation — this effectively allows any origin, defeating CORS entirely. An attacker’s site can make authenticated requests to your API.
- Not handling OPTIONS preflight — returning 404 or 405 for OPTIONS breaks all non-simple requests from browsers.
Access-Control-Allow-Origin: * to fix the CORS error” without understanding the security implications, or thinking CORS applies to server-to-server communication.
Follow-up questions:
Q: A developer complains: “My API works in Postman but gets CORS errors in the browser.” What is happening and how do you fix it?
Postman is not a browser — it does not enforce the Same-Origin Policy or send CORS preflight requests. The browser does. The fix depends on the error: (1) If “No Access-Control-Allow-Origin header” — the server needs to return the CORS headers. Configure your framework’s CORS middleware (express cors(), Django django-cors-headers, Spring @CrossOrigin). (2) If “blocked by preflight” — the server must respond to OPTIONS requests with the correct Allow headers. (3) If the server is a third-party you do not control — use a proxy: your backend fetches from the third-party API and serves it to the frontend (bypassing CORS since the browser only talks to your origin).
Q: Your API needs to support requests from https://app.example.com and https://staging.example.com. How do you configure CORS?
Maintain a server-side allowlist of permitted origins: ["https://app.example.com", "https://staging.example.com"]. On each request, read the Origin header, check it against the allowlist, and if it matches, set Access-Control-Allow-Origin to that specific origin. Include Vary: Origin in the response so CDNs and proxies cache responses per-origin, not as a single cached version. Never use regex matching on origins without extreme care — https://app.example.com.evil.com would match a naive regex for example.com.
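The allowlist answer can be sketched as a framework-independent helper. `cors_headers` is a hypothetical function for this example. In practice it would be wired into whatever middleware hook the web framework provides.

```python
ALLOWED_ORIGINS = frozenset({
    "https://app.example.com",
    "https://staging.example.com",
})


def cors_headers(request_origin, allowed=ALLOWED_ORIGINS):
    """Return the CORS response headers for a request's Origin header value (or None)."""
    # Vary: Origin is always sent so caches keep one response per origin.
    headers = {"Vary": "Origin"}
    # Exact string match: no regex or suffix check, so look-alike hosts are rejected.
    if request_origin in allowed:
        headers["Access-Control-Allow-Origin"] = request_origin
    return headers
```

Because the comparison is an exact set lookup, `https://app.example.com.evil.com` simply fails to match, and the browser blocks the response for lack of an `Access-Control-Allow-Origin` header.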
39. Horizontal Pod Autoscaling
- CPU utilization — Scale when average CPU across pods exceeds target (e.g., 70%). Works for CPU-bound workloads (computation, image processing).
- Memory utilization — Scale when average memory exceeds target. Rarely useful as a scaling trigger because memory does not decrease when load decreases (most runtimes retain allocated memory).
- Requests per second (RPS) — Scale when RPS per pod exceeds capacity. Requires Prometheus adapter or Datadog metrics.
- Request latency (p95) — Scale when response time degrades. Catches I/O-bound overload that CPU misses.
- Queue depth — For async workers, scale based on messages in the queue (SQS queue length, Kafka consumer lag). This is the most accurate metric for workers.
- Active connections — Scale when concurrent connections per pod exceed a threshold.
- Scale up fast, scale down slow. Scale up aggressively on load spikes (50% more pods per 30 seconds). Scale down slowly (10% per minute with a 5-minute stabilization window) to avoid flapping.
- Minimum replicas — Never scale below 3 replicas for production services (with only 2, one pod mid-update plus one pod failure leaves 0 available).
- Cooldown periods — Prevent thrashing (scaling up and down repeatedly). The stabilization window considers the last N minutes of recommendations and picks the safest.
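The core scaling calculation can be sketched as below. It follows the formula given in the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, with a tolerance band and min/max clamping. The function name and the default bounds are assumptions for illustration, and stabilization windows are not modeled.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=3, max_replicas=20, tolerance=0.1):
    """Compute the replica count an autoscaler would aim for from one metric reading."""
    ratio = current_metric / target_metric
    # Within the tolerance band, do nothing; this avoids flapping on small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))
```

For example, 4 pods averaging 90% CPU against a 70% target gives `ceil(4 * 90 / 70)`, so the function returns 6. The same formula works for RPS per pod or queue depth per worker.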
40. Distributed Tracing
- Trace — The end-to-end journey of a single request. Has a unique Trace ID (e.g., the 128-bit ID `4bf92f3577b34da6a3ce929d0e0e4736`, written as 32 hex characters).
- Span — A single operation within a trace (e.g., “user-service.getUser” or “postgres.query”). Each span records: start time, duration, status (OK/ERROR), tags (http.method, http.status_code), and logs.
- Parent-child relationship — Spans form a tree. The API gateway span is the root. It has child spans for each downstream service call. Each downstream service has child spans for its database queries, cache lookups, etc.
- The entry point (API gateway) generates a Trace ID and Span ID.
- These are passed to the next service via HTTP headers: `traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` (W3C Trace Context standard).
- Each service extracts the trace context, creates a child span, and propagates the context to its downstream calls.
- All spans are sent (asynchronously) to a collector which assembles them into the full trace.
- W3C Trace Context (`traceparent`, `tracestate`) — The standard. Supported by all major tracing tools.
- B3 Headers (`X-B3-TraceId`, `X-B3-SpanId`) — Zipkin’s legacy format. Still widely used.
- Jaeger headers (`uber-trace-id`) — Jaeger’s native format. Being migrated to W3C.
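To show what "extract the context and create a child span" means at the header level, here is a small sketch that parses a `traceparent` value and builds the header for a downstream call. The helper names are hypothetical. In practice an OpenTelemetry SDK propagator does this for you.

```python
import re
import secrets

# version - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value):
    """Return the parsed trace context, or None if the header is malformed."""
    match = _TRACEPARENT.match(value.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def child_traceparent(parent, new_span_id=None):
    """Header for a downstream call: same trace ID, fresh span ID, same sampling flag."""
    span_id = new_span_id or secrets.token_hex(8)  # 8 random bytes = 16 hex characters
    flags = "01" if parent["sampled"] else "00"
    return f"00-{parent['trace_id']}-{span_id}-{flags}"
```

The trace ID never changes along the call chain. Only the span ID segment is replaced at each hop, which is how the collector reassembles the parent-child tree.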
- Jaeger — Open source (Uber). CNCF graduated. Strong K8s integration.
- Zipkin — Open source (Twitter). Simpler than Jaeger.
- Datadog APM — Commercial. Combines traces with metrics and logs. Excellent correlation.
- AWS X-Ray — Managed. Integrates with Lambda and ECS.
- OpenTelemetry — The emerging standard for instrumentation. Vendor-neutral SDK that exports to any backend (Jaeger, Zipkin, Datadog, Honeycomb).
- Head-based sampling — Decide at the start whether to trace this request (e.g., 1% of requests). Simple but misses rare errors.
- Tail-based sampling — Collect all spans, then decide after the request completes whether to keep the trace. Keep all traces with errors or high latency, sample normal traces. More useful but requires buffering all spans temporarily. Jaeger and OpenTelemetry Collector support tail-based sampling.
logger.info("Processing order", extra={"trace_id": span.context.trace_id}). Configure your log aggregation (ELK, Datadog Logs) to make Trace ID a clickable link to the trace view. For metrics, use exemplars (OpenTelemetry/Prometheus feature): attach a Trace ID to specific metric data points. When a dashboard shows a latency spike, click the data point to jump to an example trace. This correlation (metrics -> trace -> logs) is the gold standard of observability. Datadog, Honeycomb, and Grafana Tempo all support this three-way correlation.
5. Operations
41. Log Aggregation
- Emit — Services write structured logs (JSON) to stdout/stderr. Include: timestamp, service name, trace ID, severity, message, and contextual fields.
- Collect — A log shipper (Fluentd, Fluent Bit, Filebeat, Vector) runs as a DaemonSet (one per node) or sidecar. It reads stdout from containers, adds metadata (pod name, namespace, node), and forwards to the aggregator.
- Process — The aggregator (Logstash, Fluentd, Vector) parses, filters, enriches, and routes logs. Example: parse JSON, extract trace ID into a searchable field, drop debug-level logs in production, route error logs to a separate alert stream.
- Store — Elasticsearch (most common), Loki (Grafana’s lightweight alternative), Datadog Logs, Splunk, CloudWatch Logs. Elasticsearch indexes every field for fast search. Loki only indexes labels (cheaper but less flexible).
- Visualize & Alert — Kibana (with Elasticsearch), Grafana (with Loki), Datadog dashboards. Create dashboards for error rates, alert on patterns (5xx spike, specific error messages).
Unstructured logs (e.g., "Error processing order 42") require regex parsing and break on format changes. Structured logs (JSON with consistent fields) enable reliable filtering, aggregation, and alerting. Use a logging library that enforces structure (Winston, Logback, zerolog).
Critical fields every log must include:
- Timestamp (ISO 8601 with timezone)
- Service name
- Trace ID / Correlation ID (links logs to distributed traces)
- Log level (ERROR, WARN, INFO, DEBUG)
- Request context (user ID, request ID, endpoint)
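As an illustration of the critical fields listed above, here is a minimal JSON formatter built on Python's standard `logging` module. The class name `JsonFormatter`, the helper `build_logger`, and the exact field names are assumptions for this sketch; a production service would more likely use an established structured-logging library.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service_name: str):
        super().__init__()
        self.service_name = service_name

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp with an explicit UTC offset.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "service": self.service_name,
            "level": record.levelname,
            "message": record.getMessage(),
            # Values passed via `extra=` become attributes on the record.
            "trace_id": getattr(record, "trace_id", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(service_name: str, stream) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # pass sys.stdout in a real service
    handler.setFormatter(JsonFormatter(service_name))
    logger = logging.getLogger(service_name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling `logger.info("Processing order", extra={"trace_id": "...", "request_id": "..."})` then emits a single JSON line that the log shipper can parse without regexes.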
42. Health Checks (Liveness vs Readiness)
- If the liveness probe fails, Kubernetes restarts the pod (kills and recreates the container).
- Use for: detecting deadlocks, infinite loops, memory leaks that cause hangs.
- Should check: “Can this process respond at all?” A lightweight `/healthz` endpoint that returns `200 OK` is enough.
- Must NOT check external dependencies. If the liveness probe checks the database and the database goes down, Kubernetes restarts ALL pods simultaneously. The pods restart, still cannot reach the database, fail liveness again, enter a restart loop. Meanwhile, your service is completely down instead of partially degraded.
- Keep it simple: respond to HTTP request, maybe check that the main thread is alive. Nothing more.
- If the readiness probe fails, the pod is removed from the Service’s endpoints (stops receiving traffic) but is NOT restarted.
- Use for: checking if external dependencies are available (database connected, cache warm, config loaded).
- Should check: database connection pool, required cache availability, background initialization completion.
- A pod that fails readiness is expected to recover on its own (e.g., when the database comes back up). Traffic routes to other healthy pods in the meantime.
- Used for slow-starting applications (JVM apps, apps that load large models).
- While the startup probe is running, liveness and readiness probes are disabled. This prevents Kubernetes from killing a pod that is just slow to start (not actually dead).
- Example: a Java app that takes 60 seconds to initialize. Without a startup probe, the liveness probe (with a 10-second timeout) would kill it repeatedly.
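The split between the probes can be sketched as two handler functions. This is a framework-agnostic illustration: the functions return a status code and a body, and the names `liveness`, `readiness`, and the shape of `checks` are hypothetical.

```python
from typing import Callable, Dict, Tuple


def liveness() -> Tuple[int, dict]:
    """Liveness answers only 'can this process respond?'. No dependency checks."""
    return 200, {"status": "alive"}


def readiness(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Readiness returns 200 only if every dependency check passes, else 503."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a crashing check counts as not ready
    status = 200 if all(results.values()) else 503
    return status, {"ready": status == 200, "checks": results}
```

A failing database check makes `readiness` return 503, so the pod leaves the Service endpoints, while `liveness` keeps returning 200 and the pod is not restarted.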
`preStop` hook with a brief sleep (5 seconds) to allow the Kubernetes endpoints controller to remove the pod from the Service before the pod actually stops. (3) Set `terminationGracePeriodSeconds` high enough for in-flight requests to complete. (4) Immediately fail the readiness probe on SIGTERM so the pod is removed from the endpoint list before it shuts down.
43. Throttling vs Rate Limiting
- Applied per client, per API key, per tier, per endpoint.
- “Free tier users get 100 requests/minute. Pro tier gets 1,000.”
- Enforced regardless of server load. Even if your servers are at 10% CPU, a free-tier user exceeding 100/min gets `429`.
- Configured in advance based on business rules and capacity planning.
- Returns: `429 Too Many Requests` with a `Retry-After` header and rate limit headers.
- Applied globally based on current system health.
- “Server CPU is at 95%. Reject 50% of incoming requests to prevent a crash.”
- Kicks in dynamically when the system is under stress. Traffic that would normally be allowed is rejected.
- Based on real-time metrics: CPU, memory, queue depth, response time.
- Returns: `503 Service Unavailable` with `Retry-After`.
| Aspect | Rate Limiting | Throttling |
|---|---|---|
| Trigger | Client exceeds quota | System is overloaded |
| Scope | Per-client | Global or per-service |
| Configuration | Static policy | Dynamic, metric-driven |
| Purpose | Fairness, monetization | Survival, stability |
| When applied | Always (even under low load) | Only during stress |
| Response code | 429 | 503 |
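The two mechanisms in the table can be combined in one request path. The sketch below uses a token bucket for the per-client quota and a plain CPU threshold for throttling; both choices are assumptions for illustration, and the throttle here rejects everything above the limit rather than a percentage of traffic.

```python
import time


class TokenBucket:
    """Per-client rate limiter: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def decide(bucket: TokenBucket, cpu_utilization: float, cpu_limit: float = 0.95) -> int:
    """Return the HTTP status for one incoming request."""
    if not bucket.allow():
        return 429  # rate limit: this client exceeded its quota
    if cpu_utilization >= cpu_limit:
        return 503  # throttle: the system as a whole is overloaded
    return 200
```

The `clock` parameter exists so the limiter can be tested with a fake clock instead of sleeping.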
- Adaptive throttling (Google SRE approach): Each backend tracks its capacity and rejection rate. When rejections exceed a threshold, clients back off proportionally. Google’s Doorman uses this for internal load management.
- Priority-based throttling: During overload, protect high-priority traffic (checkout, payments) and shed low-priority traffic (analytics, recommendations) first.
- Load shedding: The extreme form of throttling. Proactively drop requests to keep the system alive. Netflix’s concurrency limiter dynamically adjusts the number of in-flight requests per server.
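For adaptive throttling, the Google SRE book describes a client-side rule: reject locally with probability max(0, (requests - K * accepts) / (requests + 1)). The sketch below implements that formula with simple running counters; the book tracks both values over a sliding time window, so the unbounded counters and the class name `AdaptiveThrottle` are simplifications made here.

```python
import random


class AdaptiveThrottle:
    """Client-side adaptive throttling with lifetime counters (simplified)."""

    def __init__(self, k: float = 2.0, rng=random.random):
        self.k = k          # lower K sheds load more aggressively
        self.rng = rng
        self.requests = 0   # requests attempted by this client
        self.accepts = 0    # requests the backend accepted

    def rejection_probability(self) -> float:
        return max(0.0, (self.requests - self.k * self.accepts) / (self.requests + 1))

    def should_send(self) -> bool:
        """Decide locally whether to send the request to the backend at all."""
        return self.rng() >= self.rejection_probability()

    def record(self, accepted: bool) -> None:
        self.requests += 1
        if accepted:
            self.accepts += 1
```

While the backend accepts most traffic the probability stays at zero; once rejections dominate, the client starts dropping requests itself and spares the overloaded backend the cost of rejecting them.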
`503` with `Retry-After` for shed requests. (5) Monitor the shed rate as a key SLI: if you are shedding critical traffic, something is very wrong.
Q: How would you implement global throttling across 20 API server instances?
(1) Centralized approach: all instances check a Redis counter tracking total RPS. Simple but adds latency and creates a single point of failure. (2) Decentralized approach: each instance tracks local metrics (CPU, response time, queue depth) and makes independent throttling decisions. If local CPU exceeds 80%, start rejecting. No coordination needed. (3) Hybrid: use a lightweight gossip protocol (like Consul) where instances share their load metrics. Each instance makes decisions based on both local and cluster-wide health. (4) Infrastructure-level: use Kubernetes HPA and pod disruption budgets to prevent overload, and use Envoy’s circuit breaking as the throttling mechanism.
44. Semantic Versioning for APIs
`MAJOR.MINOR.PATCH`
- MAJOR (v1 to v2): Breaking changes. Clients must update.
- MINOR (v1.1 to v1.2): New features, backward-compatible. Clients do not need to update.
- PATCH (v1.1.1 to v1.1.2): Bug fixes, backward-compatible.
- Removing a field from the response.
- Renaming a field.
- Changing a field’s type (`"price": 10` to `"price": "10.00"`).
- Removing an endpoint.
- Changing the authentication mechanism.
- Making an optional field required.
- Changing the URL structure.
- Changing the meaning of a status code.
- Adding a new field to a response (clients should ignore unknown fields).
- Adding a new optional field to a request.
- Adding a new endpoint.
- Adding a new enum value (controversial — some clients break on unknown enum values).
- Adding new headers.
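Whether the additions above are truly non-breaking depends on clients being tolerant readers. A minimal sketch of such a client model follows; the `Order` and `OrderStatus` types and their field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):
    PENDING = "pending"
    SHIPPED = "shipped"
    UNKNOWN = "unknown"  # catch-all for values added after this client shipped

    @classmethod
    def parse(cls, raw: str) -> "OrderStatus":
        try:
            return cls(raw)
        except ValueError:
            return cls.UNKNOWN  # degrade gracefully instead of crashing


@dataclass
class Order:
    id: int
    status: OrderStatus

    @classmethod
    def from_json(cls, payload: dict) -> "Order":
        # Read only the fields this client needs; extra keys are ignored.
        return cls(id=payload["id"], status=OrderStatus.parse(payload["status"]))
```

With this shape, a response that gains a new field or a new enum value still parses, and the client can decide how to present `UNKNOWN`.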
- Most public APIs only expose the MAJOR version: `/v1/`, `/v2/`. Minor and patch versions are invisible to clients.
- Breaking changes are expensive: every consumer must update. Good API teams ship 1-2 major versions in the API’s entire lifetime.
- Deprecation lifecycle: (1) Announce the new version. (2) Add `Sunset` and `Deprecation` headers to old version responses. (3) Monitor usage of the old version. (4) Give 6-12 months notice. (5) Return `410 Gone` after sunset.
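Step (2) of the lifecycle might look like the helper below. It assumes the `Sunset` header carries an HTTP-date (RFC 8594) and the `Deprecation` header carries a Structured Fields date of the form `@<unix-seconds>` (RFC 9745); older drafts of the Deprecation header used other forms, so check what your consumers expect. The function name and the `successor-version` link are illustrative choices.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(deprecated_at: datetime, sunset_at: datetime,
                        successor_url: str) -> dict:
    """Headers to attach to every response from the old API version.

    Both datetimes must be timezone-aware and in UTC.
    """
    return {
        # Structured Fields date: "@" followed by Unix seconds.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # HTTP-date, e.g. "Wed, 31 Dec 2025 23:59:59 GMT".
        "Sunset": format_datetime(sunset_at, usegmt=True),
        # Point clients at the replacement version.
        "Link": f'<{successor_url}>; rel="successor-version"',
    }
```

Logging which API keys still receive these headers gives you the usage data needed for step (3).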
`userName` to `user_name` (inconsistent casing fix). Is this a breaking change? How do you handle it?
Yes, it is a breaking change. Clients parsing `userName` will break when it becomes `user_name`. Handle with a transition period: (1) Return BOTH fields simultaneously: `{"userName": "Alice", "user_name": "Alice"}`. (2) Mark `userName` as deprecated in documentation. (3) After all consumers have migrated (track via analytics or consumer surveys), remove `userName` in a new major version. GitHub did exactly this during their API standardization: they returned both field names for months.
Q: A new enum value is added to a response field. A client’s switch statement hits the default case and crashes. Whose fault is it and how do you prevent this?
Both sides share responsibility. The API should document that new enum values may be added (making it a non-breaking change by contract). The client should handle unknown enum values gracefully (default case should log and continue, not crash). To prevent: (1) Document in the API contract that enums are “open”, meaning new values can be added. (2) Client SDKs should generate enum types with an UNKNOWN value. (3) Consider using strings instead of enums if the set of values is expected to grow frequently.
45. API Documentation (OpenAPI/Swagger)
- Single source of truth — The spec defines what the API does. Code, documentation, and client SDKs are all generated from it.
- Interactive UI — Swagger UI and Redoc render the spec into a browsable, testable documentation site. Developers can try API calls directly from the docs.
- Client SDK generation: `openapi-generator` produces typed client libraries in 40+ languages from the spec. No hand-written HTTP clients.
- Server stub generation: Generate server boilerplate in Go, Java, Python, etc. Ensure the implementation matches the spec.
- Contract testing — Validate that the running API matches the spec. Tools like Dredd and Schemathesis test every endpoint against the spec.
- Gateway configuration — Import the spec into API gateways (Kong, AWS API Gateway) to auto-configure routing and validation.
- Design-first: Write the OpenAPI spec before writing code. Review the API contract with consumers (frontend team, mobile team). Once agreed, generate server stubs and implement. Best for public APIs and APIs with multiple consumers.
- Code-first: Write the code, annotate with decorators/annotations, and auto-generate the spec. Best for internal APIs and rapid prototyping. Tools: Springdoc (Java), FastAPI (Python — auto-generates from type hints), tsoa (TypeScript).
- Descriptive endpoint summaries and detailed descriptions.
- Request/response examples for every endpoint (not just schemas).
- Error response schemas with all possible error codes.
- Authentication requirements per endpoint.
- Deprecation notices for old endpoints/fields.
- Rate limit documentation.
- Swagger UI / Redoc — Render the spec as interactive docs.
- Stoplight / Bump.sh — API design platforms with visual editors.
- Postman — Import OpenAPI specs to auto-generate API collections.
- Spectral — Lint your OpenAPI spec for consistency (naming conventions, missing descriptions).
46. Handling Partial Failures
- Partial response with degradation indicators: a `_meta` section tells the client which parts are degraded.
- GraphQL’s native error handling: `200 OK` with both `data` and `errors`. The client can render what is available and show a fallback for the failed section.
- Fallback responses: When Recommendations Service is down, return cached recommendations (stale but better than nothing), popular items (generic fallback), or an empty list. Netflix’s API returns generic “Top 10” recommendations when personalization fails, so users see content, not errors.
- HTTP 207 Multi-Status (WebDAV): Used when a batch request has mixed results.
- API Gateway fans out requests to 5 services in parallel.
- Set a timeout for each (e.g., 500ms).
- After all responses return (or timeout), assemble the composite response.
- Include what succeeded, mark what failed, apply fallbacks for failed components.
- Return `200 OK` with the partial response (not `500`, because the request partially succeeded).
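The fan-out steps above can be sketched with `asyncio`. The section names, the `_meta` layout, and the helper names are assumptions for illustration; a real gateway would also distinguish critical from non-critical dependencies.

```python
import asyncio


async def call_with_fallback(name, coro, timeout_s, fallback):
    """Await one downstream call; on timeout or error, return the fallback."""
    try:
        return name, await asyncio.wait_for(coro, timeout=timeout_s), None
    except Exception as exc:  # includes the timeout raised by wait_for
        return name, fallback, type(exc).__name__


async def aggregate(calls: dict, timeout_s: float = 0.5) -> dict:
    """`calls` maps a section name to a (coroutine, fallback) pair."""
    results = await asyncio.gather(
        *(call_with_fallback(name, coro, timeout_s, fallback)
          for name, (coro, fallback) in calls.items())
    )
    body, degraded = {}, {}
    for name, value, error in results:
        body[name] = value
        if error is not None:
            degraded[name] = error  # record why this section is degraded
    body["_meta"] = {"partial": bool(degraded), "degraded": degraded}
    return body
```

Because the calls run concurrently, the total latency is bounded by the timeout rather than by the sum of the downstream latencies.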
`500 Internal Server Error` when one of five downstream services fails, or not considering graceful degradation at all.
Follow-up questions:
Q: The product team says “if payment service is down, the checkout page should not render at all.” But the recommendations service going down should not affect anything. How do you implement this?
Classify downstream dependencies as critical (payment, inventory, user auth: the page cannot function without them) and non-critical (recommendations, analytics, personalization: the page degrades gracefully). For critical dependencies: fail the request if they fail (return `503`). For non-critical dependencies: use circuit breakers with fallbacks. Implement this as a middleware or decorator pattern: `@critical("payment-service")` vs `@degradable("recommendations-service", fallback=default_recs)`. This classification should be documented and reviewed with the product team.
Q: Your API aggregates data from 3 services. Two respond in 50ms, one takes 3 seconds. How do you handle this?
(1) Set a per-service timeout (e.g., 500ms). After 500ms, proceed without the slow service’s data. (2) Return the partial response immediately with the slow service’s data marked as degraded. (3) If the slow service is non-critical, apply a fallback. (4) If the slow service is critical, consider: can you return a loading state and let the client poll? Or use SSE to push the remaining data when it arrives? (5) Investigate the slow service: 3-second responses suggest a performance problem that should be fixed, not just tolerated.
47. Sidecar Pattern
- Service mesh proxy (Envoy) — Handles mTLS, retries, circuit breaking, load balancing, observability. The application makes plain HTTP calls to localhost; the sidecar handles everything else. This is how Istio and Linkerd work.
- Log collection (Fluent Bit / Fluentd) — Reads log files from a shared volume and ships them to the centralized logging system. The application writes to a file; the sidecar handles collection, parsing, and shipping.
- Configuration management (Vault Agent) — Fetches secrets from HashiCorp Vault, writes them to a shared volume or injects them as environment variables. The application reads secrets from a file; the sidecar handles auth and renewal.
- Metrics collection (Prometheus exporter) — Exposes application metrics in Prometheus format. The application writes metrics to a local endpoint; the sidecar reformats and exposes them.
- Database proxy (Cloud SQL Proxy, PgBouncer): The application connects to `localhost:5432`; the sidecar handles connection pooling, TLS, and authentication to the actual database.
- Init container — Runs ONCE before the main container starts. Use for: database migrations, config file generation, dependency checks.
- Sidecar — Runs alongside the main container for the pod’s entire lifetime. Use for: ongoing concerns (proxying, logging, metrics).
48. Twelve-Factor App
- Codebase — One codebase tracked in version control, many deploys. Do not have separate repos for staging vs production. Use branches and environment config.
- Dependencies: Explicitly declare and isolate dependencies (`package.json`, `requirements.txt`, `go.mod`). Never rely on system-level packages being pre-installed.
- Config: Store configuration in environment variables, not in code. Never commit database URLs, API keys, or feature flags to the repo. Use: Kubernetes ConfigMaps/Secrets, Vault, AWS Parameter Store.
- Backing services — Treat databases, caches, queues, email services as attached resources accessed via URL/connection string. Swapping a local PostgreSQL for Amazon RDS should require only changing an environment variable, not code changes.
- Build, Release, Run — Strictly separate build (compile + package), release (combine build with config), and run (execute in the environment) stages. Use immutable Docker images as the build artifact.
- Processes — Execute the app as one or more stateless processes. Store all persistent data in backing services (database, Redis), not in local files or in-memory. Any in-memory cache is a performance optimization, not the source of truth. This enables horizontal scaling.
- Port binding: Export services via port binding. The app is self-contained and binds to a port (e.g., `EXPOSE 8080`). No dependency on an external web server.
- Concurrency: Scale out via the process model. Run multiple instances (pods) rather than one large process with many threads. Different process types for different workloads (web process, worker process, scheduler process).
- Disposability — Maximize robustness with fast startup and graceful shutdown. Processes can be started or killed at any time (Kubernetes rolling deployments, preemptible VMs). Handle SIGTERM gracefully.
- Dev/prod parity — Keep development, staging, and production as similar as possible. Use Docker to ensure identical environments. Do not use SQLite in dev and PostgreSQL in prod.
- Logs — Treat logs as event streams. Write to stdout, not to files. Let the execution environment (Docker, Kubernetes) handle log routing and aggregation.
- Admin processes: Run admin/management tasks (database migrations, console commands) as one-off processes in the same environment as the app. Use `kubectl exec` or Kubernetes Jobs, not SSH.
- #3 (Config) — Hardcoded config in code. Fix: use environment variables.
- #6 (Processes) — Storing state in local filesystem. Fix: use shared backing services.
- #9 (Disposability) — Not handling SIGTERM. Fix: implement graceful shutdown in the application.
- #10 (Dev/prod parity) — “Works on my machine.” Fix: Docker Compose for local development.
`maxUnavailable: 0`, `maxSurge: 1` so old pods keep serving until new ones pass readiness probes.
49. Canonical Data Model
- Service A: `{ "firstName": "Alice", "lastName": "Smith" }`
- Service B: `{ "name": "Alice Smith" }`
- Service C: `{ "user_name": "asmith", "full_name": "Smith, Alice" }`
- Canonical: `{ "given_name": "Alice", "family_name": "Smith", "display_name": "Alice Smith" }`
- Each service has 2 mappings: to-canonical and from-canonical. Total: 10 (5*2) mappings.
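The translation at each service boundary can be sketched as a pair of small mapping functions. The function names are hypothetical, and the point-to-point count assumes one translator per direction between every pair of services.

```python
def from_service_a(record: dict) -> dict:
    """Service A format -> canonical format."""
    return {
        "given_name": record["firstName"],
        "family_name": record["lastName"],
        "display_name": f"{record['firstName']} {record['lastName']}",
    }


def to_service_b(canonical: dict) -> dict:
    """Canonical format -> Service B format."""
    return {"name": canonical["display_name"]}


def mapping_counts(n_services: int) -> tuple:
    """(point-to-point translators, canonical-model translators) for n services."""
    return n_services * (n_services - 1), 2 * n_services
```

For five services this gives 20 direct translators versus 10 canonical ones, and the gap widens as services are added.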
- Enterprise integration — Many systems (ERP, CRM, billing, shipping) need to exchange customer/order data. The CDM is the “lingua franca.”
- Event-driven architectures — Events on the message bus use the canonical schema. Each service translates at the boundary.
- Industry standards — Healthcare (HL7 FHIR), finance (FIX protocol), e-commerce (GS1) provide canonical models.
- Small teams with 3-5 services where direct communication is manageable.
- When services are loosely coupled and rarely exchange data.
- Over-centralized CDM — If the canonical model is maintained by one team and every change requires their approval, it becomes a bottleneck. Use a federated ownership model where each domain owns its portion of the canonical model.
- CDM as a “god object” — The canonical model grows to include every field from every service, becoming a massive, unwieldy schema. Keep it focused on shared concepts. Service-specific fields stay internal.
- Forcing CDM on internal service APIs — The CDM is for inter-service events and integration, not for every API call. Internal APIs can use domain-specific models.
50. Contract Testing
- Consumer writes a contract: The consumer (e.g., frontend or another service) defines what it expects from the provider:
  - “When I send `GET /users/42`, I expect a response with `{"id": 42, "name": "string", "email": "string"}`.”
  - This is saved as a “pact file” (JSON contract).
- Provider verifies the contract: The provider runs the pact file against its actual API. For each consumer expectation, it sends the described request and checks if the response matches.
- If verification fails: The provider knows its change will break a consumer. The change is blocked.
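The core idea of provider verification can be shown without any tooling. The sketch below is not the Pact API; it is a hand-rolled check with hypothetical names (`EXPECTED_TYPES`, `verify_contract`) that compares a provider response against the consumer's expected fields and types.

```python
# The consumer's expectation for GET /users/42, expressed as field -> type.
EXPECTED_TYPES = {"id": int, "name": str, "email": str}


def verify_contract(response_body: dict, expected: dict) -> list:
    """Return a list of violations; an empty list means the contract holds."""
    problems = []
    for field, expected_type in expected.items():
        if field not in response_body:
            problems.append(f"missing field: {field}")
        elif not isinstance(response_body[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    return problems  # extra fields in the response are allowed
```

Running this in the provider's CI against every published consumer expectation is what turns a breaking rename into a failed build instead of a production incident.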
| Aspect | Contract Testing | E2E Testing |
|---|---|---|
| Speed | Seconds (unit test speed) | Minutes to hours |
| Reliability | Highly reliable (no flaky infra) | Flaky (network, data, timing) |
| Scope | One interaction at a time | Full system |
| Maintenance | Low (auto-generated from consumer) | High (shared test env, test data) |
| Feedback speed | Immediate in CI | Slow (requires full deployment) |
| Environment needs | None (mocked/stubbed) | Full staging environment |
- Consumer generates pact files, publishes to Pact Broker.
- Provider CI pulls consumer pacts, runs verification.
- Pact Broker tracks compatibility matrix: which provider versions are compatible with which consumer versions.
- The `can-i-deploy` tool checks the matrix before deployment: “Is this provider version compatible with all consumers currently in production?”
- Provider removes a field consumers depend on.
- Provider changes a field type (number to string).
- Provider changes the URL structure.
- Provider changes the auth mechanism.
- Business logic bugs.
- Performance issues.
- Full workflow correctness.
`userName` but the current provider returns `user_name`.” The provider developer sees this before merging, adds backward compatibility (return both fields), and the pact passes.
Q: Your team resists contract testing because “it’s too much overhead.” How do you make the case?
(1) Calculate the cost of the last integration-related outage. At most companies, a single production integration failure costs more in engineering time than setting up contract testing for all services. (2) Start small: add contract testing to the one integration that breaks most frequently. Show the ROI. (3) Use auto-generated contracts: tools like Spring Cloud Contract can generate contracts from actual request/response pairs captured during E2E tests. (4) The real overhead is maintaining shared staging environments for E2E tests; contract testing reduces how much you depend on them.
6. Advanced Topics (Bonus Questions)
51. API Design: Resource Modeling and URL Structure
- Nouns, not verbs: `/users/42/orders` not `/getOrdersForUser?userId=42`. Resources are nouns. HTTP methods provide the verbs.
- Plural collection names: `/users`, `/orders`, `/products`. Not `/user/42` (inconsistent when the collection is also `/user`).
- Hierarchical relationships: `/users/42/orders/7` means Order 7 belongs to User 42. Maximum 2-3 levels deep. Deeper nesting becomes unwieldy.
- Use query parameters for filtering: `/orders?status=shipped&created_after=2024-01-01`, not `/orders/shipped/after/2024-01-01`.
- Consistent naming convention: lowercase with hyphens (`/user-profiles`) or lowercase without (`/userprofiles`). Pick one and enforce it.
- Actions that do not map to CRUD:
  - “Cancel an order” is not a DELETE (the order still exists, it is in a cancelled state).
  - Options: `POST /orders/42/cancel` (sub-resource action), `PATCH /orders/42 {"status": "cancelled"}` (state change), or `POST /orders/42/actions/cancel` (explicit action resource).
  - Industry consensus: for simple state changes, use PATCH. For complex actions with side effects, use a sub-resource POST.
- Search across multiple resource types:
  - `GET /search?q=alice&type=users,orders` as a dedicated search endpoint.
  - Not `/users?search=alice` and `/orders?search=alice` separately (forces the client to make multiple calls).
- Batch operations:
  - `POST /users/batch` with `[{...}, {...}, {...}]` in the body.
  - Return individual results per item (some may succeed, some may fail).
- Tenant-scoped resources:
  - `/tenants/acme/users/42` (explicit tenant in URL).
  - Or extract tenant from auth token and scope implicitly (cleaner URLs but less explicit).
- Stripe: `/v1/customers/cus_123/payment_methods` (clean hierarchy, consistent naming).
- GitHub: `/repos/owner/repo/pulls/42/reviews` (deep hierarchy but intuitive).
- Twilio: `/2010-04-01/Accounts/AC123/Messages.json` (date-versioned, explicit format suffix).
`/getUser`, `/createOrder`) or inconsistent naming patterns across endpoints.
Follow-up questions:
Q: A designer asks for an endpoint that “archives all orders older than 30 days.” How do you model this?
`POST /orders/actions/archive` with `{"older_than_days": 30}` in the body. This is a bulk action, not a CRUD operation on a single resource. Return `202 Accepted` with a task ID since it is a long-running operation. The alternative (`DELETE /orders?older_than=30d`) is dangerous because a client that drops or mistypes the filter could delete everything. A POST with the filter as a required, validated body field makes the intent explicit, and unlike a GET-based action it cannot be triggered by a browser prefetch or a link click.
Q: Should you nest resources (`/users/42/orders`) or use flat URLs (`/orders?user_id=42`)?
Both are valid. Nested resources make the ownership relationship explicit and enable URL-based authorization (`/users/42/*` requires being user 42). Flat resources are simpler when the child resource has a globally unique ID (order 7 is unique regardless of user). Practical guideline: nest one level deep for strong ownership (user’s orders), use flat with query params for loose relationships (orders by status). GitHub uses nested (`/repos/owner/repo/issues`) and Stripe uses flat with expansion (`/invoices?customer=cus_123`).
52. Graceful Degradation and Feature Flags in APIs
- Circuit breaker + fallback: When the Recommendations service trips its circuit breaker, return cached recommendations or popular items instead of an error.
- Feature flags for API behavior:
  - Enable/disable features without deploying: `if (featureFlags.isEnabled("new_search_algorithm")) { ... }`.
  - Use feature flags to control rollout of new API versions: 1% of traffic gets the new behavior.
  - Tools: LaunchDarkly, Unleash, Split.io, Flagsmith, AWS AppConfig.
  - Kill switches: instant rollback of a misbehaving feature without deployment.
- Read-only mode: During a database failover, switch the API to read-only mode. All write operations return `503 Service Unavailable` with `Retry-After`. Reads continue from replicas.
- Stale-while-revalidate: Return cached (potentially stale) data immediately while asynchronously refreshing the cache in the background. HTTP `Cache-Control: stale-while-revalidate=60` enables this at the CDN/proxy level.
- Short-lived flags — Feature flags are for release management, not permanent configuration. Remove flags after rollout completes (within 2-4 weeks). Permanent flags become tech debt.
- Server-side evaluation — Evaluate feature flags on the server, not the client. The client should not know about flags.
- Targeted rollout — Roll out by: percentage of traffic, specific user IDs (dogfooding), user attributes (region, subscription tier), header value (for QA testing).
- Monitoring per flag — Track error rates and latency separately for flag-on vs flag-off cohorts. If flag-on shows higher errors, automatically disable it (automated rollback).
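Percentage rollouts are usually implemented with stable hashing so that a given user sees consistent behavior across requests. The sketch below shows one way to do it on the server side; the function name `in_rollout` and the 0.01% bucket resolution are assumptions, and managed tools such as those listed above have their own bucketing schemes.

```python
import hashlib


def in_rollout(flag_name: str, user_id: str, percentage: float) -> bool:
    """Stable percentage rollout: same user, same flag, same answer.

    `percentage` is 0-100. Including the flag name in the hash input keeps
    different flags from always selecting the same group of users.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% resolution
    return bucket < percentage * 100
```

Raising the percentage only ever adds users to the enabled group, so nobody flips back and forth during a gradual rollout.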
`created_at` and `expected_removal_date` to every flag. (2) Automated alerts when a flag exceeds its expected lifetime. (3) Quarterly “flag cleanup” sprints. (4) Use a flag management tool that tracks flag age and usage. (5) Make flag creation require an owner and a removal plan. (6) In code reviews, enforce that removing the flag is part of the feature’s definition of done.
53. API Gateway vs Service Mesh vs Load Balancer
| Aspect | Load Balancer | API Gateway | Service Mesh |
|---|---|---|---|
| Traffic type | Any TCP/HTTP traffic | External-to-internal (north-south) | Internal service-to-service (east-west) |
| Layer | L4 (TCP) or L7 (HTTP) | L7 (HTTP/gRPC) | L7 (HTTP/gRPC/TCP) |
| Scope | Distributes traffic across backends | Single entry point for all external clients | All inter-service communication |
| Features | Health checks, round-robin, sticky sessions | Auth, rate limiting, routing, transformation | mTLS, retries, tracing, traffic splitting |
| Examples | AWS ALB/NLB, HAProxy, NGINX | Kong, AWS API Gateway, Apigee | Istio, Linkerd, Consul Connect |
| Deployed as | Standalone infrastructure | Standalone service/cluster | Sidecar proxy per pod |
- Load balancer only: Simple applications. One or a few services behind a single entry point. No auth, rate limiting, or complex routing needed.
- API Gateway: Multiple services, external consumers, need for centralized auth, rate limiting, and routing. This is the standard for any microservices architecture exposed to the internet.
- Service mesh: 10+ microservices where managing mTLS, observability, and resilience policies per-service becomes unsustainable. The mesh handles service-to-service concerns that the API gateway does not cover.
54. Designing APIs for Mobile Clients
- Minimize round trips: Each HTTP request on cellular costs 100-500ms in RTT plus radio wake-up time. If a screen needs data from 3 services, the BFF should aggregate them into a single response. Reduce the number of API calls per screen to 1-2.
- Reduce payload size:
  - Return only fields the client needs (GraphQL or BFF with field selection).
  - Use compression (gzip/brotli: 60-80% size reduction for JSON).
  - Use efficient serialization (Protobuf if using gRPC: 30-80% smaller than JSON).
  - Return image URLs at the appropriate resolution for the device screen, not full-resolution.
- Design for offline-first:
  - Return `ETag` and `Last-Modified` headers for conditional requests. On retry, send `If-None-Match`; if the data has not changed, the client gets `304 Not Modified` (no body, saves bandwidth).
  - Include `Cache-Control: max-age=300` for data that can be cached client-side.
  - Design resources to be independently cacheable (avoid deeply nested responses that invalidate too frequently).
- Handle unreliable connections:
  - Every mutating endpoint must be idempotent (idempotency keys for POST).
  - Client retries are inevitable. The server must handle them gracefully.
  - Return partial responses (degraded) rather than errors when possible.
- Pagination that works offline:
  - Cursor-based pagination works better for mobile because clients can resume from where they left off after a disconnect.
  - Include a sync endpoint: `GET /orders?updated_since=2024-01-15T10:30:00Z` so the client can fetch only what changed since last sync.
- Delta / Patch responses:
  - Instead of returning the full resource on every request, return only changed fields since the client’s last known version.
  - `GET /users/42?since_version=5` returns `{"version": 7, "changes": {"email": "new@example.com"}}`.
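The conditional-request flow from the offline-first item can be sketched as a pure function. Deriving the ETag from a hash of the serialized resource is one common approach and is an assumption here, as are the function names.

```python
import hashlib
import json
from typing import Optional, Tuple


def make_etag(resource: dict) -> str:
    """Derive a quoted entity tag from the resource's canonical JSON form."""
    body = json.dumps(resource, sort_keys=True).encode()
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(resource: dict,
                    if_none_match: Optional[str]) -> Tuple[int, dict, Optional[dict]]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(resource)
    headers = {"ETag": etag, "Cache-Control": "max-age=300"}
    if if_none_match == etag:
        return 304, headers, None  # client copy is current: send no body
    return 200, headers, resource
```

On a flaky connection the client retries with the ETag it already has, and an unchanged resource costs only a header exchange.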
55. Observability Strategy for Microservices
- Metrics: Numerical measurements aggregated over time. “What is happening?”
  - RED method (for services): Rate (requests/sec), Errors (error rate), Duration (latency distribution).
  - USE method (for resources): Utilization, Saturation, Errors.
  - Tools: Prometheus + Grafana, Datadog, CloudWatch.
  - Key metrics for APIs: p50/p95/p99 latency per endpoint, error rate (4xx and 5xx separately), RPS, concurrent connections, queue depth, consumer lag.
- Logs: Structured event records. “What happened in this specific request?”
  - Structured JSON logs with trace IDs (covered in Q41).
  - Tools: ELK, Loki + Grafana, Datadog Logs, Splunk.
- Traces: Distributed request flow visualization. “Where did the time go?”
  - Spans across services showing latency breakdown (covered in Q40).
  - Tools: Jaeger, Zipkin, Datadog APM, Honeycomb.
- A dashboard (metrics) shows a latency spike on the Order service at 3:15 PM.
- Click the spike data point — exemplar links to a specific trace.
- The trace shows the Order service waiting 4 seconds on a database call.
- Click the slow span — linked logs show a specific SQL query with a missing index.
- Total diagnosis time: 3 minutes.
- SLI (Service Level Indicator): A measured quantity that reflects service health. Example: the percentage of requests that complete in < 200ms.
- SLO (Service Level Objective): The target for the SLI. Example: “99.5% of requests complete in < 200ms” or “99.9% availability over 30 days.”
- Error Budget — The acceptable amount of unreliability. If your SLO is 99.9%, you have a 0.1% error budget (~43 minutes of downtime per month). When the error budget is exhausted, freeze deployments and focus on reliability.
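The error-budget arithmetic is simple enough to check directly. The helper names below are hypothetical; the second function assumes a request-based SLO rather than a time-based one.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative = overspent)."""
    allowed_failures = (1 - slo) * total_requests
    return 1 - failed_requests / allowed_failures
```

For a 99.9% SLO over 30 days, `error_budget_minutes(0.999)` comes out at about 43.2 minutes, matching the figure above. With one million requests and 250 failures, three quarters of the budget is still available.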
- Alert on symptoms (high error rate, high latency) not causes (CPU high, disk full). Symptoms directly impact users.
- Use multi-window alerts: fire only if the error rate exceeds the threshold for both the last 5 minutes AND the last 1 hour (reduces false positives).
- Page (wake someone up) only for customer-impacting issues. Everything else is a ticket.
- Dashboard without alerts: Nobody watches dashboards 24/7. If a signal reflects customer impact, put an alert on it rather than relying on someone noticing a graph.
- Alerting on everything: This causes alert fatigue, and engineers start ignoring alerts. Alert only on SLO violations and customer-impacting symptoms.
- Metrics without context: “CPU is high” is useless without knowing which service, which pod, what it is doing, and which customers are affected.
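The error-budget arithmetic and the multi-window alert rule described above can be sketched in a few lines. This is an illustration under stated assumptions: the function names and the 1% threshold are hypothetical, and a production setup would normally express the rule in the alerting system's own rule language rather than in application code.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for a given SLO."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def should_page(error_rate_5m: float, error_rate_1h: float, threshold: float) -> bool:
    """Multi-window rule: page only if BOTH windows exceed the threshold."""
    return error_rate_5m > threshold and error_rate_1h > threshold


# 99.9% over 30 days leaves 0.1% of 43,200 minutes, roughly 43.2 minutes.
budget = error_budget_minutes(0.999)
print(round(budget, 1))

THRESHOLD = 0.01  # hypothetical: page when more than 1% of requests fail

# A brief spike trips the 5-minute window but not the 1-hour window.
print(should_page(error_rate_5m=0.08, error_rate_1h=0.002, threshold=THRESHOLD))

# A sustained problem trips both windows.
print(should_page(error_rate_5m=0.08, error_rate_1h=0.03, threshold=THRESHOLD))
```

Requiring both windows filters out short blips, while the short window lets the alert clear quickly once the error rate recovers.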
56. Event-Driven Architecture vs Request-Response
Request-Response (synchronous):
- Service A sends a request to Service B and waits for a response.
- Direct coupling: A must know about B, and A blocks until B responds.
- Simple, predictable, easy to debug (single request/response flow).
- Fragile: if B is down, A fails.
Event-Driven (asynchronous):
- Service A publishes an event (“OrderCreated”) to a message broker (Kafka, RabbitMQ).
- Services B, C, D independently subscribe and react.
- Loose coupling: A does not know (or care) who consumes the event.
- Resilient: if B is down, events queue up and are processed when B recovers.
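The publish/subscribe relationship above can be illustrated with a toy in-process broker. This is only a sketch of the coupling model: `InMemoryBroker` is a hypothetical class that delivers events synchronously in one process, so it does not provide the queuing and recovery behavior of a real broker such as Kafka or RabbitMQ.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a message broker (single process, no persistence)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The publisher never names its consumers; the broker fans the event out.
        for handler in self._subscribers[event_type]:
            handler(payload)


broker = InMemoryBroker()
handled: list[str] = []

# Services B, C, and D register independently of Service A.
broker.subscribe("OrderCreated", lambda e: handled.append(f"payment:{e['order_id']}"))
broker.subscribe("OrderCreated", lambda e: handled.append(f"inventory:{e['order_id']}"))
broker.subscribe("OrderCreated", lambda e: handled.append(f"notification:{e['order_id']}"))

# Service A publishes once and has no knowledge of who reacts.
broker.publish("OrderCreated", {"order_id": 42})
print(handled)
```

Adding a fourth consumer requires only another `subscribe` call; the publishing code does not change, which is the loose coupling the list describes.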
| Scenario | Best approach |
|---|---|
| Client needs an immediate response | Request-Response |
| User-facing API call | Request-Response |
| Notifying multiple services of a change | Event-Driven |
| Operations that can be processed later | Event-Driven |
| Long-running workflows | Event-Driven (Saga) |
| Data sync between services | Event-Driven (CDC) |
| Audit logging | Event-Driven |
- Debugging — An event is published, and something goes wrong downstream. Tracing requires correlation IDs across the event chain.
- Ordering — Events may arrive out of order. Design consumers to be order-independent or use Kafka’s partition ordering guarantees.
- Schema evolution — Events are contracts. Changing an event’s schema affects all consumers. Use a Schema Registry.
- Eventual consistency — After publishing an event, the system is temporarily inconsistent. Clients may not see changes immediately.
- Zombie events — Consumers that are never removed continue processing events, wasting resources.
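For the ordering challenge above, one way to make a consumer tolerate out-of-order or duplicate delivery is to compare a per-entity version before applying an event. The sketch below assumes every event carries the full new state plus a monotonically increasing `version` per order; that field and the `OrderProjection` class are hypothetical.

```python
class OrderProjection:
    """Keeps the latest known state per order, ignoring stale or duplicate events."""

    def __init__(self) -> None:
        self.state: dict[int, dict] = {}

    def apply(self, event: dict) -> bool:
        """Apply the event if it is newer than what we hold; return whether it was applied."""
        current = self.state.get(event["order_id"])
        if current is not None and event["version"] <= current["version"]:
            return False  # stale or duplicate delivery: safe to drop
        self.state[event["order_id"]] = {
            "version": event["version"],
            "status": event["status"],
        }
        return True


projection = OrderProjection()

# Version 2 arrives before version 1, and version 2 is then redelivered.
results = [
    projection.apply({"order_id": 42, "version": 2, "status": "PAID"}),
    projection.apply({"order_id": 42, "version": 1, "status": "CREATED"}),
    projection.apply({"order_id": 42, "version": 2, "status": "PAID"}),
]
print(results, projection.state[42])
```

This approach only works when each event carries enough state to stand alone. Events that describe increments still need ordered delivery, for example by keying a Kafka topic so that all events for one order land on the same partition.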
GET /orders/42 but publishes an OrderCreated event for downstream processing (payment, inventory, notifications).
Red flag answer: “Everything should be event-driven for loose coupling” without acknowledging the debugging, consistency, and operational complexity. Or not knowing when synchronous communication is the better choice.
Follow-up questions:
Q: Your team publishes 50 different event types. Documentation is scattered and consumers do not know what events are available. How do you fix this?
(1) Create an Event Catalog (the AsyncAPI spec is the standard, like OpenAPI but for async APIs). Document every event type with schema, example payload, producer, and expected consumers. (2) Use a Schema Registry (Confluent, AWS Glue) that enforces schema compatibility and serves as a discovery mechanism. (3) Build an internal “event storefront”: a UI where teams can browse available events, subscribe, and see example payloads. (4) Require every new event type to be registered in the catalog before it can be published (CI check).
Q: Service A publishes an event, Service B consumes it and publishes another event, Service C consumes that and calls Service D synchronously. Service D fails. How do you debug this?
This is why correlation IDs (trace IDs) are non-negotiable in event-driven architectures. (1) Every event must carry a correlation ID originating from the initial trigger. (2) When Service B publishes its event, it includes the same correlation ID. (3) When Service C makes a synchronous call, it passes the correlation ID in a header. (4) Search your centralized logging by correlation ID to see the full chain: A’s event, B’s processing, B’s event, C’s processing, C’s call to D, D’s failure. (5) Distributed tracing (OpenTelemetry) supports async span propagation, so the trace links all of these together into one visual timeline.
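The propagation steps in the answer above can be simulated in a single process. This is a sketch, not a tracing implementation: the four services are plain functions, `LOG` stands in for centralized logging, and the `X-Correlation-ID` header name and the `PaymentRequested` event are illustrative assumptions.

```python
import uuid
from typing import Optional

# Stand-in for centralized logging: (correlation_id, service, message).
LOG: list[tuple[str, str, str]] = []


def make_event(event_type: str, payload: dict, correlation_id: Optional[str] = None) -> dict:
    # A new correlation ID is created only at the initial trigger; otherwise it is reused.
    return {
        "type": event_type,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payload": payload,
    }


def service_a(order_id: int) -> dict:
    event = make_event("OrderCreated", {"order_id": order_id})
    LOG.append((event["correlation_id"], "A", "published OrderCreated"))
    return event


def service_b(event: dict) -> dict:
    cid = event["correlation_id"]
    LOG.append((cid, "B", "consumed OrderCreated"))
    LOG.append((cid, "B", "published PaymentRequested"))
    return make_event("PaymentRequested", event["payload"], cid)


def service_d(headers: dict, body: dict) -> None:
    LOG.append((headers["X-Correlation-ID"], "D", "failed to process request"))
    raise RuntimeError("D is unavailable")


def service_c(event: dict) -> None:
    cid = event["correlation_id"]
    LOG.append((cid, "C", "consumed PaymentRequested"))
    try:
        # Synchronous call: the ID travels in a header instead of the event body.
        service_d({"X-Correlation-ID": cid}, event["payload"])
    except RuntimeError as exc:
        LOG.append((cid, "C", f"call to D failed: {exc}"))


def trail(correlation_id: str) -> list[str]:
    """Search the log by correlation ID to reconstruct the full chain."""
    return [f"{svc}: {msg}" for cid, svc, msg in LOG if cid == correlation_id]


first = service_a(42)
service_c(service_b(first))
chain = trail(first["correlation_id"])
print("\n".join(chain))
```

Filtering the log by the one ID yields the whole chain in order, from A's publish through D's failure, which is the search described in step (4).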