Security
Security in microservices is more complex than in monoliths because the attack surface is larger. Every service-to-service call needs authentication and authorization.
- Implement JWT-based authentication
- Design authorization strategies
- Set up mutual TLS (mTLS)
- Manage secrets securely
- Protect against common attacks
Security Challenges in Microservices
In a monolith, you had one front door to lock. In microservices, every service is its own building with its own door, and they all talk to each other constantly, so the attack surface grows with each new service you deploy. Zero-trust architecture flips the old “trust the internal network” model on its head: assume the network is hostile, authenticate every request, and authorize every action, even between your own services. The tradeoff is operational complexity: every service now needs identity, every call needs verification, and every secret needs rotation.
Authentication Strategies
JWT-Based Authentication
JWTs (JSON Web Tokens) solve a core distributed-systems problem: how do services verify a user’s identity without every service calling back to a central auth service on every request? Because the token is cryptographically signed once at login time, any downstream service can verify the signature locally using a shared secret or public key. This keeps the auth service from becoming a bottleneck and single point of failure, and enables stateless authentication that scales horizontally. The tradeoff is revocation: because tokens are self-contained, you cannot instantly invalidate them without a blacklist (which reintroduces the centralized check you were trying to avoid). In zero-trust terms, JWTs carry identity claims across trust boundaries so each service can independently verify “who is this, and what are they allowed to do?”
JWT Service Implementation
The auth service is the gatekeeper that issues signed identity tokens after verifying credentials. Password hashing with bcrypt (or argon2) prevents an attacker who dumps your database from getting plaintext passwords: bcrypt’s adaptive cost factor makes brute-force attacks computationally expensive even with modern GPUs. Using short-lived access tokens (15 minutes) paired with long-lived refresh tokens limits the damage window if an access token is stolen: an attacker has minutes, not days, to exploit it. Storing the refresh token server-side gives you a revocation point (one of the few places stateful auth is worth the complexity). In zero-trust, this service is the identity provider: it establishes “who” before any service decides “what they can do.”
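As a rough illustration of the flow described above, the following standard-library sketch hashes a password and issues a 15-minute HS256 access token. It is not the page's original implementation: PBKDF2 stands in for bcrypt/argon2 (which need third-party packages), and the names `hash_password`, `verify_password`, `issue_access_token`, and the `PBKDF2_ROUNDS` value are assumptions made for this example.

```python
import base64
import hashlib
import hmac
import json
import os
import time
from typing import List, Optional, Tuple

ACCESS_TTL_SECONDS = 15 * 60  # short-lived access token, as described above
PBKDF2_ROUNDS = 200_000       # assumed work factor for this sketch


def _b64url(data: bytes) -> str:
    """Base64url without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). PBKDF2 stands in for bcrypt/argon2 here."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def issue_access_token(user_id: str, roles: List[str], secret: bytes,
                       now: Optional[float] = None) -> str:
    """Sign an HS256 JWT that expires ACCESS_TTL_SECONDS after issuance."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "roles": roles,
               "iat": issued_at, "exp": issued_at + ACCESS_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

In a real service you would more likely use a maintained JWT library and an asymmetric algorithm, and keep refresh tokens in a server-side store so they can be revoked.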
JWT Middleware
The middleware is the first line of defense at every service boundary: it validates that incoming requests carry a legitimate token before any business logic runs. This prevents unauthenticated traffic from reaching your handlers, protecting against direct endpoint abuse, token forgery, and expired-token replay. In zero-trust, this is where “never trust, always verify” becomes concrete: even if the request came from inside the cluster, the middleware re-validates the signature, issuer, audience, and expiration. The tradeoff is latency, since every request pays a cryptographic verification cost, but with modern HMAC or RSA this is on the order of microseconds and well worth the security guarantee.
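The checks listed above (signature, issuer, audience, expiration) can be sketched with the standard library alone. This is a simplified HS256-only verifier under stated assumptions: `verify_token` and `InvalidToken` are hypothetical names, and `aud` is treated as a single string even though the JWT spec also allows a list.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


class InvalidToken(Exception):
    """Raised when a token fails any verification step."""


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def verify_token(token: str, secret: bytes, issuer: str, audience: str,
                 now: Optional[float] = None) -> dict:
    """Return the claims if every check passes, else raise InvalidToken."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError) as exc:
        raise InvalidToken("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(payload, dict):
        raise InvalidToken("malformed token")

    # Pin the algorithm instead of trusting the header (blocks alg=none).
    if header.get("alg") != "HS256":
        raise InvalidToken("unexpected algorithm")

    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise InvalidToken("bad signature")

    current = time.time() if now is None else now
    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        raise InvalidToken("token expired")
    if payload.get("iss") != issuer:
        raise InvalidToken("wrong issuer")
    if payload.get("aud") != audience:
        raise InvalidToken("wrong audience")
    return payload
```

A framework middleware would call `verify_token` on the bearer token, attach the returned claims to the request, and answer 401 on `InvalidToken`.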
Authorization Patterns
Role-Based Access Control (RBAC)
Authentication answers “who are you?”; authorization answers “what can you do?”. RBAC is the most common authorization model because it mirrors how organizations think: users have roles (admin, customer, support), and roles have permissions. This prevents privilege escalation attacks by enforcing the principle of least privilege: a customer cannot suddenly access admin-only endpoints just because they know the URL. In zero-trust, RBAC is typically enforced at every service boundary, not just the gateway, so that even if an attacker bypasses the front door they cannot call sensitive internal endpoints. The tradeoff is granularity: RBAC works well when permissions cleanly map to roles, but gets clumsy when individual users need one-off exceptions (which is where ABAC takes over).
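One way to enforce roles at a service boundary is a decorator around each handler. The sketch below assumes the verified token claims arrive as a dict with a `roles` list; `require_role`, `Forbidden`, and `delete_user` are illustrative names, not part of any framework.

```python
from functools import wraps
from typing import Callable


class Forbidden(Exception):
    """Raised when the caller holds none of the required roles."""


def require_role(*allowed_roles: str) -> Callable:
    """Allow the call only if the caller has at least one allowed role."""
    def decorator(handler: Callable) -> Callable:
        @wraps(handler)
        def wrapper(claims: dict, *args, **kwargs):
            caller_roles = set(claims.get("roles", []))
            if caller_roles.isdisjoint(allowed_roles):
                raise Forbidden(f"requires one of: {', '.join(allowed_roles)}")
            return handler(claims, *args, **kwargs)
        return wrapper
    return decorator


@require_role("admin")
def delete_user(claims: dict, user_id: str) -> str:
    # Hypothetical admin-only handler; runs only after the role check passes.
    return f"deleted {user_id}"
```

Calling `delete_user({"roles": ["customer"]}, "u42")` raises `Forbidden`, which a web layer would translate to HTTP 403.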
Permission-Based Authorization
Permissions are more granular than roles: instead of “is this user an admin?” you ask “does this user have the orders:delete permission?”. This defends against privilege creep, where admin roles accumulate every capability and become too risky to assign. By separating permissions from roles and attaching them to tokens, you enable precise access control: support can read orders but not delete them, even though both actions live on the same resource. Wildcard matching (orders:*) keeps policies maintainable without enumerating every action. In zero-trust, permission-based auth is the “least privilege in practice” layer: each service verifies not just that the caller is authenticated, but that they have the specific capability for this specific operation.
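The wildcard rule mentioned above can be sketched in a few lines. This assumes permissions are `resource:action` strings and that `resource:*` is the only wildcard form; `has_permission` is a hypothetical helper name.

```python
from typing import Iterable


def has_permission(granted: Iterable[str], required: str) -> bool:
    """True if `required` is granted exactly or via a `resource:*` wildcard."""
    resource, separator, action = required.partition(":")
    if not separator or not resource or not action:
        raise ValueError("permission must look like 'resource:action'")
    granted_set = set(granted)
    return required in granted_set or f"{resource}:*" in granted_set


support = ["orders:read"]   # can read orders, nothing else
admin = ["orders:*"]        # every action on orders, but only on orders
```

Matching only the two explicit forms avoids surprises from general glob syntax, where characters such as `?` or `[` would also be treated as patterns.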
Attribute-Based Access Control (ABAC)
ABAC is the most expressive authorization model: decisions are made based on attributes of the user, resource, action, and environment (time, IP, device). This prevents a class of attacks that RBAC cannot catch: a user with orders:update permission modifying someone else’s order (RBAC says “yes, they have the permission”; ABAC says “no, they are not the owner”). ABAC also enables contextual policies like “no financial operations outside business hours” or “no admin actions from unfamiliar IPs”, which are critical for fraud prevention. In zero-trust, ABAC is where you encode business-level risk: the policy engine evaluates every request against a rulebook, and each new rule hardens the system against a specific threat. The complexity cost is real: policy engines must be debuggable, testable, and performant, because they run on every request.
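A very small policy engine makes the ownership and business-hours examples concrete. Everything here is an assumption for illustration: the `AccessRequest` fields, the 09:00 to 17:00 window, and the rule names. Returning the name of the failing rule is one way to keep decisions debuggable.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class AccessRequest:
    user_id: str
    permissions: FrozenSet[str]
    action: str
    resource_owner_id: str
    hour: int  # local hour of day, 0-23


Rule = Callable[[AccessRequest], bool]


def has_permission(req: AccessRequest) -> bool:
    return req.action in req.permissions


def is_owner(req: AccessRequest) -> bool:
    return req.user_id == req.resource_owner_id


def within_business_hours(req: AccessRequest) -> bool:
    return 9 <= req.hour < 17  # assumed window for this sketch


def evaluate(req: AccessRequest, rules: Sequence[Rule]) -> Tuple[bool, str]:
    """Deny on the first failing rule and report which rule it was."""
    for rule in rules:
        if not rule(req):
            return False, rule.__name__
    return True, "allowed"
```

With `rules = [has_permission, is_owner, within_business_hours]`, a user who holds `orders:update` but does not own the order is denied with reason `is_owner`.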
Service-to-Service Authentication
API Keys for Internal Services
When services call each other, you cannot rely on user JWTs: many service-to-service calls happen in background jobs or system flows where no user is present. HMAC-signed requests solve this by giving each service a shared secret and requiring it to sign every request together with a timestamp. This defends against two critical attacks: impersonation (an attacker cannot forge requests without the secret) and replay (the timestamp prevents replaying captured requests after five minutes). In zero-trust, this is “machine identity”: services prove who they are just as users do. Compared to mTLS, HMAC is easier to deploy (no certificate infrastructure) but weaker (the secret is only as safe as your secrets manager, and there is no transport-layer mutual authentication).
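A minimal sketch of HMAC request signing with a five-minute timestamp window follows. The canonical message layout (service ID, timestamp, method, path, body hash joined by newlines) and the function names are assumptions for this example; both sides only need to agree on the same layout.

```python
import hashlib
import hmac
import time
from typing import Mapping, Optional

MAX_SKEW_SECONDS = 300  # the five-minute replay window described above


def sign_request(service_id: str, secret: bytes, method: str, path: str,
                 body: bytes, timestamp: int) -> str:
    """Hex HMAC-SHA256 over a canonical description of the request."""
    body_hash = hashlib.sha256(body).hexdigest()
    message = "\n".join([service_id, str(timestamp), method.upper(), path, body_hash])
    return hmac.new(secret, message.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(secrets: Mapping[str, bytes], service_id: str, timestamp: int,
                   signature: str, method: str, path: str, body: bytes,
                   now: Optional[float] = None) -> bool:
    """Reject unknown callers, stale timestamps, and bad signatures."""
    secret = secrets.get(service_id)
    if secret is None:
        return False
    current = time.time() if now is None else now
    if abs(current - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(service_id, secret, method, path, body, timestamp)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

The timestamp check only bounds how long a captured request stays valid; a request replayed inside the window still verifies unless the receiver also tracks recently seen signatures or nonces.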
Scenario: During a security audit you discover the fraud-detection service talks directly to the payments database instead of going through the payment service API. The fraud team says it is for performance. What do you do?
- Do not flip the table on day one. This pattern exists because it shipped and the fraud team has real performance constraints. Punishing a team for shortcuts they had to take closes the door on future cooperation. Frame it as a shared architectural fix, not a blame assignment.
- Quantify the actual risk. The payments DB has the canonical card-on-file data and transaction history. A compromise of fraud-detection — which has attacker-facing inputs (webhooks, ML model scoring, third-party signals) — now becomes a direct breach of payment data. This is a bounded-context violation and also likely a PCI-DSS scope expansion, which has real compliance and audit implications.
- Immediate mitigation (this week). Lock down the credentials: the fraud service gets a read-only DB user scoped to only the columns/tables it actually reads. Remove write access. Add row-level security if the DB supports it. This shrinks blast radius without breaking the fraud team.
- Short-term fix (this quarter). Work with the payment service owners to expose the exact data the fraud team needs as a read-only internal API — probably a gRPC endpoint that returns a narrow DTO. Benchmark it against the direct-DB path; if the fraud team’s concern was latency, a well-designed gRPC call + Redis cache is usually within 2-5 ms of a direct query and well inside their budget.
- Long-term (quarterly). Move to event-driven: payment service emits PaymentProcessed, ChargebackFlagged events to Kafka; fraud service consumes and maintains its own read model. Now there is no synchronous coupling at all, and each service owns its data.
- Prevent recurrence. Add a CI check / network policy that denies outbound connections from services to databases they do not own (easy to enforce with Kubernetes NetworkPolicy or service mesh authorization). “Each service owns its DB” becomes a structurally enforced invariant, not a nice-to-have.
- “What if the fraud team pushes back hard on latency?” Instrument both paths end-to-end. Usually the difference is 1-3 ms of network hop plus JSON/gRPC serialization. If that genuinely matters, use gRPC + connection pooling, or push the fraud check into the payment service itself via a sidecar. Almost never is direct DB access the right answer — it is the path of least design effort, not least latency.
- “How does this interact with PCI-DSS?” Direct DB access likely puts fraud-detection fully in PCI scope (it touches cardholder data). Going through a narrow API keeps it in scope only for the narrow DTO it receives — shrinking audit surface and cost significantly.
- “What is your enforcement mechanism so someone does not re-add this in 6 months?” NetworkPolicy in Kubernetes + IAM-at-the-database (services authenticate to RDS / Spanner with their own IAM role), plus a service_dependencies.yaml that is code-reviewed on every PR. Three layers: platform, identity, review.
- “Tell them to fix it by next sprint or block the deploy.” Adversarial; does not solve the underlying design problem and breeds workarounds.
- “Just rotate the credential.” Does nothing — the architecture is still cross-boundary DB access, just with a fresh password.
- Monzo Engineering: “How our network policies evolved”
- Sam Newman, Building Microservices (2nd ed., 2021), Chapter 4 — “Integration Styles” and the “shared database” anti-pattern
- Google Cloud: “Zero Trust data protection”
Scenario: You inherited a system where every service reads its JWT signing secret from an environment variable baked into the container image. How do you migrate to safe secret management without downtime?
- Name the concrete threats. Env vars in images show up in: docker inspect, kubectl describe pod, core dumps, crash reports, logs if someone does console.log(process.env), CI build logs, registry layer dumps. The secret leaks in a dozen ways and rotation requires rebuilding and redeploying every image.
- Introduce a secrets manager without breaking anything. Deploy Vault / AWS Secrets Manager / GCP Secret Manager. Give every service a new Kubernetes ServiceAccount bound to a Vault role via the Kubernetes auth method. The service authenticates to Vault with its pod identity (no bootstrap credential problem).
- Dual-read phase (week 1-2). Update each service to read the secret from Vault if available, fall back to env var if not. Deploy this version everywhere. Nothing has changed operationally yet.
- Flip the switch (week 3). Populate Vault with the current secret. Services now read from Vault. Env var stays as fallback in case of Vault outage.
- Rotation test (week 4). Rotate the secret in Vault with overlap (new kid added, old kept for 1 hour). Services pick up the new one within their cache TTL. Validate zero-downtime rotation works end-to-end.
- Remove env var (week 5+). Rebuild images without the env var. Now the secret lives only in Vault with audit logs, TTL, and rotation, and the image is safe to leak to a public registry.
- Prevent regression. Add a CI check that fails the build if any env var matches a secret-like name pattern (*_SECRET, *_KEY, *_PASSWORD, *_TOKEN).
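The dual-read phase in the steps above might look like the following sketch. The secrets-manager client is deliberately abstracted as a callable, so `fetch_from_manager` is a hypothetical stand-in for whatever Vault or cloud SDK call you use; returning the source alongside the value lets you log which path served the secret during the migration.

```python
import os
from typing import Callable, Mapping, Optional, Tuple


def load_secret(name: str,
                fetch_from_manager: Callable[[str], Optional[str]],
                env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (value, source): prefer the secrets manager, fall back to env."""
    try:
        value = fetch_from_manager(name)
    except Exception:
        # Any manager failure (outage, auth error) falls through to the env
        # var during the migration window; log it in real code.
        value = None
    if value:
        return value, "manager"
    fallback = env.get(name)
    if fallback:
        return fallback, "env"
    raise KeyError(f"secret {name!r} not found in manager or environment")
```

Once every service reports `"manager"` as the source, the env var fallback can be removed along with the variable itself.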
- “What if Vault is down? Is your service dead?” In-memory secret cache with a TTL longer than typical Vault outages (e.g., 15-30 min cache, 5-min refresh). Vault HA with Raft cluster. For truly critical paths, secrets are fetched at startup and refreshed in the background — a Vault outage does not affect already-running pods.
- “How do you rotate a JWT signing secret without logging everyone out?” JWKS with multiple active kids. Add the new key first, sign new tokens with the new kid, let old tokens (signed with the old kid) expire naturally. Remove the old key only after the longest token TTL has passed. Zero-downtime rotation.
- “How do you ensure a rogue or compromised service cannot read another service’s secrets?” Vault policies scoped per role. The order-service role has read access to secret/order-service/* only. If compromised, it cannot read secret/payment-service/*. Audit log every read so you can detect anomalous access.
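The overlapping-key rotation described above can be sketched as a small key ring. This is a simplification: a real JWKS publishes asymmetric public keys, whereas this example uses HMAC secrets so it runs on the standard library, and the `KeyRing` class and its method names are invented for illustration.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple


class KeyRing:
    """Several verification keys, exactly one of which signs new tokens."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}
        self._signing_kid: Optional[str] = None

    def add_signing_key(self, kid: str, secret: bytes) -> None:
        # The new key signs from now on; older keys stay valid for verifying.
        self._keys[kid] = secret
        self._signing_kid = kid

    def retire(self, kid: str) -> None:
        # Call only after the longest token TTL has passed.
        if kid == self._signing_kid:
            raise ValueError("cannot retire the active signing key")
        self._keys.pop(kid, None)

    def sign(self, message: bytes) -> Tuple[str, str]:
        if self._signing_kid is None:
            raise RuntimeError("no signing key configured")
        kid = self._signing_kid
        return kid, hmac.new(self._keys[kid], message, hashlib.sha256).hexdigest()

    def verify(self, kid: str, message: bytes, signature: str) -> bool:
        secret = self._keys.get(kid)
        if secret is None:
            return False
        expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Signatures made with the old `kid` keep verifying until `retire` is called, which is what lets existing sessions survive the rotation.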
- “Put secrets in Kubernetes Secrets.” Kubernetes Secrets are base64-encoded, not encrypted at rest by default, and anyone with get secrets RBAC can read them in plaintext. Better than env-vars-in-image but still weak: enable KMS encryption at rest and restrict RBAC, or use a real secrets manager.
- “Rotate the JWT secret at 2am during the maintenance window.” Every logged-in user is logged out. Works once; the team never rotates again because it was painful. Do it right with overlapping keys.
- HashiCorp: Vault Kubernetes Auth
- OWASP: Secrets Management Cheat Sheet
- AWS: Rotating secrets with AWS Secrets Manager
Scenario: A pentester demonstrates that they can impersonate your order-service to the payment-service by stealing a Kubernetes Secret and calling the payment endpoint directly. The only auth between services is a shared API key. How do you redesign?
Mutual TLS (mTLS)
Standard TLS proves the server’s identity to the client (“yes, this really is bank.com”), but the server accepts a connection from anyone. mTLS extends this: both sides present certificates, so the server also verifies the client. In microservices, this means the Order Service can prove “I am really the Order Service” cryptographically, not just by claiming so in a header. This defeats an entire class of attacks: a rogue workload in the cluster cannot talk to other services unless it holds a valid certificate signed by your internal CA. mTLS is the transport-layer backbone of zero-trust: identity is verified on every connection, not inferred from network location. The tradeoff is heavy operational complexity: you need a certificate authority, issuance pipeline, rotation policy, and a way to handle expired certs gracefully. This is why service meshes like Istio exist: they automate the whole lifecycle.
mTLS Configuration
Configuring mTLS in application code gives you fine-grained control over what certificate properties you trust (CN, SANs, issuer chain). The server demands a client certificate, validates it against the trusted CA bundle, and extracts the client’s identity from the certificate’s Common Name. This prevents “anonymous caller” attacks: every connection is tied to a specific service identity, and you can audit every byte of traffic back to an actual certificate. In zero-trust, extracting the service name from the cert CN is the foundation of authorization: “the Payment Service is calling me, should I let it?”. The tradeoff is that manual mTLS is brittle compared to a mesh: certificates expire, rotations break things, and debugging TLS handshake errors at 3am is not fun.
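With Python's `ssl` module, the server side of this setup could be sketched as below. The function names and file paths are placeholders; the path arguments are optional here only so the sketch can be exercised without certificate files on disk, and a real server would always pass all three.

```python
import ssl
from typing import Optional


def build_server_context(ca_file: Optional[str] = None,
                         cert_file: Optional[str] = None,
                         key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a valid cert."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    context.verify_mode = ssl.CERT_REQUIRED  # this is what makes TLS "mutual"
    if ca_file:
        # Trust only client certs signed by the internal CA.
        context.load_verify_locations(cafile=ca_file)
    if cert_file:
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


def peer_common_name(peer_cert: dict) -> Optional[str]:
    """Extract the CN from the dict returned by SSLSocket.getpeercert()."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None
```

After the handshake, `peer_common_name(conn.getpeercert())` gives the caller's service name for the authorization decision. Many deployments key identity off the subjectAltName entries rather than the CN; the same dict exposes those under `subjectAltName`.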
Certificate Generation Script
Shell scripts using OpenSSL are the universal baseline for certificate operations: they work in any CI pipeline, any container, any cloud. The CA is the root of trust; every service certificate is signed by it, and services trust any cert with that signature. In production you would replace this with cert-manager, Vault PKI, or a service mesh CA, but understanding the raw OpenSSL flow is essential for debugging when those higher-level tools fail.
Secrets Management
HashiCorp Vault Integration
Secrets management is where most teams have the worst security posture: database passwords in env vars, API keys in CI configs, encryption keys in Git history. Vault solves this by centralizing secret storage with encryption at rest, audit logging, and (critically) dynamic secret generation. Instead of a long-lived database password shared by every instance, Vault issues a unique PostgreSQL user for each service request with a short TTL. This limits blast radius: if a service is compromised, the attacker gets one service’s credentials for one hour, not the master password forever. Kubernetes auth means services authenticate using their pod identity (no bootstrap credential problem), and leases auto-renew so secrets stay fresh. In zero-trust, Vault is the “dynamic identity for machines” layer: short-lived, verifiable, revocable.
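The lease behaviour described above (short TTL, renewal before expiry) can be illustrated without a Vault client. In this sketch `fetch` is a hypothetical callable that returns a credential and its TTL in seconds; in practice it would wrap your Vault SDK call. Renewing at half the TTL is an assumed policy, not something Vault prescribes.

```python
import time
from typing import Callable, Optional, Tuple


class LeasedSecret:
    """Caches a short-lived credential and refreshes it before it expires."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 renew_fraction: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._renew_fraction = renew_fraction
        self._clock = clock
        self._value: Optional[str] = None
        self._renew_at = 0.0
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._renew_at:
            try:
                value, ttl = self._fetch()
            except Exception:
                # Secrets manager unreachable: keep serving the cached
                # credential until its lease actually runs out.
                if self._value is not None and now < self._expires_at:
                    return self._value
                raise
            self._value = value
            self._renew_at = now + ttl * self._renew_fraction
            self._expires_at = now + ttl
        return self._value
```

Because the cached value is served until the lease expires, a brief outage of the secrets manager does not take down pods that already hold a valid credential.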
Environment Variable Encryption
When Vault is overkill, envelope encryption of env vars is the pragmatic middle ground: secrets are encrypted at rest with a master key and decrypted only in memory. AES-256-GCM provides both confidentiality and integrity (the auth tag catches tampering), which is non-negotiable for production. This defends against config file leaks, backup exposure, and container image scans: an attacker who sees the encrypted value cannot decrypt it without the master key (which lives in a KMS or separate bootstrap). In zero-trust, this is a compensating control when a full secrets manager is not yet in place, and a defense-in-depth layer even when one is. For Kubernetes Secrets, remember they are base64-encoded, not encrypted by default, so you still need envelope encryption on top.
Security Best Practices
Input Validation
Input validation is your first line of defense against injection attacks: SQL injection, NoSQL injection, command injection, XSS, and the rest. Every input that crosses a trust boundary must be validated against a strict schema: unexpected fields dropped, sizes bounded, formats enforced. This also blunts attacks that trick your service into misinterpreting or over-processing data (for example, passing a giant array to exhaust memory). Parameterized queries are non-negotiable: they separate code from data at the driver level, which is the only reliable way to prevent SQL injection. In zero-trust, “validate all input, even from trusted services” is the rule: a compromised internal service can send malicious payloads too. The tradeoff is code verbosity, but Pydantic and Joi turn that into a small, declarative cost.
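The two ideas above, strict schema validation and parameterized queries, can be shown with the standard library. The hand-written `parse_order` stands in for a Pydantic model, and the `orders` table, field names, and limits are assumptions for this example.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple

EMAIL_RE = re.compile(r"[^@\s]{1,64}@[^@\s]{1,255}")
MAX_ITEMS = 50  # assumed upper bound on items per order


@dataclass(frozen=True)
class OrderInput:
    email: str
    item_ids: Tuple[int, ...]


def parse_order(raw: dict) -> OrderInput:
    """Read only known fields, so unexpected fields are dropped."""
    email = raw.get("email")
    items = raw.get("item_ids")
    if not isinstance(email, str) or not EMAIL_RE.fullmatch(email):
        raise ValueError("invalid email")
    if not isinstance(items, list) or not 1 <= len(items) <= MAX_ITEMS:
        raise ValueError(f"item_ids must contain 1 to {MAX_ITEMS} entries")
    if not all(isinstance(i, int) and not isinstance(i, bool) and i > 0 for i in items):
        raise ValueError("item_ids must be positive integers")
    return OrderInput(email=email, item_ids=tuple(items))


def find_orders(conn: sqlite3.Connection, email: str) -> List[tuple]:
    # The ? placeholder sends the value separately from the SQL text,
    # so it can never be interpreted as SQL.
    return conn.execute(
        "SELECT id, email FROM orders WHERE email = ?", (email,)
    ).fetchall()
```

Passing a classic injection string such as `' OR '1'='1` to `find_orders` simply searches for that literal text and matches nothing.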
Rate Limiting
Rate limiting is how you survive abuse: brute-force login attempts, credential stuffing, scraping, and straight-up DoS. Without it, an attacker with a botnet can exhaust your database connections or CPU budget faster than you can autoscale. The key insight is that different endpoints need different limits: a product listing can tolerate 1000 req/s, but a login endpoint should cap at 5 attempts per hour per IP (because each failed attempt is a password guess). Backing the limiter with Redis makes it work across multiple instances, so an attacker cannot just round-robin to dodge it. In zero-trust, rate limiting is a quota layer: even authenticated, authorized requests get budgeted, because a compromised legitimate account is still an attack vector.
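The per-endpoint limits above can be sketched with a sliding-window counter. This version keeps state in process memory, so it only limits a single instance; the Redis-backed variant the text recommends would store the same timestamps in a shared store. The class name and the injectable `clock` are assumptions for this example.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


# Login: 5 attempts per hour per IP, as in the text above.
login_limiter = SlidingWindowLimiter(limit=5, window_seconds=3600)
```

A login handler would call `login_limiter.allow(client_ip)` before checking credentials and return HTTP 429 when it is `False`.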
Security Headers
Security headers are the browser-side half of defense in depth. Content Security Policy (CSP) neutralizes XSS by telling the browser “only run scripts from these origins”, so even if an attacker injects a <script> tag, the browser refuses to execute it. HSTS forces HTTPS so a man-in-the-middle cannot downgrade to plaintext. X-Frame-Options blocks clickjacking by preventing your site from being iframed. These controls are cheap (one middleware) but prevent entire attack categories. In zero-trust, they extend the “verify everything” principle to the client: the browser is not trusted to render your content safely without explicit instructions. The tradeoff is that strict CSP can break inline scripts and third-party widgets, so rollouts typically start in report-only mode to find violations before enforcing them.
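The "one middleware" claim can be illustrated with a plain WSGI wrapper that adds the three headers discussed above. The header values are example settings chosen for this sketch, not recommendations from the original page, and headers the application already set are left untouched.

```python
from typing import Callable, Iterable, List, Tuple

Header = Tuple[str, str]

SECURITY_HEADERS: List[Header] = [
    ("Content-Security-Policy", "default-src 'self'"),
    ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
    ("X-Frame-Options", "DENY"),
]


def security_headers_middleware(app: Callable,
                                headers: Iterable[Header] = SECURITY_HEADERS) -> Callable:
    """Wrap a WSGI app so every response carries the security headers."""
    defaults = list(headers)

    def wrapped(environ, start_response):
        def patched_start(status, response_headers, exc_info=None):
            present = {name.lower() for name, _ in response_headers}
            # Add only headers the app did not set itself.
            extra = [(n, v) for n, v in defaults if n.lower() not in present]
            return start_response(status, list(response_headers) + extra, exc_info)
        return app(environ, patched_start)

    return wrapped
```

For a report-only rollout you would send `Content-Security-Policy-Report-Only` with the same policy first, then switch the header name once violations are resolved.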
Interview Questions
Q1: How do you secure service-to-service communication?
- mTLS (Mutual TLS)
  - Both client and server present certificates
  - Strongest authentication
  - Used in service meshes (Istio, Linkerd)
- API Keys + HMAC
  - Service ID + timestamp + signature
  - Prevents replay attacks
  - Easier to implement than mTLS
- JWT Tokens
  - Service-specific JWTs
  - Can include permissions
  - Short expiry for security
- Use mTLS in production
- API keys for development/simple cases
- Always encrypt in transit (TLS)
Q2: Explain JWT security considerations
- Never in localStorage (XSS vulnerable)
- HttpOnly cookies (preferred)
- Memory only for SPAs
- Short expiry (15-30 minutes)
- Refresh tokens for session management
- Rotate secrets regularly
- Verify signature, issuer, audience
- Check expiry and not-before claims
- Validate against token blacklist
- Use RS256 or ES256 for production
- Never use alg: none
- Specify algorithm in verification
Q3: How do you handle secrets in microservices?
- Never hardcode secrets in source code
- Never commit secrets to version control
- Never store secrets in plain environment variables
- Secrets Manager (HashiCorp Vault, AWS Secrets Manager)
  - Dynamic secret generation
  - Automatic rotation
  - Audit logging
- Kubernetes Secrets
  - Enable encryption at rest (they are only base64-encoded by default)
  - RBAC for access control
  - Mount as files, not env vars
- Encryption
  - Encrypt secrets at rest
  - Use envelope encryption
  - Rotate encryption keys
- Principle of Least Privilege
  - Service-specific secrets
  - Time-limited access
  - Audit all access
Summary
Key Takeaways
- JWT for user authentication
- RBAC/ABAC for authorization
- mTLS for service-to-service auth
- Use secrets managers (Vault)
- Defense in depth approach
Interview Deep-Dive
In your microservices system, Service A calls Service B, which calls Service C. How do you propagate and verify user identity across all three services without each service calling the auth service independently?
How do you manage secrets (database passwords, API keys, encryption keys) across 20 microservices? What are the failure modes of different approaches?
Explain Zero Trust Architecture in the context of microservices. Is mTLS enough, or do you need more?