Security in system design is like the layers of access control in a corporate building. Authentication (AuthN) is the ID badge check at the front door — “prove you are who you say you are.” Authorization (AuthZ) is the keycard system on individual floors — “you can enter the engineering floor but not the executive suite.” Neither replaces the other, and a system that gets either one wrong is fundamentally broken regardless of how well everything else is designed.
Authentication vs Authorization
Token-Based Authentication
JWT (JSON Web Token)
Access Token vs Refresh Token
Implementation
OAuth 2.0 / OpenID Connect
Role-Based Access Control (RBAC)
API Security
API Key vs OAuth
Rate Limiting
Common Security Vulnerabilities
Trade-off Analysis: Security Mechanism Selection
| Mechanism | Complexity | Scalability | Revocability | Best For |
|---|---|---|---|---|
| Session tokens (server-side) | Low | Low (session store is bottleneck) | Instant | Traditional web apps, small scale |
| JWT (stateless) | Medium | High (no DB lookup per request) | Difficult (must wait for expiry or use blocklist) | Microservices, mobile APIs |
| API keys | Low | High | Instant (DB check) | Server-to-server, public APIs |
| OAuth 2.0 | High | High | Moderate (token + refresh) | Third-party integrations, user delegation |
| mTLS | High | High | Hard (cert rotation) | Service mesh, zero-trust internal |
Secure Communication
mTLS (Mutual TLS)
Secrets Management
Senior Interview Questions
How would you design authentication for a multi-tenant SaaS?
Key challenges:
- Isolation: Tenant data must be completely isolated
- SSO: Enterprise customers want their own IdP
- Roles: Per-tenant roles (admin of tenant A ≠ admin of tenant B)
Approach:
- Every DB query includes tenant_id
- Middleware validates that the request’s tenant_id matches the tenant in the token
- Support SAML/OIDC for enterprise SSO
- Separate encryption keys per tenant (optional)
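A minimal sketch of the tenant checks above, assuming the access token has already been validated and decoded into a user ID and a tenant ID. `Claims`, `TenantStore`, and `Forbidden` are hypothetical names, and the in-memory list stands in for a table where every row carries `tenant_id`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


class Forbidden(Exception):
    """Raised when a request crosses a tenant boundary or lacks a role."""


@dataclass(frozen=True)
class Claims:
    # Hypothetical claims taken from an already-validated access token.
    user_id: str
    tenant_id: str


@dataclass
class TenantStore:
    # Roles are keyed by (tenant_id, user_id), so being admin of tenant A
    # grants nothing in tenant B.
    roles: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # Every row carries tenant_id; every read filters on it.
    documents: List[dict] = field(default_factory=list)

    def check_tenant(self, claims: Claims, requested_tenant: str) -> None:
        # Middleware step: the tenant named in the URL must match the token.
        if claims.tenant_id != requested_tenant:
            raise Forbidden("token tenant does not match requested tenant")

    def has_role(self, claims: Claims, role: str) -> bool:
        return role in self.roles.get((claims.tenant_id, claims.user_id), set())

    def list_documents(self, claims: Claims, requested_tenant: str) -> List[dict]:
        self.check_tenant(claims, requested_tenant)
        # Equivalent of "... WHERE tenant_id = :tenant_id" on every query.
        return [d for d in self.documents if d["tenant_id"] == claims.tenant_id]

    def delete_document(self, claims: Claims, requested_tenant: str, doc_id: int) -> None:
        self.check_tenant(claims, requested_tenant)
        if not self.has_role(claims, "admin"):
            raise Forbidden("admin role required in this tenant")
        self.documents = [
            d for d in self.documents
            if not (d["tenant_id"] == claims.tenant_id and d["id"] == doc_id)
        ]


# Usage: alice is admin in "acme" but only a viewer in "globex".
store = TenantStore(
    roles={("acme", "alice"): {"admin"}, ("globex", "alice"): {"viewer"}},
    documents=[{"id": 1, "tenant_id": "acme"}, {"id": 2, "tenant_id": "globex"}],
)
```

Because the tenant comes from the token rather than from the request body, a client cannot widen its own scope by editing the URL.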
How do you handle secret rotation without downtime?
- Add new secret (old still works)
- Deploy services to accept both old and new
- Switch all writes (signing, issuing) to the new secret
- Verify all traffic uses new secret
- Revoke old secret
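The five steps map onto a small key ring. This is a sketch assuming HMAC-signed messages (for example, webhook signatures or signed cookies); `KeyRing` and its methods are hypothetical names, and in a real deployment each step would be a separate rollout across all instances rather than method calls in one process.

```python
import hashlib
import hmac


def _mac(key: bytes, message: bytes) -> str:
    # HMAC-SHA256, hex-encoded.
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class KeyRing:
    """Dual-read key rotation: several keys accepted, exactly one used for signing."""

    def __init__(self, signing_key: bytes):
        self.signing_key = signing_key
        self.accepted = [signing_key]
        self.old_key_hits = 0  # step 4: watch this drop to zero before revoking

    def add_key(self, key: bytes) -> None:
        # Steps 1-2: start accepting the new key; keep signing with the old one.
        if key not in self.accepted:
            self.accepted.append(key)

    def promote(self, key: bytes) -> None:
        # Step 3: switch writes (signing) to the new key.
        self.add_key(key)
        self.signing_key = key

    def sign(self, message: bytes) -> str:
        return _mac(self.signing_key, message)

    def verify(self, message: bytes, signature: str) -> bool:
        for key in self.accepted:
            if hmac.compare_digest(_mac(key, message), signature):
                if key != self.signing_key:
                    self.old_key_hits += 1  # traffic still signed with an old key
                return True
        return False

    def revoke(self, key: bytes) -> None:
        # Step 5: stop accepting the old key after the grace period.
        if key == self.signing_key:
            raise ValueError("cannot revoke the active signing key")
        self.accepted.remove(key)
```

Keeping "accept" and "sign" as separate operations is what makes the rotation safe: every instance learns the new key before any instance starts producing signatures with it.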
How would you secure inter-service communication?
Defense in depth:
- Network: VPC/private network (services not internet-accessible)
- Transport: mTLS (mutual authentication)
- Application: JWT tokens with service identity
- Authorization: Service-level permissions
With a service mesh (e.g., Istio, Linkerd):
- Automatic mTLS between all services
- Policy-based access control
- No code changes needed
- Traffic encryption by default
Design a rate limiter that's fair and abuse-resistant
Limit dimensions (apply several at once):
- Per-IP: Prevent anonymous abuse
- Per-User: Fair usage for authenticated users
- Per-API-Key: For API consumers
- Global: Protect overall system
Algorithms and fairness:
- Token bucket: Allows bursts, smooths traffic
- Sliding window: More accurate than fixed windows (avoids the double burst a client can get by straddling a window boundary)
- Weighted limits: Premium tiers get more
- Priority queues: Critical requests bypass limits
Abuse resistance:
- Fingerprinting (detect rotating IPs)
- Behavioral analysis (bot detection)
- CAPTCHA after threshold
- Progressive delays (slow down, don’t block)
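A minimal in-process sketch of a token bucket combined with per-IP (anonymous), per-user (tier-weighted), and global limits. The numbers in `TIERS`, `ANON_PER_IP`, and `GLOBAL` are made-up illustrations, and a production limiter would keep bucket state in a shared store such as Redis so that all API instances see the same counts.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical limits: (burst capacity, sustained refill in requests/second).
TIERS: Dict[str, Tuple[float, float]] = {
    "free": (10, 1.0),
    "premium": (100, 20.0),
}
ANON_PER_IP = (20, 2.0)
GLOBAL = (10_000, 5_000.0)


class TokenBucket:
    """Refills continuously at `rate` tokens/s up to `capacity`; each request spends tokens."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float]):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated_at = clock()

    def refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated_at) * self.rate)
        self.updated_at = now

    def allow(self, cost: float = 1.0) -> bool:
        self.refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class MultiLevelLimiter:
    """A request passes only if every applicable bucket has a token to spend."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.buckets: Dict[tuple, TokenBucket] = {}

    def _bucket(self, key: tuple, limits: Tuple[float, float]) -> TokenBucket:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(limits[0], limits[1], self.clock)
        return self.buckets[key]

    def allow(self, ip: str, user_id: Optional[str] = None, tier: str = "free") -> bool:
        if user_id is None:
            scoped = self._bucket(("ip", ip), ANON_PER_IP)  # anonymous: limit by IP
        else:
            scoped = self._bucket(("user", user_id), TIERS[tier])  # weighted by tier
        buckets = [scoped, self._bucket(("global",), GLOBAL)]
        for b in buckets:
            b.refill()
        # Check every bucket before spending, so a denial never consumes tokens elsewhere.
        if all(b.tokens >= 1 for b in buckets):
            for b in buckets:
                b.tokens -= 1
            return True
        return False
```

The injectable `clock` keeps the limiter deterministic under test; in production the default `time.monotonic` avoids surprises when the wall clock jumps.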
Interview Questions
Explain the difference between authentication and authorization. Why is it dangerous to conflate them?
Why can't you revoke a JWT, and what strategies exist to work around this limitation?
- A JWT is stateless by design — the server validates it using a cryptographic signature without looking anything up in a database. This is its primary advantage (no session store, no DB call per request, scales horizontally) and also its primary limitation. Once issued, a JWT is valid until it expires. There is no built-in mechanism to invalidate it because no server-side state tracks which tokens are active.
- If a user’s account is compromised, if they change their password, or if an admin revokes their access, any previously issued JWTs continue to work until expiry. This is a real security gap. The standard mitigations are:
- Short expiration times (15 minutes or less for access tokens). This limits the damage window. Combined with refresh tokens stored server-side, you get the scalability of JWTs with the revocability of sessions. When the user’s access is revoked, the refresh token is deleted from Redis/DB, and the access token expires naturally within 15 minutes.
- Token blocklist/denylist: maintain a set of revoked JWT IDs (the `jti` claim) in Redis. On every request, check whether the token’s `jti` is in the blocklist. Entries only need to live until the token’s own `exp`, so the set stays small. This technically works but defeats the stateless advantage of JWTs: you are now doing a Redis lookup per request, which is essentially a session store with extra steps.
- Short-lived tokens + event-driven invalidation: when a security event occurs (password change, role change), publish an event that causes all services to reject tokens issued before that timestamp for that user. Each service maintains a small in-memory map of `user_id -> minimum_iat` (issued-at time).
- The pragmatic answer most senior engineers give: use short-lived JWTs (15 min) as access tokens and long-lived opaque refresh tokens stored in a database with revocation capability, and accept the 15-minute window as a reasonable security trade-off for the scalability benefit. For high-security operations (changing password, transferring money), require re-authentication regardless of token validity.
- Example: A company discovered an employee was terminated but their JWT was still valid for 24 hours (they had set a long expiry). During that window, the ex-employee exfiltrated customer data. The incident led to reducing JWT TTL from 24 hours to 15 minutes and adding a Redis blocklist for immediate revocation of critical accounts, checked on every API call to sensitive endpoints.
- How does refresh token rotation work and why does it help detect token theft?
- If you implement a JWT blocklist in Redis, how do you handle the case where Redis goes down — do you reject all requests (secure but unavailable) or accept all tokens (available but insecure)?
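To make these trade-offs concrete, here is a stdlib-only sketch of a 15-minute HS256 access token checked against both workarounds: a `jti` blocklist and a per-user minimum `iat`. It is hand-rolled for illustration only (a real service should use a vetted library such as PyJWT), and the in-memory `RevocationState` stands in for Redis or for state fed by security events; all names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid
from typing import Dict, Optional


class InvalidToken(Exception):
    pass


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, user_id: str, ttl_seconds: int = 900,
                now: Optional[float] = None) -> str:
    """HS256 access token with sub, iat, exp, and a unique jti (15-minute default TTL)."""
    iat = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": iat, "exp": iat + ttl_seconds, "jti": uuid.uuid4().hex}
    signing_input = ".".join(_b64url(json.dumps(p).encode()) for p in (header, payload))
    return signing_input + "." + _b64url(_sign(secret, signing_input))


class RevocationState:
    """Server-side state for the two workarounds: a jti blocklist and per-user minimum iat."""

    def __init__(self) -> None:
        self.blocked: Dict[str, int] = {}  # jti -> exp
        self.min_iat: Dict[str, int] = {}  # user_id -> reject tokens with iat below this

    def block(self, jti: str, exp: int) -> None:
        self.blocked[jti] = exp

    def prune(self, now: float) -> None:
        # An expired token is rejected anyway, so its blocklist entry can go.
        self.blocked = {j: e for j, e in self.blocked.items() if e > now}

    def invalidate_user(self, user_id: str, at: float) -> None:
        # Second granularity: tokens issued during the same second as `at` still pass.
        self.min_iat[user_id] = int(at)


def verify_token(token: str, secret: bytes, state: RevocationState,
                 now: Optional[float] = None) -> dict:
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise InvalidToken("malformed")
    header = json.loads(_b64url_decode(parts[0]))
    if header.get("alg") != "HS256":  # never let the token pick its own algorithm
        raise InvalidToken("unexpected alg")
    expected = _sign(secret, parts[0] + "." + parts[1])
    if not hmac.compare_digest(expected, _b64url_decode(parts[2])):
        raise InvalidToken("bad signature")
    claims = json.loads(_b64url_decode(parts[1]))
    if now >= claims["exp"]:
        raise InvalidToken("expired")
    if claims["jti"] in state.blocked:
        raise InvalidToken("revoked: jti on blocklist")
    if claims["iat"] < state.min_iat.get(claims["sub"], 0):
        raise InvalidToken("revoked: issued before user invalidation")
    return claims
```

The signature and expiry checks need no shared state; the last two checks are exactly the per-request lookups that erode the stateless advantage, which is why the blocklist is often reserved for sensitive endpoints.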
Walk me through the OAuth 2.0 Authorization Code flow. Why is the authorization code exchanged server-to-server?
How would you implement rate limiting for an API that has both free and premium tiers?
What is the difference between RBAC and ABAC, and when would you choose each?
- RBAC (Role-Based Access Control) assigns permissions to roles, and roles to users. A user’s access is determined entirely by which roles they hold. It is a lookup: `user -> roles -> permissions`. Simple, well-understood, and covers 80% of use cases. The permission model is static: you define roles like `viewer`, `editor`, `admin` with fixed permission sets, and adding a new user just means assigning them the right role.
- ABAC (Attribute-Based Access Control) evaluates a policy against attributes of the subject (user), resource (what they are accessing), action (what they want to do), and environment (time of day, IP address, device type). A policy might say: “Allow if user.department == resource.department AND user.clearance_level >= resource.classification AND time is within business hours.” This is far more expressive than RBAC but also more complex to implement, debug, and audit.
- Choose RBAC when: Your permission model is relatively simple and stable, you have well-defined organizational roles, and you need easy auditability (“who has admin access?” is a simple query). Most B2B SaaS products start with RBAC and it serves them fine for years. It is also much easier for non-technical stakeholders to understand: “editors can edit, viewers can view” is intuitive.
- Choose ABAC when: Your authorization rules depend on relationships between the user and the resource, or on contextual factors that change per request. Examples: a doctor can only view records of patients assigned to them (relationship), a financial transaction over $10,000 requires manager approval (attribute threshold), API access is restricted to office IP ranges during business hours (environmental context). Healthcare (HIPAA) and finance (SOX) often need ABAC because their compliance rules are inherently attribute-based.
- Example: A healthcare platform started with RBAC (`doctor`, `nurse`, `admin`) but discovered it couldn’t express “a doctor can only see their own patients’ records.” They would have needed a unique role per doctor-patient pair, which is unmanageable. They moved authorization for record access to ABAC using OPA (Open Policy Agent) with policies like: `allow if input.user.role == "doctor" AND input.resource.patient_id IN input.user.assigned_patients`. Role-based access for admin functions stayed in RBAC.
- How would you implement ABAC in a microservices architecture? Would each service evaluate policies locally, or would you centralize policy evaluation?
- How do you audit and debug ABAC policies? If a user reports “I can’t access this document,” how do you trace which policy rule denied them?
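A small sketch contrasting the two models under assumed names: RBAC is a static table lookup, while the ABAC check evaluates subject, resource, and environment attributes on each request, here the doctor-patient rule from the example plus a hypothetical business-hours condition. Returning a reason string with each ABAC decision is one way to make denials traceable when auditing.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Illustrative role table; real systems load this from a database or IdP.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "viewer": {"record:read"},
    "editor": {"record:read", "record:write"},
    "admin": {"record:read", "record:write", "user:manage"},
}


def rbac_allows(user_roles: Set[str], permission: str) -> bool:
    # RBAC: user -> roles -> permissions, a static lookup.
    return any(permission in ROLE_PERMISSIONS.get(r, set()) for r in user_roles)


@dataclass
class Subject:
    user_id: str
    role: str
    assigned_patients: Set[str] = field(default_factory=set)


@dataclass
class Resource:
    patient_id: str


def abac_can_read_record(subject: Subject, resource: Resource, hour: int) -> Tuple[bool, str]:
    """ABAC: evaluate subject, resource, and environment attributes per request.

    Each rule returns a reason so a denial can be traced to the rule that fired.
    """
    if subject.role != "doctor":
        return False, "subject.role is not doctor"
    if resource.patient_id not in subject.assigned_patients:
        return False, "resource.patient_id not in subject.assigned_patients"
    if not 8 <= hour < 18:  # hypothetical business-hours rule
        return False, "environment.hour outside business hours"
    return True, "allowed"
```

Assigning a new patient to a doctor changes only data (`assigned_patients`), not the role table, which is what the RBAC-only design could not manage.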
How should secrets be managed in a production microservices environment?
- The golden rule is: secrets should never exist in code, config files, environment variables (in production), container images, or version control. They must live in a dedicated secrets management system (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault) and be fetched at runtime by the application. This gives you centralized access control, audit logging, and the ability to rotate secrets without redeploying.
- Environment variables are acceptable for development and CI/CD but are a security risk in production. They show up in process listings (`/proc/PID/environ`), crash dumps, error reporting tools (depending on configuration, tools such as Sentry can attach local variables and environment data to exception reports), container inspection (`docker inspect`), and orchestrator dashboards. A common real-world leak vector is a developer accidentally logging `os.environ` or a monitoring tool capturing environment state during an exception.
- Secret rotation without downtime requires a dual-read pattern: (1) Generate a new secret and store it alongside the old one. (2) Deploy application code that accepts both the old and new secrets (e.g., validate API keys against `current_key` OR `previous_key`). (3) Switch all writes to use the new secret. (4) Monitor for any traffic still using the old secret. (5) After a grace period, revoke the old secret. Vault’s dynamic secrets feature sidesteps most of this: it generates short-lived, per-client credentials on demand (e.g., a database password valid for 1 hour) and revokes them automatically when their lease expires.
- Access control for secrets follows least privilege: each service should only be able to read the specific secrets it needs, and access should be authenticated (Vault supports auth methods such as tokens and Kubernetes service accounts). Every secret read should be audit-logged. In regulated environments (PCI-DSS, SOC 2), you need to demonstrate who accessed which secrets and when.
- Example: A startup stored their database password in a `docker-compose.yml` file committed to a public GitHub repo. Within 4 hours, cryptocurrency miners had compromised the database. GitHub’s secret scanning detected it and sent an alert, but the damage was done. After the incident, they moved to AWS Secrets Manager with IAM-based access control, enabled automatic rotation every 30 days, and added `git-secrets` as a pre-commit hook to prevent future leaks.
- How would you handle the bootstrap problem — your application needs a secret to authenticate with the secrets manager, so where does that initial credential come from?
- How do dynamic secrets (Vault’s approach of generating short-lived, unique credentials per service instance) improve security compared to static shared secrets?
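A sketch of fetching secrets at runtime with a short cache, so a rotated secret is picked up without a redeploy. The `fetch` callable is an assumption: with AWS Secrets Manager it could wrap boto3's `get_secret_value(SecretId=...)` and return the `SecretString` field, but any secrets manager client fits. `SecretCache` is a hypothetical name and the TTL is illustrative.

```python
import time
from typing import Callable, Dict, Tuple

# Example fetch for AWS Secrets Manager (requires boto3 and IAM permissions):
#   import boto3
#   sm = boto3.client("secretsmanager")
#   fetch = lambda name: sm.get_secret_value(SecretId=name)["SecretString"]


class SecretCache:
    """Reads secrets from a secrets manager at runtime and caches them briefly."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (value, fetched_at)

    def get(self, name: str) -> str:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]
        value = self._fetch(name)  # each fetch is audit-logged on the manager side
        self._cache[name] = (value, now)
        return value

    def invalidate(self, name: str) -> None:
        # e.g. call after an auth failure, in case the secret was rotated early
        self._cache.pop(name, None)

    def __repr__(self) -> str:
        # Never print secret values, even in debug output or error reports.
        return f"SecretCache(names={sorted(self._cache)})"
```

The TTL bounds how long a service keeps using a rotated secret, which should stay shorter than the dual-read grace period described above.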
A penetration test found an IDOR vulnerability in your API. Explain what happened and how you'd fix it systemically.
- IDOR (Insecure Direct Object Reference) means your API exposes internal identifiers (like sequential database IDs) in URLs and doesn’t verify that the authenticated user is authorized to access the referenced object. The classic example: `GET /api/invoices/1001` returns invoice 1001 for user A, but user B can also call `GET /api/invoices/1001` and see user A’s invoice, because the API only checks “is this user authenticated?” and not “does this invoice belong to this user?”
- The fix has two layers: immediate and systemic. Immediately: add ownership checks at the data access layer. Every query for a user-scoped resource must include the user’s tenant/owner ID: `SELECT * FROM invoices WHERE id = :invoice_id AND user_id = :current_user_id`. This “scoped query” approach means even if an attacker guesses another ID, the query returns nothing because the ownership filter excludes it.
- Systemically, you need multiple defenses: (1) Use UUIDs instead of sequential integer IDs in URLs. This makes enumeration attacks impractical (a random UUIDv4 carries 122 random bits, so guessing any particular one is a 1 in 2^122 chance). This is defense-in-depth, not the primary fix, because security through obscurity alone is not sufficient. (2) Centralize authorization checks in middleware or a base repository class so individual endpoints cannot accidentally skip them. A `BaseRepository` that always injects `WHERE tenant_id = :current_tenant` makes it structurally impossible to forget. (3) Automated testing: add integration tests that attempt cross-user resource access and assert 403/404 responses. Run these in CI on every PR. (4) API gateway-level resource scoping: for microservices, the gateway can inject the authenticated user’s tenant ID into request headers, and each service trusts only the gateway-injected header, not the client.
- The deeper lesson: IDOR is not an edge case. It is one of the most exploited vulnerabilities in APIs, and under the name Broken Object Level Authorization (BOLA) it ranks first in the OWASP API Security Top 10. It happens because developers build CRUD endpoints that work correctly for the happy path (authenticated user fetching their own data) and forget the adversarial path (authenticated user fetching someone else’s data).
- Example: In 2019, a major US financial services company had an IDOR vulnerability where changing the account number in the URL of an API call returned another customer’s full bank statements. Approximately 885 million records were exposed. The root cause was that the API Gateway validated the JWT but never checked that the `account_id` in the URL matched the `account_id` in the token. The fix was adding a middleware layer that compared the requested resource’s owner against the authenticated user on every request.
- How would you detect IDOR vulnerabilities proactively in a large codebase with hundreds of endpoints? What tooling or testing strategy would you use?
- Should a failed authorization check return 403 Forbidden or 404 Not Found? What are the security implications of each?
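A minimal sketch of the scoped-query fix using the standard-library `sqlite3` module, assuming an `invoices` table with a `user_id` owner column and UUID primary keys. `InvoiceRepository` plays the role of the `BaseRepository` described above; all names are illustrative.

```python
import sqlite3
import uuid
from typing import Optional, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE invoices (id TEXT PRIMARY KEY, user_id TEXT NOT NULL, amount INTEGER)"
    )


def create_invoice(conn: sqlite3.Connection, user_id: str, amount: int) -> str:
    # Random UUIDs instead of sequential IDs: defense in depth against enumeration.
    invoice_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO invoices (id, user_id, amount) VALUES (?, ?, ?)",
        (invoice_id, user_id, amount),
    )
    return invoice_id


class InvoiceRepository:
    """Every query is scoped to the current user, so endpoints cannot forget the check."""

    def __init__(self, conn: sqlite3.Connection, current_user_id: str):
        self.conn = conn
        self.current_user_id = current_user_id

    def get(self, invoice_id: str) -> Optional[Tuple[str, int]]:
        # The ownership filter is part of the query itself, not a later check.
        return self.conn.execute(
            "SELECT id, amount FROM invoices WHERE id = ? AND user_id = ?",
            (invoice_id, self.current_user_id),
        ).fetchone()


def get_invoice_handler(conn: sqlite3.Connection, current_user_id: str, invoice_id: str):
    row = InvoiceRepository(conn, current_user_id).get(invoice_id)
    if row is None:
        return 404, None  # same response for "missing" and "not yours"
    return 200, {"id": row[0], "amount": row[1]}
```

Because the scoped query cannot distinguish "does not exist" from "belongs to someone else," this sketch returns 404 for both, which avoids confirming to an attacker that a given ID exists; 403 is also defensible when resource existence is not sensitive.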
How would you secure inter-service communication in a microservices architecture?
- The fundamental shift is moving from perimeter security (“everything inside the network is trusted”) to zero-trust (“verify every request, even internal ones”). Perimeter security fails because once an attacker breaches any single service, they can freely call every other internal service. In a zero-trust model, every service-to-service call must be authenticated and authorized.
- Layer 1: Network isolation. Services run in a VPC (Virtual Private Cloud) with private subnets. No internal service is directly accessible from the internet. Security groups and network policies restrict which services can talk to which — the notification service should not be able to call the payment service directly.
- Layer 2: mTLS (Mutual TLS). Both the client and server present certificates and verify each other’s identity. This provides encryption in transit and mutual authentication. In practice, nobody manages mTLS certificates manually — a service mesh (Istio, Linkerd, Consul Connect) handles certificate issuance, rotation, and verification automatically. Certificates are short-lived (hours, not years) and rotated transparently. The application code doesn’t even know mTLS is happening.
- Layer 3: Application-level identity and authorization. Even with mTLS confirming “this request is from Service A,” you still need to answer “is Service A allowed to call this endpoint on Service B?” This is done with service-level JWTs or service accounts with scoped permissions. A policy engine like OPA can evaluate: “Service A (identity from mTLS) is allowed to call `POST /payments` (action) but not `DELETE /payments` (different action).”
- Layer 4: Request-level context propagation. The end user’s identity must flow through the service chain so downstream services can make authorization decisions about the user, not just the calling service. This is typically done via a signed JWT in a header that each service validates and forwards. Without this, Service B knows the request came from Service A but doesn’t know which user initiated it.
- Example: A fintech company initially relied on VPC-level security (“all internal traffic is trusted”). After a vulnerability in their logging service was exploited, the attacker pivoted laterally to the payment processing service and extracted transaction data. The company adopted Istio for automatic mTLS between all services, OPA for service-to-service authorization policies, and end-user identity propagation via signed JWTs. The next breach attempt (through a similar vector) was contained — the compromised service’s mTLS identity was not authorized to call payment APIs.
- How does a service mesh like Istio handle certificate rotation for mTLS without causing connection drops or downtime?
- In a system with 50 microservices, how would you manage the service-to-service authorization policies? How do you avoid the policy matrix becoming unmanageable?
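A sketch of Layers 3 and 4 on the receiving service, assuming the mesh has already verified the caller's mTLS certificate and exposed its identity as a SPIFFE-style string (the format Istio uses for workload identities). The policy table, service names, and `Rule` type are hypothetical; in practice the table would typically live in OPA or in mesh authorization policies rather than in application code. Unknown callers are denied by default.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Rule:
    method: str
    path: str
    requires_end_user: bool = True  # Layer 4: user-initiated calls must carry the user


# Hypothetical policy for the payment service: caller identity -> allowed operations.
POLICY: Dict[str, Set[Rule]] = {
    "spiffe://cluster.local/ns/prod/sa/checkout": {Rule("POST", "/payments")},
    "spiffe://cluster.local/ns/prod/sa/reporting": {
        Rule("GET", "/payments", requires_end_user=False),  # batch job, no end user
    },
}


def authorize(caller_id: str, method: str, path: str,
              end_user_id: Optional[str]) -> Tuple[bool, str]:
    """Default deny: allow only (caller, method, path) combinations listed in POLICY."""
    for rule in POLICY.get(caller_id, set()):
        if rule.method == method and rule.path == path:
            if rule.requires_end_user and end_user_id is None:
                return False, "missing propagated end-user identity"
            return True, "allowed"
    return False, "no matching rule (default deny)"
```

Under this kind of table, a compromised logging service has no entry at all, so its valid mTLS identity still cannot reach payment APIs, which is the containment the fintech example describes.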