Senior Level: Security is non-negotiable at senior levels. You’ll be expected to design systems that are secure by default and to explain authentication, authorization, and common attack mitigations.
Security in system design is like the layers of access control in a corporate building. Authentication (AuthN) is the ID badge check at the front door — “prove you are who you say you are.” Authorization (AuthZ) is the keycard system on individual floors — “you can enter the engineering floor but not the executive suite.” Neither replaces the other, and a system that gets either one wrong is fundamentally broken regardless of how well everything else is designed.

Authentication vs Authorization

┌─────────────────────────────────────────────────────────────────┐
│              AuthN vs AuthZ                                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  AUTHENTICATION (AuthN)           AUTHORIZATION (AuthZ)        │
│  ──────────────────────           ─────────────────────        │
│  "WHO are you?"                   "WHAT can you do?"           │
│                                                                 │
│  Verifies identity                Controls access              │
│  Happens first                    Happens after AuthN          │
│                                                                 │
│  Methods:                         Methods:                     │
│  • Password                       • RBAC (Role-Based)          │
│  • OAuth/OIDC                     • ABAC (Attribute-Based)     │
│  • API Keys                       • ACL (Access Control List)  │
│  • mTLS                           • Policy (e.g., OPA)         │
│                                                                 │
│  Example Flow:                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                         │   │
│  │  User: "I'm alice@example.com, password: ****"         │   │
│  │                    │                                    │   │
│  │                    ▼                                    │   │
│  │  AuthN: Verify credentials → Identity confirmed        │   │
│  │                    │                                    │   │
│  │                    ▼                                    │   │
│  │  User: "I want to DELETE /orders/123"                  │   │
│  │                    │                                    │   │
│  │                    ▼                                    │   │
│  │  AuthZ: Does alice have 'orders:delete' permission?    │   │
│  │         Alice's role = 'viewer' → Denied               │   │
│  │                                                         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
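The flow in the diagram can be sketched as two separate functions, one per concern. This is a minimal illustration, not a production design: the in-memory `USERS` and `ROLE_PERMISSIONS` stores, the `Identity` type, and the sample password are all assumptions made up for the example.

```python
import hmac
from dataclasses import dataclass
from typing import Optional

# Hypothetical in-memory stores, for illustration only.
# A real system stores salted password hashes, never plaintext.
USERS = {"alice@example.com": {"password": "s3cret", "role": "viewer"}}
ROLE_PERMISSIONS = {
    "viewer": {"orders:read"},
    "admin": {"orders:read", "orders:delete"},
}


@dataclass(frozen=True)
class Identity:
    email: str
    role: str


def authenticate(email: str, password: str) -> Optional[Identity]:
    """AuthN: answer 'who are you?' and return an identity, or None."""
    user = USERS.get(email)
    if user is None:
        return None
    # Constant-time comparison of the presented and stored credential
    if not hmac.compare_digest(user["password"].encode(), password.encode()):
        return None
    return Identity(email=email, role=user["role"])


def authorize(identity: Identity, permission: str) -> bool:
    """AuthZ: answer 'what can you do?' for an already confirmed identity."""
    return permission in ROLE_PERMISSIONS.get(identity.role, set())


alice = authenticate("alice@example.com", "s3cret")
can_delete = alice is not None and authorize(alice, "orders:delete")
```

Keeping the two functions separate means a successful login never implies permission: `alice` authenticates, yet `can_delete` is false because the `viewer` role lacks `orders:delete`.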

Token-Based Authentication

JWT (JSON Web Token)

┌─────────────────────────────────────────────────────────────────┐
│              JWT Structure                                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Header.Payload.Signature                                       │
│  ──────────────────────────                                     │
│                                                                 │
│  HEADER (Algorithm + Type):                                    │
│  {                                                              │
│    "alg": "RS256",                                             │
│    "typ": "JWT"                                                │
│  }                                                              │
│  Base64: eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9                 │
│                                                                 │
│  PAYLOAD (Claims):                                             │
│  {                                                              │
│    "sub": "usr_123",        // Subject (user ID)              │
│    "iat": 1699900000,       // Issued at                      │
│    "exp": 1699903600,       // Expires (1 hour)               │
│    "iss": "auth.example.com", // Issuer                       │
│    "aud": "api.example.com",  // Audience                     │
│    "roles": ["user", "admin"] // Custom claims                │
│  }                                                              │
│  Base64: eyJzdWIiOiJ1c3JfMTIzIiwiaWF0IjoxNjk5OTAwMDAw...      │
│                                                                 │
│  SIGNATURE:                                                    │
│  RS256(                                                        │
│    base64(header) + "." + base64(payload),                    │
│    privateKey                                                  │
│  )                                                              │
│  Base64URL: SflKxwRJSMeKKF2Q... (illustrative, truncated)      │
│                                                                 │
│  FULL TOKEN:                                                   │
│  eyJhbGci...eyJzdWIi...SflKxw...                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
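To make the three-part structure concrete, here is a standard-library sketch of how the segments are built and checked. Note the swap: it signs with HS256 (HMAC-SHA256 over a shared secret) rather than the RS256 shown in the diagram, because the Python standard library has no RSA signing. The helper names are hypothetical, and the sketch deliberately skips checks that a real library performs (the `alg` header, `exp`, `iss`, `aud`).

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64URL-encode without '=' padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def encode_segment(obj: dict) -> str:
    """Serialize a header or payload as compact JSON, then Base64URL it."""
    return b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature using HMAC-SHA256 (not RS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = f"{encode_segment(header)}.{encode_segment(payload)}"
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(sig)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Recompute the signature over header.payload and compare."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), signature):
        raise ValueError("bad signature")
    payload_segment = signing_input.split(".")[1]
    return json.loads(b64url_decode(payload_segment))


# The RS256 header from the diagram encodes to the segment shown there.
rs256_header_segment = encode_segment({"alg": "RS256", "typ": "JWT"})
token = sign_hs256({"sub": "usr_123", "roles": ["user"]}, b"demo-secret")
```

The payload is only encoded, not encrypted: anyone holding the token can read the claims, so the signature protects integrity, not confidentiality.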

Access Token vs Refresh Token

┌─────────────────────────────────────────────────────────────────┐
│              Token Types                                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ACCESS TOKEN                     REFRESH TOKEN                │
│  ────────────                     ─────────────                │
│  Short-lived (15 min - 1 hour)   Long-lived (days - weeks)    │
│  Sent with every request          Stored securely              │
│  Stateless validation             Stateful (can revoke)        │
│  Contains permissions             Just an opaque ID            │
│                                                                 │
│  TOKEN FLOW:                                                   │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                         │   │
│  │  1. Login with credentials                              │   │
│  │     Client → Auth Server                                │   │
│  │     ← Access Token (15min) + Refresh Token (7 days)    │   │
│  │                                                         │   │
│  │  2. API calls                                          │   │
│  │     Client → API Server                                │   │
│  │     Header: Authorization: Bearer <access_token>       │   │
│  │                                                         │   │
│  │  3. Access token expires                               │   │
│  │     API returns 401 Unauthorized                       │   │
│  │                                                         │   │
│  │  4. Refresh token                                      │   │
│  │     Client → Auth Server (with refresh token)          │   │
│  │     ← New Access Token + New Refresh Token             │   │
│  │                                                         │   │
│  │  5. Logout / Revoke                                    │   │
│  │     Delete refresh token from server's allowlist       │   │
│  │                                                         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  SECURITY:                                                     │
│  • Access token: Can't revoke (short expiry is the mitigation) │
│  • Refresh token: Can revoke (check against DB/Redis)         │
│  • Refresh token rotation: Issue new refresh on each use      │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
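Steps 2 to 4 of the flow above can be sketched from the client's side: call the API, and on a 401 refresh once and retry. The `TokenClient` class and the two fake server functions are hypothetical stand-ins so the example runs without a network; a real client would make HTTP calls instead.

```python
from typing import Callable, Tuple


class TokenClient:
    """Retries a request once after refreshing an expired access token."""

    def __init__(
        self,
        call_api: Callable[[str], int],
        refresh: Callable[[str], Tuple[str, str]],
        access_token: str,
        refresh_token: str,
    ):
        self._call_api = call_api
        self._refresh = refresh
        self.access_token = access_token
        self.refresh_token = refresh_token

    def request(self) -> int:
        status = self._call_api(self.access_token)
        if status == 401:
            # Access token expired: trade the refresh token for a new pair.
            # With rotation, the old refresh token must be discarded.
            self.access_token, self.refresh_token = self._refresh(self.refresh_token)
            status = self._call_api(self.access_token)
        return status


# Fake servers for illustration only
VALID_ACCESS_TOKENS = {"access-2"}


def fake_api(access_token: str) -> int:
    return 200 if access_token in VALID_ACCESS_TOKENS else 401


def fake_refresh(refresh_token: str) -> Tuple[str, str]:
    return "access-2", "refresh-2"


client = TokenClient(fake_api, fake_refresh, "access-1-expired", "refresh-1")
status = client.request()
```

Retrying only once is a deliberate choice here: if the second call also returns 401, the refresh token is likely revoked and the user should log in again.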

Implementation

import uuid
from datetime import datetime, timedelta, timezone

import jwt  # PyJWT


class AuthError(Exception):
    """Raised when a token is expired, revoked, or malformed."""


class TokenService:
    def __init__(self, private_key: str, public_key: str, redis_client):
        self.private_key = private_key
        self.public_key = public_key
        self.redis = redis_client  # refresh-token allowlist
        self.access_token_ttl = timedelta(minutes=15)
        self.refresh_token_ttl = timedelta(days=7)
    
    def create_tokens(self, user_id: str, roles: list) -> dict:
        """Create access and refresh tokens"""
        now = datetime.now(timezone.utc)
        
        # Access token with claims
        access_payload = {
            "sub": user_id,
            "roles": roles,
            "type": "access",
            "iat": now,
            "exp": now + self.access_token_ttl,
            "iss": "auth.example.com"
        }
        access_token = jwt.encode(
            access_payload, 
            self.private_key, 
            algorithm="RS256"
        )
        
        # Refresh token (minimal claims)
        refresh_payload = {
            "sub": user_id,
            "type": "refresh",
            "jti": str(uuid.uuid4()),  # Unique ID for revocation
            "iat": now,
            "exp": now + self.refresh_token_ttl
        }
        refresh_token = jwt.encode(
            refresh_payload, 
            self.private_key, 
            algorithm="RS256"
        )
        
        # Store refresh token JTI in Redis for revocation
        self.redis.setex(
            f"refresh:{refresh_payload['jti']}", 
            self.refresh_token_ttl,
            user_id
        )
        
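        return {
            "access_token": access_token,
            "refresh_token": refresh_token,
            # total_seconds() stays correct for TTLs of a day or more;
            # timedelta.seconds only holds the sub-day remainder
            "expires_in": int(self.access_token_ttl.total_seconds())
        }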
    
    def verify_access_token(self, token: str) -> dict:
        """Verify and decode access token"""
        try:
            payload = jwt.decode(
                token,
                self.public_key,
                algorithms=["RS256"],
                options={"require": ["sub", "exp", "roles"]}
            )
            if payload.get("type") != "access":
                raise jwt.InvalidTokenError("Invalid token type")
            return payload
        except jwt.ExpiredSignatureError:
            raise AuthError("Token expired")
        except jwt.InvalidTokenError as e:
            raise AuthError(f"Invalid token: {e}")
    
    def refresh_tokens(self, refresh_token: str) -> dict:
        """Exchange refresh token for new tokens"""
        try:
            payload = jwt.decode(
                refresh_token,
                self.public_key,
                algorithms=["RS256"]
            )
            
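            # Reject access tokens presented as refresh tokens
            if payload.get("type") != "refresh":
                raise AuthError("Invalid token type")
            
            # Revoke the old refresh token. DELETE returns the number of
            # keys removed, so the allowlist check and the rotation happen
            # in one atomic step and a token cannot be redeemed twice.
            jti = payload.get("jti")
            if not self.redis.delete(f"refresh:{jti}"):
                raise AuthError("Token revoked")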
            
            # Get user and issue new tokens
            user = self.get_user(payload["sub"])
            return self.create_tokens(user.id, user.roles)
            
        except jwt.InvalidTokenError:
            raise AuthError("Invalid refresh token")

OAuth 2.0 / OpenID Connect

┌─────────────────────────────────────────────────────────────────┐
│              OAuth 2.0 Authorization Code Flow                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────┐         ┌─────────────┐         ┌──────────────┐  │
│  │  User   │         │   Client    │         │ Auth Server  │  │
│  │ Browser │         │   (App)     │         │ (Google/etc) │  │
│  └────┬────┘         └──────┬──────┘         └──────┬───────┘  │
│       │                     │                       │          │
│       │ 1. Click "Login     │                       │          │
│       │    with Google"     │                       │          │
│       │────────────────────>│                       │          │
│       │                     │                       │          │
│       │ 2. Redirect to auth │                       │          │
│       │<────────────────────│                       │          │
│       │                     │                       │          │
│       │ 3. Login & Consent  │                       │          │
│       │─────────────────────────────────────────────>          │
│       │                     │                       │          │
│       │ 4. Redirect with authorization code         │          │
│       │<─────────────────────────────────────────────          │
│       │ ?code=abc123&state=xyz                      │          │
│       │                     │                       │          │
│       │────────────────────>│                       │          │
│       │                     │                       │          │
│       │                     │ 5. Exchange code      │          │
│       │                     │    for tokens         │          │
│       │                     │──────────────────────>│          │
│       │                     │                       │          │
│       │                     │ 6. Access + ID Token  │          │
│       │                     │<──────────────────────│          │
│       │                     │                       │          │
│       │ 7. Logged in!       │                       │          │
│       │<────────────────────│                       │          │
│       │                     │                       │          │
└─────────────────────────────────────────────────────────────────┘

Key Points:
• Authorization code is short-lived, exchanged server-to-server
• Access token stays on the client's backend in this flow; the browser only ever sees the authorization code
• ID Token contains user info (OpenID Connect extension)
• State parameter prevents CSRF attacks

Role-Based Access Control (RBAC)

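from enum import Enum
from functools import wraps

# Assumption: FastAPI, which matches the decorators and Request usage below
from fastapi import FastAPI, HTTPException, Request

app = FastAPI()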

class Permission(Enum):
    # Resource: Action
    ORDERS_READ = "orders:read"
    ORDERS_CREATE = "orders:create"
    ORDERS_UPDATE = "orders:update"
    ORDERS_DELETE = "orders:delete"
    
    USERS_READ = "users:read"
    USERS_CREATE = "users:create"
    USERS_UPDATE = "users:update"
    USERS_DELETE = "users:delete"
    
    ADMIN_PANEL = "admin:access"

# Role definitions
ROLES = {
    "viewer": {
        Permission.ORDERS_READ,
        Permission.USERS_READ
    },
    "editor": {
        Permission.ORDERS_READ,
        Permission.ORDERS_CREATE,
        Permission.ORDERS_UPDATE,
        Permission.USERS_READ
    },
    "admin": {
        Permission.ORDERS_READ,
        Permission.ORDERS_CREATE,
        Permission.ORDERS_UPDATE,
        Permission.ORDERS_DELETE,
        Permission.USERS_READ,
        Permission.USERS_CREATE,
        Permission.USERS_UPDATE,
        Permission.USERS_DELETE,
        Permission.ADMIN_PANEL
    }
}

def has_permission(user, permission: Permission) -> bool:
    """Check if user has specific permission"""
    user_permissions = set()
    for role in user.roles:
        user_permissions.update(ROLES.get(role, set()))
    return permission in user_permissions

def require_permission(permission: Permission):
    """Decorator for authorization"""
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            request = kwargs.get('request') or args[0]
            user = request.state.user
            
            if not has_permission(user, permission):
                raise HTTPException(
                    status_code=403,
                    detail=f"Missing permission: {permission.value}"
                )
            
            return await func(*args, **kwargs)
        return wrapper
    return decorator

# Usage
@app.delete("/orders/{order_id}")
@require_permission(Permission.ORDERS_DELETE)
async def delete_order(request: Request, order_id: str):
    # Only users with 'orders:delete' permission can access
    await order_service.delete(order_id)
    return {"status": "deleted"}

API Security

API Key vs OAuth

┌─────────────────────────────────────────────────────────────────┐
│              API Key vs OAuth                                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  API KEY                          OAUTH TOKEN                  │
│  ───────                          ───────────                  │
│  Simple, long-lived               Complex, short-lived         │
│  Server-to-server                 User delegation              │
│  All-or-nothing access            Scoped permissions           │
│  Easy to leak                     Rotates automatically        │
│                                                                 │
│  USE API KEY FOR:                 USE OAUTH FOR:               │
│  • Server-to-server APIs          • User-facing apps           │
│  • Public APIs with rate limits   • Third-party integrations   │
│  • Simple integrations            • Fine-grained permissions   │
│                                                                 │
│  BEST PRACTICES:                                               │
│  • Prefix keys: sk_live_xxx (Stripe style)                    │
│  • Hash before storing                                         │
│  • Rate limit per key                                          │
│  • Log usage for audit                                         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
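The first two best practices (prefix keys, hash before storing) can be sketched with the standard library. The function names are hypothetical. A plain SHA-256 digest is used on the assumption that keys are long random strings; that reasoning does not carry over to user-chosen passwords, which need a slow hash such as bcrypt or Argon2.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

KEY_PREFIX = "sk_live_"  # makes leaked keys easy to spot in logs and scanners


def hash_api_key(raw_key: str) -> str:
    """Digest stored in the database instead of the key itself."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> Tuple[str, str]:
    """Return (raw_key, stored_hash). Show raw_key to the user once only."""
    raw_key = KEY_PREFIX + secrets.token_urlsafe(32)
    return raw_key, hash_api_key(raw_key)


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)


raw_key, stored_hash = generate_api_key()
is_valid = verify_api_key(raw_key, stored_hash)
```

Because only the hash is stored, a database leak does not directly expose usable keys, and a lost key has to be replaced rather than recovered.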

Rate Limiting

import time
from typing import Tuple

class RateLimiter:
    """
    Sliding window rate limiter
    """
    
    def __init__(self, redis_client, requests_per_minute: int = 60):
        self.redis = redis_client
        self.limit = requests_per_minute
        self.window = 60  # seconds
    
    def is_allowed(self, key: str) -> Tuple[bool, dict]:
        """
        Check if request is allowed
        Returns (allowed, rate_limit_info)
        """
        now = time.time()
        window_start = now - self.window
        
        pipe = self.redis.pipeline()
        
        # Remove old entries
        pipe.zremrangebyscore(key, 0, window_start)
        
        # Count current requests
        pipe.zcard(key)
        
        # Add current request
        pipe.zadd(key, {str(now): now})
        
        # Set expiry
        pipe.expire(key, self.window)
        
        results = pipe.execute()
        request_count = results[1]
        
        allowed = request_count < self.limit
        
        return allowed, {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(0, self.limit - request_count - 1)),
            "X-RateLimit-Reset": str(int(now + self.window))
        }

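# Middleware (assumes a FastAPI `app` and a shared `rate_limiter` instance)
from fastapi import Request
from fastapi.responses import JSONResponse

@app.middleware("http")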
async def rate_limit_middleware(request: Request, call_next):
    # Use API key or IP as identifier
    identifier = request.headers.get("X-API-Key") or request.client.host
    key = f"rate_limit:{identifier}"
    
    allowed, headers = rate_limiter.is_allowed(key)
    
    if not allowed:
        return JSONResponse(
            status_code=429,
            content={"error": "Rate limit exceeded"},
            headers=headers
        )
    
    response = await call_next(request)
    
    # Add rate limit headers
    for header, value in headers.items():
        response.headers[header] = value
    
    return response

Common Security Vulnerabilities

┌─────────────────────────────────────────────────────────────────┐
│              OWASP Top 10 (2017) Mitigations                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. INJECTION (SQL, NoSQL, Command)                            │
│     + Use parameterized queries / ORMs                         │
│     + Validate and sanitize input                              │
│     - Never concatenate user input into queries                │
│                                                                 │
│  2. BROKEN AUTHENTICATION                                       │
│     + Use strong password hashing (bcrypt, Argon2)             │
│     + Implement MFA                                            │
│     + Rate limit login attempts                                │
│     + Secure session management                                │
│                                                                 │
│  3. SENSITIVE DATA EXPOSURE                                     │
│     + Encrypt data at rest (AES-256)                           │
│     + Encrypt data in transit (TLS 1.3)                        │
│     + Don't log sensitive data                                 │
│     + Hash + salt passwords                                    │
│                                                                 │
│  4. XML EXTERNAL ENTITIES (XXE)                                 │
│     + Disable external entity processing                       │
│     + Use JSON instead of XML                                  │
│                                                                 │
│  5. BROKEN ACCESS CONTROL                                       │
│     + Deny by default                                          │
│     + Server-side authorization                                │
│     + Check ownership (can user X access resource Y?)          │
│                                                                 │
│  6. SECURITY MISCONFIGURATION                                   │
│     + Remove default credentials                               │
│     + Disable unnecessary features                             │
│     + Keep software updated                                    │
│     + Secure headers (CSP, HSTS, etc.)                         │
│                                                                 │
│  7. CROSS-SITE SCRIPTING (XSS)                                  │
│     + Escape output (HTML encode)                              │
│     + Content Security Policy header                           │
│     + HttpOnly cookies                                         │
│                                                                 │
│  8. INSECURE DESERIALIZATION                                    │
│     + Don't deserialize untrusted data                         │
│     + Use signed serialized objects                            │
│                                                                 │
│  9. USING COMPONENTS WITH KNOWN VULNERABILITIES                 │
│     + Keep dependencies updated                                │
│     + Scan with tools (Snyk, Dependabot)                       │
│                                                                 │
│  10. INSUFFICIENT LOGGING & MONITORING                          │
│      + Log security events                                     │
│      + Alert on suspicious activity                            │
│      + Audit trail for sensitive operations                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
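Items 1 and 5 combine naturally in a single query: bind user input as parameters, and put the ownership check in the `WHERE` clause so a valid login alone is never enough. The sketch below uses an in-memory SQLite database; the `orders` schema and the `get_order` helper are assumptions for illustration.

```python
import sqlite3
from typing import Optional, Tuple


def get_order(conn: sqlite3.Connection, order_id: str,
              user_id: str) -> Optional[Tuple]:
    """Fetch an order only if it belongs to the requesting user."""
    # '?' placeholders keep user input out of the SQL text (no injection),
    # and the owner_id condition enforces ownership (no IDOR).
    return conn.execute(
        "SELECT id, owner_id, total FROM orders WHERE id = ? AND owner_id = ?",
        (order_id, user_id),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, owner_id TEXT, total REAL)")
conn.execute("INSERT INTO orders VALUES (?, ?, ?)", ("456", "user_b", 19.99))

owner_view = get_order(conn, "456", "user_b")
other_view = get_order(conn, "456", "user_a")
injection_attempt = get_order(conn, "456' OR '1'='1", "user_a")
```

Returning nothing for both "not found" and "not yours" also avoids telling an attacker which order IDs exist.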

Trade-off Analysis: Security Mechanism Selection

| Mechanism | Complexity | Scalability | Revocability | Best For |
|---|---|---|---|---|
| Session tokens (server-side) | Low | Low (session store is bottleneck) | Instant | Traditional web apps, small scale |
| JWT (stateless) | Medium | High (no DB lookup per request) | Difficult (must wait for expiry or use blocklist) | Microservices, mobile APIs |
| API keys | Low | High | Instant (DB check) | Server-to-server, public APIs |
| OAuth 2.0 | High | High | Moderate (token + refresh) | Third-party integrations, user delegation |
| mTLS | High | High | Hard (cert rotation) | Service mesh, zero-trust internal |
Practical tip — a senior engineer would say: “The biggest security mistake I see in system design interviews is treating JWT as a session token. JWTs are not revocable by design — if a JWT is leaked, the attacker has valid credentials until expiry. This is why access token TTLs should be short (15 minutes max) and refresh tokens should be stored server-side with revocation capability. If your interviewer asks ‘how would you handle a compromised user?’, mentioning JWT blocklists in Redis shows you understand this trade-off.”

Secure Communication

mTLS (Mutual TLS)

┌─────────────────────────────────────────────────────────────────┐
│              mTLS for Service-to-Service                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Regular TLS: Server proves identity to client                 │
│  ┌─────────┐                   ┌─────────┐                     │
│  │ Client  │ ─── HTTPS ────────│ Server  │                     │
│  │         │  Server cert ✓    │         │                     │
│  └─────────┘                   └─────────┘                     │
│                                                                 │
│  mTLS: Both prove identity                                     │
│  ┌─────────┐                   ┌─────────┐                     │
│  │ Service │ ←── Server cert ──│ Service │                     │
│  │    A    │ ─── Client cert ──│    B    │                     │
│  └─────────┘    Both verified  └─────────┘                     │
│                                                                 │
│  USE CASE: Microservices in zero-trust network                 │
│  • Each service has its own certificate                        │
│  • Only services with valid certs can communicate              │
│  • Handled by service mesh (Istio, Linkerd)                   │
│                                                                 │
│  CERTIFICATE MANAGEMENT:                                       │
│  • Short-lived certs (hours, not years)                        │
│  • Automatic rotation                                          │
│  • Service mesh handles transparently                          │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
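When a service mesh is not handling this for you, the server side of mTLS comes down to one setting: require and verify a client certificate. The sketch below uses Python's `ssl` module; the function name is hypothetical, and the file paths are optional parameters so the context can be built without real certificates on disk.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The difference from regular TLS: the handshake fails unless the
    # client presents a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        # This service's own identity (what regular TLS already needs)
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The internal CA that signs the certificates of peer services
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


context = make_mtls_server_context()
```

In a real deployment all three paths would be supplied, and with short-lived certificates the context has to be rebuilt or reloaded whenever the files rotate.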

Secrets Management

# DON'T: Hardcode secrets
API_KEY = "sk_live_abc123"  # Never do this

# DON'T: Environment variables for production
import os
API_KEY = os.environ["API_KEY"]  # OK for dev, not production

# DO: Use a secrets manager (here: AWS Secrets Manager via boto3)
import time

import boto3

class SecretsManager:
    """
    Centralized secrets management with caching
    """
    
    def __init__(self):
        self.cache = {}
        self.cache_ttl = 300  # 5 minutes
    
    def get_secret(self, secret_name: str) -> str:
        """
        Retrieve secret from AWS Secrets Manager
        with local caching
        """
        cached = self.cache.get(secret_name)
        if cached and cached['expires'] > time.time():
            return cached['value']
        
        # Fetch from secrets manager
        client = boto3.client('secretsmanager')
        response = client.get_secret_value(SecretId=secret_name)
        
        value = response['SecretString']
        
        # Cache locally
        self.cache[secret_name] = {
            'value': value,
            'expires': time.time() + self.cache_ttl
        }
        
        return value

# Best practices:
# 1. Never log secrets
# 2. Rotate regularly
# 3. Least privilege access
# 4. Audit access to secrets
# 5. Use managed solutions (AWS Secrets Manager, Vault, etc.)

Senior Interview Questions

Key considerations:
  1. Isolation: Tenant data must be completely isolated
  2. SSO: Enterprise customers want their own IdP
  3. Roles: Per-tenant roles (admin of tenant A ≠ admin of tenant B)
Architecture:
JWT Payload:
{
  "sub": "user_123",
  "tenant_id": "tenant_abc",
  "roles": ["admin"],  // Roles within this tenant
  "org_ids": ["org_1", "org_2"]  // Sub-organizations
}
Implementation:
  • Every DB query includes tenant_id
  • Middleware validates tenant_id matches token
  • Support SAML/OIDC for enterprise SSO
  • Separate encryption keys per tenant (optional)
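The first two implementation points can be sketched as a pair of small functions: one that rejects a request whose target tenant differs from the token's `tenant_id`, and one data-access helper that cannot be called without a tenant. The names and the in-memory `INVOICES` list are hypothetical; in a real service the filter would be a `WHERE tenant_id = ...` clause.

```python
from typing import Dict, List


class TenantMismatch(Exception):
    """The request targets a tenant other than the one in the token."""


def enforce_tenant(claims: Dict, requested_tenant_id: str) -> str:
    """Middleware-style check: the URL tenant must match the token tenant."""
    token_tenant = claims.get("tenant_id")
    if not token_tenant or token_tenant != requested_tenant_id:
        raise TenantMismatch(f"token is not valid for {requested_tenant_id!r}")
    return token_tenant


# Hypothetical stand-in for a database table shared by all tenants
INVOICES: List[Dict] = [
    {"id": "inv_1", "tenant_id": "tenant_abc", "amount": 100},
    {"id": "inv_2", "tenant_id": "tenant_xyz", "amount": 250},
]


def list_invoices(tenant_id: str) -> List[Dict]:
    """Every read is scoped by tenant_id; there is no unscoped variant."""
    return [row for row in INVOICES if row["tenant_id"] == tenant_id]


claims = {"sub": "user_123", "tenant_id": "tenant_abc", "roles": ["admin"]}
tenant = enforce_tenant(claims, "tenant_abc")
visible = list_invoices(tenant)
```

Note that the `admin` role in the claims never widens the result: roles are evaluated inside the tenant, after the tenant boundary has been enforced.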
Strategy: Dual-read, Single-write
  1. Add new secret (old still works)
  2. Deploy services to accept both old and new
  3. Switch write to new secret
  4. Verify all traffic uses new secret
  5. Revoke old secret
Example for API keys:
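import hmac

def validate_api_key(key: str) -> bool:
    # Accept both the current and the previous key during the rotation window
    current_key = secrets.get("api_key_current")
    previous_key = secrets.get("api_key_previous")
    # Constant-time comparison avoids leaking key contents through timing
    return any(
        hmac.compare_digest(key.encode(), candidate.encode())
        for candidate in (current_key, previous_key)
        if candidate
    )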
Layers of security:
  1. Network: VPC/private network (services not internet-accessible)
  2. Transport: mTLS (mutual authentication)
  3. Application: JWT tokens with service identity
  4. Authorization: Service-level permissions
Service mesh approach (Istio):
  • Automatic mTLS between all services
  • Policy-based access control
  • No code changes needed
  • Traffic encryption by default
Multi-layer approach:
  1. Per-IP: Prevent anonymous abuse
  2. Per-User: Fair usage for authenticated users
  3. Per-API-Key: For API consumers
  4. Global: Protect overall system
Fairness strategies:
  • Token bucket: Allows bursts, smooths traffic
  • Sliding window: More accurate than fixed windows
  • Weighted limits: Premium tiers get more
  • Priority queues: Critical requests bypass limits
Abuse prevention:
  • Fingerprinting (detect rotating IPs)
  • Behavioral analysis (bot detection)
  • CAPTCHA after threshold
  • Progressive delays (slow down, don’t block)
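Of the fairness strategies listed, the token bucket is the easiest to show in a few lines. This is a single-process, in-memory sketch with hypothetical names; a distributed limiter would keep the bucket state in a shared store such as Redis. The clock is injectable so the behavior can be demonstrated without sleeping.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, then a steady `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full so early bursts pass
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, capped at capacity
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Demonstration with a fake clock: burst of 2, then 1 request per second
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_now[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_now[0] = 1.0
after_one_second = [bucket.allow(), bucket.allow()]
```

Weighted limits fit the same structure: a premium tier gets a larger `capacity` or a higher `refill_per_second`, and expensive endpoints can charge a higher `cost`.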

Interview Questions

Strong answer:
  • Authentication (AuthN) verifies identity — “prove you are who you claim to be.” Authorization (AuthZ) verifies permissions — “are you allowed to do this specific action?” They are fundamentally different concerns that must be handled by different subsystems. AuthN happens first and produces an identity; AuthZ takes that identity and evaluates it against a policy to produce an allow/deny decision.
  • The danger of conflating them shows up in a very specific and common bug: checking that a user is logged in but not checking that they own the resource they are accessing. For example, user A is authenticated and requests GET /orders/456, but order 456 belongs to user B. If your middleware only checks “is there a valid session?” (AuthN) but doesn’t check “does this user own order 456?” (AuthZ), you have an Insecure Direct Object Reference (IDOR) vulnerability — one of the most common security bugs in web applications and consistently in the OWASP Top 10.
  • In practice, AuthN is typically handled at the gateway or middleware layer (validate JWT, check session, verify API key), while AuthZ must often happen at the application layer because it requires business logic. You can verify a JWT signature without knowing anything about your domain. But checking “can this user approve this expense report?” requires knowing the user’s role, the expense amount, the department’s approval hierarchy — that is application-level authorization.
  • The clean separation also matters for architecture: AuthN can be centralized (one auth service for all microservices — e.g., Auth0, Okta, or a custom OAuth server), but AuthZ is often distributed because each service owns its own permission model. Some teams centralize AuthZ too using a policy engine like OPA (Open Policy Agent) or AWS Cedar, which decouples policy from code.
  • Example: A SaaS platform had a middleware that checked if (req.user) next() — meaning any authenticated user could access any endpoint. An attacker with a free-tier account discovered they could call admin APIs like DELETE /api/tenants/other-tenant-id because the system verified they were logged in but never checked their role. AuthN was present; AuthZ was completely absent.
Red flag answer: “Authentication is the login page and authorization is checking if the user is an admin.” This is shallow: it misses IDOR vulnerabilities, the architectural separation between gateway-level AuthN and application-level AuthZ, and the fact that authorization goes far beyond simple role checks.
Follow-ups:
  1. How would you implement resource-level authorization (e.g., “user A can edit document X but not document Y”) in a microservices architecture where the documents service is separate from the auth service?
  2. Compare RBAC (Role-Based Access Control) versus ABAC (Attribute-Based Access Control) — when would you choose one over the other?
Strong answer:
  • A JWT is stateless by design — the server validates it using a cryptographic signature without looking anything up in a database. This is its primary advantage (no session store, no DB call per request, scales horizontally) and also its primary limitation. Once issued, a JWT is valid until it expires. There is no built-in mechanism to invalidate it because no server-side state tracks which tokens are active.
  • If a user’s account is compromised, if they change their password, or if an admin revokes their access, any previously issued JWTs continue to work until expiry. This is a real security gap. The standard mitigations are:
    • Short expiration times (15 minutes or less for access tokens). This limits the damage window. Combined with refresh tokens stored server-side, you get the scalability of JWTs with the revocability of sessions. When the user’s access is revoked, the refresh token is deleted from Redis/DB, and the access token expires naturally within 15 minutes.
    • Token blocklist/denylist — maintain a set of revoked JWT IDs (jti claim) in Redis. On every request, check if the token’s jti is in the blocklist. This technically works but defeats the stateless advantage of JWTs — you are now doing a Redis lookup per request, which is essentially a session store with extra steps.
    • Short-lived tokens + event-driven invalidation — when a security event occurs (password change, role change), publish an event that causes all services to reject tokens issued before that timestamp for that user. Each service maintains a small in-memory map of user_id -> minimum_iat (issued-at time).
  • The pragmatic answer most senior engineers give: use short-lived JWTs (15 min) as access tokens, long-lived opaque refresh tokens stored in a database with revocation capability, and accept the 15-minute window as an acceptable security trade-off for the scalability benefit. For high-security operations (changing password, transferring money), require re-authentication regardless of token validity.
  • Example: A company discovered an employee was terminated but their JWT was still valid for 24 hours (they had set a long expiry). During that window, the ex-employee exfiltrated customer data. The incident led to reducing JWT TTL from 24 hours to 15 minutes and adding a Redis blocklist for immediate revocation of critical accounts, checked on every API call to sensitive endpoints.
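The short-lived token plus minimum_iat idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not a production JWT implementation: the shared `SECRET`, the function names, and the in-memory `min_iat_by_user` map are all hypothetical, and a real service would use a vetted JWT library and receive the security events over a message bus.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret-not-for-production"  # assumption: shared HS256 key
ACCESS_TTL_SECONDS = 15 * 60                # the 15-minute window from the text

# user_id -> earliest issued-at time still accepted for that user
min_iat_by_user = {}


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str) -> str:
    return _b64(hmac.new(SECRET, signing_input.encode("ascii"), hashlib.sha256).digest())


def issue_token(user_id: str, now: int) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"sub": user_id, "iat": now, "exp": now + ACCESS_TTL_SECONDS}
    payload = _b64(json.dumps(claims).encode())
    return f"{header}.{payload}.{_sign(header + '.' + payload)}"


def verify_token(token: str, now: int) -> Optional[dict]:
    """Return the claims if the token is valid right now, else None."""
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, signature = parts
    if not hmac.compare_digest(signature, _sign(header + "." + payload)):
        return None  # forged or tampered
    claims = json.loads(_unb64(payload))
    if now >= claims["exp"]:
        return None  # expired naturally
    if claims["iat"] < min_iat_by_user.get(claims["sub"], 0):
        return None  # issued before the user's last security event
    return claims


def on_security_event(user_id: str, now: int) -> None:
    """Call on password change, role change, or admin revocation."""
    min_iat_by_user[user_id] = now
```

The map only needs to retain an entry for `ACCESS_TTL_SECONDS` after the event, since any older token has expired by then, which is what keeps it small.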
Red flag answer: “Just delete the JWT from the client.” The client can store the token elsewhere or an attacker already has a copy. Deleting from the client is logout UX, not security revocation.
Follow-ups:
  1. How does refresh token rotation work and why does it help detect token theft?
  2. If you implement a JWT blocklist in Redis, how do you handle the case where Redis goes down — do you reject all requests (secure but unavailable) or accept all tokens (available but insecure)?
Strong answer:
  • The Authorization Code flow has 4 parties: the user (resource owner), the client application (your app), the authorization server (Google, GitHub, etc.), and the resource server (the API you want to access). The flow works in two phases: a front-channel redirect to get an authorization code, then a back-channel server-to-server exchange to get tokens.
  • Step by step: (1) Your app redirects the user’s browser to the auth server with client_id, redirect_uri, scope, state, and response_type=code. (2) The user authenticates with the auth server and consents to the requested scopes. (3) The auth server redirects back to your redirect_uri with a short-lived authorization code in the URL query string. (4) Your server (not the browser) sends the authorization code + client_secret + client_id directly to the auth server’s token endpoint. (5) The auth server returns an access token (and optionally a refresh token and ID token if using OpenID Connect).
  • The authorization code is exchanged server-to-server because the access token must never be exposed to the browser. The authorization code itself appears briefly in the URL, but it is single-use, short-lived (RFC 6749 recommends a maximum lifetime of 10 minutes, and many providers use 30-60 seconds), and useless without the client_secret. The client_secret is stored on your server and never sent to the browser. This two-step process ensures that even if someone intercepts the redirect URL (via browser history, referrer headers, or malicious browser extensions), they cannot exchange the code for a token without the secret.
  • The state parameter is critical and often overlooked — it prevents CSRF attacks. Your server generates a random state value, stores it in the user’s session, and includes it in the authorization URL. When the redirect comes back, your server verifies the state matches. Without this, an attacker could craft a URL that associates their own authorization code with your user’s session.
  • Example: Without the server-to-server exchange, you would use the Implicit flow (omitted from OAuth 2.1 and discouraged by current OAuth security guidance), where the access token comes back directly in the URL fragment. This was used for SPAs before PKCE existed. The problem: although the fragment is not sent to the server in the HTTP request, the token ends up in browser history and is readable by every script running on the page, including third-party analytics scripts. Multiple high-profile token leaks happened this way. The modern recommendation for SPAs is Authorization Code with PKCE (Proof Key for Code Exchange), which adds a code verifier/challenge to make the code exchange safe even for public clients.
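The state check and the PKCE challenge are small enough to sketch directly. The function names, the dict-based `session`, and the endpoint URLs used with them are assumptions for illustration; the parameter names (`state`, `code_challenge`, `code_challenge_method`, `code_verifier`) and the S256 transform come from RFC 6749 and RFC 7636.

```python
import base64
import hashlib
import hmac
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def pkce_challenge(verifier: str) -> str:
    """S256 transform from RFC 7636: BASE64URL(SHA256(verifier)), no padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def build_authorization_request(session: dict, authorize_endpoint: str,
                                client_id: str, redirect_uri: str, scope: str) -> str:
    """Step 1: build the front-channel redirect URL and remember state + verifier."""
    state = secrets.token_urlsafe(32)      # unguessable, bound to this session
    verifier = secrets.token_urlsafe(64)   # 86 chars, within the 43-128 allowed
    session["oauth_state"] = state
    session["pkce_verifier"] = verifier
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,
        "code_challenge": pkce_challenge(verifier),
        "code_challenge_method": "S256",
    })
    return f"{authorize_endpoint}?{query}"


def handle_callback(session: dict, callback_url: str) -> dict:
    """Step 3-4: verify state, then return the form fields for the token request."""
    params = parse_qs(urlparse(callback_url).query)
    returned_state = params.get("state", [""])[0]
    expected_state = session.pop("oauth_state", None)  # single use
    if expected_state is None or not hmac.compare_digest(returned_state, expected_state):
        raise ValueError("state mismatch: possible CSRF")
    code = params.get("code", [""])[0]
    if not code:
        raise ValueError("missing authorization code")
    # The caller POSTs these fields to the token endpoint over the back channel,
    # adding client_id, redirect_uri, and (for confidential clients) client_secret.
    return {
        "grant_type": "authorization_code",
        "code": code,
        "code_verifier": session.pop("pkce_verifier"),
    }
```

Popping the stored state makes each authorization attempt single-use, so replaying the same callback URL against the same session fails.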
Red flag answer: “OAuth is when you click ‘Login with Google’ and it gives you a token.” This skips the entire authorization code exchange, shows no understanding of why it is a two-step process, and ignores CSRF protection via the state parameter.
Follow-ups:
  1. What is PKCE and why is it now recommended for all OAuth clients, not just mobile apps?
  2. What happens if an attacker registers a malicious redirect URI? How do authorization servers prevent open redirect attacks?
Strong answer:
  • The core challenge is applying different limits to different user classes while keeping the system fair, responsive, and resistant to abuse. A naive approach of just checking counts per API key breaks down quickly because you need per-tier limits, per-endpoint limits, burst handling, and distributed counting across multiple API server instances.
  • Algorithm choice matters. A sliding window log (store each request timestamp in a sorted set in Redis, count entries within the window) gives precise counts but is memory-heavy at scale. A token bucket is better for tiered rate limiting because it naturally supports burst allowances — a premium user might have a bucket of 100 tokens that refills at 10/second, while a free user has 10 tokens refilling at 1/second. The premium user can burst to 100 requests instantly, but sustains only 10/second. This “burst then sustain” behavior matches real API usage patterns.
  • Implementation architecture: Use Redis with atomic Lua scripts for distributed counting. Each API key gets a Redis key like ratelimit:v1:{api_key}:{endpoint} with a TTL matching the window. The Lua script atomically checks the count, increments if allowed, and returns the result, all in one round trip. Always return the conventional rate-limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so clients can self-regulate, and return 429 Too Many Requests with a Retry-After header when the limit is exceeded.
  • Multi-layer limiting is essential: (1) Per-IP rate limit at the edge/CDN to stop anonymous abuse before it hits your servers. (2) Per-API-key limit based on tier (free: 100/min, pro: 1000/min, enterprise: 10000/min). (3) Per-endpoint limits — a /search endpoint might allow 60/min while /export allows 5/min regardless of tier. (4) A global limit to protect the system from cascading overload. Each layer catches different attack vectors.
  • Example: Stripe uses tiered rate limiting with a base of 100 requests/second in live mode and 25 requests/second in test mode. They apply limits per API key and return headers on every response showing the remaining budget. When a merchant exceeds the limit, Stripe returns 429 with a clear Retry-After header. For enterprise customers, they negotiate custom limits and implement them as configuration — no code change needed.
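The “burst then sustain” behavior of a tiered token bucket can be sketched in a single process. This is a minimal in-memory illustration using the free and premium numbers from the answer above; the class and field names are hypothetical, and a distributed deployment would keep the same two values (token count and last-update time) in Redis and update them inside one Lua script.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tier:
    capacity: float           # maximum burst size, in tokens
    refill_per_second: float  # sustained rate


# Tier numbers taken from the example in the text.
TIERS = {
    "free": Tier(capacity=10, refill_per_second=1),
    "premium": Tier(capacity=100, refill_per_second=10),
}


class TokenBucket:
    def __init__(self, tier: Tier, now: float) -> None:
        self.tier = tier
        self.tokens = tier.capacity  # start full so new clients can burst
        self.updated_at = now

    def allow(self, now: float, cost: float = 1.0) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        # Refill lazily based on elapsed time; no background timer needed.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.tier.capacity,
                          self.tokens + elapsed * self.tier.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Time until enough tokens accumulate; suitable for a Retry-After header.
        retry_after = (cost - self.tokens) / self.tier.refill_per_second
        return False, retry_after
```

Passing `now` explicitly instead of reading the clock inside the class keeps the bucket deterministic under test; a caller would normally pass `time.monotonic()`.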
Red flag answer: “Check a counter in the database and reject if it’s over the limit.” This ignores race conditions under concurrent requests, doesn’t address distributed counting across multiple servers, misses burst handling, and has no tier differentiation.
Follow-ups:
  1. How would you handle the case where a user distributes requests across multiple API keys to evade rate limits? What signals would you use to detect this?
  2. If Redis goes down and your rate limiter is unavailable, do you fail open (allow all traffic) or fail closed (reject all traffic)? What are the trade-offs?
Strong answer:
  • RBAC (Role-Based Access Control) assigns permissions to roles, and roles to users. A user’s access is determined entirely by which roles they hold. It is a lookup: user -> roles -> permissions. It is simple, well-understood, and sufficient for the majority of use cases. The permission model is static: you define roles like viewer, editor, admin with fixed permission sets, and adding a new user just means assigning them the right role.
  • ABAC (Attribute-Based Access Control) evaluates a policy against attributes of the subject (user), resource (what they are accessing), action (what they want to do), and environment (time of day, IP address, device type). A policy might say: “Allow if user.department == resource.department AND user.clearance_level >= resource.classification AND time is within business hours.” This is far more expressive than RBAC but also more complex to implement, debug, and audit.
  • Choose RBAC when: Your permission model is relatively simple and stable, you have well-defined organizational roles, and you need easy auditability (“who has admin access?” is a simple query). Most B2B SaaS products start with RBAC and it serves them fine for years. It is also much easier for non-technical stakeholders to understand — “editors can edit, viewers can view” is intuitive.
  • Choose ABAC when: Your authorization rules depend on relationships between the user and the resource, or on contextual factors that change per request. Examples: a doctor can only view records of patients assigned to them (relationship), a financial transaction over $10,000 requires manager approval (attribute threshold), API access is restricted to office IP ranges during business hours (environmental context). Healthcare (HIPAA) and finance (SOX) often need ABAC because their compliance rules are inherently attribute-based.
  • Example: A healthcare platform started with RBAC (doctor, nurse, admin) but discovered it couldn’t express “a doctor can only see their own patients’ records.” They would have needed a unique role per doctor-patient pair, which is unmanageable. They moved authorization for record access to ABAC using OPA (Open Policy Agent) with policies like: allow if input.user.role == "doctor" AND input.resource.patient_id IN input.user.assigned_patients. Role-based access for admin functions stayed in RBAC.
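The difference between the two models is easiest to see side by side. The sketch below is plain Python rather than OPA/Rego, and the role names, permission strings, and dataclasses are hypothetical; it mirrors the hybrid from the example, with RBAC for admin functions and an attribute check for patient records.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

# RBAC: a static lookup, user -> roles -> permissions.
ROLE_PERMISSIONS = {
    "viewer": {"document:read"},
    "editor": {"document:read", "document:edit"},
    "admin": {"document:read", "document:edit", "user:manage"},
}


def rbac_allows(roles: Iterable[str], permission: str) -> bool:
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


# ABAC: a policy evaluated against attributes of the subject and the resource.
@dataclass(frozen=True)
class Subject:
    id: str
    role: str
    assigned_patients: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class PatientRecord:
    patient_id: str


def abac_allows_record_read(subject: Subject, record: PatientRecord) -> bool:
    # "allow if user.role == doctor AND resource.patient_id IN user.assigned_patients"
    return subject.role == "doctor" and record.patient_id in subject.assigned_patients
```

Expressing the second rule in pure RBAC would need one role per doctor-patient pair, which is the explosion the healthcare example ran into.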
Red flag answer: “RBAC uses roles and ABAC uses attributes.” This is a tautology that demonstrates zero understanding of when each model breaks down, how to combine them, or real-world authorization complexity.
Follow-ups:
  1. How would you implement ABAC in a microservices architecture? Would each service evaluate policies locally, or would you centralize policy evaluation?
  2. How do you audit and debug ABAC policies? If a user reports “I can’t access this document,” how do you trace which policy rule denied them?
Strong answer:
  • The golden rule is: secrets should never exist in code, config files, environment variables (in production), container images, or version control. They must live in a dedicated secrets management system (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault) and be fetched at runtime by the application. This gives you centralized access control, audit logging, and the ability to rotate secrets without redeploying.
  • Environment variables are acceptable for development and CI/CD but are a security risk in production. They can be read from /proc/PID/environ by anyone with sufficient access to the host, and they can surface in crash dumps, error-reporting tools that capture process or local-variable state, container inspection (docker inspect), and orchestrator dashboards. A common real-world leak vector is a developer accidentally logging os.environ or a monitoring tool capturing environment state during an exception.
  • Secret rotation without downtime requires a dual-read pattern: (1) Generate a new secret and store it alongside the old one. (2) Deploy application code that accepts both the old and new secrets (e.g., validate API keys against current_key OR previous_key). (3) Switch all writes to use the new secret. (4) Monitor for any traffic still using the old secret. (5) After a grace period, revoke the old secret. Vault’s dynamic secrets feature automates this — it generates short-lived credentials (e.g., a database password valid for 1 hour) and handles rotation automatically.
  • Access control for secrets follows least privilege: each service should only be able to read the specific secrets it needs, and access should be authenticated (Vault uses token-based or Kubernetes service account auth). Every secret read should be audit-logged. In regulated environments (PCI-DSS, SOC 2), you need to demonstrate who accessed which secrets and when.
  • Example: A startup stored their database password in a docker-compose.yml file committed to a public GitHub repo. Within 4 hours, cryptocurrency miners had compromised the database. GitHub’s secret scanning detected it and sent an alert, but the damage was done. After the incident, they moved to AWS Secrets Manager with IAM-based access control, enabled automatic rotation every 30 days, and added git-secrets as a pre-commit hook to prevent future leaks.
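Step (2) of the rotation sequence, accepting both the current and the previous key during the grace period, can be sketched as a small key ring. `ApiKeyRing` and its method names are hypothetical; in practice both values would be fetched from the secrets manager at runtime rather than held in code.

```python
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiKeyRing:
    current_key: str
    previous_key: Optional[str] = None

    def rotate(self, new_key: str) -> None:
        """Introduce a new key; the old one stays valid during the grace period."""
        self.previous_key = self.current_key
        self.current_key = new_key

    def retire_previous(self) -> None:
        """End of the grace period: the old key stops working."""
        self.previous_key = None

    def match(self, presented: str) -> Optional[str]:
        """Return which key matched ("current" or "previous"), or None."""
        candidate = presented.encode()
        # compare_digest avoids leaking key contents through timing differences.
        if hmac.compare_digest(candidate, self.current_key.encode()):
            return "current"
        if self.previous_key is not None and hmac.compare_digest(
            candidate, self.previous_key.encode()
        ):
            return "previous"
        return None
```

Returning which key matched, rather than a bare boolean, supports step (4): counting "previous" matches shows whether any caller still depends on the old secret before it is revoked.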
Red flag answer: “Use environment variables and make sure they’re not in the Dockerfile.” This ignores the many ways environment variables leak in production and shows no awareness of dedicated secrets management tools.
Follow-ups:
  1. How would you handle the bootstrap problem — your application needs a secret to authenticate with the secrets manager, so where does that initial credential come from?
  2. How do dynamic secrets (Vault’s approach of generating short-lived, unique credentials per service instance) improve security compared to static shared secrets?
Strong answer:
  • IDOR (Insecure Direct Object Reference) means your API exposes internal identifiers (like sequential database IDs) in URLs and doesn’t verify that the authenticated user is authorized to access the referenced object. The classic example: GET /api/invoices/1001 returns invoice 1001 for user A, but user B can also call GET /api/invoices/1001 and see user A’s invoice because the API only checks “is this user authenticated?” and not “does this invoice belong to this user?”
  • The fix has two layers — immediate and systemic. Immediately: add ownership checks at the data access layer. Every query for a user-scoped resource must include the user’s tenant/owner ID: SELECT * FROM invoices WHERE id = :invoice_id AND user_id = :current_user_id. This “scoped query” approach means even if an attacker guesses another ID, the query returns nothing because the ownership filter excludes it.
  • Systemically, you need multiple defenses: (1) Use UUIDs instead of sequential integer IDs in URLs. A random (version 4) UUID carries 122 random bits, so guessing one specific ID succeeds with probability 1 in 2^122, which makes enumeration attacks impractical. This is defense-in-depth, not the primary fix, because security through obscurity alone is not sufficient. (2) Centralize authorization checks in middleware or a base repository class so individual endpoints cannot accidentally skip them. A BaseRepository that always injects WHERE tenant_id = :current_tenant makes it structurally impossible to forget. (3) Automated testing: add integration tests that attempt cross-user resource access and assert 403/404 responses. Run these in CI on every PR. (4) API gateway-level resource scoping: for microservices, the gateway can inject the authenticated user’s tenant ID into request headers, and each service trusts only the gateway-injected header, not the client.
  • The deeper lesson: IDOR is not an edge case — it is one of the most exploited vulnerabilities in APIs according to the OWASP API Security Top 10. It happens because developers build CRUD endpoints that work correctly for the happy path (authenticated user fetching their own data) and forget the adversarial path (authenticated user fetching someone else’s data).
  • Example: In 2019, a major US financial services company was reported to have exposed approximately 885 million records through an IDOR-style flaw: changing the identifier in a document URL returned other customers’ files, including bank statements. The root cause was that the identifier in the URL was never checked against the identity of the requester. The fix for this class of bug is a check, in middleware or the data access layer, that compares the requested resource’s owner against the authenticated user on every request.
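The scoped query from the immediate fix can be demonstrated end to end with SQLite. The table layout, sample rows, and function names are assumptions for illustration; the SQL mirrors the query shown in the answer, with the ownership filter in the WHERE clause.

```python
import sqlite3
from typing import Optional, Tuple


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two invoices owned by different users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices ("
        "id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO invoices (id, user_id, amount_cents) VALUES (?, ?, ?)",
        [(1001, "user-a", 2500), (1002, "user-b", 9900)],
    )
    return conn


def get_invoice(conn: sqlite3.Connection, invoice_id: int,
                current_user_id: str) -> Optional[Tuple[int, str, int]]:
    """Fetch an invoice only if it belongs to the authenticated user."""
    # The ownership filter is part of the query, so a guessed ID that belongs
    # to someone else returns no row instead of another user's data.
    return conn.execute(
        "SELECT id, user_id, amount_cents FROM invoices "
        "WHERE id = :invoice_id AND user_id = :current_user_id",
        {"invoice_id": invoice_id, "current_user_id": current_user_id},
    ).fetchone()
```

Because a foreign ID and a nonexistent ID both produce `None`, the endpoint can answer both with the same 404 and avoid confirming which IDs exist.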
Red flag answer: “Use UUIDs instead of auto-increment IDs so users can’t guess them.” This treats the symptom (enumeration) rather than the disease (missing authorization checks). UUIDs reduce discoverability but a determined attacker can still find valid UUIDs through other vectors.
Follow-ups:
  1. How would you detect IDOR vulnerabilities proactively in a large codebase with hundreds of endpoints? What tooling or testing strategy would you use?
  2. Should a failed authorization check return 403 Forbidden or 404 Not Found? What are the security implications of each?
Strong answer:
  • The fundamental shift is moving from perimeter security (“everything inside the network is trusted”) to zero-trust (“verify every request, even internal ones”). Perimeter security fails because once an attacker breaches any single service, they can freely call every other internal service. In a zero-trust model, every service-to-service call must be authenticated and authorized.
  • Layer 1: Network isolation. Services run in a VPC (Virtual Private Cloud) with private subnets. No internal service is directly accessible from the internet. Security groups and network policies restrict which services can talk to which — the notification service should not be able to call the payment service directly.
  • Layer 2: mTLS (Mutual TLS). Both the client and server present certificates and verify each other’s identity. This provides encryption in transit and mutual authentication. In practice, nobody manages mTLS certificates manually — a service mesh (Istio, Linkerd, Consul Connect) handles certificate issuance, rotation, and verification automatically. Certificates are short-lived (hours, not years) and rotated transparently. The application code doesn’t even know mTLS is happening.
  • Layer 3: Application-level identity and authorization. Even with mTLS confirming “this request is from Service A,” you still need to answer “is Service A allowed to call this endpoint on Service B?” This is done with service-level JWTs or service accounts with scoped permissions. A policy engine like OPA can evaluate: “Service A (identity from mTLS) is allowed to call POST /payments (action) but not DELETE /payments (different action).”
  • Layer 4: Request-level context propagation. The end user’s identity must flow through the service chain so downstream services can make authorization decisions about the user, not just the calling service. This is typically done via a signed JWT in a header that each service validates and forwards. Without this, Service B knows the request came from Service A but doesn’t know which user initiated it.
  • Example: A fintech company initially relied on VPC-level security (“all internal traffic is trusted”). After a vulnerability in their logging service was exploited, the attacker pivoted laterally to the payment processing service and extracted transaction data. The company adopted Istio for automatic mTLS between all services, OPA for service-to-service authorization policies, and end-user identity propagation via signed JWTs. The next breach attempt (through a similar vector) was contained — the compromised service’s mTLS identity was not authorized to call payment APIs.
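Layer 3 reduces to a default-deny lookup keyed by the caller's verified identity. The sketch below is plain Python rather than an OPA policy, and the service names and `ALLOWED_CALLS` table are hypothetical; it assumes `caller_identity` has already been established by mTLS (for example, by the service mesh) and is never taken from a client-supplied header.

```python
# Allow-list of (HTTP method, path) pairs per calling service.
# Anything not listed is denied, including every call from unknown services.
ALLOWED_CALLS = {
    "service-a": frozenset({("POST", "/payments")}),
    "notification-service": frozenset({("POST", "/emails")}),
}


def is_call_allowed(caller_identity: str, method: str, path: str) -> bool:
    """Return True only if this caller is explicitly allowed this action."""
    allowed = ALLOWED_CALLS.get(caller_identity, frozenset())
    return (method.upper(), path) in allowed
```

With this shape, a compromised logging service has no entry at all, so its calls to payment endpoints are rejected even though it sits inside the same network.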
Red flag answer: “Use HTTPS and API keys between services.” HTTPS only authenticates the server, not the client. API keys for internal services are hard to rotate, easy to leak, and provide no fine-grained authorization.
Follow-ups:
  1. How does a service mesh like Istio handle certificate rotation for mTLS without causing connection drops or downtime?
  2. In a system with 50 microservices, how would you manage the service-to-service authorization policies? How do you avoid the policy matrix becoming unmanageable?