

Introduction

Content Delivery Networks (CDNs) and edge computing are critical for delivering fast, reliable experiences to users worldwide. Understanding these concepts is essential for designing systems at scale. Here is the core insight: no matter how fast your code runs, physics imposes a hard floor on latency. Light in a fiber optic cable travels at roughly 200,000 km/s, which means a round trip from New York to Singapore takes at minimum ~160ms — and that is before your server does any work. CDNs attack this problem by moving data and compute closer to users, turning 200ms responses into 20ms responses for the vast majority of requests.
Interview Context: Questions about CDNs often come up when designing content-heavy systems (Netflix, YouTube) or discussing latency optimization strategies.

CDN Fundamentals

How CDNs Work

┌─────────────────────────────────────────────────────────────┐
│                    CDN Architecture                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   User in Tokyo                     User in New York        │
│        │                                   │                 │
│        ▼                                   ▼                 │
│   ┌─────────┐                         ┌─────────┐           │
│   │  Edge   │                         │  Edge   │           │
│   │  PoP    │                         │  PoP    │           │
│   │ Tokyo   │                         │ New York│           │
│   └────┬────┘                         └────┬────┘           │
│        │                                   │                 │
│        │     Cache Miss?                   │                 │
│        │          │                        │                 │
│        └──────────┼────────────────────────┘                 │
│                   │                                          │
│                   ▼                                          │
│            ┌─────────────┐                                   │
│            │   Origin    │                                   │
│            │   Server    │                                   │
│            │ (US-West)   │                                   │
│            └─────────────┘                                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

PoP = Point of Presence (edge location)

CDN Request Flow

1. User requests: https://cdn.example.com/video.mp4

2. DNS Resolution:
   cdn.example.com → Anycast IP → Nearest PoP

3. Edge Check:
   ┌─────────────────────────────────────────────┐
   │ Edge Server                                 │
   ├─────────────────────────────────────────────┤
   │ if cache_hit:                               │
   │     return cached_content  (< 50ms)        │
   │ else:                                       │
   │     fetch_from_origin()    (200-500ms)     │
   │     cache_locally()                         │
   │     return content                          │
   └─────────────────────────────────────────────┘

4. Headers Returned:
   X-Cache: HIT (or MISS)
   X-Edge-Location: NRT52 (Tokyo)
   Cache-Control: max-age=86400
   Age: 3600 (seconds since cached)
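Together, max-age and Age determine how much longer the edge will serve the cached copy without revalidating: the object is fresh until Age exceeds max-age (the RFC 9111 freshness model). A minimal sketch of that calculation; the helper name is ours, not a library API:

```python
def remaining_ttl(max_age: int, age: int) -> int:
    """Seconds of freshness left for a cached response.

    max_age: value from Cache-Control: max-age
    age:     value from the Age header (seconds since cached)
    """
    return max(0, max_age - age)
```

With the headers above (max-age=86400, Age: 3600), the edge will keep serving this object for another 82,800 seconds before revalidating.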

Caching Strategies

Cache Control Headers

# Cache for 1 day, allow CDN caching
Cache-Control: public, max-age=86400

# Cache for 1 hour, revalidate after
Cache-Control: public, max-age=3600, must-revalidate

# Don't cache at all
Cache-Control: no-store, no-cache

# Private (browser only, not CDN)
Cache-Control: private, max-age=3600

# Stale-while-revalidate (serve stale while fetching fresh)
Cache-Control: max-age=3600, stale-while-revalidate=86400
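On the origin side these headers are typically composed per route. A small sketch of a header builder; the function and parameter names are illustrative, not a standard API:

```python
def cache_control(public: bool = True, max_age: int = 3600,
                  swr: int = 0, no_store: bool = False) -> str:
    """Compose a Cache-Control header value from a simple policy."""
    if no_store:
        # Opt out of caching entirely
        return "no-store, no-cache"
    parts = ["public" if public else "private", f"max-age={max_age}"]
    if swr:
        # Allow serving stale content while revalidating in background
        parts.append(f"stale-while-revalidate={swr}")
    return ", ".join(parts)
```

For example, `cache_control(max_age=86400)` yields the one-day policy shown above, and `cache_control(public=False)` yields the browser-only variant.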

Cache Key Design

from hashlib import sha256
from urllib.parse import urlparse, parse_qs
from typing import Dict, List, Optional

class CacheKeyGenerator:
    """
    Generate cache keys for CDN.
    
    Key considerations:
    - URL path
    - Query parameters (some, not all)
    - Headers (Accept-Encoding, Accept-Language)
    - Cookies (for personalization)
    - Device type
    """
    
    def __init__(
        self,
        include_query_params: Optional[List[str]] = None,
        exclude_query_params: Optional[List[str]] = None,
        vary_headers: Optional[List[str]] = None
    ):
        self.include_params = include_query_params or []
        self.exclude_params = exclude_query_params or [
            "utm_source", "utm_medium", "utm_campaign",  # Analytics
            "fbclid", "gclid",  # Tracking
            "_", "timestamp"   # Cache busters
        ]
        self.vary_headers = vary_headers or [
            "Accept-Encoding",
            "Accept-Language"
        ]
    
    def generate_key(
        self,
        url: str,
        headers: Optional[Dict[str, str]] = None,
        cookies: Optional[Dict[str, str]] = None
    ) -> str:
        """Generate a cache key for the request."""
        parsed = urlparse(url)
        query_params = parse_qs(parsed.query)
        
        # Filter query parameters
        filtered_params = self._filter_query_params(query_params)
        
        # Build key components
        components = [
            parsed.netloc,
            parsed.path,
            self._normalize_params(filtered_params)
        ]
        
        # Add vary header values (HTTP header names are case-insensitive,
        # so look them up via a lowercased copy)
        if headers:
            lower_headers = {k.lower(): v for k, v in headers.items()}
            for header in self.vary_headers:
                if header.lower() in lower_headers:
                    components.append(
                        f"{header}={lower_headers[header.lower()]}"
                    )
        
        # Add personalization segment (optional)
        if cookies and "segment" in cookies:
            components.append(f"segment={cookies['segment']}")
        
        # Generate hash
        key_string = ":".join(components)
        return sha256(key_string.encode()).hexdigest()[:32]
    
    def _filter_query_params(
        self, 
        params: Dict[str, List[str]]
    ) -> Dict[str, List[str]]:
        """Filter out excluded parameters."""
        if self.include_params:
            return {
                k: v for k, v in params.items() 
                if k in self.include_params
            }
        return {
            k: v for k, v in params.items() 
            if k not in self.exclude_params
        }
    
    def _normalize_params(
        self, 
        params: Dict[str, List[str]]
    ) -> str:
        """Sort and normalize parameters for consistent keys."""
        sorted_params = sorted(params.items())
        return "&".join(
            f"{k}={','.join(sorted(v))}" 
            for k, v in sorted_params
        )


class CacheTTLPolicy:
    """Determine TTL based on content type."""
    
    DEFAULT_TTL = 3600  # 1 hour
    
    TTL_BY_EXTENSION = {
        # Static assets - long TTL
        ".js": 31536000,    # 1 year
        ".css": 31536000,
        ".woff2": 31536000,
        ".woff": 31536000,
        
        # Images - medium TTL
        ".jpg": 86400,      # 1 day
        ".jpeg": 86400,
        ".png": 86400,
        ".gif": 86400,
        ".webp": 86400,
        
        # Video - long TTL
        ".mp4": 604800,     # 1 week
        ".webm": 604800,
        
        # HTML - short TTL
        ".html": 300,       # 5 minutes
        
        # API responses
        ".json": 60,        # 1 minute
    }
    
    @classmethod
    def get_ttl(cls, path: str, content_type: Optional[str] = None) -> int:
        """Get appropriate TTL for content."""
        # Check by extension
        for ext, ttl in cls.TTL_BY_EXTENSION.items():
            if path.endswith(ext):
                return ttl
        
        # Check by content type
        if content_type:
            if "image" in content_type:
                return 86400
            if "video" in content_type:
                return 604800
            if "javascript" in content_type or "css" in content_type:
                return 31536000
        
        return cls.DEFAULT_TTL
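The highest-leverage part of CacheKeyGenerator is query-parameter filtering: tracking parameters fragment the cache into many keys for the same content. A condensed, standalone sketch of that normalization idea (function name is ours):

```python
from urllib.parse import urlparse, parse_qsl, urlencode

# Parameters that vary per visit but never change the response
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "fbclid", "gclid", "_", "timestamp"}

def normalized_url(url: str) -> str:
    """Strip tracking params and sort the rest, so equivalent
    requests collapse onto the same cache key."""
    p = urlparse(url)
    q = sorted((k, v) for k, v in parse_qsl(p.query)
               if k not in TRACKING_PARAMS)
    return f"{p.netloc}{p.path}?{urlencode(q)}"
```

Two URLs that differ only in utm_* parameters, or only in parameter order, now produce identical keys — in practice this alone can raise hit ratios by several points.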

Cache Invalidation

┌─────────────────────────────────────────────────────────────┐
│               Cache Invalidation Strategies                  │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. TTL-Based (Time-to-Live)                                │
│     Cache-Control: max-age=3600                             │
│     • Simple, predictable                                   │
│     • May serve stale content                               │
│                                                              │
│  2. Purge (Immediate)                                       │
│     POST /purge/path/to/content                             │
│     • Instant invalidation                                   │
│     • Requires purge API access                              │
│                                                              │
│  3. Soft Purge (Stale-while-revalidate)                     │
│     Mark as stale, serve while fetching fresh               │
│     • Better availability                                    │
│     • Brief inconsistency                                    │
│                                                              │
│  4. Versioned URLs                                          │
│     /assets/app.abc123.js                                   │
│     • Never invalidate, use new URL                          │
│     • Best for static assets                                 │
│                                                              │
│  5. Tag-Based Invalidation                                  │
│     Surrogate-Key: product-123, category-shoes              │
│     Purge by tag: all pages with product-123                │
│     • Efficient bulk invalidation                            │
│     • Requires CDN support                                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘
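Strategy 4 (versioned URLs) is usually implemented at build time by embedding a content hash in the filename, so a changed asset gets a new URL and the old cache entry never needs purging. A sketch, with an illustrative helper name:

```python
from hashlib import sha256
from pathlib import PurePosixPath

def versioned_asset_url(path: str, content: bytes) -> str:
    """Embed a short content hash in the filename, e.g.
    /assets/app.js -> /assets/app.abc123.js"""
    digest = sha256(content).hexdigest()[:8]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the URL changes whenever the bytes change, these assets can be served with `Cache-Control: public, max-age=31536000, immutable` and never invalidated.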

Edge Computing

Edge computing represents a fundamental shift: instead of the edge being a “dumb cache” that only stores static files, it becomes a programmable layer that can run your code. This means decisions that used to require a round trip to your origin server — authentication checks, A/B test assignments, geographic routing, personalization — can now happen in 5ms at the edge instead of 200ms at the origin. The trade-off is that edge environments are constrained: limited CPU time (typically 10-50ms), limited memory, no persistent connections, and cold starts can add latency. Design your edge logic to be small, fast, and stateless.

Edge Functions/Workers

┌─────────────────────────────────────────────────────────────┐
│                    Edge Computing                            │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Traditional:                                               │
│  User ───► CDN (static only) ───► Origin Server            │
│                                                              │
│  Edge Computing:                                            │
│  User ───► Edge (compute + cache) ───► Origin (optional)   │
│                                                              │
│  Edge capabilities:                                         │
│  • Run JavaScript/WASM at edge                              │
│  • Modify requests/responses                                │
│  • Access edge KV storage                                    │
│  • Make subrequests                                          │
│  • A/B testing, personalization                             │
│  • Authentication/authorization                              │
│  • API routing                                               │
│                                                              │
└─────────────────────────────────────────────────────────────┘
// Cloudflare Worker example

addEventListener('fetch', event => {
    event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
    const url = new URL(request.url);
    
    // 1. A/B Testing
    const variant = getABVariant(request);
    
    // 2. Geographic routing
    const country = request.cf.country;
    const region = getRegion(country);
    
    // 3. Authentication at edge
    const authResult = await validateToken(request);
    if (!authResult.valid) {
        return new Response('Unauthorized', { status: 401 });
    }
    
    // 4. Rate limiting
    const rateLimited = await checkRateLimit(
        request.headers.get('CF-Connecting-IP')
    );
    if (rateLimited) {
        return new Response('Too Many Requests', { status: 429 });
    }
    
    // 5. Modify request before sending to origin
    const modifiedRequest = new Request(request, {
        headers: new Headers({
            ...Object.fromEntries(request.headers),
            'X-AB-Variant': variant,
            'X-User-Region': region,
            'X-User-ID': authResult.userId
        })
    });
    
    // 6. Fetch from origin (or cache)
    const response = await fetch(modifiedRequest);
    
    // 7. Modify response before returning
    const modifiedResponse = new Response(response.body, {
        status: response.status,
        headers: new Headers({
            ...Object.fromEntries(response.headers),
            'X-Edge-Location': request.cf.colo,
            'X-Cache-Status': response.headers.get('CF-Cache-Status')
        })
    });
    
    return modifiedResponse;
}

function getABVariant(request) {
    // Consistent hashing based on user cookie or IP
    const cookie = request.headers.get('Cookie') || '';
    const userId = extractUserId(cookie) || 
                   request.headers.get('CF-Connecting-IP');
    
    // Simple hash for variant selection
    const hash = simpleHash(userId);
    return hash % 2 === 0 ? 'A' : 'B';
}

function extractUserId(cookie) {
    // Pull a uid=... value out of the Cookie header, if present
    const match = cookie.match(/(?:^|;\s*)uid=([^;]+)/);
    return match ? match[1] : null;
}

function simpleHash(str) {
    // djb2-style string hash, kept non-negative
    let hash = 5381;
    for (let i = 0; i < str.length; i++) {
        hash = ((hash << 5) + hash + str.charCodeAt(i)) >>> 0;
    }
    return hash;
}

function getRegion(country) {
    const regions = {
        'US': 'na',
        'CA': 'na',
        'GB': 'eu',
        'DE': 'eu',
        'FR': 'eu',
        'JP': 'apac',
        'SG': 'apac',
        'AU': 'apac'
    };
    return regions[country] || 'default';
}

async function validateToken(request) {
    const authHeader = request.headers.get('Authorization');
    if (!authHeader?.startsWith('Bearer ')) {
        return { valid: false };
    }
    
    const token = authHeader.slice(7);
    
    // Check edge KV for token (or validate JWT)
    const userData = await TOKENS.get(token);
    if (!userData) {
        return { valid: false };
    }
    
    return { valid: true, userId: JSON.parse(userData).userId };
}

async function checkRateLimit(ip) {
    // Note: KV reads are eventually consistent, so this read-modify-write
    // limiter is approximate; use Durable Objects for strict limits
    const key = `ratelimit:${ip}`;
    const current = parseInt(await RATELIMITS.get(key) || '0');
    
    if (current >= 100) {  // 100 requests per minute
        return true;
    }
    
    await RATELIMITS.put(key, String(current + 1), {
        expirationTtl: 60
    });
    
    return false;
}

CDN Architecture Patterns

Multi-Tier Caching

┌─────────────────────────────────────────────────────────────┐
│                  Multi-Tier Cache Architecture               │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Tier 1: Edge PoPs (200+ locations)                        │
│   ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐            │
│   │Tokyo │ │NYC   │ │London│ │Sydney│ │ ...  │            │
│   │ 10ms │ │ 10ms │ │ 10ms │ │ 10ms │ │      │            │
│   └──┬───┘ └──┬───┘ └──┬───┘ └──┬───┘ └──────┘            │
│      │        │        │        │                          │
│   Tier 2: Regional Shields (5-10 locations)                │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│   │   US-East   │  │    EU       │  │    APAC     │       │
│   │    50ms     │  │    50ms     │  │    50ms     │       │
│   └──────┬──────┘  └──────┬──────┘  └──────┬──────┘       │
│          │                │                │               │
│   Tier 3: Origin Shield (1-2 locations)                    │
│   ┌─────────────────────────────────────────────┐          │
│   │              Origin Shield                   │          │
│   │               100ms                          │          │
│   └────────────────────┬────────────────────────┘          │
│                        │                                    │
│   Origin Servers                                           │
│   ┌─────────────────────────────────────────────┐          │
│   │         Origin (database, compute)           │          │
│   └─────────────────────────────────────────────┘          │
│                                                              │
│   Benefits:                                                 │
│   • Reduce origin load (cache hit at each tier)            │
│   • Faster cache fills (from nearest tier)                 │
│   • Better availability (tier isolation)                   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Pull vs Push CDN

Pull CDN (Lazy Loading):
┌─────────────────────────────────────────────────────────────┐
│ 1. User requests content                                    │
│ 2. CDN checks cache → MISS                                  │
│ 3. CDN fetches from origin                                  │
│ 4. CDN caches and returns                                   │
│ 5. Subsequent requests → HIT                                │
│                                                              │
│ Pros: Simple, automatic, no pre-warming needed              │
│ Cons: First request slow (cold cache)                       │
│ Best for: Websites, APIs, unpredictable access patterns     │
└─────────────────────────────────────────────────────────────┘

Push CDN (Pre-Loading):
┌─────────────────────────────────────────────────────────────┐
│ 1. Origin pushes content to CDN                             │
│ 2. CDN distributes to edge locations                        │
│ 3. Users always get cache HIT                               │
│                                                              │
│ Pros: No cold cache, predictable latency                    │
│ Cons: Manual management, storage costs                      │
│ Best for: Video streaming, software downloads               │
└─────────────────────────────────────────────────────────────┘

Video Streaming Architecture

Adaptive Bitrate Streaming

┌─────────────────────────────────────────────────────────────┐
│               Video Delivery Architecture                    │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   Source Video                                              │
│   ┌──────────────┐                                          │
│   │ master.mp4   │                                          │
│   │   4K/60fps   │                                          │
│   └──────┬───────┘                                          │
│          │                                                   │
│   Transcoding Pipeline                                      │
│   ┌──────▼───────┐                                          │
│   │  Transcoder  │                                          │
│   └──────┬───────┘                                          │
│          │                                                   │
│   ┌──────┴──────────────────────────────────────┐           │
│   │                                             │           │
│   ▼           ▼           ▼           ▼         ▼           │
│ 240p       480p       720p       1080p       4K           │
│ 400kbps   1Mbps      3Mbps      6Mbps      15Mbps        │
│                                                              │
│   Each quality → Segments (2-10 seconds each)               │
│                                                              │
│   Manifest (HLS/DASH)                                       │
│   ┌──────────────────────────────────────────┐              │
│   │ #EXTM3U                                  │              │
│   │ #EXT-X-STREAM-INF:BANDWIDTH=400000       │              │
│   │ 240p/playlist.m3u8                       │              │
│   │ #EXT-X-STREAM-INF:BANDWIDTH=1000000      │              │
│   │ 480p/playlist.m3u8                       │              │
│   │ ...                                       │              │
│   └──────────────────────────────────────────┘              │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Client Adaptive Logic:
┌─────────────────────────────────────────────────────────────┐
│ while playing:                                              │
│   bandwidth = measure_current_bandwidth()                   │
│   buffer_level = get_buffer_level()                         │
│                                                              │
│   if buffer_level < 5s:                                     │
│       switch_to_lower_quality()  # Prevent rebuffering     │
│   elif bandwidth > current_quality * 1.5:                   │
│       switch_to_higher_quality() # Improve experience       │
│                                                              │
│   fetch_next_segment(selected_quality)                      │
└─────────────────────────────────────────────────────────────┘
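The loop above can be made concrete. A runnable sketch of the selection rule, using the bitrate ladder from the diagram; the thresholds mirror the pseudocode and are illustrative, not from any spec:

```python
# Bitrate ladder in kbps, matching the diagram: 240p .. 4K
LADDER = [400, 1000, 3000, 6000, 15000]

def next_quality(current: int, bandwidth_kbps: float,
                 buffer_seconds: float) -> int:
    """Pick the bitrate (kbps) for the next segment fetch."""
    i = LADDER.index(current)
    if buffer_seconds < 5 and i > 0:
        return LADDER[i - 1]   # buffer low: step down to avoid rebuffering
    if bandwidth_kbps > current * 1.5 and i + 1 < len(LADDER):
        return LADDER[i + 1]   # bandwidth headroom: step up
    return current
```

Note the asymmetry: the player steps down aggressively on a low buffer (rebuffering is the worst user experience) but steps up only with 1.5x bandwidth headroom, to avoid oscillating between qualities.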

Performance Optimization

Edge Performance Techniques

1. Connection Optimization:
   • HTTP/2 or HTTP/3 (QUIC)
   • Connection pooling to origin
   • TLS session resumption
   • 0-RTT connections

2. Compression:
   • Brotli compression (better than gzip)
   • WebP/AVIF for images
   • Minification at edge

3. Request Collapsing:
   Multiple simultaneous requests for same resource
   → Single origin request
   → Response distributed to all waiters

4. Predictive Prefetching:
   • Analyze user behavior
   • Pre-warm cache for likely next requests
   • Hint resources via 103 Early Hints (HTTP/2 server push is now
     deprecated in major browsers)

5. Image Optimization:
   • On-the-fly resizing
   • Format conversion (WebP/AVIF)
   • Quality adjustment based on network
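Request collapsing (technique 3 above) is the edge-side fix for thundering herds: concurrent misses for the same key share one origin fetch. A minimal thread-based sketch of the idea, sometimes called "singleflight"; class and method names are ours:

```python
import threading

class RequestCollapser:
    """Collapse concurrent fetches of the same key into a single
    origin request; all waiters receive the shared result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> (done Event, result box)

    def fetch(self, key, origin_fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # First requester becomes the leader
                entry = (threading.Event(), [])
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        event, box = entry
        if leader:
            try:
                box.append(origin_fetch(key))   # one origin round trip
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()                     # wake all waiters
        else:
            event.wait()                        # wait for leader's result
        return box[0]
```

With 1,000 simultaneous misses for the same URL, only the leader's call reaches the origin; the other 999 block briefly and reuse its response.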

Latency Breakdown

Total Latency (without CDN):
┌──────────────────────────────────────────────────────────┐
│ DNS (50ms) + TCP (100ms) + TLS (50ms) + TTFB (200ms)    │
│                                                          │
│ Total: ~400ms minimum                                    │
└──────────────────────────────────────────────────────────┘

With CDN (cache hit):
┌──────────────────────────────────────────────────────────┐
│ DNS (5ms) + TCP (10ms) + TLS (5ms) + Edge (5ms)         │
│                                                          │
│ Total: ~25ms                                             │
└──────────────────────────────────────────────────────────┘

Optimization impact:
• Anycast DNS: 50ms → 5ms
• Edge proximity: 100ms TCP → 10ms
• TLS resumption: 50ms → 5ms
• Cache hit: 200ms → 5ms

Security at the Edge

DDoS Protection

┌─────────────────────────────────────────────────────────────┐
│               Edge DDoS Protection Layers                    │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Layer 3/4: Network Level                                   │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ • Anycast distribution (absorb volume)              │   │
│  │ • SYN flood protection                              │   │
│  │ • UDP amplification filtering                        │   │
│  │ • IP reputation blocking                             │   │
│  └─────────────────────────────────────────────────────┘   │
│                         │                                    │
│                         ▼                                    │
│  Layer 7: Application Level                                 │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ • Rate limiting per IP/path                         │   │
│  │ • Bot detection (CAPTCHA, JS challenge)             │   │
│  │ • WAF rules (SQL injection, XSS)                    │   │
│  │ • Behavioral analysis                                │   │
│  └─────────────────────────────────────────────────────┘   │
│                         │                                    │
│                         ▼                                    │
│  Origin Protection                                          │
│  ┌─────────────────────────────────────────────────────┐   │
│  │ • Origin IP hidden behind CDN                        │   │
│  │ • Authentication headers to origin                   │   │
│  │ • Allowlist only CDN IPs                             │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Cost Optimization

CDN Pricing Model

Cost Components:
┌────────────────────────────────────────────────────────────┐
│                                                            │
│  1. Bandwidth (egress from edge)                          │
│     $0.05 - $0.15 per GB (varies by region)               │
│     Committed use discounts available                      │
│                                                            │
│  2. Requests                                               │
│     $0.01 per 10,000 requests (HTTP)                      │
│     Higher for HTTPS                                       │
│                                                            │
│  3. Edge Compute                                           │
│     $0.50 per million invocations                          │
│     Plus duration charges                                  │
│                                                            │
│  4. Storage (origin shield, KV)                           │
│     $0.02 per GB/month                                     │
│                                                            │
└────────────────────────────────────────────────────────────┘

Optimization Strategies:
• Higher cache hit ratio = lower origin bandwidth
• Compress content = lower edge bandwidth
• Consolidate requests = lower request count
• Use tiered caching = fewer origin fetches

Interview Tips

Common CDN Interview Questions and Strong Answer Patterns:
  1. “How would you design a video streaming service?”
    • Start with the transcoding pipeline (source to multiple resolutions), then discuss HLS/DASH adaptive bitrate streaming, CDN distribution with multi-tier caching, and segment size trade-offs (smaller segments = faster quality switching but more requests; larger segments = fewer requests but slower adaptation).
  2. “How do you handle cache invalidation?”
    • Lead with versioned URLs for static assets (the cache never needs invalidation because the URL changes), then discuss purge APIs for dynamic content, and stale-while-revalidate as a pattern that preserves availability during revalidation. The key insight: “There are only two hard things in computer science — cache invalidation and naming things.”
  3. “What happens on a cache miss?”
    • Walk through the full request flow: edge checks local cache, misses, checks regional shield (if present), misses again, goes to origin. Mention the thundering herd problem (1,000 simultaneous requests for the same uncached resource all hit origin) and how request collapsing solves it (one request to origin, response fanned out to all waiters).
  4. “How do you optimize for global users?”
    • Discuss Anycast DNS for routing to nearest PoP, multi-tier caching to reduce origin load, HTTP/2 or QUIC for connection efficiency, and consider whether the use case needs read-your-writes consistency (which complicates caching) or can tolerate brief staleness.
  5. “How do you secure content at the edge?”
    • Signed URLs with expiration for access control, token-based authentication at edge workers, WAF rules for injection attacks, and DDoS protection via Anycast distribution that absorbs volumetric attacks across the entire edge network.

Practice Problem

Design a Global Image Delivery Service

Requirements:
  • Support on-the-fly resizing and format conversion
  • Sub-100ms latency globally
  • Cost-effective storage and delivery
Consider:
  1. How would you structure the URL scheme? (Hint: encode dimensions and format in the URL path, e.g., /images/abc123/400x300.webp, so each variant has a unique cacheable URL)
  2. Where would you do image transformations? (Trade-off: at the edge means faster first response but limited compute; at the origin means more powerful processing but higher latency on cache miss. The sweet spot is usually transform-on-first-request-then-cache at the edge, with an origin shield to prevent duplicate work)
  3. How would you handle cache invalidation? (Versioned source URLs mean transformed variants auto-invalidate when the source changes)
  4. What is your strategy for unpopular images? (The long tail problem: 80% of images are rarely accessed. Do not pre-generate all variants. Generate on demand, cache with a shorter TTL, and let the CDN evict cold entries naturally via LRU)
Scalability Analysis: At 1 billion images per day, you are looking at roughly 11,500 image requests per second on average, with peaks of 30,000-50,000 QPS. If each image averages 200KB, your daily egress is approximately 200TB. At $0.05/GB for CDN bandwidth, that is $10,000/day in delivery costs alone, which is why cache hit ratio is the single most important metric. Moving from 85% to 95% cache hit ratio cuts your origin bandwidth (and cost) by two-thirds.
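The numbers in the scalability analysis can be reproduced with a back-of-envelope helper (function name and prices are illustrative):

```python
def image_cdn_estimate(images_per_day: float, avg_kb: float,
                       price_per_gb: float, hit_ratio: float) -> dict:
    """Back-of-envelope traffic and cost figures for an image CDN."""
    qps = images_per_day / 86400                       # average requests/sec
    egress_gb_per_day = images_per_day * avg_kb / 1e6  # KB -> GB
    return {
        "avg_qps": round(qps),
        "egress_tb_per_day": round(egress_gb_per_day / 1000),
        "edge_cost_per_day": round(egress_gb_per_day * price_per_gb),
        # Only cache misses travel origin -> edge
        "origin_tb_per_day": round(egress_gb_per_day * (1 - hit_ratio) / 1000),
    }
```

Plugging in 1 billion images/day at 200KB and $0.05/GB gives ~11,574 average QPS, 200TB/day of egress, and $10,000/day in edge bandwidth; raising the hit ratio from 85% to 95% drops origin traffic from 30TB/day to 10TB/day.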

Interview Deep-Dive

Question: Design the CDN architecture for a global video streaming service with 200M subscribers.

Strong Answer: Let me start with the numbers, because they drive every architectural decision.

Traffic estimation: 200M subscribers; assume 30% are active daily = 60M DAU. Average session: 2 hours of video. At an average bitrate of 5 Mbps (a mix of mobile at 2 Mbps and 4K TVs at 15 Mbps), each user consumes 5 Mbps * 7,200 seconds = 4.5 GB per session. Total daily egress: 60M * 4.5 GB = 270 PB/day. Peak concurrent viewers (assume 10% of DAU): 6M * 5 Mbps = 30 Tbps.

CDN architecture:
  • Tier 1: Embedded caches inside ISPs (Open Connect Appliances, Netflix’s approach). Place physical servers inside the top 100 ISPs worldwide. Each appliance stores the top 5,000 most popular titles pre-loaded overnight during off-peak hours. This handles 90%+ of traffic without crossing the ISP’s peering boundary. This is the single biggest cost optimization — transit costs $0 because the data never leaves the ISP’s network.
  • Tier 2: Regional PoPs (20-50 locations). For the long-tail content that ISP caches do not have. These serve cache misses from Tier 1 and also handle content for smaller ISPs without embedded appliances.
  • Tier 3: Origin (2-3 locations). Stores the master copy of all content across all resolutions. Only serves ~1-2% of total traffic (cache misses from Tier 2).
Cost estimation: If you do NOT have ISP-embedded caches and rely entirely on a commercial CDN at $0.02/GB (volume pricing), 270 PB/day = $5.4M/day = $162M/month. This is why Netflix built Open Connect: their actual cost is estimated at $0.005/GB or less, saving over $120M/month compared to commercial CDN pricing.

Follow-up: A new season of your most popular show drops at midnight. 20M users start streaming simultaneously. Walk me through what happens at the CDN layer.

This is pre-planned. The content was transcoded into all resolutions/bitrates days in advance and pre-pushed to every ISP cache and regional PoP during the prior 48 hours. At midnight, 20M simultaneous requests hit Tier 1 ISP caches with a 95%+ cache hit ratio, because the content is already there. The remaining 5% (smaller ISPs, less popular resolutions) hit Tier 2, which was also pre-warmed. Origin sees almost zero traffic. The challenge is not bandwidth but connection concurrency: 20M TCP/QUIC connections establishing in a 60-second window. The adaptive bitrate algorithm starts everyone at 480p for the first 10 seconds, then ramps up quality based on measured bandwidth, preventing a thundering herd of 20M 4K requests hitting the caches simultaneously.
Strong Answer: $50,000/month at roughly $0.05/GB means ~1 PB/month of edge bandwidth. With an 85% cache hit ratio, the origin is serving 15% of requests = 150 TB/month to refill caches. The cost breakdown is roughly $42,500 for edge bandwidth (serving users) + $7,500 for origin-to-edge bandwidth (cache fills). The lever that matters most is cache hit ratio.

Step 1: Improve cache hit ratio from 85% to 95% (biggest impact):
  • Normalize URLs: UTM parameters, session tokens, and tracking IDs in the URL create unique cache keys for the same image. Strip all non-essential query parameters from the cache key. I have seen this alone improve hit ratio by 5-10%.
  • Increase TTLs: If product images change infrequently (most do not change after upload), set Cache-Control to 1 year with versioned filenames (product-123-v2.webp). Invalidation happens by changing the filename, not purging the cache.
  • Add an origin shield: Without it, a cache miss at the Tokyo PoP and a cache miss at the London PoP both independently fetch from origin. With an origin shield (a single intermediate cache), the second miss is served from the shield. This reduces origin fetches by 60-80% for globally distributed traffic.
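URL normalization, the first item above, is straightforward to sketch. A minimal version using only the standard library is shown below; the blocklist of tracking parameters is illustrative, and a real deployment would more likely use an allowlist of parameters that actually affect the response.

```python
# Sketch of cache-key normalization: strip tracking parameters and sort the
# rest, so the same image does not fragment into many cache entries.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING = {"utm_source", "utm_medium", "utm_campaign", "utm_term",
            "utm_content", "gclid", "fbclid", "sessionid"}

def cache_key(url: str) -> str:
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query)
            if k.lower() not in TRACKING]
    kept.sort()  # stable ordering: ?a=1&b=2 and ?b=2&a=1 share one key
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), ""))
```

Two URLs that differ only in tracking noise now map to one cache entry, which is exactly the hit-ratio improvement described above.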
Step 2: Reduce bytes per request (image optimization):
  • Serve WebP/AVIF instead of JPEG/PNG. WebP is 25-35% smaller than JPEG at equivalent quality. At 1 PB/month, a 30% reduction saves 300 TB = $15,000/month.
  • Responsive images: Serve 400px wide images to mobile, 1200px to desktop. Most e-commerce sites serve the same 2000px image to everyone. Serving appropriately sized images saves 40-60% bandwidth on mobile traffic (typically 60% of total).
  • Quality tuning: Most product images are saved at quality 90-95. Reducing to quality 80-85 is visually imperceptible but reduces file size by 20-30%.
Combined impact: 95% cache hit ratio + 30% smaller images could reduce the bill from $50,000 to ~$20,000-25,000/month — meeting the VP’s target.

Follow-up: After optimizing, you are at 96% cache hit ratio. The remaining 4% cache misses are from the “long tail” — millions of obscure product images each accessed once a month. How do you handle them?

The long tail is inherently uncacheable at the edge — there are too many unique URLs accessed too infrequently. Accept that these will always be cache misses. The optimization is on the origin side: serve long-tail images from S3 directly (cheap storage, high availability) with a thin Lambda@Edge function that does on-the-fly resizing and format conversion. Do not pre-generate resized variants for the long tail — generate on demand, serve, and let the CDN cache it with a short TTL (1 hour). If it is accessed again within that hour, great. If not, the CDN evicts it and the small storage cost is negligible. The key metric shifts from cache hit ratio (diminishing returns past 96%) to origin response time for cache misses (target under 500ms including image transformation).
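The "generate on demand, cache briefly" flow from the follow-up answer can be sketched as below. This is a stand-in sketch: the dict plays the role of the CDN edge cache, and `fetch_original` and `transform` are hypothetical callables representing the S3 read and the resize/format conversion.

```python
# Hypothetical sketch of on-demand long-tail image serving with a short TTL.
import time

CACHE: dict = {}      # stand-in for the CDN edge cache
SHORT_TTL = 3600      # 1 hour, per the answer above

def get_image(key, width, fetch_original, transform, now=time.time):
    entry = CACHE.get((key, width))
    if entry and entry["expires"] > now():
        return entry["body"]                      # served from edge cache
    original = fetch_original(key)                # e.g. S3 GET (cache miss)
    body = transform(original, width)             # e.g. resize + WebP encode
    CACHE[(key, width)] = {"body": body, "expires": now() + SHORT_TTL}
    return body
```

A second request inside the hour is a cache hit; after eviction the variant is simply regenerated, so no pre-generated variants need to be stored.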
Strong Answer: Cost estimation for Cloudflare Workers (the most common edge compute platform):
  • 500K requests/sec = 43.2 billion requests/day = ~1.3 trillion requests/month.
  • Cloudflare Workers pricing: $0.50 per million requests (paid plan). 1.3 trillion / 1M × $0.50 = $650,000/month. That is expensive. But the bundled plan gives 10M requests/month included with the $5/month plan, and enterprise pricing with volume commitments drops to roughly $0.15-0.30 per million at this scale. Realistic cost: $200,000-400,000/month.
  • CPU time: JWT validation takes ~0.5ms. At Cloudflare's duration charge of roughly $0.02 per million CPU-milliseconds, 1.3 trillion requests × 0.5ms ≈ 650 billion CPU-ms ≈ $13,000/month for CPU. Negligible compared to the per-request charge.
Comparison: doing this at origin instead: 500K req/sec at 5ms per auth check = 2,500 CPU-seconds per second, requiring roughly 50-80 c6g.2xlarge instances at ~$0.27/hr = $10,000-15,000/month. Significantly cheaper in raw compute, but you lose the latency benefit (adding 50-150ms round trip from edge to origin for every request).

Architectural risks:
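The edge-versus-origin comparison reduces to a few lines of arithmetic. The $0.20-per-million edge rate and the 65-instance count below are midpoints of the ranges quoted above, not vendor quotes:

```python
# Rough monthly cost comparison using this section's figures.
REQ_PER_MONTH = 500_000 * 86_400 * 30       # ~1.3 trillion requests/month

# Edge: assume a negotiated $0.20 per million requests (midpoint of the
# $0.15-0.30 enterprise estimate above).
edge_cost = REQ_PER_MONTH / 1e6 * 0.20

# Origin: ~65 c6g.2xlarge instances (midpoint of 50-80) at ~$0.27/hr.
origin_cost = 65 * 0.27 * 24 * 30

print(f"edge:   ~${edge_cost:,.0f}/month")
print(f"origin: ~${origin_cost:,.0f}/month")
```

The origin is roughly 20x cheaper in raw compute, which makes the trade-off explicit: you are paying the edge premium purely to remove the 50-150ms round trip.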
  1. Key rotation: JWT validation requires the public key. Edge functions need access to the JWKS (JSON Web Key Set). If you cache JWKS at the edge with a 5-minute TTL and rotate keys, there is a 5-minute window where requests with the new key fail validation at edges that still have the old JWKS cached. Solution: always validate against both current and previous keys during rotation.
  2. Cold starts: Cloudflare Workers have minimal cold starts (~0ms, V8 isolates), but Lambda@Edge has 5-50ms cold starts. At 500K req/sec the cold start rate is near zero (all instances stay warm), but during a traffic dip followed by a spike, cold starts can cause a latency bump.
  3. No database access: Edge functions cannot query your user database to check permissions. You must encode all authorization data in the JWT claims or use an edge KV store (which adds 10-50ms per lookup). Design your auth model to be self-contained in the token.
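The key-rotation fix from risk 1 (validate against both current and previous keys) can be sketched as follows. To keep the example self-contained with the standard library, this sketch uses HS256 rather than the RS256/JWKS setup a real Workers deployment would use; `sign` and `verify` are illustrative helpers, not a library API.

```python
# Sketch of dual-key JWT validation during rotation (HS256 for brevity).
import base64, hashlib, hmac, json

def _b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign(payload: dict, key: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url(json.dumps(payload).encode())
    sig = _b64url(hmac.new(key, header + b"." + body, hashlib.sha256).digest())
    return (header + b"." + body + b"." + sig).decode()

def verify(token: str, keys):
    """Try each key in order: current first, then the previous key."""
    signing_input, _, sig = token.rpartition(".")
    for key in keys:
        expected = _b64url(
            hmac.new(key, signing_input.encode(), hashlib.sha256).digest())
        if hmac.compare_digest(expected.decode(), sig):
            body = signing_input.split(".")[1]
            return json.loads(
                base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return None  # signed with no known key: reject
```

During rotation, edges call `verify(token, [new_key, old_key])`, so tokens minted before the rollout keep validating until the old key is retired.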
Follow-up: You discover that 30% of your requests are from bots that are wasting your edge compute budget. How do you handle this?

Layer bot detection before JWT validation. First, use the CDN’s built-in bot score (Cloudflare Bot Management, AWS WAF Bot Control) — this runs at the network layer before your edge function executes, so you do not pay compute costs for detected bots. Set a threshold: bot score above 90 gets a 403, score 50-90 gets a JS challenge (which bots fail), below 50 passes through to your edge function. This eliminates 30% of your edge compute costs = $60,000-120,000/month savings. The remaining sophisticated bots that pass the challenge are handled by rate limiting at the edge (token bucket per IP, per API key).
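The layered decision above can be sketched as a score router plus a token bucket. The thresholds mirror the answer; the bucket's rate and burst parameters are illustrative, and `route` is a hypothetical helper name:

```python
# Sketch: route by bot score before any edge compute runs, then rate-limit
# survivors with a per-IP / per-API-key token bucket.
import time

def route(bot_score: int) -> str:
    if bot_score > 90:
        return "block"        # 403, no edge compute spent
    if bot_score >= 50:
        return "challenge"    # JS challenge, which most bots fail
    return "pass"             # continue to JWT validation

class TokenBucket:
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst       # tokens/sec, max tokens
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per client key in an edge KV or in-memory map, and only requests that both pass the score check and have tokens remaining reach the JWT validation step.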