Problem Statement

Design a video streaming service like Netflix or YouTube that:
  • Allows users to upload and stream videos
  • Handles millions of concurrent viewers
  • Provides smooth playback globally
  • Supports different video qualities (adaptive streaming)
This is a hard interview problem. Focus on the video processing pipeline, CDN architecture, and adaptive streaming. Don’t try to cover everything; pick 2-3 areas to go deep. The key differentiator: video streaming is fundamentally a bandwidth and storage problem, not a compute problem. At Netflix scale, peak delivery is measured in tens of terabits per second (see the capacity estimate below), not gigabits. The entire system is designed around moving bytes efficiently from origin to screen with minimal buffering.

Step 1: Requirements

Functional Requirements

Core Features

  • Upload videos
  • Stream videos
  • Search videos
  • Recommendations

Playback Features

  • Adaptive bitrate streaming
  • Resume playback
  • Multiple device support
  • Subtitles/captions

Non-Functional Requirements

  • Low Latency: Video starts in < 2 seconds
  • High Availability: 99.99% uptime
  • Global Scale: Serve users worldwide
  • Storage: Handle petabytes of video content

Capacity Estimation

┌─────────────────────────────────────────────────────────────────┐
│                 Netflix/YouTube Scale                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Users (Netflix-like):                                          │
│  • 200 million subscribers                                     │
│  • 100 million DAU (50%)                                       │
│  • 10 million peak concurrent viewers                          │
│                                                                 │
│  Content:                                                       │
│  • 10,000 movies/shows (Netflix)                               │
│  • 500 million videos (YouTube)                                │
│  • Average video: 1 hour = 3 GB (1080p)                        │
│                                                                 │
│  Bandwidth (Peak):                                              │
│  • 10M viewers × 5 Mbps (1080p) = 50 Tbps                      │
│  • That's 50 terabits per second!                              │
│                                                                 │
│  Storage:                                                       │
│  • 10,000 videos × 3 GB × 5 qualities = 150 TB (Netflix)       │
│  • 500M videos × 100 MB average = 50 PB (YouTube)              │
│                                                                 │
│  Uploads (YouTube):                                             │
│  • 500 hours of video uploaded per minute                      │
│  • 500 × 60 = 30,000 hours/hour                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
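The estimates in the box are plain unit conversions, and it helps to be able to reproduce them on a whiteboard. A minimal sketch follows; the constants are copied from the box above, while the function and variable names are illustrative only.

```python
# Back-of-envelope capacity math using the figures from the estimate above.
PEAK_VIEWERS = 10_000_000        # peak concurrent viewers
BITRATE_MBPS = 5                 # one 1080p stream
NETFLIX_TITLES = 10_000
GB_PER_TITLE = 3                 # 1 hour at 1080p
RENDITIONS = 5                   # qualities stored per title
YOUTUBE_VIDEOS = 500_000_000
MB_PER_VIDEO = 100
UPLOAD_HOURS_PER_MINUTE = 500


def peak_bandwidth_tbps(viewers: int, mbps: float) -> float:
    """Aggregate egress in Tbps (1 Tbps = 1,000,000 Mbps)."""
    return viewers * mbps / 1_000_000


def catalog_storage_tb(titles: int, gb_each: float, renditions: int) -> float:
    """Storage for a curated catalog in TB (1 TB = 1,000 GB)."""
    return titles * gb_each * renditions / 1_000


def ugc_storage_pb(videos: int, mb_each: float) -> float:
    """Storage for user-generated content in PB (1 PB = 1,000,000,000 MB)."""
    return videos * mb_each / 1_000_000_000


def upload_hours_per_hour(hours_per_minute: int) -> int:
    """Hours of new video arriving every wall-clock hour."""
    return hours_per_minute * 60


if __name__ == "__main__":
    print(peak_bandwidth_tbps(PEAK_VIEWERS, BITRATE_MBPS), "Tbps peak")
    print(catalog_storage_tb(NETFLIX_TITLES, GB_PER_TITLE, RENDITIONS), "TB catalog")
    print(ugc_storage_pb(YOUTUBE_VIDEOS, MB_PER_VIDEO), "PB user uploads")
    print(upload_hours_per_hour(UPLOAD_HOURS_PER_MINUTE), "hours uploaded per hour")
```

Keeping the conversions in named functions makes it easy to rerun the estimate when the interviewer changes an assumption, for example a different average bitrate.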

Step 2: High-Level Design

Netflix Microservices Architecture

Step 3: Video Upload & Processing

Upload Flow

┌─────────────────────────────────────────────────────────────────┐
│                    Video Upload Pipeline                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Client                2. API                 3. Upload      │
│     │                        │                      │           │
│     │ Request upload URL     │                      │           │
│     ├───────────────────────►│                      │           │
│     │                        │                      │           │
│     │◄──Pre-signed S3 URL────│                      │           │
│     │                        │                      │           │
│     │                        │                      │           │
│  2. Direct upload to S3 (chunked, resumable)       │           │
│     │────────────────────────────────────────────►  │           │
│     │                        │                      │           │
│     │                        │          S3 Event    │           │
│     │                        │◄─────────────────────│           │
│     │                        │                      │           │
│     │                        │                      │           │
│  3. Trigger transcoding pipeline                               │
│     │                        │                                  │
│     │                   ┌────▼────┐                            │
│     │                   │  Queue  │                            │
│     │                   └────┬────┘                            │
│     │                        │                                  │
│     │                   ┌────▼────┐                            │
│     │                   │Transcode│                            │
│     │                   │ Workers │                            │
│     │                   └────┬────┘                            │
│     │                        │                                  │
│     │                   Multiple output formats:               │
│     │                   • 360p  (500 Kbps)                     │
│     │                   • 480p  (1 Mbps)                       │
│     │                   • 720p  (2.5 Mbps)                     │
│     │                   • 1080p (5 Mbps)                       │
│     │                   • 4K    (15 Mbps)                      │
│     │                                                          │
│  4. Store transcoded videos + metadata                         │
│     │                                                          │
│  5. Push to CDN edge locations                                 │
│     │                                                          │
│  6. Notify user: "Video ready!"                                │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Video Transcoding

from concurrent.futures import ThreadPoolExecutor


class TranscodingService:
    """
    Video transcoding pipeline (illustrative sketch)

    Key concepts:
    - Chunk-based processing for parallelism
    - Multiple output qualities
    - Generate thumbnails and preview

    The helper methods called below (split_into_chunks, transcode_chunk,
    merge_chunks, ...) stand in for encoder-specific code and are not shown.
    """

    QUALITIES = [
        {"name": "360p", "width": 640, "height": 360, "bitrate": "500k"},
        {"name": "480p", "width": 854, "height": 480, "bitrate": "1000k"},
        {"name": "720p", "width": 1280, "height": 720, "bitrate": "2500k"},
        {"name": "1080p", "width": 1920, "height": 1080, "bitrate": "5000k"},
        {"name": "4K", "width": 3840, "height": 2160, "bitrate": "15000k"},
    ]

    def process_video(self, video_id, source_path, max_workers=8):
        # 1. Split video into chunks (for parallel processing)
        chunks = self.split_into_chunks(source_path, chunk_duration=10)

        # A local thread pool stands in for the queue + worker fleet
        # shown in the upload pipeline.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for quality in self.QUALITIES:
                # 2. Transcode each chunk in parallel for this quality.
                #    pool.map returns results in chunk order.
                transcoded_chunks = list(pool.map(
                    lambda chunk, q=quality: self.transcode_chunk(chunk, q),
                    chunks,
                ))

                # 3. Merge chunks back together
                output_path = self.merge_chunks(transcoded_chunks, quality)

                # 4. Generate HLS/DASH segments
                self.generate_streaming_segments(output_path, video_id, quality)

        # 5. Generate thumbnails
        self.generate_thumbnails(source_path, video_id)

        # 6. Generate preview/trailer
        self.generate_preview(source_path, video_id)

        # 7. Update metadata and notify
        self.mark_video_ready(video_id)

Step 4: Video Streaming (Adaptive Bitrate)

How Adaptive Streaming Works

┌─────────────────────────────────────────────────────────────────┐
│                 Adaptive Bitrate Streaming                      │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  HLS (HTTP Live Streaming) / DASH                              │
│                                                                 │
│  Video is split into small segments (2-10 seconds each)        │
│  Each segment available in multiple qualities                   │
│                                                                 │
│  manifest.m3u8 (Master Playlist)                               │
│  ├── 360p/playlist.m3u8                                        │
│  │   ├── segment_001.ts                                        │
│  │   ├── segment_002.ts                                        │
│  │   └── ...                                                   │
│  ├── 720p/playlist.m3u8                                        │
│  │   ├── segment_001.ts                                        │
│  │   └── ...                                                   │
│  └── 1080p/playlist.m3u8                                       │
│      ├── segment_001.ts                                        │
│      └── ...                                                   │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    Player Logic                         │   │
│  │                                                         │   │
│  │  1. Download manifest                                   │   │
│  │  2. Start with lowest quality                          │   │
│  │  3. Measure download speed                             │   │
│  │  4. Adjust quality based on bandwidth:                 │   │
│  │                                                         │   │
│  │  Time  0s     5s     10s    15s    20s                 │   │
│  │        │      │      │      │      │                   │   │
│  │  360p  ████                                            │   │
│  │  720p       ████████████                               │   │
│  │  1080p                   ████████████                  │   │
│  │                                                         │   │
│  │  Bandwidth: 1Mbps → 3Mbps → 6Mbps                      │   │
│  │                                                         │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
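The player logic in the diagram reduces to one decision per segment: pick the highest rendition whose bitrate fits the measured throughput. A minimal sketch of that throughput-based rule follows. The bitrates are the ladder from the upload pipeline; the 0.85 safety factor and all names are assumptions for illustration, and production players also weigh buffer occupancy, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


# Bitrate ladder from the transcoding step.
LADDER = [
    Rendition("360p", 500),
    Rendition("480p", 1000),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
    Rendition("4K", 15000),
]


def choose_rendition(throughput_kbps: float,
                     ladder=LADDER,
                     safety_factor: float = 0.85) -> Rendition:
    """Pick the highest rendition that fits within a fraction of throughput.

    safety_factor is an assumed headroom so that small throughput dips
    do not immediately stall playback.
    """
    budget = throughput_kbps * safety_factor
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    best = ordered[0]  # always fall back to the lowest quality
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            best = rendition
    return best


if __name__ == "__main__":
    # Same bandwidth ramp as the diagram: 1 Mbps, then 3 Mbps, then 6 Mbps.
    for kbps in (1000, 3000, 6000):
        print(kbps, "kbps ->", choose_rendition(kbps).name)
```

With these assumptions the ramp in the diagram maps to 360p, then 720p, then 1080p. A lower safety factor makes the player more conservative and slower to step up.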

HLS Manifest Example

# Master playlist (manifest.m3u8)
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/playlist.m3u8

# Quality playlist (1080p/playlist.m3u8)
#EXTM3U
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment_000.ts
#EXTINF:10.0,
segment_001.ts
#EXTINF:10.0,
segment_002.ts
...
#EXT-X-ENDLIST
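Playlists like these are plain text, so the packaging step can emit them directly from the quality ladder. A minimal sketch, assuming the same directory layout as the example (one playlist.m3u8 per quality, segments named segment_NNN.ts); the function names are hypothetical. It appends #EXT-X-ENDLIST, which tells the player that a VOD playlist is complete.

```python
# (name, width, height, peak bandwidth in bits per second)
LADDER = [
    ("360p", 640, 360, 500_000),
    ("480p", 854, 480, 1_000_000),
    ("720p", 1280, 720, 2_500_000),
    ("1080p", 1920, 1080, 5_000_000),
]


def build_master_playlist(ladder) -> str:
    """Return the master playlist listing one variant stream per quality."""
    lines = ["#EXTM3U"]
    for name, width, height, bandwidth in ladder:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/playlist.m3u8")
    return "\n".join(lines) + "\n"


def build_media_playlist(segment_durations, target_duration: int = 10) -> str:
    """Return a VOD media playlist for one quality."""
    lines = [
        "#EXTM3U",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for index, duration in enumerate(segment_durations):
        lines.append(f"#EXTINF:{duration:.1f},")
        lines.append(f"segment_{index:03d}.ts")
    lines.append("#EXT-X-ENDLIST")  # marks the playlist as finished (VOD)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_master_playlist(LADDER))
    print(build_media_playlist([10.0, 10.0, 7.5]))
```

Generating the master playlist from the list of renditions that actually finished encoding is also how a partial quality ladder can be served: a rendition that failed is simply left out.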

Step 5: CDN Architecture

Multi-Tier CDN

┌─────────────────────────────────────────────────────────────────┐
│                    CDN Architecture                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                       ┌───────────────┐                        │
│                       │    Origin     │                        │
│                       │   (S3/Cloud)  │                        │
│                       └───────┬───────┘                        │
│                               │                                 │
│         ┌─────────────────────┼─────────────────────┐          │
│         │                     │                     │           │
│    ┌────▼────┐           ┌────▼────┐          ┌────▼────┐     │
│    │ Shield  │           │ Shield  │          │ Shield  │     │
│    │  (US)   │           │  (EU)   │          │ (Asia)  │     │
│    └────┬────┘           └────┬────┘          └────┬────┘     │
│         │                     │                    │           │
│    ┌────┴────────────────┐    │    ┌──────────────┴────┐      │
│    │                     │    │    │                   │       │
│  ┌─▼──┐ ┌───┐ ┌───┐   ┌──▼─┐ ┌▼──┐ ┌───┐    ┌───┐ ┌──▼─┐    │
│  │Edge│ │Edge│ │Edge│  │Edge│ │Edge│ │Edge│   │Edge│ │Edge│    │
│  │NYC │ │LA │ │CHI│   │LON│ │PAR│ │BER│    │TOK│ │SYD│     │
│  └─┬──┘ └─┬─┘ └─┬─┘   └─┬──┘ └─┬─┘ └─┬─┘    └─┬─┘ └─┬──┘    │
│    │      │     │       │      │     │        │     │        │
│    ▼      ▼     ▼       ▼      ▼     ▼        ▼     ▼        │
│  Users  Users  Users  Users  Users  Users   Users  Users     │
│                                                                 │
│  Cache Hierarchy:                                              │
│  1. Edge (closest to user) - most popular content             │
│  2. Shield (regional) - moderately popular                    │
│  3. Origin - all content                                      │
│                                                                 │
│  Netflix has 1000s of Open Connect Appliances in ISPs!        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
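The cache hierarchy can be read as a lookup chain: try the edge, then the shield, then the origin, filling the caches on the way back. A minimal in-memory sketch follows, assuming LRU eviction at each tier (the text does not specify an eviction policy) and a plain dict standing in for the origin; class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Small LRU cache standing in for one CDN tier."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def fetch_segment(key, edge: LRUCache, shield: LRUCache, origin: dict):
    """Return (data, tier) where tier names the layer that served the request."""
    data = edge.get(key)
    if data is not None:
        return data, "edge"
    data = shield.get(key)
    if data is not None:
        edge.put(key, data)  # fill the edge on the way back
        return data, "shield"
    data = origin[key]  # the origin holds all content
    shield.put(key, data)
    edge.put(key, data)
    return data, "origin"
```

Because the shield has more capacity than any single edge, a title evicted from one edge is usually still one hop away rather than a full origin fetch.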

CDN Caching Strategy

Content Type   | Cache Duration  | Location
Popular movies | Days-Weeks      | Edge + Shield
New releases   | Hours           | Shield, on-demand edge
Long tail      | On-demand       | Origin, cached on access
Thumbnails     | Days            | All edges
Manifest files | Seconds-Minutes | All edges

Step 6: Data Models

Database Schema

-- Videos table
CREATE TABLE videos (
    id              UUID PRIMARY KEY,
    user_id         UUID NOT NULL,
    title           VARCHAR(255) NOT NULL,
    description     TEXT,
    duration        INT,              -- seconds
    status          VARCHAR(20),      -- processing, ready, failed
    view_count      BIGINT DEFAULT 0,
    like_count      BIGINT DEFAULT 0,
    created_at      TIMESTAMP,
    published_at    TIMESTAMP
);

-- Video files (multiple qualities)
CREATE TABLE video_files (
    id              UUID PRIMARY KEY,
    video_id        UUID NOT NULL,
    quality         VARCHAR(10),      -- 360p, 720p, 1080p, 4K
    format          VARCHAR(10),      -- hls, dash
    storage_path    VARCHAR(500),
    file_size       BIGINT,
    bitrate         INT
);

-- Watch history (for resume & recommendations)
CREATE TABLE watch_history (
    user_id         UUID,
    video_id        UUID,
    progress        INT,              -- seconds watched
    completed       BOOLEAN,
    watched_at      TIMESTAMP,
    PRIMARY KEY (user_id, video_id)
);

-- View counts (distributed counter)
-- Use Redis or dedicated counter service
-- Periodically sync to main database
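The counter comment above describes a write-behind pattern: increment somewhere cheap, then flush totals to the database in batches. A minimal sketch follows, using an in-process Counter as a stand-in for Redis INCR; the class name and the persist callback are hypothetical, and a real deployment would run flush() from a background job on a timer.

```python
import threading
from collections import Counter


class BufferedViewCounter:
    """Accumulates view increments in memory and flushes them in batches."""

    def __init__(self, persist):
        # persist: callable taking {video_id: increment}, e.g. a function
        # that issues one UPDATE per video against the videos table.
        self._persist = persist
        self._pending = Counter()
        self._lock = threading.Lock()

    def record_view(self, video_id: str) -> None:
        """Cheap hot-path operation: no database write per view."""
        with self._lock:
            self._pending[video_id] += 1

    def flush(self) -> int:
        """Write pending increments out and return how many videos changed."""
        with self._lock:
            batch = dict(self._pending)
            self._pending = Counter()
        if batch:
            self._persist(batch)
        return len(batch)


if __name__ == "__main__":
    database = Counter()  # stands in for videos.view_count
    counter = BufferedViewCounter(database.update)
    for _ in range(3):
        counter.record_view("video-123")
    counter.flush()
    print(database["video-123"])
```

The batch is swapped out under the lock and persisted outside it, so a slow database write does not block incoming views.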

Step 7: Recommendation System

┌─────────────────────────────────────────────────────────────────┐
│                 Recommendation Architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Two approaches:                                                │
│                                                                 │
│  1. Content-Based Filtering                                    │
│     ─────────────────────────                                   │
│     "You watched action movies → recommend more action"        │
│     Based on: genre, actors, director, tags                    │
│                                                                 │
│  2. Collaborative Filtering                                     │
│     ────────────────────────                                    │
│     "Users like you also watched X"                            │
│     Based on: similar users' watch history                     │
│                                                                 │
│  Netflix uses both (Hybrid):                                   │
│                                                                 │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐  │
│  │  Watch        │    │   User        │    │   Content     │  │
│  │  History      │───►│   Profile     │───►│   Matching    │  │
│  │               │    │   Vector      │    │               │  │
│  └───────────────┘    └───────────────┘    └───────────────┘  │
│         │                    │                     │           │
│         ▼                    ▼                     ▼           │
│  ┌───────────────────────────────────────────────────────────┐│
│  │              ML Model (Matrix Factorization)              ││
│  │                                                           ││
│  │  User-Item Matrix → Latent Factors → Predictions         ││
│  │                                                           ││
│  └───────────────────────────────────────────────────────────┘│
│         │                                                      │
│         ▼                                                      │
│  ┌───────────────────────────────────────────────────────────┐│
│  │              Personalized Recommendations                 ││
│  │                                                           ││
│  │  "Top Picks for You"    "Because you watched X"          ││
│  │  "Trending Now"         "Continue Watching"              ││
│  │                                                           ││
│  └───────────────────────────────────────────────────────────┘│
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
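The "User-Item Matrix → Latent Factors → Predictions" box can be made concrete with a toy matrix factorization trained by stochastic gradient descent. This is a teaching sketch in pure Python, not a description of Netflix's production models; the ratings, hyperparameters, and function names are all made up for illustration.

```python
import random


def predict(P, Q, user, item):
    """Predicted rating = dot product of user and item latent vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def rmse(ratings, P, Q):
    """Root mean squared error over the observed ratings."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (squared / len(ratings)) ** 0.5


def factorize(ratings, n_users, n_items, k=2, epochs=500,
              lr=0.02, reg=0.02, seed=0):
    """Learn k latent factors per user and per item with SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


# (user, item, rating): users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 4), (3, 3, 5),
]

if __name__ == "__main__":
    P, Q = factorize(RATINGS, n_users=4, n_items=4)
    print("training RMSE:", round(rmse(RATINGS, P, Q), 3))
    print("user 0, unseen item 3:", round(predict(P, Q, 0, 3), 2))
```

The useful property is that predict() works for user-item pairs that never appear in the training data, which is how unseen titles get a score.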

Reliability: Chaos Engineering

Chaos Monkey

Key Design Decisions

Decision    | Choice             | Reasoning
Streaming   | HLS/DASH           | Industry standard, adaptive bitrate
Storage     | S3 + CDN           | Cost-effective, globally distributed
Transcoding | Parallel chunks    | Faster processing, scalable
CDN         | Multi-tier         | Popular content at edge, long-tail at origin
Encoding    | Multiple qualities | Support all bandwidths
DB          | PostgreSQL + Redis | Metadata + counters/cache

Common Interview Questions

Live streaming (how it differs from VOD):
  1. No pre-transcoding — transcode in real-time using dedicated encoder clusters
  2. Use RTMP or SRT for ingest, HLS/DASH for delivery. The protocol mismatch is intentional: RTMP optimizes for low-latency ingest while HLS/DASH optimize for scalable CDN delivery
  3. Shorter segments (2-4 seconds vs 10 seconds) to reduce glass-to-glass latency. The trade-off: shorter segments mean more HTTP requests and worse compression efficiency
  4. Edge servers geographically close to the streamer for ingest, close to viewers for delivery — these are different sets of servers
  5. Latency spectrum: 5-30 seconds for standard live (sports), under 3 seconds for interactive (auctions, gaming). Sub-second requires WebRTC, not HLS

View counting at scale:
  1. Don’t update DB on every view — at 100K+ concurrent viewers on a popular video, that would be 100K writes per second to a single row (a hot partition nightmare)
  2. Use Redis to batch view counts: INCR video:123:views is O(1) and atomic
  3. Background job syncs Redis counters to the persistent database every 30-60 seconds
  4. Display the cached count — the slight lag (users see “1.2M views” instead of “1,200,047 views”) is imperceptible and acceptable
  5. For YouTube specifically, view counting includes fraud detection (bot views, repeated views) which adds a validation pipeline before counts are finalized. This is why YouTube view counts sometimes appear frozen for new viral videos

Transcoding cost and quality optimization:
  1. Don’t transcode to higher resolution than the source — a 720p upload should never generate a 4K rendition
  2. Two-pass encoding for better quality at the same bitrate (Netflix uses per-title encoding to optimize bitrate for each piece of content individually)
  3. Netflix’s per-title encoding analyzes the complexity of each video. An animated film needs far less bitrate than an action movie at the same perceptual quality — this saves roughly 20% bandwidth
  4. Use hardware encoders (NVENC, QuickSync) for real-time needs. Software encoders (x264/x265) for quality-optimized offline transcoding
  5. Spot/preemptible instances for batch transcoding workers: Netflix reportedly saves roughly 90% on compute for non-urgent transcoding jobs. Checkpointing ensures work is not lost on preemption
  6. Skip rarely-requested qualities and generate them on-demand if a user specifically requests them
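
The checkpointing point in the transcoding list is easy to state and easy to get wrong. A minimal sketch of the idea follows, assuming chunk-level checkpoints kept in a set; in a real system that state would live in durable storage such as S3 or a database. The function name and the preempt_after knob (which only simulates a spot termination) are hypothetical.

```python
def run_worker(chunk_ids, checkpoint: set, encode, preempt_after=None) -> bool:
    """Encode every chunk not yet recorded in the checkpoint.

    Returns True when all chunks are done, False if the (simulated)
    instance was reclaimed before finishing.
    """
    encoded_this_run = 0
    for chunk_id in chunk_ids:
        if chunk_id in checkpoint:
            continue  # finished by an earlier worker, skip it
        if preempt_after is not None and encoded_this_run >= preempt_after:
            return False  # simulated spot termination
        encode(chunk_id)
        checkpoint.add(chunk_id)  # record progress only after success
        encoded_this_run += 1
    return True


if __name__ == "__main__":
    done = set()
    log = []
    finished = run_worker(range(5), done, log.append, preempt_after=2)
    print("first worker finished:", finished, "encoded:", log)
    finished = run_worker(range(5), done, log.append)
    print("second worker finished:", finished, "encoded:", log)
```

Recording the checkpoint only after encode() succeeds means a preempted chunk is retried rather than silently skipped; the cost is that a chunk interrupted mid-encode is redone from its start.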

Key Trade-offs

Decision: CDN strategy
  Option A: Third-party CDN (Akamai/CloudFront)
  Option B: Custom CDN (Open Connect)
  Recommendation: Custom CDN at Netflix scale. Netflix delivers ~15% of all downstream internet traffic globally. At that volume, paying per-GB to a third-party CDN costs billions annually. Custom hardware appliances placed inside ISP networks pay for themselves within months. During peak hours, 90%+ of traffic is served from within the ISP network, never crossing the internet backbone. The trade-off: massive upfront capital investment and operational complexity. For a startup or service below ~1% of internet traffic, third-party CDN is correct. The crossover point is roughly 10-50 Tbps of sustained delivery.

Decision: Video codec
  Option A: H.264 (AVC)
  Option B: H.265 (HEVC) / AV1
  Recommendation: AV1 for new content, H.264 as fallback. AV1 achieves 30-50% bitrate savings over H.264 at equivalent quality, which directly reduces CDN bandwidth costs. The trade-off: AV1 encoding is 10-100x slower than H.264 and requires newer hardware for decoding. Netflix uses AV1 for devices that support hardware decode (most 2020+ smart TVs, Chrome, Android) and falls back to H.264/H.265 for older devices. Encode every title in both: storage is cheap compared to bandwidth savings at 10M concurrent viewers.

Decision: Transcoding approach
  Option A: Fixed quality ladder
  Option B: Per-title encoding
  Recommendation: Per-title encoding for quality optimization. A fixed bitrate ladder (360p at 500kbps, 720p at 2.5Mbps, etc.) wastes bandwidth: an animated film looks perfect at 40% less bitrate than a live-action explosion scene. Netflix’s per-title encoding analyzes scene complexity and builds a custom quality ladder for each title, saving ~20% bandwidth on average. The trade-off: per-title analysis adds hours of compute per title. Under a tight deadline (48-hour content launch), use simplified two-pass encoding and accept ~10-15% higher bitrate.

Decision: Content pre-positioning
  Option A: On-demand (lazy fill)
  Option B: Predictive (pre-warm)
  Recommendation: Predictive fill for popular content, on-demand for long tail. Pre-push new releases and trending titles to all edge locations before publish time, using recommendation engine predictions and marketing data. The long tail (rarely-watched content) stays at the origin and is fetched on-demand with an origin shield layer to prevent thundering herd. The trade-off: predictive fill wastes storage on edge nodes when predictions are wrong, but storage is cheap (~$0.02/GB/month) compared to the latency penalty of a cache miss (200ms+ from origin vs 5ms from edge).

Decision: Recommendation serving
  Option A: Real-time ML inference
  Option B: Pre-computed candidates
  Recommendation: Pre-computed candidates with real-time re-ranking. Running a full recommendation model for each of 100M+ DAU at page load is computationally impossible within a 200ms budget. Pre-compute the top 1000 candidates per user offline (nightly or every few hours), cache in Redis/DynamoDB, and apply lightweight re-ranking at read time (suppress recently watched, boost new releases, apply time-of-day context). The trade-off: recommendations are stale by up to a few hours, but the re-ranking layer handles the most important contextual adjustments. Netflix reportedly saves 80%+ of inference compute with this architecture.

Common Candidate Mistakes

Mistake 1: Designing a single monolithic CDN
  Netflix uses Open Connect -- hardware appliances placed directly
  inside ISP networks. During peak hours, 90%+ of Netflix traffic
  is served from within the ISP network, never crossing the
  internet backbone. This is fundamentally different from a
  traditional CDN and worth mentioning.

Mistake 2: Ignoring the cold-start problem for new content
  When a new show drops (e.g., a major Netflix Original), millions
  of users will request it simultaneously. You need a content
  pre-positioning strategy: push new releases to all edge locations
  before the publish time. This requires predictive analytics about
  which content will be popular in which regions.

Mistake 3: Not distinguishing between Netflix and YouTube patterns
  Netflix has ~10K titles, while YouTube hosts hundreds of millions
  of videos (500M in the capacity estimate). Netflix can afford to
  pre-encode every title at every quality level. YouTube cannot --
  it uses lazy transcoding for long-tail content and only pre-encodes
  at popular quality levels.

Mistake 4: Forgetting about the recommendation system's impact on CDN
  The recommendation system determines what content gets watched,
  which determines CDN cache hit rates. If recommendations push
  users toward the same popular content, cache efficiency is high.
  If recommendations are highly personalized toward niche content,
  CDN costs increase. This is a real tension in the system.

Interview Deep-Dive Questions

What the interviewer is really testing: Can you reason about a real-time engineering constraint against the full transcoding pipeline (chunking, parallelism, quality tiers, storage, and CDN pre-positioning) without hand-waving past the hard parts?
Strong answer:
  • Chunk-based parallel transcoding is the core lever. Split the source file into 5-10 second chunks. Each chunk can be transcoded independently into all target quality levels (360p through 4K) across a fleet of workers. A 2-hour film (7,200 seconds) becomes 720 chunks at 10 seconds each, or 1,440 at 5 seconds, and with enough workers all qualities for all chunks process concurrently. Netflix reportedly uses thousands of EC2 instances for a single title encode.
  • Per-title encoding is where you trade deadline against quality. Netflix’s per-title encoding analyzes scene complexity (animation needs ~40% less bitrate than live-action explosions at equivalent perceptual quality). Full per-title analysis can take hours of compute per quality ladder. Under a 48-hour deadline, you might run a simplified two-pass encode instead of the full optimization pipeline, accepting ~10-15% higher bitrate (more CDN cost) in exchange for faster availability.
  • Prioritize quality tiers by audience data. Not all renditions are equal. 1080p and 720p account for 70%+ of streams. Encode those first. 4K can follow. 360p is needed for mobile in emerging markets — queue it alongside 1080p since it is cheap. Skip encoding qualities higher than the source master (a 1080p source should never produce a 4K rendition).
  • CDN pre-positioning runs in parallel with later-stage encodes. As soon as the 1080p rendition finishes, start pushing it to regional shields and high-priority edge locations (based on subscriber density and predicted demand from marketing data). Do not wait for all renditions to finish before starting distribution.
  • Fault tolerance via checkpointing. Workers process on spot instances to save cost, but spot instances get terminated. Checkpoint each chunk’s progress to S3 so another worker can resume. Without this, a terminated instance wastes its entire chunk. Netflix reportedly saves ~90% on batch encoding compute through spot instances.
  • The manifest file is the last artifact. Generate the HLS/DASH manifests only after all segments for a given quality are verified. A premature manifest pointing to missing segments causes playback failures.
Red flag answer: “You just throw more machines at it” without discussing chunk boundaries, codec choices (H.264 vs H.265 vs AV1), quality ladder prioritization, or the interplay between transcoding completion and CDN distribution timing.
Follow-up questions:
  1. What happens if a chunk transcodes successfully at 720p but fails at 1080p — can you serve a partial quality ladder, and what does the manifest look like?
  2. Netflix recently invested heavily in AV1 encoding. What is the trade-off between AV1 and H.265 in terms of encode time, decode complexity, and bandwidth savings, and how would that affect your 48-hour deadline?
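
The chunking and prioritization points in the strong answer can be combined into a simple job planner: one job per chunk per rendition, with the most-watched renditions queued first and anything above the source resolution skipped. A minimal sketch follows. The text only says that 1080p and 720p go first, 360p goes alongside, and 4K can follow, so the position of 480p is an assumption, as are all names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeJob:
    quality: str
    chunk_index: int


# Assumed queue order; the placement of 480p is a guess.
PRIORITY = ["1080p", "720p", "360p", "480p", "4K"]
HEIGHTS = {"360p": 360, "480p": 480, "720p": 720, "1080p": 1080, "4K": 2160}


def plan_jobs(duration_s: float, source_height: int, chunk_s: int = 10):
    """Return encode jobs in the order they should be queued."""
    n_chunks = math.ceil(duration_s / chunk_s)
    jobs = []
    for quality in PRIORITY:
        if HEIGHTS[quality] > source_height:
            continue  # never upscale beyond the source master
        jobs.extend(EncodeJob(quality, index) for index in range(n_chunks))
    return jobs


if __name__ == "__main__":
    jobs = plan_jobs(duration_s=2 * 3600, source_height=2160)
    print(len(jobs), "jobs; first:", jobs[0], "last:", jobs[-1])
```

Because every job is independent, the queue depth is what drives how many workers to launch, and CDN distribution for a rendition can begin as soon as its last chunk completes.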
What the interviewer is really testing: Do you understand multi-tier CDN architecture, cache economics, and the business reasoning behind build-vs-buy decisions for infrastructure at extreme scale?
Strong answer:
  • The play-button flow involves a steering service, not just DNS. When a user presses play, the Netflix client contacts a steering service that considers the user’s ISP, geographic location, current server load, and content availability to select the optimal Open Connect Appliance (OCA). This is not simple DNS-based geographic routing — it is an active, real-time decision based on server health and capacity.
  • Open Connect Appliances are hardware boxes inside ISP networks. Netflix ships custom FreeBSD-based servers (with 100+ TB of SSD/HDD storage) and places them directly in ISP data centers. During peak hours, 90%+ of Netflix traffic is served from within the ISP’s own network, never crossing the internet backbone. This reduces latency, eliminates backbone congestion, and is free for the ISP (Netflix pays for the hardware and the ISP benefits from reduced transit costs).
  • The economic argument is straightforward. Netflix delivers ~15% of all downstream internet traffic globally. At that scale, paying Akamai or CloudFront per-GB would cost billions annually. The capital expenditure on custom hardware pays for itself within months. A single OCA serving 40 Gbps at a busy ISP replaces what would be enormous transit costs.
  • Content popularity drives the fill strategy. Each OCA has limited storage. Netflix’s predictive fill algorithm pushes the most-likely-to-be-watched content to each OCA overnight during off-peak hours. An OCA in Tokyo gets different content than one in Sao Paulo. This is driven by the recommendation engine’s predictions and regional content licensing. The long tail — content rarely watched in that region — is fetched on-demand from a regional cache or origin.
  • Fallback tiers handle cache misses. If the closest OCA does not have the content, the request falls back to a regional OCA cluster (shield tier), then to the S3 origin. Each tier adds latency: edge ~5ms, shield ~50ms, origin ~200ms+. The goal is >95% cache hit rate at the edge tier.
Red flag answer: “Netflix just uses CloudFront” or describing a single-tier CDN without discussing ISP peering, content popularity-based pre-positioning, or the economics of build-vs-buy at Netflix’s bandwidth scale.
Follow-up questions:
  1. How does content licensing affect CDN architecture? If a title is licensed for the US but not the EU, how does the system prevent EU OCAs from caching and serving it?
  2. What happens during a major global launch (a new season of a massive show) when every OCA worldwide needs the same content simultaneously — how do you avoid origin overload?
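
The steering decision described in the strong answer (health, load, content availability, then fallback tiers) can be sketched as a filter followed by a ranking. This is an illustration under simplifying assumptions, not Netflix's actual steering logic: it ignores ISP and geography, treats load as a single number, and uses a made-up 0.9 load ceiling. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    tier: str                      # "edge", "shield" or "origin"
    healthy: bool = True
    load: float = 0.0              # utilization between 0.0 and 1.0
    titles: set = field(default_factory=set)


# Lower number = closer to the viewer, so preferred.
TIER_ORDER = {"edge": 0, "shield": 1, "origin": 2}


def steer(title: str, servers, max_load: float = 0.9):
    """Pick the closest healthy server that holds the title and has headroom."""
    candidates = [
        s for s in servers
        if s.healthy and s.load < max_load and title in s.titles
    ]
    if not candidates:
        return None
    # Prefer the nearest tier, then the least-loaded server within it.
    return min(candidates, key=lambda s: (TIER_ORDER[s.tier], s.load))
```

Returning a ranked list instead of a single server would let the client fail over mid-stream without another round trip to the steering service.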
What the interviewer is really testing: Can you bridge the gap between ML model complexity and real-time serving infrastructure, understanding the offline/online split, caching strategies, and the latency budget?
Strong answer:
  • The key architectural split is offline training vs. online serving. Training (matrix factorization, deep learning models on viewing history) runs on massive Spark/GPU clusters over hours or days. Serving must return results in under 200ms. You cannot run a full model inference for each page load — the math does not work at 100M+ DAU.
  • Pre-compute and cache the heavy lifting. For each user, run the full recommendation model offline (nightly or every few hours) and store the top N (e.g., 1000) candidate titles with scores in a key-value store (Redis or DynamoDB). When the user opens the app, you fetch these pre-computed candidates and apply lightweight re-ranking in real-time based on context (time of day, device, recently watched).
  • Online re-ranking is where the 200ms budget is spent. The online layer takes the pre-computed candidate set and applies fast contextual adjustments: suppress titles the user started and abandoned, boost titles matching current viewing session mood, apply business rules (promote Netflix Originals). This re-ranking is a simple model (logistic regression or a small neural net) that runs in single-digit milliseconds.
  • Feature store is the connective tissue. Features like “user’s genre affinity vector” or “title popularity score this week” are computed offline and stored in a feature store (Netflix uses their custom system). Both the offline training pipeline and the online serving layer read from the same feature store, ensuring consistency.
  • The “rows” on the Netflix homepage are separate recommendations. “Top Picks for You,” “Because You Watched X,” and “Trending Now” are each generated by different algorithms. The page assembly service calls multiple recommendation endpoints in parallel, each with its own latency budget (~50ms), and assembles the page. An endpoint that exceeds its budget is timed out and replaced with a fallback row (e.g., globally popular titles).
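The parallel fan-out with a per-row timeout and fallback can be sketched with `asyncio`. This is an illustrative sketch under stated assumptions: `fetch_row` stands in for a real recommendation endpoint, the row names and simulated delays are invented, and `FALLBACK_ROW` is a placeholder for a precomputed "globally popular" list.

```python
import asyncio

FALLBACK_ROW = ["globally-popular-1", "globally-popular-2"]  # hypothetical fallback


async def fetch_row(name: str, delay_s: float) -> list[str]:
    """Stand-in for a call to one recommendation endpoint."""
    await asyncio.sleep(delay_s)            # simulated network + lookup time
    return [f"{name}-title-{i}" for i in range(2)]


async def row_with_budget(name: str, delay_s: float, budget_s: float) -> list[str]:
    """Return the endpoint's row, or the fallback if the latency budget is exceeded."""
    try:
        return await asyncio.wait_for(fetch_row(name, delay_s), timeout=budget_s)
    except asyncio.TimeoutError:
        return FALLBACK_ROW


async def assemble_page(endpoints: dict[str, float],
                        budget_s: float = 0.05) -> dict[str, list[str]]:
    """Call every row endpoint concurrently and keep the rows in request order."""
    rows = await asyncio.gather(
        *(row_with_budget(name, delay, budget_s) for name, delay in endpoints.items())
    )
    return dict(zip(endpoints, rows))


page = asyncio.run(assemble_page({
    "top_picks": 0.001,
    "because_you_watched": 0.001,
    "trending_now": 0.5,                    # simulated slow endpoint
}))
for row_name, titles in page.items():
    print(row_name, titles)
```

Because each row carries its own timeout, the page latency is bounded by the budget rather than by the slowest endpoint.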
Red flag answer: “Just run the ML model when the user opens the app” without acknowledging latency constraints, or describing only collaborative filtering without mentioning serving infrastructure, caching, or the offline/online split.
Follow-up questions:
  1. How do you handle the cold-start problem for a brand-new user who has no watch history — what do you recommend and what signals do you use?
  2. Netflix personalizes even the thumbnail artwork for each title per user. How would you architect a system that selects the optimal thumbnail from a set of candidates in real-time?
What the interviewer is really testing: Do you understand adaptive bitrate streaming at the implementation level (segment structure, manifest files, buffer management, and the quality switching algorithm), not just the high-level concept?
Strong answer:
  • The client player maintains a playback buffer (typically 30-60 seconds ahead). When bandwidth drops, the buffer starts draining faster than it fills. The ABR algorithm detects this by measuring the ratio of segment download time to segment duration. If a 10-second segment at 1080p (5 Mbps) takes 25 seconds to download, the estimated bandwidth is 2 Mbps, and the algorithm must switch down.
  • Quality switching happens at segment boundaries, not mid-segment. The player finishes downloading the current segment at the old quality. The next segment request targets a lower quality tier (e.g., 720p at 2.5 Mbps, or 480p at 1 Mbps). The video file segments at each quality level are independently decodable — this is the entire point of the HLS/DASH segment structure.
  • Buffer-based algorithms (like Netflix’s) outperform throughput-based ones. Simple throughput-based ABR (measure last download speed, pick matching quality) oscillates badly because bandwidth is noisy. Netflix developed a buffer-based algorithm: if the buffer is above 30 seconds, the player can be aggressive (pick higher quality); if the buffer drops below 10 seconds, it must be conservative (pick lower quality). This avoids oscillation and reduces rebuffering by ~20%.
  • On the server/CDN side, each quality tier has its own set of segments. The master manifest (manifest.m3u8) lists all available quality tiers. Each tier has its own playlist pointing to its segments. The CDN serves whichever segment the client requests. There is no server-side decision-making about quality — the client drives all quality selection. This is critical for CDN cacheability: every client requesting 720p/segment_042 gets the same cached file.
  • The user experience during the switch: There is a brief visual quality drop (the image becomes slightly softer or more compressed). Well-designed players blend the transition so it is not jarring. Netflix also uses per-shot encoding to ensure each segment is encoded optimally for its content complexity, reducing visible quality differences between tiers.
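The bandwidth estimate and the two selection strategies can be sketched as follows. This is a simplified illustration, not Netflix's production algorithm: the three-tier ladder uses the bitrates quoted above, the 10 s and 30 s thresholds come from the text, and the linear mapping between them is an assumption chosen to keep the example short.

```python
LADDER_MBPS = {"1080p": 5.0, "720p": 2.5, "480p": 1.0}   # bitrates quoted in the text


def estimate_bandwidth_mbps(segment_s: float, bitrate_mbps: float,
                            download_s: float) -> float:
    """Throughput observed while downloading one segment."""
    return segment_s * bitrate_mbps / download_s


def throughput_pick(bandwidth_mbps: float) -> str:
    """Throughput-based ABR: highest tier whose bitrate fits the estimate."""
    fitting = [(rate, name) for name, rate in LADDER_MBPS.items()
               if rate <= bandwidth_mbps]
    if fitting:
        return max(fitting)[1]
    return min(LADDER_MBPS, key=LADDER_MBPS.get)          # nothing fits: lowest tier


def buffer_pick(buffer_s: float, low_s: float = 10.0, high_s: float = 30.0) -> str:
    """Buffer-based ABR: map buffer occupancy onto the ladder (assumed linear map)."""
    names = [name for name, _ in sorted(LADDER_MBPS.items(), key=lambda kv: kv[1])]
    if buffer_s <= low_s:
        return names[0]                                   # conservative
    if buffer_s >= high_s:
        return names[-1]                                  # aggressive
    fraction = (buffer_s - low_s) / (high_s - low_s)
    return names[min(int(fraction * len(names)), len(names) - 1)]


# Worked example from the text: a 10 s segment at 5 Mbps takes 25 s to download.
bw = estimate_bandwidth_mbps(segment_s=10, bitrate_mbps=5.0, download_s=25)
print(bw)                                   # 2.0
print(throughput_pick(bw))                  # next segment is requested at 480p
print(buffer_pick(5), buffer_pick(20), buffer_pick(45))
```

The buffer-based picker never looks at the last throughput sample, which is why a single noisy measurement cannot make it oscillate the way the throughput-based picker can.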
Red flag answer: “The server detects the bandwidth drop and sends lower quality.” This reverses the architecture. The client makes all quality decisions. The server is a stateless segment store.
Follow-up questions:
  1. What is the trade-off between shorter segments (2 seconds) and longer segments (10 seconds) in terms of encoding efficiency, quality switching responsiveness, and CDN load?
  2. How would you implement adaptive bitrate for a live stream where you cannot pre-encode segments — what changes in the pipeline?
What the interviewer is really testing: Can you think beyond pure engineering into the intersection of technical architecture and business/legal constraints, which is what staff-level engineers encounter daily?
Strong answer:
  • Content licensing creates per-region catalogs, which fragments the content graph. A title available in the US may not exist in France due to a pre-existing exclusive deal with a French broadcaster. The metadata service must maintain region-aware catalogs: when a user in France browses, they see a different title list than a user in the US. This is a filter applied at the API layer, not at the CDN layer.
  • CDN content must respect geographic restrictions. An Open Connect Appliance in France must not cache or serve US-only content. The fill algorithm (which decides what content to push to each OCA) is region-aware. If a title is not licensed in a region, it is excluded from the fill manifest for OCAs in that region. If a user uses a VPN to appear in the US, the steering service may route them to a US OCA, but Netflix’s VPN detection system actively blocks this.
  • The recommendation engine must be license-aware. Recommending a title the user cannot watch (because it is not licensed in their country) is worse than leaving it out. The candidate generation step must filter by the user’s region. This means the pre-computed recommendation cache is per-region-per-user, not global-per-user, which multiplies storage and compute costs.
  • Licensing windows are temporal, not just geographic. A title might be available in Germany from January to June, then leave (because the license expired or was sold to another platform). The system must handle time-based activation and deactivation: content metadata includes available_from and available_until timestamps per region. CDN fill algorithms must also purge content from OCAs when the license expires.
  • Measurement and compliance are required. Content owners want proof that their content was only served in licensed territories. Netflix must maintain detailed access logs showing which regions accessed which titles. This is both a technical requirement (logging at the CDN edge) and a legal obligation (audit-ready reports).
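The region and time checks described above can be sketched as a single availability predicate that both the catalog API and the recommendation candidate filter reuse. The `available_from` and `available_until` field names come from the text; the `LicenseWindow` class, the helper functions, the title IDs, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LicenseWindow:
    region: str
    available_from: datetime
    available_until: datetime


def is_available(windows: list[LicenseWindow], region: str, now: datetime) -> bool:
    """True if any license window for this region covers `now`."""
    return any(
        w.region == region and w.available_from <= now < w.available_until
        for w in windows
    )


def filter_candidates(candidates: list[str],
                      catalog: dict[str, list[LicenseWindow]],
                      region: str, now: datetime) -> list[str]:
    """Drop recommendation candidates the user cannot currently watch."""
    return [t for t in candidates if is_available(catalog.get(t, []), region, now)]


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Hypothetical catalog: one title licensed in Germany from January to June only.
catalog = {
    "title-a": [LicenseWindow("DE", utc(2024, 1, 1), utc(2024, 7, 1)),
                LicenseWindow("US", utc(2023, 1, 1), utc(2030, 1, 1))],
    "title-b": [LicenseWindow("US", utc(2023, 1, 1), utc(2030, 1, 1))],
}

print(filter_candidates(["title-a", "title-b"], catalog, "DE", utc(2024, 3, 15)))
print(filter_candidates(["title-a", "title-b"], catalog, "DE", utc(2024, 8, 1)))
```

The same predicate, evaluated ahead of time, could also drive the OCA fill manifest and the purge of expired titles, so the three systems cannot disagree about what is licensed.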
Red flag answer: “Just use geo-IP to block access” as the entire strategy, without discussing how licensing affects the content pipeline, recommendations, CDN fill strategy, or temporal license windows.
Follow-up questions:
  1. How would you handle a scenario where a user starts watching a title in the US (licensed), then flies to a country where it is not licensed and tries to resume on the plane’s Wi-Fi?
  2. What architectural changes would you make to support “download for offline viewing” given licensing constraints — how do you enforce license expiration on a device that may be offline?
What the interviewer is really testing: Fault tolerance and idempotency in distributed data pipelines. Can you design a system that makes progress even when individual components fail?
Strong answer:
  • The pipeline is modeled as a DAG of idempotent tasks, not a monolithic job. Each chunk × quality combination is an independent task (e.g., “transcode chunk 247 at 1080p”). A workflow orchestrator (like Netflix’s Conductor, or Apache Airflow) tracks task states. If a worker crashes on chunk 247, only that task is retried, not the entire video.
  • Checkpointing to object storage is the resilience mechanism. Each completed chunk is written to S3 immediately. The orchestrator records “chunk 247 at 1080p: complete, stored at s3://bucket/video123/1080p/chunk_247.ts”. When the final assembly step runs, it reads the manifest of completed chunks. Missing chunks are re-queued.
  • Workers are stateless and pull from a task queue. A worker picks up a task from SQS or Kafka, downloads the source chunk from S3, transcodes it, uploads the output to S3, and acknowledges the task. If the worker dies mid-processing, the task becomes available again (in SQS, when the visibility timeout expires; in Kafka, when the uncommitted offset is re-read after a consumer group rebalance) and another worker picks it up. This is the standard “competing consumers” pattern.
  • Idempotency is enforced by task ID. Each task has a deterministic ID (e.g., video123-1080p-chunk247). If a worker completes the task but crashes before acknowledging it, the task is picked up again. The new worker writes the output to the same S3 path, overwriting the previous output (which is identical). No duplication, no corruption.
  • Poison pill detection prevents infinite retry loops. If a specific chunk consistently fails (corrupted source data, codec bug), the orchestrator marks it as failed after N retries (typically 3). The entire video is flagged for manual review. Without this, a single bad chunk could consume worker capacity indefinitely.
  • The merge step validates completeness before publishing. Before generating the final HLS/DASH manifest, verify that every chunk for every quality tier exists and passes integrity checks (file size, duration, codec metadata). A missing or corrupt chunk halts publishing and triggers a targeted re-encode.
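The retry, checkpoint, and poison-pill behaviour can be sketched with an in-memory stand-in for the orchestrator. This is a simplified single-process illustration: the `store` dict plays the role of S3, `fake_transcode` simulates worker failures, and `run_pipeline` is a hypothetical helper rather than the Conductor or Airflow API. The task ID format and the limit of 3 attempts follow the text.

```python
def task_id(video_id: str, quality: str, chunk: int) -> str:
    """Deterministic ID, so a repeated task overwrites the same output key."""
    return f"{video_id}-{quality}-chunk{chunk}"


def run_pipeline(tasks, transcode, store: dict, max_attempts: int = 3) -> list[str]:
    """Run every task; return the IDs that exhausted their attempts (poison pills)."""
    failed = []
    for video_id, quality, chunk in tasks:
        tid = task_id(video_id, quality, chunk)
        if tid in store:                      # checkpoint: already completed
            continue
        for _ in range(max_attempts):
            try:
                store[tid] = transcode(video_id, quality, chunk)
                break                         # success: stop retrying this task
            except RuntimeError:
                continue                      # retry only this chunk
        else:
            failed.append(tid)                # flag for manual review
    return failed


attempts: dict[int, int] = {}


def fake_transcode(video_id: str, quality: str, chunk: int) -> str:
    """Simulated worker: chunk 247 crashes once, chunk 13 is always corrupt."""
    attempts[chunk] = attempts.get(chunk, 0) + 1
    if chunk == 13:
        raise RuntimeError("corrupt source chunk")
    if chunk == 247 and attempts[chunk] == 1:
        raise RuntimeError("worker crashed")
    return f"encoded:{video_id}:{quality}:{chunk}"


tasks = [("video123", "1080p", c) for c in (246, 247, 13)]
store: dict[str, str] = {}
failed = run_pipeline(tasks, fake_transcode, store)

# Publish only if every task has an output and nothing was flagged.
publishable = not failed and all(task_id(*t) in store for t in tasks)
print(failed, publishable)
```

Running `run_pipeline` a second time with the same `store` skips the completed chunks and retries only the missing one, which is the property the merge step relies on.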
Red flag answer: “Restart the whole transcoding job from scratch” or not mentioning idempotency, checkpointing, or how the orchestrator tracks partial progress.
Follow-up questions:
  1. How would you prioritize retries for a high-profile launch (a major Netflix Original premiering tonight) vs. a catalog backfill operation — do they share the same worker pool?
  2. What monitoring and alerting would you put in place to detect a systemic issue (e.g., a codec bug causing all 4K encodes to produce corrupt output) before it affects user-visible content?
What the interviewer is really testing: Staff-level systems thinking. Can you navigate a cross-team trade-off where optimizing one subsystem (recommendations) degrades another (CDN efficiency), and propose a solution that balances both?
Strong answer:
  • This is a real tension Netflix has documented publicly. The recommendation system’s job is to match users with content they will enjoy, which often means niche titles. But the CDN’s efficiency depends on many users requesting the same content (cache hit = content already on the edge server). These goals directly conflict.
  • Quantify the impact before reacting. A drop from 95% to 78% cache hit rate means 22% of requests now miss the edge and hit the shield or origin tier, up from 5%. At 50 Tbps peak bandwidth, that is ~11 Tbps of cross-network traffic instead of ~2.5 Tbps, an increase of ~8.5 Tbps. This has a real dollar cost (bandwidth, origin server load) that can be estimated and weighed against the engagement lift from better personalized recommendations.
  • Solution 1: Recommendation-aware CDN pre-fill. Feed the recommendation engine’s predictions into the CDN fill algorithm. If the recommendation system predicts that 5,000 users in the Tokyo region will be recommended a specific niche documentary tonight, pre-fill Tokyo OCAs with that title. This turns predicted demand into cache hits. The cost is additional storage on edge servers.
  • Solution 2: Tiered caching with recommendation influence. Adjust the CDN cache eviction policy to consider “recommendation score” alongside “access recency.” A title that is actively being recommended to many users in a region gets a higher cache priority than one that was accessed once. This requires a feedback loop from the recommendation service to the CDN control plane.
  • Solution 3: Soft constraints in the recommendation algorithm. Add a “CDN friendliness” signal as a small negative weight for titles not currently cached at the user’s nearest edge. This slightly biases recommendations toward cached content without fundamentally compromising personalization. The weight should be small enough that a highly relevant niche title still wins, but it tips the balance for near-equal candidates.
  • The organizational dimension matters. This requires the recommendation team and the infrastructure team to share metrics and jointly optimize. In most organizations, these teams have separate OKRs. A staff engineer’s job is to identify this misalignment and propose a shared metric (e.g., “engagement per CDN dollar”).
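Two pieces of this answer are easy to make concrete: the miss-traffic arithmetic and the soft “CDN friendliness” penalty from Solution 3. The sketch below is illustrative only. The function names, the relevance scores, and the 0.05 penalty are assumptions, and a real system would tune the weight through experimentation.

```python
def miss_traffic_tbps(peak_tbps: float, hit_rate: float) -> float:
    """Traffic that misses the edge and must come from shield or origin."""
    return peak_tbps * (1 - hit_rate)


def rank(relevance: dict[str, float], cached_at_edge: set[str],
         penalty: float = 0.05) -> list[str]:
    """Sort titles by relevance minus a small penalty for titles not cached nearby."""
    def score(title: str) -> float:
        return relevance[title] - (0.0 if title in cached_at_edge else penalty)
    return sorted(relevance, key=score, reverse=True)


before = miss_traffic_tbps(50, 0.95)
after = miss_traffic_tbps(50, 0.78)
print(round(before, 1), round(after, 1), round(after - before, 1))

cached = {"cached-drama", "cached-comedy"}

# Near-equal candidates: the cached title edges ahead of the uncached one.
print(rank({"niche-doc": 0.90, "cached-drama": 0.88, "cached-comedy": 0.60}, cached))

# Highly relevant niche title: it still wins despite the penalty.
print(rank({"niche-doc": 0.99, "cached-drama": 0.88, "cached-comedy": 0.60}, cached))
```

Keeping the penalty well below the typical relevance gap between good and mediocre candidates is what keeps this a tie-breaker rather than a filter.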
Red flag answer: “Just increase CDN capacity” without acknowledging the systemic tension, or “just stop recommending niche content” without understanding the business value of personalization.
Follow-up questions:
  1. How would you design an A/B test to measure the impact of the “CDN friendliness” recommendation signal on both user engagement and infrastructure cost?
  2. If recommendation-aware pre-fill causes OCA storage to exceed capacity, how would you decide which content to evict — what eviction policy balances recency, popularity, and predicted future demand?
What the interviewer is really testing: Production debugging skills. Can you reason through a real-world streaming issue that spans client, CDN, and backend, using systematic elimination rather than guessing?
Strong answer:
  • Start with the client-side ABR telemetry. Netflix clients report detailed playback metrics: buffer level over time, estimated bandwidth per segment, quality tier switches, and rebuffer events. Pull this user’s session data. If the client-side bandwidth estimate shows 50 Mbps but the ABR algorithm still selects 360p, the issue is in the ABR logic or buffer state, not the network.
  • Check for a specific CDN edge issue. Identify which OCA served this session. If that OCA’s disk is degraded (slow reads), segment download times will be high even though the user’s ISP link is fast. The ABR algorithm measures segment download throughput, not the user’s raw link speed. A slow CDN server looks identical to a slow network from the client’s perspective. Compare this user’s experience against other users served by the same OCA.
  • Investigate DRM license latency. If the DRM license server is slow to respond, the player may pause or buffer while waiting for decryption keys, which the ABR algorithm interprets as network congestion. Check the license acquisition latency for this session.
  • Check for ISP-level throttling. Some ISPs throttle specific traffic patterns (e.g., sustained high-bandwidth video streams) after an initial burst period. The speed test uses a short burst that does not trigger throttling. This explains “speed test shows 50 Mbps but streaming is slow.” Netflix’s ISP Speed Index tracks this per-ISP. Verify whether other users on the same ISP show similar patterns.
  • Smart TV firmware and app version matter. Older smart TV apps may have buggy ABR implementations that fail to recover from a quality downgrade. Check whether this is a known issue for this device model and app version. Netflix maintains device-specific quality caps and workarounds.
  • Thermal throttling on the device. Some smart TVs throttle their network chipset when the device overheats (after 2+ hours of streaming). This is a hardware limitation that causes real throughput drops that a network speed test (run for 30 seconds) would never catch.
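The elimination order above can be sketched as a small triage function over session telemetry. Everything here is hypothetical: the field names, the 15 Mbps “healthy” threshold, the 1-second license threshold, and the use of peer medians are assumptions chosen for illustration, not Netflix's tooling.

```python
from statistics import median


def triage(session: dict, oca_peers_mbps: list[float], isp_peers_mbps: list[float],
           healthy_mbps: float = 15.0, low_tier_mbps: float = 1.0,
           max_license_ms: float = 1000.0) -> str:
    """Return the first suspected cause, checked in the order discussed above."""
    # 1. Client sees plenty of bandwidth but still selects a low tier.
    if (session["estimated_mbps"] >= healthy_mbps
            and session["selected_mbps"] < low_tier_mbps):
        return "abr_logic_or_buffer_state"
    # 2. Other sessions on the same OCA are slow too.
    if oca_peers_mbps and median(oca_peers_mbps) < healthy_mbps:
        return "cdn_edge_degraded"
    # 3. License acquisition took unusually long.
    if session["license_latency_ms"] > max_license_ms:
        return "drm_license_latency"
    # 4. Other sessions on the same ISP are slow too.
    if isp_peers_mbps and median(isp_peers_mbps) < healthy_mbps:
        return "isp_throttling"
    # 5. Nothing shared explains it: look at the device itself.
    return "device_firmware_or_thermal"


slow_session = {"estimated_mbps": 1.2, "selected_mbps": 0.7, "license_latency_ms": 120}

print(triage(slow_session, oca_peers_mbps=[1.0, 2.0, 1.5], isp_peers_mbps=[40, 45]))
print(triage(slow_session, oca_peers_mbps=[40, 42], isp_peers_mbps=[2.0, 3.0, 1.5]))
print(triage(slow_session, oca_peers_mbps=[40, 42], isp_peers_mbps=[40, 45]))
```

Comparing against peers on the same OCA and the same ISP is what separates a shared infrastructure problem from a problem specific to one user's device.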
Red flag answer: “Their internet must be slow” or jumping straight to a single cause without a systematic elimination approach. Also a red flag: not knowing that ABR measures download throughput per segment, not the user’s raw link speed.
Follow-up questions:
  1. How would you build a monitoring dashboard that proactively detects the class of issue described above (CDN edge degradation causing widespread quality drops) before users report it?
  2. Netflix serves hundreds of device types (smart TVs, phones, tablets, game consoles). How do you handle device-specific ABR tuning at scale?