Problem Statement
Design a video streaming service like Netflix or YouTube that:
- Allows users to upload and stream videos
- Handles millions of concurrent viewers
- Provides smooth playback globally
- Supports different video qualities (adaptive streaming)
Step 1: Requirements
Functional Requirements
Core Features
- Upload videos
- Stream videos
- Search videos
- Recommendations
Playback Features
- Adaptive bitrate streaming
- Resume playback
- Multiple device support
- Subtitles/captions
Non-Functional Requirements
- Low Latency: Video starts in < 2 seconds
- High Availability: 99.99% uptime
- Global Scale: Serve users worldwide
- Storage: Handle petabytes of video content
Capacity Estimation
Step 2: High-Level Design
Step 3: Video Upload & Processing
Upload Flow
Video Transcoding
Step 4: Video Streaming (Adaptive Bitrate)
How Adaptive Streaming Works
HLS Manifest Example
Step 5: CDN Architecture
Multi-Tier CDN
CDN Caching Strategy
| Content Type | Cache Duration | Location |
|---|---|---|
| Popular movies | Days-Weeks | Edge + Shield |
| New releases | Hours | Shield, on-demand edge |
| Long tail | On-demand | Origin, cached on access |
| Thumbnails | Days | All edges |
| Manifest files | Seconds-Minutes | All edges |
Step 6: Data Models
Database Schema
Step 7: Recommendation System
Reliability: Chaos Engineering
Key Design Decisions
| Decision | Choice | Reasoning |
|---|---|---|
| Streaming | HLS/DASH | Industry standard, adaptive bitrate |
| Storage | S3 + CDN | Cost-effective, globally distributed |
| Transcoding | Parallel chunks | Faster processing, scalable |
| CDN | Multi-tier | Popular content at edge, long-tail at origin |
| Encoding | Multiple qualities | Support all bandwidths |
| DB | PostgreSQL + Redis | Metadata + counters/cache |
Common Interview Questions
How do you handle live streaming?
- No pre-transcoding — transcode in real-time using dedicated encoder clusters
- Use RTMP or SRT for ingest, HLS/DASH for delivery. The protocol mismatch is intentional: RTMP optimizes for low-latency ingest while HLS/DASH optimize for scalable CDN delivery
- Shorter segments (2-4 seconds vs 10 seconds) to reduce glass-to-glass latency. The trade-off: shorter segments mean more HTTP requests and worse compression efficiency
- Edge servers geographically close to the streamer for ingest, close to viewers for delivery — these are different sets of servers
- Latency spectrum: 5-30 seconds for standard live (sports), under 3 seconds for interactive (auctions, gaming). Sub-second requires WebRTC, not HLS
How do you handle copyright/DRM?
- Content ID system for upload scanning — YouTube processes over 500 hours of uploads per minute and checks each against a reference database of millions of copyrighted works
- DRM: Widevine (Chrome/Android), FairPlay (Apple), PlayReady (Microsoft). In practice you must support all three, each with its own license server integration and key management. Done naively, that means packaging and encrypting the content three times
- Encrypted video segments using CENC (Common Encryption) to avoid triple-encrypting where possible
- License server for key delivery — the player requests a decryption key on each playback, enabling revocation
- Forensic watermarking embeds a unique, invisible identifier per user session to trace leaks back to the source account
How does the view counter work at scale?
- Don’t update DB on every view — at 100K+ concurrent viewers on a popular video, that would be 100K writes per second to a single row (a hot partition nightmare)
- Use Redis to batch view counts: `INCR video:123:views` is O(1) and atomic
- Background job syncs Redis counters to the persistent database every 30-60 seconds
- Display the cached count — the slight lag (users see “1.2M views” instead of “1,200,047 views”) is imperceptible and acceptable
- For YouTube specifically, view counting includes fraud detection (bot views, repeated views) which adds a validation pipeline before counts are finalized. This is why YouTube view counts sometimes appear frozen for new viral videos
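The batching idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `ViewCounter` class and its two dictionaries are hypothetical stand-ins for Redis and the persistent database, so the example runs without any external service.

```python
from collections import defaultdict


class ViewCounter:
    """In-memory stand-in for Redis counters plus a periodic flush job."""

    def __init__(self):
        self.pending = defaultdict(int)   # plays the role of Redis keys
        self.database = defaultdict(int)  # plays the role of the durable store

    def record_view(self, video_id: str) -> None:
        # Same intent as the Redis command: INCR video:<id>:views
        self.pending[f"video:{video_id}:views"] += 1

    def flush(self) -> int:
        """Move accumulated counts to the database; return rows written."""
        batch, self.pending = self.pending, defaultdict(int)
        for key, delta in batch.items():
            self.database[key] += delta  # one write per video, not per view
        return len(batch)


counter = ViewCounter()
for _ in range(100_000):
    counter.record_view("123")
counter.record_view("456")
rows_written = counter.flush()  # would run every 30-60 seconds in a background job
```

Here 100,001 views collapse into two database writes, which is the point of the pattern. A real deployment would also need to make the read-and-reset step atomic on the Redis side.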
How do you reduce transcoding costs?
- Don’t transcode to higher resolution than the source — a 720p upload should never generate a 4K rendition
- Two-pass encoding for better quality at the same bitrate
- Netflix’s per-title encoding analyzes the complexity of each video. An animated film needs far less bitrate than an action movie at the same perceptual quality — this saves roughly 20% bandwidth
- Use hardware encoders (NVENC, QuickSync) for real-time needs. Software encoders (x264/x265) for quality-optimized offline transcoding
- Spot/preemptible instances for batch transcoding workers — Netflix saves roughly 90% on compute for non-urgent transcoding jobs. Checkpointing ensures work is not lost on preemption
- Skip rarely-requested qualities and generate them on-demand if a user specifically requests them
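Two of the rules above (never upscale past the source, and defer rarely-requested qualities) are easy to express as a planning step. The sketch below assumes a hypothetical `LADDER` of rendition names and heights, and a caller-supplied set of "rare" renditions; neither comes from a real encoder API.

```python
# Hypothetical quality ladder: (rendition name, frame height in pixels).
LADDER = [
    ("360p", 360),
    ("480p", 480),
    ("720p", 720),
    ("1080p", 1080),
    ("2160p", 2160),
]


def plan_renditions(source_height, rare=frozenset()):
    """Split the ladder into renditions to encode now and ones to defer.

    Renditions taller than the source are dropped entirely.
    """
    eager, on_demand = [], []
    for name, height in LADDER:
        if height > source_height:
            continue  # never upscale beyond the source
        if name in rare:
            on_demand.append(name)  # generate only when first requested
        else:
            eager.append(name)
    return eager, on_demand


eager, lazy = plan_renditions(720, rare={"480p"})
# eager == ["360p", "720p"], lazy == ["480p"]
```

Which renditions count as rare would come from playback analytics. The set is passed in here so that the planning logic stays independent of that data source.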
Key Trade-offs
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| CDN strategy | Third-party CDN (Akamai/CloudFront) | Custom CDN (Open Connect) | Custom CDN at Netflix scale. Netflix delivers ~15% of all downstream internet traffic globally. At that volume, paying per-GB to a third-party CDN costs billions annually. Custom hardware appliances placed inside ISP networks pay for themselves within months. During peak hours, 90%+ of traffic is served from within the ISP network, never crossing the internet backbone. The trade-off: massive upfront capital investment and operational complexity. For a startup or service below ~1% of internet traffic, third-party CDN is correct. The crossover point is roughly 10-50 Tbps of sustained delivery. |
| Video codec | H.264 (AVC) | H.265 (HEVC) / AV1 | AV1 for new content, H.264 as fallback. AV1 achieves 30-50% bitrate savings over H.264 at equivalent quality, which directly reduces CDN bandwidth costs. The trade-off: AV1 encoding is 10-100x slower than H.264 and requires newer hardware for decoding. Netflix uses AV1 for devices that support hardware decode (most 2020+ smart TVs, Chrome, Android) and falls back to H.264/H.265 for older devices. Encode every title in both — storage is cheap compared to bandwidth savings at 10M concurrent viewers. |
| Transcoding approach | Fixed quality ladder | Per-title encoding | Per-title encoding for quality optimization. A fixed bitrate ladder (360p at 500kbps, 720p at 2.5Mbps, etc.) wastes bandwidth: an animated film looks perfect at 40% less bitrate than a live-action explosion scene. Netflix’s per-title encoding analyzes scene complexity and builds a custom quality ladder for each title, saving ~20% bandwidth on average. The trade-off: per-title analysis adds hours of compute per title. Under a tight deadline (48-hour content launch), use simplified two-pass encoding and accept ~10-15% higher bitrate. |
| Content pre-positioning | On-demand (lazy fill) | Predictive (pre-warm) | Predictive fill for popular content, on-demand for long tail. Pre-push new releases and trending titles to all edge locations before publish time, using recommendation engine predictions and marketing data. The long tail (rarely-watched content) stays at the origin and is fetched on-demand with an origin shield layer to prevent thundering herd. The trade-off: predictive fill wastes storage on edge nodes when predictions are wrong, but storage is cheap (~$0.02/GB/month) compared to the latency penalty of a cache miss (200ms+ from origin vs 5ms from edge). |
| Recommendation serving | Real-time ML inference | Pre-computed candidates | Pre-computed candidates with real-time re-ranking. Running a full recommendation model for each of 100M+ DAU at page load is computationally impossible within a 200ms budget. Pre-compute the top 1000 candidates per user offline (nightly or every few hours), cache in Redis/DynamoDB, and apply lightweight re-ranking at read time (suppress recently watched, boost new releases, apply time-of-day context). The trade-off: recommendations are stale by up to a few hours, but the re-ranking layer handles the most important contextual adjustments. Netflix reportedly saves 80%+ of inference compute with this architecture. |
Common Candidate Mistakes
Interview Deep-Dive Questions
You're designing the video encoding pipeline for a Netflix-scale service. A major original series just finished filming and needs to be available globally in 48 hours. Walk me through the pipeline and the trade-offs you'd make to meet that deadline.
- Chunk-based parallel transcoding is the core lever. Split the source file into 5-10 second chunks. Each chunk can be transcoded independently into all target quality levels (360p through 4K) across a fleet of workers. At 10 seconds per chunk, a 2-hour film (7,200 seconds) becomes ~720 chunks (~1,440 at 5 seconds), and if you have enough workers, all qualities for all chunks process concurrently. Netflix reportedly uses thousands of EC2 instances for a single title encode.
- Per-title encoding is where you trade deadline against quality. Netflix’s per-title encoding analyzes scene complexity (animation needs ~40% less bitrate than live-action explosions at equivalent perceptual quality). Full per-title analysis can take hours of compute per quality ladder. Under a 48-hour deadline, you might run a simplified two-pass encode instead of the full optimization pipeline, accepting ~10-15% higher bitrate (more CDN cost) in exchange for faster availability.
- Prioritize quality tiers by audience data. Not all renditions are equal. 1080p and 720p account for 70%+ of streams. Encode those first. 4K can follow. 360p is needed for mobile in emerging markets — queue it alongside 1080p since it is cheap. Skip encoding qualities higher than the source master (a 1080p source should never produce a 4K rendition).
- CDN pre-positioning runs in parallel with later-stage encodes. As soon as the 1080p rendition finishes, start pushing it to regional shields and high-priority edge locations (based on subscriber density and predicted demand from marketing data). Do not wait for all renditions to finish before starting distribution.
- Fault tolerance via checkpointing. Workers process on spot instances to save cost, but spot instances get terminated. Checkpoint each chunk’s progress to S3 so another worker can resume. Without this, a terminated instance wastes its entire chunk. Netflix reportedly saves ~90% on batch encoding compute through spot instances.
- The manifest file is the last artifact. Generate the HLS/DASH manifests only after all segments for a given quality are verified. A premature manifest pointing to missing segments causes playback failures.
- What happens if a chunk transcodes successfully at 720p but fails at 1080p — can you serve a partial quality ladder, and what does the manifest look like?
- Netflix recently invested heavily in AV1 encoding. What is the trade-off between AV1 and H.265 in terms of encode time, decode complexity, and bandwidth savings, and how would that affect your 48-hour deadline?
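The chunking arithmetic and the rendition prioritization from the answer above can be combined into one small planner. The priority tiers below are an assumption that mirrors the text (1080p, 720p and 360p first, 4K later), and `plan_tasks` is a hypothetical helper, not part of any real orchestrator.

```python
import math

# Hypothetical priority tiers: lower numbers are encoded first.
PRIORITY = {"1080p": 0, "720p": 0, "360p": 0, "2160p": 1}
HEIGHT = {"360p": 360, "720p": 720, "1080p": 1080, "2160p": 2160}


def plan_tasks(duration_s, chunk_s, source_height):
    """Return (chunk count, ordered list of (priority, chunk, rendition))."""
    n_chunks = math.ceil(duration_s / chunk_s)
    # Skip renditions above the source master's resolution.
    renditions = [r for r in PRIORITY if HEIGHT[r] <= source_height]
    tasks = [
        (PRIORITY[r], chunk, r)
        for r in renditions
        for chunk in range(n_chunks)
    ]
    tasks.sort()  # high-priority renditions first, then by chunk order
    return n_chunks, tasks


n_chunks, tasks = plan_tasks(duration_s=2 * 3600, chunk_s=10, source_height=2160)
# n_chunks == 720; 4 renditions x 720 chunks == 2880 independent tasks
```

Because every tuple is an independent unit of work, the sorted list can be fed directly into a task queue, and distribution of the finished 1080p segments can begin while the 4K tasks are still pending.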
How does Netflix's CDN selection work when a user presses play, and why did Netflix build Open Connect instead of just using a third-party CDN like Akamai or CloudFront?
- The play-button flow involves a steering service, not just DNS. When a user presses play, the Netflix client contacts a steering service that considers the user’s ISP, geographic location, current server load, and content availability to select the optimal Open Connect Appliance (OCA). This is not simple DNS-based geographic routing — it is an active, real-time decision based on server health and capacity.
- Open Connect Appliances are hardware boxes inside ISP networks. Netflix ships custom FreeBSD-based servers (with 100+ TB of SSD/HDD storage) and places them directly in ISP data centers. During peak hours, 90%+ of Netflix traffic is served from within the ISP’s own network, never crossing the internet backbone. This reduces latency, eliminates backbone congestion, and is free for the ISP (Netflix pays for the hardware and the ISP benefits from reduced transit costs).
- The economic argument is straightforward. Netflix delivers ~15% of all downstream internet traffic globally. At that scale, paying Akamai or CloudFront per-GB would cost billions annually. The capital expenditure on custom hardware pays for itself within months. A single OCA serving 40 Gbps at a busy ISP replaces what would be enormous transit costs.
- Content popularity drives the fill strategy. Each OCA has limited storage. Netflix’s predictive fill algorithm pushes the most-likely-to-be-watched content to each OCA overnight during off-peak hours. An OCA in Tokyo gets different content than one in Sao Paulo. This is driven by the recommendation engine’s predictions and regional content licensing. The long tail — content rarely watched in that region — is fetched on-demand from a regional cache or origin.
- Fallback tiers handle cache misses. If the closest OCA does not have the content, the request falls back to a regional OCA cluster (shield tier), then to the S3 origin. Each tier adds latency: edge ~5ms, shield ~50ms, origin ~200ms+. The goal is >95% cache hit rate at the edge tier.
- How does content licensing affect CDN architecture? If a title is licensed for the US but not the EU, how does the system prevent EU OCAs from caching and serving it?
- What happens during a major global launch (a new season of a massive show) when every OCA worldwide needs the same content simultaneously — how do you avoid origin overload?
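The steering decision and the fallback tiers described in the answer above can be sketched as a single selection function. Everything here is illustrative: the `Server` fields, the `max_load` threshold, and the per-tier latencies (taken from the rough figures in the text) are assumptions, and a real steering service weighs many more signals.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    tier: str               # "edge", "shield", or "origin"
    healthy: bool = True
    load: float = 0.0       # 0.0 idle .. 1.0 saturated
    titles: set = field(default_factory=set)


# Approximate added latency per tier, from the figures quoted in the text.
TIER_LATENCY_MS = {"edge": 5, "shield": 50, "origin": 200}
TIER_ORDER = ["edge", "shield", "origin"]


def steer(title, servers, max_load=0.9):
    """Pick the least-loaded healthy server in the nearest tier holding the title."""
    for tier in TIER_ORDER:
        candidates = [
            s for s in servers
            if s.tier == tier and s.healthy and s.load < max_load
            and (tier == "origin" or title in s.titles)  # origin holds everything
        ]
        if candidates:
            best = min(candidates, key=lambda s: s.load)
            return best.name, TIER_LATENCY_MS[tier]
    raise LookupError(f"no server available for {title!r}")


fleet = [
    Server("edge-1", "edge", load=0.95, titles={"show-a"}),  # too busy
    Server("edge-2", "edge", load=0.40, titles={"show-a"}),
    Server("shield-1", "shield", load=0.20, titles={"show-a", "show-b"}),
    Server("origin-1", "origin"),
]
choice_popular = steer("show-a", fleet)    # served from the edge
choice_regional = steer("show-b", fleet)   # edge miss, served from the shield
choice_longtail = steer("show-z", fleet)   # falls through to origin
```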
Netflix claims their recommendation engine saves them $1 billion per year. How would you architect a recommendation system that can return personalized results in under 200ms for 200 million subscribers?
- The key architectural split is offline training vs. online serving. Training (matrix factorization, deep learning models on viewing history) runs on massive Spark/GPU clusters over hours or days. Serving must return results in under 200ms. You cannot run a full model inference for each page load — the math does not work at 100M+ DAU.
- Pre-compute and cache the heavy lifting. For each user, run the full recommendation model offline (nightly or every few hours) and store the top N (e.g., 1000) candidate titles with scores in a key-value store (Redis or DynamoDB). When the user opens the app, you fetch these pre-computed candidates and apply lightweight re-ranking in real-time based on context (time of day, device, recently watched).
- Online re-ranking is where the 200ms budget is spent. The online layer takes the pre-computed candidate set and applies fast contextual adjustments: suppress titles the user started and abandoned, boost titles matching current viewing session mood, apply business rules (promote Netflix Originals). This re-ranking is a simple model (logistic regression or a small neural net) that runs in single-digit milliseconds.
- Feature store is the connective tissue. Features like “user’s genre affinity vector” or “title popularity score this week” are computed offline and stored in a feature store (Netflix uses their custom system). Both the offline training pipeline and the online serving layer read from the same feature store, ensuring consistency.
- The “rows” on the Netflix homepage are separate recommendations. “Top Picks for You,” “Because You Watched X,” and “Trending Now” are each generated by different algorithms. The page assembly service calls multiple recommendation endpoints in parallel, each with its own latency budget (~50ms each), and assembles the page. A slow endpoint gets a timeout and a fallback (e.g., show globally popular titles).
- How do you handle the cold-start problem for a brand-new user who has no watch history — what do you recommend and what signals do you use?
- Netflix personalizes even the thumbnail artwork for each title per user. How would you architect a system that selects the optimal thumbnail from a set of candidates in real-time?
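The split between pre-computed candidates and online re-ranking described in the answer above might look like the following. This is a deliberately tiny rule-based sketch: the candidate scores, the `new_release_boost` value, and the `rerank` function are all hypothetical, and the text notes that a real re-ranker is usually a small learned model rather than fixed rules.

```python
def rerank(candidates, recently_watched, new_releases, top_k=3,
           new_release_boost=0.1):
    """Apply cheap contextual adjustments to offline candidate scores.

    candidates: dict of title -> score computed offline and fetched from cache.
    """
    adjusted = []
    for title, score in candidates.items():
        if title in recently_watched:
            continue  # suppress titles the user just watched
        if title in new_releases:
            score += new_release_boost  # business rule: promote new releases
        adjusted.append((score, title))
    # Highest score first; title as a deterministic tie-breaker.
    adjusted.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in adjusted[:top_k]]


precomputed = {"A": 0.90, "B": 0.85, "C": 0.80, "D": 0.78}
row = rerank(precomputed, recently_watched={"A"}, new_releases={"D"})
# row == ["D", "B", "C"]
```

The expensive model never runs on the request path. The online step only filters and nudges a list that already exists, which is what keeps it within a latency budget of a few milliseconds.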
A user is streaming a movie and their bandwidth drops from 10 Mbps to 2 Mbps mid-stream. Walk me through exactly what happens in the adaptive bitrate system, including the client-side algorithm and the server-side segment structure.
- The client player maintains a playback buffer (typically 30-60 seconds ahead). When bandwidth drops, the buffer starts draining faster than it fills. The ABR algorithm detects this by measuring the ratio of segment download time to segment duration. If a 10-second segment at 1080p (5 Mbps) takes 25 seconds to download, the estimated bandwidth is 2 Mbps, and the algorithm must switch down.
- Quality switching happens at segment boundaries, not mid-segment. The player finishes downloading the current segment at the old quality. The next segment request targets a lower quality tier (e.g., 720p at 2.5 Mbps, or 480p at 1 Mbps). The video file segments at each quality level are independently decodable — this is the entire point of the HLS/DASH segment structure.
- Buffer-based algorithms (like Netflix’s) outperform throughput-based ones. Simple throughput-based ABR (measure last download speed, pick matching quality) oscillates badly because bandwidth is noisy. Netflix developed a buffer-based algorithm: if the buffer is above 30 seconds, the player can be aggressive (pick higher quality); if the buffer drops below 10 seconds, it must be conservative (pick lower quality). This avoids oscillation and reduces rebuffering by ~20%.
- On the server/CDN side, each quality tier has its own set of segments. The master manifest (`manifest.m3u8`) lists all available quality tiers. Each tier has its own playlist pointing to its segments. The CDN serves whichever segment the client requests. There is no server-side decision-making about quality: the client drives all quality selection. This is critical for CDN cacheability: every client requesting 720p/segment_042 gets the same cached file.
- The user experience during the switch: There is a brief visual quality drop (the image becomes slightly softer or more compressed). Well-designed players blend the transition so it is not jarring. Netflix also uses per-shot encoding to ensure each segment is encoded optimally for its content complexity, reducing visible quality differences between tiers.
- What is the trade-off between shorter segments (2 seconds) and longer segments (10 seconds) in terms of encoding efficiency, quality switching responsiveness, and CDN load?
- How would you implement adaptive bitrate for a live stream where you cannot pre-encode segments — what changes in the pipeline?
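The client-side logic from the answer above can be made concrete with a simplified sketch. It combines the throughput estimate from the worked example with the buffer thresholds (30 s and 10 s) mentioned in the text. The bitrate ladder and the safety factors are assumptions chosen for illustration, and this is not Netflix's actual algorithm.

```python
# Hypothetical ladder: (rendition, bitrate in Mbps), lowest first.
LADDER = [("360p", 0.5), ("480p", 1.0), ("720p", 2.5), ("1080p", 5.0)]


def estimate_bandwidth_mbps(segment_s, bitrate_mbps, download_s):
    """Throughput implied by how long the last segment took to download."""
    return segment_s * bitrate_mbps / download_s


def pick_quality(buffer_s, bandwidth_mbps):
    """Choose the next segment's rendition from buffer level and throughput."""
    if buffer_s >= 30:
        safety = 1.0   # healthy buffer: spend the full estimate
    elif buffer_s >= 10:
        safety = 0.8   # middle zone: leave some headroom
    else:
        safety = 0.5   # nearly empty: be conservative to avoid rebuffering
    budget = bandwidth_mbps * safety
    affordable = [name for name, rate in LADDER if rate <= budget]
    return affordable[-1] if affordable else LADDER[0][0]


# The example from the text: a 10 s segment at 5 Mbps took 25 s to arrive.
bandwidth = estimate_bandwidth_mbps(segment_s=10, bitrate_mbps=5.0, download_s=25)
next_quality = pick_quality(buffer_s=8, bandwidth_mbps=bandwidth)
# bandwidth == 2.0 Mbps; with a nearly empty buffer the player drops to 480p
```

The decision applies to the next segment request only, so the switch lands on a segment boundary as the text describes.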
Netflix operates in 190+ countries, but content licensing means different regions have different content catalogs. How does this constraint affect the system architecture from content storage through to CDN and the client experience?
- Content licensing creates per-region catalogs, which fragments the content graph. A title available in the US may not exist in France due to a pre-existing exclusive deal with a French broadcaster. The metadata service must maintain region-aware catalogs: when a user in France browses, they see a different title list than a user in the US. This is a filter applied at the API layer, not at the CDN layer.
- CDN content must respect geographic restrictions. An Open Connect Appliance in France must not cache or serve US-only content. The fill algorithm (which decides what content to push to each OCA) is region-aware. If a title is not licensed in a region, it is excluded from the fill manifest for OCAs in that region. If a user uses a VPN to appear in the US, the steering service may route them to a US OCA, but Netflix’s VPN detection system actively blocks this.
- The recommendation engine must be license-aware. Recommending a title to a user that the user cannot watch (because it is not licensed in their country) is worse than not recommending it. The candidate generation step must filter by the user’s region. This means the pre-computed recommendation cache is per-region-per-user, not global-per-user — multiplying storage and compute costs.
- Licensing windows are temporal, not just geographic. A title might be available in Germany from January to June, then leave (because the license expired or was sold to another platform). The system must handle time-based activation and deactivation: content metadata includes `available_from` and `available_until` timestamps per region. CDN fill algorithms must also purge content from OCAs when the license expires.
- Measurement and compliance are required. Content owners want proof that their content was only served in licensed territories. Netflix must maintain detailed access logs showing which regions accessed which titles. This is both a technical requirement (logging at the CDN edge) and a legal obligation (audit-ready reports).
- How would you handle a scenario where a user starts watching a title in the US (licensed), then flies to a country where it is not licensed and tries to resume on the plane’s Wi-Fi?
- What architectural changes would you make to support “download for offline viewing” given licensing constraints — how do you enforce license expiration on a device that may be offline?
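The per-region licensing window check described in the answer above reduces to a small predicate. The sketch below assumes a hypothetical metadata shape (a dict of region code to an `(available_from, available_until)` pair) and treats the end of the window as exclusive; the actual schema is not given in the text.

```python
from datetime import datetime, timezone


def is_available(windows, region, now):
    """Return True if the title is licensed in `region` at time `now`.

    windows: dict of region code -> (available_from, available_until).
    """
    window = windows.get(region)
    if window is None:
        return False  # never licensed in this region
    available_from, available_until = window
    return available_from <= now < available_until


# A title licensed in Germany from January through June 2024 (illustrative dates).
title_windows = {
    "DE": (
        datetime(2024, 1, 1, tzinfo=timezone.utc),
        datetime(2024, 7, 1, tzinfo=timezone.utc),
    ),
}
in_march = is_available(title_windows, "DE", datetime(2024, 3, 15, tzinfo=timezone.utc))
in_august = is_available(title_windows, "DE", datetime(2024, 8, 1, tzinfo=timezone.utc))
in_france = is_available(title_windows, "FR", datetime(2024, 3, 15, tzinfo=timezone.utc))
```

The same predicate can serve the catalog API, the recommendation candidate filter, and the CDN fill and purge jobs, so all three agree on what is licensed at any moment.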
Your service processes 500+ hours of video uploads per minute (YouTube-scale). How would you design the video processing pipeline to handle failures gracefully, for example when a transcoding worker crashes at chunk 247 of a 720-chunk video?
- The pipeline is modeled as a DAG of idempotent tasks, not a monolithic job. Each chunk x quality combination is an independent task (e.g., “transcode chunk 247 at 1080p”). A workflow orchestrator (like Netflix’s Conductor, or Apache Airflow) tracks task states. If a worker crashes on chunk 247, only that task is retried — not the entire video.
- Checkpointing to object storage is the resilience mechanism. Each completed chunk is written to S3 immediately. The orchestrator records “chunk 247 at 1080p: complete, stored at s3://bucket/video123/1080p/chunk_247.ts”. When the final assembly step runs, it reads the manifest of completed chunks. Missing chunks are re-queued.
- Workers are stateless and pull from a task queue. A worker picks up a task from SQS or Kafka, downloads the source chunk from S3, transcodes it, uploads the output to S3, and acknowledges the task. If the worker dies mid-processing, the task visibility timeout expires and another worker picks it up. This is the standard “competing consumers” pattern.
- Idempotency is enforced by task ID. Each task has a deterministic ID (e.g., `video123-1080p-chunk247`). If a worker completes the task but crashes before acknowledging it, the task is picked up again. The new worker writes the output to the same S3 path, overwriting the previous output (which is identical). No duplication, no corruption.
- Poison pill detection prevents infinite retry loops. If a specific chunk consistently fails (corrupted source data, codec bug), the orchestrator marks it as failed after N retries (typically 3). The entire video is flagged for manual review. Without this, a single bad chunk could consume worker capacity indefinitely.
- The merge step validates completeness before publishing. Before generating the final HLS/DASH manifest, verify that every chunk for every quality tier exists and passes integrity checks (file size, duration, codec metadata). A missing or corrupt chunk halts publishing and triggers a targeted re-encode.
- How would you prioritize retries for a high-profile launch (a major Netflix Original premiering tonight) vs. a catalog backfill operation — do they share the same worker pool?
- What monitoring and alerting would you put in place to detect a systemic issue (e.g., a codec bug causing all 4K encodes to produce corrupt output) before it affects user-visible content?
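The deterministic task ID, idempotent write, and poison-pill cutoff from the answer above fit together as shown in this sketch. The `store` dict stands in for object storage, `attempts` for the orchestrator's retry bookkeeping, and `transcode` is any caller-supplied function. All three are hypothetical placeholders rather than a real queue or S3 client.

```python
MAX_RETRIES = 3  # the text suggests 3 as a typical limit


def task_id(video_id, quality, chunk):
    """Deterministic ID: the same inputs always map to the same output key."""
    return f"{video_id}-{quality}-chunk{chunk}"


def run_task(video_id, quality, chunk, transcode, store, attempts):
    """Run one chunk x quality task; return 'complete' or 'failed'."""
    tid = task_id(video_id, quality, chunk)
    if tid in store:
        return "complete"  # already done; a redelivered task is a no-op
    while attempts.get(tid, 0) < MAX_RETRIES:
        attempts[tid] = attempts.get(tid, 0) + 1
        try:
            # Writing to the same key makes a repeated run overwrite, not duplicate.
            store[tid] = transcode(video_id, quality, chunk)
            return "complete"
        except RuntimeError:
            continue  # transient failure: retry
    return "failed"  # poison pill: flag the video for manual review


calls = {"n": 0}

def flaky(video_id, quality, chunk):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("worker crashed")
    return b"encoded-bytes"

def corrupt(video_id, quality, chunk):
    raise RuntimeError("corrupt source chunk")

store, attempts = {}, {}
status_flaky = run_task("video123", "1080p", 247, flaky, store, attempts)
status_poison = run_task("video123", "1080p", 248, corrupt, store, attempts)
```

Only the failing task is retried, and the retry count is keyed by task ID, so a crash on chunk 247 never forces the other 719 chunks to be redone.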
The recommendation system is pushing users toward niche, long-tail content (highly personalized). But your CDN team is complaining that cache hit rates have dropped from 95% to 78%. How do you resolve this tension?
- This is a real tension Netflix has documented publicly. The recommendation system’s job is to match users with content they will enjoy, which often means niche titles. But the CDN’s efficiency depends on many users requesting the same content (cache hit = content already on the edge server). These goals directly conflict.
- Quantify the impact before reacting. A drop from 95% to 78% cache hit rate means 22% of requests now hit the shield or origin tier, up from 5%. At 50 Tbps peak bandwidth, that is ~11 Tbps of cross-network traffic, roughly 8.5 Tbps more than before. This has a real dollar cost (bandwidth, origin server load) that can be estimated and weighed against the engagement lift from better personalized recommendations.
- Solution 1: Recommendation-aware CDN pre-fill. Feed the recommendation engine’s predictions into the CDN fill algorithm. If the recommendation system predicts that 5,000 users in the Tokyo region will be recommended a specific niche documentary tonight, pre-fill Tokyo OCAs with that title. This turns predicted demand into cache hits. The cost is additional storage on edge servers.
- Solution 2: Tiered caching with recommendation influence. Adjust the CDN cache eviction policy to consider “recommendation score” alongside “access recency.” A title that is actively being recommended to many users in a region gets a higher cache priority than one that was accessed once. This requires a feedback loop from the recommendation service to the CDN control plane.
- Solution 3: Soft constraints in the recommendation algorithm. Add a “CDN friendliness” signal as a small negative weight for titles not currently cached at the user’s nearest edge. This slightly biases recommendations toward cached content without fundamentally compromising personalization. The weight should be small enough that a highly relevant niche title still wins, but it tips the balance for near-equal candidates.
- The organizational dimension matters. This requires the recommendation team and the infrastructure team to share metrics and jointly optimize. In most organizations, these teams have separate OKRs. A staff engineer’s job is to identify this misalignment and propose a shared metric (e.g., “engagement per CDN dollar”).
- How would you design an A/B test to measure the impact of the “CDN friendliness” recommendation signal on both user engagement and infrastructure cost?
- If recommendation-aware pre-fill causes OCA storage to exceed capacity, how would you decide which content to evict — what eviction policy balances recency, popularity, and predicted future demand?
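Solution 3 above (a small "CDN friendliness" penalty) can be illustrated with a scoring function. The relevance scores and the `penalty` of 0.02 are made-up values chosen to show the intended behavior: a strongly relevant niche title keeps its rank, while near-equal candidates tip toward the cached one.

```python
def rank_with_cdn_signal(relevance, cached_at_edge, penalty=0.02):
    """Order titles by relevance minus a small penalty for edge cache misses."""
    def score(title):
        miss_penalty = 0.0 if title in cached_at_edge else penalty
        return relevance[title] - miss_penalty
    # Highest adjusted score first; title as a deterministic tie-breaker.
    return sorted(relevance, key=lambda t: (-score(t), t))


relevance = {
    "niche_doc": 0.95,          # highly relevant, not cached
    "hit_show": 0.80,           # cached
    "near_tie_uncached": 0.71,  # marginally more relevant, not cached
    "near_tie_cached": 0.70,    # marginally less relevant, cached
}
cached = {"hit_show", "near_tie_cached"}

with_signal = rank_with_cdn_signal(relevance, cached)
without_signal = rank_with_cdn_signal(relevance, cached, penalty=0.0)
```

With the signal on, only the two near-tied titles swap places and the niche documentary still ranks first. Keeping the penalty that small is what protects personalization, and its size is the natural parameter to vary in an A/B test.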
A user reports that video playback on their smart TV starts fine but quality degrades to 360p after 2 minutes and never recovers, even though their internet speed test shows 50 Mbps. How do you debug this?
- Start with the client-side ABR telemetry. Netflix clients report detailed playback metrics: buffer level over time, estimated bandwidth per segment, quality tier switches, and rebuffer events. Pull this user’s session data. If the client-side bandwidth estimate shows 50 Mbps but the ABR algorithm still selects 360p, the issue is in the ABR logic or buffer state, not the network.
- Check for a specific CDN edge issue. Identify which OCA served this session. If that OCA’s disk is degraded (slow reads), segment download times will be high even though the user’s ISP link is fast. The ABR algorithm measures segment download throughput, not the user’s raw link speed. A slow CDN server looks identical to a slow network from the client’s perspective. Compare this user’s experience against other users served by the same OCA.
- Investigate DRM license latency. If the DRM license server is slow to respond, the player may pause or buffer while waiting for decryption keys, which the ABR algorithm interprets as network congestion. Check the license acquisition latency for this session.
- Check for ISP-level throttling. Some ISPs throttle specific traffic patterns (e.g., sustained high-bandwidth video streams) after an initial burst period. The speed test uses a short burst that does not trigger throttling. This explains “speed test shows 50 Mbps but streaming is slow.” Netflix’s ISP Speed Index tracks this per-ISP. Verify whether other users on the same ISP show similar patterns.
- Smart TV firmware and app version matter. Older smart TV apps may have buggy ABR implementations that fail to recover from a quality downgrade. Check whether this is a known issue for this device model and app version. Netflix maintains device-specific quality caps and workarounds.
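- Thermal throttling on the device. Some smart TVs throttle their network chipset when the device overheats (after 2+ hours of streaming). This is a hardware limitation that causes real throughput drops that a network speed test (run for 30 seconds) would never catch. It is a poor fit for a drop that appears 2 minutes into playback, so rule it out last unless the session telemetry shows the TV had already been streaming for hours.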
- How would you build a monitoring dashboard that proactively detects the class of issue described above (CDN edge degradation causing widespread quality drops) before users report it?
- Netflix serves hundreds of device types (smart TVs, phones, tablets, game consoles). How do you handle device-specific ABR tuning at scale?
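The "compare this user against other users on the same OCA" step in the answer above generalizes into a simple fleet-wide check. The sketch below assumes client telemetry is available as `(server name, measured throughput in Mbps)` pairs and flags servers whose median falls below half the fleet median. The data shape, the threshold, and the sample numbers are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import median


def find_degraded_servers(sessions, ratio=0.5):
    """Flag servers whose median client throughput is far below the fleet's.

    sessions: list of (server_name, throughput_mbps) from client telemetry.
    """
    by_server = defaultdict(list)
    for server, mbps in sessions:
        by_server[server].append(mbps)
    fleet_median = median(mbps for _, mbps in sessions)
    return sorted(
        server for server, values in by_server.items()
        if median(values) < ratio * fleet_median
    )


telemetry = [
    ("oca-a", 40), ("oca-a", 45), ("oca-a", 50),
    ("oca-b", 42), ("oca-b", 48), ("oca-b", 44),
    ("oca-c", 3), ("oca-c", 4), ("oca-c", 2),   # e.g. a degraded disk
]
degraded = find_degraded_servers(telemetry)
# degraded == ["oca-c"]
```

Medians are used rather than means so that a few sessions on poor home networks do not mask or mimic a server-side problem. The same grouping keyed by ISP or device model would separate the other hypotheses in the list.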