Problem Statement

Design a music streaming service like Spotify that handles:
  • Music catalog with millions of songs
  • Real-time audio streaming
  • Personalized playlists and recommendations
  • Search and discovery
  • Offline playback
  • Social features (following, sharing)

Requirements

Functional Requirements

Core Features:
├── Music Playback
│   ├── Stream songs in real-time
│   ├── Queue management
│   ├── Shuffle and repeat
│   ├── Offline downloads
│   └── Cross-device sync

├── Content Discovery
│   ├── Search (songs, artists, albums, playlists)
│   ├── Browse by genre/mood
│   ├── Personalized recommendations
│   └── Radio stations

├── Library Management
│   ├── Save songs/albums/playlists
│   ├── Create playlists
│   ├── Recently played
│   └── Listening history

└── Social Features
    ├── Follow artists/users
    ├── Collaborative playlists
    └── Share music

Non-Functional Requirements

Scale:
├── 400M monthly active users
├── 180M premium subscribers
├── 80M songs in catalog
├── 4B playlists
└── 1M concurrent streams

Performance:
├── Playback start < 200ms
├── Search results < 100ms
├── No buffering during playback
└── Seamless song transitions

Availability:
├── 99.99% uptime for streaming
├── Graceful degradation
└── Offline playback support
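
To make the 99.99% target concrete, the sketch below converts an availability percentage into a downtime budget. The function name `downtime_budget_minutes` is illustrative, and a 365-day year is assumed.

```python
def downtime_budget_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of allowed downtime over `days` at the given availability."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * days * 24 * 60


yearly = downtime_budget_minutes(99.99)            # about 52.6 minutes per year
monthly = downtime_budget_minutes(99.99, days=30)  # about 4.3 minutes per 30 days
```

At 99.99%, the streaming path can be unavailable for less than an hour per year, which is why graceful degradation and offline playback appear alongside the uptime target.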

Capacity Estimation

Storage:
├── Songs: 80M × 5MB (avg) = 400TB audio files
├── Multiple quality levels: 400TB × 4 = 1.6PB
├── Metadata: 80M × 10KB = 800GB
├── User data: 400M × 50KB = 20TB
└── Playlists: 4B × 1KB = 4TB

Bandwidth:
├── Concurrent streams: 1M
├── Bitrate: 160kbps (average)
├── Bandwidth: 1M × 160kbps = 160Gbps

├── Daily plays: 400M users × (2 hours × 60 min ÷ 15) = 400M × 8 plays
├── = 3.2B song plays per day
└── ≈ 37,000 song plays per second (3.2B ÷ 86,400 s)
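
The estimates above can be re-derived with a few lines of arithmetic. This is a minimal sketch that uses decimal units (1 TB = 10^12 bytes); the constant names are illustrative and the input values are the ones quoted in this section.

```python
# Inputs taken from the estimates above (decimal units: 1 TB = 10**12 bytes).
SONGS = 80_000_000
AVG_SONG_BYTES = 5 * 10**6        # 5 MB per song at one quality level
QUALITY_LEVELS = 4
CONCURRENT_STREAMS = 1_000_000
AVG_BITRATE_BPS = 160_000         # 160 kbps
PLAYS_PER_DAY = 3_200_000_000

audio_tb = SONGS * AVG_SONG_BYTES / 10**12            # one quality level
all_qualities_pb = audio_tb * QUALITY_LEVELS / 1000
egress_gbps = CONCURRENT_STREAMS * AVG_BITRATE_BPS / 10**9
plays_per_second = PLAYS_PER_DAY / 86_400

print(f"audio: {audio_tb:.0f} TB, all qualities: {all_qualities_pb:.1f} PB")
print(f"egress: {egress_gbps:.0f} Gbps, plays/s: {plays_per_second:,.0f}")
```

The exact play rate comes out at about 37,037 per second, which the estimate rounds to 37,000.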

High-Level Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Spotify Architecture                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────────────────────────────┐  │
│  │              │    │            API Gateway               │  │
│  │   Client     │───►│     + Authentication + Rate Limit    │  │
│  │ (Mobile/Web) │    └─────────────────┬────────────────────┘  │
│  │              │                      │                        │
│  └──────┬───────┘                      │                        │
│         │                              │                        │
│         │ Audio Stream                 │ API Calls              │
│         │                              │                        │
│         ▼                              ▼                        │
│  ┌──────────────┐    ┌──────────────────────────────────────┐  │
│  │     CDN      │    │           Microservices              │  │
│  │  (Audio      │    │  ┌─────────┐ ┌─────────┐ ┌────────┐  │  │
│  │   Delivery)  │    │  │Playback │ │ Search  │ │Playlist│  │  │
│  └──────────────┘    │  │ Service │ │ Service │ │Service │  │  │
│                      │  └────┬────┘ └────┬────┘ └───┬────┘  │  │
│                      │       │           │          │        │  │
│                      │  ┌────▼───────────▼──────────▼────┐   │  │
│                      │  │        Message Queue           │   │  │
│                      │  │          (Kafka)               │   │  │
│                      │  └────────────────┬───────────────┘   │  │
│                      └───────────────────┼───────────────────┘  │
│                                          │                       │
│  ┌───────────────────────────────────────┼───────────────────┐  │
│  │              Data Layer               │                   │  │
│  │  ┌───────────┐  ┌──────────┐  ┌───────▼──────┐            │  │
│  │  │   Song    │  │  User    │  │Recommendation│            │  │
│  │  │  Catalog  │  │  Data    │  │   Engine     │            │  │
│  │  │(Cassandra)│  │(Postgres)│  │   (ML)       │            │  │
│  │  └───────────┘  └──────────┘  └──────────────┘            │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Core Components

1. Audio Ingestion Pipeline

┌─────────────────────────────────────────────────────────────┐
│                 Audio Ingestion Pipeline                     │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌────────────┐                                             │
│  │  Original  │  FLAC/WAV (lossless)                        │
│  │   Master   │                                             │
│  └─────┬──────┘                                             │
│        │                                                     │
│        ▼                                                     │
│  ┌────────────────────────────────────────────────────────┐ │
│  │               Transcoding Pipeline                      │ │
│  ├────────────────────────────────────────────────────────┤ │
│  │                                                         │ │
│  │  Input → Normalize → Encode → Segment → Store          │ │
│  │                                                         │ │
│  │  Output formats:                                        │ │
│  │  ├── OGG Vorbis 96kbps  (mobile, low quality)          │ │
│  │  ├── OGG Vorbis 160kbps (mobile, normal)               │ │
│  │  ├── OGG Vorbis 320kbps (desktop, high)                │ │
│  │  └── AAC 256kbps (web player)                          │ │
│  │                                                         │ │
│  └────────────────────────────────────────────────────────┘ │
│        │                                                     │
│        ▼                                                     │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                    Storage                              │ │
│  │  ├── Audio files → Object Storage (S3)                 │ │
│  │  ├── Metadata → Cassandra                              │ │
│  │  ├── Waveforms → Redis (for visualization)             │ │
│  │  └── Lyrics → Elasticsearch                            │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
└─────────────────────────────────────────────────────────────┘
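
The Segment and Store steps can be sketched as a function that splits a track into fixed-size pieces and assigns each piece an object-storage key. This is an illustration only: `SegmentPlan` and `plan_segments` are hypothetical names, 10-second segments are assumed, and the key layout is an assumed convention rather than a confirmed one.

```python
from dataclasses import dataclass
from typing import List

SEGMENT_DURATION_MS = 10_000  # assumed 10-second segments


@dataclass
class SegmentPlan:
    index: int
    start_ms: int
    duration_ms: int
    object_key: str


def plan_segments(song_id: str, quality: str, duration_ms: int) -> List[SegmentPlan]:
    """Split a track into fixed-size segments; the last one may be shorter."""
    plans: List[SegmentPlan] = []
    start = 0
    index = 0
    while start < duration_ms:
        length = min(SEGMENT_DURATION_MS, duration_ms - start)
        key = f"audio/{song_id}/{quality}/{index}.ogg"  # hypothetical key layout
        plans.append(SegmentPlan(index, start, length, key))
        start += length
        index += 1
    return plans


# A 25-second track yields two full segments and one 5-second remainder.
plans = plan_segments("song-123", "160", duration_ms=25_000)
```

Recording the real duration of the final segment lets a player schedule the next track without a gap.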

2. Audio Streaming Service

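import uuid
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AudioQuality(Enum):
    # Values are bitrates in kbps and double as the CDN path component
    LOW = "96"
    NORMAL = "160"
    HIGH = "320"
    # Shares HIGH's value, so Enum treats it as an alias of HIGH. 320kbps
    # Vorbis is lossy; a lossless tier would need its own encoding and value.
    VERY_HIGH = "320"


@dataclass
class AudioSegment:
    song_id: str
    segment_index: int
    start_time_ms: int
    duration_ms: int
    url: str


@dataclass
class PlaybackSession:
    session_id: str
    user_id: str
    device_id: str
    song_id: str
    quality: AudioQuality
    position_ms: int
    is_playing: bool


class SessionNotFoundError(Exception):
    """Raised when a session ID is not found in the session store."""


def generate_session_id() -> str:
    """Return a random, unique session ID."""
    return uuid.uuid4().hex

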
class AudioStreamingService:
    """
    Handles audio streaming with adaptive bitrate.
    """
    
    SEGMENT_DURATION_MS = 10_000  # 10 seconds per segment
    PREFETCH_SEGMENTS = 3
    
    def __init__(
        self, 
        audio_storage,
        cdn_service,
        session_store,
        analytics
    ):
        self.audio_storage = audio_storage
        self.cdn = cdn_service
        self.sessions = session_store
        self.analytics = analytics
    
    async def start_playback(
        self,
        user_id: str,
        device_id: str,
        song_id: str,
        quality: AudioQuality = AudioQuality.NORMAL,
        start_position_ms: int = 0
    ) -> dict:
        """
        Initialize a new playback session.
        Returns the session, song metadata, initial segments, and segment count.
        """
        # Create session
        session = PlaybackSession(
            session_id=generate_session_id(),
            user_id=user_id,
            device_id=device_id,
            song_id=song_id,
            quality=quality,
            position_ms=start_position_ms,
            is_playing=True
        )
        
        # Store session
        await self.sessions.save(session)
        
        # Get song metadata
        song = await self.audio_storage.get_song_metadata(song_id)
        
        # Generate CDN URLs for initial segments
        start_segment = start_position_ms // self.SEGMENT_DURATION_MS
        segments = await self._get_segment_urls(
            song_id,
            quality,
            start_segment,
            self.PREFETCH_SEGMENTS
        )
        
        # Track playback start
        await self.analytics.track_play_start(
            user_id=user_id,
            song_id=song_id,
            device_id=device_id,
            quality=quality.value
        )
        
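        # Ceiling division: an exact multiple must not add an empty segment
        total_segments = -(-song["duration_ms"] // self.SEGMENT_DURATION_MS)
        
        return {
            "session": session,
            "song": song,
            "segments": segments,
            "total_segments": total_segments
        }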
    
    async def get_next_segments(
        self,
        session_id: str,
        current_segment: int
    ) -> List[AudioSegment]:
        """
        Get next segments for continuous playback.
        Called by client as they approach end of buffered content.
        """
        session = await self.sessions.get(session_id)
        if not session:
            raise SessionNotFoundError(session_id)
        
        return await self._get_segment_urls(
            session.song_id,
            session.quality,
            current_segment + 1,
            self.PREFETCH_SEGMENTS
        )
    
    async def _get_segment_urls(
        self,
        song_id: str,
        quality: AudioQuality,
        start_segment: int,
        count: int
    ) -> List[AudioSegment]:
        """
        Generate signed CDN URLs for audio segments.
        """
        segments = []
        
        for i in range(count):
            segment_index = start_segment + i
            
            # Generate signed URL (expires in 1 hour)
            path = f"audio/{song_id}/{quality.value}/{segment_index}.ogg"
            signed_url = await self.cdn.get_signed_url(
                path,
                expires_in=3600
            )
            
            segments.append(AudioSegment(
                song_id=song_id,
                segment_index=segment_index,
                start_time_ms=segment_index * self.SEGMENT_DURATION_MS,
                duration_ms=self.SEGMENT_DURATION_MS,
                url=signed_url
            ))
        
        return segments
    
    async def update_position(
        self,
        session_id: str,
        position_ms: int
    ):
        """
        Update playback position for resume and analytics.
        """
        session = await self.sessions.get(session_id)
        if session:
            session.position_ms = position_ms
            await self.sessions.save(session)
    
    async def handle_quality_switch(
        self,
        session_id: str,
        new_quality: AudioQuality,
        current_segment: int
    ) -> List[AudioSegment]:
        """
        Handle adaptive bitrate switch.
        """
        session = await self.sessions.get(session_id)
        if not session:
            raise SessionNotFoundError(session_id)
        
        session.quality = new_quality
        await self.sessions.save(session)
        
        # Return new quality segments
        return await self._get_segment_urls(
            session.song_id,
            new_quality,
            current_segment,
            self.PREFETCH_SEGMENTS
        )
    
    async def sync_playback_state(
        self,
        user_id: str,
        device_id: str
    ) -> Optional[PlaybackSession]:
        """
        Get current playback state for cross-device sync.
        """
        # Find active session for user
        active_session = await self.sessions.get_active_for_user(user_id)
        
        if active_session and active_session.device_id != device_id:
            # Transfer playback to this device
            return await self.transfer_playback(
                active_session.session_id,
                device_id
            )
        
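        return active_session
    
    async def transfer_playback(
        self,
        session_id: str,
        device_id: str
    ) -> Optional[PlaybackSession]:
        """
        Move an active session to another device, keeping its position.
        Minimal sketch: only the device assignment changes.
        """
        session = await self.sessions.get(session_id)
        if session:
            session.device_id = device_id
            await self.sessions.save(session)
        return session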
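
The service exposes `handle_quality_switch`, but the decision of when to switch is left to the client. One common approach, sketched here under assumptions, is to measure the throughput of each segment download and pick the highest bitrate that fits with some headroom. The names `measure_throughput_kbps` and `choose_bitrate_kbps` and the 0.8 safety factor are illustrative choices, not part of the design above.

```python
from typing import Sequence

BITRATES_KBPS: Sequence[int] = (96, 160, 320)  # tiers used in this document
SAFETY_FACTOR = 0.8  # assumed headroom so playback survives throughput dips


def measure_throughput_kbps(bytes_received: int, elapsed_s: float) -> float:
    """Throughput of the last segment download in kilobits per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return bytes_received * 8 / 1000 / elapsed_s


def choose_bitrate_kbps(throughput_kbps: float, max_allowed_kbps: int = 320) -> int:
    """Pick the highest tier that fits within the throughput budget."""
    budget = throughput_kbps * SAFETY_FACTOR
    candidates = [
        b for b in BITRATES_KBPS
        if b <= budget and b <= max_allowed_kbps
    ]
    # Fall back to the lowest tier rather than stalling playback.
    return max(candidates) if candidates else min(BITRATES_KBPS)
```

A client would map the chosen bitrate back to an `AudioQuality` member and call `handle_quality_switch` only when the tier actually changes, which avoids oscillating between qualities on every segment.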

3. Recommendation Engine

┌─────────────────────────────────────────────────────────────┐
│              Recommendation Architecture                     │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Data Sources:                                              │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ • Listening history (plays, skips, duration)          │ │
│  │ • Saved library (likes, playlists)                    │ │
│  │ • Audio features (tempo, energy, danceability)        │ │
│  │ • Social signals (follows, collaborative playlists)   │ │
│  │ • Context (time, device, location)                    │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  Recommendation Types:                                      │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                                                         │ │
│  │  1. Collaborative Filtering                            │ │
│  │     "Users like you also listened to..."               │ │
│  │                                                         │ │
│  │  2. Content-Based                                      │ │
│  │     "Because you listened to [song]..."               │ │
│  │                                                         │ │
│  │  3. Audio Analysis                                     │ │
│  │     Similar tempo, key, energy level                   │ │
│  │                                                         │ │
│  │  4. Context-Aware                                      │ │
│  │     Morning workout → upbeat songs                     │ │
│  │     Late night → chill music                           │ │
│  │                                                         │ │
│  │  5. Exploration vs Exploitation                        │ │
│  │     80% familiar, 20% discovery                        │ │
│  │                                                         │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  Daily Mix Generation:                                      │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                                                         │ │
│  │  1. Cluster user's listening into ~6 taste groups      │ │
│  │  2. For each cluster:                                  │ │
│  │     a. Select seed tracks (recently played)            │ │
│  │     b. Find similar songs                              │ │
│  │     c. Add discovery songs (20%)                       │ │
│  │     d. Order for flow (tempo, energy arc)              │ │
│  │  3. Regenerate daily                                   │ │
│  │                                                         │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
└─────────────────────────────────────────────────────────────┘
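
The 80% familiar / 20% discovery split can be illustrated with a small blending helper. This is a sketch under assumptions: both input lists are taken to be already ranked and disjoint, and `blend_familiar_and_discovery` is a hypothetical name.

```python
import random
from typing import List, Optional


def blend_familiar_and_discovery(
    familiar: List[str],
    discovery: List[str],
    total: int,
    discovery_ratio: float = 0.2,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return up to `total` track IDs with roughly `discovery_ratio` new tracks."""
    rng = rng or random.Random()
    n_discovery = min(len(discovery), round(total * discovery_ratio))
    n_familiar = min(len(familiar), total - n_discovery)
    picked = familiar[:n_familiar] + discovery[:n_discovery]
    rng.shuffle(picked)  # spread discovery tracks through the list
    return picked
```

Passing a seeded `random.Random` makes the blend reproducible, which helps when comparing ratios in an A/B test.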

from dataclasses import dataclass
from typing import List, Dict, Tuple
import numpy as np
from sklearn.cluster import KMeans

@dataclass
class Song:
    id: str
    name: str
    artist_id: str
    audio_features: Dict[str, float]  # tempo, energy, danceability, etc.

@dataclass
class UserTaste:
    cluster_id: int
    seed_tracks: List[str]
    feature_centroid: Dict[str, float]


class RecommendationEngine:
    """
    Multi-strategy recommendation engine.
    """
    
    def __init__(
        self,
        song_vectors,      # Pre-computed song embeddings
        user_vectors,      # Pre-computed user embeddings
        audio_features,    # Song audio features
        listening_history, # User listening data
        social_graph       # Follow relationships
    ):
        self.song_vectors = song_vectors
        self.user_vectors = user_vectors
        self.audio_features = audio_features
        self.history = listening_history
        self.social = social_graph
    
    async def get_personalized_recommendations(
        self,
        user_id: str,
        context: Dict = None,
        limit: int = 50
    ) -> List[Song]:
        """
        Get personalized recommendations combining multiple signals.
        """
        # Get user's recent listening
        recent = await self.history.get_recent_plays(user_id, days=30)
        
        # Get candidates from different sources
        candidates = []
        
        # 1. Collaborative filtering (40% weight)
        cf_candidates = await self._collaborative_filtering(user_id, limit=100)
        candidates.extend([(s, 0.4) for s in cf_candidates])
        
        # 2. Content-based from recent plays (30% weight)
        for song_id in recent[:10]:
            similar = await self._content_based_similar(song_id, limit=20)
            candidates.extend([(s, 0.3) for s in similar])
        
        # 3. Audio feature matching (20% weight)
        taste_profile = await self._get_audio_taste_profile(user_id)
        audio_similar = await self._audio_feature_matching(
            taste_profile, 
            limit=50
        )
        candidates.extend([(s, 0.2) for s in audio_similar])
        
        # 4. Discovery/exploration (10% weight)
        discovery = await self._get_discovery_tracks(user_id, limit=20)
        candidates.extend([(s, 0.1) for s in discovery])
        
        # Aggregate scores
        song_scores = {}
        for song, weight in candidates:
            if song.id not in song_scores:
                song_scores[song.id] = {"song": song, "score": 0}
            song_scores[song.id]["score"] += weight
        
        # Apply context modifiers
        if context:
            song_scores = self._apply_context_boost(song_scores, context)
        
        # Filter already played
        played_ids = set(recent)
        final = [
            item for item in song_scores.values()
            if item["song"].id not in played_ids
        ]
        
        # Sort by score and return top N
        final.sort(key=lambda x: x["score"], reverse=True)
        return [item["song"] for item in final[:limit]]
    
    async def _collaborative_filtering(
        self,
        user_id: str,
        limit: int
    ) -> List[Song]:
        """
        Find songs that similar users liked.
        Using pre-computed user embeddings.
        """
        # Get user's embedding
        user_vector = await self.user_vectors.get(user_id)
        
        # Find similar users
        similar_users = await self.user_vectors.find_similar(
            user_vector,
            k=50
        )
        
        # Score songs they played that this user hasn't heard.
        # Summing similarity per song ID avoids duplicate candidates when
        # several similar users played the same song.
        user_songs = set(await self.history.get_all_plays(user_id))
        scores: Dict[str, float] = {}
        
        for similar_user_id, similarity in similar_users:
            their_songs = await self.history.get_recent_plays(
                similar_user_id,
                days=30
            )
            for song_id in set(their_songs):
                if song_id not in user_songs:
                    scores[song_id] = scores.get(song_id, 0.0) + similarity
        
        # Rank by aggregated score, then load metadata only for the top N
        top_ids = sorted(scores, key=scores.get, reverse=True)[:limit]
        return [await self.audio_features.get_song(s_id) for s_id in top_ids]
    
    async def _content_based_similar(
        self,
        song_id: str,
        limit: int
    ) -> List[Song]:
        """
        Find songs similar to a given song.
        Using pre-computed song embeddings.
        """
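        song_vector = await self.song_vectors.get(song_id)
        # Ask for one extra neighbor: the nearest match is usually the song itself
        similar = await self.song_vectors.find_similar(song_vector, k=limit + 1)
        
        return [
            await self.audio_features.get_song(s_id)
            for s_id, _ in similar
            if s_id != song_id
        ][:limit]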
    
    async def generate_daily_mix(
        self,
        user_id: str,
        mix_index: int  # 1-6
    ) -> List[Song]:
        """
        Generate personalized daily mix playlist.
        Each mix focuses on a different taste cluster.
        """
        # Get user's taste clusters
        clusters = await self._cluster_user_taste(user_id, n_clusters=6)
        
        if not 1 <= mix_index <= len(clusters):
            return []
        
        cluster = clusters[mix_index - 1]
        
        # Get seed tracks from this cluster
        seeds = cluster.seed_tracks[:5]
        
        playlist = []
        
        # 60% similar to seeds (5 seeds × 6 tracks = 30 of 50)
        for seed_id in seeds:
            similar = await self._content_based_similar(seed_id, limit=6)
            playlist.extend(similar)
        
        # 20% matching audio features (10 of 50)
        audio_matches = await self._audio_feature_matching(
            cluster.feature_centroid,
            limit=10
        )
        playlist.extend(audio_matches)
        
        # 20% discovery (new artists in similar space, 10 of 50)
        discovery = await self._get_discovery_in_cluster(
            user_id,
            cluster,
            limit=10
        )
        playlist.extend(discovery)
        
        # Drop songs picked by more than one source, keeping first position
        unique = {song.id: song for song in playlist}
        
        # Order for good listening flow
        playlist = self._order_for_flow(list(unique.values()))
        
        return playlist[:50]
    
    def _order_for_flow(self, songs: List[Song]) -> List[Song]:
        """
        Order songs for smooth listening experience.
        Consider tempo, energy transitions.
        """
        if len(songs) <= 1:
            return songs
        
        ordered = [songs[0]]
        remaining = songs[1:]
        
        while remaining:
            last = ordered[-1]
            # Find song with smallest "distance" in audio space
            best_next = min(
                remaining,
                key=lambda s: self._audio_distance(last, s)
            )
            ordered.append(best_next)
            remaining.remove(best_next)
        
        return ordered
    
    def _audio_distance(self, song1: Song, song2: Song) -> float:
        """
        Calculate distance between songs in audio feature space.
        """
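        # Tempo is in BPM while the other features are assumed to lie in [0, 1],
        # so scale it down (assumed 250 BPM ceiling) to keep it from dominating.
        scales = {"tempo": 250.0, "energy": 1.0, "danceability": 1.0, "valence": 1.0}
        
        distance = 0.0
        for f, scale in scales.items():
            diff = song1.audio_features.get(f, 0) - song2.audio_features.get(f, 0)
            distance += (diff / scale) ** 2
        
        return distance ** 0.5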
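
The engine relies on a `find_similar` call against pre-computed embeddings without showing what it does. A minimal sketch, assuming cosine similarity over an in-memory dictionary, is shown below; the signature here is an assumption and differs from the store method used above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Brute-force top-k search: score every vector, keep the k best."""
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nearest = find_similar([1.0, 0.0], vectors, k=2)  # "a" first, then "c"
```

A linear scan like this costs one comparison per item, so at a catalog of 80M songs an approximate nearest-neighbor index would typically take its place.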

4. Offline Playback

┌─────────────────────────────────────────────────────────────┐
│                   Offline Playback                           │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Download Flow:                                             │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                                                         │ │
│  │  1. User requests download                             │ │
│  │     └── Check: Premium subscription?                   │ │
│  │     └── Check: Download limit not exceeded?            │ │
│  │                                                         │ │
│  │  2. Queue download in background                       │ │
│  │     └── Only the user's preferred quality level        │ │
│  │     └── Wi-Fi only if the user setting requires it     │ │
│  │                                                         │ │
│  │  3. Encrypt audio files                                │ │
│  │     └── Device-specific encryption key                 │ │
│  │     └── Key rotated monthly                            │ │
│  │                                                         │ │
│  │  4. Store in secure container                          │ │
│  │     └── Cannot be exported                             │ │
│  │     └── Playable only in app                           │ │
│  │                                                         │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  Offline Verification:                                      │
│  ┌────────────────────────────────────────────────────────┐ │
│  │                                                         │ │
│  │  • Must go online every 30 days                        │ │
│  │  • Verify subscription still active                    │ │
│  │  • Refresh encryption keys                             │ │
│  │  • If subscription lapses → downloads become           │ │
│  │    unplayable                                          │ │
│  │                                                         │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
└─────────────────────────────────────────────────────────────┘
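
The offline verification rules can be sketched as a local check the app runs before playing a downloaded file. `OfflineLicense` and `can_play_offline` are hypothetical names, and the sketch trusts the device clock, which a real DRM scheme would need to guard against.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_OFFLINE = timedelta(days=30)  # check-in window stated above


@dataclass
class OfflineLicense:
    user_id: str
    device_id: str
    last_verified_at: datetime   # last successful online check
    subscription_active: bool    # subscription state recorded at that check


def can_play_offline(lic: OfflineLicense, now: datetime) -> bool:
    """Playable only with an active subscription and a recent check-in."""
    if not lic.subscription_active:
        return False
    return now - lic.last_verified_at <= MAX_OFFLINE


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
fresh = OfflineLicense("u1", "d1", now - timedelta(days=10), True)
stale = OfflineLicense("u1", "d1", now - timedelta(days=31), True)
```

Here `fresh` remains playable while `stale` does not, because its last check-in is older than 30 days.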

Data Model

┌─────────────────────────────────────────────────────────────┐
│                    Core Data Models                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Songs (Cassandra - high read throughput)                   │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ song_id      | uuid (partition key)                    │ │
│  │ name         | text                                    │ │
│  │ artist_id    | uuid                                    │ │
│  │ album_id     | uuid                                    │ │
│  │ duration_ms  | int                                     │ │
│  │ audio_urls   | map<quality, url>                       │ │
│  │ audio_features | map<feature, float>                   │ │
│  │ release_date | timestamp                               │ │
│  │ popularity   | int                                     │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  User Library (Cassandra)                                   │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ user_id      | uuid (partition key)                    │ │
│  │ item_type    | text (song|album|playlist|artist)       │ │
│  │ item_id      | uuid                                    │ │
│  │ saved_at     | timestamp (clustering key DESC)         │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  Playlists (PostgreSQL - complex queries)                   │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ playlist_id  | uuid                                    │ │
│  │ owner_id     | uuid                                    │ │
│  │ name         | text                                    │ │
│  │ description  | text                                    │ │
│  │ is_public    | boolean                                 │ │
│  │ followers    | int                                     │ │
│  │ songs        | uuid[] (ordered)                        │ │
│  │ collaborative| boolean                                 │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
│  Listening History (Kafka → Data Lake)                      │
│  ┌────────────────────────────────────────────────────────┐ │
│  │ user_id      | uuid                                    │ │
│  │ song_id      | uuid                                    │ │
│  │ played_at    | timestamp                               │ │
│  │ duration_ms  | int (how long they listened)            │ │
│  │ context      | text (playlist, album, radio)           │ │
│  │ device_type  | text                                    │ │
│  │ skipped      | boolean                                 │ │
│  └────────────────────────────────────────────────────────┘ │
│                                                              │
└─────────────────────────────────────────────────────────────┘
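
As an illustration of the Listening History record, the sketch below models one event and serializes it into a key/value pair suitable for a Kafka topic. Keying by `user_id` is an assumed choice that keeps one user's events in order within a partition. The 30-second threshold for a counted play is also an assumption used for illustration, and no Kafka client API is called here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple

MIN_COUNTED_PLAY_MS = 30_000  # assumed threshold for a "counted" play


@dataclass
class ListeningEvent:
    user_id: str
    song_id: str
    played_at: str     # ISO 8601 timestamp
    duration_ms: int   # how long the user listened
    context: str       # playlist, album, radio
    device_type: str
    skipped: bool


def to_message(event: ListeningEvent) -> Tuple[bytes, bytes]:
    """Return (key, value) bytes for a producer to send."""
    key = event.user_id.encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


def is_counted_play(event: ListeningEvent) -> bool:
    """True when the listen was long enough to count under the assumed rule."""
    return event.duration_ms >= MIN_COUNTED_PLAY_MS


event = ListeningEvent(
    "u1", "s1", "2024-01-01T00:00:00Z", 45_000, "playlist", "mobile", False
)
key, value = to_message(event)
```

Keeping the raw `duration_ms` in the event, rather than only a counted flag, lets downstream jobs apply a different threshold later without re-collecting data.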

Key Design Decisions

CDN Strategy for Audio

Multi-CDN Strategy:
├── Primary: Fastly (low latency)
├── Secondary: Akamai (backup, different regions)
├── Edge caching for popular songs
└── Origin shield to protect storage

Caching Policy:
├── Popular songs (top 10K): Cache everywhere
├── Medium popularity: Cache in regional PoPs
├── Long tail: Cache on demand, lower TTL
└── Pre-warm caches before major releases
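
The caching policy can be expressed as a lookup from popularity rank to a tier. This is a sketch with assumed numbers: only the top-10K cutoff comes from the policy above, while the medium-popularity boundary and all TTL values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    tier: str
    scope: str
    ttl_seconds: int


# Rank cutoffs and TTLs below are assumed values for illustration only.
TOP_RANK_CUTOFF = 10_000         # "top 10K" from the policy above
MEDIUM_RANK_CUTOFF = 1_000_000   # hypothetical boundary for "medium popularity"


def cache_policy_for(popularity_rank: int) -> CachePolicy:
    """Map a song's popularity rank (1 = most played) to a caching tier."""
    if popularity_rank < 1:
        raise ValueError("popularity_rank starts at 1")
    if popularity_rank <= TOP_RANK_CUTOFF:
        return CachePolicy("popular", "all edge PoPs", ttl_seconds=7 * 24 * 3600)
    if popularity_rank <= MEDIUM_RANK_CUTOFF:
        return CachePolicy("medium", "regional PoPs", ttl_seconds=24 * 3600)
    return CachePolicy("long_tail", "on demand", ttl_seconds=3600)
```

Because play counts are heavily skewed toward a small set of tracks, pinning the top tier everywhere is expected to absorb a large share of requests at the edge.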

Handling 1M Concurrent Streams

Scaling Strategy:
├── Stateless streaming servers
├── Audio served directly from CDN
├── Session state in Redis cluster
├── API servers behind load balancer

├── Geographic distribution:
│   ├── US: 3 regions
│   ├── EU: 2 regions
│   ├── APAC: 2 regions
│   └── LATAM: 1 region

└── Auto-scaling based on:
    ├── Active sessions
    ├── Bandwidth utilization
    └── Time of day patterns
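
Auto-scaling on active sessions can be sketched as a target-capacity calculation. The per-instance capacity, headroom, and minimum below are hypothetical values chosen for illustration; real limits would come from load testing.

```python
import math


def desired_instances(
    active_sessions: int,
    sessions_per_instance: int = 5_000,  # hypothetical per-instance capacity
    headroom_pct: int = 30,              # hypothetical spare capacity for spikes
    min_instances: int = 3,              # hypothetical floor for redundancy
) -> int:
    """API instances needed to serve the sessions with headroom."""
    # Integer numerator and denominator keep the ceiling free of float drift.
    numerator = active_sessions * (100 + headroom_pct)
    denominator = 100 * sessions_per_instance
    return max(min_instances, math.ceil(numerator / denominator))


peak = desired_instances(1_000_000)  # 260 instances under these assumptions
idle = desired_instances(0)          # falls back to the minimum of 3
```

Since audio bytes are served from the CDN, this count covers only the stateless API tier that handles session and segment-URL requests.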

Interview Tips

Key Discussion Points:
  1. Audio delivery: How to ensure no buffering?
  2. Personalization: How to build recommendations?
  3. Offline mode: How to handle DRM?
  4. Cross-device sync: How to sync playback state?
  5. Royalty tracking: How to track plays accurately?