Design: Twitter/X Timeline

Problem Statement

Design a Twitter-like social media platform with these capabilities:

Users can post tweets (280 characters)
Users can follow other users
Users see a timeline of tweets from people they follow
Users can like, retweet, and reply to tweets

Step 1: Requirements Clarification

Functional Requirements

Core Features

Post tweets (text, images, videos)
Follow/unfollow users
View home timeline (feed)
Like, retweet, reply
User profiles

Extended Features

Search tweets
Trending topics
Notifications
Direct messages

Non-Functional Requirements

Low Latency: Timeline loads in <200ms
High Availability: 99.99% uptime
Eventual Consistency: Acceptable for feeds
Scale: 500M users, 200M DAU

Capacity Estimation

┌─────────────────────────────────────────────────────────────────┐
│                 Twitter Scale Estimation                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Users:                                                         │
│  • 500 million total users                                     │
│  • 200 million DAU                                             │
│  • Average: 200 followers per user                             │
│  • Power users: 10M+ followers (celebrities)                   │
│                                                                 │
│  Tweets:                                                        │
│  • 10% users tweet daily = 20M tweets/day                      │
│  • Tweet QPS = 20M / 86,400 ≈ 230 QPS                          │
│  • Peak: 230 × 3 = 700 QPS                                     │
│                                                                 │
│  Timeline reads:                                                │
│  • 200M DAU × 5 views/day = 1B views/day                       │
│  • Timeline QPS = 1B / 86,400 ≈ 11,600 QPS                     │
│  • Peak: 11,600 × 3 = 35,000 QPS                               │
│                                                                 │
│  Read:Write ratio = 11,600:230 = 50:1                          │
│                                                                 │
│  Storage:                                                       │
│  • Tweet: 280 chars + metadata = 500 bytes                     │
│  • Daily: 20M × 500 = 10 GB                                    │
│  • Yearly: 10 GB × 365 = 3.6 TB                                │
│  • Media: 10% tweets with 1MB image = 2M × 1MB = 2 TB/day     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
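The estimates in the box can be reproduced with a few lines of arithmetic. The sketch below assumes each tweeting user posts once per day and reuses the 3x peak factor from the box; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3  # peak-to-average multiplier used in the estimate

dau = 200_000_000
tweets_per_day = dau * 10 // 100      # 10% of DAU post once a day (assumption)
timeline_views_per_day = dau * 5      # 5 timeline views per active user

tweet_qps = tweets_per_day / SECONDS_PER_DAY
timeline_qps = timeline_views_per_day / SECONDS_PER_DAY

tweet_bytes = 500                     # 280 chars + metadata
text_per_day_gb = tweets_per_day * tweet_bytes / 1e9
text_per_year_tb = text_per_day_gb * 365 / 1e3

# 10% of tweets carry a 1 MB image
media_per_day_tb = (tweets_per_day * 10 // 100) * 1_000_000 / 1e12

print(f"tweet QPS    ~ {tweet_qps:,.0f} (peak ~ {tweet_qps * PEAK_FACTOR:,.0f})")
print(f"timeline QPS ~ {timeline_qps:,.0f} (peak ~ {timeline_qps * PEAK_FACTOR:,.0f})")
print(f"read:write   ~ {timeline_qps / tweet_qps:.0f}:1")
print(f"text storage ~ {text_per_day_gb:.0f} GB/day, {text_per_year_tb:.2f} TB/year")
print(f"media        ~ {media_per_day_tb:.0f} TB/day")
```

The rounded figures in the box (230, 11,600, 700, 35,000) are these values rounded for easier mental math.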

Step 2: High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                    Twitter Architecture                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                       ┌────────────┐                           │
│                       │   Client   │                           │
│                       └─────┬──────┘                           │
│                             │                                   │
│                       ┌─────▼──────┐                           │
│                       │    CDN     │                           │
│                       └─────┬──────┘                           │
│                             │                                   │
│                       ┌─────▼──────┐                           │
│                       │ API Gateway│                           │
│                       └─────┬──────┘                           │
│                             │                                   │
│    ┌────────────────────────┼────────────────────────┐         │
│    │                        │                        │          │
│    ▼                        ▼                        ▼          │
│ ┌──────────┐         ┌──────────────┐         ┌──────────┐    │
│ │  Tweet   │         │   Timeline   │         │  User    │    │
│ │ Service  │         │   Service    │         │ Service  │    │
│ └────┬─────┘         └──────┬───────┘         └────┬─────┘    │
│      │                      │                      │           │
│      │                      │                      │           │
│ ┌────▼─────┐         ┌──────▼───────┐         ┌────▼─────┐    │
│ │Tweet DB  │         │Timeline Cache│         │ User DB  │    │
│ │(Cassandra)│         │   (Redis)    │         │(Postgres)│    │
│ └──────────┘         └──────────────┘         └──────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Core Services

Service	Responsibility
Tweet Service	Create, read, delete tweets
Timeline Service	Build and serve user timelines
User Service	User profiles, follow relationships
Fan-out Service	Distribute tweets to followers
Search Service	Full-text search on tweets
Notification Service	Push notifications

Step 3: The Timeline Problem

The core challenge is: How do we show a user the latest tweets from everyone they follow?

Approach 1: Fan-out on Read (Pull)

┌─────────────────────────────────────────────────────────────────┐
│                    Fan-out on Read                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  When user opens timeline:                                      │
│  ────────────────────────                                       │
│  1. Get list of followees (200 users)                          │
│  2. For each followee, get recent tweets                       │
│  3. Merge and sort by timestamp                                │
│  4. Return top N tweets                                         │
│                                                                 │
│  User                                                           │
│    │                                                            │
│    ▼                                                            │
│  "Get timeline"                                                │
│    │                                                            │
│    ▼                                                            │
│  ┌─────────────────────────────────────────┐                   │
│  │  SELECT * FROM tweets                   │                   │
│  │  WHERE user_id IN (followee_ids)        │                   │
│  │  ORDER BY created_at DESC               │                   │
│  │  LIMIT 100                              │                   │
│  └─────────────────────────────────────────┘                   │
│                                                                 │
│  + Simple to implement                                         │
│  + No extra storage                                            │
│  - Slow: 200 queries per timeline request                     │
│  - 11,600 QPS × 200 = 2.3M queries/second!                    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
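The merge step of fan-out on read can be sketched in memory as a k-way merge. This is an illustration only: `pull_timeline` and the `tweets_by_user` mapping are hypothetical stand-ins for the per-followee queries, and each followee's list is assumed to be sorted newest first already.

```python
import heapq
from itertools import islice


def pull_timeline(followee_ids, tweets_by_user, limit=100):
    """Merge each followee's newest-first tweets into one newest-first list.

    tweets_by_user maps user_id -> list of (created_at, tweet_id),
    sorted newest first (assumed layout for this sketch).
    """
    per_user = (tweets_by_user.get(uid, []) for uid in followee_ids)
    # heapq.merge streams the inputs, so only `limit` items are materialized
    merged = heapq.merge(*per_user, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]


tweets_by_user = {
    1: [(300, "t3"), (100, "t1")],
    2: [(200, "t2")],
    3: [],
}
print(pull_timeline([1, 2, 3], tweets_by_user, limit=2))
```

The merge itself is cheap; the cost called out in the box comes from fetching one list per followee on every timeline request.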

Approach 2: Fan-out on Write (Push)

┌─────────────────────────────────────────────────────────────────┐
│                    Fan-out on Write                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  When user posts a tweet:                                       │
│  ────────────────────────                                       │
│  1. Save tweet to database                                     │
│  2. Get all followers (could be millions)                      │
│  3. Push tweet ID to each follower's timeline cache            │
│                                                                 │
│  Tweet Posted                                                   │
│    │                                                            │
│    ▼                                                            │
│  ┌─────────────┐                                               │
│  │ Save Tweet  │                                               │
│  └──────┬──────┘                                               │
│         │                                                       │
│    ┌────┴────┐                                                 │
│    ▼         ▼                                                 │
│  Fan-out to each follower's timeline cache:                    │
│                                                                 │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐          │
│  │Follower 1│ │Follower 2│ │Follower 3│ │    ...   │          │
│  │ Timeline │ │ Timeline │ │ Timeline │ │          │          │
│  │[tweet_id]│ │[tweet_id]│ │[tweet_id]│ │[tweet_id]│          │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘          │
│                                                                 │
│  + Fast reads: O(1) cache lookup                              │
│  + Timeline is pre-computed                                    │
│  - Celebrity problem: 50M followers = 50M writes              │
│  - Wasted writes for inactive users                           │
│  - More storage (timeline caches)                             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
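A minimal in-memory sketch of fan-out on write follows. It is not the production design: `PushTimelines` is a hypothetical class, plain lists stand in for the timeline cache, and the 800-entry cap is borrowed from the timeline cache section of this design.

```python
from collections import defaultdict

MAX_TIMELINE_LEN = 800  # cap on cached tweet IDs per user


class PushTimelines:
    def __init__(self):
        self.followers = defaultdict(set)   # author_id -> follower ids
        self.timelines = defaultdict(list)  # user_id -> tweet ids, newest first

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)

    def post(self, author_id, tweet_id):
        """Push the tweet ID to every follower; return the number of writes."""
        writes = 0
        for follower_id in self.followers[author_id]:
            timeline = self.timelines[follower_id]
            timeline.insert(0, tweet_id)
            del timeline[MAX_TIMELINE_LEN:]  # trim the oldest entries
            writes += 1
        return writes

    def read(self, user_id, count=50):
        # Reading is a single lookup: the timeline is already built
        return self.timelines[user_id][:count]
```

The value returned by `post` equals the author's follower count, which is exactly the write amplification behind the celebrity problem.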

Approach 3: Hybrid (What Twitter Uses)

┌─────────────────────────────────────────────────────────────────┐
│                    Hybrid Approach                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Regular Users (≤ 10K followers):                              │
│  ──────────────────────────────                                 │
│  → Fan-out on Write                                            │
│  → Push to all followers' timelines                            │
│                                                                 │
│  Celebrities (> 10K followers):                                 │
│  ─────────────────────────────                                  │
│  → Fan-out on Read                                             │
│  → Fetch at read time and merge                                │
│                                                                 │
│  Timeline Generation:                                           │
│  ────────────────────                                           │
│  1. Read user's pre-computed timeline (regular users' tweets)  │
│  2. Get list of celebrity followees                            │
│  3. Fetch recent tweets from each celebrity                    │
│  4. Merge both lists, sort by time                             │
│  5. Return top N                                                │
│                                                                 │
│         User's Timeline                                         │
│              │                                                  │
│    ┌─────────┴─────────┐                                       │
│    │                   │                                        │
│    ▼                   ▼                                        │
│  ┌───────────┐   ┌───────────────┐                             │
│  │ Timeline  │   │  Celebrity    │                             │
│  │  Cache    │   │   Tweets      │                             │
│  │ (pushed)  │   │  (pulled)     │                             │
│  └─────┬─────┘   └───────┬───────┘                             │
│        │                 │                                      │
│        └────────┬────────┘                                      │
│                 │                                               │
│           ┌─────▼─────┐                                        │
│           │   Merge   │                                        │
│           │   & Sort  │                                        │
│           └───────────┘                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
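The routing rule and the read-time merge of the hybrid approach can be sketched as below. The function names are hypothetical, and both inputs are assumed to be newest-first lists of `(created_at, tweet_id)` pairs.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # followers; above this, tweets are pulled at read time


def is_celebrity(follower_count):
    return follower_count > CELEBRITY_THRESHOLD


def hybrid_timeline(pushed, celebrity_feeds, limit=50):
    """Merge the pre-computed timeline with pulled celebrity tweets.

    pushed:          newest-first (created_at, tweet_id) pairs from the cache
    celebrity_feeds: one newest-first list per followed celebrity
    """
    merged = heapq.merge(pushed, *celebrity_feeds, reverse=True)
    return [tweet_id for _, tweet_id in islice(merged, limit)]


pushed = [(50, "p2"), (10, "p1")]
celebrity_feeds = [[(60, "c2"), (20, "c1")], [(40, "d1")]]
print(hybrid_timeline(pushed, celebrity_feeds, limit=4))
```

Because a user typically follows only a handful of celebrities, the number of pulled lists stays small even when the total followee count is large.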

Step 4: Detailed Component Design

Data Models

-- Users table
CREATE TABLE users (
    user_id         BIGINT PRIMARY KEY,
    username        VARCHAR(50) UNIQUE,
    email           VARCHAR(255),
    display_name    VARCHAR(100),
    bio             TEXT,
    profile_image   VARCHAR(500),
    followers_count BIGINT DEFAULT 0,
    following_count BIGINT DEFAULT 0,
    created_at      TIMESTAMP
);

-- Tweets table (Cassandra)
CREATE TABLE tweets (
    tweet_id        BIGINT,         -- Snowflake ID
    user_id         BIGINT,
    content         TEXT,
    media_urls      LIST<TEXT>,
    created_at      TIMESTAMP,
    likes_count     BIGINT,
    retweets_count  BIGINT,
    replies_count   BIGINT,
    PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);

-- Follow relationships (Graph or Cassandra)
CREATE TABLE followers (
    user_id         BIGINT,
    follower_id     BIGINT,
    created_at      TIMESTAMP,
    PRIMARY KEY (user_id, follower_id)
);

CREATE TABLE following (
    user_id         BIGINT,
    following_id    BIGINT,
    created_at      TIMESTAMP,
    PRIMARY KEY (user_id, following_id)
);
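The tweets table marks `tweet_id` as a Snowflake ID, which is what makes clustering by `tweet_id DESC` equivalent to newest-first ordering. A minimal generator sketch follows, assuming the commonly described layout of 41 timestamp bits, 10 worker bits, and 12 sequence bits; the epoch constant and the class name are assumptions, not part of this design.

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # custom epoch (assumption; any fixed past instant works)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # returns milliseconds since the Unix epoch
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


def timestamp_ms(snowflake_id):
    """Recover the creation time (ms since the Unix epoch) from an ID."""
    return (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

IDs produced by one generator are strictly increasing, and IDs from different workers sort by creation time down to the millisecond, so no separate timestamp index is needed for ordering.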

Timeline Cache Structure

# Redis sorted set for each user's timeline
# Key: timeline:{user_id}
# Score: tweet timestamp
# Value: tweet_id

# Add tweet to timeline
ZADD timeline:123 1705312800 "tweet_456789"

# Get latest 50 tweets
ZREVRANGE timeline:123 0 49

# Trim old tweets (keep only 800)
ZREMRANGEBYRANK timeline:123 0 -801

# Timeline structure
{
    "timeline:123": {
        "tweet_999": 1705312800,  # Most recent
        "tweet_998": 1705312700,
        "tweet_997": 1705312600,
        ...
        # Store last 800 tweet IDs
    }
}

# Full tweet data cached separately
{
    "tweet:999": {
        "user_id": 456,
        "content": "Hello world!",
        "created_at": 1705312800,
        "likes": 42
    }
}

Fan-out Service

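from itertools import islice

CELEBRITY_THRESHOLD = 10_000  # followers; above this, skip fan-out on write
MAX_TIMELINE_LEN = 800        # tweet IDs kept per cached timeline

class FanoutService:
    def __init__(self, follower_service, timeline_cache, message_queue):
        self.follower_service = follower_service
        self.timeline_cache = timeline_cache
        self.queue = message_queue

    def fanout_tweet(self, tweet):
        user_id = tweet.user_id
        followers = self.follower_service.get_followers(user_id)

        # Check if user is a celebrity
        if len(followers) > CELEBRITY_THRESHOLD:
            # Don't fan out for celebrities
            # Store tweet in celebrity tweets cache instead
            self.cache_celebrity_tweet(user_id, tweet)
            return

        # Fan-out to all followers; each batch becomes one queue message
        # so that workers can process batches in parallel
        for batch in self.batch(followers, 1000):
            self.queue.publish("fanout", {
                "tweet_id": tweet.id,
                "tweet_time": tweet.created_at,  # numeric timestamp, used as score
                "follower_ids": batch
            })

    def cache_celebrity_tweet(self, user_id, tweet):
        # The key name is an assumption; any per-celebrity sorted set works
        key = f"celebrity_tweets:{user_id}"
        self.timeline_cache.zadd(key, {tweet.id: tweet.created_at})
        self.timeline_cache.zremrangebyrank(key, 0, -(MAX_TIMELINE_LEN + 1))

    @staticmethod
    def batch(items, size):
        """Yield successive lists of at most `size` items."""
        it = iter(items)
        while chunk := list(islice(it, size)):
            yield chunk

    def process_fanout_batch(self, message):
        """Queue consumer: write one tweet ID into each follower's timeline."""
        tweet_id = message["tweet_id"]
        tweet_time = message["tweet_time"]

        pipeline = self.timeline_cache.pipeline()
        for follower_id in message["follower_ids"]:
            # Add to timeline sorted set
            pipeline.zadd(
                f"timeline:{follower_id}",
                {tweet_id: tweet_time}
            )
            # Trim to the newest 800 tweets
            pipeline.zremrangebyrank(
                f"timeline:{follower_id}", 0, -(MAX_TIMELINE_LEN + 1)
            )

        pipeline.execute()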

Timeline Service

class TimelineService:
    def __init__(self, timeline_cache, tweet_service, follow_service):
        self.cache = timeline_cache
        self.tweet_service = tweet_service
        self.follow_service = follow_service

    def get_timeline(self, user_id, count=50, cursor=None):
        # `cursor` is assumed to be the created_at score of the last tweet
        # on the previous page (same units as the sorted-set score)

        # 1. Get pre-computed timeline (tweets pushed from regular users)
        if cursor is not None:
            # "(" makes the upper bound exclusive, so the tweet at the
            # cursor is not returned again on the next page
            timeline_ids = self.cache.zrevrangebyscore(
                f"timeline:{user_id}",
                f"({cursor}",
                "-inf",
                start=0,
                num=count
            )
        else:
            timeline_ids = self.cache.zrevrange(
                f"timeline:{user_id}",
                0,
                count - 1
            )

        # 2. Get celebrity tweets (fan-out on read)
        celebrity_followees = self.follow_service.get_celebrity_followees(user_id)
        celebrity_tweets = []

        for celeb_id in celebrity_followees:
            recent_tweets = self.tweet_service.get_recent_tweets(
                celeb_id,
                count=10
            )
            celebrity_tweets.extend(recent_tweets)

        # 3. Merge, dropping duplicate IDs while keeping order
        all_tweet_ids = list(dict.fromkeys(
            list(timeline_ids) + [t.id for t in celebrity_tweets]
        ))

        # Fetch full tweet objects
        tweets = self.tweet_service.get_tweets_batch(all_tweet_ids)

        # Keep only tweets older than the cursor so pages do not overlap
        if cursor is not None:
            tweets = [t for t in tweets if t.created_at < cursor]

        # Sort by time and take top N
        tweets.sort(key=lambda t: t.created_at, reverse=True)
        return tweets[:count]
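Cursor-based pagination over a score-ordered timeline can be sketched in memory as follows. The `page` helper is hypothetical; it assumes the cursor is the score of the last item already shown and treats it as an exclusive upper bound.

```python
def page(timeline, count=50, cursor=None):
    """Return (tweet_ids, next_cursor) for one page.

    timeline: newest-first list of (score, tweet_id) pairs.
    next_cursor is None when there are no older entries left.
    """
    # Exclusive bound: only entries strictly older than the cursor
    older = [entry for entry in timeline if cursor is None or entry[0] < cursor]
    chunk = older[:count]
    next_cursor = chunk[-1][0] if len(older) > count else None
    return [tweet_id for _, tweet_id in chunk], next_cursor


timeline = [(5, "e"), (4, "d"), (3, "c"), (2, "b"), (1, "a")]
ids, cursor = page(timeline, count=2)
while cursor is not None:
    more, cursor = page(timeline, count=2, cursor=cursor)
    ids.extend(more)
print(ids)
```

An exclusive score cursor skips entries that share the cursor's exact score. Using a unique, time-ordered value such as the tweet ID as the score avoids that edge case.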

Step 5: Search and Trending

Tweet Search with Elasticsearch

┌─────────────────────────────────────────────────────────────────┐
│                    Search Architecture                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Tweet Created                                                  │
│      │                                                          │
│      ▼                                                          │
│  ┌──────────┐     ┌──────────────┐     ┌───────────────┐       │
│  │  Kafka   │────►│   Consumer   │────►│ Elasticsearch │       │
│  │          │     │   Service    │     │               │       │
│  └──────────┘     └──────────────┘     └───────┬───────┘       │
│                                                │                │
│                                                ▼                │
│                              ┌─────────────────────────────┐   │
│                              │ GET /tweets/_search         │   │
│                              │ {                           │   │
│                              │   "query": {                │   │
│                              │     "match": {              │   │
│                              │       "content": "keyword"  │   │
│                              │     }                       │   │
│                              │   }                         │   │
│                              │ }                           │   │
│                              └─────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

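Trending Topics

import time

WINDOW_SECONDS = 300  # 5-minute time windows

class TrendingService:
    def __init__(self, redis_client):
        self.redis = redis_client

    def track_hashtag(self, hashtag, location="global"):
        """Called when a tweet with hashtag is posted"""
        key = f"trending:{location}:{self.get_current_window()}"

        # Increment counter in current time window
        self.redis.zincrby(key, 1, hashtag)
        # Let old windows expire so keys do not accumulate
        # (keeping 10 windows is an assumption)
        self.redis.expire(key, WINDOW_SECONDS * 10)

    def get_trending(self, location="global", count=10):
        """Get top trending hashtags"""
        # Get last 5 time windows (25 minutes of data)
        windows = self.get_recent_windows(5)

        # Merge counts from recent windows (scores are summed by default)
        self.redis.zunionstore(
            f"trending:{location}:merged",
            [f"trending:{location}:{w}" for w in windows]
        )

        # Get top hashtags
        return self.redis.zrevrange(
            f"trending:{location}:merged",
            0,
            count - 1,
            withscores=True
        )

    def get_current_window(self):
        """Start of the current 5-minute window, as a Unix timestamp"""
        return int(time.time() // WINDOW_SECONDS) * WINDOW_SECONDS

    def get_recent_windows(self, n):
        """Start timestamps of the last n windows, newest first"""
        current = self.get_current_window()
        return [current - i * WINDOW_SECONDS for i in range(n)]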
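The windowed counting idea can be tried without Redis. The sketch below is an in-memory stand-in, with `InMemoryTrending` as a hypothetical class that takes timestamps as arguments so the behaviour is deterministic.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 300  # 5-minute windows


class InMemoryTrending:
    def __init__(self):
        self.windows = defaultdict(Counter)  # window start -> hashtag counts

    @staticmethod
    def window_start(ts):
        return int(ts // WINDOW_SECONDS) * WINDOW_SECONDS

    def track(self, hashtag, ts):
        self.windows[self.window_start(ts)][hashtag] += 1

    def top(self, now, count=10, num_windows=5):
        """Sum counts over the last `num_windows` windows and rank them."""
        current = self.window_start(now)
        merged = Counter()
        for i in range(num_windows):
            merged.update(self.windows.get(current - i * WINDOW_SECONDS, {}))
        return merged.most_common(count)


trending = InMemoryTrending()
trending.track("#old", 0)
trending.track("#py", 1000)
trending.track("#py", 1010)
trending.track("#go", 1300)
print(trending.top(now=1700))
```

Hashtags age out of the ranking automatically once their window falls outside the merged range, which is why "#old" no longer appears at `now=1700`.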

Step 6: Additional Components

Media Upload

┌─────────────────────────────────────────────────────────────────┐
│                    Media Upload Flow                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Client requests upload URL                                 │
│     POST /api/v1/media/upload-url                              │
│                                                                 │
│  2. Server returns pre-signed S3 URL                           │
│     { "upload_url": "https://s3.../presigned", "media_id": 123}│
│                                                                 │
│  3. Client uploads directly to S3                              │
│     PUT https://s3.../presigned                                │
│                                                                 │
│  4. S3 triggers processing lambda                              │
│     - Generate thumbnails                                      │
│     - Transcode video                                          │
│     - CDN propagation                                          │
│                                                                 │
│  5. Client includes media_id in tweet                          │
│     POST /api/v1/tweets                                        │
│     { "content": "Hello!", "media_ids": [123] }               │
│                                                                 │
│  ┌────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐      │
│  │ Client │───►│   API   │───►│   S3    │───►│   CDN   │      │
│  └────────┘    └─────────┘    └─────────┘    └─────────┘      │
│                    │              │                            │
│                    │              ▼                            │
│                    │         ┌─────────┐                       │
│                    │         │ Lambda  │                       │
│                    │         │ Process │                       │
│                    │         └─────────┘                       │
│                    │              │                            │
│                    │              ▼                            │
│                    │         ┌─────────┐                       │
│                    └────────►│ Media   │                       │
│                              │   DB    │                       │
│                              └─────────┘                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Notifications

class NotificationService:
    def __init__(self, push_service, websocket_service, email_service):
        self.push = push_service
        self.ws = websocket_service
        self.email = email_service
    
    def notify_mention(self, mentioned_user_id, tweet):
        notification = {
            "type": "mention",
            "from_user": tweet.user_id,
            "tweet_id": tweet.id,
            "content": f"@{tweet.user.username} mentioned you"
        }
        
        self.send_notification(mentioned_user_id, notification)
    
    def notify_like(self, tweet_owner_id, liker_id, tweet_id):
        # Batch likes to avoid notification spam
        # "5 people liked your tweet"
        pass
    
    def send_notification(self, user_id, notification):
        # 1. Store in notifications table
        self.store_notification(user_id, notification)
        
        # 2. Send real-time if user is online
        if self.ws.is_connected(user_id):
            self.ws.send(user_id, notification)
        
        # 3. Send push notification
        self.push.send(user_id, notification)
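The `notify_like` stub mentions batching likes into a single message. One way this might look is sketched below: a hypothetical `LikeBatcher` collects likes per tweet and emits one aggregated notification per tweet when flushed, for example from a periodic job. The notification fields mirror the `notify_mention` payload but are otherwise assumptions.

```python
from collections import defaultdict


class LikeBatcher:
    def __init__(self):
        # (owner_id, tweet_id) -> set of liker ids; a set ignores repeat likes
        self.pending = defaultdict(set)

    def add_like(self, owner_id, liker_id, tweet_id):
        self.pending[(owner_id, tweet_id)].add(liker_id)

    def flush(self):
        """Build one notification per liked tweet and clear the buffer."""
        notifications = []
        for (owner_id, tweet_id), likers in self.pending.items():
            n = len(likers)
            text = (
                "1 person liked your tweet"
                if n == 1
                else f"{n} people liked your tweet"
            )
            notifications.append((owner_id, {
                "type": "like",
                "tweet_id": tweet_id,
                "count": n,
                "content": text,
            }))
        self.pending.clear()
        return notifications
```

Each `(owner_id, notification)` pair returned by `flush` could then be handed to `send_notification`.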

Final Architecture

┌─────────────────────────────────────────────────────────────────┐
│                 Complete Twitter Architecture                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                         ┌──────────────┐                       │
│                         │   Clients    │                       │
│                         └──────┬───────┘                       │
│                                │                                │
│                         ┌──────▼───────┐                       │
│                         │     CDN      │                       │
│                         └──────┬───────┘                       │
│                                │                                │
│                         ┌──────▼───────┐                       │
│                         │ API Gateway  │                       │
│                         └──────┬───────┘                       │
│                                │                                │
│    ┌───────────────────────────┼───────────────────────────┐   │
│    │            │              │              │            │    │
│    ▼            ▼              ▼              ▼            ▼    │
│ ┌──────┐   ┌──────┐      ┌──────────┐   ┌──────┐    ┌──────┐  │
│ │Tweet │   │ User │      │ Timeline │   │Search│    │Notif │  │
│ │ Svc  │   │ Svc  │      │   Svc    │   │ Svc  │    │ Svc  │  │
│ └──┬───┘   └──┬───┘      └────┬─────┘   └──┬───┘    └──┬───┘  │
│    │          │               │            │           │       │
│    │          │               │            │           │       │
│    ▼          ▼               ▼            ▼           ▼       │
│ ┌──────┐  ┌──────┐      ┌──────────┐  ┌───────┐  ┌─────────┐  │
│ │Tweet │  │ User │      │ Timeline │  │Elastic│  │  Push   │  │
│ │  DB  │  │  DB  │      │  Cache   │  │Search │  │ Service │  │
│ │      │  │      │      │ (Redis)  │  │       │  │         │  │
│ └──────┘  └──────┘      └──────────┘  └───────┘  └─────────┘  │
│                                                                 │
│    ┌───────────────────────────────────────────────────────┐   │
│    │                    Kafka                              │   │
│    │  (tweets, fanout, notifications, search indexing)     │   │
│    └───────────────────────────────────────────────────────┘   │
│                                                                 │
│    ┌───────────────────────────────────────────────────────┐   │
│    │                 S3 + CDN (Media)                      │   │
│    └───────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key Design Decisions

Decision	Choice	Reasoning
Timeline	Hybrid fan-out	Balance between read/write costs
Tweet Storage	Cassandra	High write throughput, time-series friendly
User Data	PostgreSQL	ACID for user operations
Timeline Cache	Redis Sorted Sets	O(log n) insert, O(log n + m) range reads for m returned items
Search	Elasticsearch	Full-text search, real-time indexing
Media	S3 + CDN	Scalable object storage
Messaging	Kafka	Durable, high-throughput event streaming

Common Interview Questions

How do you handle the celebrity problem?

Use a hybrid approach: fan-out on write for regular users, fan-out on read for celebrities (>10K followers). Celebrity tweets are fetched at read time and merged with the pre-computed timeline.

How do you ensure timeline freshness?

Pre-compute timelines via fan-out on write
Short TTL on cache (5 minutes)
Use WebSocket for real-time updates
Periodic refresh on client side

How do you handle tweet deletions?

Mark tweet as deleted in DB (soft delete)
Async job to remove from all timeline caches
Client-side filtering as backup
Accept eventual consistency (deleted tweets may briefly appear)
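Read-time filtering of soft-deleted tweets can be sketched as follows. The `deleted` flag and the `visible_tweets` helper are hypothetical names; the point is that stale IDs left in a timeline cache are dropped when the tweet objects are hydrated.

```python
def visible_tweets(tweet_ids, tweets_by_id):
    """Hydrate tweet IDs, skipping missing or soft-deleted tweets.

    tweets_by_id maps tweet_id -> dict with an optional "deleted" flag.
    The order of tweet_ids is preserved.
    """
    result = []
    for tweet_id in tweet_ids:
        tweet = tweets_by_id.get(tweet_id)
        if tweet is not None and not tweet.get("deleted", False):
            result.append(tweet)
    return result


store = {
    1: {"id": 1, "content": "first"},
    2: {"id": 2, "content": "oops", "deleted": True},
    3: {"id": 3, "content": "third"},
}
print([t["id"] for t in visible_tweets([3, 2, 1, 404], store)])
```

With this filter in place, the async cleanup job only reclaims cache space; correctness no longer depends on it finishing quickly.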

How do you rank the timeline?

Chronological (simple, predictable)
Algorithmic ranking (engagement prediction)
Hybrid: recent tweets chronologically, older by relevance
Store ranking score in timeline cache
