Problem Statement
Design a Twitter-like social media platform where:
- Users can post tweets (280 characters)
- Users can follow other users
- Users see a timeline of tweets from people they follow
- Support for likes, retweets, and replies
Step 1: Requirements Clarification
Functional Requirements
Core Features
- Post tweets (text, images, videos)
- Follow/unfollow users
- View home timeline (feed)
- Like, retweet, reply
- User profiles
Extended Features
- Search tweets
- Trending topics
- Notifications
- Direct messages
Non-Functional Requirements
- Low Latency: Timeline loads in <200ms
- High Availability: 99.99% uptime
- Eventual Consistency: Acceptable for feeds
- Scale: 500M users, 200M DAU
Capacity Estimation
┌─────────────────────────────────────────────────────────────────┐
│ Twitter Scale Estimation │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Users: │
│ • 500 million total users │
│ • 200 million DAU │
│ • Average: 200 followers per user │
│ • Power users: 10M+ followers (celebrities) │
│ │
│ Tweets: │
│ • 10% users tweet daily = 20M tweets/day │
│ • Tweet QPS = 20M / 86,400 ≈ 230 QPS │
│ • Peak: 230 × 3 = 700 QPS │
│ │
│ Timeline reads: │
│ • 200M DAU × 5 views/day = 1B views/day │
│ • Timeline QPS = 1B / 86,400 ≈ 11,600 QPS │
│ • Peak: 11,600 × 3 = 35,000 QPS │
│ │
│ Read:Write ratio = 11,600:230 = 50:1 │
│ │
│ Storage: │
│ • Tweet: 280 chars + metadata = 500 bytes │
│ • Daily: 20M × 500 = 10 GB │
│ • Yearly: 10 GB × 365 = 3.6 TB │
│ • Media: 10% tweets with 1MB image = 2M × 1MB = 2 TB/day │
│ │
└─────────────────────────────────────────────────────────────────┘
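These back-of-envelope numbers are easy to sanity-check in code, using the same assumptions as the box (3× peak factor, 500 bytes per tweet):

```python
SECONDS_PER_DAY = 86_400

dau = 200_000_000
tweets_per_day = int(0.10 * dau)            # 10% of DAU tweet daily -> 20M
tweet_qps = tweets_per_day / SECONDS_PER_DAY
peak_tweet_qps = tweet_qps * 3              # assumed 3x peak factor

timeline_views = dau * 5                    # 5 timeline loads per user per day
timeline_qps = timeline_views / SECONDS_PER_DAY

tweet_bytes = 500                           # 280 chars + metadata
daily_text_gb = tweets_per_day * tweet_bytes / 1e9

print(f"tweet QPS ~{tweet_qps:.0f}, peak ~{peak_tweet_qps:.0f}")
print(f"timeline QPS ~{timeline_qps:.0f}")
print(f"read:write ratio ~{timeline_qps / tweet_qps:.0f}:1")
print(f"daily tweet text ~{daily_text_gb:.0f} GB")
```

Rounding to one significant figure (230 → ~700 peak, 11,600 → ~35,000 peak) is fine in an interview; the point is the 50:1 read-to-write skew, which drives the whole timeline design.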
Step 2: High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│ Twitter Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ │
│ │ Client │ │
│ └─────┬──────┘ │
│ │ │
│ ┌─────▼──────┐ │
│ │ CDN │ │
│ └─────┬──────┘ │
│ │ │
│ ┌─────▼──────┐ │
│ │ API Gateway│ │
│ └─────┬──────┘ │
│ │ │
│ ┌────────────────────────┼────────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │
│ │ Tweet │ │ Timeline │ │ User │ │
│ │ Service │ │ Service │ │ Service │ │
│ └────┬─────┘ └──────┬───────┘ └────┬─────┘ │
│ │ │ │ │
│ │ │ │ │
│ ┌────▼─────┐ ┌──────▼───────┐ ┌────▼─────┐ │
│ │Tweet DB │ │Timeline Cache│ │ User DB │ │
│ │(Cassandra)│ │ (Redis) │ │(Postgres)│ │
│ └──────────┘ └──────────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Core Services
| Service | Responsibility |
|---|---|
| Tweet Service | Create, read, delete tweets |
| Timeline Service | Build and serve user timelines |
| User Service | User profiles, follow relationships |
| Fan-out Service | Distribute tweets to followers |
| Search Service | Full-text search on tweets |
| Notification Service | Push notifications |
Step 3: The Timeline Problem
The core challenge: how do we show a user the latest tweets from everyone they follow?
Approach 1: Fan-out on Read (Pull)
┌─────────────────────────────────────────────────────────────────┐
│ Fan-out on Read │
├─────────────────────────────────────────────────────────────────┤
│ │
│ When user opens timeline: │
│ ──────────────────────── │
│ 1. Get list of followees (200 users) │
│ 2. For each followee, get recent tweets │
│ 3. Merge and sort by timestamp │
│ 4. Return top N tweets │
│ │
│ User │
│ │ │
│ ▼ │
│ "Get timeline" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ SELECT tweets FROM tweet_table │ │
│ │ WHERE user_id IN (followee_ids) │ │
│ │ ORDER BY created_at DESC │ │
│ │ LIMIT 100 │ │
│ └─────────────────────────────────────────┘ │
│ │
│ + Simple to implement │
│ + No extra storage │
│ - Slow: 200 queries per timeline request │
│ - 11,600 QPS × 200 = 2.3M queries/second! │
│ │
└─────────────────────────────────────────────────────────────────┘
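Steps 2–4 of the pull approach amount to a k-way merge: each followee's tweets are already time-ordered, so `heapq.merge` can combine them without re-sorting everything. A toy sketch, with in-memory lists standing in for the per-followee queries:

```python
import heapq

# Each followee's recent tweets, newest first: (timestamp, tweet_id)
followee_tweets = {
    "alice": [(1705312900, "t5"), (1705312700, "t3")],
    "bob":   [(1705312800, "t4"), (1705312600, "t2")],
    "carol": [(1705312500, "t1")],
}

def pull_timeline(followee_tweets, limit=100):
    """K-way merge of per-followee tweet lists, newest first."""
    merged = heapq.merge(
        *followee_tweets.values(),
        key=lambda t: t[0],
        reverse=True,  # inputs are descending, so merge descending
    )
    return [tweet_id for _, tweet_id in list(merged)[:limit]]

print(pull_timeline(followee_tweets, limit=3))  # ['t5', 't4', 't3']
```

The merge itself is cheap; the killer is the 200 fan-in queries per timeline load that feed it, which is exactly what fan-out on write eliminates.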
Approach 2: Fan-out on Write (Push)
┌─────────────────────────────────────────────────────────────────┐
│ Fan-out on Write │
├─────────────────────────────────────────────────────────────────┤
│ │
│ When user posts a tweet: │
│ ──────────────────────── │
│ 1. Save tweet to database │
│ 2. Get all followers (could be millions) │
│ 3. Push tweet ID to each follower's timeline cache │
│ │
│ Tweet Posted │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Save Tweet │ │
│ └──────┬──────┘ │
│ │ │
│ ┌────┴────┐ │
│ ▼ ▼ │
│ Fan-out to each follower's timeline cache: │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Follower 1│ │Follower 2│ │Follower 3│ │ ... │ │
│ │ Timeline │ │ Timeline │ │ Timeline │ │ │ │
│ │[tweet_id]│ │[tweet_id]│ │[tweet_id]│ │[tweet_id]│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ + Fast reads: O(1) cache lookup │
│ + Timeline is pre-computed │
│ - Celebrity problem: 50M followers = 50M writes │
│ - Wasted writes for inactive users │
│ - More storage (timeline caches) │
│ │
└─────────────────────────────────────────────────────────────────┘
Approach 3: Hybrid (What Twitter Uses)
┌─────────────────────────────────────────────────────────────────┐
│ Hybrid Approach │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Regular Users (< 10K followers): │
│ ────────────────────────────── │
│ → Fan-out on Write │
│ → Push to all followers' timelines │
│ │
│ Celebrities (> 10K followers): │
│ ───────────────────────────── │
│ → Fan-out on Read │
│ → Fetch at read time and merge │
│ │
│ Timeline Generation: │
│ ──────────────────── │
│ 1. Read user's pre-computed timeline (regular users' tweets) │
│ 2. Get list of celebrity followees │
│ 3. Fetch recent tweets from each celebrity │
│ 4. Merge both lists, sort by time │
│ 5. Return top N │
│ │
│ User's Timeline │
│ │ │
│ ┌─────────┴─────────┐ │
│ │ │ │
│ ▼ ▼ │
│ ┌───────────┐ ┌───────────────┐ │
│ │ Timeline │ │ Celebrity │ │
│ │ Cache │ │ Tweets │ │
│ │ (pushed) │ │ (pulled) │ │
│ └─────┬─────┘ └───────┬───────┘ │
│ │ │ │
│ └────────┬────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │ Merge │ │
│ │ & Sort │ │
│ └───────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Step 4: Detailed Component Design
Data Models
-- Users table (PostgreSQL)
CREATE TABLE users (
    user_id         BIGINT PRIMARY KEY,
    username        VARCHAR(50) UNIQUE,
    email           VARCHAR(255),
    display_name    VARCHAR(100),
    bio             TEXT,
    profile_image   VARCHAR(500),
    followers_count BIGINT DEFAULT 0,
    following_count BIGINT DEFAULT 0,
    created_at      TIMESTAMP
);

-- Tweets table (Cassandra)
CREATE TABLE tweets (
    tweet_id       BIGINT,       -- Snowflake ID (time-sortable)
    user_id        BIGINT,
    content        TEXT,
    media_urls     LIST<TEXT>,
    created_at     TIMESTAMP,
    likes_count    BIGINT,
    retweets_count BIGINT,
    replies_count  BIGINT,
    PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);

-- Follow relationships (graph DB or Cassandra)
CREATE TABLE followers (
    user_id     BIGINT,
    follower_id BIGINT,
    created_at  TIMESTAMP,
    PRIMARY KEY (user_id, follower_id)
);

CREATE TABLE following (
    user_id      BIGINT,
    following_id BIGINT,
    created_at   TIMESTAMP,
    PRIMARY KEY (user_id, following_id)
);
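The tweets table can cluster on `tweet_id` because Snowflake IDs sort by creation time. A minimal sketch of the idea, using the original Snowflake bit layout (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence); note it does not handle sequence overflow within a millisecond, which a real generator must:

```python
import time

class SnowflakeId:
    """Time-sortable 64-bit IDs: 41-bit ms timestamp | 10-bit machine | 12-bit seq."""
    EPOCH_MS = 1288834974657  # custom epoch (Twitter's original)

    def __init__(self, machine_id):
        assert 0 <= machine_id < 1024
        self.machine_id = machine_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self, now_ms=None):
        now_ms = int(time.time() * 1000) if now_ms is None else now_ms
        if now_ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # up to 4096 IDs/ms/machine
        else:
            self.sequence = 0
        self.last_ms = now_ms
        return ((now_ms - self.EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence

gen = SnowflakeId(machine_id=1)
a = gen.next_id(now_ms=1705312800000)
b = gen.next_id(now_ms=1705312800001)
assert a < b  # later tweets get larger IDs, so ORDER BY tweet_id == ORDER BY time
```

This is why the timeline cache can use `tweet_id` as both member and sort key: ID order and time order coincide.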
Timeline Cache Structure
# Redis sorted set for each user's timeline
# Key:   timeline:{user_id}
# Score: tweet timestamp
# Value: tweet_id

# Add a tweet to a timeline
ZADD timeline:123 1705312800 "tweet_456789"

# Get the latest 50 tweets
ZREVRANGE timeline:123 0 49

# Trim old tweets (keep only the newest 800)
ZREMRANGEBYRANK timeline:123 0 -801

# Timeline structure (tweet_id -> timestamp score)
{
    "timeline:123": {
        "tweet_999": 1705312800,   # most recent
        "tweet_998": 1705312700,
        "tweet_997": 1705312600,
        ...                        # last 800 tweet IDs
    }
}

# Full tweet data is cached separately
{
    "tweet:999": {
        "user_id": 456,
        "content": "Hello world!",
        "created_at": 1705312800,
        "likes": 42
    }
}
Fan-out Service
class FanoutService:
    def __init__(self, follower_service, timeline_cache, message_queue):
        self.follower_service = follower_service
        self.timeline_cache = timeline_cache
        self.queue = message_queue

    def fanout_tweet(self, tweet):
        user_id = tweet.user_id

        # Check the follower count first -- don't materialize a
        # multi-million-entry follower list just to see if the user
        # is a celebrity
        if self.follower_service.get_follower_count(user_id) > 10_000:
            # Don't fan out for celebrities;
            # store the tweet in the celebrity tweets cache instead
            self.cache_celebrity_tweet(user_id, tweet)
            return

        # Fan out to all followers in batches
        followers = self.follower_service.get_followers(user_id)
        for batch in self.batch(followers, 1000):
            # Workers process batches in parallel
            self.queue.publish("fanout", {
                "tweet_id": tweet.id,
                "tweet_time": tweet.created_at,
                "follower_ids": batch
            })

    def process_fanout_batch(self, message):
        tweet_id = message["tweet_id"]
        tweet_time = message["tweet_time"]
        pipeline = self.timeline_cache.pipeline()
        for follower_id in message["follower_ids"]:
            # Add to the follower's timeline sorted set
            pipeline.zadd(
                f"timeline:{follower_id}",
                {tweet_id: tweet_time}
            )
            # Trim to the most recent 800 tweets
            pipeline.zremrangebyrank(f"timeline:{follower_id}", 0, -801)
        pipeline.execute()
Timeline Service
class TimelineService:
    def __init__(self, timeline_cache, tweet_service, follow_service):
        self.cache = timeline_cache
        self.tweet_service = tweet_service
        self.follow_service = follow_service

    def get_timeline(self, user_id, count=50, cursor=None):
        # 1. Get the pre-computed timeline (regular users' tweets)
        if cursor:
            # Exclusive bound "(" so the cursor tweet isn't returned twice
            timeline_ids = self.cache.zrevrangebyscore(
                f"timeline:{user_id}",
                f"({cursor}",
                "-inf",
                start=0,
                num=count
            )
        else:
            timeline_ids = self.cache.zrevrange(
                f"timeline:{user_id}",
                0,
                count - 1
            )

        # 2. Get celebrity tweets (fan-out on read)
        celebrity_followees = self.follow_service.get_celebrity_followees(user_id)
        celebrity_tweets = []
        for celeb_id in celebrity_followees:
            recent_tweets = self.tweet_service.get_recent_tweets(
                celeb_id,
                count=10
            )
            celebrity_tweets.extend(recent_tweets)

        # 3. Merge and sort
        all_tweet_ids = list(timeline_ids) + [t.id for t in celebrity_tweets]

        # Fetch full tweet objects
        tweets = self.tweet_service.get_tweets_batch(all_tweet_ids)

        # Sort by time and take the top N
        tweets.sort(key=lambda t: t.created_at, reverse=True)
        return tweets[:count]
Step 5: Search and Trending
Tweet Search with Elasticsearch
┌─────────────────────────────────────────────────────────────────┐
│ Search Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Tweet Created │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ Kafka │────►│ Consumer │────►│ Elasticsearch │ │
│ │ │ │ Service │ │ │ │
│ └──────────┘ └──────────────┘ └───────┬───────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ GET /tweets/_search │ │
│ │ { │ │
│ │ "query": { │ │
│ │ "match": { │ │
│ │ "content": "keyword" │ │
│ │ } │ │
│ │ } │ │
│ │ } │ │
│ └─────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
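The consumer service indexes each tweet into Elasticsearch. A plausible index mapping, with field names mirroring the tweets table (the field and format choices here are illustrative assumptions, not a prescribed schema):

```
PUT /tweets
{
  "mappings": {
    "properties": {
      "tweet_id":   { "type": "keyword" },
      "user_id":    { "type": "keyword" },
      "content":    { "type": "text" },
      "hashtags":   { "type": "keyword" },
      "created_at": { "type": "date", "format": "epoch_second" }
    }
  }
}
```

`content` is analyzed for full-text `match` queries, while `hashtags` stays a `keyword` so trending and exact-tag filters don't go through the analyzer.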
Trending Topics
import time

class TrendingService:
    def __init__(self, redis_client):
        self.redis = redis_client

    def track_hashtag(self, hashtag, location="global"):
        """Called when a tweet with a hashtag is posted"""
        current_window = self.get_current_window()
        # Increment the counter in the current time window
        self.redis.zincrby(
            f"trending:{location}:{current_window}",
            1,
            hashtag
        )

    def get_trending(self, location="global", count=10):
        """Get top trending hashtags"""
        # Get the last 5 time windows (5-minute windows)
        windows = self.get_recent_windows(5)

        # Merge counts from recent windows
        self.redis.zunionstore(
            f"trending:{location}:merged",
            [f"trending:{location}:{w}" for w in windows]
        )

        # Get top hashtags
        return self.redis.zrevrange(
            f"trending:{location}:merged",
            0,
            count - 1,
            withscores=True
        )

    def get_current_window(self):
        """Floor the current time to a 5-minute (300s) window boundary"""
        return int(time.time() // 300) * 300
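The same sliding-window mechanics can be shown without Redis, with plain Counters standing in for the per-window sorted sets (this is a self-contained illustration, not the service above):

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5-minute buckets, matching TrendingService

class LocalTrending:
    """In-memory stand-in for the Redis per-window sorted sets."""

    def __init__(self):
        self.windows = {}  # window start -> Counter of hashtag counts

    def _window(self, now):
        return now // WINDOW_SECONDS * WINDOW_SECONDS

    def track(self, hashtag, now):
        self.windows.setdefault(self._window(now), Counter())[hashtag] += 1

    def trending(self, now, lookback=5, count=10):
        # Merge the last `lookback` windows (the ZUNIONSTORE step)
        merged = Counter()
        for i in range(lookback):
            merged += self.windows.get(self._window(now) - i * WINDOW_SECONDS, Counter())
        return [tag for tag, _ in merged.most_common(count)]

t = LocalTrending()
for _ in range(3):
    t.track("#systemdesign", now=1000)
t.track("#python", now=1000)
t.track("#python", now=1290)
print(t.trending(now=1300))  # ['#systemdesign', '#python']
```

Fixed windows like this are the simplest choice; production trending systems typically also normalize against each hashtag's baseline volume so perennially popular tags don't always dominate.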
Step 6: Additional Components
Media Upload
┌─────────────────────────────────────────────────────────────────┐
│ Media Upload Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Client requests upload URL │
│ POST /api/v1/media/upload-url │
│ │
│ 2. Server returns pre-signed S3 URL │
│ { "upload_url": "https://s3.../presigned", "media_id": 123}│
│ │
│ 3. Client uploads directly to S3 │
│ PUT https://s3.../presigned │
│ │
│ 4. S3 triggers processing lambda │
│ - Generate thumbnails │
│ - Transcode video │
│ - CDN propagation │
│ │
│ 5. Client includes media_id in tweet │
│ POST /api/v1/tweets │
│ { "content": "Hello!", "media_ids": [123] } │
│ │
│ ┌────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Client │───►│ API │───►│ S3 │───►│ CDN │ │
│ └────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────┐ │
│ │ │ Lambda │ │
│ │ │ Process │ │
│ │ └─────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────┐ │
│ └────────►│ Media │ │
│ │ DB │ │
│ └─────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Notifications
class NotificationService:
    def __init__(self, push_service, websocket_service, email_service):
        self.push = push_service
        self.ws = websocket_service
        self.email = email_service

    def notify_mention(self, mentioned_user_id, tweet):
        notification = {
            "type": "mention",
            "from_user": tweet.user_id,
            "tweet_id": tweet.id,
            "content": f"@{tweet.user.username} mentioned you"
        }
        self.send_notification(mentioned_user_id, notification)

    def notify_like(self, tweet_owner_id, liker_id, tweet_id):
        # Batch likes to avoid notification spam:
        # "5 people liked your tweet"
        pass

    def send_notification(self, user_id, notification):
        # 1. Store in the notifications table
        self.store_notification(user_id, notification)

        # 2. Send real-time if the user is online
        if self.ws.is_connected(user_id):
            self.ws.send(user_id, notification)

        # 3. Send a push notification
        self.push.send(user_id, notification)
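`notify_like` is left as a stub above. One way to implement the batching it hints at is to buffer likes per tweet and flush on a timer; a sketch with an in-memory buffer (a real system would use Redis or a delayed queue, and the `send_fn` callback here is a hypothetical stand-in for `send_notification`):

```python
from collections import defaultdict

class LikeBatcher:
    """Buffer likes per (owner, tweet) and emit one summary notification per flush."""

    def __init__(self, send_fn):
        self.send_fn = send_fn
        self.pending = defaultdict(list)  # (owner_id, tweet_id) -> [liker_ids]

    def add_like(self, owner_id, liker_id, tweet_id):
        self.pending[(owner_id, tweet_id)].append(liker_id)

    def flush(self):
        """Called periodically (e.g. every 30s) by a background worker."""
        for (owner_id, tweet_id), likers in self.pending.items():
            n = len(likers)
            text = "1 person liked your tweet" if n == 1 else f"{n} people liked your tweet"
            self.send_fn(owner_id, {"type": "like", "tweet_id": tweet_id, "content": text})
        self.pending.clear()

sent = []
batcher = LikeBatcher(send_fn=lambda uid, n: sent.append((uid, n)))
for liker in range(5):
    batcher.add_like(owner_id=1, liker_id=liker, tweet_id="t9")
batcher.flush()
print(sent[0][1]["content"])  # 5 people liked your tweet
```

Five likes become one notification instead of five, which matters when a tweet goes viral and likes arrive thousands per minute.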
Final Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Complete Twitter Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ Clients │ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────▼───────┐ │
│ │ CDN │ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────▼───────┐ │
│ │ API Gateway │ │
│ └──────┬───────┘ │
│ │ │
│ ┌───────────────────────────┼───────────────────────────┐ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────────┐ ┌──────┐ ┌──────┐ │
│ │Tweet │ │ User │ │ Timeline │ │Search│ │Notif │ │
│ │ Svc │ │ Svc │ │ Svc │ │ Svc │ │ Svc │ │
│ └──┬───┘ └──┬───┘ └────┬─────┘ └──┬───┘ └──┬───┘ │
│ │ │ │ │ │ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────┐ ┌──────────┐ ┌───────┐ ┌─────────┐ │
│ │Tweet │ │ User │ │ Timeline │ │Elastic│ │ Push │ │
│ │ DB │ │ DB │ │ Cache │ │Search │ │ Service │ │
│ │ │ │ │ │ (Redis) │ │ │ │ │ │
│ └──────┘ └──────┘ └──────────┘ └───────┘ └─────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Kafka │ │
│ │ (tweets, fanout, notifications, search indexing) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ S3 + CDN (Media) │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Design Decisions
| Decision | Choice | Reasoning |
|---|---|---|
| Timeline | Hybrid fan-out | Balance between read/write costs |
| Tweet Storage | Cassandra | High write throughput, time-series friendly |
| User Data | PostgreSQL | ACID for user operations |
| Timeline Cache | Redis Sorted Sets | O(log n) insert, O(1) range reads |
| Search | Elasticsearch | Full-text search, real-time indexing |
| Media | S3 + CDN | Scalable object storage |
| Messaging | Kafka | Durable, high-throughput event streaming |
Common Interview Questions
How do you handle the celebrity problem?
Use hybrid approach: fan-out on write for regular users, fan-out on read for celebrities (>10K followers). Celebrity tweets are fetched at read time and merged with the pre-computed timeline.
How do you ensure timeline freshness?
- Pre-compute timelines via fan-out on write
- Short TTL on cache (5 minutes)
- Use WebSocket for real-time updates
- Periodic refresh on client side
How do you handle tweet deletions?
- Mark tweet as deleted in DB (soft delete)
- Async job to remove from all timeline caches
- Client-side filtering as backup
- Accept eventual consistency (deleted tweets may briefly appear)
How do you rank the timeline?
- Chronological (simple, predictable)
- Algorithmic ranking (engagement prediction)
- Hybrid: recent tweets chronologically, older by relevance
- Store ranking score in timeline cache
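The last option (storing a ranking score) can be made concrete with a simple engagement-times-recency formula; the weights and half-life below are illustrative assumptions, not Twitter's actual ranking model:

```python
import math

HALF_LIFE_SECONDS = 6 * 3600  # relevance halves every 6 hours (assumed)

def rank_score(likes, retweets, replies, age_seconds):
    """Engagement-weighted score with exponential time decay."""
    engagement = 1 + likes + 2 * retweets + 1.5 * replies  # illustrative weights
    decay = math.exp(-math.log(2) * age_seconds / HALF_LIFE_SECONDS)
    return engagement * decay

# A fresh low-engagement tweet can outrank an old viral one
fresh = rank_score(likes=2, retweets=0, replies=0, age_seconds=600)
viral = rank_score(likes=500, retweets=100, replies=50, age_seconds=72 * 3600)
print(fresh > viral)  # True
```

The score can be written as the sorted-set member's score at fan-out time and periodically recomputed, trading freshness of the ranking for cheap reads.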