Problem Statement
Design a ride-hailing service like Uber that:
- Matches riders with nearby drivers in real-time
- Tracks driver locations continuously
- Calculates fare estimates and handles payments
- Scales to millions of concurrent rides
This is a hard interview problem. Focus on real-time location tracking, efficient geo-matching, and the dispatch algorithm. The location update rate (every 4 seconds) creates significant scale challenges.
Step 1: Requirements
Functional Requirements
Rider Features
- Request a ride
- See nearby drivers
- Track ride in real-time
- Pay for ride
- Rate driver
Driver Features
- Go online/offline
- Accept/reject rides
- Navigate to pickup
- Navigate to destination
- View earnings
Non-Functional Requirements
- Low Latency: Match rider in < 10 seconds
- Real-time: Location updates every 4 seconds
- High Availability: 99.99% uptime
- Consistency: No double-booking of drivers
Capacity Estimation
┌─────────────────────────────────────────────────────────────────┐
│ Uber Scale │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Users: │
│ • 100 million monthly active riders │
│ • 5 million drivers │
│ • 20 million rides per day │
│ • 230 rides per second (peak: 500+ RPS) │
│ │
│ Location Updates: │
│ • 2 million active drivers (peak) │
│ • Update every 4 seconds │
│ • 2M / 4 = 500,000 updates per second! │
│ │
│ Data per update: │
│ • Driver ID (8 bytes) │
│ • Latitude (8 bytes) │
│ • Longitude (8 bytes) │
│ • Timestamp (8 bytes) │
│ • ~50 bytes per update │
│ • 500K × 50 = 25 MB/second of location data │
│ │
│ Storage (per day): │
│ • 25 MB × 60 × 60 × 24 = 2.16 TB/day of location data │
│ │
└─────────────────────────────────────────────────────────────────┘
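These figures are worth sanity-checking before moving on; a throwaway Python script (all inputs taken from the table above) reproduces them:

# Back-of-envelope check of the capacity numbers above
SECONDS_PER_DAY = 60 * 60 * 24

rides_per_sec = 20_000_000 / SECONDS_PER_DAY             # ~231 rides/second
updates_per_sec = 2_000_000 / 4                          # 500,000 updates/second
ingest_mb_per_sec = updates_per_sec * 50 / 1e6           # 25 MB/second
storage_tb_per_day = ingest_mb_per_sec * SECONDS_PER_DAY / 1e6   # ~2.16 TB/day

print(f"{rides_per_sec:.0f} rides/s, {updates_per_sec:,.0f} updates/s, "
      f"{ingest_mb_per_sec:.0f} MB/s, {storage_tb_per_day:.2f} TB/day")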
Step 2: High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│ Uber Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ │
│ │ Rider │ │ Driver │ │
│ │ App │ │ App │ │
│ └────┬────┘ └────┬────┘ │
│ │ │ │
│ │ WebSocket │ │
│ │ connection │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Load Balancer │ │
│ └─────────────────────┬───────────────────┘ │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ │ │ │ │
│ ┌────▼────┐ ┌─────▼─────┐ ┌─────▼─────┐ │
│ │ Trip │ │ Location │ │ Matching │ │
│ │ Service │ │ Service │ │ Service │ │
│ └────┬────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
│ │ │ │ │
│ ┌────▼────┐ ┌─────▼─────┐ ┌─────▼─────┐ │
│ │ Trip DB │ │ Redis │ │ Matching │ │
│ │(Postgres│ │(Location) │ │ Queue │ │
│ └─────────┘ └───────────┘ └───────────┘ │
│ │
│ Additional Services: │
│ • Payment Service • Notification Service │
│ • Pricing Service • Maps/ETA Service │
│ • User Service • Rating Service │
│ │
└─────────────────────────────────────────────────────────────────┘
Step 3: Location Tracking
The Challenge
500,000 location updates per second! How do we:
- Store current locations efficiently
- Query “drivers near point X” quickly
- Handle the write throughput
Solution: Geospatial Indexing
┌─────────────────────────────────────────────────────────────────┐
│ Location Storage │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Option 1: Geohash + Redis │
│ ───────────────────────── │
│ │
│ Geohash: Encode lat/lng into string │
│ • Nearby locations share prefix │
│ • "9q8yy" = San Francisco area │
│ • "9q8yyb" = smaller area within SF │
│ │
│ Redis Structure: │
│ ┌──────────────────────────────────────────────┐ │
│ │ GEOADD drivers <lng> <lat> driver_id │ │
│ │ GEOSEARCH drivers FROMLONLAT <lng> <lat> │ │
│ │ BYRADIUS 3 km ASC │ │
│ └──────────────────────────────────────────────┘ │
│ │
│ Option 2: QuadTree (In-Memory) │
│ ────────────────────────────── │
│ │
│ ┌─────────────────────┐ │
│ │ World Map │ │
│ └─────────┬───────────┘ │
│ │ │
│ ┌───────────┼───────────┐ │
│ │ │ │ │
│ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐ │
│ │ NW │ │ NE │ │ ...│ │
│ └──┬──┘ └─────┘ └─────┘ │
│ │ │
│ ┌───┼───┐ │
│ │ │ │ │
│ ┌▼┐ ┌▼┐ ┌▼┐ │
│ │ │ │ │ │ │ Keep splitting until │
│ └─┘ └─┘ └─┘ few drivers per cell │
│ │
│ Search: Find cell, check neighboring cells │
│ │
└─────────────────────────────────────────────────────────────────┘
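To make Option 2 concrete, here is a minimal QuadTree sketch. It is illustrative only: the capacity threshold and field names are assumptions, and a production version would also need the neighbor-cell search the diagram mentions, plus deletion and rebalancing as drivers move.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Point:
    driver_id: str
    lat: float
    lng: float

@dataclass
class QuadTree:
    """One cell covering [min_lat, max_lat] x [min_lng, max_lng]."""
    min_lat: float
    max_lat: float
    min_lng: float
    max_lng: float
    capacity: int = 4                       # split once a cell exceeds this
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadTree"]] = None

    def insert(self, p: Point) -> None:
        if self.children is not None:       # internal node: recurse
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity:
            self._split()                   # keep splitting until few per cell

    def _split(self) -> None:
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lng = (self.min_lng + self.max_lng) / 2
        self.children = [
            QuadTree(lo_lat, hi_lat, lo_lng, hi_lng, self.capacity)
            for lo_lat, hi_lat in ((self.min_lat, mid_lat), (mid_lat, self.max_lat))
            for lo_lng, hi_lng in ((self.min_lng, mid_lng), (mid_lng, self.max_lng))
        ]
        for p in self.points:               # push points down to the children
            self._child_for(p).insert(p)
        self.points = []

    def _child_for(self, p: Point) -> "QuadTree":
        mid_lat = (self.min_lat + self.max_lat) / 2
        mid_lng = (self.min_lng + self.max_lng) / 2
        return self.children[(p.lat >= mid_lat) * 2 + (p.lng >= mid_lng)]

Search descends to the cell containing the query point and then checks neighboring cells, exactly as the diagram describes.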
Location Service Implementation
import time
from typing import List, Tuple

class LocationService:
    """
    Handles 500K location updates per second.
    Uses Redis GEO commands for efficient geospatial queries.
    Keys are sharded by city so no single node takes the full write load.
    """

    def __init__(self, redis_cluster):
        self.redis = redis_cluster

    def get_city(self, lat: float, lng: float) -> str:
        """Resolve lat/lng to a city shard, e.g. "san_francisco".
        (Stub: in practice a coarse geohash or polygon lookup.)"""
        raise NotImplementedError

    def update_location(self, driver_id: str, lat: float, lng: float):
        """Update driver's current location"""
        city = self.get_city(lat, lng)
        # Redis GEOADD - O(log N); note redis-py takes (lng, lat, member)
        self.redis.geoadd(
            f"drivers:{city}:available",
            (lng, lat, driver_id)
        )
        # Also store in a hash for quick point lookups and staleness checks
        self.redis.hset(
            f"driver:{driver_id}",
            mapping={
                "lat": lat,
                "lng": lng,
                "last_updated": time.time()
            }
        )

    def find_nearby_drivers(
        self, lat: float, lng: float,
        radius_km: float = 3.0, limit: int = 10
    ) -> List[Tuple]:
        """Find available drivers within radius.
        Returns (driver_id, distance_km, (lng, lat)) tuples."""
        city = self.get_city(lat, lng)
        # Redis GEOSEARCH - O(N + log M), N = items inside the search area
        return self.redis.geosearch(
            f"drivers:{city}:available",
            longitude=lng,
            latitude=lat,
            radius=radius_km,
            unit="km",
            count=limit,
            sort="ASC",        # closest first
            withdist=True,
            withcoord=True
        )

    def mark_driver_unavailable(self, driver_id: str, city: str):
        """Remove driver from the available pool.
        GEO sets are sorted sets under the hood, so ZREM works."""
        self.redis.zrem(f"drivers:{city}:available", driver_id)
Step 4: Matching Algorithm
The Dispatch Problem
┌─────────────────────────────────────────────────────────────────┐
│ Matching Algorithm │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Goal: Match rider to best available driver │
│ │
│ Simple approach: Closest driver │
│ Better approach: Consider multiple factors │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Matching Score │ │
│ │ │ │
│ │ Score = w₁ × Distance + │ │
│ │ w₂ × ETA + │ │
│ │ w₃ × Driver Rating + │ │
│ │ w₄ × Driver Acceptance Rate + │ │
│ │ w₅ × Car Type Match + │ │
│ │ w₆ × Driver's Time Since Last Ride │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Process: │
│ ───────── │
│ 1. Rider requests ride │
│ 2. Find N nearest available drivers (N = 10-20) │
│ 3. Score each driver │
│ 4. Send request to top driver │
│ 5. Wait for response (15-20 seconds) │
│ 6. If rejected/timeout → try next driver │
│ 7. If all reject → expand radius and retry │
│ │
│ Uber's approach: Batched matching │
│ ───────────────────────────────────── │
│ Every 2 seconds, run matching for ALL pending requests │
│ Optimize globally (Hungarian algorithm) │
│ Better matches overall, slight delay acceptable │
│ │
└─────────────────────────────────────────────────────────────────┘
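The batched approach maps naturally onto an assignment problem. Here is a hedged sketch using SciPy's linear_sum_assignment (a Hungarian-style solver); the 2-second window and the ETA-based cost come from the description above, while the function names and signatures are assumptions:

import numpy as np
from scipy.optimize import linear_sum_assignment

def match_batch(requests: list, drivers: list, eta_fn) -> list:
    """Globally match one 2-second batch of pending requests to drivers.

    eta_fn(driver, request) -> estimated pickup time in seconds.
    Returns (request, driver) pairs minimizing total pickup ETA.
    """
    if not requests or not drivers:
        return []
    # Cost matrix: rows = requests, columns = drivers
    cost = np.array([[eta_fn(d, r) for d in drivers] for r in requests])
    # Optimal one-to-one assignment; handles rectangular matrices,
    # so leftover requests simply wait for the next batch
    rows, cols = linear_sum_assignment(cost)
    return [(requests[i], drivers[j]) for i, j in zip(rows, cols)]

Minimizing total pickup ETA across the whole batch avoids the greedy failure mode where one rider takes the only nearby driver that several other riders needed more.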
Matching Service
from typing import Optional

class MatchingService:
    """
    Matches riders with optimal drivers
    """

    def match_rider(self, ride_request: RideRequest) -> Optional[Driver]:
        # 1. Find nearby available drivers
        nearby_drivers = self.location_service.find_nearby_drivers(
            ride_request.pickup_lat,
            ride_request.pickup_lng,
            radius_km=5.0,
            limit=20
        )
        if not nearby_drivers:
            return None

        # 2. Score each driver
        scored_drivers = [
            (driver, self.calculate_match_score(ride_request, driver))
            for driver in nearby_drivers
        ]

        # 3. Sort by score (higher is better)
        scored_drivers.sort(key=lambda x: x[1], reverse=True)

        # 4. Offer the ride to each driver in order until one accepts
        for driver, score in scored_drivers:
            if self.send_ride_request(driver, ride_request):
                return driver
        return None

    def calculate_match_score(
        self, request: RideRequest, driver: Driver
    ) -> float:
        # ETA (minutes) from driver to pickup
        eta = self.maps_service.get_eta(
            driver.lat, driver.lng,
            request.pickup_lat, request.pickup_lng
        )

        # Weighted scoring (weights are tunable)
        score = 0.0
        score += 30 / max(eta, 1)              # lower ETA = higher score
        score += driver.rating * 20            # 4.9 rating = 98 points
        score += driver.acceptance_rate * 15   # 0.90 = 13.5 points
        score += driver.idle_time * 0.5        # fairness: favor idle drivers
        return score
Step 5: Trip State Machine
┌─────────────────────────────────────────────────────────────────┐
│ Trip State Machine │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ │
│ │REQUESTED │ ───Matched───► ┌──────────┐ │
│ └──────────┘ │ ACCEPTED │ │
│ │ └────┬─────┘ │
│ │ │ │
│ No match Driver arrives │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │CANCELLED │ │ ARRIVED │ │
│ └──────────┘ └────┬─────┘ │
│ ▲ │ │
│ │ Rider boards │
│ Cancel │ │
│ │ ▼ │
│ │ ┌──────────┐ │
│ └────────────────────│IN_PROGRESS│ │
│ └────┬─────┘ │
│ │ │
│ Reach destination │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │COMPLETED │ │
│ └────┬─────┘ │
│ │ │
│ Payment processed │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ PAID │ │
│ └──────────┘ │
│ │
│ Each state change: │
│ • Update database │
│ • Notify rider & driver via WebSocket │
│ • Emit event for analytics │
│ │
└─────────────────────────────────────────────────────────────────┘
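The diagram translates directly into a transition table that the Trip Service can enforce. A minimal sketch, assuming a ride may be cancelled any time before COMPLETED; the helper functions are stubs standing in for the three side effects listed above:

# Legal transitions from the diagram above; anything else is rejected.
TRANSITIONS = {
    "REQUESTED":   {"ACCEPTED", "CANCELLED"},   # no match -> CANCELLED
    "ACCEPTED":    {"ARRIVED", "CANCELLED"},
    "ARRIVED":     {"IN_PROGRESS", "CANCELLED"},
    "IN_PROGRESS": {"COMPLETED", "CANCELLED"},
    "COMPLETED":   {"PAID"},
    "PAID":        set(),                       # terminal state
}

def save_trip(trip): pass                  # stub: UPDATE trips SET status = ...
def push_to_rider_and_driver(trip): pass   # stub: WebSocket notification
def emit_event(name, trip): pass           # stub: publish to Kafka for analytics

def transition(trip, new_state: str) -> None:
    """Validate a state change, then fan out the three side effects above."""
    if new_state not in TRANSITIONS[trip.status]:
        raise ValueError(f"Illegal transition {trip.status} -> {new_state}")
    trip.status = new_state
    save_trip(trip)
    push_to_rider_and_driver(trip)
    emit_event("trip_state_changed", trip)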
Step 6: Pricing & Surge
Dynamic Pricing
┌─────────────────────────────────────────────────────────────────┐
│ Surge Pricing │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Why surge? Balance supply and demand │
│ │
│ Demand > Supply → Increase prices → More drivers come │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Supply/Demand by Area │ │
│ │ │ │
│ │ Area Riders Drivers Ratio Surge │ │
│ │ ───── ────── ─────── ───── ───── │ │
│ │ Downtown 500 100 5.0 3.0x │ │
│ │ Airport 200 80 2.5 1.75x │ │
│ │ Suburbs 50 60 0.83 1.0x │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Surge calculation: │
│ │
│ demand_ratio = active_requests / available_drivers │
│ │
│ if demand_ratio <= 1.0: │
│ surge = 1.0 │
│ elif demand_ratio <= 2.0: │
│ surge = 1.0 + (demand_ratio - 1) * 0.5 # 1.0 - 1.5x │
│ elif demand_ratio <= 4.0: │
│ surge = 1.5 + (demand_ratio - 2) * 0.5 # 1.5 - 2.5x │
│ else: │
│ surge = min(2.5 + (demand_ratio - 4) * 0.5, 5.0) # Cap at 5x │
│ │
└─────────────────────────────────────────────────────────────────┘
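The piecewise formula translates directly to code; a small sketch with sanity checks against the supply/demand table above:

def surge_multiplier(active_requests: int, available_drivers: int) -> float:
    """Piecewise-linear surge from the supply/demand ratio, capped at 5x."""
    if available_drivers == 0:
        return 5.0                            # no supply at all: maximum surge
    r = active_requests / available_drivers
    if r <= 1.0:
        return 1.0
    if r <= 2.0:
        return 1.0 + (r - 1.0) * 0.5          # 1.0x - 1.5x
    if r <= 4.0:
        return 1.5 + (r - 2.0) * 0.5          # 1.5x - 2.5x
    return min(2.5 + (r - 4.0) * 0.5, 5.0)    # continues linearly, cap at 5x

# Sanity checks against the table above
assert surge_multiplier(50, 60) == 1.0       # Suburbs
assert surge_multiplier(200, 80) == 1.75     # Airport
assert surge_multiplier(500, 100) == 3.0     # Downtown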
Fare Calculation
class PricingService:
    def calculate_fare(
        self,
        pickup: Location,
        dropoff: Location,
        car_type: str = "UberX"
    ) -> FareEstimate:
        # Get route details
        route = self.maps_service.get_route(pickup, dropoff)
        distance_km = route.distance_km
        duration_min = route.duration_min

        # Base fare components
        rates = self.get_rates(car_type)
        base_fare = rates.base_fare                  # $2.50
        distance_fare = distance_km * rates.per_km   # $1.50/km
        time_fare = duration_min * rates.per_min     # $0.20/min
        booking_fee = rates.booking_fee              # $2.00
        subtotal = base_fare + distance_fare + time_fare + booking_fee

        # Apply surge
        surge_multiplier = self.get_surge_multiplier(pickup)
        total = subtotal * surge_multiplier

        # Apply minimum fare
        total = max(total, rates.minimum_fare)

        return FareEstimate(
            low=total * 0.9,
            high=total * 1.1,
            surge=surge_multiplier,
            breakdown={
                "base": base_fare,
                "distance": distance_fare,
                "time": time_fare,
                "booking_fee": booking_fee,
                "surge": surge_multiplier
            }
        )
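As a worked example under the sample rates in the comments: a 10 km, 20-minute UberX trip with no surge comes to $2.50 + 10 × $1.50 + 20 × $0.20 + $2.00 = $23.50, which the rider would see as an estimate of roughly $21-$26 after the ±10% band.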
Step 7: Data Models
-- Users table
CREATE TABLE users (
id UUID PRIMARY KEY,
type VARCHAR(10), -- 'rider' or 'driver'
name VARCHAR(100),
email VARCHAR(100) UNIQUE,
phone VARCHAR(20) UNIQUE,
rating DECIMAL(3,2) DEFAULT 5.0,
created_at TIMESTAMP
);
-- Drivers additional info
CREATE TABLE driver_profiles (
user_id UUID PRIMARY KEY REFERENCES users(id),
license_number VARCHAR(50),
car_model VARCHAR(100),
car_plate VARCHAR(20),
car_type VARCHAR(20), -- 'UberX', 'UberXL', 'Black'
is_online BOOLEAN DEFAULT FALSE,
acceptance_rate DECIMAL(3,2) DEFAULT 1.0
);
-- Trips table
CREATE TABLE trips (
id UUID PRIMARY KEY,
rider_id UUID REFERENCES users(id),
driver_id UUID REFERENCES users(id),
status VARCHAR(20),
pickup_lat DECIMAL(10, 7),
pickup_lng DECIMAL(10, 7),
pickup_address VARCHAR(255),
dropoff_lat DECIMAL(10, 7),
dropoff_lng DECIMAL(10, 7),
dropoff_address VARCHAR(255),
estimated_fare DECIMAL(10, 2),
actual_fare DECIMAL(10, 2),
surge_multiplier DECIMAL(3, 2) DEFAULT 1.0,
distance_km DECIMAL(8, 2),
duration_min INT,
requested_at TIMESTAMP,
accepted_at TIMESTAMP,
started_at TIMESTAMP,
completed_at TIMESTAMP,
rider_rating INT,
driver_rating INT
);
-- Location history (for route reconstruction, ETAs)
CREATE TABLE location_history (
driver_id UUID,
timestamp TIMESTAMP,
lat DECIMAL(10, 7),
lng DECIMAL(10, 7),
PRIMARY KEY (driver_id, timestamp)
);
-- Use TimescaleDB for efficient time-series storage:
-- convert the table into a hypertable partitioned on timestamp
SELECT create_hypertable('location_history', 'timestamp');
Final Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Complete Uber Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Rider App │ │ Driver App │ │
│ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │
│ │ WebSocket │ │
│ └────────────┬───────────────┘ │
│ │ │
│ ┌─────────▼─────────┐ │
│ │ Load Balancer │ │
│ └─────────┬─────────┘ │
│ │ │
│ ┌─────────┬─────────┼─────────┬─────────┬─────────┐ │
│ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Trip │ │Location│ │ Match │ │ Price │ │Payment │ │ User │ │
│ │ Svc │ │ Svc │ │ Svc │ │ Svc │ │ Svc │ │ Svc │ │
│ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ └───┬────┘ │
│ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Postgres│ │ Redis │ │ Kafka │ │ Cache │ │ Stripe │ │Postgres│ │
│ │(Trips) │ │ (Geo) │ │ │ │ │ │ │ │(Users) │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │ │
│ ┌──────────▼──────────┐ │
│ │ Maps Service │ │
│ │ (ETA, Directions) │ │
│ │ Google/Mapbox API │ │
│ └─────────────────────┘ │
│ │
│ Real-time Push: │
│ • WebSocket for rider/driver updates │
│ • Location updates every 4 seconds │
│ • Trip status changes pushed instantly │
│ │
└─────────────────────────────────────────────────────────────────┘
Key Design Decisions
| Decision | Choice | Reasoning |
|---|---|---|
| Location Store | Redis + Geo commands | Fast writes, efficient geo queries |
| Matching | Score-based + batch | Balance speed and optimality |
| Real-time | WebSockets | Bidirectional, low latency |
| Trip DB | PostgreSQL | ACID for payments, consistency |
| Analytics | Kafka + TimescaleDB | Time-series location data |
| Maps | External API | Don’t reinvent Google Maps |
Common Interview Questions
How do you ensure a driver isn't assigned to two rides?
- Use a distributed lock when assigning (Redis SETNX; see the sketch after this list)
- Mark driver unavailable atomically before sending request
- Use database transaction with driver status check
- If request times out, release lock and mark available again
- Idempotency key for each matching attempt
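A hedged sketch of the locking step with redis-py; the key naming and 30-second TTL are assumptions, and SET ... NX EX is the modern atomic form of the SETNX pattern:

import uuid
import redis

r = redis.Redis()

def try_assign_driver(driver_id: str, ride_id: str, ttl_s: int = 30) -> bool:
    """Atomically claim a driver for one matching attempt.
    Returns False if another request already holds this driver."""
    token = f"{ride_id}:{uuid.uuid4()}"    # idempotency key for this attempt
    # SET key value NX EX ttl: succeeds only if the key does not exist,
    # and auto-expires so a crashed matcher cannot leak the lock forever.
    return r.set(f"lock:driver:{driver_id}", token, nx=True, ex=ttl_s) is True

def release_driver(driver_id: str) -> None:
    """On reject/timeout: release the lock so the driver is matchable again."""
    r.delete(f"lock:driver:{driver_id}")

A production version would release the lock via a Lua script that checks the stored token first, so one matcher cannot delete another matcher's lock.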
How do you handle driver going offline mid-ride?
- Detect: no location updates for 30+ seconds (see the sketch after this list)
- Notify rider: “Driver connectivity issue”
- If prolonged: auto-cancel and rematch
- Trip marked as “interrupted”; rider not charged
- Support ticket auto-created
- Driver rating affected if it happens frequently
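One way to implement the detection step, reusing the last_updated heartbeat that LocationService.update_location already writes (the 30-second threshold is from the list above):

import time

STALE_AFTER_S = 30   # matches the 30-second detection window above

def is_driver_offline(redis_client, driver_id: str) -> bool:
    """Presume offline if the driver's last_updated heartbeat has gone stale."""
    last = redis_client.hget(f"driver:{driver_id}", "last_updated")
    if last is None:
        return True
    return time.time() - float(last) > STALE_AFTER_S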
How do you scale location updates?
- Shard Redis by city/region
- Use Redis Cluster for horizontal scaling
- Client-side throttling: skip the update if the driver moved < 10 m (see the sketch after this list)
- UDP for location updates (some loss OK)
- Batch updates on server side
- Drop stale updates (older than 10 seconds)
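The throttling item can be sketched client-side with a haversine check; a minimal version where the 10 m threshold comes from the list above and the (lat, lng) tuple format is an assumption:

import math

MIN_MOVE_M = 10.0    # threshold from the list above

def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two lat/lng points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

def should_send_update(last_sent, current) -> bool:
    """Client-side throttle: skip the update if the driver barely moved.
    last_sent/current: (lat, lng) tuples; last_sent is None on the first fix."""
    if last_sent is None:
        return True
    return haversine_m(*last_sent, *current) >= MIN_MOVE_M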
How does ETA work?
- Get route from Maps API (Google/Mapbox)
- Consider current traffic conditions
- Historical data for time-of-day patterns
- Adjust for driver behavior (driving style)
- Update ETA as driver moves
- Cache common routes
How do you prevent fraud?
- GPS spoofing detection: flag impossible movements (see the sketch after this list)
- Verify rider/driver at same location at pickup
- Monitor unusual patterns (circular routes)
- Photo verification at start
- ML model for anomaly detection
- Manual review queue for flagged rides
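A minimal sketch of the impossible-movement check, reusing haversine_m from the throttling sketch above; the speed cap and tuple format are assumptions:

MAX_PLAUSIBLE_SPEED_MPS = 70.0   # ~250 km/h, generous for any car

def looks_spoofed(prev, curr) -> bool:
    """Flag an update that implies an impossible speed between two fixes.
    prev/curr: (lat, lng, unix_ts) tuples."""
    dt = curr[2] - prev[2]
    if dt <= 0:
        return True              # out-of-order or duplicated timestamp
    speed_mps = haversine_m(prev[0], prev[1], curr[0], curr[1]) / dt
    return speed_mps > MAX_PLAUSIBLE_SPEED_MPS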