Networking Protocols for System Design

Why Networking Matters

Every distributed system communicates over networks. Understanding networking is crucial for:
  • Latency optimization - Where does delay come from?
  • Protocol selection - HTTP vs WebSocket vs gRPC
  • Debugging issues - Why is my API slow?
  • Security design - TLS, firewalls, VPNs

The OSI Model (Simplified)

┌────────────────────────────────────────────────────────────────┐
│                      OSI Model (7 Layers)                       │
├─────────┬──────────────────────────────────────────────────────┤
│ Layer 7 │ Application  │ HTTP, HTTPS, WebSocket, gRPC, DNS    │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 6 │ Presentation │ SSL/TLS, Encryption, Compression     │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 5 │ Session      │ Session management, Authentication   │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 4 │ Transport    │ TCP, UDP                              │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 3 │ Network      │ IP, Routing                           │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 2 │ Data Link    │ Ethernet, MAC addresses              │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 1 │ Physical     │ Cables, Radio waves                  │
└─────────┴──────────────────────────────────────────────────────┘

For system design, focus on Layers 4 and 7

DNS (Domain Name System)

DNS translates human-readable domain names to IP addresses.

DNS Resolution Flow

User types: www.example.com


    ┌──────────────────┐
    │  Browser Cache   │ ← Check local cache first
    └────────┬─────────┘
             │ Cache miss

    ┌──────────────────┐
    │    OS Cache      │ ← Check OS DNS cache
    └────────┬─────────┘
             │ Cache miss

    ┌──────────────────┐
    │ Recursive DNS    │ ← ISP's DNS resolver
    │    Resolver      │
    └────────┬─────────┘

   ┌─────────┴─────────┐
   ▼                   ▼
┌──────┐         ┌──────────┐         ┌──────────┐
│ Root │────────►│   TLD    │────────►│Authorit- │
│Server│         │(.com,.io)│         │ative DNS │
└──────┘         └──────────┘         └──────────┘
   │                  │                    │
   │ "Ask .com TLD"   │ "Ask ns.example"  │ "IP: 93.184.216.34"
   └──────────────────┴────────────────────┘
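
The lookup order in the diagram can be sketched as a pair of TTL caches in front of a resolver. This is a minimal illustration, not how any particular browser or OS implements it: `TTLCache`, `resolve`, and the injected `recursive_resolver` callable are hypothetical names, and the root → TLD → authoritative walk is assumed to happen inside that callable.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Maps hostname -> (ip, expiry time); entries expire after their TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: treat as a cache miss
            return None
        return ip

    def put(self, name: str, ip: str, ttl: float) -> None:
        self._entries[name] = (ip, self._clock() + ttl)


def resolve(
    name: str,
    browser_cache: TTLCache,
    os_cache: TTLCache,
    recursive_resolver: Callable[[str], str],
    ttl: float = 300.0,
) -> Tuple[str, str]:
    """Return (ip, source), checking browser cache, OS cache, then the resolver."""
    ip = browser_cache.get(name)
    if ip is not None:
        return ip, "browser"
    ip = os_cache.get(name)
    if ip is not None:
        return ip, "os"
    # Cache miss everywhere: the recursive resolver does the full lookup
    ip = recursive_resolver(name)
    browser_cache.put(name, ip, ttl)
    os_cache.put(name, ip, ttl)
    return ip, "resolver"
```

Passing a fake clock and a fake resolver makes the behavior easy to observe: the first call reports `"resolver"`, repeated calls within the TTL report `"browser"`, and a call after expiry goes back to the resolver.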

DNS Record Types

Record | Purpose                  | Example
A      | Maps domain to IPv4      | example.com → 93.184.216.34
AAAA   | Maps domain to IPv6      | example.com → 2606:2800:220:1:...
CNAME  | Alias to another domain  | www.example.com → example.com
MX     | Mail server              | example.com → mail.example.com
TXT    | Text data (verification) | SPF, DKIM records
NS     | Nameserver               | example.com → ns1.provider.com
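
To make the CNAME row concrete, here is a small sketch of how an alias is followed until an A record is found. The `RECORDS` dictionary is hypothetical zone data that mirrors the table's examples, and `resolve_a` is an illustrative helper rather than a real resolver API.

```python
from typing import Dict, List, Tuple

# Hypothetical zone data mirroring the examples in the table above
RECORDS: Dict[Tuple[str, str], List[str]] = {
    ("www.example.com", "CNAME"): ["example.com"],
    ("example.com", "A"): ["93.184.216.34"],
    ("example.com", "MX"): ["mail.example.com"],
    ("example.com", "NS"): ["ns1.provider.com"],
}


def resolve_a(
    name: str,
    records: Dict[Tuple[str, str], List[str]] = RECORDS,
    max_hops: int = 8,
) -> List[str]:
    """Follow CNAME aliases until an A record is found; [] if none exists."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        alias = records.get((name, "CNAME"))
        if alias is None:
            return []
        name = alias[0]  # continue the lookup at the alias target
    raise RuntimeError("CNAME chain too long (possible loop)")
```

The hop limit is a design choice for the sketch: it keeps a misconfigured alias loop from running forever.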

DNS in System Design

┌─────────────────────────────────────────────────────────────────┐
│                    DNS-Based Load Balancing                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   DNS Query: api.example.com                                    │
│        │                                                        │
│        ▼                                                        │
│   ┌──────────────────────────────────────┐                     │
│   │         DNS Round Robin               │                     │
│   │   Returns different IPs each time     │                     │
│   └──────────────────────────────────────┘                     │
│        │                                                        │
│   ┌────┴────┬────────────────┐                                 │
│   ▼         ▼                ▼                                 │
│ Server 1  Server 2        Server 3                             │
│ 1.1.1.1   2.2.2.2         3.3.3.3                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Pros: Simple, no extra infrastructure
Cons: No health checks, TTL caching delays failover
DNS TTL Matters:
  • Low TTL (60s): faster failover, more DNS queries
  • High TTL (3600s): better caching, slower failover
  • Common default: 300s (5 min)
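
The interaction between round robin and TTL can be simulated in a few lines. This is a sketch under simplifying assumptions: `RoundRobinDNS` rotates its answer on every query, and `CachingClient` always uses the first IP and holds it for the full TTL. Both class names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RoundRobinDNS:
    """Returns the record set rotated by one position on every query."""

    def __init__(self, ips: List[str]):
        self._ips: Deque[str] = deque(ips)

    def query(self) -> List[str]:
        answer = list(self._ips)
        self._ips.rotate(-1)  # next query starts with the next server
        return answer


class CachingClient:
    """Uses the first IP of an answer and caches it for `ttl` seconds."""

    def __init__(self, dns: RoundRobinDNS, ttl: float):
        self.dns = dns
        self.ttl = ttl
        self._cached: Optional[Tuple[str, float]] = None

    def pick_server(self, now: float) -> str:
        if self._cached is not None and now < self._cached[1]:
            return self._cached[0]  # still within TTL: no new DNS query
        ip = self.dns.query()[0]
        self._cached = (ip, now + self.ttl)
        return ip
```

With a 300s TTL, a client that resolved 1.1.1.1 at time 0 keeps using it until time 300 even if that server fails at time 10, which is the failover delay listed under Cons.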

TCP vs UDP

TCP (Transmission Control Protocol)

┌─────────────────────────────────────────────────────────────────┐
│                    TCP Three-Way Handshake                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│     Client                              Server                  │
│        │                                   │                    │
│        │──────── SYN (seq=x) ─────────────►│                   │
│        │                                   │                    │
│        │◄────── SYN-ACK (seq=y, ack=x+1) ──│                   │
│        │                                   │                    │
│        │──────── ACK (ack=y+1) ────────────►│                   │
│        │                                   │                    │
│        │◄═══════ Connection Established ═══►│                   │
│        │                                   │                    │
│        │←───────── Data Transfer ──────────→│                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
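
The sequence and acknowledgement numbers in the diagram follow a simple rule: each side acknowledges the other's initial sequence number plus one. A small sketch, where `three_way_handshake` is a hypothetical helper that only models the numbers, not real packets:

```python
from typing import List, Optional, Tuple

Segment = Tuple[str, int, Optional[int]]  # (flags, seq, ack)


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments for the given initial sequence numbers."""
    mod = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around
    syn = ("SYN", client_isn, None)
    syn_ack = ("SYN-ACK", server_isn, (client_isn + 1) % mod)
    ack = ("ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]
```

For example, with `x = 100` and `y = 500` the server replies with `ack=101` and the client finishes with `ack=501`.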

TCP vs UDP Comparison

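Feature     | TCP                                    | UDP
Connection  | Connection-oriented                    | Connectionless
Reliability | Reliable (acknowledged, retransmitted) | Best effort
Ordering    | In-order delivery                      | No ordering
Speed       | Slower (handshake, acknowledgements)   | Faster (no handshake or retransmission)
Use Case    | HTTP/1.1 and HTTP/2, databases         | Live video/audio, gaming, DNS
Header Size | 20-60 bytes                            | 8 bytes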

When to Use What

Use TCP

  • Web applications (HTTP/HTTPS)
  • File transfers
  • Database connections
  • Email (SMTP, IMAP)
  • When data integrity matters

Use UDP

  • Live video/audio streaming
  • Online gaming
  • DNS queries
  • IoT sensors
  • When speed > reliability
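
The difference is visible in the socket API itself. The following sketch sends `b"ping"` over loopback both ways; the function names are hypothetical and the code assumes the loopback interface is available. With TCP, `create_connection` completes the handshake before any data moves. With UDP, `sendto` simply emits a datagram.

```python
import socket
import threading


def tcp_echo_once() -> bytes:
    """Connection-oriented: the client connects (handshake) before sending."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()
        with conn:
            conn.sendall(conn.recv(1024))  # echo back what was received

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
        client.sendall(b"ping")
        reply = client.recv(1024)
    worker.join(timeout=2)
    server.close()
    return reply


def udp_send_once() -> bytes:
    """Connectionless: a datagram is sent with no handshake and no delivery guarantee."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2)  # UDP gives no delivery signal, so bound the wait
    port = receiver.getsockname()[1]
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"ping", ("127.0.0.1", port))
        data, _ = receiver.recvfrom(1024)
    finally:
        sender.close()
        receiver.close()
    return data
```

Note that the UDP receiver needs its own timeout: nothing in the protocol tells the application that a datagram was lost.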

HTTP/HTTPS

HTTP Request/Response

┌─────────────────────────────────────────────────────────────────┐
│                    HTTP Request                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  GET /api/users/123 HTTP/1.1                                   │
│  Host: api.example.com                                          │
│  Authorization: Bearer eyJhbGciOiJIUzI1...                     │
│  Content-Type: application/json                                 │
│  Accept: application/json                                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    HTTP Response                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  HTTP/1.1 200 OK                                                │
│  Content-Type: application/json                                 │
│  Cache-Control: max-age=3600                                    │
│  X-RateLimit-Remaining: 99                                      │
│                                                                 │
│  {                                                              │
│    "id": 123,                                                   │
│    "name": "John Doe",                                          │
│    "email": "john@example.com"                                  │
│  }                                                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
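
On the wire, both messages are plain text lines separated by CRLF, with a blank line before the body. The sketch below serializes a request and parses a response by hand to show that structure. `build_request` and `parse_response` are hypothetical helpers for illustration; real clients should use a library such as `http.client`.

```python
from typing import Dict, Tuple


def build_request(method: str, path: str, headers: Dict[str, str]) -> bytes:
    """Serialize a body-less HTTP/1.1 request: request line, headers, blank line."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Split a response into (status code, lower-cased headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)  # version, code, optional reason phrase
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(parts[1]), headers, body
```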

HTTP Methods

Method  | Purpose             | Idempotent | Safe
GET     | Retrieve resource   | Yes        | Yes
POST    | Create resource     | No         | No
PUT     | Replace resource    | Yes        | No
PATCH   | Partial update      | No         | No
DELETE  | Remove resource     | Yes        | No
HEAD    | Get headers only    | Yes        | Yes
OPTIONS | Get allowed methods | Yes        | Yes
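
Idempotency matters most when a client retries after a timeout. A sketch with a hypothetical in-memory `UserStore` shows why retrying PUT or DELETE is harmless while retrying POST is not:

```python
import itertools
from typing import Dict


class UserStore:
    """In-memory stand-in for a resource collection."""

    def __init__(self) -> None:
        self.users: Dict[int, dict] = {}
        self._ids = itertools.count(1)

    def post(self, data: dict) -> int:
        """POST: every call creates a new resource, so a retry duplicates it."""
        new_id = next(self._ids)
        self.users[new_id] = dict(data)
        return new_id

    def put(self, user_id: int, data: dict) -> None:
        """PUT: replaces the resource; repeating the call leaves the same state."""
        self.users[user_id] = dict(data)

    def delete(self, user_id: int) -> None:
        """DELETE: after the first call, repeats change nothing."""
        self.users.pop(user_id, None)
```

This is why APIs that must accept retried POSTs often add an idempotency key, which is outside the scope of this sketch.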

HTTP Status Codes

┌───────────────────────────────────────────────────────────────┐
│                    HTTP Status Codes                          │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  1xx Informational  │  100 Continue, 101 Switching Protocols │
│  ───────────────────┼───────────────────────────────────────│
│  2xx Success        │  200 OK                                │
│                     │  201 Created                           │
│                     │  204 No Content                        │
│  ───────────────────┼───────────────────────────────────────│
│  3xx Redirection    │  301 Moved Permanently                 │
│                     │  302 Found (temporary)                 │
│                     │  304 Not Modified                      │
│  ───────────────────┼───────────────────────────────────────│
│  4xx Client Error   │  400 Bad Request                       │
│                     │  401 Unauthorized                      │
│                     │  403 Forbidden                         │
│                     │  404 Not Found                         │
│                     │  429 Too Many Requests                 │
│  ───────────────────┼───────────────────────────────────────│
│  5xx Server Error   │  500 Internal Server Error             │
│                     │  502 Bad Gateway                       │
│                     │  503 Service Unavailable               │
│                     │  504 Gateway Timeout                   │
│                                                               │
└───────────────────────────────────────────────────────────────┘
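
Clients usually branch on the status class first and on specific codes second. The sketch below shows one possible policy. The `RETRYABLE` set is an assumption about a common client convention, not something the HTTP specification mandates.

```python
# Assumption: a common client policy treats these as transient and retryable
RETRYABLE = {429, 502, 503, 504}

_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client_error",
    5: "server_error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return _CLASSES[code // 100]


def should_retry(code: int) -> bool:
    """True for codes that usually signal a temporary condition."""
    return code in RETRYABLE
```

Under this policy a 404 or 400 is never retried, because repeating the same request would fail the same way.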

HTTP/1.1 vs HTTP/2 vs HTTP/3

HTTP/1.1                    HTTP/2                     HTTP/3
┌─────────────────┐        ┌─────────────────┐       ┌─────────────────┐
│ Request 1       │        │ ┌───┬───┬───┐   │       │ ┌───┬───┬───┐   │
│ ─────────────── │        │ │ 1 │ 2 │ 3 │   │       │ │ 1 │ 2 │ 3 │   │
│ Response 1      │        │ └───┴───┴───┘   │       │ └───┴───┴───┘   │
│ ═══════════════ │        │  Multiplexed    │       │  Over QUIC     │
│ Request 2       │        │  Binary frames  │       │  (UDP based)    │
│ ─────────────── │        │  on single TCP  │       │  0-RTT resume   │
│ Response 2      │        │                 │       │                 │
│ ═══════════════ │        │ Server Push     │       │ No head-of-line │
│ (Sequential)    │        │ Header Compress │       │  blocking       │
└─────────────────┘        └─────────────────┘       └─────────────────┘

6 connections max          Single connection         Faster, resilient
Head-of-line blocking      per origin                to network changes

HTTPS/TLS Handshake

┌─────────────────────────────────────────────────────────────────┐
│                    TLS 1.3 Handshake                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Client                                         Server          │
│     │                                              │            │
│     │──── ClientHello + Key Share ────────────────►│           │
│     │     (Supported ciphers, client random)       │            │
│     │                                              │            │
│     │◄─── ServerHello + Key Share + Certificate ──│           │
│     │     (Selected cipher, server random, cert)   │            │
│     │                                              │            │
│     │     [Both compute shared secret]             │            │
│     │                                              │            │
│     │◄════════ Encrypted Application Data ════════►│           │
│     │                                              │            │
│                                                                 │
│  TLS 1.3: 1-RTT handshake (down from 2-RTT in TLS 1.2)        │
│  0-RTT resumption for returning clients                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
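
In Python, a client can require TLS 1.3 through the standard `ssl` module. This is a minimal sketch: `make_tls13_client_context` is a hypothetical helper name, and it assumes the local OpenSSL build supports TLS 1.3.

```python
import ssl


def make_tls13_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context()  # certificate and hostname checks enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

The context is then used with `ctx.wrap_socket(sock, server_hostname=...)`. Pinning the minimum to 1.3 will fail against servers that only offer TLS 1.2, so whether to do it depends on which servers the client must reach.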

WebSockets

WebSocket vs HTTP

HTTP (Request-Response)              WebSocket (Bidirectional)

Client          Server               Client          Server
   │──── GET ─────►│                    │──── Upgrade ──►│
   │◄─── 200 ──────│                    │◄─── 101 ───────│
   │               │                    │                │
   │──── GET ─────►│                    │◄══════════════►│
   │◄─── 200 ──────│                    │  Full-duplex   │
   │               │                    │  connection    │
   │ (Poll again)  │                    │                │
   │──── GET ─────►│                    │◄══════════════►│
   │◄─── 200 ──────│                    │                │

Each poll = full request   Single persistent connection
and header overhead        Low latency, real-time
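
The `Upgrade` exchange is an ordinary HTTP request whose `101` response must carry a `Sec-WebSocket-Accept` header derived from the client's `Sec-WebSocket-Key`. RFC 6455 defines the derivation as SHA-1 over the key concatenated with a fixed GUID, then Base64. A short sketch, where the function name `websocket_accept` is ours:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Using the sample key from the RFC, `websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` returns `"s3pPLMBiTxaQ9kYGzzhZRbK+xOo="`. The check proves the server understood the WebSocket handshake; it is not a security mechanism.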

WebSocket Use Cases

  • Real-time chat: WhatsApp, Slack, Discord (instant message delivery)
  • Live updates: stock prices, sports scores, notifications
  • Gaming: multiplayer games, real-time player positions
  • Collaboration: Google Docs, Figma (live editing)

WebSocket Scaling Challenge

┌─────────────────────────────────────────────────────────────────┐
│                WebSocket Scaling with Pub/Sub                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│     User A                                       User B         │
│        │                                            │           │
│        │ WS                                      WS │           │
│        ▼                                            ▼           │
│   ┌─────────┐                                  ┌─────────┐     │
│   │Server 1 │                                  │Server 2 │     │
│   └────┬────┘                                  └────┬────┘     │
│        │                                            │           │
│        └──────────────┬─────────────────────────────┘           │
│                       │                                         │
│                ┌──────▼──────┐                                 │
│                │    Redis    │  Pub/Sub for cross-server       │
│                │   Pub/Sub   │  message broadcasting           │
│                └─────────────┘                                 │
│                                                                 │
│  Problem: User A on Server 1 messages User B on Server 2       │
│  Solution: Publish to Redis, all servers subscribe             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
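
The fan-out idea can be shown without Redis at all. In the sketch below, `InMemoryBroker` is a hypothetical stand-in for Redis Pub/Sub and `ChatServer` stands in for a WebSocket server process. A message sent on Server 1 reaches a user connected to Server 2 because both servers subscribe to the same broker.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: delivers each message to every subscriber."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subs[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        for callback in self._subs[channel]:
            callback(message)
        return len(self._subs[channel])  # number of subscribers reached


class ChatServer:
    """One server process holding WebSocket connections for some users."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.inboxes: Dict[str, List[str]] = {}  # user -> messages delivered locally

    def connect(self, user: str) -> None:
        self.inboxes[user] = []
        # Subscribe so messages published on any server reach this user
        self.broker.subscribe(f"user:{user}", self.inboxes[user].append)

    def send(self, to_user: str, message: str) -> int:
        # The sender's server does not need to know where the recipient is
        return self.broker.publish(f"user:{to_user}", message)
```

The sending server never looks up which server holds the recipient. That is the property that lets WebSocket servers scale horizontally behind a load balancer.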

WebSocket Implementation

A WebSocket server sketch with connection management and Redis pub/sub fan-out:
import asyncio
import json
from typing import Dict, Set, Optional, Any
from dataclasses import dataclass, field
from datetime import datetime
import uuid
import redis.asyncio as aioredis  # aioredis was merged into redis-py (4.2+)
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from contextlib import asynccontextmanager

@dataclass
class Connection:
    """Represents a WebSocket connection"""
    websocket: WebSocket
    user_id: str
    connection_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    connected_at: datetime = field(default_factory=datetime.utcnow)
    subscriptions: Set[str] = field(default_factory=set)
    metadata: Dict[str, Any] = field(default_factory=dict)

class ConnectionManager:
    """
    Production WebSocket connection manager with Redis pub/sub
    for multi-server deployments
    """
    
    def __init__(self, redis_url: str = "redis://localhost"):
        self.redis_url = redis_url
        self.connections: Dict[str, Connection] = {}
        self.user_connections: Dict[str, Set[str]] = {}
        self.room_connections: Dict[str, Set[str]] = {}
        self.redis: Optional[aioredis.Redis] = None
        self.pubsub: Optional[aioredis.client.PubSub] = None
    
    async def initialize(self) -> None:
        """Initialize Redis connection for pub/sub"""
        self.redis = await aioredis.from_url(self.redis_url)
        self.pubsub = self.redis.pubsub()

        # Subscribe by pattern before listening: listen() stops as soon as
        # there are no subscriptions, and per-user channels would otherwise
        # never be subscribed to at all
        await self.pubsub.psubscribe("room:*", "user:*")

        # Keep a reference so the listener task is not garbage-collected
        self._listener_task = asyncio.create_task(self._redis_listener())

    async def connect(
        self,
        websocket: WebSocket,
        user_id: str,
        metadata: Optional[Dict] = None
    ) -> Connection:
        """Accept and register a new WebSocket connection"""
        await websocket.accept()

        connection = Connection(
            websocket=websocket,
            user_id=user_id,
            metadata=metadata or {}
        )

        # Register connection
        self.connections[connection.connection_id] = connection
        self.user_connections.setdefault(user_id, set()).add(connection.connection_id)

        # Notify user came online
        await self._publish_presence(user_id, "online")

        return connection

    async def disconnect(self, connection: Connection) -> None:
        """Clean up a disconnected WebSocket"""
        conn_id = connection.connection_id
        user_id = connection.user_id

        # Remove from rooms, dropping rooms that become empty
        for room in connection.subscriptions:
            members = self.room_connections.get(room)
            if members is not None:
                members.discard(conn_id)
                if not members:
                    del self.room_connections[room]

        # Remove from user connections
        if user_id in self.user_connections:
            self.user_connections[user_id].discard(conn_id)

            # If no more connections, user is offline
            if not self.user_connections[user_id]:
                del self.user_connections[user_id]
                await self._publish_presence(user_id, "offline")

        # Remove connection (pop tolerates a repeated disconnect)
        self.connections.pop(conn_id, None)

    async def join_room(self, connection: Connection, room: str) -> None:
        """Subscribe connection to a room"""
        connection.subscriptions.add(room)
        # No per-room Redis subscribe needed: "room:*" already covers it
        self.room_connections.setdefault(room, set()).add(connection.connection_id)

    async def leave_room(self, connection: Connection, room: str) -> None:
        """Unsubscribe connection from a room"""
        connection.subscriptions.discard(room)

        members = self.room_connections.get(room)
        if members is not None:
            members.discard(connection.connection_id)
            if not members:
                del self.room_connections[room]

    async def send_to_user(self, user_id: str, message: Dict) -> int:
        """Send message to all connections of a user, on any server"""
        # Publish only. Every server, this one included, receives the message
        # in _redis_listener and delivers it locally, so nothing is sent twice.
        # Returns the number of subscribed servers Redis delivered to.
        return await self.redis.publish(f"user:{user_id}", json.dumps(message))

    async def broadcast_to_room(self, room: str, message: Dict) -> int:
        """Broadcast message to all connections in a room, on any server"""
        # Same publish-only approach as send_to_user
        return await self.redis.publish(f"room:{room}", json.dumps(message))

    async def _redis_listener(self) -> None:
        """Listen for messages published by any server"""
        async for message in self.pubsub.listen():
            # Pattern subscriptions deliver events of type 'pmessage'
            if message['type'] != 'pmessage':
                continue

            channel = message['channel'].decode()
            try:
                data = json.loads(message['data'])
            except ValueError:
                continue  # Skip malformed payloads instead of killing the listener

            if channel.startswith('room:'):
                room = channel.split(':', 1)[1]
                await self._local_broadcast_room(room, data)
            elif channel.startswith('user:'):
                user_id = channel.split(':', 1)[1]
                await self._local_send_user(user_id, data)

    async def _send_local(self, conn_ids, message: Dict) -> None:
        """Send to the given local connections, ignoring broken sockets"""
        # Copy first: the set can change while a send is awaited
        for conn_id in list(conn_ids):
            conn = self.connections.get(conn_id)
            if conn is None:
                continue
            try:
                await conn.websocket.send_json(message)
            except Exception:
                # Best effort: the endpoint's disconnect() handles cleanup
                pass

    async def _local_broadcast_room(self, room: str, message: Dict) -> None:
        """Broadcast to local room connections only"""
        await self._send_local(self.room_connections.get(room, ()), message)

    async def _local_send_user(self, user_id: str, message: Dict) -> None:
        """Send to local user connections only"""
        await self._send_local(self.user_connections.get(user_id, ()), message)
    
    async def _publish_presence(self, user_id: str, status: str) -> None:
        """Publish user presence change"""
        await self.redis.publish(
            "presence",
            json.dumps({"user_id": user_id, "status": status})
        )

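# FastAPI WebSocket endpoint
manager = ConnectionManager()

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Connect to Redis before the app starts accepting WebSocket clients
    await manager.initialize()
    yield

app = FastAPI(lifespan=lifespan)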

@app.websocket("/ws/{user_id}")
async def websocket_endpoint(websocket: WebSocket, user_id: str):
    connection = await manager.connect(websocket, user_id)
    
    try:
        while True:
            data = await websocket.receive_json()
            
            # Handle different message types
            msg_type = data.get("type")
            
            if msg_type == "join_room":
                await manager.join_room(connection, data["room"])
                
            elif msg_type == "leave_room":
                await manager.leave_room(connection, data["room"])
                
            elif msg_type == "room_message":
                await manager.broadcast_to_room(
                    data["room"],
                    {
                        "type": "message",
                        "room": data["room"],
                        "from": user_id,
                        "content": data["content"],
                        "timestamp": datetime.utcnow().isoformat()
                    }
                )
                
            elif msg_type == "direct_message":
                await manager.send_to_user(
                    data["to"],
                    {
                        "type": "direct_message",
                        "from": user_id,
                        "content": data["content"],
                        "timestamp": datetime.utcnow().isoformat()
                    }
                )
                
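    except WebSocketDisconnect:
        pass  # Client closed the connection
    finally:
        # Always unregister, including when a handler raises (e.g. a missing key)
        await manager.disconnect(connection)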

gRPC

gRPC vs REST

Feature   | REST               | gRPC
Protocol  | HTTP/1.1 or HTTP/2 | HTTP/2
Payload   | JSON (text)        | Protobuf (binary)
Contract  | OpenAPI (optional) | .proto files (required)
Streaming | Limited            | Bidirectional streaming
Browser   | Native support     | Requires gRPC-Web
Code Gen  | Optional           | Built-in
Speed     | Slower             | Often faster (compact binary encoding); the gain depends on workload

gRPC Communication Patterns

┌─────────────────────────────────────────────────────────────────┐
│                    gRPC Patterns                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Unary (Request-Response)                                   │
│     Client ────── Request ──────► Server                       │
│     Client ◄───── Response ─────── Server                       │
│                                                                 │
│  2. Server Streaming                                            │
│     Client ────── Request ──────► Server                       │
│     Client ◄═══ Stream of data ══ Server                       │
│                                                                 │
│  3. Client Streaming                                            │
│     Client ═══ Stream of data ══► Server                       │
│     Client ◄───── Response ─────── Server                       │
│                                                                 │
│  4. Bidirectional Streaming                                     │
│     Client ◄═══════════════════► Server                        │
│             Both stream freely                                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
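
The shape of the four patterns can be mirrored with plain Python functions and generators. This is only an analogy to show who sends one message and who sends a stream. It does not use the `grpc` library, and the function names and the doubling and summing logic are invented for illustration.

```python
from typing import Iterable, Iterator


def unary(request: int) -> int:
    """1. Unary: one request in, one response out."""
    return request * 2


def server_streaming(request: int) -> Iterator[int]:
    """2. Server streaming: one request in, a stream of responses out."""
    for i in range(request):
        yield i


def client_streaming(requests: Iterable[int]) -> int:
    """3. Client streaming: a stream of requests in, one response out."""
    return sum(requests)


def bidirectional(requests: Iterable[int]) -> Iterator[int]:
    """4. Bidirectional: a stream in and a stream out."""
    for request in requests:
        yield request * 2
```

One simplification to keep in mind: this `bidirectional` answers each request in lockstep, whereas in real gRPC the two streams are independent and either side may send at any time.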

When to Use gRPC

Use gRPC

  • Microservices communication
  • Low latency requirements
  • Strong typing needed
  • Streaming data
  • Internal services

Avoid gRPC

  • Public APIs (browser clients)
  • Simple CRUD operations
  • Team unfamiliar with Protobuf
  • Debugging ease is priority

Long Polling vs SSE vs WebSocket

┌───────────────────────────────────────────────────────────────────────┐
│                    Real-Time Communication Options                     │
├───────────────────┬───────────────────┬───────────────────────────────┤
│    Long Polling   │       SSE         │        WebSocket              │
├───────────────────┼───────────────────┼───────────────────────────────┤
│                   │                   │                               │
│  Client ── GET ──►│  Client ── GET ──►│  Client ── Upgrade ──►        │
│  Server holds...  │  Server streams   │  Bidirectional                │
│  Until data ready │  event: update    │                               │
│  Client ◄─ data ─ │  data: {...}      │  Client ◄═══════════► Server │
│  Repeat           │  data: {...}      │                               │
│                   │  (one direction)  │                               │
│                   │                   │                               │
├───────────────────┼───────────────────┼───────────────────────────────┤
│ Many TCP connects │ One TCP, server→  │ One TCP, both ways            │
│ HTTP compatible   │ HTTP compatible   │ Different protocol            │
│ Simple            │ Auto-reconnect    │ Most flexible                 │
│ Higher latency    │ Medium latency    │ Lowest latency                │
│                   │                   │                               │
│ Use: Fallback     │ Use: Notifications│ Use: Chat, gaming             │
│ for legacy        │ Live feeds        │ Collaboration                 │
└───────────────────┴───────────────────┴───────────────────────────────┘
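
The `event:` and `data:` lines in the SSE column are the actual wire format: UTF-8 text fields, with a blank line ending each event. A small sketch of a formatter, where `format_sse` is a hypothetical helper name:

```python
import json
from typing import Optional


def format_sse(
    data: dict,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
) -> str:
    """Serialize one Server-Sent Event; the trailing blank line ends the event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")  # lets a reconnecting client resume
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines, so a single data: line is enough
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```

A server would write these strings to a response with `Content-Type: text/event-stream`. The `id` field is what makes the browser's automatic reconnect useful, since the client sends the last seen id back on reconnect.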

Network Latency Budget

Where Time Goes

┌─────────────────────────────────────────────────────────────────┐
│                Request Latency Breakdown                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DNS Lookup:           1-50ms     (cached: <1ms)               │
│  TCP Handshake:        1 RTT     (50-150ms cross-continent)   │
│  TLS Handshake:        1-2 RTT   (50-300ms)                    │
│  Request Transfer:     Varies    (size / bandwidth)            │
│  Server Processing:    Varies    (your code)                   │
│  Response Transfer:    Varies    (size / bandwidth)            │
│                                                                 │
│  Example: USA → Europe API call                                 │
│  ─────────────────────────────────                              │
│  DNS:         5ms   (cached)                                   │
│  TCP:        75ms   (1 RTT)                                    │
│  TLS:       150ms   (2 RTT)                                    │
│  Request:    10ms   (small payload)                            │
│  Server:     50ms   (DB query + processing)                    │
│  Response:   20ms   (JSON response)                            │
│  ─────────────────────────────────                              │
│  TOTAL:     310ms                                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
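
The example budget is easy to reproduce and to vary. The sketch below sums the same six figures and shows the effect of reusing an open connection, under the assumption that a reused connection skips the DNS, TCP, and TLS steps entirely. `BUDGET_MS` and `total_ms` are hypothetical names.

```python
from typing import Dict

# Figures from the USA → Europe example above, in milliseconds
BUDGET_MS: Dict[str, int] = {
    "dns": 5,
    "tcp": 75,
    "tls": 150,
    "request": 10,
    "server": 50,
    "response": 20,
}


def total_ms(budget: Dict[str, int], reuse_connection: bool = False) -> int:
    """Sum the budget; a reused connection skips DNS, TCP, and TLS setup."""
    skipped = {"dns", "tcp", "tls"} if reuse_connection else set()
    return sum(ms for step, ms in budget.items() if step not in skipped)
```

`total_ms(BUDGET_MS)` gives 310, matching the diagram, while `total_ms(BUDGET_MS, reuse_connection=True)` gives 80. In this example connection setup accounts for roughly three quarters of the latency.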

Optimization Strategies

Optimization   | Latency Saved | Trade-off
CDN            | 100-200ms     | Cost, cache invalidation
Keep-Alive     | 150-300ms     | Connection limits
HTTP/2         | Variable      | Server support needed
Compression    | 10-50ms       | CPU overhead
Edge Computing | 100-200ms     | Complexity
DNS Prefetch   | 50ms          | Additional requests

Interview Tip: When discussing latency, mention geographic distribution. “Users in Singapore accessing servers in US-East will have ~200ms RTT just from physics.”
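
The physical floor behind that tip can be estimated from distance alone. The sketch assumes light in optical fiber travels at roughly two thirds of its vacuum speed, and it takes about 15,500 km as a rough great-circle distance between Singapore and the US East Coast. Both figures are approximations introduced here, not values from the text above.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3  # assumption: light in fiber travels at roughly 2/3 c


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber for a given one-way distance."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


# Assumed rough great-circle distance, Singapore to US East Coast
print(f"{min_rtt_ms(15_500):.0f} ms minimum RTT")
```

The result is a floor of about 155 ms. Real cable routes are longer than the great-circle path and add routing and queuing delay, which is consistent with observed round trips of 200 ms or more.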

Key Takeaways

Concept    | Remember
DNS        | First hop, cache TTLs matter, can be used for load balancing
TCP vs UDP | TCP = reliable, UDP = fast; choose based on use case
HTTP/2     | Multiplexing, server push, header compression
WebSocket  | Real-time bidirectional, needs pub/sub for scaling
gRPC       | Fast binary protocol, great for microservices
Latency    | Minimize RTTs, use CDNs, keep connections alive