# Networking Fundamentals

> Essential networking concepts for system design - DNS, HTTP, TCP/UDP, WebSockets

<Frame>
  <img src="https://mintcdn.com/devweeekends/2f8Rfaato9LS1FSq/images/system-design/networking-protocols.svg?fit=max&auto=format&n=2f8Rfaato9LS1FSq&q=85&s=429607f35a22a6e18c6630e43b6ddc24" alt="Networking Protocols for System Design" width="1080" height="1080" data-path="images/system-design/networking-protocols.svg" />
</Frame>

## Why Networking Matters

Every distributed system communicates over networks. Understanding networking is crucial for:

* **Latency optimization** - Where does delay come from?
* **Protocol selection** - HTTP vs WebSocket vs gRPC
* **Debugging issues** - Why is my API slow?
* **Security design** - TLS, firewalls, VPNs

Think of the network as the road system connecting buildings in a city. TCP is like certified mail: delivery is confirmed, in order, with a receipt. UDP is like shouting across the street: fast, with no guarantee they heard you, but good enough for many situations. HTTP is like a form you fill out at a government office window: you submit a request, wait, and get a response. WebSockets are like a phone call: once connected, both sides talk freely. You need to understand networking for system design because the road system is often the bottleneck. The fastest database in the world is of little use if the network between your service and the database adds 200ms of latency.

<Tip>
  **Practical tip**: When debugging slow API calls, always measure where time is actually spent. Use the decomposition: DNS lookup + TCP handshake + TLS handshake + time-to-first-byte + transfer time. In most system design interview scenarios, the latency bottleneck is not the protocol; it is geography (speed of light through fiber) or serialization. A senior engineer asks "where are the servers relative to the users?" before optimizing the protocol.
</Tip>
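
As a rough illustration of why geography dominates, the sketch below estimates the round-trip time imposed by fiber alone and adds up the round trips of a cold HTTPS request. The function names and the 20 ms DNS figure are illustrative assumptions, not measurements; the 200,000 km/s constant is the usual approximation for light in fiber (about two-thirds of its speed in a vacuum).

```python
# Light in optical fiber travels at roughly 2/3 of c, about 200,000 km/s.
FIBER_KM_PER_S = 200_000


def fiber_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2 * distance_km / FIBER_KM_PER_S * 1000


def cold_https_request_ms(distance_km: float, dns_ms: float = 20.0) -> float:
    """Estimate time to first byte for a brand-new HTTPS connection.

    Assumes one round trip each for the TCP handshake, the TLS 1.3
    handshake, and the HTTP request itself, with zero server processing
    time. dns_ms is an illustrative placeholder.
    """
    round_trips = 3  # TCP + TLS 1.3 + request/response
    return dns_ms + round_trips * fiber_rtt_ms(distance_km)


if __name__ == "__main__":
    for km in (100, 1_000, 10_000):
        print(f"{km:>6} km: RTT >= {fiber_rtt_ms(km):.1f} ms, "
              f"cold HTTPS >= {cold_https_request_ms(km):.1f} ms")
```

Real paths are longer than the straight-line distance and add queuing and processing delay, so treat these numbers as a floor rather than a prediction.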

## The OSI Model (Simplified)

```
┌────────────────────────────────────────────────────────────────┐
│                      OSI Model (7 Layers)                       │
├─────────┬──────────────────────────────────────────────────────┤
│ Layer 7 │ Application  │ HTTP, HTTPS, WebSocket, gRPC, DNS    │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 6 │ Presentation │ SSL/TLS, Encryption, Compression     │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 5 │ Session      │ Session management, Authentication   │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 4 │ Transport    │ TCP, UDP                              │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 3 │ Network      │ IP, Routing                           │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 2 │ Data Link    │ Ethernet, MAC addresses              │
├─────────┼──────────────────────────────────────────────────────┤
│ Layer 1 │ Physical     │ Cables, Radio waves                  │
└─────────┴──────────────────────────────────────────────────────┘

For system design, focus on Layers 4 and 7
```

## DNS (Domain Name System)

DNS translates human-readable domain names to IP addresses.

### DNS Resolution Flow

```
User types: www.example.com
              │
              ▼
    ┌──────────────────┐
    │  Browser Cache   │ ← Check local cache first
    └────────┬─────────┘
             │ Cache miss
             ▼
    ┌──────────────────┐
    │    OS Cache      │ ← Check OS DNS cache
    └────────┬─────────┘
             │ Cache miss
             ▼
    ┌──────────────────┐
    │ Recursive DNS    │ ← ISP's DNS resolver
    │    Resolver      │
    └────────┬─────────┘
             │
   ┌─────────┴─────────┐
   ▼                   ▼
┌──────┐         ┌──────────┐         ┌──────────┐
│ Root │────────►│   TLD    │────────►│Authorit- │
│Server│         │(.com,.io)│         │ative DNS │
└──────┘         └──────────┘         └──────────┘
   │                  │                    │
   │ "Ask .com TLD"   │ "Ask ns.example"  │ "IP: 93.184.216.34"
   └──────────────────┴────────────────────┘
```

### DNS Record Types

| Record    | Purpose                  | Example                             |
| --------- | ------------------------ | ----------------------------------- |
| **A**     | Maps domain to IPv4      | `example.com → 93.184.216.34`       |
| **AAAA**  | Maps domain to IPv6      | `example.com → 2606:2800:220:1:...` |
| **CNAME** | Alias to another domain  | `www.example.com → example.com`     |
| **MX**    | Mail server              | `example.com → mail.example.com`    |
| **TXT**   | Text data (verification) | SPF, DKIM records                   |
| **NS**    | Nameserver               | `example.com → ns1.provider.com`    |

### DNS in System Design

```
┌─────────────────────────────────────────────────────────────────┐
│                    DNS-Based Load Balancing                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   DNS Query: api.example.com                                    │
│        │                                                        │
│        ▼                                                        │
│   ┌──────────────────────────────────────┐                     │
│   │         DNS Round Robin               │                     │
│   │   Returns different IPs each time     │                     │
│   └──────────────────────────────────────┘                     │
│        │                                                        │
│   ┌────┴────┬────────────────┐                                 │
│   ▼         ▼                ▼                                 │
│ Server 1  Server 2        Server 3                             │
│ 1.1.1.1   2.2.2.2         3.3.3.3                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Pros: Simple, no extra infrastructure
Cons: No health checks, TTL caching delays failover
```
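
The rotation in the diagram can be simulated in a few lines. This is a minimal sketch of the idea, not how any particular DNS server implements it; `round_robin_answers` is a hypothetical helper, and the addresses are the ones from the diagram.

```python
from itertools import cycle
from typing import Iterator, List


def round_robin_answers(ips: List[str]) -> Iterator[List[str]]:
    """Yield the record set rotated by one position per query.

    Many clients use the first address in the answer, so rotating the
    list spreads new connections across servers.
    """
    for start in cycle(range(len(ips))):
        yield ips[start:] + ips[:start]


if __name__ == "__main__":
    answers = round_robin_answers(["1.1.1.1", "2.2.2.2", "3.3.3.3"])
    for _ in range(4):
        print(next(answers)[0])
```

Nothing in the rotation knows whether a server is healthy: a dead address keeps receiving its share of clients until someone removes the record, which is the "no health checks" drawback listed above.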

<Warning>
  **DNS TTL Matters**: A low TTL (60s) gives faster failover but more DNS queries. A high TTL (3600s) gives better caching but slower failover. A common default is 300s (5 min).
</Warning>
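
To see why TTL delays failover, here is a small sketch of a resolver-side cache. `TtlDnsCache` and its injected `resolve` and `clock` callables are hypothetical names introduced for illustration; real resolvers are considerably more involved.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Tiny DNS-style cache: answers are reused until their TTL expires."""

    def __init__(self, resolve: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve   # hypothetical upstream lookup
        self._ttl = ttl_seconds
        self._clock = clock       # injectable so the demo can fake time
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]      # still fresh, even if the server has died
        ip = self._resolve(name)
        self._entries[name] = (ip, now + self._ttl)
        return ip


if __name__ == "__main__":
    records = {"api.example.com": "1.1.1.1"}
    now = [0.0]
    cache = TtlDnsCache(records.__getitem__, ttl_seconds=300,
                        clock=lambda: now[0])
    print(cache.lookup("api.example.com"))   # resolved upstream
    records["api.example.com"] = "2.2.2.2"   # failover at the DNS provider
    now[0] = 299.0
    print(cache.lookup("api.example.com"))   # cache still returns the old IP
    now[0] = 300.0
    print(cache.lookup("api.example.com"))   # TTL expired, new IP is fetched
```

With a 300s TTL, clients can keep hitting the old address for up to five minutes after the record changes.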

## TCP vs UDP

### TCP (Transmission Control Protocol)

```
┌─────────────────────────────────────────────────────────────────┐
│                    TCP Three-Way Handshake                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│     Client                              Server                  │
│        │                                   │                    │
│        │──────── SYN (seq=x) ─────────────►│                   │
│        │                                   │                    │
│        │◄────── SYN-ACK (seq=y, ack=x+1) ──│                   │
│        │                                   │                    │
│        │──────── ACK (ack=y+1) ────────────►│                   │
│        │                                   │                    │
│        │◄═══════ Connection Established ═══►│                   │
│        │                                   │                    │
│        │←───────── Data Transfer ──────────→│                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### TCP vs UDP Comparison

| Feature         | TCP                 | UDP                          |
| --------------- | ------------------- | ---------------------------- |
| **Connection**  | Connection-oriented | Connectionless               |
| **Reliability** | Reliable (acknowledged, retransmitted) | Best effort                  |
| **Ordering**    | In-order delivery   | No ordering                  |
| **Speed**       | Slower (overhead)   | Faster                       |
| **Use Case**    | HTTP, databases     | Live video/audio, gaming, DNS |
| **Header Size** | 20-60 bytes         | 8 bytes                      |

### When to Use What

<CardGroup cols={2}>
  <Card title="Use TCP" icon="shield">
    * Web applications (HTTP/HTTPS)
    * File transfers
    * Database connections
    * Email (SMTP, IMAP)
    * When data integrity matters
  </Card>

  <Card title="Use UDP" icon="bolt">
    * Live video/audio streaming
    * Online gaming
    * DNS queries
    * IoT sensors
    * When speed > reliability
  </Card>
</CardGroup>
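
The difference is visible at the socket level. The sketch below, which assumes loopback networking is available, sends one payload over each protocol using only the standard library; `tcp_roundtrip` and `udp_roundtrip` are hypothetical helper names.

```python
import socket


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        listener.listen(1)
        # create_connection() returns once the three-way handshake completes
        with socket.create_connection(listener.getsockname()) as client:
            server_side, _ = listener.accept()
            with server_side:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)   # signal end of stream
                chunks = []
                while chunk := server_side.recv(4096):
                    chunks.append(chunk)
                return b"".join(chunks)


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback UDP: no handshake, no connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)          # a lost datagram is never resent
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(65535)
        return data


if __name__ == "__main__":
    print(tcp_roundtrip(b"hello over tcp"))
    print(udp_roundtrip(b"hello over udp"))
```

TCP delivers a byte stream, so the receiver loops until the sender closes its side. UDP delivers whole datagrams, and the timeout is the only protection against one that never arrives.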

## HTTP/HTTPS

### HTTP Request/Response

```
┌─────────────────────────────────────────────────────────────────┐
│                    HTTP Request                                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  GET /api/users/123 HTTP/1.1                                   │
│  Host: api.example.com                                          │
│  Authorization: Bearer eyJhbGciOiJIUzI1...                     │
│  Content-Type: application/json                                 │
│  Accept: application/json                                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    HTTP Response                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  HTTP/1.1 200 OK                                                │
│  Content-Type: application/json                                 │
│  Cache-Control: max-age=3600                                    │
│  X-RateLimit-Remaining: 99                                      │
│                                                                 │
│  {                                                              │
│    "id": 123,                                                   │
│    "name": "John Doe",                                          │
│    "email": "john@example.com"                                  │
│  }                                                              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### HTTP Methods

| Method      | Purpose             | Idempotent | Safe |
| ----------- | ------------------- | ---------- | ---- |
| **GET**     | Retrieve resource   | Yes        | Yes  |
| **POST**    | Create resource     | No         | No   |
| **PUT**     | Replace resource    | Yes        | No   |
| **PATCH**   | Partial update      | No         | No   |
| **DELETE**  | Remove resource     | Yes        | No   |
| **HEAD**    | Get headers only    | Yes        | Yes  |
| **OPTIONS** | Get allowed methods | Yes        | Yes  |
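
The Idempotent column matters most when a client retries after a timeout. The sketch below uses a hypothetical in-memory `UserStore` in place of a real API to show how repeating each method affects server state.

```python
from itertools import count
from typing import Dict


class UserStore:
    """In-memory stand-in for a REST resource collection (hypothetical)."""

    def __init__(self) -> None:
        self.users: Dict[int, dict] = {}
        self._ids = count(1)

    def post(self, body: dict) -> int:
        """POST /users: every call creates a new resource."""
        user_id = next(self._ids)
        self.users[user_id] = dict(body)
        return user_id

    def put(self, user_id: int, body: dict) -> None:
        """PUT /users/{id}: replaces the resource; repeating changes nothing."""
        self.users[user_id] = dict(body)

    def delete(self, user_id: int) -> None:
        """DELETE /users/{id}: the end state is 'gone' however often it runs."""
        self.users.pop(user_id, None)


if __name__ == "__main__":
    store = UserStore()
    store.post({"name": "John Doe"})
    store.post({"name": "John Doe"})      # retried POST: duplicate user
    print(len(store.users))               # two resources now exist
    store.put(123, {"name": "John Doe"})
    store.put(123, {"name": "John Doe"})  # retried PUT: same final state
    print(len(store.users))               # only one more resource
```

This is why retrying a POST usually needs an idempotency key or similar deduplication, while PUT and DELETE can be retried as they are.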

### HTTP Status Codes

```
┌───────────────────────────────────────────────────────────────┐
│                    HTTP Status Codes                          │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  1xx Informational  │  100 Continue, 101 Switching Protocols │
│  ───────────────────┼───────────────────────────────────────│
│  2xx Success        │  200 OK                                │
│                     │  201 Created                           │
│                     │  204 No Content                        │
│  ───────────────────┼───────────────────────────────────────│
│  3xx Redirection    │  301 Moved Permanently                 │
│                     │  302 Found (temporary)                 │
│                     │  304 Not Modified                      │
│  ───────────────────┼───────────────────────────────────────│
│  4xx Client Error   │  400 Bad Request                       │
│                     │  401 Unauthorized                      │
│                     │  403 Forbidden                         │
│                     │  404 Not Found                         │
│                     │  429 Too Many Requests                 │
│  ───────────────────┼───────────────────────────────────────│
│  5xx Server Error   │  500 Internal Server Error             │
│                     │  502 Bad Gateway                       │
│                     │  503 Service Unavailable               │
│                     │  504 Gateway Timeout                   │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
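
Clients typically branch on the status class first and on individual codes second. The sketch below uses the standard library's `http.HTTPStatus`; which codes are worth retrying is a policy choice, so the `RETRYABLE` set here is an assumption rather than a rule.

```python
from http import HTTPStatus

# Assumed retry policy: transient conditions where a later attempt may succeed
RETRYABLE = {
    HTTPStatus.TOO_MANY_REQUESTS,     # 429
    HTTPStatus.BAD_GATEWAY,           # 502
    HTTPStatus.SERVICE_UNAVAILABLE,   # 503
    HTTPStatus.GATEWAY_TIMEOUT,       # 504
}

_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return _CLASSES.get(code // 100, "unknown")


def should_retry(code: int) -> bool:
    """True if the assumed policy treats this status as transient."""
    return code in RETRYABLE


if __name__ == "__main__":
    for code in (200, 404, 429, 503):
        print(code, status_class(code), should_retry(code))
```

Most other 4xx codes are not retried because the same request will fail the same way until the client changes it.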

### HTTP/1.1 vs HTTP/2 vs HTTP/3

```
HTTP/1.1                    HTTP/2                     HTTP/3
┌─────────────────┐        ┌─────────────────┐       ┌─────────────────┐
│ Request 1       │        │ ┌───┬───┬───┐   │       │ ┌───┬───┬───┐   │
│ ─────────────── │        │ │ 1 │ 2 │ 3 │   │       │ │ 1 │ 2 │ 3 │   │
│ Response 1      │        │ └───┴───┴───┘   │       │ └───┴───┴───┘   │
│ ═══════════════ │        │  Multiplexed    │       │  Over QUIC      │
│ Request 2       │        │  Binary frames  │       │  (UDP based)    │
│ ─────────────── │        │  on single TCP  │       │  0-RTT resume   │
│ Response 2      │        │                 │       │                 │
│ ═══════════════ │        │ Server Push     │       │ No head-of-line │
│ (Sequential)    │        │ Header Compress │       │  blocking       │
└─────────────────┘        └─────────────────┘       └─────────────────┘

~6 connections/origin      Single connection         Faster, resilient
Head-of-line blocking      per origin                to network changes
```

### HTTPS/TLS Handshake

```
┌─────────────────────────────────────────────────────────────────┐
│                    TLS 1.3 Handshake                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Client                                         Server          │
│     │                                              │            │
│     │──── ClientHello + Key Share ────────────────►│           │
│     │     (Supported ciphers, client random)       │            │
│     │                                              │            │
│     │◄─── ServerHello + Key Share + Certificate ──│           │
│     │     (Selected cipher, server random, cert)   │            │
│     │                                              │            │
│     │     [Both compute shared secret]             │            │
│     │                                              │            │
│     │◄════════ Encrypted Application Data ════════►│           │
│     │                                              │            │
│                                                                 │
│  TLS 1.3: 1-RTT handshake (down from 2-RTT in TLS 1.2)        │
│  0-RTT resumption for returning clients                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
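
From application code, the handshake is usually a single call. The sketch below builds a client context with Python's `ssl` module that refuses anything older than TLS 1.3; the helper names are hypothetical, and `negotiated_version` needs network access to a real host.

```python
import socket
import ssl
from typing import Optional


def make_tls13_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and requires TLS 1.3."""
    ctx = ssl.create_default_context()            # CERT_REQUIRED + hostname check
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # needs OpenSSL 1.1.1 or newer
    return ctx


def negotiated_version(host: str, port: int = 443) -> Optional[str]:
    """Connect to host and report the negotiated protocol version."""
    ctx = make_tls13_client_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        # The handshake from the diagram runs inside wrap_socket()
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version("example.com")` against a server that only supports TLS 1.2 raises `ssl.SSLError` instead of silently downgrading.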

## WebSockets

### WebSocket vs HTTP

```
HTTP (Request-Response)              WebSocket (Bidirectional)

Client          Server               Client          Server
   │──── GET ─────►│                    │──── Upgrade ──►│
   │◄─── 200 ──────│                    │◄─── 101 ───────│
   │               │                    │                │
   │──── GET ─────►│                    │◄══════════════►│
   │◄─── 200 ──────│                    │  Full-duplex   │
   │               │                    │  connection    │
   │ (Poll again)  │                    │                │
   │──── GET ─────►│                    │◄══════════════►│
   │◄─── 200 ──────│                    │                │

Each poll repeats full     Single persistent connection
request/header overhead    Low latency, real-time
```
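
The `Upgrade` / `101` exchange in the diagram is an ordinary HTTP request and response. The server proves it understood the upgrade by hashing the client's `Sec-WebSocket-Key` with a fixed GUID from RFC 6455. A minimal sketch of that computation, with `websocket_accept` as a hypothetical helper name:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's key."""
    digest = hashlib.sha1(
        (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response(sec_websocket_key: str) -> str:
    """Build the 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Sample key from RFC 6455, section 1.3
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
    # prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

After this response, the same TCP connection carries WebSocket frames in both directions, and no further HTTP requests are sent on it.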

### WebSocket Use Cases

<CardGroup cols={2}>
  <Card title="Real-time Chat" icon="comments">
    WhatsApp, Slack, Discord - instant message delivery
  </Card>

  <Card title="Live Updates" icon="chart-line">
    Stock prices, sports scores, notifications
  </Card>

  <Card title="Gaming" icon="gamepad">
    Multiplayer games, real-time player positions
  </Card>

  <Card title="Collaboration" icon="users">
    Google Docs, Figma - live editing
  </Card>
</CardGroup>

### WebSocket Scaling Challenge

<img src="https://mintcdn.com/devweeekends/2f8Rfaato9LS1FSq/images/system-design/websocket-scaling.svg?fit=max&auto=format&n=2f8Rfaato9LS1FSq&q=85&s=a94d1d75328ed3858b62c328d4c6dd9d" alt="WebSocket Scaling with Pub/Sub" width="800" height="500" data-path="images/system-design/websocket-scaling.svg" />

```
┌─────────────────────────────────────────────────────────────────┐
│                WebSocket Scaling with Pub/Sub                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│     User A                                       User B         │
│        │                                            │           │
│        │ WS                                      WS │           │
│        ▼                                            ▼           │
│   ┌─────────┐                                  ┌─────────┐     │
│   │Server 1 │                                  │Server 2 │     │
│   └────┬────┘                                  └────┬────┘     │
│        │                                            │           │
│        └──────────────┬─────────────────────────────┘           │
│                       │                                         │
│                ┌──────▼──────┐                                 │
│                │    Redis    │  Pub/Sub for cross-server       │
│                │   Pub/Sub   │  message broadcasting           │
│                └─────────────┘                                 │
│                                                                 │
│  Problem: User A on Server 1 messages User B on Server 2       │
│  Solution: Publish to Redis, all servers subscribe             │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
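
The routing problem and its fix can be modeled without Redis or sockets. In the sketch below, `InMemoryBroker` stands in for Redis Pub/Sub and each `ChatServer` keeps a per-user inbox in place of a live WebSocket; both classes are hypothetical and exist only to show the message flow.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: fans each message out to all subscribers."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subs[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        callbacks = list(self._subs.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


class ChatServer:
    """One WebSocket server; inboxes model messages pushed to local sockets."""

    def __init__(self, name: str, broker: InMemoryBroker) -> None:
        self.name = name
        self.broker = broker
        self.inboxes: Dict[str, List[str]] = {}

    def connect(self, user: str) -> None:
        self.inboxes[user] = []
        # Subscribe on behalf of the locally connected user
        self.broker.subscribe(f"user:{user}", self.inboxes[user].append)

    def send(self, to_user: str, text: str) -> int:
        # The sender's server does not need to know where the recipient is
        return self.broker.publish(f"user:{to_user}", text)


if __name__ == "__main__":
    broker = InMemoryBroker()
    server1, server2 = ChatServer("server-1", broker), ChatServer("server-2", broker)
    server1.connect("user-a")
    server2.connect("user-b")
    server1.send("user-b", "hello from A")
    print(server2.inboxes["user-b"])
```

Each server delivers only to its own sockets, and the broker is the single place that knows which servers care about a channel.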

## WebSocket Implementation

A WebSocket server with connection management and Redis pub/sub, intended as a starting point for a production deployment:

<Tabs>
  <Tab title="Python">
    ```python theme={null}
    import asyncio
    import json
    from typing import Dict, Set, Optional, Any
    from dataclasses import dataclass, field
    from datetime import datetime
    import uuid
    # The standalone aioredis package was merged into redis-py;
    # redis.asyncio exposes the same from_url / pubsub API
    import redis.asyncio as aioredis
    from fastapi import FastAPI, WebSocket, WebSocketDisconnect
    from contextlib import asynccontextmanager

    @dataclass
    class Connection:
        """Represents a WebSocket connection"""
        websocket: WebSocket
        user_id: str
        connection_id: str = field(default_factory=lambda: str(uuid.uuid4()))
        connected_at: datetime = field(default_factory=datetime.utcnow)
        subscriptions: Set[str] = field(default_factory=set)
        metadata: Dict[str, Any] = field(default_factory=dict)

    class ConnectionManager:
        """
        Production WebSocket connection manager with Redis pub/sub
        for multi-server deployments
        """
        
        def __init__(self, redis_url: str = "redis://localhost"):
            self.redis_url = redis_url
            self.connections: Dict[str, Connection] = {}
            self.user_connections: Dict[str, Set[str]] = {}
            self.room_connections: Dict[str, Set[str]] = {}
            self.redis: Optional[aioredis.Redis] = None
            self.pubsub: Optional[aioredis.client.PubSub] = None
        
        async def initialize(self) -> None:
            """Initialize Redis connection for pub/sub"""
            self.redis = await aioredis.from_url(self.redis_url)
            self.pubsub = self.redis.pubsub()

            # listen() stops iterating when there are no subscriptions, so
            # subscribe to one channel before starting the listener task
            await self.pubsub.subscribe("presence")

            # Keep a reference so the task is not garbage-collected
            self._listener_task = asyncio.create_task(self._redis_listener())

        async def connect(
            self,
            websocket: WebSocket,
            user_id: str,
            metadata: Optional[Dict] = None
        ) -> Connection:
            """Accept and register a new WebSocket connection"""
            await websocket.accept()

            connection = Connection(
                websocket=websocket,
                user_id=user_id,
                metadata=metadata or {}
            )

            # Register connection
            self.connections[connection.connection_id] = connection

            if user_id not in self.user_connections:
                self.user_connections[user_id] = set()
                # First local connection for this user: start receiving
                # their direct messages from Redis
                await self.pubsub.subscribe(f"user:{user_id}")
            self.user_connections[user_id].add(connection.connection_id)

            # Notify user came online
            await self._publish_presence(user_id, "online")

            return connection

        async def disconnect(self, connection: Connection) -> None:
            """Clean up a disconnected WebSocket"""
            conn_id = connection.connection_id
            user_id = connection.user_id

            # Remove from rooms (copy first: leave_room mutates the set)
            for room in list(connection.subscriptions):
                await self.leave_room(connection, room)

            # Remove from user connections
            if user_id in self.user_connections:
                self.user_connections[user_id].discard(conn_id)

                # No more local connections: user is offline on this server
                if not self.user_connections[user_id]:
                    del self.user_connections[user_id]
                    await self.pubsub.unsubscribe(f"user:{user_id}")
                    await self._publish_presence(user_id, "offline")

            # Remove connection (pop tolerates a repeated disconnect)
            self.connections.pop(conn_id, None)

        async def join_room(self, connection: Connection, room: str) -> None:
            """Subscribe connection to a room"""
            connection.subscriptions.add(room)

            if room not in self.room_connections:
                self.room_connections[room] = set()
                # Subscribe to Redis channel for this room
                await self.pubsub.subscribe(f"room:{room}")

            self.room_connections[room].add(connection.connection_id)

        async def leave_room(self, connection: Connection, room: str) -> None:
            """Unsubscribe connection from a room"""
            connection.subscriptions.discard(room)

            if room in self.room_connections:
                self.room_connections[room].discard(connection.connection_id)

                # Last local member left: stop receiving this room's traffic
                if not self.room_connections[room]:
                    del self.room_connections[room]
                    await self.pubsub.unsubscribe(f"room:{room}")

        async def send_to_user(self, user_id: str, message: Dict) -> int:
            """Send message to all connections of a user, on any server.

            Every server holding a connection for this user (including this
            one) is subscribed to the user's channel and delivers the message
            in _redis_listener. Sending locally here as well would deliver
            it twice. Returns the number of servers that received it.
            """
            return await self.redis.publish(
                f"user:{user_id}",
                json.dumps(message)
            )

        async def broadcast_to_room(self, room: str, message: Dict) -> int:
            """Broadcast message to all connections in a room, on any server.

            Delivery happens in _redis_listener on every subscribed server,
            including this one. Returns the number of servers reached.
            """
            return await self.redis.publish(
                f"room:{room}",
                json.dumps(message)
            )

        async def _redis_listener(self) -> None:
            """Deliver messages published by any server, including this one"""
            async for message in self.pubsub.listen():
                if message['type'] != 'message':
                    continue

                channel = message['channel'].decode()
                data = json.loads(message['data'])

                if channel.startswith('room:'):
                    room = channel.split(':', 1)[1]
                    await self._local_broadcast_room(room, data)
                elif channel.startswith('user:'):
                    user_id = channel.split(':', 1)[1]
                    await self._local_send_user(user_id, data)
        
        async def _local_broadcast_room(self, room: str, message: Dict) -> None:
            """Broadcast to local room connections only"""
            if room in self.room_connections:
                # Iterate over a copy: the set can change while we await sends
                for conn_id in list(self.room_connections[room]):
                    if conn_id in self.connections:
                        try:
                            await self.connections[conn_id].websocket.send_json(message)
                        except Exception:
                            pass
        
        async def _local_send_user(self, user_id: str, message: Dict) -> None:
            """Send to local user connections only"""
            if user_id in self.user_connections:
                # Iterate over a copy: the set can change while we await sends
                for conn_id in list(self.user_connections[user_id]):
                    if conn_id in self.connections:
                        try:
                            await self.connections[conn_id].websocket.send_json(message)
                        except Exception:
                            pass
        
        async def _publish_presence(self, user_id: str, status: str) -> None:
            """Publish user presence change"""
            await self.redis.publish(
                "presence",
                json.dumps({"user_id": user_id, "status": status})
            )

    # FastAPI WebSocket endpoint
    manager = ConnectionManager()

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Connect to Redis before the app starts accepting WebSockets
        await manager.initialize()
        yield

    app = FastAPI(lifespan=lifespan)

    @app.websocket("/ws/{user_id}")
    async def websocket_endpoint(websocket: WebSocket, user_id: str):
        connection = await manager.connect(websocket, user_id)
        
        try:
            while True:
                data = await websocket.receive_json()
                
                # Handle different message types
                msg_type = data.get("type")
                
                if msg_type == "join_room":
                    await manager.join_room(connection, data["room"])
                    
                elif msg_type == "leave_room":
                    await manager.leave_room(connection, data["room"])
                    
                elif msg_type == "room_message":
                    await manager.broadcast_to_room(
                        data["room"],
                        {
                            "type": "message",
                            "room": data["room"],
                            "from": user_id,
                            "content": data["content"],
                            "timestamp": datetime.utcnow().isoformat()
                        }
                    )
                    
                elif msg_type == "direct_message":
                    await manager.send_to_user(
                        data["to"],
                        {
                            "type": "direct_message",
                            "from": user_id,
                            "content": data["content"],
                            "timestamp": datetime.utcnow().isoformat()
                        }
                    )
                    
        except WebSocketDisconnect:
            pass
        finally:
            # Always clean up, even if a handler raised on a malformed message
            await manager.disconnect(connection)
    ```
  </Tab>

  <Tab title="JavaScript">
    ```javascript theme={null}
    const WebSocket = require('ws');
    const Redis = require('ioredis');
    const { v4: uuidv4 } = require('uuid');

    class Connection {
      constructor(ws, userId, metadata = {}) {
        this.ws = ws;
        this.userId = userId;
        this.connectionId = uuidv4();
        this.connectedAt = new Date();
        this.subscriptions = new Set();
        this.metadata = metadata;
        this.isAlive = true;
      }

      send(message) {
        if (this.ws.readyState === WebSocket.OPEN) {
          this.ws.send(JSON.stringify(message));
          return true;
        }
        return false;
      }
    }

    class ConnectionManager {
      constructor(redisUrl = 'redis://localhost') {
        this.redis = new Redis(redisUrl);
        this.subscriber = new Redis(redisUrl);
        this.connections = new Map();
        this.userConnections = new Map();
        this.roomConnections = new Map();
        
        this.setupRedisSubscriber();
        this.startHeartbeat();
      }

      setupRedisSubscriber() {
        this.subscriber.on('message', (channel, message) => {
          const data = JSON.parse(message);
          
          if (channel.startsWith('room:')) {
            // slice, not split: room names may themselves contain ':'
            const room = channel.slice('room:'.length);
            this.localBroadcastRoom(room, data);
          } else if (channel.startsWith('user:')) {
            const userId = channel.slice('user:'.length);
            this.localSendUser(userId, data);
          }
        });
      }

      startHeartbeat() {
        setInterval(() => {
          for (const [connId, conn] of this.connections) {
            if (!conn.isAlive) {
              this.disconnect(conn);
              continue;
            }
            conn.isAlive = false;
            conn.ws.ping();
          }
        }, 30000);
      }

      async connect(ws, userId, metadata = {}) {
        const connection = new Connection(ws, userId, metadata);

        // Setup event handlers
        ws.on('pong', () => {
          connection.isAlive = true;
        });

        // Register connection
        this.connections.set(connection.connectionId, connection);

        if (!this.userConnections.has(userId)) {
          this.userConnections.set(userId, new Set());
        }
        this.userConnections.get(userId).add(connection.connectionId);

        // Publish presence
        await this.publishPresence(userId, 'online');

        return connection;
      }

      async disconnect(connection) {
        const { connectionId, userId, subscriptions } = connection;

        // Remove from rooms
        for (const room of subscriptions) {
          if (this.roomConnections.has(room)) {
            this.roomConnections.get(room).delete(connectionId);
          }
        }

        // Remove from user connections
        if (this.userConnections.has(userId)) {
          const userConns = this.userConnections.get(userId);
          userConns.delete(connectionId);

          if (userConns.size === 0) {
            this.userConnections.delete(userId);
            await this.publishPresence(userId, 'offline');
          }
        }

        // Close WebSocket
        try {
          connection.ws.terminate();
        } catch (e) {}

        this.connections.delete(connectionId);
      }

      async joinRoom(connection, room) {
        connection.subscriptions.add(room);

        if (!this.roomConnections.has(room)) {
          this.roomConnections.set(room, new Set());
          await this.subscriber.subscribe(`room:${room}`);
        }

        this.roomConnections.get(room).add(connection.connectionId);
      }

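      async leaveRoom(connection, room) {
        connection.subscriptions.delete(room);

        const members = this.roomConnections.get(room);
        if (!members) return;

        members.delete(connection.connectionId);

        // Last local member left: drop the room and its Redis subscription
        if (members.size === 0) {
          this.roomConnections.delete(room);
          await this.subscriber.unsubscribe(`room:${room}`);
        }
      }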

      async sendToUser(userId, message) {
        let sent = 0;

        // Local connections
        if (this.userConnections.has(userId)) {
          for (const connId of this.userConnections.get(userId)) {
            const conn = this.connections.get(connId);
            if (conn && conn.send(message)) {
              sent++;
            }
          }
        }

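        // Broadcast to other servers. Redis delivers to every subscriber, so if this
        // server is also subscribed to `user:*`, local clients can receive the message
        // twice unless the subscriber skips messages that originated here.
        await this.redis.publish(`user:${userId}`, JSON.stringify(message));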

        return sent;
      }

      async broadcastToRoom(room, message) {
        let sent = 0;

        // Local connections
        if (this.roomConnections.has(room)) {
          for (const connId of this.roomConnections.get(room)) {
            const conn = this.connections.get(connId);
            if (conn && conn.send(message)) {
              sent++;
            }
          }
        }

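        // Broadcast to other servers. This server is subscribed to `room:${room}` too
        // (see joinRoom), so it also receives its own publish; skip self-originated
        // messages in the subscriber to avoid delivering to local clients twice.
        await this.redis.publish(`room:${room}`, JSON.stringify(message));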

        return sent;
      }

      localBroadcastRoom(room, message) {
        if (this.roomConnections.has(room)) {
          for (const connId of this.roomConnections.get(room)) {
            const conn = this.connections.get(connId);
            if (conn) conn.send(message);
          }
        }
      }

      localSendUser(userId, message) {
        if (this.userConnections.has(userId)) {
          for (const connId of this.userConnections.get(userId)) {
            const conn = this.connections.get(connId);
            if (conn) conn.send(message);
          }
        }
      }

      async publishPresence(userId, status) {
        await this.redis.publish('presence', JSON.stringify({
          userId,
          status,
          timestamp: new Date().toISOString()
        }));
      }
    }

    // Express + WS Server
    const express = require('express');
    const http = require('http');
    const url = require('url');

    const app = express();
    const server = http.createServer(app);
    const wss = new WebSocket.Server({ noServer: true });
    const manager = new ConnectionManager();

    server.on('upgrade', (request, socket, head) => {
      const pathname = url.parse(request.url).pathname;
      const match = pathname.match(/^\/ws\/(.+)$/);

      if (match) {
        // userId comes straight from the URL path; authenticate the request before trusting it
        const userId = match[1];

        wss.handleUpgrade(request, socket, head, async (ws) => {
          const connection = await manager.connect(ws, userId);

          ws.on('message', async (data) => {
            try {
              const message = JSON.parse(data);
              await handleMessage(connection, message);
            } catch (e) {
              console.error('Message error:', e);
            }
          });

          ws.on('close', () => {
            manager.disconnect(connection);
          });

          ws.on('error', (error) => {
            console.error('WebSocket error:', error);
            manager.disconnect(connection);
          });
        });
      } else {
        socket.destroy();
      }
    });

    async function handleMessage(connection, message) {
      const { type, room, to, content } = message;

      switch (type) {
        case 'join_room':
          await manager.joinRoom(connection, room);
          break;

        case 'leave_room':
          await manager.leaveRoom(connection, room);
          break;

        case 'room_message':
          await manager.broadcastToRoom(room, {
            type: 'message',
            room,
            from: connection.userId,
            content,
            timestamp: new Date().toISOString()
          });
          break;

        case 'direct_message':
          await manager.sendToUser(to, {
            type: 'direct_message',
            from: connection.userId,
            content,
            timestamp: new Date().toISOString()
          });
          break;
      }
    }

    server.listen(3000, () => {
      console.log('WebSocket server running on port 3000');
    });

    module.exports = { ConnectionManager, Connection };
    ```
  </Tab>
</Tabs>
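
One thing to watch in a fan-out design like this: Redis pub/sub delivers a published message to every subscriber, including the server that published it. If a server both delivers locally and publishes, its own clients can see the message twice. Below is a minimal sketch of one common remedy, tagging each published envelope with an origin ID. It uses an in-memory `EventEmitter` as a stand-in for Redis, and `Server`, `serverId`, and the `room:lobby` channel are illustrative names, not part of the implementation above.

```javascript
const { EventEmitter } = require('events');
const { randomUUID } = require('crypto');

// In-memory stand-in for Redis pub/sub: every subscriber sees every publish,
// including the publisher's own.
const bus = new EventEmitter();

class Server {
  constructor() {
    this.serverId = randomUUID();
    this.localClients = []; // each "client" is just an array of received messages
    bus.on('room:lobby', (raw) => this.onBusMessage(raw));
  }

  // Deliver to local clients directly, then publish for the other servers.
  broadcast(message) {
    this.deliverLocal(message);
    bus.emit('room:lobby', JSON.stringify({ origin: this.serverId, message }));
  }

  // Skip envelopes this server published itself; it already delivered them locally.
  onBusMessage(raw) {
    const { origin, message } = JSON.parse(raw);
    if (origin === this.serverId) return;
    this.deliverLocal(message);
  }

  deliverLocal(message) {
    for (const client of this.localClients) client.push(message);
  }
}

const serverA = new Server();
const serverB = new Server();
const alice = [];
const bob = [];
serverA.localClients.push(alice);
serverB.localClients.push(bob);

serverA.broadcast({ text: 'hi' });
console.log(alice.length, bob.length); // 1 1
```

An alternative is to skip the direct local delivery and let every server, including the publisher, deliver only from the subscription. That is simpler but adds a Redis round trip to local messages.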

## gRPC

### gRPC vs REST

| Feature       | REST               | gRPC                    |
| ------------- | ------------------ | ----------------------- |
| **Protocol**  | HTTP/1.1 or HTTP/2 | HTTP/2                  |
| **Payload**   | JSON (text)        | Protobuf (binary)       |
| **Contract**  | OpenAPI (optional) | .proto files (required) |
| **Streaming** | Limited            | Bidirectional streaming |
| **Browser**   | Native support     | Requires gRPC-Web       |
| **Code Gen**  | Optional           | Built-in                |
| **Speed**     | Slower             | Often faster (varies)   |
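
To make the payload row concrete, here is a rough sketch that hand-encodes one small message in the protobuf wire format and compares its size with the JSON equivalent. The `User` message (`id` as field 1, `name` as field 2) is a made-up example, and real services would use generated code rather than encoding by hand.

```javascript
// Encode a non-negative integer as a protobuf varint (7 bits per byte, low bits first).
function encodeVarint(value) {
  const bytes = [];
  while (value > 0x7f) {
    bytes.push((value % 128) | 0x80); // set the continuation bit
    value = Math.floor(value / 128);
  }
  bytes.push(value);
  return bytes;
}

// Field key = (field number << 3) | wire type.
// Wire type 0 = varint, wire type 2 = length-delimited (strings, bytes).
function encodeUser(user) {
  const name = Buffer.from(user.name, 'utf8');
  return Buffer.from([
    ...encodeVarint((1 << 3) | 0), ...encodeVarint(user.id),
    ...encodeVarint((2 << 3) | 2), ...encodeVarint(name.length), ...name,
  ]);
}

const user = { id: 150, name: 'alice' };
const asJson = Buffer.from(JSON.stringify(user), 'utf8');
const asProto = encodeUser(user);

console.log(`JSON: ${asJson.length} bytes, protobuf: ${asProto.length} bytes`);
// JSON: 25 bytes, protobuf: 10 bytes
```

The saving comes from dropping field names and quoting: the binary form carries only field numbers, lengths, and values. The ratio depends heavily on the message shape, so treat this as an illustration rather than a benchmark.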

### gRPC Communication Patterns

```
┌─────────────────────────────────────────────────────────────────┐
│                    gRPC Patterns                                │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Unary (Request-Response)                                   │
│     Client ────── Request ──────► Server                       │
│     Client ◄───── Response ─────── Server                       │
│                                                                 │
│  2. Server Streaming                                            │
│     Client ────── Request ──────► Server                       │
│     Client ◄═══ Stream of data ══ Server                       │
│                                                                 │
│  3. Client Streaming                                            │
│     Client ═══ Stream of data ══► Server                       │
│     Client ◄───── Response ─────── Server                       │
│                                                                 │
│  4. Bidirectional Streaming                                     │
│     Client ◄═══════════════════► Server                        │
│             Both stream freely                                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
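
The four call shapes can be mimicked in plain JavaScript, with async iterables standing in for streams. This is only an analogy to show the shape of each pattern. None of the function names below come from a gRPC library, and real gRPC code would use stubs generated from a `.proto` file.

```javascript
// 1. Unary: one request in, one response out.
async function getUser(request) {
  return { id: request.id, name: `user-${request.id}` };
}

// 2. Server streaming: one request in, a stream of responses out.
async function* listUsers(request) {
  for (let id = 1; id <= request.count; id++) {
    yield { id, name: `user-${id}` };
  }
}

// 3. Client streaming: a stream of requests in, one response out.
async function sumValues(requestStream) {
  let total = 0;
  for await (const { value } of requestStream) total += value;
  return { total };
}

// 4. Bidirectional streaming: both sides stream; here each request is answered as it arrives.
async function* echoUpper(requestStream) {
  for await (const { text } of requestStream) {
    yield { text: text.toUpperCase() };
  }
}

// Helper: turn an array into an async stream of messages.
async function* streamOf(items) {
  for (const item of items) yield item;
}

// Helper: drain a stream into an array.
async function collect(stream) {
  const out = [];
  for await (const item of stream) out.push(item);
  return out;
}

collect(listUsers({ count: 3 })).then((users) => console.log(users.length)); // 3
```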

### When to Use gRPC

<CardGroup cols={2}>
  <Card title="Use gRPC" icon="check">
    * Microservices communication
    * Low latency requirements
    * Strong typing needed
    * Streaming data
    * Internal services
  </Card>

  <Card title="Avoid gRPC" icon="xmark">
    * Public APIs (browser clients)
    * Simple CRUD operations
    * Team unfamiliar with Protobuf
    * Debugging ease is priority
  </Card>
</CardGroup>

## Long Polling vs SSE vs WebSocket

```
┌───────────────────────────────────────────────────────────────────────┐
│                    Real-Time Communication Options                     │
├───────────────────┬───────────────────┬───────────────────────────────┤
│    Long Polling   │       SSE         │        WebSocket              │
├───────────────────┼───────────────────┼───────────────────────────────┤
│                   │                   │                               │
│  Client ── GET ──►│  Client ── GET ──►│  Client ── Upgrade ──►        │
│  Server holds...  │  Server streams   │  Bidirectional                │
│  Until data ready │  event: update    │                               │
│  Client ◄─ data ─ │  data: {...}      │  Client ◄═══════════► Server │
│  Repeat           │  data: {...}      │                               │
│                   │  (one direction)  │                               │
│                   │                   │                               │
├───────────────────┼───────────────────┼───────────────────────────────┤
│ Repeated requests │ One TCP, server→  │ One TCP, both ways            │
│ HTTP compatible   │ HTTP compatible   │ Different protocol            │
│ Simple            │ Auto-reconnect    │ Most flexible                 │
│ Higher latency    │ Medium latency    │ Lowest latency                │
│                   │                   │                               │
│ Use: Fallback     │ Use: Notifications│ Use: Chat, gaming             │
│ for legacy        │ Live feeds        │ Collaboration                 │
└───────────────────┴───────────────────┴───────────────────────────────┘
```
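
The SSE column is the easiest of the three to build with nothing but Node's `http` module. The sketch below formats events in the `event:` / `data:` wire format shown in the diagram and streams one per second. The handler name, the `update` event name, and the counter payload are illustrative assumptions.

```javascript
const http = require('http');

// Format one Server-Sent Event. The blank line at the end terminates the event.
// JSON.stringify output contains no raw newlines, so a single data: line is enough.
function formatSseEvent(eventName, payload) {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Streams a counter once per second until the client disconnects.
function handleEvents(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    Connection: 'keep-alive',
  });

  let count = 0;
  const timer = setInterval(() => {
    count += 1;
    res.write(formatSseEvent('update', { count }));
  }, 1000);

  // Stop the timer when the client goes away, otherwise it leaks.
  req.on('close', () => clearInterval(timer));
}

// Created but not started here; call server.listen(port) to serve it.
const server = http.createServer(handleEvents);
```

After `server.listen(3000)`, a browser can subscribe with `new EventSource('http://localhost:3000/')` and listen for `update` events. The browser reconnects on its own if the connection drops.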

## Network Latency Budget

### Where Time Goes

```
┌─────────────────────────────────────────────────────────────────┐
│                Request Latency Breakdown                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  DNS Lookup:           1-50ms     (cached: <1ms)               │
│  TCP Handshake:        1 RTT     (50-150ms cross-continent)   │
│  TLS Handshake:        1-2 RTT   (50-300ms)                    │
│  Request Transfer:     Varies    (size / bandwidth)            │
│  Server Processing:    Varies    (your code)                   │
│  Response Transfer:    Varies    (size / bandwidth)            │
│                                                                 │
│  Example: USA → Europe API call                                 │
│  ─────────────────────────────────                              │
│  DNS:         5ms   (cached)                                   │
│  TCP:        75ms   (1 RTT)                                    │
│  TLS:       150ms   (2 RTT)                                    │
│  Request:    10ms   (small payload)                            │
│  Server:     50ms   (DB query + processing)                    │
│  Response:   20ms   (JSON response)                            │
│  ─────────────────────────────────                              │
│  TOTAL:     310ms                                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
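
The same breakdown can be expressed as a small calculator, which makes it easy to see what changes when one input moves. This is a simplified model using the numbers from the example above; `latencyBudget` and its parameter names are illustrative.

```javascript
// Sum the sequential phases of one request over a fresh connection.
function latencyBudget({ dnsMs, rttMs, tlsRoundTrips, requestMs, serverMs, responseMs }) {
  const tcpMs = rttMs;                 // TCP handshake: 1 RTT
  const tlsMs = rttMs * tlsRoundTrips; // TLS 1.2: 2 RTT, TLS 1.3: 1 RTT
  const totalMs = dnsMs + tcpMs + tlsMs + requestMs + serverMs + responseMs;
  return { dnsMs, tcpMs, tlsMs, requestMs, serverMs, responseMs, totalMs };
}

const base = { dnsMs: 5, rttMs: 75, requestMs: 10, serverMs: 50, responseMs: 20 };

const cold = latencyBudget({ ...base, tlsRoundTrips: 2 });
const withTls13 = latencyBudget({ ...base, tlsRoundTrips: 1 });

// A reused (keep-alive) connection skips DNS, TCP, and TLS entirely.
const warmMs = cold.requestMs + cold.serverMs + cold.responseMs;

console.log(cold.totalMs);      // 310
console.log(withTls13.totalMs); // 235
console.log(warmMs);            // 80
```

In this model, connection setup accounts for 230ms of the 310ms, which is why connection reuse and edge termination tend to matter more than shaving server time.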

### Optimization Strategies

| Optimization       | Latency Saved | Trade-off                |
| ------------------ | ------------- | ------------------------ |
| **CDN**            | 100-200ms     | Cost, cache invalidation |
| **Keep-Alive**     | 150-300ms     | Connection limits        |
| **HTTP/2**         | Variable      | Server support needed    |
| **Compression**    | 10-50ms       | CPU overhead             |
| **Edge Computing** | 100-200ms     | Complexity               |
| **DNS Prefetch**   | 50ms          | Additional requests      |

<Tip>
  **Interview Tip**: When discussing latency, mention geographic distribution. "Users in Singapore accessing servers in US-East will have \~200ms RTT just from physics."
</Tip>
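
The physics floor is quick to estimate. Light in optical fiber travels at roughly two thirds of its speed in a vacuum, about 200 km per millisecond. The distance used below (about 15,500 km between Singapore and US-East) is an approximate great-circle figure assumed for illustration.

```javascript
// Light in fiber covers roughly 200 km per millisecond (about 2/3 of c).
const FIBER_KM_PER_MS = 200;

// Lower bound on round-trip time for a given one-way distance.
function minRttMs(distanceKm) {
  return (2 * distanceKm) / FIBER_KM_PER_MS;
}

// Assumed great-circle distance, Singapore to US-East: about 15,500 km.
console.log(minRttMs(15500)); // 155
```

Real cable routes are longer than the great-circle path and add switching and queuing delay, which is how a 155ms floor becomes the roughly 200ms seen in practice.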

## Key Takeaways

| Concept        | Remember                                                     |
| -------------- | ------------------------------------------------------------ |
| **DNS**        | First hop, cache TTLs matter, can be used for load balancing |
| **TCP vs UDP** | TCP = reliable, UDP = fast; choose based on use case         |
| **HTTP/2**     | Multiplexing, header compression (push removed from Chrome)  |
| **WebSocket**  | Real-time bidirectional, needs pub/sub for scaling           |
| **gRPC**       | Fast binary protocol, great for microservices                |
| **Latency**    | Minimize RTTs, use CDNs, keep connections alive              |

## Interview Deep-Dive Questions

<AccordionGroup>
  <Accordion title="Q1: A user reports that your web application takes 3 seconds to load the first page, but subsequent pages are fast. Walk me through every network-level factor contributing to that first-page latency and how you would reduce it." icon="clock">
    **What the interviewer is really testing:** Whether you understand the full request lifecycle from browser to server and back -- DNS, TCP, TLS, HTTP -- and can systematically identify optimization opportunities at each layer.

    **Strong Answer:**

    * The 3-second first-page load breaks down into sequential network costs that only apply to the first request. Subsequent pages are fast because connections are reused and resources are cached. Let me walk through each phase.
    * DNS resolution: if the browser has no cache entry for your domain, it goes through the recursive resolver chain (browser cache, OS cache, ISP resolver, root servers, TLD servers, authoritative server). This can take 50-200ms depending on geography and cache state. Fix: use DNS prefetching (`<link rel="dns-prefetch">`) for the other origins the page pulls resources from, and set reasonable TTLs on DNS records (300-600 seconds is typical -- too low means frequent lookups, too high means slow failover).
    * TCP handshake: one round trip (SYN, SYN-ACK, ACK). For a user with a 100ms RTT to the server, that is 100ms. Fix: use a CDN so the TCP connection terminates at a nearby edge node rather than the origin server. Consider TCP Fast Open (TFO), which allows data in the SYN packet on subsequent connections.
    * TLS handshake: TLS 1.2 requires two additional round trips (200ms for a 100ms RTT user). TLS 1.3 reduces this to one round trip, and 0-RTT resumption eliminates it entirely for returning visitors. Fix: upgrade to TLS 1.3, enable session resumption, use OCSP stapling to avoid the client making a separate request to check certificate revocation.
    * HTTP request and response: the actual data transfer. If the page requires multiple resources (HTML, CSS, JavaScript, images), HTTP/1.1 handles one request at a time per connection (browsers open up to 6 parallel connections per host, but that is still a bottleneck). HTTP/2 multiplexes all requests over a single connection, eliminating head-of-line blocking at the HTTP layer. HTTP/3 (QUIC) goes further by eliminating head-of-line blocking at the transport layer.
    * Server processing: Time-to-first-byte (TTFB) depends on server-side processing. If the server needs to query a database, render a template, and assemble the response, that adds latency. Fix: server-side caching, precomputed pages for common routes, edge-side rendering.
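    * Total optimization: DNS (50ms saved with prefetch) + TCP (100ms saved with CDN) + TLS (100ms saved with TLS 1.3) + HTTP multiplexing (200ms saved with HTTP/2) + server-side caching (500ms saved with CDN cache hit) adds up to roughly 950ms on a single request chain. A page load pays several of these costs more than once (per origin and per sequential fetch), so applied across the whole page the combined effect can take a 3-second load to well under 1 second.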
    * **Example:** Cloudflare's performance measurements show that switching from TLS 1.2 to TLS 1.3 saves one full RTT per new connection. For users in Australia connecting to US servers (200ms RTT), that is a 200ms improvement on every first visit. Combined with their CDN edge nodes in Sydney, the TCP+TLS cost drops from 600ms to under 50ms.
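
    The handshake arithmetic in the example can be checked with a few lines. The 200ms origin RTT is taken from the example; the 15ms edge RTT is an assumed value for a user close to a CDN node, and the function name is illustrative.

    ```javascript
    // Round trips spent on handshakes before the first HTTP request can be sent.
    const HANDSHAKE_ROUND_TRIPS = {
      'TLS 1.2': 1 + 2, // TCP (1) + TLS 1.2 (2)
      'TLS 1.3': 1 + 1, // TCP (1) + TLS 1.3 (1)
    };

    function connectionSetupMs(rttMs, tlsVersion) {
      return rttMs * HANDSHAKE_ROUND_TRIPS[tlsVersion];
    }

    const originRttMs = 200; // Australia to a US origin, as in the example
    const edgeRttMs = 15;    // assumed RTT to a nearby CDN edge (illustrative)

    console.log(connectionSetupMs(originRttMs, 'TLS 1.2')); // 600
    console.log(connectionSetupMs(originRttMs, 'TLS 1.3')); // 400
    console.log(connectionSetupMs(edgeRttMs, 'TLS 1.3'));   // 30
    ```

    Upgrading TLS removes one round trip, but moving the endpoint closer shrinks every round trip, which is why the edge node has the larger effect here.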

    **Follow-up: The site uses HTTP/2 and a CDN, but mobile users in India still report slow loads. Desktop users in the same region are fine. What networking factors specific to mobile could explain this?**

    Mobile networks add several latency sources: (1) Radio resource allocation -- on LTE/5G, the device must negotiate a radio channel before any data can flow, adding 50-100ms. (2) Higher RTTs on cellular networks -- typical LTE RTT is 30-50ms even to nearby towers, versus 5-10ms for wired broadband. (3) TCP slow start interacts badly with mobile -- high RTT means the congestion window grows slowly, so large resources take many round trips to fully transfer. (4) Packet loss on mobile is higher, causing TCP retransmissions. Fix: aggressive resource compression, smaller initial page payloads (aim for under 14KB to fit in the first TCP congestion window), lazy loading of non-critical resources, and consider QUIC/HTTP3 which handles packet loss better than TCP because it avoids head-of-line blocking across streams.
  </Accordion>

  <Accordion title="Q2: Your microservices architecture currently uses REST for all inter-service communication. The team is evaluating gRPC. When would you recommend the switch, and what are the operational costs people underestimate?" icon="plug">
    **What the interviewer is really testing:** Whether you can make nuanced protocol decisions based on actual requirements rather than hype, and whether you understand the operational implications beyond raw performance.

    **Strong Answer:**

    * gRPC is not universally better than REST -- it is better for specific use cases, and it comes with operational costs that are easy to underestimate. The decision should be driven by concrete pain points, not benchmarks.
    * When gRPC makes sense: (1) High-throughput internal service communication where the protobuf binary encoding saves significant bandwidth (a 1KB JSON payload might be 300 bytes in protobuf -- at millions of requests per second, that bandwidth savings is real). (2) Strict API contracts are needed -- protobuf schemas enforce types at compile time, catching breaking changes before deployment. (3) You need streaming (server-streaming, client-streaming, or bidirectional streaming) -- gRPC has first-class streaming support, while REST over HTTP/1.1 does not. (4) Latency-sensitive internal paths where JSON parsing overhead matters (protobuf deserialization is 2-10x faster than JSON parsing).
    * When REST is still the right choice: (1) Public-facing APIs -- browsers do not natively support gRPC (you need gRPC-Web or a proxy), and developer experience with REST is far more accessible. (2) Services that are called infrequently -- the performance difference is negligible at low volume. (3) Teams without protobuf experience -- the learning curve is real and affects velocity.
    * Operational costs people underestimate: (1) Debugging is harder -- binary protobuf payloads are not human-readable in packet captures or logs. You need tools like `grpcurl` or Postman's gRPC support. With REST, you can `curl` an endpoint and read the JSON response. (2) Load balancing is more complex -- gRPC uses HTTP/2 with long-lived connections. A standard L4 load balancer will route all requests from one connection to one backend. You need L7 (application-layer) load balancing that understands HTTP/2 frames, or client-side load balancing. (3) Schema evolution requires discipline -- adding a field to a protobuf message is backward-compatible, but removing or renumbering a field is a breaking change that can cause silent data corruption. (4) Monitoring and tracing middleware needs to understand gRPC status codes (which are different from HTTP status codes).
    * My recommendation for the team: introduce gRPC selectively on the highest-traffic internal paths first. Keep REST for public APIs and low-volume internal services. Run both protocols through the same service mesh so you get consistent observability regardless of protocol.
    * **Example:** Google built gRPC as the open-source successor to Stubby, the internal RPC framework behind most of its service-to-service communication, but their public APIs (Maps, Gmail, etc.) offer REST endpoints because developer adoption matters more than protocol efficiency for external consumers. Internally, they report that gRPC's streaming support was a bigger factor than raw performance in their adoption decision.

    **Follow-up: Your team adopts gRPC for the critical path between the API Gateway and the Order Service. Requests are being unevenly distributed -- one Order Service instance is getting 80% of traffic while three others are idle. What is happening?**

    This is the classic gRPC load balancing problem. gRPC uses HTTP/2, which multiplexes all requests over a single long-lived TCP connection. An L4 load balancer picks a backend once, when the connection is opened, so every request the API Gateway sends over that connection lands on the same Order Service instance. Solutions: (1) Use L7 load balancing (Envoy, nginx with gRPC support) that can distribute individual gRPC requests across backends, not just connections. (2) Use client-side load balancing where the API Gateway maintains connections to all backends and round-robins requests itself (gRPC libraries support this natively with name resolvers). (3) If using Kubernetes, use a service mesh like Istio which handles per-request load balancing transparently.
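
    A toy simulation shows the difference between picking a backend per connection and picking one per request. It is a deliberately simplified model (one connection, round-robin selection), and the function names are illustrative rather than any load balancer's API.

    ```javascript
    // Count how many requests each backend receives.
    // requestsPerConnection[i] is the number of requests sent over connection i.
    function distribute(requestsPerConnection, backendCount, pickBackend) {
      const counts = new Array(backendCount).fill(0);
      let requestIndex = 0;
      requestsPerConnection.forEach((requestCount, connectionIndex) => {
        for (let i = 0; i < requestCount; i++) {
          counts[pickBackend(connectionIndex, requestIndex, backendCount)] += 1;
          requestIndex += 1;
        }
      });
      return counts;
    }

    // L4: the backend is chosen once per TCP connection.
    const perConnection = (connectionIndex, _requestIndex, n) => connectionIndex % n;

    // L7 or client-side: the backend is chosen for every request (round-robin here).
    const perRequest = (_connectionIndex, requestIndex, n) => requestIndex % n;

    // One long-lived HTTP/2 connection carrying 1000 multiplexed requests, 4 backends.
    console.log(distribute([1000], 4, perConnection)); // [ 1000, 0, 0, 0 ]
    console.log(distribute([1000], 4, perRequest));    // [ 250, 250, 250, 250 ]
    ```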
  </Accordion>

  <Accordion title="Q3: Explain the TCP three-way handshake to me. Then tell me why it matters for system design, not just for a networking exam." icon="handshake">
    **What the interviewer is really testing:** Whether you understand networking fundamentals at a practical level and can connect low-level protocol behavior to high-level system design decisions.

    **Strong Answer:**

    * The three-way handshake establishes a TCP connection: (1) Client sends SYN with an initial sequence number. (2) Server responds with SYN-ACK, acknowledging the client's sequence number and providing its own. (3) Client sends ACK, acknowledging the server's sequence number. The connection is now established and data can flow.
    * This takes one round trip (the SYN goes out, SYN-ACK comes back, ACK goes out with or before the first data packet). For system design, this means every new TCP connection costs at minimum one RTT before any application data is exchanged.
    * Why this matters for system design: (1) Connection pooling is critical for microservices. If Service A calls Service B 1000 times per second and opens a new connection each time, you are paying 1000 handshakes per second. At 1ms RTT within a datacenter, that is tolerable but wasteful. At 100ms RTT across regions, that is 100 seconds of cumulative handshake time per second of operation -- a disaster. Connection pools reuse established connections, amortizing the handshake cost. (2) TCP slow start means even after the handshake, the connection starts with a small congestion window (typically 10 segments, or \~14KB). It takes multiple round trips to ramp up to full throughput. This is why large file downloads are slow at the beginning and why serving a 100KB response over a fresh connection takes longer than serving it over a warm connection. (3) Keep-alive connections (HTTP keep-alive, gRPC persistent connections) avoid repeated handshakes. The trade-off is that each open connection consumes memory on both client and server (kernel buffers, file descriptors). A server with 100K idle keep-alive connections can consume significant memory. (4) CDNs and edge proxies work partly by terminating TCP connections close to the user. The handshake RTT between the user and the CDN edge is 5ms instead of 150ms to the origin. The CDN can maintain a warm, pre-established connection pool to the origin.
    * UDP skips the handshake entirely, which is why DNS uses UDP for small queries (the entire query and response fit in one round trip), and why QUIC (the transport under HTTP/3) uses UDP with its own connection establishment that can be done in 0-RTT for returning visitors.
    * **Example:** When Cloudflare analyzed their traffic, they found that 40% of the latency for typical web requests was TCP and TLS handshake overhead. By moving their edge nodes closer to users and enabling TLS 1.3 with 0-RTT, they eliminated most of this overhead for repeat visitors. This is a direct system design implication of the three-way handshake cost.
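
    Two of the numbers in this answer are easy to reproduce: the cumulative handshake cost without pooling, and the number of round trips slow start needs on a fresh connection. The sketch assumes a 1460-byte segment size, a window that doubles every RTT, and no packet loss, which is a simplification of real congestion control.

    ```javascript
    const MSS_BYTES = 1460;           // typical TCP payload per segment on Ethernet
    const INITIAL_CWND_SEGMENTS = 10; // initial congestion window cited above

    // Round trips needed to deliver `bytes` on a fresh connection.
    function slowStartRoundTrips(bytes) {
      let cwndBytes = INITIAL_CWND_SEGMENTS * MSS_BYTES;
      let sentBytes = 0;
      let roundTrips = 0;
      while (sentBytes < bytes) {
        sentBytes += cwndBytes;
        cwndBytes *= 2; // window doubles each RTT during slow start
        roundTrips += 1;
      }
      return roundTrips;
    }

    // Cumulative handshake time per second when every call opens a new connection.
    function handshakeSecondsPerSecond(callsPerSecond, rttMs) {
      return (callsPerSecond * rttMs) / 1000;
    }

    console.log(slowStartRoundTrips(14000));           // 1
    console.log(slowStartRoundTrips(100000));          // 3
    console.log(handshakeSecondsPerSecond(1000, 1));   // 1
    console.log(handshakeSecondsPerSecond(1000, 100)); // 100
    ```

    A warm connection has already grown its window, so the same 100KB response can go out in a single round trip. That gap is the cost connection pools avoid.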

    **Follow-up: A service behind your load balancer is running out of ephemeral ports and you see thousands of connections in TIME\_WAIT state. What is happening and how do you fix it?**

    TIME\_WAIT is a TCP state where a closed connection lingers for 2x the maximum segment lifetime (typically 60 seconds on Linux) to ensure delayed packets from the old connection do not corrupt a new connection on the same port. If a service opens and closes many short-lived connections rapidly (e.g., a microservice making thousands of HTTP calls per second without connection pooling), it exhausts the ephemeral port range (28,232 ports by default on Linux). Fixes: (1) Use connection pooling -- this is the primary fix. Reuse connections instead of opening new ones. (2) Increase the ephemeral port range via `net.ipv4.ip_local_port_range`. (3) Enable `net.ipv4.tcp_tw_reuse` to allow reusing TIME\_WAIT sockets for new outbound connections (safe for client-initiated connections). (4) Never enable `tcp_tw_recycle` -- it breaks clients behind NAT and was removed in Linux 4.12. The root cause is almost always missing connection pooling.
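
    The port math gives a useful ceiling. Assuming the default Linux range of 32768-60999 and a 60-second TIME\_WAIT, a client that never reuses connections can sustain only a few hundred new connections per second to a single destination. The function name below is illustrative.

    ```javascript
    // Each closed outbound connection holds its ephemeral port for the TIME_WAIT
    // duration, so ports become free at most at portCount / timeWaitSeconds per second.
    // The limit applies per destination (IP and port), not to the host as a whole.
    function maxNewConnectionsPerSecond(portLow, portHigh, timeWaitSeconds) {
      const portCount = portHigh - portLow + 1;
      return portCount / timeWaitSeconds;
    }

    const rate = maxNewConnectionsPerSecond(32768, 60999, 60);
    console.log(Math.floor(rate)); // 470
    ```

    A service making thousands of unpooled calls per second to one backend is well past that ceiling, which matches the symptom in the question.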
  </Accordion>
</AccordionGroup>
