Module 5: Transport Layer

The Transport Layer (Layer 4) provides end-to-end communication services for applications. While Layer 3 (IP) gets packets to the right machine, Layer 4 gets data to the right application on that machine using port numbers, and optionally ensures reliability. Think of it this way: IP is the street address that gets the mail to the right building. The port number is the apartment number that gets it to the right resident. Port 80 is the “HTTP apartment,” port 443 is the “HTTPS apartment,” port 22 is the “SSH apartment.”

5.1 TCP vs UDP

The two main transport protocols represent a fundamental trade-off: reliability vs speed.

Feature	TCP (Transmission Control Protocol)	UDP (User Datagram Protocol)
Connection	Connection-oriented (Handshake)	Connectionless (Fire and Forget)
Reliability	Reliable (ACKs, Retransmission)	Best-effort (No ACKs, no retransmission)
Ordering	Ordered delivery	No ordering guarantee
Speed	Slower (Overhead)	Faster (Low overhead)
Header Size	20-60 bytes	8 bytes
Use Cases	Web (HTTP), Email (SMTP), File Transfer	Streaming, VoIP, Gaming, DNS

The analogy: TCP is like sending a package via certified mail. You get a tracking number, delivery confirmation, and if it gets lost, it is re-sent. UDP is like shouting across a room — fast, no setup needed, but if the other person does not hear you, tough luck. Why not always use TCP? Because for some applications, getting old data late is worse than not getting it at all. In a live video call, if a frame is lost, you do not want TCP to pause and retransmit it — by the time it arrives, the conversation has moved on. You would rather drop the frame and show the next one. That is why real-time media uses UDP.

Common misconception: “UDP is unreliable, so it is bad.” Not true. UDP is intentionally simple, which makes it perfect for applications that either handle reliability themselves (like DNS, which just resends the query if there is no response) or do not need it (like live streaming). Many modern protocols like QUIC (used by HTTP/3) are actually built on top of UDP, adding custom reliability where needed while avoiding TCP’s head-of-line blocking problem.
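To make the "just resend the query" idea concrete, here is a minimal Python sketch of a DNS-style UDP client that retries on timeout. It assumes a hypothetical local test server (flaky_udp_server) that silently drops the first request to simulate loss; the payload, timeout, and attempt count are illustrative, not real DNS wire format.

```python
import socket
import threading

def flaky_udp_server(sock, drop_first=1):
    """Hypothetical test server: ignores the first `drop_first` datagrams, then answers once."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(512)
        seen += 1
        if seen <= drop_first:
            continue                      # pretend this request was lost: send nothing back
        sock.sendto(b"answer:" + data, addr)
        return

def query_with_retry(server_addr, payload, timeout=0.2, attempts=3):
    """Send one UDP datagram and resend it on timeout, as a DNS stub resolver would."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for attempt in range(1, attempts + 1):
            sock.sendto(payload, server_addr)
            try:
                reply, _ = sock.recvfrom(512)
                return reply, attempt
            except socket.timeout:
                continue                  # no connection state to repair: just resend
    raise TimeoutError(f"no reply after {attempts} attempts")

if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))         # port 0: let the OS choose a free port
    threading.Thread(target=flaky_udp_server, args=(server,), daemon=True).start()
    reply, attempt = query_with_retry(server.getsockname(), b"example.com A?")
    print(reply, "after", attempt, "attempts")
    server.close()
```

The first datagram is "lost", the client times out and simply sends the same request again; there is no handshake to redo and no connection state to clean up, which is exactly why the retry logic can live in the application.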

5.2 TCP Three-Way Handshake

To establish a connection, TCP uses a 3-step process. This is like a phone call: you dial (SYN), the other person says “hello?” (SYN-ACK), and you say “hi, it’s me” (ACK). Only then does the conversation begin.

SYN: Client sends a SYN (Synchronize) packet carrying its initial sequence number (ISN), e.g., seq=100. This says “I want to talk, and my sequence numbers start at 100.” The SYN itself consumes sequence number 100, so the client’s first data byte will be numbered 101. (Real stacks choose hard-to-predict ISNs; 100 and 300 are just illustrative.)
SYN-ACK: Server responds with SYN-ACK. It acknowledges the client’s SYN with ack=101, meaning “I received your SYN; the next byte I expect from you is 101,” and provides its own ISN (seq=300).
ACK: Client sends ACK (ack=301), acknowledging the server’s SYN. The connection is now established and data can flow.

What the packets actually look like

Client (192.168.1.10:54321)              Server (93.184.216.34:443)
   │                                          │
   │─── SYN ─────────────────────────────────►│
   │    Flags: SYN                            │
   │    Seq: 100                              │
   │    Window: 65535                         │
   │                                          │
   │◄── SYN-ACK ─────────────────────────────│
   │    Flags: SYN, ACK                       │
   │    Seq: 300, Ack: 101                    │
   │    Window: 65535                         │
   │                                          │
   │─── ACK ─────────────────────────────────►│
   │    Flags: ACK                            │
   │    Seq: 101, Ack: 301                    │
   │                                          │
   │    Connection established. Data flows.   │
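Every number in the diagram follows one rule: a SYN consumes one sequence number, so each side acknowledges the peer’s ISN + 1, and all sequence arithmetic wraps modulo 2^32. A small Python sketch of that rule (the handshake function is purely illustrative):

```python
import secrets

MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

def handshake(client_isn, server_isn):
    """Return (flags, seq, ack) for the three handshake segments.

    The SYN flag consumes one sequence number, so each side acknowledges
    the peer's ISN + 1 (modulo 2**32).
    """
    syn = ("SYN", client_isn, None)
    syn_ack = ("SYN-ACK", server_isn, (client_isn + 1) % MOD)
    ack = ("ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD)
    return [syn, syn_ack, ack]

for segment in handshake(100, 300):   # the diagram's illustrative ISNs
    print(segment)
# ('SYN', 100, None)
# ('SYN-ACK', 300, 101)
# ('ACK', 101, 301)

# Real stacks pick unpredictable ISNs; the same rule applies, including wrap-around.
print(handshake(secrets.randbelow(MOD), MOD - 1)[2])
```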

Why three steps and not two?

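Two steps would be enough to establish one direction, but TCP is bidirectional. Both sides need to synchronize their sequence numbers. The three-way handshake allows both parties to agree on starting sequence numbers and confirm that the other side is listening. Without the final ACK, the server would not know if the client actually received its SYN-ACK. The final ACK also guards against stale connection attempts: if an old, delayed SYN from a previous connection reaches the server, the client will not confirm the resulting SYN-ACK (it answers with a reset instead), so no phantom connection is opened.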

SYN Flood Attack: An attacker sends thousands of SYN packets, often from spoofed source addresses, but never completes the handshake (never sends the final ACK). The server keeps state for each half-open connection, eventually exhausting its half-open connection queue (the SYN backlog) or its memory, so new SYNs from legitimate clients are dropped. This is one of the oldest and most common DDoS attacks. Modern servers defend against it using SYN cookies, a technique where the server does not allocate resources until the handshake completes.
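The exhaustion mechanism can be sketched as a toy Python model of a fixed-size half-open table. ListenQueue and its capacity are hypothetical; unlike a real kernel, this toy never times out half-open entries after SYN-ACK retransmissions, which makes the effect easier to see.

```python
class ListenQueue:
    """Toy model of a server's half-open (SYN_RECEIVED) table with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.half_open = set()

    def on_syn(self, client):
        if len(self.half_open) >= self.capacity:
            return "dropped"              # table full: this SYN gets no SYN-ACK
        self.half_open.add(client)
        return "syn-ack sent"

    def on_ack(self, client):
        # Completing the handshake frees the half-open slot.
        if client in self.half_open:
            self.half_open.remove(client)
            return "established"
        return "ignored"

queue = ListenQueue(capacity=128)
for i in range(1000):                      # spoofed SYNs that never send the final ACK
    queue.on_syn(("203.0.113.99", 10000 + i))
print(queue.on_syn(("198.51.100.7", 54321)))   # dropped
```

The legitimate client is refused not because the server is busy doing work, but because every slot is held by a handshake that will never finish.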

5.3 Flow & Congestion Control

These are two separate but related mechanisms. Confusing them is a common mistake.

Flow Control: Prevents the sender from overwhelming the receiver. Uses a receive window: the receiver advertises how much buffer space it has available. If the receiver says “my window is 16 KB,” the sender will not have more than 16 KB of unacknowledged data in flight. Analogy: imagine pouring water into someone’s cupped hands. Flow control is them saying “slow down, my hands are almost full.”
Congestion Control: Prevents the sender from overwhelming the network (routers and the links between them). Uses algorithms like Slow Start (begin with a small congestion window and roughly double it every round trip until loss is detected or a threshold, ssthresh, is reached) and Congestion Avoidance (beyond that threshold, grow the window by about one segment per round trip; when loss is detected, cut the window, by half in classic Reno, and keep growing slowly from there). Analogy: imagine merging onto a highway. You start slow, accelerate gradually, and if you see brake lights ahead (packet loss), you slow back down.
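The congestion-control behaviour above can be traced with a simplified Reno-style model in Python. Assumptions (not a real stack): the window is counted in whole segments once per round trip, slow start is capped exactly at ssthresh, and a loss is handled like fast recovery (halve the window) rather than a timeout.

```python
def reno_trace(rounds, loss_rounds, initial_cwnd=1, ssthresh=64):
    """Per-RTT congestion window (in segments) for a simplified Reno-style sender.

    A loss in round r is detected at the end of that round and halves the window.
    """
    cwnd, trace = initial_cwnd, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 2)       # multiplicative decrease
            cwnd = ssthresh
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)     # slow start: double per RTT
        else:
            cwnd += 1                          # congestion avoidance: +1 segment per RTT
    return trace

print(reno_trace(12, loss_rounds={8}, ssthresh=16))
# [1, 2, 4, 8, 16, 17, 18, 19, 20, 10, 11, 12]
```

The trace shows the three phases in order: exponential growth to the threshold, linear probing, and the cut back to half on loss followed by linear growth again.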

How they work together

Effective window = min(Receiver Window, Congestion Window)
Throughput ≈ Effective window / RTT

Receiver Window: "I can accept 64 KB right now"
Congestion Window: "The network can handle 32 KB right now"

Effective window: 32 KB in flight per round trip (the smaller of the two)
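A minimal Python sketch of the same arithmetic, assuming the 64 KB / 32 KB windows from the example, a hypothetical 20 KB already unacknowledged, and a 200 ms RTT (the function names are illustrative):

```python
KB = 1024

def usable_window(rwnd, cwnd, bytes_in_flight):
    """Bytes the sender may still transmit now: min(rwnd, cwnd) minus unacknowledged data."""
    return max(0, min(rwnd, cwnd) - bytes_in_flight)

def throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound on throughput when at most `window_bytes` can be in flight per round trip."""
    return window_bytes / rtt_s

# Receiver offers 64 KB, the network allows 32 KB, 20 KB is still unacknowledged.
print(usable_window(64 * KB, 32 * KB, bytes_in_flight=20 * KB))   # 12288 (12 KB left to send)
print(round(throughput_bytes_per_s(32 * KB, rtt_s=0.2)))           # 163840 bytes/s at 200 ms RTT
```

The second line is the important one: a fixed window turns directly into a throughput ceiling once you divide by the round-trip time, which is why the same window feels fast on a LAN and slow across an ocean.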

Why this matters in practice

On high-latency links (e.g., a connection from New York to Singapore with 200ms RTT), TCP’s slow start means it takes several round trips before the connection reaches full speed. This is why a fresh TCP connection feels slower than an established one, and why long-distance transfers often underutilize available bandwidth. Solutions like TCP BBR (Google’s congestion control algorithm) and increasing the initial congestion window help mitigate this.
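How many round trips is "several"? A back-of-the-envelope Python sketch, assuming ideal loss-free slow start that doubles once per RTT from a 10-segment initial window, a 1460-byte MSS, and a target of filling 1 Gbit/s at 200 ms RTT:

```python
import math

def slow_start_rtts(target_bytes, initial_segments=10, mss=1460):
    """Round trips of ideal slow start (doubling per RTT, no loss) until cwnd >= target."""
    segments_needed = math.ceil(target_bytes / mss)
    rtts, cwnd = 0, initial_segments
    while cwnd < segments_needed:
        cwnd *= 2
        rtts += 1
    return rtts

rtt = 0.2                                  # New York <-> Singapore, per the text
bdp = round(1e9 / 8 * rtt)                 # bytes in flight needed to fill 1 Gbit/s
n = slow_start_rtts(bdp)
print(bdp, n, round(n * rtt, 1))           # 25000000 11 2.2
```

Even in this best case it takes about 11 round trips (over two seconds) before the window is large enough; the handshake round trip and any early loss only add to that.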

Troubleshooting slow transfers: If iperf3 shows your connection maxing out well below the link capacity, the culprit is often TCP windowing or congestion control — not the physical link. Check if the TCP window is scaling properly (ss -ti on Linux shows the congestion window) and whether there is packet loss on the path (mtr will reveal this).
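"Window scaling properly" matters because the window field in the TCP header is only 16 bits, so without the window-scale option (RFC 7323) at most 65,535 bytes can be in flight. A short Python sketch of the resulting ceiling and of the scale shift a path would need (function names are illustrative; the 1 Gbit/s and 200 ms figures are example inputs):

```python
def window_limited_throughput_bps(window_bytes, rtt_s):
    """Maximum throughput in bits/s when the window caps the bytes in flight per round trip."""
    return window_bytes * 8 / rtt_s

def required_window_scale(bandwidth_bps, rtt_s, max_unscaled=65535, max_shift=14):
    """Smallest window-scale shift (RFC 7323) whose window covers the bandwidth-delay product."""
    bdp_bytes = bandwidth_bps / 8 * rtt_s
    shift = 0
    while (max_unscaled << shift) < bdp_bytes and shift < max_shift:
        shift += 1
    return shift

# Without window scaling, a 200 ms path is capped near 2.6 Mbit/s regardless of link speed.
print(round(window_limited_throughput_bps(65535, 0.2) / 1e6, 2))   # 2.62
# Filling 1 Gbit/s over 200 ms (25 MB in flight) needs shift 9 (65535 << 9 is about 33.6 MB).
print(required_window_scale(1e9, 0.2))                              # 9
```

If ss -ti shows a connection stuck near the unscaled limit on a long path, window scaling (or undersized socket buffers) is a more likely suspect than the physical link.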

Next Module

Module 6: Application Layer

HTTP, DNS, and more.

Interview Deep-Dive

When would you choose UDP over TCP? Walk me through a real scenario where TCP would actually be worse.

Strong Answer:

The core trade-off is reliability versus latency. TCP guarantees ordered, complete delivery through acknowledgments, retransmissions, and sequence numbers. UDP provides none of these — it is fire-and-forget with 8 bytes of header overhead versus TCP’s 20-60 bytes.
The classic example where TCP is worse is live video conferencing. If a video frame is lost, TCP would pause the stream, retransmit the lost packet, and then deliver everything in order. By the time the retransmitted frame arrives (one RTT later), the conversation has moved on. Displaying a 200ms-old frame is worse than just skipping it and showing the next one. UDP lets the application make that decision — drop the stale frame, render the next one, and the user sees a brief pixelation instead of a freeze.
DNS is another great example. A DNS query is typically a single small packet with a single response. If the response does not come back, the client simply retransmits the query after a short timeout. Running a TCP three-way handshake first would add a full extra round trip (and extra packets) to every lookup, roughly doubling its latency compared with a one-round-trip UDP exchange. This is why DNS uses UDP on port 53 by default. It falls back to TCP when a response is too large for UDP (classically over 512 bytes, or over the EDNS(0) buffer size), which happens more often with large DNSSEC-signed responses, and zone transfers use TCP as well.
The most interesting modern example is QUIC (HTTP/3), which is built on UDP. Google built QUIC because TCP has a head-of-line blocking problem: if one packet is lost, TCP stalls all data on the connection until the retransmission arrives. QUIC runs multiple independent streams inside a single QUIC connection carried over UDP, so a lost packet on one stream does not block the others. QUIC also integrates TLS 1.3 directly, reducing connection setup to a single round trip (zero for repeat connections).

Follow-up: You mentioned TCP’s head-of-line blocking. Can you explain that in more detail and why HTTP/2 made it worse?

Head-of-line blocking in TCP happens because TCP guarantees in-order delivery of the byte stream. If bytes 1000-1500 are lost, TCP cannot deliver bytes 1501-3000 to the application even though they arrived fine; it must wait for the retransmission of 1000-1500 first. With HTTP/1.1, browsers typically spread requests across several parallel TCP connections (around six per host), so a lost packet on one connection only blocked the requests on that connection. HTTP/2 multiplexes all requests onto a single TCP connection for efficiency. But now, a single lost packet stalls every multiplexed stream on that connection. At higher loss rates (around 2% is a commonly cited figure, and such loss is not unusual on poor mobile links), HTTP/2 can actually perform worse than HTTP/1.1. QUIC solves this by implementing stream multiplexing above UDP: each stream has its own byte-offset space and is reassembled independently, so a loss on stream A does not block stream B. Loss detection and congestion control still work much like TCP’s and remain per connection, but in-order delivery is enforced per stream instead of across the whole connection.
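The difference can be simulated in a few lines of Python. This is a hypothetical scenario, not a protocol implementation: six packets alternate between streams A and B, packet 0 (stream A) is lost and its retransmission arrives at t=60 ms, and a packet becomes readable only once every earlier packet it must be ordered behind has arrived.

```python
def delivery_times(arrivals, per_stream):
    """Time at which each packet becomes readable by the application.

    `arrivals` is a list of (send_order, stream, arrival_time). With
    per_stream=False (TCP-like), a packet waits for every earlier packet on
    the connection; with per_stream=True (QUIC-like), only for earlier
    packets of its own stream.
    """
    released = {}
    for order, stream, _ in arrivals:
        blockers = [t for o, s, t in arrivals
                    if o <= order and (s == stream or not per_stream)]
        released[order] = max(blockers)
    return released

# Packet 0 was lost; its retransmission arrives at t=60 ms, the rest at 11-15 ms.
arrivals = [(0, "A", 60), (1, "B", 11), (2, "A", 12), (3, "B", 13), (4, "A", 14), (5, "B", 15)]
print(delivery_times(arrivals, per_stream=False))  # {0: 60, 1: 60, 2: 60, 3: 60, 4: 60, 5: 60}
print(delivery_times(arrivals, per_stream=True))   # {0: 60, 1: 11, 2: 60, 3: 13, 4: 60, 5: 15}
```

With one ordered byte stream, stream B waits for stream A’s retransmission even though all of B’s data arrived on time; with per-stream ordering only stream A pays the penalty.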

Explain the TCP three-way handshake. Why three steps and not two? What would break if we skipped the final ACK?

Strong Answer:

The three-way handshake establishes a TCP connection: (1) the client sends SYN with its initial sequence number, (2) the server responds with SYN-ACK containing its own sequence number and acknowledging the client’s, (3) the client sends ACK confirming the server’s sequence number. After this, data flows bidirectionally.
Two steps would only synchronize one direction. With SYN and SYN-ACK alone, the server has told the client its sequence number, but the server has no confirmation that the client actually received the SYN-ACK. Without the final ACK, the server would not know if the client is ready to receive data. The server might start sending data that the client is not prepared for.
More critically, the wait for the final ACK is exactly what the SYN flood attack exploits. The server allocates resources (memory for the connection state, an entry in its half-open connection queue) when it receives a SYN and replies with SYN-ACK, then waits for the final ACK. If an attacker sends millions of SYNs but never completes the handshake, the server fills that queue with half-open connections and eventually runs out of room, denying service to legitimate clients.
The defense is SYN cookies, which are brilliant in their simplicity. Instead of allocating state after sending the SYN-ACK, the server encodes the connection parameters into the sequence number of the SYN-ACK itself (using a keyed cryptographic hash). When the final ACK arrives, the server can reconstruct the connection state from the acknowledged sequence number (the ACK number minus one). If the ACK never comes, no resources were wasted.
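A simplified Python sketch of the idea, loosely following the classic layout of a time counter, an MSS index, and a keyed hash packed into the 32-bit ISN. The field widths, MSS table, 64-second slot length, and HMAC-SHA256 are assumptions for illustration; this is not the exact encoding Linux or any other kernel uses.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(16)               # hypothetical per-server secret key
MSS_TABLE = (536, 1220, 1460, 8960)   # assumed MSS values the 3-bit index can name
SLOT_SECONDS = 64                      # assumed width of each time slot

def _slot(now):
    return int(now // SLOT_SECONDS) & 0x1F            # 5-bit wrapping time counter

def _mac24(four_tuple, slot):
    msg = repr((four_tuple, slot)).encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big")           # keep 24 bits

def make_cookie(four_tuple, client_mss, now=None):
    """Build the server ISN: 5-bit slot | 3-bit MSS index | 24-bit keyed hash. No state kept."""
    slot = _slot(time.time() if now is None else now)
    mss_index = max((i for i, m in enumerate(MSS_TABLE) if m <= client_mss), default=0)
    return (slot << 27) | (mss_index << 24) | _mac24(four_tuple, slot)

def check_cookie(four_tuple, ack_number, now=None):
    """Validate the final ACK; return the recovered MSS, or None if the cookie is bad or stale."""
    cookie = (ack_number - 1) % 2 ** 32                # the client acknowledges ISN + 1
    slot, mss_index, mac = cookie >> 27, (cookie >> 24) & 0x7, cookie & 0xFFFFFF
    age = (_slot(time.time() if now is None else now) - slot) % 32
    if age > 1 or mss_index >= len(MSS_TABLE):
        return None                                    # too old, or an index never issued
    expected = _mac24(four_tuple, slot)
    if not hmac.compare_digest(mac.to_bytes(3, "big"), expected.to_bytes(3, "big")):
        return None                                    # forged or mismatched 4-tuple
    return MSS_TABLE[mss_index]

conn = ("198.51.100.7", 54321, "203.0.113.10", 443)
isn = make_cookie(conn, client_mss=1460, now=1000.0)
print(check_cookie(conn, isn + 1, now=1010.0))         # 1460: state rebuilt from the ACK alone
print(check_cookie(conn, isn + 1, now=1320.0))         # None: cookie is five slots old
```

One visible trade-off of this design: only what fits in the cookie (here a coarse MSS choice) survives the handshake, which is why real implementations treat SYN cookies as a fallback under load rather than the normal path.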

Follow-up: What about TCP connection teardown? Why is it four steps (FIN, ACK, FIN, ACK) instead of three?

TCP uses a four-way close because the connection is full-duplex: each direction is closed independently. When the client sends FIN, it is saying “I am done sending data.” The server ACKs this but may still have data to send. Only when the server is also done does it send its own FIN, which the client ACKs. This asymmetric close allows the server to finish transmitting any remaining data after the client has signaled it is done. In practice, the server’s ACK and FIN are often combined into a single packet (making it effectively three packets), but the protocol allows them to be separate. The side that closed first (here, the client) then enters TIME_WAIT for 2 × MSL (RFC 793 suggests an MSL of 2 minutes; Linux uses a fixed 60 seconds for TIME_WAIT), so it can retransmit the final ACK if that ACK was lost and so delayed packets from this connection are not confused with a new connection on the same 4-tuple. TIME_WAIT can cause issues on busy servers: if you see thousands of connections in TIME_WAIT, the server is actively closing many short-lived connections. Remedies include connection pooling and keep-alive connections to reduce connection churn, plus SO_REUSEADDR so a restarted server can rebind its listening port while old connections are still in TIME_WAIT.
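The half-close can be observed with ordinary sockets. In this minimal Python sketch over loopback (function names are illustrative), the client calls shutdown(SHUT_WR) to send its FIN, the server reads until end-of-stream, and then still replies on the direction that remains open.

```python
import socket
import threading

def echo_length_server(listener):
    """Read until the client's FIN, then reply on the still-open server-to-client direction."""
    conn, _ = listener.accept()
    with conn:                                # closing conn sends the server's own FIN
        received = b""
        while chunk := conn.recv(1024):       # b"" means the client's FIN has arrived
            received += chunk
        conn.sendall(b"got %d bytes" % len(received))

def half_close_client(address):
    with socket.create_connection(address) as sock:
        sock.sendall(b"x" * 5000)
        sock.shutdown(socket.SHUT_WR)         # send FIN: "I am done sending"
        reply = b""
        while chunk := sock.recv(1024):       # keep reading until the server's FIN
            reply += chunk
        return reply

def demo():
    listener = socket.create_server(("127.0.0.1", 0))
    worker = threading.Thread(target=echo_length_server, args=(listener,))
    worker.start()
    try:
        return half_close_client(listener.getsockname())
    finally:
        worker.join()
        listener.close()

if __name__ == "__main__":
    print(demo())                             # b'got 5000 bytes'
```

Because the client sent the first FIN, it is the side that ends up in TIME_WAIT after the exchange.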

Explain the difference between flow control and congestion control. When debugging a slow file transfer, which one is more likely the culprit?

Strong Answer:

Flow control prevents the sender from overwhelming the receiver. The receiver advertises a receive window — how many bytes it can accept before its buffer fills up. If the receiver is slow (maybe a busy application not reading from the socket fast enough), it shrinks the window, telling the sender to slow down. This is a point-to-point mechanism between sender and receiver.
Congestion control prevents the sender from overwhelming the network, meaning the routers and links between sender and receiver. It uses a congestion window that starts small (slow start) and grows exponentially until packet loss is detected or a threshold is reached; after a loss the window is cut back and then grows linearly (congestion avoidance). For loss-based algorithms, packet loss is the signal that some router in the path is dropping packets because its buffers are full.
The actual sending rate is governed by the minimum of the two windows: min(receive window, congestion window). If the receive window is 64 KB but the congestion window is 32 KB, the sender keeps at most 32 KB of unacknowledged data in flight per round trip.
For debugging a slow file transfer, congestion control is usually the more likely culprit, especially on high-latency links. On a path with 200ms RTT (like New York to Singapore), TCP’s slow start means it takes multiple round trips to ramp up to full speed. A fresh connection might start with a 10-segment initial window (about 14 KB), and even with ideal, loss-free doubling it takes about 11 round trips (over two seconds at 200ms) to reach the roughly 25 MB window needed to fill a 1 Gbps link; any loss along the way stretches that considerably. Tools like ss -ti on Linux show the current congestion window (cwnd) and whether the connection is window-limited. If iperf3 shows throughput well below link capacity, the problem is typically TCP windowing, not the physical link.

Follow-up: You mentioned TCP BBR. What is it and how does it differ from traditional congestion control algorithms like Cubic?

Traditional algorithms like Cubic (still the default congestion control on Linux; BBR has to be enabled explicitly) are loss-based: they increase the sending rate until they detect packet loss, interpret loss as congestion, and back off. The problem is that modern networks have large buffers in routers. The connection fills these buffers before seeing loss, adding hundreds of milliseconds of latency (bufferbloat). You get full throughput but terrible latency. BBR (Bottleneck Bandwidth and Round-trip propagation time), developed by Google, is model-based instead of loss-based. It continuously measures the maximum bandwidth and minimum RTT of the path, then targets a sending rate that matches the bottleneck bandwidth without filling buffers. The result is near-optimal throughput with much lower latency. Google reported 2-25x throughput improvements on their backbone after deploying BBR, especially on lossy links where loss-based algorithms would back off aggressively. The trade-off is that BBR can be unfair to Cubic flows sharing the same bottleneck, because BBR tends to capture more bandwidth. BBR v2 addresses some of these fairness concerns.
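The "model" in model-based can be sketched in Python as two running estimates: a windowed maximum of delivery rate and a windowed minimum of RTT, from which a pacing rate and an in-flight cap are derived. BbrModelSketch is hypothetical and heavily simplified: real BBR windows the max over roughly 10 round trips and the min over roughly 10 seconds (here both are just the last N samples), and it cycles its pacing gain to probe for bandwidth rather than holding it fixed.

```python
from collections import deque

class BbrModelSketch:
    """Hypothetical, simplified version of the path model BBR maintains."""

    def __init__(self, bw_samples=10, rtt_samples=10):
        self.delivery_rates = deque(maxlen=bw_samples)   # bytes per second
        self.rtts = deque(maxlen=rtt_samples)            # seconds

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        self.delivery_rates.append(delivered_bytes / interval_s)
        self.rtts.append(rtt_s)

    def btl_bw(self):
        return max(self.delivery_rates)      # estimated bottleneck bandwidth

    def min_rtt(self):
        return min(self.rtts)                # estimated propagation delay (no queueing)

    def pacing_rate(self, gain=1.0):
        return gain * self.btl_bw()          # send at about the bottleneck rate

    def cwnd_bytes(self, gain=2.0):
        return gain * self.btl_bw() * self.min_rtt()   # keep in-flight data near the BDP

model = BbrModelSketch()
model.on_ack(125_000, 0.010, rtt_s=0.045)   # 12.5 MB/s sample; queueing inflates this RTT
model.on_ack(100_000, 0.010, rtt_s=0.040)
print(round(model.btl_bw()), model.min_rtt(), round(model.cwnd_bytes()))  # 12500000 0.04 1000000
```

Note that neither estimate looks at loss: the sender reacts to measured delivery rate and RTT, which is how it avoids filling router buffers just to discover the limit.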

What is a TCP port, and what would happen if two applications on the same server try to listen on the same port?

Strong Answer:

A port is a 16-bit number (0-65535) that identifies a specific application or service on a machine. While the IP address gets the packet to the right machine, the port gets it to the right application. The combination of IP address and port is called a socket address (often just called a socket). A unique TCP connection is identified by the 4-tuple: source IP, source port, destination IP, destination port.
If two applications try to bind to the same port on the same interface, the second one gets an “Address already in use” error (EADDRINUSE). The OS kernel enforces that only one process can listen on a given IP:port combination at a time, unless the sockets explicitly opt into sharing with SO_REUSEPORT. You can see what is already bound with ss -tuln on Linux or netstat -an on Windows.
There is a nuance: a process can bind to a specific interface (e.g., 127.0.0.1:8080) while another binds to a different interface (e.g., 10.0.1.5:8080). These are different sockets and do not conflict. But if one process binds to 0.0.0.0:8080 (all interfaces), no other process can bind to port 8080 on any interface.
The SO_REUSEADDR socket option allows a process to bind to a port that still has connections in the TIME_WAIT state (left over from a previous instance). This is commonly used on servers to avoid the “address already in use” error during restarts. SO_REUSEPORT (Linux 3.9+) goes further, allowing multiple processes to bind to the same port simultaneously, with the kernel distributing incoming connections across them. Nginx can use this through the reuseport parameter of its listen directive, giving each worker process its own listening socket.
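The conflict is easy to reproduce. A minimal Python sketch, assuming a POSIX system (on Windows the error code differs): one socket listens on a loopback port chosen by the OS, and a second socket without SO_REUSEPORT tries to bind the same address and port.

```python
import errno
import socket

def try_double_bind(host="127.0.0.1"):
    """Return the errno from binding a second socket to a port that is already listening."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.bind((host, 0))                  # port 0: let the OS pick a free port
    first.listen()
    port = first.getsockname()[1]
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind((host, port))          # same IP:port, no SO_REUSEPORT
        return None                        # would mean the kernel allowed sharing
    except OSError as exc:
        return exc.errno
    finally:
        second.close()
        first.close()

if __name__ == "__main__":
    print(try_double_bind() == errno.EADDRINUSE)
```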

Follow-up: What are ephemeral ports and why do they matter for firewall rules?

Ephemeral ports are the temporary, high-numbered ports that the OS assigns to the client side of a connection. The exact range varies: IANA suggests 49152-65535 (also the Windows default), modern Linux defaults to 32768-60999, and firewall guidance often simply allows 1024-65535 to cover every client OS. When your browser connects to a web server on port 443, the browser’s source port might be 52431; that is an ephemeral port assigned by the OS. The server’s response comes back to that ephemeral port. This matters critically for stateless firewalls and NACLs. If a server’s NACL allows inbound TCP on port 443 but not outbound traffic to ephemeral ports (or a client’s NACL allows outbound 443 but not inbound return traffic on ephemeral ports), the connection fails because the response packets are blocked. Stateful firewalls and security groups handle this automatically by tracking connection state and allowing return traffic. But NACLs in AWS are stateless, so you must explicitly allow ephemeral port ranges. This is one of the most common NACL misconfigurations I have encountered: people allow the well-known port but forget the ephemeral range for return traffic.
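You can watch the OS pick an ephemeral port. This Python sketch connects over loopback and compares the client’s source port with the range the kernel draws from. It reads Linux’s /proc/sys/net/ipv4/ip_local_port_range; the fallback to 49152-65535 on other systems is an assumption (the IANA-suggested range), not something every OS guarantees.

```python
import socket

def local_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Ephemeral range the kernel draws from; falls back to IANA's 49152-65535 off Linux."""
    try:
        with open(path) as f:
            low, high = map(int, f.read().split())
            return low, high
    except OSError:
        return 49152, 65535       # assumption: non-Linux hosts use the IANA-suggested range

def ephemeral_port_demo():
    """Connect over loopback and report the source port the OS picked for the client."""
    server = socket.create_server(("127.0.0.1", 0))
    client = socket.create_connection(server.getsockname())
    conn, peer = server.accept()
    try:
        return client.getsockname()[1], peer[1]   # same port, seen from both ends
    finally:
        for s in (conn, client, server):
            s.close()

if __name__ == "__main__":
    src_port, seen_by_server = ephemeral_port_demo()
    low, high = local_port_range()
    print(src_port, seen_by_server, low <= src_port <= high)
```

The port the server sees as the peer’s port is exactly the one a stateless NACL must allow for the return traffic.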
