# Network Troubleshooting

> Master the essential tools and techniques for diagnosing and resolving network issues

# Module 14: Network Troubleshooting

When networks fail, you need systematic approaches and the right tools to diagnose issues. The difference between a junior and senior engineer is not that the senior knows more tools -- it is that the senior follows a systematic process instead of randomly trying things. This module covers the essential troubleshooting toolkit and, more importantly, the mental framework for diagnosing network issues efficiently.

Think of network troubleshooting like debugging a plumbing problem. You do not start by ripping open walls -- you test one point at a time. Is the faucet open? Is there water pressure at the valve? Is the main line connected? Each test eliminates an entire category of problems. The OSI model gives you the same systematic approach for networks: work through the layers in order, either bottom-up (the default in this module) or top-down from the application, as in the faucet example.

<Frame>
  <img src="https://mintcdn.com/devweeekends/X0Fp4X8lMl-ZftoO/images/courses/networking-mastery/troubleshooting-flow.svg?fit=max&auto=format&n=X0Fp4X8lMl-ZftoO&q=85&s=2473f13c734f3fc0bbb201ce4c223923" alt="Network Troubleshooting Flowchart" width="1080" height="1080" data-path="images/courses/networking-mastery/troubleshooting-flow.svg" />
</Frame>

<Info>
  **Estimated Time**: 3-4 hours\
  **Difficulty**: Intermediate\
  **Prerequisites**: All previous modules
</Info>

***

## 14.1 The Troubleshooting Mindset

### The OSI Troubleshooting Approach

Start from the bottom and work up. This is the golden rule of network troubleshooting. You cannot have a working Layer 7 (HTTP) if Layer 3 (IP routing) is broken, and Layer 3 cannot work if Layer 1 (the physical cable) is unplugged. Always rule out lower layers first.

```
Layer 7: Application    "Is the application configured correctly?"
Layer 6: Presentation   "Is data being encrypted/decrypted properly?"
Layer 5: Session        "Is the session established?"
Layer 4: Transport      "Is the port open? Is TCP/UDP working?"
Layer 3: Network        "Can I reach the IP? Is routing correct?"
Layer 2: Data Link      "Is the MAC address reachable? VLAN correct?"
Layer 1: Physical       "Is the cable plugged in? Is there link light?"
```

**The most common mistake**: Jumping straight to Layer 7 ("maybe the application config is wrong") when the actual problem is at Layer 1 ("the cat chewed through the Ethernet cable"). Experienced engineers have learned this lesson the hard way, often after spending hours debugging application code when a quick `ping` would have revealed no network connectivity at all.

### Quick Diagnostic Checklist

<Steps>
  <Step title="Physical Layer">
    * Is the cable connected?
    * Is the link light on?
    * Is WiFi connected?
  </Step>

  <Step title="Network Layer">
    * Do I have an IP address?
    * Can I ping the gateway?
    * Can I ping external IPs?
  </Step>

  <Step title="DNS">
    * Can I resolve domain names?
    * Is DNS server reachable?
  </Step>

  <Step title="Application">
    * Is the port open?
    * Is the service running?
    * Are there firewall blocks?
  </Step>
</Steps>

***

## 14.2 Essential Network Tools

### ping - Test Basic Connectivity

The most basic tool. Tests if a host is reachable via ICMP.

```bash theme={null}
# Basic ping
ping google.com

# Specific count
ping -c 4 google.com       # Linux/Mac
ping -n 4 google.com       # Windows

# 100 pings, each line prefixed with a Unix timestamp (-D is a Linux iputils option)
ping -c 100 -D google.com
```

**Output Analysis:**

```
PING google.com (142.250.190.46): 56 data bytes
64 bytes from 142.250.190.46: icmp_seq=0 ttl=117 time=12.3 ms
64 bytes from 142.250.190.46: icmp_seq=1 ttl=117 time=11.8 ms
64 bytes from 142.250.190.46: icmp_seq=2 ttl=117 time=14.2 ms

--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.8/12.8/14.2/1.0 ms
```

| Metric      | Meaning                            |
| ----------- | ---------------------------------- |
| ttl         | Time To Live - decrements each hop |
| time        | Round-trip latency                 |
| packet loss | % of packets that didn't return    |

<Warning>
  **Ping can be misleading!** Some hosts block ICMP (ping), so no response does not always mean the host is down. AWS Security Groups, for example, do not allow inbound ICMP unless you add a rule for it. A server could be perfectly healthy and serving HTTP traffic while ignoring all pings. Always follow up with a port-specific test (`nc -zv host 443` or `curl`) before concluding a host is down.

  Conversely, a successful ping does not mean your application works. Ping only proves Layer 3 (IP) connectivity. The application (Layer 7) could be crashed while the OS still responds to pings.
</Warning>
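
The summary line and the `ttl` field in ping output can be turned into numbers for scripts. The sketch below is illustrative: `ping_loss` and `estimate_hops` are hypothetical helper names, and the hop estimate assumes the remote host started from one of the common initial TTL values (64, 128, or 255), which is a heuristic rather than a guarantee.

```bash
# Read ping output on stdin and print the packet-loss percentage
# from the summary line (works for "0%" and "0.0%" styles).
ping_loss() {
  sed -n 's/.* \([0-9.][0-9.]*\)% packet loss.*/\1/p'
}

# Guess how many hops a reply crossed, given the TTL shown in the reply.
# Assumption: the sender started at 64, 128, or 255 (common defaults).
estimate_hops() {
  ttl="$1"
  for initial in 64 128 255; do
    if [ "$ttl" -le "$initial" ]; then
      echo $((initial - ttl))
      return 0
    fi
  done
  return 1
}
```

Typical use would be `ping -c 100 target | ping_loss` and `estimate_hops 117`. For the sample reply above with `ttl=117`, the estimate is 11 hops if the sender started at 128.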

***

### traceroute / tracert - Trace the Path

Shows every hop between you and the destination.

```bash theme={null}
# Linux/Mac
traceroute google.com

# Windows
tracert google.com

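# Use TCP SYN probes instead of the defaults (Linux traceroute sends UDP, Windows tracert sends ICMP).
# Helps when firewalls drop the default probes but allow port 443. Linux only; usually requires root.
sudo traceroute -T -p 443 google.com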
```

**Output:**

```
traceroute to google.com (142.250.190.46), 30 hops max
 1  192.168.1.1 (192.168.1.1)  1.234 ms  1.123 ms  1.456 ms
 2  10.0.0.1 (10.0.0.1)  5.678 ms  5.432 ms  5.789 ms
 3  * * *                                             ← No response
 4  72.14.215.85 (72.14.215.85)  10.123 ms  9.876 ms
 5  142.250.190.46 (142.250.190.46)  12.345 ms  11.987 ms
```

**Interpreting Results:**

| Pattern                              | Meaning                                                               |
| ------------------------------------ | --------------------------------------------------------------------- |
| `* * *`                              | Hop doesn't respond to probes (firewall/ICMP blocked)                 |
| High latency at one hop only         | Usually that router deprioritizing probe replies, not real congestion |
| Gradually increasing latency         | Normal - each hop adds time                                           |
| Sudden increase that persists onward | Potential bottleneck at or just before that hop                       |

<Tip>
  **Reading traceroute like a pro**: A single `* * *` hop in the middle is usually harmless -- many routers are configured not to respond to traceroute probes. But if every hop after a certain point shows `* * *` or the trace never completes, that is where the path is broken. Also, do not panic about high latency on one intermediate hop -- routers deprioritize ICMP responses, so the displayed time for that hop may look bad while actual forwarding is fine. What matters is whether latency increases *and stays high* for all subsequent hops.
</Tip>
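
The "increases and stays high" rule can be written down as a small filter. This is a minimal sketch, not a parser for any particular traceroute format: `sustained_latency_start` is a hypothetical helper that expects one `hop latency_ms` pair per line on stdin, and the threshold is whatever you consider "high" for your path.

```bash
# Read "hop latency_ms" pairs on stdin. Print the first hop from which
# latency stays at or above the threshold through the final hop.
# Prints nothing if the last hop is below the threshold.
sustained_latency_start() {
  awk -v limit="$1" '
    $2 + 0 >= limit + 0 { if (start == "") start = $1; next }
    { start = "" }   # latency dropped again: the earlier spike was not sustained
    END { if (start != "") print start }
  '
}
```

A spike at one hop that recovers on the next is ignored, which matches the advice above about routers that answer probes slowly while forwarding normally.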

***

### netstat / ss - View Network Connections

See what's connected to your machine.

```bash theme={null}
# Linux (ss is the modern replacement for netstat)
ss -tuln                    # TCP/UDP listening ports
ss -tunp                    # Established TCP/UDP connections with process names (sudo shows other users' processes)
ss -s                       # Summary statistics

# Windows
netstat -an                 # All connections, numeric
netstat -ano                # Include process IDs
netstat -b                  # Show executable names (admin required)

# Mac
netstat -an | grep LISTEN   # Listening ports
lsof -i -P                  # Better alternative
```

**Common Flags:**

| Flag | Meaning                       |
| ---- | ----------------------------- |
| -t   | TCP connections               |
| -u   | UDP connections               |
| -l   | Listening sockets only        |
| -n   | Numeric (don't resolve names) |
| -p   | Show process                  |

**Output Example:**

```
State    Recv-Q Send-Q  Local Address:Port   Peer Address:Port
LISTEN   0      128     0.0.0.0:22            0.0.0.0:*         ← SSH listening
LISTEN   0      128     0.0.0.0:80            0.0.0.0:*         ← HTTP listening
ESTAB    0      0       192.168.1.10:52431   93.184.216.34:443 ← Active HTTPS
```

***

### nslookup / dig - DNS Queries

Query DNS servers directly.

```bash theme={null}
# nslookup (simple, cross-platform)
nslookup google.com
nslookup google.com 8.8.8.8           # Use specific DNS server
nslookup -type=MX google.com          # Query MX records

# dig (more detailed, Linux/Mac)
dig google.com
dig google.com MX                      # MX records
dig @8.8.8.8 google.com               # Use specific server
dig +trace google.com                  # Show full resolution path
dig +short google.com                  # Just the IP
```

**dig Output Explained:**

```bash theme={null}
$ dig example.com

;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            3600    IN      A      93.184.216.34
                        ↑                         ↑
                       TTL                      Answer

;; Query time: 25 msec
;; SERVER: 8.8.8.8#53
```
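
When you only care about the answer records and their TTLs, `dig +noall +answer` prints just the answer section, which is easy to post-process. The helper below is a sketch under that assumption; `summarize_answers` is a hypothetical name, and it expects the five-column layout shown above (name, TTL, class, type, value).

```bash
# Read dig answer lines on stdin and print "TYPE value (TTL Ns)" per record.
# Comment lines (starting with ";") are skipped because their second
# column is not a number.
summarize_answers() {
  awk '$3 == "IN" && $2 ~ /^[0-9]+$/ {
    value = $5
    for (i = 6; i <= NF; i++) value = value " " $i   # MX and TXT values span several fields
    printf "%s %s (TTL %ss)\n", $4, value, $2
  }'
}
```

Typical use would be `dig +noall +answer example.com | summarize_answers`. A low remaining TTL tells you the resolver's cached copy is about to expire, which is useful when you are waiting for a DNS change to propagate.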

***

### curl / wget - Test HTTP(S)

Make HTTP requests from command line.

```bash theme={null}
# Basic GET request
curl https://api.example.com/health

# Verbose (see headers, SSL handshake)
curl -v https://example.com

# Only headers
curl -I https://example.com

# Follow redirects
curl -L https://example.com

# Timing breakdown
curl -w "@curl-format.txt" -o /dev/null -s https://example.com

# POST with data
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" https://api.example.com
```

**Timing Format File (curl-format.txt):**

```
time_namelookup:  %{time_namelookup}s\n
time_connect:     %{time_connect}s\n
time_appconnect:  %{time_appconnect}s\n
time_pretransfer: %{time_pretransfer}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:       %{time_total}s\n
```
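
The values curl reports are cumulative: each one is measured from the start of the request, so the time spent in a single phase is the difference between two neighbouring checkpoints. The sketch below does that subtraction on the output of the format file above; `curl_phases` is a hypothetical helper name, and the phase labels are my own.

```bash
# Read the "name: <seconds>s" lines produced by curl-format.txt on stdin
# and print the time spent in each phase.
curl_phases() {
  awk -F': *' '
    { v = $2; sub(/s$/, "", v); t[$1] = v + 0 }
    END {
      dns = t["time_namelookup"]
      tcp = t["time_connect"] - t["time_namelookup"]
      # time_appconnect is 0 for plain HTTP (no TLS handshake).
      tls = (t["time_appconnect"] > 0) ? t["time_appconnect"] - t["time_connect"] : 0
      server = t["time_starttransfer"] - t["time_pretransfer"]
      transfer = t["time_total"] - t["time_starttransfer"]
      printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f transfer=%.3f\n", dns, tcp, tls, server, transfer
    }'
}
```

Typical use would be `curl -w "@curl-format.txt" -o /dev/null -s https://example.com | curl_phases`. A large `dns` value points at the resolver, a large `tcp` or `tls` value at the network path, and a large `server` value at the application behind the port.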

***

### telnet / nc (netcat) - Test Port Connectivity

Check if a port is open and accepting connections.

```bash theme={null}
# telnet (basic)
telnet google.com 443
telnet mail.example.com 25

# netcat (more powerful)
nc -zv google.com 443           # Test if port is open
nc -zv google.com 80-443        # Scan port range
nc -l 8080                       # Listen on port (traditional netcat needs: nc -l -p 8080)

# Test with timeout
nc -zv -w 5 google.com 443      # 5 second timeout
```

**PowerShell Alternative (Windows):**

```powershell theme={null}
Test-NetConnection google.com -Port 443
Test-NetConnection -ComputerName google.com -Port 443 -InformationLevel Detailed
```

***

### tcpdump / Wireshark - Packet Capture

See exactly what's happening on the network.

```bash theme={null}
# tcpdump (command line)
sudo tcpdump -i eth0                          # All traffic on eth0
sudo tcpdump -i eth0 port 80                  # Only port 80
sudo tcpdump -i eth0 host 192.168.1.10       # Only this host
sudo tcpdump -i eth0 -w capture.pcap          # Save to file
sudo tcpdump -i eth0 -c 100                   # Capture 100 packets

# Common filters
sudo tcpdump 'tcp port 443'                   # HTTPS traffic
sudo tcpdump 'icmp'                           # Ping traffic
sudo tcpdump 'src 192.168.1.10'              # From specific IP
sudo tcpdump 'dst port 53'                    # DNS queries
```

**Wireshark:** GUI alternative with powerful analysis. Open `.pcap` files from tcpdump. Wireshark is indispensable for deep debugging -- you can see every packet, decode protocol headers, follow TCP streams, and filter by any field. If `ping` and `curl` tell you "something is wrong," Wireshark tells you exactly **what** is wrong at the packet level.

<Tip>
  **When to use tcpdump vs Wireshark**: Use `tcpdump` on remote servers where you only have CLI access. Capture to a `.pcap` file and then download it for analysis in Wireshark on your local machine. Do not try to run Wireshark directly on a production server -- it requires a GUI and consumes significant resources.
</Tip>

***

### mtr - Combined ping + traceroute

Continuous traceroute with statistics.

```bash theme={null}
mtr google.com
mtr --report google.com        # Generate report and exit
mtr --tcp --port 443 google.com # Use TCP
```

**Output:**

```
                             Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 192.168.1.1               0.0%    10    1.2   1.3   1.0   2.1   0.3
 2. 10.0.0.1                  0.0%    10    5.4   5.2   4.8   6.1   0.4
 3. 72.14.215.85              0.0%    10   10.2  10.5   9.8  12.3   0.8
 4. 142.250.190.46            0.0%    10   12.1  12.3  11.5  14.2   0.9
```

***

## 14.3 Common Issues and Solutions

### Issue: "Cannot Reach Website"

<Accordion title="Diagnostic Steps">
  ```bash theme={null}
  # 1. Check if you have network
  ping 8.8.8.8

  # 2. Check DNS resolution
  nslookup example.com

  # 3. Check if website responds
  curl -I https://example.com

  # 4. Check your routing
  traceroute example.com

  # 5. Check if port is blocked locally
  sudo iptables -L -n  # Linux
  netsh advfirewall show allprofiles  # Windows
  ```
</Accordion>

### Issue: "Connection Refused"

<Accordion title="Diagnostic Steps">
  ```bash theme={null}
  # Service is not running or not listening on that port

  # 1. Check if service is running
  systemctl status nginx
  ps aux | grep nginx

  # 2. Check what's listening
  ss -tuln | grep ':80 '   # trailing space avoids matching 8080, 8000, etc.

  # 3. Check if it's bound to the right interface
  # 0.0.0.0:80 = all interfaces
  # 127.0.0.1:80 = localhost only
  ```
</Accordion>
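
The interface check in step 3 can be automated by reading the local address column of `ss` output. This is a minimal sketch: `bind_scope` is a hypothetical helper, it only looks at TCP sockets in the `LISTEN` state, and it assumes the bracketed IPv6 notation that `ss` prints (`[::]`, `[::1]`).

```bash
# Read `ss -tln` output on stdin and report how the given port is bound.
bind_scope() {
  awk -v port="$1" '
    /LISTEN/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ (":" port "$")) {           # first match is the local address column
          addr = $i
          sub(":" port "$", "", addr)
          if (addr == "0.0.0.0" || addr == "[::]" || addr == "*") all = 1
          else if (addr ~ /^127\./ || addr == "[::1]") loopback = 1
          else other = 1
          break
        }
      }
    }
    END {
      if (all) print "all-interfaces"
      else if (other) print "specific-address"
      else if (loopback) print "localhost-only"
      else print "not-listening"
    }'
}
```

Typical use would be `ss -tln | bind_scope 80`. A result of `localhost-only` explains a "connection refused" from another machine even though the service is running.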

### Issue: "Connection Timeout"

<Accordion title="Diagnostic Steps">
  ```bash theme={null}
  # Either host is unreachable or firewall is dropping packets

  # 1. Check basic connectivity
  ping target-host

  # 2. Check specific port
  nc -zv target-host 443

  # 3. Check route
  traceroute target-host

  # 4. Check firewall
  # On target: sudo iptables -L -n
  # On cloud: Check Security Groups, NACLs
  ```
</Accordion>

### Issue: "Slow Network"

<Accordion title="Diagnostic Steps">
  ```bash theme={null}
  # 1. Check latency
  ping -c 10 target

  # 2. Check path for bottlenecks
  mtr target

  # 3. Check bandwidth
  iperf3 -c target  # Requires an iperf3 server (iperf3 -s) running on target

  # 4. Check for packet loss
  ping -c 100 target | tail -2

  # 5. Check for DNS slowness
  time nslookup example.com
  ```
</Accordion>

### Issue: "Intermittent Connectivity"

<Accordion title="Diagnostic Steps">
  ```bash theme={null}
  # 1. Long-running ping to detect drops
  ping -c 1000 target

  # 2. Continuous mtr
  mtr target

  # 3. Check interface errors
  ip -s link show eth0
  # Look for: RX errors, TX errors, dropped

  # 4. Check system logs
  dmesg | grep -iE 'eth0|link'   # kernel link up/down messages name the interface
  journalctl -u NetworkManager
  ```
</Accordion>
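
Scrolling through a thousand ping replies to spot drops is error-prone. One way to find them, sketched below, is to look for gaps in the `icmp_seq` numbers. `missing_seqs` is a hypothetical helper; it assumes replies arrive in order and it cannot see drops after the last reply received.

```bash
# Read ping output on stdin and print every sequence number that is
# missing between the first and last reply seen.
missing_seqs() {
  awk '
    match($0, /icmp_seq=[0-9]+/) {
      seq = substr($0, RSTART + 9, RLENGTH - 9) + 0   # skip the 9 chars of "icmp_seq="
      if (seen) for (s = prev + 1; s < seq; s++) print s
      prev = seq
      seen = 1
    }'
}
```

Typical use would be `ping -c 1000 target | missing_seqs`. If you also ran ping with timestamps, the missing numbers tell you where in the log to look for the outage window.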

***

## 14.4 Network Diagnostic Flowchart

```
Start: "Can't reach X"
           │
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check physical connection
    │   gateway?   │        Check IP configuration
    └──────┬───────┘        (ip addr / ipconfig)
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check firewall
    │   8.8.8.8?   │        Check routing (ip route)
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you      │──No──► Check DNS settings
    │ nslookup X?  │        Try different DNS server
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you curl │──No──► Check if service is running
    │   X:port?    │        Check target firewall
    └──────┬───────┘        Check nc -zv X port
           │ Yes
           ▼
    Connection works!
    Issue might be application-level
```
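
The flowchart translates almost directly into a script that stops at the first failing check. The following is a rough sketch, not a finished tool: all function names are hypothetical, the gateway lookup assumes Linux `ip route`, `ping -W 2` assumes the Linux meaning of `-W` (seconds), and the service check assumes HTTPS on the default port.

```bash
# Each check is its own function so it can be replaced or stubbed.
check_gateway() {
  gw=$(ip route 2>/dev/null | awk '/^default/ { print $3; exit }')
  [ -n "$gw" ] && ping -c 1 -W 2 "$gw" >/dev/null 2>&1
}
check_internet() { ping -c 1 -W 2 8.8.8.8 >/dev/null 2>&1; }
check_dns()      { nslookup "$1" >/dev/null 2>&1; }
check_service()  { curl -s -o /dev/null --max-time 5 "https://$1"; }

# Walk the flowchart top to bottom; the return code identifies the failed step.
diagnose() {
  host="$1"
  check_gateway         || { echo "gateway unreachable: check physical connection and IP configuration"; return 1; }
  check_internet        || { echo "no route to internet: check firewall and routing"; return 2; }
  check_dns "$host"     || { echo "DNS failure: check DNS settings or try a different DNS server"; return 3; }
  check_service "$host" || { echo "service unreachable: check service status, target firewall, and port"; return 4; }
  echo "connection works: issue is likely application-level"
}
```

Typical use would be `diagnose example.com`. Keeping the checks in separate functions makes it easy to swap in platform-specific commands, for example a different gateway lookup on macOS.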

***

## 14.5 Reading Log Files

### Common Log Locations

| System            | Location                                                         |
| ----------------- | ---------------------------------------------------------------- |
| Linux syslog      | /var/log/syslog (Debian/Ubuntu), /var/log/messages (RHEL/CentOS) |
| Network Manager   | journalctl -u NetworkManager                                     |
| Nginx             | /var/log/nginx/access.log, /var/log/nginx/error.log              |
| Apache            | /var/log/apache2/ or /var/log/httpd/                             |
| AWS VPC Flow Logs | CloudWatch Logs or S3, depending on the configured destination   |

### Useful Log Commands

```bash theme={null}
# Follow log in real-time
tail -f /var/log/syslog

# Search for errors
grep -i error /var/log/syslog

# Last 100 lines
tail -n 100 /var/log/nginx/error.log

# Filter by time (journalctl)
journalctl --since "1 hour ago"
journalctl --since "2024-01-01 12:00:00"
```
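
For web server access logs, a quick count of response codes often shows whether clients are reaching the server at all and what it is answering. The sketch below assumes the status code is the ninth whitespace-separated field, as in the common "combined" access log format; `status_counts` is a hypothetical helper, and a custom log format would need a different field number.

```bash
# Read access-log lines on stdin; print "count status", most frequent first.
status_counts() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ { count[$9]++ }
       END { for (s in count) print count[s], s }' | sort -rn
}
```

Typical use would be `tail -n 1000 /var/log/nginx/access.log | status_counts`. No entries at all for a failing client suggests the request never arrived, which points back at the network rather than the application.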

***

## 14.6 Cloud-Specific Troubleshooting

### AWS Troubleshooting Checklist

```
□ Security Group allows traffic (inbound rules)
□ NACL allows traffic (both inbound AND outbound)
□ Route table has correct routes
□ Internet Gateway attached (for public subnets)
□ NAT Gateway configured (for private subnets)
□ Elastic IP associated (if needed)
□ VPC Flow Logs enabled for debugging
```

### VPC Flow Log Analysis

```
2 123456789012 eni-abc123 10.0.1.10 10.0.2.20 443 49152 6 10 840 1234567890 1234567899 ACCEPT OK
│ │            │          │          │         │   │    │ │  │   │          │          │      │
│ │            │          │          │         │   │    │ │  │   │          │          │      └─ Log status
│ │            │          │          │         │   │    │ │  │   │          │          └─ Action
│ │            │          │          │         │   │    │ │  │   │          └─ End time
│ │            │          │          │         │   │    │ │  │   └─ Start time
│ │            │          │          │         │   │    │ │  └─ Bytes
│ │            │          │          │         │   │    │ └─ Packets
│ │            │          │          │         │   │    └─ Protocol (6=TCP)
│ │            │          │          │         │   └─ Dest port
│ │            │          │          │         └─ Source port
│ │            │          │          └─ Dest IP
│ │            │          └─ Source IP
│ │            └─ Network interface
│ └─ Account ID
└─ Version
```
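
With the field order above, rejected traffic can be pulled out of exported flow log records with a one-line filter. This is a sketch that assumes the default record format shown in the diagram, where the action is field 13; `rejected_flows` is a hypothetical helper, and a custom flow log format would change the field positions.

```bash
# Read default-format flow log records on stdin and print one line per
# rejected flow: "source -> destination:port proto=N".
rejected_flows() {
  awk '$13 == "REJECT" { printf "%s -> %s:%s proto=%s\n", $4, $5, $7, $8 }'
}
```

Typical use would be piping a downloaded log file through it, for example `rejected_flows < flow-logs.txt`, where the file name is only a placeholder. A `REJECT` for the port you are testing points at a Security Group or NACL rule rather than at the application.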

***

## 14.7 Key Takeaways

<CardGroup cols={2}>
  <Card title="Start at Layer 1" icon="plug">
    Always check physical connectivity first. Many "network issues" are unplugged cables.
  </Card>

  <Card title="Ping isn't Everything" icon="xmark">
    ICMP can be blocked. Use nc/telnet to test specific ports.
  </Card>

  <Card title="Know Your Tools" icon="toolbox">
    ping, traceroute, dig, curl, netstat, tcpdump - master these.
  </Card>

  <Card title="Check Logs" icon="file-lines">
    Logs often have the answer. Know where to find them.
  </Card>
</CardGroup>

***

## Next Module

<Card title="Module 15: VPNs & Tunneling" icon="arrow-right" href="/courses/networking-mastery/15-vpn-tunneling">
  Understand VPN technologies, tunneling protocols, and secure remote access.
</Card>
