Module 14: Network Troubleshooting

When networks fail, you need systematic approaches and the right tools to diagnose issues. This module covers the essential troubleshooting toolkit every engineer should master.

Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: All previous modules

14.1 The Troubleshooting Mindset

The OSI Troubleshooting Approach

Start at the bottom (Layer 1) and work up. The layers are listed here from top to bottom:

Layer 7: Application    "Is the application configured correctly?"
Layer 6: Presentation   "Is data being encoded/decoded and encrypted/decrypted properly?"
Layer 5: Session        "Is the session established?"
Layer 4: Transport      "Is the port open? Is TCP/UDP working?"
Layer 3: Network        "Can I reach the IP? Is routing correct?"
Layer 2: Data Link      "Is the MAC address reachable?"
Layer 1: Physical       "Is the cable plugged in? Is there power?"

Quick Diagnostic Checklist

Physical Layer

Is the cable connected?
Is the link light on?
Is WiFi connected?

Network Layer

Do I have an IP address?
Can I ping the gateway?
Can I ping external IPs?

DNS

Can I resolve domain names?
Is the DNS server reachable?

Application

Is the port open?
Is the service running?
Are there firewall blocks?

14.2 Essential Network Tools

ping - Test Basic Connectivity

The most basic tool. Tests if a host is reachable via ICMP.

# Basic ping
ping google.com

# Specific count
ping -c 4 google.com       # Linux/Mac
ping -n 4 google.com       # Windows

# 100 pings with a Unix timestamp on each line (Linux; on macOS -D sets the Don't Fragment bit instead)
ping -c 100 -D google.com

Output Analysis:

PING google.com (142.250.190.46): 56 data bytes
64 bytes from 142.250.190.46: icmp_seq=0 ttl=117 time=12.3 ms
64 bytes from 142.250.190.46: icmp_seq=1 ttl=117 time=11.8 ms
64 bytes from 142.250.190.46: icmp_seq=2 ttl=117 time=14.2 ms

--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.8/12.8/14.2/1.0 ms

Metric	Meaning
ttl	Time To Live remaining on the reply; each router along the path decrements it by 1
time	Round-trip latency in milliseconds
packet loss	Percentage of echo requests that got no reply

Ping can be misleading! Some hosts block ICMP (ping), so no response doesn’t always mean the host is down. The service might still be running.
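
When a long ping run scrolls past, the two summary lines are what matter. A minimal sketch of pulling packet loss and average round-trip time out of ping output, assuming the Linux or macOS summary layout shown above (the function name ping_summary is hypothetical, not a standard tool):

```shell
# ping_summary: read ping output on stdin and print packet loss and average RTT.
# Handles the two common summary layouts:
#   "rtt min/avg/max/mdev = ..."            (Linux iputils)
#   "round-trip min/avg/max/stddev = ..."   (macOS/BSD)
ping_summary() {
    awk '
        /packet loss/ {
            # Find the field that ends in "%" (e.g. "0.0%" or "0%").
            for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i
        }
        /min\/avg\/max/ {
            # Text after " = " is min/avg/max/dev; avg is the second value.
            split($0, halves, " = ")
            split(halves[2], stats, "/")
            avg = stats[2]
        }
        END {
            if (loss == "") loss = "unknown"
            if (avg == "")  avg  = "unknown"
            printf "loss=%s avg_ms=%s\n", loss, avg
        }
    '
}

# Usage (needs network access):
#   ping -c 4 google.com | ping_summary
```

If the host blocks ICMP, the summary reports 100% loss and no round-trip line, so the sketch prints avg_ms=unknown rather than guessing.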

traceroute / tracert - Trace the Path

Shows every hop between you and the destination.

# Linux/Mac
traceroute google.com

# Windows
tracert google.com

# Using TCP SYN probes instead of the default UDP probes (Linux; needs root; gets past filters that drop UDP/ICMP probes)
sudo traceroute -T -p 443 google.com

Output:

traceroute to google.com (142.250.190.46), 30 hops max
 1  192.168.1.1 (192.168.1.1)  1.234 ms  1.123 ms  1.456 ms
 2  10.0.0.1 (10.0.0.1)  5.678 ms  5.432 ms  5.789 ms
 3  * * *                                             ← No response
 4  72.14.215.85 (72.14.215.85)  10.123 ms  9.876 ms
 5  142.250.190.46 (142.250.190.46)  12.345 ms  11.987 ms

Interpreting Results:

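Pattern	Meaning
`* * *`	Hop doesn’t respond to probes (filtered or rate-limited); not a problem if later hops respond
High latency at one hop only	Usually that router deprioritizing its own replies, not congestion on the path
Gradually increasing latency	Normal - each hop adds time
Sudden increase that persists through later hops	Potential bottleneck or a long-distance link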
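
Spotting a sudden increase by eye gets harder on long paths. A rough sketch that flags hops whose first round-trip time jumps well above the previous responding hop; the function name latency_jumps, the THRESHOLD_MS variable, and its 50 ms default are all assumptions chosen for illustration:

```shell
# latency_jumps: read traceroute output on stdin and print each hop whose first
# RTT is more than THRESHOLD_MS (default 50) above the previous responding hop.
latency_jumps() {
    awk -v limit="${THRESHOLD_MS:-50}" '
        {
            rtt = ""
            # The first RTT is the value right before the first "ms" token.
            for (i = 2; i <= NF; i++) if ($i == "ms") { rtt = $(i - 1); break }
            if (rtt == "") next              # header line or "* * *" hop
            # Skip the leading hop number if the line has one.
            host = ($1 ~ /^[0-9]+$/) ? $2 : $1
            if (seen && rtt - prev > limit)
                printf "%s +%.1f ms\n", host, rtt - prev
            prev = rtt; seen = 1
        }
    '
}

# Usage (needs network access):
#   traceroute google.com | latency_jumps
#   traceroute google.com | THRESHOLD_MS=20 latency_jumps
```

A flagged hop is only a starting point: if the later hops drop back to normal latency, the jump is most likely that router answering probes slowly rather than a real bottleneck.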

netstat / ss - View Network Connections

See what’s connected to your machine.

# Linux (ss is modern replacement for netstat)
ss -tuln                    # TCP/UDP listening ports
ss -tunp                    # Include process names
ss -s                       # Summary statistics

# Windows
netstat -an                 # All connections, numeric
netstat -ano                # Include process IDs
netstat -b                  # Show executable names (admin required)

# Mac
netstat -an | grep LISTEN   # Listening ports
lsof -i -P -n               # Open sockets with process names (numeric ports and addresses)

Common Flags (ss and Linux netstat):

Flag	Meaning
-t	TCP connections
-u	UDP connections
-l	Listening sockets only
-n	Numeric (don’t resolve names)
-p	Show process

Output Example:

State    Recv-Q Send-Q  Local Address:Port   Peer Address:Port
LISTEN   0      128     0.0.0.0:22            0.0.0.0:*         ← SSH listening
LISTEN   0      128     0.0.0.0:80            0.0.0.0:*         ← HTTP listening
ESTAB    0      0       192.168.1.10:52431   93.184.216.34:443 ← Active HTTPS
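
On a busy host the listening sockets get buried among established connections. A small sketch that reduces ss-style output to the list of listening ports, assuming the column layout shown above (listening_ports is a hypothetical helper name). It only matches the LISTEN state, so UDP sockets, which ss reports as UNCONN, are not included:

```shell
# listening_ports: read `ss -tuln`-style output on stdin and print the
# local port of every socket in the LISTEN state, one per line, sorted.
listening_ports() {
    awk '
        {
            for (i = 1; i <= NF; i++) if ($i == "LISTEN") {
                # Local Address:Port is the third field after State.
                addr = $(i + 3)
                # The port is whatever follows the last colon (works for [::]:22 too).
                n = split(addr, parts, ":")
                print parts[n]
            }
        }
    ' | sort -n -u
}

# Usage:
#   ss -tuln | listening_ports
```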

nslookup / dig - DNS Queries

Query DNS servers directly.

# nslookup (simple, cross-platform)
nslookup google.com
nslookup google.com 8.8.8.8           # Use specific DNS server
nslookup -type=MX google.com          # Query MX records

# dig (more detailed, Linux/Mac)
dig google.com
dig google.com MX                      # MX records
dig @8.8.8.8 google.com               # Use specific server
dig +trace google.com                  # Show full resolution path
dig +short google.com                  # Just the answer (here, the IP address)

dig Output Explained:

$ dig example.com

;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            3600    IN      A      93.184.216.34
                        ↑                         ↑
                       TTL                      Answer

;; Query time: 25 msec
;; SERVER: 8.8.8.8#53
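
For scripting, the TTL and the answer are usually the only parts of dig output you need. A minimal sketch that extracts them from the answer section, assuming the standard dig layout shown above (dig_answers is a hypothetical helper name):

```shell
# dig_answers: read dig output on stdin and print "TTL TYPE VALUE" for each
# record in the answer section. Lines starting with ";" are comments/headers.
dig_answers() {
    awk '
        /^;; ANSWER SECTION:/ { in_answer = 1; next }
        /^$/ || /^;/          { in_answer = 0 }
        in_answer && NF >= 5 {
            # Fields: name, TTL, class, type, then the record data.
            value = $5
            for (i = 6; i <= NF; i++) value = value " " $i
            print $2, $4, value
        }
    '
}

# Usage (needs network access):
#   dig example.com | dig_answers
```

A TTL that is lower on each repeated query means the answer is coming from a resolver cache; querying the authoritative server directly returns the full TTL.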

curl / wget - Test HTTP(S)

Make HTTP requests from command line.

# Basic GET request
curl https://api.example.com/health

# Verbose (see headers, SSL handshake)
curl -v https://example.com

# Only headers
curl -I https://example.com

# Follow redirects
curl -L https://example.com

# Timing breakdown
curl -w "@curl-format.txt" -o /dev/null -s https://example.com

# POST with data
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" https://api.example.com

Timing Format File (curl-format.txt):

time_namelookup:  %{time_namelookup}s\n
time_connect:     %{time_connect}s\n
time_appconnect:  %{time_appconnect}s\n
time_pretransfer: %{time_pretransfer}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:       %{time_total}s\n
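
The timers curl reports are cumulative from the start of the request, so the time spent in each phase is the difference between neighbouring values. A sketch of that subtraction, assuming the six timers from the format file above are passed as arguments in order (curl_phases is a hypothetical helper name):

```shell
# curl_phases: turn curl's cumulative timers into per-phase durations.
# Arguments (seconds, in order): time_namelookup time_connect time_appconnect
#                                time_pretransfer time_starttransfer time_total
curl_phases() {
    awk -v dns="$1" -v conn="$2" -v tls="$3" -v pre="$4" -v start="$5" -v total="$6" '
        BEGIN {
            # time_appconnect is 0 for plain HTTP, so there is no TLS phase.
            tls_phase = (tls > 0) ? tls - conn : 0
            printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f transfer=%.3f\n",
                dns, conn - dns, tls_phase, start - pre, total - start
        }
    '
}

# Usage (needs network access; the timers are printed space-separated on one line):
#   curl_phases $(curl -s -o /dev/null -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_starttransfer} %{time_total}' https://example.com)
```

A large dns value points at the resolver, a large tcp or tls value at the network path or handshake, and a large wait value at the server taking time to produce the first byte.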

telnet / nc (netcat) - Test Port Connectivity

Check if a port is open and accepting connections.

# telnet (basic)
telnet google.com 443
telnet mail.example.com 25

# netcat (more powerful)
nc -zv google.com 443           # Test if port is open
nc -zv google.com 80-443        # Scan port range
nc -l 8080                       # Listen on a port (traditional netcat needs: nc -l -p 8080)

# Test with timeout
nc -zv -w 5 google.com 443      # 5 second timeout

PowerShell Alternative (Windows):

Test-NetConnection google.com -Port 443
Test-NetConnection -ComputerName google.com -Port 443 -InformationLevel Detailed

tcpdump / Wireshark - Packet Capture

See exactly what’s happening on the network.

# tcpdump (command line)
sudo tcpdump -i eth0                          # All traffic on eth0
sudo tcpdump -i eth0 port 80                  # Only port 80
sudo tcpdump -i eth0 host 192.168.1.10       # Only this host
sudo tcpdump -i eth0 -w capture.pcap          # Save to file
sudo tcpdump -i eth0 -c 100                   # Capture 100 packets

# Common filters
sudo tcpdump 'tcp port 443'                   # HTTPS traffic
sudo tcpdump 'icmp'                           # Ping traffic
sudo tcpdump 'src 192.168.1.10'              # From specific IP
sudo tcpdump 'dst port 53'                    # DNS queries

Wireshark: GUI alternative with powerful analysis. Open .pcap files from tcpdump.

mtr - Combined ping + traceroute

Continuous traceroute with statistics.

mtr google.com
mtr --report google.com        # Generate report and exit
mtr --tcp --port 443 google.com # Use TCP

Output:

Host               Loss%   Snt   Last   Avg  Best  Wrst StDev
192.168.1.1         0.0%    10    1.2   1.3   1.0   2.1   0.3
10.0.0.1            0.0%    10    5.4   5.2   4.8   6.1   0.4
72.14.215.85        0.0%    10   10.2  10.5   9.8  12.3   0.8
142.250.190.46      0.0%    10   12.1  12.3  11.5  14.2   0.9
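
Loss reported at a middle hop matters mainly when it is still there at the destination; otherwise that router is probably just rate-limiting its replies. A sketch that lists lossy hops and says whether the loss persists to the final hop, assuming rows shaped like the report above (mtr_loss is a hypothetical helper name):

```shell
# mtr_loss: read mtr report rows on stdin and print every hop with loss,
# marking whether loss is also present at the final hop.
mtr_loss() {
    awk '
        {
            # The Loss% column is the only field of the form "<number>%";
            # the host is the field just before it.
            for (i = 2; i <= NF; i++) if ($i ~ /^[0-9.]+%$/) {
                n++
                host[n] = $(i - 1)
                loss[n] = $i + 0      # "12.5%" converts to the number 12.5
            }
        }
        END {
            for (h = 1; h <= n; h++) if (loss[h] > 0) {
                note = (loss[n] > 0) ? "persists to destination" : "not seen at destination"
                printf "%s %.1f%% (%s)\n", host[h], loss[h], note
            }
        }
    '
}

# Usage (needs network access):
#   mtr --report google.com | mtr_loss
```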

14.3 Common Issues and Solutions

Issue: “Cannot Reach Website”

Diagnostic Steps

# 1. Check if you have network
ping 8.8.8.8

# 2. Check DNS resolution
nslookup example.com

# 3. Check if website responds
curl -I https://example.com

# 4. Check your routing
traceroute example.com

# 5. Check if port is blocked locally
sudo iptables -L -n  # Linux
netsh advfirewall show allprofiles  # Windows

Issue: “Connection Refused”

Diagnostic Steps

# The host is reachable but actively rejected the connection:
# usually the service is not running or not listening on that port

# 1. Check if service is running
systemctl status nginx
ps aux | grep nginx

# 2. Check what's listening
ss -tuln | grep ':80 '   # the colon and trailing space keep 8080, 8000, etc. from matching

# 3. Check if it's bound to the right interface
# 0.0.0.0:80 = all interfaces
# 127.0.0.1:80 = localhost only
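
The interface check in step 3 can be sketched as a small classifier for the Local Address:Port column of ss. The function name bind_scope and the wording of its three answers are assumptions for illustration:

```shell
# bind_scope: classify a Local Address:Port value taken from `ss -tuln`.
bind_scope() {
    case "$1" in
        # Wildcard addresses: IPv4 any, ss shorthand "*", IPv6 any.
        0.0.0.0:*|'*':*|'[::]':*) echo "all interfaces" ;;
        # Loopback: reachable only from the machine itself.
        127.*:*|'[::1]':*)        echo "localhost only" ;;
        # Anything else is bound to one specific address.
        *)                        echo "specific address" ;;
    esac
}

# Usage:
#   bind_scope 127.0.0.1:80
```

A service that answers "localhost only" explains a refused connection from another machine even though the service is running.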

Issue: “Connection Timeout”

Diagnostic Steps

# Either host is unreachable or firewall is dropping packets

# 1. Check basic connectivity
ping target-host

# 2. Check specific port
nc -zv target-host 443

# 3. Check route
traceroute target-host

# 4. Check firewall
# On target: sudo iptables -L -n
# On cloud: Check Security Groups, NACLs

Issue: “Slow Network”

Diagnostic Steps

# 1. Check latency
ping -c 10 target

# 2. Check path for bottlenecks
mtr target

# 3. Check bandwidth
iperf3 -c target  # Requires iperf3 server

# 4. Check for packet loss
ping -c 100 target | tail -2

# 5. Check for DNS slowness
time nslookup example.com

Issue: “Intermittent Connectivity”

Diagnostic Steps

# 1. Long-running ping to detect drops
ping -c 1000 target

# 2. Continuous mtr
mtr target

# 3. Check interface errors
ip -s link show eth0
# Look for: RX errors, TX errors, dropped

# 4. Check system logs
dmesg | grep -iE 'eth0|link is'   # kernel messages for the interface, including link up/down events
journalctl -u NetworkManager

14.4 Network Diagnostic Flowchart

Start: "Can't reach X"
           │
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check physical connection
    │   gateway?   │        Check IP configuration
    └──────┬───────┘        (ip addr / ipconfig)
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check firewall
    │   8.8.8.8?   │        Check routing (ip route)
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you      │──No──► Check DNS settings
    │ nslookup X?  │        Try different DNS server
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you curl │──No──► Check if service is running
    │   X:port?    │        Check target firewall
    └──────┬───────┘        Check nc -zv X port
           │ Yes
           ▼
    Connection works!
    Issue might be application-level
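
The flowchart can also be written down as a decision function. This is a sketch under assumptions: the helper names next_step and yes_no are hypothetical, and the gateway address, host name, and URL in the usage comment are placeholders to replace with your own:

```shell
# next_step: walk the flowchart given four yes/no answers, in order:
#   gateway reachable, 8.8.8.8 reachable, name resolves, service responds.
next_step() {
    if   [ "$1" != yes ]; then echo "Check physical connection and IP configuration"
    elif [ "$2" != yes ]; then echo "Check firewall and routing"
    elif [ "$3" != yes ]; then echo "Check DNS settings or try a different DNS server"
    elif [ "$4" != yes ]; then echo "Check that the service is running and the target firewall"
    else                       echo "Connection works; look at the application"
    fi
}

# yes_no: run a command quietly and print "yes" if it succeeds, else "no".
yes_no() {
    if "$@" >/dev/null 2>&1; then echo yes; else echo no; fi
}

# Usage (needs network access; 192.168.1.1 is an example gateway address):
#   next_step "$(yes_no ping -c 1 192.168.1.1)" \
#             "$(yes_no ping -c 1 8.8.8.8)" \
#             "$(yes_no nslookup example.com)" \
#             "$(yes_no curl -sf -o /dev/null https://example.com)"
```

The function stops at the first failed check, which mirrors the flowchart: a failure lower in the stack makes the results above it meaningless.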

14.5 Reading Log Files

Common Log Locations

System	Location
Linux syslog	/var/log/syslog, /var/log/messages
Network Manager	journalctl -u NetworkManager
Nginx	/var/log/nginx/access.log, error.log
Apache	/var/log/apache2/ or /var/log/httpd/
AWS VPC Flow Logs	CloudWatch Logs or S3 (depending on the configured destination)

Useful Log Commands

# Follow log in real-time
tail -f /var/log/syslog

# Search for errors
grep -i error /var/log/syslog

# Last 100 lines
tail -n 100 /var/log/nginx/error.log

# Filter by time (journalctl)
journalctl --since "1 hour ago"
journalctl --since "2024-01-01 12:00:00"
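
Before reading an error log line by line, it can help to see how many entries there are at each severity. A sketch for the Nginx error log, assuming its usual layout where the level appears in square brackets after the timestamp (nginx_error_levels is a hypothetical helper name):

```shell
# nginx_error_levels: read an Nginx error log on stdin and print a count per
# severity level (error, warn, notice, ...), most frequent first.
nginx_error_levels() {
    awk '
        # The level is the first lowercase word in square brackets, e.g. "[error]".
        match($0, /\[[a-z]+\]/) { count[substr($0, RSTART + 1, RLENGTH - 2)]++ }
        END { for (level in count) print count[level], level }
    ' | sort -rn
}

# Usage:
#   nginx_error_levels < /var/log/nginx/error.log
```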

14.6 Cloud-Specific Troubleshooting

AWS Troubleshooting Checklist

□ Security Group allows traffic (inbound rules; return traffic is allowed automatically because Security Groups are stateful)
□ NACL allows traffic (both inbound AND outbound, because NACLs are stateless)
□ Route table has correct routes
□ Internet Gateway attached (for public subnets)
□ NAT Gateway configured (for outbound access from private subnets)
□ Elastic IP associated (if needed)
□ VPC Flow Logs enabled for debugging

VPC Flow Log Analysis

2 123456789012 eni-abc123 10.0.1.10 10.0.2.20 443 49152 6 10 840 1234567890 1234567899 ACCEPT OK
│ │            │          │          │         │   │    │ │  │   │          │          │      │
│ │            │          │          │         │   │    │ │  │   │          │          │      └─ Log status
│ │            │          │          │         │   │    │ │  │   │          │          └─ Action
│ │            │          │          │         │   │    │ │  │   │          └─ End time
│ │            │          │          │         │   │    │ │  │   └─ Start time
│ │            │          │          │         │   │    │ │  └─ Bytes
│ │            │          │          │         │   │    │ └─ Packets
│ │            │          │          │         │   │    └─ Protocol (6=TCP)
│ │            │          │          │         │   └─ Dest port
│ │            │          │          │         └─ Source port
│ │            │          │          └─ Dest IP
│ │            │          └─ Source IP
│ │            └─ Network interface
│ └─ Account ID
└─ Version
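
When debugging blocked traffic, the REJECT records are the interesting ones. A minimal sketch that reads records in the field order shown above and prints only the rejected flows; it assumes the default record format, and the function name flow_rejects is hypothetical:

```shell
# flow_rejects: read default-format VPC Flow Log records on stdin and print
# source, destination, and protocol number for every REJECT record.
flow_rejects() {
    # Variable names follow the field order in the diagram above.
    while read -r version account eni src dst sport dport proto packets bytes start end action status; do
        [ "$action" = "REJECT" ] || continue
        echo "$src:$sport -> $dst:$dport proto=$proto"
    done
}

# Usage (flow-logs.txt is a placeholder for records exported to a local file):
#   flow_rejects < flow-logs.txt
```

A REJECT means a Security Group or NACL blocked the flow. An ACCEPT for the request with a REJECT for the reply usually points at a NACL, since Security Groups allow return traffic automatically.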

14.7 Key Takeaways

Start at Layer 1

Always check physical connectivity first. Many “network issues” are unplugged cables.

Ping isn't Everything

ICMP can be blocked. Use nc/telnet to test specific ports.

Know Your Tools

ping, traceroute, dig, curl, netstat, tcpdump - master these.

Check Logs

Logs often have the answer. Know where to find them.

Next Module

Module 15: VPNs & Tunneling

Understand VPN technologies, tunneling protocols, and secure remote access.
