Module 14: Network Troubleshooting

When networks fail, you need systematic approaches and the right tools to diagnose issues. This module covers the essential troubleshooting toolkit every engineer should master.
[Figure: Network Troubleshooting Flowchart]
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: All previous modules

14.1 The Troubleshooting Mindset

The OSI Troubleshooting Approach

Start from the bottom and work up:
Layer 7: Application    "Is the application configured correctly?"
Layer 6: Presentation   "Is data being encrypted/decrypted properly?"
Layer 5: Session        "Is the session established?"
Layer 4: Transport      "Is the port open? Is TCP/UDP working?"
Layer 3: Network        "Can I reach the IP? Is routing correct?"
Layer 2: Data Link      "Is the MAC address reachable?"
Layer 1: Physical       "Is the cable plugged in? Is there power?"

Quick Diagnostic Checklist

1. Physical Layer

  • Is the cable connected?
  • Is the link light on?
  • Is WiFi connected?

2. Network Layer

  • Do I have an IP address?
  • Can I ping the gateway?
  • Can I ping external IPs?

3. DNS

  • Can I resolve domain names?
  • Is DNS server reachable?

4. Application

  • Is the port open?
  • Is the service running?
  • Are there firewall blocks?

14.2 Essential Network Tools

ping - Test Basic Connectivity

The most basic tool. Tests if a host is reachable via ICMP.
# Basic ping
ping google.com

# Specific count
ping -c 4 google.com       # Linux/Mac
ping -n 4 google.com       # Windows

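# 100 pings, each prefixed with a Unix timestamp
# (-D is the Linux iputils flag; on macOS -D sets the Don't Fragment bit instead)
ping -c 100 -D google.com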
Output Analysis:
PING google.com (142.250.190.46): 56 data bytes
64 bytes from 142.250.190.46: icmp_seq=0 ttl=117 time=12.3 ms
64 bytes from 142.250.190.46: icmp_seq=1 ttl=117 time=11.8 ms
64 bytes from 142.250.190.46: icmp_seq=2 ttl=117 time=14.2 ms

--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.8/12.8/14.2/1.0 ms
Metric        Meaning
ttl           Time To Live - decrements each hop
time          Round-trip latency
packet loss   % of packets that didn’t return
Ping can be misleading! Some hosts block ICMP (ping), so no response doesn’t always mean the host is down. The service might still be running.
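
When ping runs inside a script, the summary line is usually the part worth checking. A minimal sketch, assuming the summary contains the text "N% packet loss" as in the sample output above; the function name ping_loss is made up for illustration.

```shell
# Hypothetical helper: read ping output on stdin and print the packet-loss
# percentage from the summary line (handles both "0%" and "0.0%" styles).
ping_loss() {
  grep -oE '[0-9.]+% packet loss' | cut -d'%' -f1
}

# Usage (needs network access, so it is left commented out):
#   ping -c 4 google.com | ping_loss
```

A non-zero value is a reason to look further, not proof of a fault, because some hosts rate-limit or drop ICMP.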

traceroute / tracert - Trace the Path

Shows every hop between you and the destination.
# Linux/Mac
traceroute google.com

# Windows
tracert google.com

# Linux: TCP SYN probes to port 443 instead of the default UDP probes
# (often passes firewalls that drop UDP/ICMP probes; usually needs root)
sudo traceroute -T -p 443 google.com
Output:
traceroute to google.com (142.250.190.46), 30 hops max
 1  192.168.1.1 (192.168.1.1)  1.234 ms  1.123 ms  1.456 ms
 2  10.0.0.1 (10.0.0.1)  5.678 ms  5.432 ms  5.789 ms
 3  * * *                                             ← No response
 4  72.14.215.85 (72.14.215.85)  10.123 ms  9.876 ms
 5  142.250.190.46 (142.250.190.46)  12.345 ms  11.987 ms
Interpreting Results:
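Pattern                   Meaning
* * *                     Hop doesn’t respond to probes (firewall/ICMP blocked or rate-limited)
High latency at one hop   Possible congestion at that point; if later hops are fast, the router is likely just slow to answer probes
Increasing latency        Normal - each hop adds time
Sudden huge increase      Potential bottleneck, if the increase persists through the remaining hops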
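
To spot a "sudden huge increase" without reading every line, the first round-trip time of each hop can be compared with the previous responding hop. This is a rough sketch, assuming output shaped like the sample above (hop number first, times followed by "ms"); largest_jump is a hypothetical name.

```shell
# Hypothetical helper: read traceroute output on stdin and print the hop
# with the largest latency increase over the previous responding hop.
largest_jump() {
  awk '
    $1 ~ /^[0-9]+$/ {
      rtt = ""
      # The first field followed by "ms" is the first probe time of this hop.
      for (i = 2; i < NF; i++) {
        if ($(i + 1) == "ms") { rtt = $i; break }
      }
      if (rtt == "") next            # hop did not answer (* * *)
      if (seen) {
        jump = rtt - prev
        if (jump > max) { max = jump; hop = $1 }
      }
      prev = rtt; seen = 1
    }
    END { if (hop != "") printf "hop %s +%.3f ms\n", hop, max }
  '
}

# Usage (needs network access):
#   traceroute google.com | largest_jump
```

It only uses one probe per hop, so treat the result as a hint about where to look with mtr, not as a measurement.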

netstat / ss - View Network Connections

See what’s connected to your machine.
# Linux (ss is modern replacement for netstat)
ss -tuln                    # TCP/UDP listening ports
ss -tunp                    # Include process names
ss -s                       # Summary statistics

# Windows
netstat -an                 # All connections, numeric
netstat -ano                # Include process IDs
netstat -b                  # Show executable names (admin required)

# Mac
netstat -an | grep LISTEN   # Listening ports
lsof -i -P                  # Better alternative
Common Flags:
Flag   Meaning
-t     TCP connections
-u     UDP connections
-l     Listening sockets only
-n     Numeric (don’t resolve names)
-p     Show process
Output Example:
State    Recv-Q Send-Q  Local Address:Port   Peer Address:Port
LISTEN   0      128     0.0.0.0:22            0.0.0.0:*         ← SSH listening
LISTEN   0      128     0.0.0.0:80            0.0.0.0:*         ← HTTP listening
ESTAB    0      0       192.168.1.10:52431   93.184.216.34:443 ← Active HTTPS
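
For a quick list of listening TCP ports, the local address column can be reduced to its port number. A small sketch, assuming ss-style lines like the example above, where listening sockets show the state LISTEN; listening_ports is a hypothetical name.

```shell
# Hypothetical helper: read ss output on stdin and print the unique port
# numbers of sockets in the LISTEN state, sorted numerically.
listening_ports() {
  awk '/LISTEN/ {
    # The first field ending in ":<digits>" is the local address:port.
    for (i = 1; i <= NF; i++)
      if ($i ~ /:[0-9]+$/) { sub(/.*:/, "", $i); print $i; break }
  }' | sort -un
}

# Usage:
#   ss -tln | listening_ports
```

UDP sockets are not covered, since ss reports them as UNCONN and not LISTEN.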

nslookup / dig - DNS Queries

Query DNS servers directly.
# nslookup (simple, cross-platform)
nslookup google.com
nslookup google.com 8.8.8.8           # Use specific DNS server
nslookup -type=MX google.com          # Query MX records

# dig (more detailed, Linux/Mac)
dig google.com
dig google.com MX                      # MX records
dig @8.8.8.8 google.com               # Use specific server
dig +trace google.com                  # Show full resolution path
dig +short google.com                  # Just the IP
dig Output Explained:
$ dig example.com

;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            3600    IN      A      93.184.216.34

                       TTL                      Answer

;; Query time: 25 msec
;; SERVER: 8.8.8.8#53
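
The answer section is easy to pick apart in a script because its columns are fixed: name, TTL, class, type, data. A minimal sketch, assuming the layout shown above; a_records is a hypothetical name.

```shell
# Hypothetical helper: read dig output on stdin and print "address ttl=N"
# for every A record. Lines starting with ";" are dig comments and skipped.
a_records() {
  awk '!/^;/ && $3 == "IN" && $4 == "A" { print $5, "ttl=" $2 }'
}

# Usage (needs network access):
#   dig +noall +answer example.com | a_records
```

A TTL that keeps counting down between runs means the answer is coming from a resolver cache.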

curl / wget - Test HTTP(S)

Make HTTP requests from the command line.
# Basic GET request
curl https://api.example.com/health

# Verbose (see headers, SSL handshake)
curl -v https://example.com

# Only headers
curl -I https://example.com

# Follow redirects
curl -L https://example.com

# Timing breakdown
curl -w "@curl-format.txt" -o /dev/null -s https://example.com

# POST with data
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" https://api.example.com
Timing Format File (curl-format.txt):
time_namelookup:  %{time_namelookup}s\n
time_connect:     %{time_connect}s\n
time_appconnect:  %{time_appconnect}s\n
time_pretransfer: %{time_pretransfer}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:       %{time_total}s\n
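
The timers curl reports are cumulative from the start of the request, so the time spent in each phase is the difference between neighbouring timers. The sketch below does that subtraction; curl_phases and its output labels are hypothetical, and the argument order is an assumption stated in the comments.

```shell
# Hypothetical helper: turn curl's cumulative timers into per-phase durations.
# Arguments, in order: time_namelookup time_connect time_appconnect
#                      time_pretransfer time_starttransfer time_total
curl_phases() {
  awk -v dns="$1" -v con="$2" -v app="$3" -v pre="$4" -v ttfb="$5" -v tot="$6" '
    BEGIN {
      # time_appconnect is 0 for plain HTTP, so report no TLS time then.
      tls = (app > 0) ? app - con : 0
      printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f transfer=%.3f\n",
             dns, con - dns, tls, ttfb - pre, tot - ttfb
    }'
}

# Usage (needs network access; the unquoted $(...) is split into six arguments):
#   curl_phases $(curl -s -o /dev/null -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_pretransfer} %{time_starttransfer} %{time_total}' https://example.com)
```

A large "wait" value points at the server, while large "dns", "tcp", or "tls" values point at resolution or the network path.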

telnet / nc (netcat) - Test Port Connectivity

Check if a port is open and accepting connections.
# telnet (basic)
telnet google.com 443
telnet mail.example.com 25

# netcat (more powerful)
nc -zv google.com 443           # Test if port is open
nc -zv google.com 80-443        # Scan port range
nc -l 8080                       # Listen on port (traditional netcat: nc -l -p 8080)

# Test with timeout
nc -zv -w 5 google.com 443      # 5 second timeout
PowerShell Alternative (Windows):
Test-NetConnection google.com -Port 443
Test-NetConnection -ComputerName google.com -Port 443 -InformationLevel Detailed
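
On a Linux machine with neither telnet nor nc installed, bash itself can open a TCP connection. This is a sketch under two assumptions: bash was built with /dev/tcp support (most distributions do this) and the coreutils timeout command is available. The name port_open is hypothetical.

```shell
# Hypothetical helper: succeed if a TCP connection to host ($1) and port ($2)
# can be opened within $3 seconds (default 3).
port_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage:
#   port_open google.com 443 && echo "open" || echo "closed or filtered"
```

A quick failure usually means the connection was refused, while a failure after the full timeout suggests packets are being dropped.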

tcpdump / Wireshark - Packet Capture

See exactly what’s happening on the network.
# tcpdump (command line)
sudo tcpdump -i eth0                          # All traffic on eth0
sudo tcpdump -i eth0 port 80                  # Only port 80
sudo tcpdump -i eth0 host 192.168.1.10       # Only this host
sudo tcpdump -i eth0 -w capture.pcap          # Save to file
sudo tcpdump -i eth0 -c 100                   # Capture 100 packets

# Common filters
sudo tcpdump 'tcp port 443'                   # HTTPS traffic
sudo tcpdump 'icmp'                           # Ping traffic
sudo tcpdump 'src 192.168.1.10'              # From specific IP
sudo tcpdump 'dst port 53'                    # DNS queries
Wireshark: GUI alternative with powerful analysis. Open .pcap files from tcpdump.

mtr - Combined ping + traceroute

Continuous traceroute with statistics.
mtr google.com
mtr --report google.com        # Generate report and exit
mtr --tcp --port 443 google.com # Use TCP
Output:
                             Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 192.168.1.1               0.0%    10    1.2   1.3   1.0   2.1   0.3
 2. 10.0.0.1                  0.0%    10    5.4   5.2   4.8   6.1   0.4
 3. 72.14.215.85              0.0%    10   10.2  10.5   9.8  12.3   0.8
 4. 142.250.190.46            0.0%    10   12.1  12.3  11.5  14.2   0.9
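
In a long report it helps to filter for hops that actually lose packets. A minimal sketch, assuming report lines that start with the hop number followed by host and Loss% as shown above; lossy_hops and its threshold argument are hypothetical.

```shell
# Hypothetical helper: read `mtr --report` output on stdin and print every
# hop whose Loss% exceeds the threshold in $1 (default: 1 percent).
lossy_hops() {
  awk -v limit="${1:-1}" '
    $1 ~ /^[0-9]+\./ {
      loss = $3
      sub(/%/, "", loss)
      if (loss + 0 > limit + 0) print $2, loss "%"
    }'
}

# Usage (needs network access):
#   mtr --report google.com | lossy_hops 5
```

Loss that appears at one intermediate hop but not at the final hop is usually that router rate-limiting its replies, not real packet loss.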

14.3 Common Issues and Solutions

Issue: “Cannot Reach Website”

# 1. Check if you have network
ping 8.8.8.8

# 2. Check DNS resolution
nslookup example.com

# 3. Check if website responds
curl -I https://example.com

# 4. Check your routing
traceroute example.com

# 5. Check if port is blocked locally
sudo iptables -L -n  # Linux
netsh advfirewall show allprofiles  # Windows
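
The steps above are ordered so that the first failing check names the layer to investigate. One way to script that idea is a small runner that stops at the first failure; first_failure and the "label::command" convention are made up for this sketch.

```shell
# Hypothetical helper: run "label::command" checks in order and print the
# label of the first one that fails. Commands are run with eval, so pass
# only strings you trust.
first_failure() {
  for step in "$@"; do
    if ! eval "${step#*::}" >/dev/null 2>&1; then
      echo "${step%%::*}"
      return 1
    fi
  done
  echo "all checks passed"
}

# Usage (needs network access):
#   first_failure \
#     "network::ping -c 2 8.8.8.8" \
#     "dns::nslookup example.com" \
#     "http::curl -sI --max-time 5 https://example.com"
```

If it prints "dns", for example, basic connectivity works and name resolution is the place to look next.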

Issue: “Connection Refused”

# Service is not running or not listening on that port

# 1. Check if service is running
systemctl status nginx
ps aux | grep nginx

# 2. Check what's listening
ss -tuln | grep ':80 '    # match port 80 only, not 8080 or 1080

# 3. Check if it's bound to the right interface
# 0.0.0.0:80 = all interfaces
# 127.0.0.1:80 = localhost only
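
The bind address check can be automated by classifying the local address column from ss or netstat. A small sketch; bind_scope and its output strings are hypothetical, and it also treats the IPv6 forms [::] and [::1] like their IPv4 counterparts.

```shell
# Hypothetical helper: classify a local "address:port" value from ss/netstat.
bind_scope() {
  case "$1" in
    0.0.0.0:*|'*':*|'[::]':*) echo "all interfaces" ;;
    127.*:*|'[::1]':*)        echo "loopback only" ;;
    *)                        echo "specific address" ;;
  esac
}

# Usage:
#   bind_scope 127.0.0.1:80
```

A service that reports "loopback only" will refuse connections from other machines even though it is running.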

Issue: “Connection Timeout”

# Either host is unreachable or firewall is dropping packets

# 1. Check basic connectivity
ping target-host

# 2. Check specific port
nc -zv target-host 443

# 3. Check route
traceroute target-host

# 4. Check firewall
# On target: sudo iptables -L -n
# On cloud: Check Security Groups, NACLs

Issue: “Slow Network”

# 1. Check latency
ping -c 10 target

# 2. Check path for bottlenecks
mtr target

# 3. Check bandwidth
iperf3 -c target  # Requires iperf3 server

# 4. Check for packet loss
ping -c 100 target | tail -2

# 5. Check for DNS slowness
time nslookup example.com

Issue: “Intermittent Connectivity”

# 1. Long-running ping to detect drops
ping -c 1000 target

# 2. Continuous mtr
mtr target

# 3. Check interface errors
ip -s link show eth0
# Look for: RX errors, TX errors, dropped

# 4. Check system logs
dmesg | grep -i network
journalctl -u NetworkManager
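
For intermittent problems, the interface counters are most useful when sampled repeatedly and compared. A sketch that pulls the errors and dropped columns out of the output, assuming the iproute2 layout where each "RX:" or "TX:" header line is followed by a line of counters with errors third and dropped fourth; link_errors is a hypothetical name.

```shell
# Hypothetical helper: read `ip -s link show <iface>` output on stdin and
# print the errors and dropped counters for each direction.
link_errors() {
  awk '
    $1 == "RX:" || $1 == "TX:" {
      dir = substr($1, 1, 2)
      # The counters are on the line directly below the header.
      if ((getline) > 0) print dir, "errors=" $3, "dropped=" $4
    }'
}

# Usage:
#   ip -s link show eth0 | link_errors
```

Counters that grow between two samples taken a few minutes apart point at a cable, NIC, or driver problem on this link.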

14.4 Network Diagnostic Flowchart

Start: "Can't reach X"
           │
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check physical connection
    │   gateway?   │        Check IP configuration
    └──────┬───────┘        (ip addr / ipconfig)
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check firewall
    │   8.8.8.8?   │        Check routing (ip route)
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you      │──No──► Check DNS settings
    │ nslookup X?  │        Try different DNS server
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you curl │──No──► Check if service is running
    │   X:port?    │        Check target firewall
    └──────┬───────┘        Check nc -zv X port
           │ Yes
           ▼
    Connection works!
    Issue might be application-level

14.5 Reading Log Files

Common Log Locations

System              Location
Linux syslog        /var/log/syslog, /var/log/messages
Network Manager     journalctl -u NetworkManager
Nginx               /var/log/nginx/access.log, error.log
Apache              /var/log/apache2/ or /var/log/httpd/
AWS VPC Flow Logs   CloudWatch Logs

Useful Log Commands

# Follow log in real-time
tail -f /var/log/syslog

# Search for errors
grep -i error /var/log/syslog

# Last 100 lines
tail -n 100 /var/log/nginx/error.log

# Filter by time (journalctl)
journalctl --since "1 hour ago"
journalctl --since "2024-01-01 12:00:00"

14.6 Cloud-Specific Troubleshooting

AWS Troubleshooting Checklist

□ Security Group allows traffic (inbound rules)
□ NACL allows traffic (both inbound AND outbound)
□ Route table has correct routes
□ Internet Gateway attached (for public subnets)
□ NAT Gateway configured (for private subnets)
□ Elastic IP associated (if needed)
□ VPC Flow Logs enabled for debugging

VPC Flow Log Analysis

2 123456789012 eni-abc123 10.0.1.10 10.0.2.20 443 49152 6 10 840 1234567890 1234567899 ACCEPT OK
│ │            │          │          │         │   │    │ │  │   │          │          │      │
│ │            │          │          │         │   │    │ │  │   │          │          │      └─ Log status
│ │            │          │          │         │   │    │ │  │   │          │          └─ Action
│ │            │          │          │         │   │    │ │  │   │          └─ End time
│ │            │          │          │         │   │    │ │  │   └─ Start time
│ │            │          │          │         │   │    │ │  └─ Bytes
│ │            │          │          │         │   │    │ └─ Packets
│ │            │          │          │         │   │    └─ Protocol (6=TCP)
│ │            │          │          │         │   └─ Dest port
│ │            │          │          │         └─ Source port
│ │            │          │          └─ Dest IP
│ │            │          └─ Source IP
│ │            └─ Network interface
│ └─ Account ID
└─ Version
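
Because the fields are space-separated and in a fixed order, a record in this default format can be summarized with awk. A minimal sketch, assuming the 14-field layout annotated above; flow_summary and its output format are hypothetical.

```shell
# Hypothetical helper: read default-format VPC Flow Log records on stdin and
# print "src:port -> dst:port protocol action" for each record with data.
flow_summary() {
  awk 'NF == 14 && $14 == "OK" {
    proto = $8
    if (proto == 6) proto = "tcp"
    else if (proto == 17) proto = "udp"
    else if (proto == 1) proto = "icmp"
    print $4 ":" $6, "->", $5 ":" $7, proto, $13
  }'
}

# Usage (flow.log is a placeholder for an exported log file):
#   flow_summary < flow.log | grep REJECT
```

Filtering for REJECT shows which flows a Security Group or NACL is blocking.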

14.7 Key Takeaways

Start at Layer 1

Always check physical connectivity first. Many “network issues” are unplugged cables.

Ping isn't Everything

ICMP can be blocked. Use nc/telnet to test specific ports.

Know Your Tools

ping, traceroute, dig, curl, netstat, tcpdump - master these.

Check Logs

Logs often have the answer. Know where to find them.

Next Module

Module 15: VPNs & Tunneling

Understand VPN technologies, tunneling protocols, and secure remote access.