
Module 14: Network Troubleshooting

When networks fail, you need a systematic approach and the right tools to diagnose the issue. The difference between a junior and a senior engineer is not that the senior knows more tools; it is that the senior follows a systematic process instead of randomly trying things. This module covers the essential troubleshooting toolkit and, more importantly, the mental framework for diagnosing network issues efficiently.

Think of network troubleshooting like tracing a plumbing problem. You do not start by ripping open walls. You check, in order, whether the main line is connected, whether there is pressure at the valve, and whether the faucet is open. Each test eliminates an entire category of problems. The OSI model gives you the same structure for networks: you can walk it bottom-up (the default in this module) or top-down, as long as you test one layer at a time and rule it out before moving on.
[Figure: Network troubleshooting flowchart]
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: All previous modules

14.1 The Troubleshooting Mindset

The OSI Troubleshooting Approach

Start from the bottom and work up. This is the golden rule of network troubleshooting. You cannot have a working Layer 7 (HTTP) if Layer 3 (IP routing) is broken, and Layer 3 cannot work if Layer 1 (the physical cable) is unplugged. Always rule out lower layers first.
Layer 7: Application    "Is the application configured correctly?"
Layer 6: Presentation   "Is data encoded/encrypted and decoded/decrypted correctly?"
Layer 5: Session        "Is the session established?"
Layer 4: Transport      "Is the port open? Is TCP/UDP working?"
Layer 3: Network        "Can I reach the IP? Is routing correct?"
Layer 2: Data Link      "Does ARP resolve the MAC address? Is the VLAN correct?"
Layer 1: Physical       "Is the cable plugged in? Is there link light?"
The most common mistake: Jumping straight to Layer 7 (“maybe the application config is wrong”) when the actual problem is at Layer 1 (“the cat chewed through the Ethernet cable”). Experienced engineers have learned this lesson the hard way, often after spending hours debugging application code when a quick ping would have revealed no network connectivity at all.

Quick Diagnostic Checklist

1. Physical Layer
  • Is the cable connected?
  • Is the link light on?
  • Is WiFi connected?

2. Network Layer
  • Do I have an IP address?
  • Can I ping the gateway?
  • Can I ping external IPs?

3. DNS
  • Can I resolve domain names?
  • Is the DNS server reachable?

4. Application
  • Is the port open?
  • Is the service running?
  • Is a firewall blocking the traffic?

14.2 Essential Network Tools

ping - Test Basic Connectivity

The most basic tool: it sends ICMP Echo Requests and reports whether Echo Replies come back and how long they take.
# Basic ping
ping google.com

# Specific count
ping -c 4 google.com       # Linux/Mac
ping -n 4 google.com       # Windows

# 100 pings, each prefixed with a Unix timestamp (Linux iputils only;
# on macOS, -D sets the Don't Fragment bit instead)
ping -c 100 -D google.com
Output Analysis:
PING google.com (142.250.190.46): 56 data bytes
64 bytes from 142.250.190.46: icmp_seq=0 ttl=117 time=12.3 ms
64 bytes from 142.250.190.46: icmp_seq=1 ttl=117 time=11.8 ms
64 bytes from 142.250.190.46: icmp_seq=2 ttl=117 time=14.2 ms

--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.8/12.8/14.2/1.0 ms
Metric        Meaning
ttl           Time To Live: decremented by one at each hop
time          Round-trip latency
packet loss   % of packets that did not return
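Because every router decrements the TTL by one, the ttl value in a reply hints at how far away the host is. The sketch below is a heuristic, not something ping reports: it assumes the replying host started from one of the usual default TTLs, and `estimate_hops` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Rough hop-count estimate from the TTL of a ping reply.
# ASSUMPTION: the replying host started from a common default initial TTL
# (64 on Linux/macOS, 128 on Windows, 255 on many routers). Initial TTLs
# are configurable, and the estimate describes the return path only.
estimate_hops() {
  local ttl=$1 initial
  for initial in 64 128 255; do
    if [ "$ttl" -le "$initial" ]; then
      echo $((initial - ttl))
      return 0
    fi
  done
  echo "TTL out of range: $ttl" >&2
  return 1
}

# Example: replies with ttl=117 suggest an initial TTL of 128,
# so roughly 128 - 117 = 11 hops:
#   estimate_hops 117
```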
Ping can be misleading! Some hosts block ICMP (ping), so no response does not always mean the host is down. AWS Security Groups, for example, do not allow ICMP by default. A server could be perfectly healthy and serving HTTP traffic while ignoring all pings. Always follow up with a port-specific test (nc -zv host 443 or curl) before concluding a host is down.

Conversely, a successful ping does not mean your application works. Ping only proves Layer 3 (IP) connectivity. The application (Layer 7) could have crashed while the OS still responds to pings.

traceroute / tracert - Trace the Path

Shows every hop between you and the destination.
# Linux/Mac
traceroute google.com

# Windows
tracert google.com

# Using TCP SYN probes instead of ICMP (bypasses ICMP blocks; Linux, needs root)
sudo traceroute -T -p 443 google.com
Output:
traceroute to google.com (142.250.190.46), 30 hops max
 1  192.168.1.1 (192.168.1.1)  1.234 ms  1.123 ms  1.456 ms
 2  10.0.0.1 (10.0.0.1)  5.678 ms  5.432 ms  5.789 ms
 3  * * *                                             ← No response
 4  72.14.215.85 (72.14.215.85)  10.123 ms  9.876 ms
 5  142.250.190.46 (142.250.190.46)  12.345 ms  11.987 ms
Interpreting Results:
Pattern                  Meaning
* * *                    Hop doesn't respond to probes (firewall/ICMP blocked)
High latency at one hop  Possible congestion at that point
Increasing latency       Normal: each hop adds time
Sudden huge increase     Potential bottleneck
Reading traceroute like a pro: A single * * * hop in the middle is usually harmless — many routers are configured not to respond to traceroute probes. But if every hop after a certain point shows * * * or the trace never completes, that is where the path is broken. Also, do not panic about high latency on one intermediate hop — routers deprioritize ICMP responses, so the displayed time for that hop may look bad while actual forwarding is fine. What matters is whether latency increases and stays high for all subsequent hops.
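The rule of thumb above can be applied mechanically. A minimal sketch, assuming Linux/macOS-style traceroute output (hop number in the first column) and a hypothetical helper named `last_responding_hop`:

```shell
#!/usr/bin/env bash
# Reads traceroute output on stdin. An isolated "* * *" hop is ignored,
# but a trace that ends in silent hops is reported together with the
# last hop that still answered (the likely break point).
last_responding_hop() {
  awk '
    # Hop lines start with the hop number; this skips the header line.
    $1 ~ /^[0-9]+$/ {
      total = $1
      silent = 1
      for (i = 2; i <= NF; i++) if ($i != "*") { silent = 0; break }
      if (!silent) { last = $1; addr = $2 }
    }
    END {
      if (last == "")         print "no hop answered"
      else if (last == total) print "trace completed at hop " last " (" addr ")"
      else print "silent after hop " last " (" addr "); hops " (last + 1) "-" total " did not answer"
    }'
}

# Usage: traceroute -n example.com | last_responding_hop
```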

netstat / ss - View Network Connections

See what’s connected to your machine.
# Linux (ss is the modern replacement for netstat)
ss -tuln                    # TCP/UDP listening ports
ss -tunp                    # Connected (non-listening) sockets with process names (sudo to see all)
ss -s                       # Summary statistics

# Windows
netstat -an                 # All connections, numeric
netstat -ano                # Include process IDs
netstat -b                  # Show executable names (admin required)

# Mac
netstat -an | grep LISTEN   # Listening ports
lsof -i -P                  # All network sockets with owning process, numeric ports
Common Flags:
Flag  Meaning
-t    TCP connections
-u    UDP connections
-l    Listening sockets only
-n    Numeric (don't resolve names)
-p    Show process
Output Example:
State    Recv-Q Send-Q  Local Address:Port   Peer Address:Port
LISTEN   0      128     0.0.0.0:22            0.0.0.0:*         ← SSH listening
LISTEN   0      128     0.0.0.0:80            0.0.0.0:*         ← HTTP listening
ESTAB    0      0       192.168.1.10:52431   93.184.216.34:443 ← Active HTTPS

nslookup / dig - DNS Queries

Query DNS servers directly.
# nslookup (simple, cross-platform)
nslookup google.com
nslookup google.com 8.8.8.8           # Use specific DNS server
nslookup -type=MX google.com          # Query MX records

# dig (more detailed, Linux/Mac)
dig google.com
dig google.com MX                      # MX records
dig @8.8.8.8 google.com               # Use specific server
dig +trace google.com                  # Show full resolution path
dig +short google.com                  # Just the IP
dig Output Explained:
$ dig example.com

;; QUESTION SECTION:
;example.com.                   IN      A

;; ANSWER SECTION:
example.com.            3600    IN      A      93.184.216.34

                       TTL                      Answer

;; Query time: 25 msec
;; SERVER: 8.8.8.8#53
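The TTL column matters when a DNS change seems to be ignored: resolvers may keep serving a cached answer until its TTL runs out, and a caching resolver reports the seconds remaining. A small sketch, assuming the `dig +noall +answer` output format and a hypothetical `dns_answers` helper, that prints one compact line per record:

```shell
#!/usr/bin/env bash
# Compact view of DNS answers. Expects the output of
#   dig +noall +answer NAME [TYPE]
# which prints only answer records: NAME TTL CLASS TYPE DATA...
dns_answers() {
  awk '!/^;/ && NF >= 5 {
    data = $5
    for (i = 6; i <= NF; i++) data = data " " $i   # MX/TXT data contains spaces
    printf "%s %s (TTL %ss)\n", $4, data, $2
  }'
}

# Usage: dig +noall +answer www.example.com | dns_answers
```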

curl / wget - Test HTTP(S)

Make HTTP requests from command line.
# Basic GET request
curl https://api.example.com/health

# Verbose (see headers, SSL handshake)
curl -v https://example.com

# Only headers
curl -I https://example.com

# Follow redirects
curl -L https://example.com

# Timing breakdown
curl -w "@curl-format.txt" -o /dev/null -s https://example.com

# POST with data
curl -X POST -d '{"key":"value"}' -H "Content-Type: application/json" https://api.example.com
Timing Format File (curl-format.txt):
time_namelookup:  %{time_namelookup}s\n
time_connect:     %{time_connect}s\n
time_appconnect:  %{time_appconnect}s\n
time_pretransfer: %{time_pretransfer}s\n
time_starttransfer: %{time_starttransfer}s\n
time_total:       %{time_total}s\n
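Each curl timer is cumulative, measured from the start of the request, so the cost of a single phase is the difference between two timers. A sketch that turns the format file's output into per-phase durations; the `curl_phases` name and the phase labels are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Input (stdin): lines like "time_connect:     0.045000s", as produced by
#   curl -w "@curl-format.txt" -o /dev/null -s URL
curl_phases() {
  awk -F': *' '
    { v = $2; sub(/s$/, "", v); t[$1] = v + 0 }
    END {
      # time_appconnect stays 0 for plain HTTP (no TLS handshake).
      tls_end = (t["time_appconnect"] > 0) ? t["time_appconnect"] : t["time_connect"]
      printf "dns:      %.3fs\n", t["time_namelookup"]
      printf "tcp:      %.3fs\n", t["time_connect"] - t["time_namelookup"]
      printf "tls:      %.3fs\n", tls_end - t["time_connect"]
      # Waiting for the first byte: server processing plus one round trip.
      printf "server:   %.3fs\n", t["time_starttransfer"] - t["time_pretransfer"]
      printf "transfer: %.3fs\n", t["time_total"] - t["time_starttransfer"]
    }'
}

# Usage: curl -w "@curl-format.txt" -o /dev/null -s https://example.com | curl_phases
```

Roughly: a large `dns` value points at the resolver, a large `tcp` value at the network path, and a large `server` value mostly at the backend.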

telnet / nc (netcat) - Test Port Connectivity

Check if a port is open and accepting connections.
# telnet (basic)
telnet google.com 443
telnet mail.example.com 25

# netcat (more powerful)
nc -zv google.com 443           # Test if port is open
nc -zv google.com 80-443        # Scan port range
nc -l 8080                       # Listen on port 8080 (OpenBSD nc; traditional netcat uses nc -l -p 8080)

# Test with timeout
nc -zv -w 5 google.com 443      # 5 second timeout
PowerShell Alternative (Windows):
Test-NetConnection google.com -Port 443
Test-NetConnection -ComputerName google.com -Port 443 -InformationLevel Detailed

tcpdump / Wireshark - Packet Capture

See exactly what’s happening on the network.
# tcpdump (command line)
sudo tcpdump -i eth0                          # All traffic on eth0
sudo tcpdump -i eth0 port 80                  # Only port 80
sudo tcpdump -i eth0 host 192.168.1.10       # Traffic to or from this host
sudo tcpdump -i eth0 -w capture.pcap          # Save to file
sudo tcpdump -i eth0 -c 100                   # Stop after 100 packets

# Common filters
sudo tcpdump 'tcp port 443'                   # HTTPS traffic
sudo tcpdump 'icmp'                           # Ping traffic
sudo tcpdump 'src 192.168.1.10'              # From specific IP
sudo tcpdump 'dst port 53'                    # DNS queries
Wireshark: GUI alternative with powerful analysis. Open .pcap files from tcpdump. Wireshark is indispensable for deep debugging — you can see every packet, decode protocol headers, follow TCP streams, and filter by any field. If ping and curl tell you “something is wrong,” Wireshark tells you exactly what is wrong at the packet level.
When to use tcpdump vs Wireshark: Use tcpdump on remote servers where you only have CLI access. Capture to a .pcap file and then download it for analysis in Wireshark on your local machine. Do not try to run Wireshark directly on a production server — it requires a GUI and consumes significant resources.

mtr - Combined ping + traceroute

Continuous traceroute with statistics.
mtr google.com
mtr --report google.com        # Generate report and exit
mtr --tcp --port 443 google.com # Use TCP
Output:
                             Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. 192.168.1.1               0.0%    10    1.2   1.3   1.0   2.1   0.3
 2. 10.0.0.1                  0.0%    10    5.4   5.2   4.8   6.1   0.4
 3. 72.14.215.85              0.0%    10   10.2  10.5   9.8  12.3   0.8
 4. 142.250.190.46            0.0%    10   12.1  12.3  11.5  14.2   0.9
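The reading rule from the traceroute section applies to mtr's Loss% column too. A sketch, assuming `mtr --report` text output and a hypothetical `mtr_loss_origin` helper:

```shell
#!/usr/bin/env bash
# Reads `mtr --report` output on stdin and separates two cases:
# loss at one hop that disappears at later hops (usually ICMP rate
# limiting on that router) versus loss that starts at a hop and
# continues to the final hop (a real problem from that hop onward).
mtr_loss_origin() {
  awk '
    # Hop rows begin with "N." as in the sample above, or "N.|--" as
    # printed by mtr --report.
    $1 ~ /^[0-9]+\./ {
      hop = $1; sub(/\..*/, "", hop)
      loss = 0
      for (i = 2; i <= NF; i++) if ($i ~ /%$/) { loss = $i; sub(/%$/, "", loss); break }
      if (loss + 0 > 0) { if (start == "") { start = hop; host = $2 } }
      else start = ""
      final = loss
    }
    END {
      if (start == "") print "no loss reaching the destination"
      else print "loss persists from hop " start " (" host ") to the destination (" final "% at the last hop)"
    }'
}

# Usage: mtr --report --report-cycles 50 example.com | mtr_loss_origin
```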

14.3 Common Issues and Solutions

Issue: “Cannot Reach Website”

# 1. Check if you have network
ping 8.8.8.8

# 2. Check DNS resolution
nslookup example.com

# 3. Check if website responds
curl -I https://example.com

# 4. Check your routing
traceroute example.com

# 5. Check if port is blocked locally
sudo iptables -L -n  # Linux
netsh advfirewall show allprofiles  # Windows

Issue: “Connection Refused”

# Service is not running or not listening on that port

# 1. Check if service is running
systemctl status nginx
ps aux | grep nginx

# 2. Check what's listening
ss -tln 'sport = :80'    # Only sockets on port 80 (grep 80 would also match 8080)

# 3. Check if it's bound to the right interface
# 0.0.0.0:80 = all interfaces
# 127.0.0.1:80 = localhost only
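To check the bind address of every listening port at once, the Local Address column can be classified automatically. A sketch assuming `ss -tln` column order (no Netid column, so Local Address:Port is the fourth field); `classify_listeners` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Classify each listening TCP socket from `ss -tln` output on stdin.
# A service bound to 127.0.0.1 or [::1] answers local clients but gives
# remote clients "Connection refused".
classify_listeners() {
  awk '$1 == "LISTEN" {
    addr = $4
    port = addr; sub(/.*:/, "", port)       # port follows the last colon
    host = addr; sub(/:[^:]*$/, "", host)   # everything before it
    if (host ~ /^(127\.|\[::1\])/) scope = "localhost only"
    else if (host == "0.0.0.0" || host == "*" || host == "[::]") scope = "all interfaces"
    else scope = "only " host
    print port, scope
  }'
}

# Usage: ss -tln | classify_listeners
```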

Issue: “Connection Timeout”

# Either host is unreachable or firewall is dropping packets

# 1. Check basic connectivity
ping target-host

# 2. Check specific port
nc -zv target-host 443

# 3. Check route
traceroute target-host

# 4. Check firewall
# On target: sudo iptables -L -n
# On cloud: Check Security Groups, NACLs
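`nc -zv` tells these two failures apart in its error message; the same distinction can be scripted. A minimal sketch, assuming bash (for its `/dev/tcp` redirection) and GNU coreutils `timeout` are available; `port_state` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Classify one TCP port as open / refused / timeout.
#   open    - the TCP handshake completed
#   refused - fast failure: typically nothing listening (a RST came back);
#             other immediate errors such as "No route to host" land here too
#   timeout - no answer at all: often a firewall silently dropping packets
# `timeout` exits with status 124 when the time limit expires.
port_state() {
  local host=$1 port=$2 secs=${3:-5} rc
  timeout "$secs" bash -c ': <"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null && rc=0 || rc=$?
  case $rc in
    0)   echo "open" ;;
    124) echo "timeout" ;;
    *)   echo "refused" ;;
  esac
}

# Usage: port_state target-host 443 5
```

A `refused` result sends you back to the Connection Refused checks (is the service running, and on which address); `timeout` points at firewalls, Security Groups, NACLs, or routing.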

Issue: “Slow Network”

# 1. Check latency
ping -c 10 target

# 2. Check path for bottlenecks
mtr target

# 3. Check bandwidth
iperf3 -c target  # Requires an iperf3 server on target (run iperf3 -s there)

# 4. Check for packet loss
ping -c 100 target | tail -2

# 5. Check for DNS slowness
time nslookup example.com
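When comparing many runs, it helps to reduce ping's summary to two numbers. A sketch assuming the Linux (`rtt min/avg/max/mdev`) or macOS (`round-trip min/avg/max/stddev`) summary formats; `ping_summary` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Summarise `ping -c N target` output (stdin) as "loss=X% avg=Y ms".
ping_summary() {
  awk '
    /packet loss/ {
      for (i = 1; i <= NF; i++) if ($i ~ /%$/) { loss = $i; break }
    }
    /min\/avg\/max/ {
      split($0, parts, "=")          # parts[2] is " min/avg/max/dev ms"
      split(parts[2], stats, "/")    # stats[2] is the average
      avg = stats[2] + 0
    }
    END {
      if (loss == "") { print "no summary found"; exit 1 }
      if (avg == "") print "loss=" loss " avg=n/a"   # 100% loss: no RTT line
      else print "loss=" loss " avg=" avg " ms"
    }'
}

# Usage: ping -c 100 target | ping_summary
```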

Issue: “Intermittent Connectivity”

# 1. Long-running ping to detect drops
ping -c 1000 target

# 2. Continuous mtr
mtr target

# 3. Check interface errors
ip -s link show eth0
# Look for: RX errors, TX errors, dropped

# 4. Check system logs
sudo dmesg -T | grep -i link     # Link up/down events (wording varies by driver)
journalctl -u NetworkManager
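A loss percentage hides when the drops happened. A sketch, assuming iputils or macOS reply lines that contain `icmp_seq=N` (the `[timestamp]` prefix comes from Linux `ping -D`), with a hypothetical `ping_gaps` helper:

```shell
#!/usr/bin/env bash
# Report every run of missing icmp_seq numbers among the reply lines on
# stdin, with the Linux `ping -D` timestamp of the next reply when present
# so drops can be lined up with log entries. Losses after the last reply
# are not visible here; compare with the "packets transmitted" summary.
ping_gaps() {
  awk '
    /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
      seq = substr($0, RSTART + 9, RLENGTH - 9) + 0   # 9 = length("icmp_seq=")
      ts = (substr($1, 1, 1) == "[") ? $1 : ""
      if (prev != "" && seq > prev + 1)
        printf "missing icmp_seq %d-%d (%d lost)%s\n", prev + 1, seq - 1, seq - prev - 1, (ts != "" ? " before " ts : "")
      if (prev == "" || seq > prev) prev = seq   # ignore late, out-of-order replies
    }'
}

# Usage: ping -D -c 1000 8.8.8.8 > ping.log; ping_gaps < ping.log
```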

14.4 Network Diagnostic Flowchart

Start: "Can't reach X"
           │
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check physical connection
    │   gateway?   │        Check IP configuration
    └──────┬───────┘        (ip addr / ipconfig)
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you ping │──No──► Check firewall
    │   8.8.8.8?   │        Check routing (ip route)
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you      │──No──► Check DNS settings
    │ nslookup X?  │        Try different DNS server
    └──────┬───────┘
           │ Yes
           ▼
    ┌──────────────┐
    │ Can you curl │──No──► Check if service is running
    │   X:port?    │        Check target firewall
    └──────┬───────┘        Check nc -zv X port
           │ Yes
           ▼
    Connection works!
    Issue might be application-level
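The flowchart is a sequence of checks where the first failure tells you where to look. A minimal sketch of that sequence as a shell function; `run_checks` and its `label|command` argument format are assumptions made for this example, and because it uses `eval`, only pass it commands you wrote yourself:

```shell
#!/usr/bin/env bash
# Walk a list of checks in order and stop at the first failure.
# Each argument is "label|command"; only the command's exit status counts.
run_checks() {
  local check label cmd
  for check in "$@"; do
    label=${check%%|*}
    cmd=${check#*|}
    if eval "$cmd" >/dev/null 2>&1; then
      echo "ok:   $label"
    else
      echo "FAIL: $label"
      return 1
    fi
  done
  echo "all checks passed: the issue is likely application-level"
}

# Example mirroring the flowchart (Linux flags; 192.168.1.1 and
# example.com are placeholders for your gateway and target):
#   run_checks \
#     "ping gateway|ping -c 2 -W 2 192.168.1.1" \
#     "ping 8.8.8.8|ping -c 2 -W 2 8.8.8.8" \
#     "nslookup example.com|nslookup example.com" \
#     "curl example.com:443|curl -fsS -o /dev/null --max-time 10 https://example.com"
```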

14.5 Reading Log Files

Common Log Locations

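System               Location
Linux syslog         /var/log/syslog (Debian/Ubuntu), /var/log/messages (RHEL/CentOS)
Network Manager      journalctl -u NetworkManager
Nginx                /var/log/nginx/access.log, /var/log/nginx/error.log
Apache               /var/log/apache2/ (Debian/Ubuntu) or /var/log/httpd/ (RHEL/CentOS)
AWS VPC Flow Logs    CloudWatch Logs or S3, depending on the flow log's destination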

Useful Log Commands

# Follow log in real-time
tail -f /var/log/syslog

# Search for errors
grep -i error /var/log/syslog

# Last 100 lines
tail -100 /var/log/nginx/error.log

# Filter by time (journalctl)
journalctl --since "1 hour ago"
journalctl --since "2024-01-01 12:00:00"
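error.log records what Nginx itself complained about; access.log records which requests failed. A sketch that counts 5xx responses per path, assuming the default combined log format, with a hypothetical `top_5xx_paths` helper:

```shell
#!/usr/bin/env bash
# Count 5xx responses per request path in an Nginx access log.
# ASSUMPTION: default "combined" format, where whitespace-separated
# field 7 is the request path and field 9 the status code; a custom
# log_format shifts these positions.
# Arguments: [log file, or - for stdin] [number of rows to show]
top_5xx_paths() {
  awk '$9 ~ /^5[0-9][0-9]$/ { count[$7]++ }
       END { for (p in count) print count[p], p }' "${1:--}" |
    LC_ALL=C sort -k1,1nr -k2,2 | head -n "${2:-10}"
}

# Usage: top_5xx_paths /var/log/nginx/access.log 5
```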

14.6 Cloud-Specific Troubleshooting

AWS Troubleshooting Checklist

□ Security Group allows traffic (inbound rules)
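□ NACL allows traffic in both directions (NACLs are stateless, so return traffic on ephemeral ports 1024-65535 needs its own rule)
□ Route table has correct routes
□ Internet Gateway attached (for public subnets)
□ NAT Gateway configured and routed to (for outbound internet from private subnets)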
□ Elastic IP associated (if needed)
□ VPC Flow Logs enabled for debugging

VPC Flow Log Analysis

2 123456789012 eni-abc123 10.0.1.10 10.0.2.20 443 49152 6 10 840 1234567890 1234567899 ACCEPT OK
│ │            │          │          │         │   │    │ │  │   │          │          │      │
│ │            │          │          │         │   │    │ │  │   │          │          │      └─ Log status
│ │            │          │          │         │   │    │ │  │   │          │          └─ Action
│ │            │          │          │         │   │    │ │  │   │          └─ End time
│ │            │          │          │         │   │    │ │  │   └─ Start time
│ │            │          │          │         │   │    │ │  └─ Bytes
│ │            │          │          │         │   │    │ └─ Packets
│ │            │          │          │         │   │    └─ Protocol (6=TCP)
│ │            │          │          │         │   └─ Dest port
│ │            │          │          │         └─ Source port
│ │            │          │          └─ Dest IP
│ │            │          └─ Source IP
│ │            └─ Network interface
│ └─ Account ID
└─ Version
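In practice you search flow logs for REJECT records, which mean a security group or network ACL denied the traffic. A sketch assuming the default version-2 field order shown above; `flow_rejects` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print rejected VPC Flow Log records (default version-2 format, 14 fields)
# in a readable form. Custom flow-log formats reorder fields, so this only
# fits the default layout.
flow_rejects() {
  awk '$1 == 2 && NF >= 14 && $13 == "REJECT" {
    proto = ($8 == 6) ? "TCP" : (($8 == 17) ? "UDP" : (($8 == 1) ? "ICMP" : "proto " $8))
    # action src:port -> dst:port protocol (packet count)
    printf "%s %s:%s -> %s:%s %s (%s packets)\n", $13, $4, $6, $5, $7, proto, $9
  }' "${1:--}"
}

# Usage: flow_rejects exported-flow-logs.txt
```

Because NACLs are stateless, a REJECT on return traffic to a high ephemeral port often points at a NACL rule rather than a security group.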

14.7 Key Takeaways

Start at Layer 1

Always check physical connectivity first. Many “network issues” are unplugged cables.

Ping Isn't Everything

ICMP can be blocked. Use nc/telnet to test specific ports.

Know Your Tools

ping, traceroute, dig, curl, netstat, tcpdump - master these.

Check Logs

Logs often have the answer. Know where to find them.

Next Module

Module 15: VPNs & Tunneling

Understand VPN technologies, tunneling protocols, and secure remote access.