When networks fail, you need a systematic approach and the right tools to diagnose the problem. The difference between a junior and a senior engineer is not that the senior knows more tools; it is that the senior follows a systematic process instead of trying things at random. This module covers the essential troubleshooting toolkit and, more importantly, the mental framework for diagnosing network issues efficiently.

Think of network troubleshooting like debugging a plumbing problem. You do not start by ripping open walls. You start at the faucet (the application layer) and work backward: Is the faucet open? Is there water pressure at the valve? Is the main line connected? Each test eliminates an entire category of problems. The OSI model gives you the same layer-by-layer structure for networks, whether you work bottom-up or top-down.
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: All previous modules
Start from the bottom and work up. This is the golden rule of network troubleshooting. You cannot have a working Layer 7 (HTTP) if Layer 3 (IP routing) is broken, and Layer 3 cannot work if Layer 1 (the physical cable) is unplugged. Always rule out lower layers first.
    Layer 7: Application    "Is the application configured correctly?"
    Layer 6: Presentation   "Is data being encrypted/decoded properly?"
    Layer 5: Session        "Is the session established?"
    Layer 4: Transport      "Is the port open? Is TCP/UDP working?"
    Layer 3: Network        "Can I reach the IP? Is routing correct?"
    Layer 2: Data Link      "Is the MAC address reachable? VLAN correct?"
    Layer 1: Physical       "Is the cable plugged in? Is there link light?"
The most common mistake: Jumping straight to Layer 7 (“maybe the application config is wrong”) when the actual problem is at Layer 1 (“the cat chewed through the Ethernet cable”). Experienced engineers have learned this lesson the hard way, often after spending hours debugging application code when a quick ping would have revealed no network connectivity at all.
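The bottom-up rule can be sketched as a small check runner that stops at the first failing test, so a broken lower layer is reported before any higher layer is even tried. This is a minimal sketch, not a complete diagnostic tool: `first_failing_layer` is a hypothetical helper name, and the `label=command` argument format is an assumption made for this example.

```shell
#!/bin/sh
# Run checks in the order given (lowest layer first) and stop at the
# first one that fails. Each argument has the form "label=command".
# The helper name and argument format are illustrative assumptions.
first_failing_layer() {
    for check in "$@"; do
        label=${check%%=*}    # text before the first '='
        cmd=${check#*=}       # text after the first '='
        if sh -c "$cmd" >/dev/null 2>&1; then
            echo "ok   $label"
        else
            echo "FAIL $label"
            return 1
        fi
    done
    echo "all checks passed"
}

# Example call (commented out because it needs network access):
# first_failing_layer \
#     "L3 reach IP=ping -c 1 8.8.8.8" \
#     "L4 port open=nc -z -w 5 example.com 443" \
#     "L7 HTTP=curl -sSf -o /dev/null https://example.com"
```

Because the runner returns as soon as one check fails, the last line of output names the lowest layer that needs attention.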
    PING google.com (142.250.190.46): 56 data bytes
    64 bytes from 142.250.190.46: icmp_seq=0 ttl=117 time=12.3 ms
    64 bytes from 142.250.190.46: icmp_seq=1 ttl=117 time=11.8 ms
    64 bytes from 142.250.190.46: icmp_seq=2 ttl=117 time=14.2 ms

    --- google.com ping statistics ---
    3 packets transmitted, 3 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 11.8/12.8/14.2/1.0 ms
| Metric | Meaning |
|---|---|
| ttl | Time To Live; decrements at each hop |
| time | Round-trip latency |
| packet loss | % of packets that didn't return |
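When ping runs from a script, the two summary lines are usually the part worth keeping. The sketch below pulls the packet loss and average round-trip time out of ping output on stdin. `ping_summary` is a hypothetical helper name, and the parsing assumes the summary wording shown above (or the similar Linux wording); other ping implementations may format these lines differently.

```shell
#!/bin/sh
# Read ping output on stdin and print packet loss and average RTT.
# Assumes a "... N% packet loss" line and a "min/avg/max/... = a/b/c/d ms"
# line, as printed by common macOS and Linux ping implementations.
ping_summary() {
    awk '
        /packet loss/ {
            # The loss figure is the field that ends with a percent sign.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) { loss = $i; sub(/%/, "", loss) }
            }
        }
        /min\/avg\/max/ {
            split($0, halves, " = ")      # halves[2] holds "min/avg/max/dev ms"
            split(halves[2], rtt, "/")    # rtt[2] is the average
            avg = rtt[2]
        }
        END { printf "loss=%s%% avg=%sms\n", loss, avg }
    '
}

# Example (needs network access):
# ping -c 3 google.com | ping_summary
```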
Ping can be misleading! Some hosts block ICMP (ping), so no response does not always mean the host is down. AWS Security Groups, for example, do not allow inbound ICMP by default. A server could be perfectly healthy and serving HTTP traffic while ignoring all pings. Always follow up with a port-specific test (nc -zv host 443 or curl) before concluding a host is down.

Conversely, a successful ping does not mean your application works. Ping only proves Layer 3 (IP) connectivity. The application (Layer 7) could be crashed while the OS still responds to pings.
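The reasoning above combines two independent results: whether ping answered and whether the port answered. One way to keep the four cases straight is a small decision function that takes both exit statuses. `classify_host` is a hypothetical name, and the verdict strings are illustrative wording rather than output from any real tool.

```shell
#!/bin/sh
# Combine a ping result and a port-test result into one verdict.
# $1 = exit status of the ping command, $2 = exit status of the port test
# (0 means success for both, as is conventional for shell commands).
classify_host() {
    ping_rc=$1
    port_rc=$2
    if [ "$port_rc" -eq 0 ]; then
        if [ "$ping_rc" -eq 0 ]; then
            echo "reachable: ICMP and port both respond"
        else
            echo "reachable: port responds, ICMP is probably filtered"
        fi
    elif [ "$ping_rc" -eq 0 ]; then
        echo "host up, but service is down or port is filtered"
    else
        echo "no response: host down or fully filtered"
    fi
}

# Example (needs network access; "$host" is whatever you are testing):
# ping -c 3 "$host" >/dev/null 2>&1; p=$?
# nc -z -w 5 "$host" 443 >/dev/null 2>&1; t=$?
# classify_host "$p" "$t"
```

Only the last case justifies suspecting that the host itself is down, and even then a firewall dropping everything produces the same symptoms.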
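    # Linux/Mac
    traceroute google.com

    # Windows
    tracert google.com

    # Using TCP SYN probes instead of the default UDP/ICMP probes
    # (gets past filters that drop those). Linux traceroute only;
    # usually requires root.
    sudo traceroute -T -p 443 google.com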
Output:
    traceroute to google.com (142.250.190.46), 30 hops max
     1  192.168.1.1 (192.168.1.1)  1.234 ms  1.123 ms  1.456 ms
     2  10.0.0.1 (10.0.0.1)  5.678 ms  5.432 ms  5.789 ms
     3  * * *                                   ← No response
     4  72.14.215.85 (72.14.215.85)  10.123 ms  9.876 ms
     5  142.250.190.46 (142.250.190.46)  12.345 ms  11.987 ms
Interpreting Results:
| Pattern | Meaning |
|---|---|
| * * * | Hop doesn't respond to probes (firewall/ICMP blocked) |
| High latency at one hop | Possible congestion at that point |
| Increasing latency | Normal; each hop adds time |
| Sudden huge increase | Potential bottleneck |
Reading traceroute like a pro: A single * * * hop in the middle is usually harmless — many routers are configured not to respond to traceroute probes. But if every hop after a certain point shows * * * or the trace never completes, that is where the path is broken. Also, do not panic about high latency on one intermediate hop — routers deprioritize ICMP responses, so the displayed time for that hop may look bad while actual forwarding is fine. What matters is whether latency increases and stays high for all subsequent hops.
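The distinction between a harmless silent hop and a trace that never recovers can be checked mechanically. The sketch below reads traceroute output on stdin and reports whether the last listed hop replied. `trace_verdict` is a hypothetical helper name, and the parsing assumes the common layout of a hop number followed by either an address or `* * *`.

```shell
#!/bin/sh
# Read traceroute output on stdin and report whether the trace ends with
# a responding hop or trails off into unanswered probes.
trace_verdict() {
    awk '
        $1 ~ /^[0-9]+$/ {                 # hop lines start with a hop number
            hops = $1
            if ($2 == "*" && $3 == "*" && $4 == "*") {
                silent++                  # nobody answered at this hop
            } else {
                last = $1                 # most recent hop that answered
                addr = $2
            }
        }
        END {
            if (hops == 0) {
                print "no hops parsed"
            } else if (last == hops) {
                printf "last hop %d (%s) replied; %d silent hop(s) on the way\n", last, addr, silent
            } else {
                printf "no replies after hop %d (%s); %d trailing silent hop(s)\n", last, addr, hops - last
            }
        }
    '
}

# Example (needs network access):
# traceroute google.com | trace_verdict
```

A reply from the last hop with a few silent hops before it matches the harmless case described above. Trailing silent hops point at the segment just past the last address that answered, although a destination that filters probes looks the same.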
    # Linux (ss is the modern replacement for netstat)
    ss -tuln                     # TCP/UDP listening ports
    ss -tunp                     # Include process names
    ss -s                        # Summary statistics

    # Windows
    netstat -an                  # All connections, numeric
    netstat -ano                 # Include process IDs
    netstat -b                   # Show executable names (admin required)

    # Mac
    netstat -an | grep LISTEN    # Listening ports
    lsof -i -P                   # Better alternative
Common Flags:
| Flag | Meaning |
|---|---|
| -t | TCP connections |
| -u | UDP connections |
| -l | Listening sockets only |
| -n | Numeric (don't resolve names) |
| -p | Show process |
Output Example:
    State   Recv-Q  Send-Q  Local Address:Port    Peer Address:Port
    LISTEN  0       128     0.0.0.0:22            0.0.0.0:*            ← SSH listening
    LISTEN  0       128     0.0.0.0:80            0.0.0.0:*            ← HTTP listening
    ESTAB   0       0       192.168.1.10:52431    93.184.216.34:443    ← Active HTTPS
    # nslookup (simple, cross-platform)
    nslookup google.com
    nslookup google.com 8.8.8.8    # Use specific DNS server
    nslookup -type=MX google.com   # Query MX records

    # dig (more detailed, Linux/Mac)
    dig google.com
    dig google.com MX              # MX records
    dig @8.8.8.8 google.com        # Use specific server
    dig +trace google.com          # Show full resolution path
    dig +short google.com          # Just the IP
dig Output Explained:
    $ dig example.com

    ;; QUESTION SECTION:
    ;example.com.           IN    A

    ;; ANSWER SECTION:
    example.com.    3600    IN    A    93.184.216.34
                     ↑                       ↑
                    TTL                    Answer

    ;; Query time: 25 msec
    ;; SERVER: 8.8.8.8#53
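To pull just the records out of full dig output, for example to log TTLs over time, the answer section can be filtered with awk. This is a sketch under the assumption that the output follows the layout shown above; `dig_answers` is a hypothetical helper name.

```shell
#!/bin/sh
# Read full dig output on stdin and print one line per record in the
# ANSWER SECTION as: name ttl=<seconds> <type> <value>
dig_answers() {
    awk '
        /^;; ANSWER SECTION:/ { in_answer = 1; next }
        /^;;/ || /^$/         { in_answer = 0 }    # any other header or a blank line ends the section
        in_answer && NF >= 5 {
            value = $5
            for (i = 6; i <= NF; i++) value = value " " $i    # keep multi-field values such as MX
            printf "%s ttl=%s %s %s\n", $1, $2, $4, value
        }
    '
}

# Example (needs network access):
# dig example.com | dig_answers
```

For a quick one-off lookup, `dig +short` is simpler; a filter like this is mainly useful when the TTL and record type need to stay attached to the answer.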
Check if a port is open and accepting connections.
    # telnet (basic)
    telnet google.com 443
    telnet mail.example.com 25

    # netcat (more powerful)
    nc -zv google.com 443          # Test if port is open
    nc -zv google.com 80-443       # Scan port range
    nc -l 8080                     # Listen on port (create server)

    # Test with timeout
    nc -zv -w 5 google.com 443     # 5 second timeout
    # tcpdump (command line)
    sudo tcpdump -i eth0                      # All traffic on eth0
    sudo tcpdump -i eth0 port 80              # Only port 80
    sudo tcpdump -i eth0 host 192.168.1.10    # Only this host
    sudo tcpdump -i eth0 -w capture.pcap      # Save to file
    sudo tcpdump -i eth0 -c 100               # Capture 100 packets

    # Common filters
    sudo tcpdump 'tcp port 443'               # HTTPS traffic
    sudo tcpdump 'icmp'                       # Ping traffic
    sudo tcpdump 'src 192.168.1.10'           # From specific IP
    sudo tcpdump 'dst port 53'                # DNS queries
Wireshark: GUI alternative with powerful analysis. Open .pcap files from tcpdump. Wireshark is indispensable for deep debugging — you can see every packet, decode protocol headers, follow TCP streams, and filter by any field. If ping and curl tell you “something is wrong,” Wireshark tells you exactly what is wrong at the packet level.
When to use tcpdump vs Wireshark: Use tcpdump on remote servers where you only have CLI access. Capture to a .pcap file and then download it for analysis in Wireshark on your local machine. Do not try to run Wireshark directly on a production server — it requires a GUI and consumes significant resources.
    # 1. Check if you have network
    ping 8.8.8.8

    # 2. Check DNS resolution
    nslookup example.com

    # 3. Check if website responds
    curl -I https://example.com

    # 4. Check your routing
    traceroute example.com

    # 5. Check if port is blocked locally
    sudo iptables -L -n                    # Linux
    netsh advfirewall show allprofiles     # Windows
    # Service is not running or not listening on that port

    # 1. Check if service is running
    systemctl status nginx
    ps aux | grep nginx

    # 2. Check what's listening
    #    (':80 ' with the trailing space avoids matching 8080, 8000, etc.)
    ss -tuln | grep ':80 '

    # 3. Check if it's bound to the right interface
    #    0.0.0.0:80   = all interfaces
    #    127.0.0.1:80 = localhost only
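The interface check in step 3 is easy to misread in a long listing, so it can help to script it. The sketch below reads `ss -tuln` output on stdin and reports how a given TCP port is bound. `bind_scope` is a hypothetical helper name, and the sketch only looks at lines in the LISTEN state, so UDP sockets are not covered.

```shell
#!/bin/sh
# Read `ss -tuln` output on stdin and report how TCP port $1 is bound.
# The local address is taken to be the first field ending in ":<port>".
bind_scope() {
    port=$1
    awk -v port="$port" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ (":" port "$")) {
                    addr = $i
                    sub(/:[0-9]+$/, "", addr)    # strip the port, keep the address
                    if (addr == "127.0.0.1" || addr == "[::1]") {
                        print addr " -> localhost only"
                    } else if (addr == "0.0.0.0" || addr == "*" || addr == "[::]") {
                        print addr " -> all interfaces"
                    } else {
                        print addr " -> one specific interface"
                    }
                    found = 1
                    break
                }
            }
        }
        END { if (!found) print "nothing listening on port " port }
    '
}

# Example (run on the server being debugged):
# ss -tuln | bind_scope 80
```

A "localhost only" result explains the classic case where a service works when tested on the server itself but refuses connections from every other machine.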