Module 17: Firewalls & Security Groups

Network security is implemented at multiple layers, and getting it wrong at any one of them can expose your entire infrastructure. Think of network security like physical building security: a front door lock (firewall) is important, but you also need key cards for individual floors (NACLs), badge readers on each office door (security groups), and ID verification at each desk (application-level controls). This module covers traditional firewalls, cloud security groups, and network access control strategies, with a focus on the real-world mistakes that cause breaches.
Estimated Time: 3-4 hours
Difficulty: Intermediate
Prerequisites: Module 4 (Network Layer), Module 7 (Security basics)

17.1 What is a Firewall?

A firewall is a network security device that monitors and controls incoming and outgoing network traffic based on predetermined security rules. Think of it like a bouncer at a nightclub: every packet that wants to enter (or leave) your network has to pass through the bouncer, who checks it against a guest list (the rule set). If the packet matches an “allow” rule, it gets in. If it matches a “deny” rule — or does not match any allow rule — it gets turned away. The critical insight is where the firewall sits. A firewall only controls traffic that passes through it. If there is a side door (a misconfigured route, a VPN tunnel, an exposed cloud instance), the firewall never sees that traffic. This is why modern architectures use firewalls at multiple layers.

Firewall Positioning

                    Internet
                        │
                        ▼
               ┌────────────────┐
               │    Firewall    │  ← First line of defense
               └────────┬───────┘
                        │
           ┌────────────┼────────────┐
           │            │            │
           ▼            ▼            ▼
       ┌───────┐    ┌───────┐    ┌───────┐
       │  DMZ  │    │  Web  │    │ App   │
       │       │    │ Tier  │    │ Tier  │
       └───────┘    └───────┘    └───────┘

17.2 Types of Firewalls

1. Packet Filtering Firewall (Stateless)

Inspects each packet independently based on header information. This is the simplest form of firewall — every packet is a stranger, and the firewall has no memory of what came before. It looks at the packet headers (source IP, destination IP, protocol, port) and makes an allow/deny decision purely based on those fields. It is like a security guard who checks your ID badge every single time you walk through a door, even if you just walked through 10 seconds ago.
Rule evaluation (each packet individually):
┌──────┬────────┬──────────┬───────────┬──────────┐
│ Rule │ Action │ Protocol │ Src IP    │ Dst Port │
├──────┼────────┼──────────┼───────────┼──────────┤
│  1   │ ALLOW  │   TCP    │ Any       │    80    │
│  2   │ ALLOW  │   TCP    │ Any       │   443    │
│  3   │ DENY   │   ALL    │ Any       │   Any    │
└──────┴────────┴──────────┴───────────┴──────────┘
Pros: Fast, low overhead — because it only reads headers, it can process packets at line speed with minimal CPU. Cons: Cannot track connection state, so you must explicitly allow return traffic. If you allow outbound traffic to port 443, you also need a separate rule to allow inbound traffic from port 443 back to your ephemeral ports. This creates rule bloat and makes configuration error-prone. Attackers can also craft packets that look like return traffic to slip through.
Practical tip: Stateless firewalls are still used today in high-throughput scenarios where per-packet performance matters more than deep inspection — for example, AWS NACLs are stateless by design. If you are working with NACLs, always remember the ephemeral port rules.
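To make the "no memory" point concrete, here is a minimal POSIX shell sketch that evaluates the three rules from the table above one packet at a time. The rule format and the `stateless_filter` function are hypothetical and exist only for illustration. Source IPs are ignored to keep it short.

```sh
#!/bin/sh
# Minimal sketch of a stateless packet filter (illustrative only, not a real
# firewall). One rule per line: ACTION PROTOCOL DST_PORT, where "all"/"any"
# are wildcards. Source IPs are ignored here for brevity.
RULES='ALLOW tcp 80
ALLOW tcp 443
DENY all any'

# stateless_filter PROTOCOL DST_PORT -> prints ALLOW or DENY.
# Every call starts from scratch: nothing is remembered between packets.
stateless_filter() {
  verdict=$(printf '%s\n' "$RULES" | while read -r action rproto rport; do
    [ "$rproto" = all ] || [ "$rproto" = "$1" ] || continue
    [ "$rport" = any ] || [ "$rport" = "$2" ] || continue
    echo "$action"   # first matching rule wins
    break
  done)
  echo "${verdict:-DENY}"   # no rule matched: implicit deny
}

# Inbound request to our HTTPS server: destination port 443 hits rule 2.
stateless_filter tcp 443     # prints ALLOW
# Reply to a connection WE opened to a remote server's port 443: it arrives
# with destination port 52431 (our ephemeral port), so only the catch-all matches.
stateless_filter tcp 52431   # prints DENY
```

Letting that reply through would take a rule covering the whole ephemeral range. Such a rule would also admit any unsolicited packet aimed at a high port, and that is exactly the trade-off NACL users have to accept.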

2. Stateful Inspection Firewall

Tracks the state of active connections. This is a massive upgrade over stateless firewalls. The firewall maintains a connection table, which is essentially a memory of every active conversation. When an outbound request goes out, the firewall records the connection's protocol, source IP, source port, destination IP, and destination port. It then automatically allows the corresponding return traffic. Think of it like a receptionist who logs every outgoing phone call, so when the return call comes in, they already know to put it through without asking you again.
Connection Table:
┌────────────────────────────────────────────────────────────┐
│ Src IP        │ Src Port │ Dst IP       │ Dst Port │ State│
├────────────────────────────────────────────────────────────┤
│ 192.168.1.10  │  52431   │ 93.184.216.34│   443    │ EST  │
│ 192.168.1.11  │  54321   │ 8.8.8.8      │   53     │ EST  │
└────────────────────────────────────────────────────────────┘

Outbound: Allow → Connection tracked
Return traffic: Automatically allowed (matches established connection)
Advantage: You only need to allow outbound traffic; return traffic is automatically permitted. This drastically simplifies rule management: you write half the rules and eliminate an entire class of misconfiguration bugs. Trade-off: The connection table consumes memory. Under a DDoS attack that opens millions of half-open connections (SYN flood), the connection table can overflow, and the firewall then drops legitimate traffic. This is why stateful firewalls are often paired with DDoS protection (like AWS Shield or Cloudflare) that filters volumetric attacks before they hit the firewall.
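The connection table can be sketched in a few lines of POSIX shell. This is a hypothetical model, not how any real firewall stores state. It assumes all outbound traffic is allowed and that there are no inbound allow rules, so the only way in is to match a recorded flow.

```sh
#!/bin/sh
# Minimal sketch of stateful inspection (illustrative only; real trackers such
# as Linux conntrack also follow TCP flags, timeouts and NAT). Assumed policy:
# every outbound packet is allowed, and there are no inbound allow rules at all.
CONN_TABLE=''   # one "proto src_ip src_port dst_ip dst_port" entry per line

# outbound PROTO SRC_IP SRC_PORT DST_IP DST_PORT: allow and record the flow.
# Call it directly (not inside $(...)) so the table update is kept.
outbound() {
  CONN_TABLE="${CONN_TABLE}$1 $2 $3 $4 $5
"
  echo ALLOW
}

# inbound PROTO SRC_IP SRC_PORT DST_IP DST_PORT: allow only replies to recorded flows.
inbound() {
  # A reply swaps the endpoints, so rebuild the tuple the original packet had.
  original="$1 $4 $5 $2 $3"
  if printf '%s' "$CONN_TABLE" | grep -qxF "$original"; then
    echo 'ALLOW (ESTABLISHED)'
  else
    echo DENY
  fi
}

outbound tcp 192.168.1.10 52431 93.184.216.34 443   # prints ALLOW, flow recorded
inbound  tcp 93.184.216.34 443 192.168.1.10 52431   # prints ALLOW (ESTABLISHED)
inbound  tcp 203.0.113.9 443 192.168.1.10 52431     # prints DENY: no matching flow
```

The last call fails even though it uses the same ports. It comes from an address we never contacted, and that is the protection a stateless port rule cannot give.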

3. Application Layer Firewall (WAF)

Inspects the actual application data (Layer 7). While lower-level firewalls only see “TCP traffic on port 443,” a WAF opens the envelope and reads the letter inside. It understands HTTP methods, URL paths, headers, cookies, and request bodies. This lets it detect attacks that are perfectly valid at the network level but malicious at the application level.
HTTP Request Analysis:
┌────────────────────────────────────────────────────────────┐
│ POST /login HTTP/1.1                                       │
│ Host: example.com                                          │
│ Content-Type: application/x-www-form-urlencoded            │
│                                                            │
│ username=admin&password=' OR '1'='1                        │
│                          ↑                                 │
│                    SQL Injection detected! → BLOCK         │
└────────────────────────────────────────────────────────────┘
Protects Against:
  • SQL Injection
  • Cross-Site Scripting (XSS)
  • Cross-Site Request Forgery (CSRF)
  • Bot attacks
  • DDoS at application layer
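As a rough illustration of what "reading the letter inside" means, the sketch below inspects a request body for the tautology from the example above. The `inspect_body` function and its single regular expression are assumptions made for illustration. Real WAF rule sets are far more thorough.

```sh
#!/bin/sh
# Toy Layer 7 check (illustrative only). Production WAFs such as AWS WAF managed
# rules or the OWASP Core Rule Set normalize and decode requests and combine many
# signals; one regex like this is easy to bypass and prone to false positives.
# Assumption: the request body has already been URL-decoded.

# inspect_body BODY -> prints BLOCK if the body contains a quote followed by an
# "OR x=y" tautology (the classic ' OR '1'='1 pattern), otherwise ALLOW.
inspect_body() {
  if printf '%s\n' "$1" |
     grep -Eiq "'[[:space:]]*or[[:space:]]+'?[[:alnum:]]+'?[[:space:]]*=[[:space:]]*'?[[:alnum:]]+"; then
    echo BLOCK
  else
    echo ALLOW
  fi
}

inspect_body "username=admin&password=' OR '1'='1"   # prints BLOCK
inspect_body "username=alice&password=hunter2"        # prints ALLOW
```

A network-layer firewall sees both requests as identical TCP traffic to port 443. Only something that parses the body can tell them apart.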

4. Next-Generation Firewall (NGFW)

Combines multiple security functions into a single device. Think of it as a Swiss Army knife for network security — instead of deploying separate boxes for each function, an NGFW does it all in one pass:
  • Stateful inspection — tracks connections like a traditional stateful firewall
  • Deep packet inspection (DPI) — reads past the headers into the payload
  • Intrusion prevention (IPS) — detects and blocks known attack signatures in real time
  • Application awareness — identifies applications regardless of port (catches Skype running on port 80, for example)
  • SSL/TLS inspection — decrypts, inspects, and re-encrypts HTTPS traffic (controversial for privacy reasons but common in enterprise environments)
  • URL filtering — blocks access to known malicious or policy-violating websites
Cost consideration: NGFWs from vendors like Palo Alto, Fortinet, and Check Point can cost $10,000-$100,000+ depending on throughput. In cloud environments, managed services (AWS Network Firewall, Azure Firewall Premium) offer similar capabilities at pay-per-use pricing, which is often more cost-effective for variable workloads.

17.3 Firewall Rules

Rule Structure

┌──────┬────────┬──────────┬─────────────┬───────────┬──────────┬────────┐
│Order │ Action │ Protocol │ Source      │ Dest      │ Dst Port │ Notes  │
├──────┼────────┼──────────┼─────────────┼───────────┼──────────┼────────┤
│  1   │ ALLOW  │ TCP      │ 10.0.1.0/24 │ Any       │ 22       │ SSH    │
│  2   │ ALLOW  │ TCP      │ Any         │ Any       │ 80, 443  │ HTTP(S)│
│  3   │ ALLOW  │ TCP      │ 10.0.0.0/8  │ DB Server │ 5432     │ DB     │
│  4   │ DENY   │ ALL      │ Any         │ Any       │ Any      │Default │
└──────┴────────┴──────────┴─────────────┴───────────┴──────────┴────────┘

Rule Order Matters!

This is one of the most common sources of firewall misconfiguration. Rules are evaluated top to bottom, and the first matching rule wins — all subsequent rules are ignored for that packet. It is exactly like a series of if/else if statements in code: once a condition matches, none of the later conditions are checked.
Packet: TCP from 10.0.1.5 to 10.0.2.10:22

Rules evaluated top to bottom:
Rule 1: Source 10.0.1.0/24, Port 22 → MATCH → ALLOW  ← Stops here

If rules were reversed:
Rule 1: DENY ALL → MATCH → DENY (SSH would be blocked!)  ← Never reaches the SSH allow rule
Always put specific rules before general rules. The first matching rule wins. A common production incident: someone adds a broad “deny all” rule at the top during an emergency, forgets to remove it, and silently blocks legitimate traffic for hours. Always audit your rule order after making changes.
Pro tip: Number your rules with gaps (100, 200, 300) instead of sequential integers. This way, when you need to insert a rule between two existing ones, you have room. AWS recommends this convention for NACLs (for example, increments of 10 or 100), and the default NACL's allow rule is number 100. It is the same principle as the old BASIC line numbering (10, 20, 30): leave space for future insertions.
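The first-match behavior can be reproduced with a short POSIX shell sketch of the Rule Structure table. The rule format, the helper functions, and the address 10.0.3.20 standing in for "DB Server" are all hypothetical.

```sh
#!/bin/sh
# Minimal sketch of first-match rule evaluation (illustrative only).
# Rules mirror the table: ACTION PROTOCOL SOURCE DEST PORTS.
# Assumption: "DB Server" is represented by a hypothetical address 10.0.3.20/32.

# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# in_cidr IP CIDR -> exit status 0 if IP falls inside CIDR ("any" always matches)
in_cidr() (
  [ "$2" = any ] && exit 0
  bits=${2#*/}
  shift_by=$(( 32 - bits ))
  mask=$(( (0xFFFFFFFF >> shift_by) << shift_by ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
)

RULES='ALLOW tcp 10.0.1.0/24 any 22
ALLOW tcp any any 80,443
ALLOW tcp 10.0.0.0/8 10.0.3.20/32 5432
DENY all any any any'

# first_match RULES PROTOCOL SRC_IP DST_IP DST_PORT -> action of the first matching rule
first_match() {
  verdict=$(printf '%s\n' "$1" | while read -r action rproto rsrc rdst rports; do
    [ "$rproto" = all ] || [ "$rproto" = "$2" ] || continue
    in_cidr "$3" "$rsrc" || continue
    in_cidr "$4" "$rdst" || continue
    case ",$rports," in
      (,any,|*",$5,"*) ;;
      (*) continue ;;
    esac
    echo "$action"
    break
  done)
  echo "${verdict:-DENY}"
}

# The packet from the example: TCP 10.0.1.5 -> 10.0.2.10:22
first_match "$RULES" tcp 10.0.1.5 10.0.2.10 22       # prints ALLOW

# Same idea with the catch-all moved to the top: SSH is now blocked.
BAD_ORDER='DENY all any any any
ALLOW tcp 10.0.1.0/24 any 22'
first_match "$BAD_ORDER" tcp 10.0.1.5 10.0.2.10 22   # prints DENY
```

Moving the catch-all to the top flips the verdict without changing a single rule's content. For that reason, rule audits need to check order as well as content.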

17.4 AWS Security Groups

Security Groups act as virtual firewalls for EC2 instances. They are arguably the most important security control in AWS, because they are the last line of defense around your actual compute resources. If you get security groups right, even a misconfigured NACL or routing table is unlikely to expose your instances directly.

Key Characteristics

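┌────────────────┬──────────────────────────────────────┐
│ Feature        │ Behavior                             │
├────────────────┼──────────────────────────────────────┤
│ Stateful       │ Return traffic automatically allowed │
│ Instance level │ Attached to ENI (network interface)  │
│ Allow only     │ No explicit deny rules               │
│ Default        │ Deny all inbound, allow all outbound │
└────────────────┴──────────────────────────────────────┘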

Security Group Example

┌─────────────────────────────────────────────────────────────┐
│                   web-server-sg                              │
├─────────────────────────────────────────────────────────────┤
│ INBOUND RULES:                                              │
│ ┌─────────┬──────────┬──────────────┬────────────────────┐ │
│ │ Type    │ Protocol │ Port Range   │ Source             │ │
│ ├─────────┼──────────┼──────────────┼────────────────────┤ │
│ │ HTTP    │ TCP      │ 80           │ 0.0.0.0/0          │ │
│ │ HTTPS   │ TCP      │ 443          │ 0.0.0.0/0          │ │
│ │ SSH     │ TCP      │ 22           │ 10.0.1.0/24        │ │
│ │ Custom  │ TCP      │ 8080         │ sg-12345 (app-sg)  │ │
│ └─────────┴──────────┴──────────────┴────────────────────┘ │
│                                                             │
│ OUTBOUND RULES:                                             │
│ ┌─────────┬──────────┬──────────────┬────────────────────┐ │
│ │ Type    │ Protocol │ Port Range   │ Destination        │ │
│ ├─────────┼──────────┼──────────────┼────────────────────┤ │
│ │ All     │ All      │ All          │ 0.0.0.0/0          │ │
│ └─────────┴──────────┴──────────────┴────────────────────┘ │
└─────────────────────────────────────────────────────────────┘

Security Group Chaining

This is one of the most powerful and underused features in AWS. Instead of hardcoding IP addresses in your rules, you reference another security group as the source or destination. The beauty: when instances scale up, scale down, or get new IPs, the rules automatically apply to whatever instances are members of the referenced security group. No manual IP updates, no race conditions during autoscaling.
                    ┌───────────────┐
                    │   ALB-SG      │
                    │ Inbound: 443  │
                    │ from 0.0.0.0/0│
                    └───────┬───────┘
                            │
                            ▼
                    ┌───────────────┐
                    │   Web-SG      │
                    │ Inbound: 8080 │
                    │ from ALB-SG   │  ← Reference by SG, not IP
                    └───────┬───────┘
                            │
                            ▼
                    ┌───────────────┐
                    │   DB-SG       │
                    │ Inbound: 5432 │
                    │ from Web-SG   │  ← Only web servers can connect
                    └───────────────┘
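The sketch below models the chain from the diagram. A rule whose source is a group name matches any instance currently in that group, and traffic is allowed if any rule matches. Instance names, IPs, and the `sg_allows` function are hypothetical. AWS evaluates groups internally and exposes nothing like this.

```sh
#!/bin/sh
# Minimal sketch of security group evaluation with chaining (illustrative; not
# how AWS implements it). Instance names, IPs and group memberships below are
# hypothetical. A rule's source is either a CIDR or a security group name.

ip_to_int() ( IFS=.; set -- $1; echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 )) )
in_cidr() (
  shift_by=$(( 32 - ${2#*/} ))
  mask=$(( (0xFFFFFFFF >> shift_by) << shift_by ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
)

# "instance ip groups" -- group lists are comma-separated
MEMBERS='alb-1 10.0.0.10 alb-sg
web-1 10.0.1.21 web-sg
web-2 10.0.1.57 web-sg
db-1 10.0.2.5 db-sg'

# Inbound rules: "group port source" (mirrors the chaining diagram)
SG_RULES='alb-sg 443 0.0.0.0/0
web-sg 8080 alb-sg
db-sg 5432 web-sg'

# sg_allows GROUP PORT SRC_IP -> ALLOW if ANY rule matches (rule order is irrelevant)
sg_allows() {
  src_groups=$(printf '%s\n' "$MEMBERS" | awk -v ip="$3" '$2 == ip { gsub(",", "\n", $3); print $3 }')
  verdict=$(printf '%s\n' "$SG_RULES" | while read -r group port source; do
    [ "$group" = "$1" ] && [ "$port" = "$2" ] || continue
    case "$source" in
      (*/*) in_cidr "$3" "$source" || continue ;;
      (*)   printf '%s\n' "$src_groups" | grep -qxF "$source" || continue ;;
    esac
    echo ALLOW
    break
  done)
  echo "${verdict:-DENY}"   # allow-only model: no matching rule means deny
}

sg_allows db-sg 5432 10.0.1.57      # prints ALLOW: web-2 is a member of web-sg
sg_allows db-sg 5432 10.0.0.10      # prints DENY: alb-1 is not in web-sg
sg_allows alb-sg 443 198.51.100.7   # prints ALLOW: 0.0.0.0/0 covers any address

# Autoscaling adds web-3 with a brand-new IP; no rule has to change.
MEMBERS="$MEMBERS
web-3 10.0.1.99 web-sg"
sg_allows db-sg 5432 10.0.1.99      # prints ALLOW
```

The autoscaling step at the end adds a member and leaves every rule untouched. That is the property that makes chaining safer than IP lists.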

17.5 AWS Network ACLs (NACLs)

NACLs are subnet-level firewalls — they guard the entrance to an entire subnet rather than individual instances. Think of security groups as locks on individual apartment doors, and NACLs as the lock on the building’s front entrance. Both are needed: the building entrance stops most unwanted visitors, and the apartment locks handle the rest.

Security Groups vs NACLs

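┌────────────┬──────────────────────────────┬──────────────────────────────────┐
│ Aspect     │ Security Group               │ NACL                             │
├────────────┼──────────────────────────────┼──────────────────────────────────┤
│ Level      │ Instance (ENI)               │ Subnet                           │
│ State      │ Stateful                     │ Stateless                        │
│ Rules      │ Allow only                   │ Allow and deny                   │
│ Evaluation │ All rules (any match allows) │ Numbered order, first match wins │
│ Default    │ Deny inbound, allow outbound │ Allow all (default NACL)         │
└────────────┴──────────────────────────────┴──────────────────────────────────┘
A newly created custom NACL denies all traffic until you add rules; only the VPC's default NACL allows everything.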

NACL Rule Structure

┌────────────────────────────────────────────────────────────────┐
│                         INBOUND RULES                          │
├───────┬────────┬──────────┬───────────────┬───────────┬────────┤
│ Rule# │ Type   │ Protocol │ Port Range    │ Source    │ Action │
├───────┼────────┼──────────┼───────────────┼───────────┼────────┤
│  100  │ HTTP   │ TCP      │ 80            │ 0.0.0.0/0 │ ALLOW  │
│  110  │ HTTPS  │ TCP      │ 443           │ 0.0.0.0/0 │ ALLOW  │
│  120  │ Custom │ TCP      │ 1024-65535    │ 0.0.0.0/0 │ ALLOW  │ ← Ephemeral!
│  *    │ ALL    │ ALL      │ ALL           │ 0.0.0.0/0 │ DENY   │
└───────┴────────┴──────────┴───────────────┴───────────┴────────┘

┌────────────────────────────────────────────────────────────────┐
│                         OUTBOUND RULES                         │
├───────┬────────┬──────────┬───────────────┬───────────┬────────┤
│ Rule# │ Type   │ Protocol │ Port Range    │ Dest      │ Action │
├───────┼────────┼──────────┼───────────────┼───────────┼────────┤
│  100  │ HTTP   │ TCP      │ 80            │ 0.0.0.0/0 │ ALLOW  │
│  110  │ HTTPS  │ TCP      │ 443           │ 0.0.0.0/0 │ ALLOW  │
│  120  │ Custom │ TCP      │ 1024-65535    │ 0.0.0.0/0 │ ALLOW  │ ← Ephemeral!
│  *    │ ALL    │ ALL      │ ALL           │ 0.0.0.0/0 │ DENY   │
└───────┴────────┴──────────┴───────────────┴───────────┴────────┘
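The ordered, first-match behavior of the inbound table can be sketched in POSIX shell. The rules are listed out of order on purpose, and `sort -n` restores rule-number order, matching the lowest-number-first evaluation. Rule 90 is a hypothetical addition: an explicit deny for the documentation range 198.51.100.0/24, which is something a security group cannot express.

```sh
#!/bin/sh
# Minimal sketch of NACL evaluation (illustrative only): rules are sorted by
# rule number, the lowest-numbered match wins, and "*" is the implicit deny.
# Rule 90 is a hypothetical explicit DENY for a known-bad range.

ip_to_int() ( IFS=.; set -- $1; echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 )) )
in_cidr() (
  shift_by=$(( 32 - ${2#*/} ))
  mask=$(( (0xFFFFFFFF >> shift_by) << shift_by ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "${2%/*}") & mask )) ]
)

# "number protocol port_from port_to cidr action" -- deliberately listed out of order
NACL_INBOUND='110 tcp 443 443 0.0.0.0/0 ALLOW
100 tcp 80 80 0.0.0.0/0 ALLOW
120 tcp 1024 65535 0.0.0.0/0 ALLOW
90 all 0 65535 198.51.100.0/24 DENY'

# nacl_eval RULES PROTOCOL SRC_IP DST_PORT -> verdict plus the rule that decided it
nacl_eval() {
  verdict=$(printf '%s\n' "$1" | sort -n | while read -r num proto lo hi cidr action; do
    [ "$proto" = all ] || [ "$proto" = "$2" ] || continue
    [ "$4" -ge "$lo" ] && [ "$4" -le "$hi" ] || continue
    in_cidr "$3" "$cidr" || continue
    echo "$action (rule $num)"
    break
  done)
  echo "${verdict:-DENY (rule *)}"
}

nacl_eval "$NACL_INBOUND" tcp 203.0.113.5 443     # prints ALLOW (rule 110)
nacl_eval "$NACL_INBOUND" tcp 198.51.100.7 443    # prints DENY (rule 90)
nacl_eval "$NACL_INBOUND" tcp 203.0.113.5 22      # prints DENY (rule *)
```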

Why Ephemeral Ports?

This is the number one gotcha with NACLs, and it trips up even experienced engineers. NACLs are stateless — they have no memory of connections, so return traffic needs explicit rules. Here is the problem: When your server responds to an HTTPS request, the response does not go back to port 443. It goes back to the ephemeral port that the client chose when it initiated the connection. That ephemeral port is a random number in a high range. If your NACL outbound rules only allow port 443, the response is blocked and the client sees a timeout.
Request:   Client:52431 → Server:443     ← Inbound rule allows port 443 -- works fine
Response:  Server:443 → Client:52431     ← Outbound rule must allow port 52431!

Ephemeral port range: depends on the client OS (many Linux kernels: 32768-60999; Windows Server 2008+: 49152-65535), so NACL rules usually allow 1024-65535 to cover every client
This is the most common NACL debugging scenario: “I allowed port 443 in both directions but HTTPS is not working.” The fix is almost always the ephemeral port range. With security groups, you never see this problem because they are stateful and handle return traffic automatically.

17.6 Defense in Depth

Layer multiple security controls. The core philosophy: no single layer is perfect, so stack them. If an attacker bypasses one layer, they still face the next. In practice, this means accepting some redundancy in your security rules — that redundancy is the point, not a flaw.
┌─────────────────────────────────────────────────────────────┐
│                        Internet                              │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│           Cloudflare / AWS Shield (DDoS Protection)         │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    WAF (SQL injection, XSS)                  │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                    NACL (Subnet level)                       │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                 Security Group (Instance level)              │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              Host Firewall (iptables, Windows Firewall)      │
└───────────────────────────┬─────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                      Application                             │
└─────────────────────────────────────────────────────────────┘
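One way to see why redundancy helps is to model each layer as a check that a request must pass in turn. The layer functions below are hypothetical stand-ins for the controls in the diagram, not real configurations. The DDoS layer is omitted.

```sh
#!/bin/sh
# Sketch of defense in depth as a chain of checks (illustrative only): a request
# must pass EVERY layer, and the first layer that says no stops it. The checks
# are hypothetical stand-ins for the real controls; the DDoS layer is left out.
# Arguments to each layer: SRC_IP DST_PORT REQUEST_PATH

layer_waf()  { case "$3" in *"' OR '"*) return 1 ;; esac; }  # crude SQLi signature
layer_nacl() { [ "$1" != 198.51.100.66 ]; }                  # subnet-wide block of one bad IP
layer_sg()   { [ "$2" = 443 ]; }                             # only HTTPS reaches the instance
layer_host() { [ "$2" = 443 ] || [ "$2" = 22 ]; }            # host firewall also allows SSH

LAYERS='layer_waf layer_nacl layer_sg layer_host'

# traverse SRC_IP DST_PORT REQUEST_PATH -> first blocking layer, or "reached application"
traverse() {
  for layer in $LAYERS; do
    if ! "$layer" "$1" "$2" "$3"; then
      echo "blocked by ${layer#layer_}"
      return
    fi
  done
  echo "reached application"
}

traverse 203.0.113.5 443 "/login"                 # prints reached application
traverse 203.0.113.5 443 "/login?u=' OR '1'='1"   # prints blocked by waf
traverse 198.51.100.66 443 "/"                    # prints blocked by nacl
traverse 203.0.113.5 22 "/"                       # prints blocked by sg
```

The last call shows what layering buys you. The host firewall would have allowed SSH, but the security group stopped the packet first, so a mistake in only one of those layers would not have exposed the port.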

17.7 Linux iptables

The Linux kernel’s built-in firewall, and one of the most important tools to understand for anyone working with Linux servers. Docker's port mappings and Kubernetes kube-proxy (in its default iptables mode) are implemented as iptables rules, as is NAT on a self-managed Linux NAT instance. Newer setups may use the successor framework, nftables, instead. Understanding iptables means understanding what is actually happening under the hood of your container orchestration.

Basic Syntax

iptables -A <chain> -p <protocol> --dport <port> -j <action>

Chains:
- INPUT: Incoming to this host
- OUTPUT: Outgoing from this host
- FORWARD: Passing through (routing)

Actions:
- ACCEPT: Allow
- DROP: Silently discard
- REJECT: Discard with error response

Common Commands

# List rules with numeric addresses and packet counters (skip DNS resolution with -n)
iptables -L -n -v

# Allow SSH -- only from a specific management network in production, never 0.0.0.0/0
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 22 -j ACCEPT

# Allow established connections -- this single rule is what makes the ruleset "stateful"
# Without it, you'd need explicit rules for every return packet (like NACLs)
# (the conntrack match replaces the older, still-supported -m state --state form)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Block specific IP -- DROP silently discards (attacker gets no feedback)
# Use REJECT instead if you want the sender to know they're blocked (sends ICMP error)
iptables -A INPUT -s 192.168.1.100 -j DROP

# Allow localhost -- critical for applications that talk to local services
# (Redis on localhost, Unix sockets, health check endpoints, etc.)
iptables -A INPUT -i lo -j ACCEPT

# Default deny -- everything not explicitly allowed is dropped
# This is the "whitelist" approach: deny by default, allow by exception
iptables -P INPUT DROP

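# Save rules -- without this they are lost on reboot. The file is not loaded automatically:
# restore it at boot with iptables-restore, or use your distro's tooling
# (e.g. iptables-persistent on Debian/Ubuntu, iptables-services on RHEL)
iptables-save > /etc/iptables.rules

# Modern alternative: nftables (in the kernel since Linux 3.13, default on many current distros)
# Most distros keep the iptables command working through the iptables-nft compatibility layer
nft list ruleset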

Complete Example Script

#!/bin/bash
# Production-ready iptables configuration for a web server
# Run as root. Test in a staging environment first!

# Step 1: Flush existing rules -- start from a clean slate
# -F flushes all rules in all chains, -X deletes user-defined chains
iptables -F
iptables -X

# Step 2: Set default policies -- DROP means "deny unless explicitly allowed"
# INPUT DROP: block all incoming traffic by default (whitelist approach)
# FORWARD DROP: this server is not a router, so drop forwarded packets
# OUTPUT ACCEPT: allow all outbound traffic (restrict in high-security environments)
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Step 3: Allow loopback interface -- MUST come first
# Many apps (Redis, Postgres, health checks) bind to 127.0.0.1
# Blocking loopback will break your application in mysterious ways
iptables -A INPUT -i lo -j ACCEPT

# Step 4: Allow established connections -- the "stateful" magic
# Once an outbound connection is made, return traffic flows automatically
# RELATED handles associated connections (like FTP data channels)
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# Step 5: Allow SSH from management network ONLY
# Never allow SSH from 0.0.0.0/0 in production -- use a bastion host
iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport 22 -j ACCEPT

# Step 6: Allow HTTP/HTTPS from anywhere (this is a public web server)
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT

# Step 7: Allow ICMP ping (optional but useful for monitoring)
# Some teams disable this to reduce attack surface; others need it for health checks
iptables -A INPUT -p icmp --icmp-type echo-request -j ACCEPT

# Step 8: Log dropped packets -- MUST be the last rule, so only packets about to hit the DROP policy are logged
# Check logs with: grep "IPTables-Dropped" /var/log/syslog (or journalctl -k on systemd hosts)
# High-traffic servers can generate enormous log volumes, so rate-limit with -m limit
iptables -A INPUT -m limit --limit 5/min -j LOG --log-prefix "IPTables-Dropped: "
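Common production mistake: running this script over SSH without first making sure the SSH allow rule is correct. If you lock yourself out, you need console access (or AWS Session Manager, or a reboot with a startup script that flushes rules). Always test firewall changes from a secondary connection, or schedule a rollback as a safety net: echo "iptables -P INPUT ACCEPT; iptables -F" | at now + 5 minutes. A plain iptables -F is not enough, because it leaves the DROP policy in place and locks you out completely. Many distributions also ship iptables-apply, which reverts the new rules unless you confirm them before a timeout.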
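A variation on the script above is to render the same policy as an iptables-restore file and apply it in one step. iptables-restore commits each table atomically, so there is no moment where INPUT already drops traffic but the SSH rule is not loaded yet. This sketch only prints the file. The management network value is an assumption carried over from the script, and this version uses the conntrack match and a rate-limited LOG rule.

```sh
#!/bin/sh
# Sketch: render the web-server policy as an iptables-restore file rather than
# running iptables commands one at a time. MGMT_NET is an assumption (the
# management network from the script above); the file is only printed here.
MGMT_NET=${MGMT_NET:-10.0.1.0/24}

render_rules() {
  cat <<EOF
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p tcp -s $MGMT_NET --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -p icmp --icmp-type echo-request -j ACCEPT
-A INPUT -m limit --limit 5/min -j LOG --log-prefix "IPTables-Dropped: "
COMMIT
EOF
}

render_rules
# Save this script's output to a file (e.g. web.rules), then as root:
#   iptables-restore --test < web.rules   # parse only, nothing is committed
#   iptables-restore < web.rules          # apply the whole table in one step
```

Keeping the policy as a file also gives you something to review and version-control, rather than a sequence of commands whose end state depends on what ran before.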

17.8 Zero Trust Network

Traditional Model (Castle and Moat)

                    Firewall
                       │
Outside = Untrusted    │    Inside = Trusted
        ╳              │         ✓
                       │
     Everything blocked│   Everything allowed
Problem: Once inside, attackers move freely. This is called lateral movement, and it is how many major breaches unfold. The 2013 Target breach started with credentials stolen from an HVAC vendor; the attackers then moved laterally through the trusted internal network to reach the payment systems. The perimeter firewall never flagged anything, because the attackers arrived with valid credentials and were treated as trusted from then on.

Zero Trust Model

"Never trust, always verify"

Every request authenticated, regardless of location:
- User identity verified
- Device health checked  
- Least privilege access
- Micro-segmentation
- Continuous monitoring

Zero Trust Principles

Verify Explicitly

Always authenticate and authorize based on all available data points.

Least Privilege

Limit user access with Just-In-Time and Just-Enough-Access (JIT/JEA).

Assume Breach

Minimize blast radius. Segment access. Verify end-to-end encryption.

Micro-segmentation

Each workload gets its own security perimeter.
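Micro-segmentation boils down to an explicit allowlist of which workload may talk to which, on which port, with everything else denied. The sketch below models that with hypothetical service names and ports. It also has an audit mode that reports would-be denials without blocking them, a common way to roll such policies out safely.

```sh
#!/bin/sh
# Sketch of micro-segmentation as an explicit allowlist of service-to-service
# flows (service names and ports are hypothetical). Anything not listed is
# denied. In audit mode, would-be denials are reported but not blocked.
ALLOWED_FLOWS='frontend api 8080
api orders-db 5432
api auth 9000'

MODE=enforce   # switch to "audit" while discovering real traffic patterns

# check_flow SRC_SERVICE DST_SERVICE PORT -> ALLOW, DENY, or AUDIT
check_flow() {
  if printf '%s\n' "$ALLOWED_FLOWS" | grep -qxF "$1 $2 $3"; then
    echo ALLOW
  elif [ "$MODE" = audit ]; then
    echo "audit: would deny $1 -> $2:$3" >&2   # log only, traffic still flows
    echo AUDIT
  else
    echo DENY
  fi
}

check_flow api orders-db 5432        # prints ALLOW
check_flow frontend orders-db 5432   # prints DENY: no direct path from frontend to the DB
```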

17.9 Common Firewall Mistakes

Overly Permissive Rules

# Bad: Allow all traffic from anywhere
ALLOW TCP 0.0.0.0/0 → Any port

# Good: Specific rules
ALLOW TCP 0.0.0.0/0 → Port 443 only
ALLOW TCP 10.0.1.0/24 → Port 22

Unrestricted Outbound Traffic

# Inbound locked down, but...
# Outbound allows everything

Attacker compromises server → Can exfiltrate data freely

# Better: Restrict outbound too
ALLOW TCP Any → 443 (updates)
DENY TCP Any → Any (blocks reverse shells on other ports; 443 itself stays open)

Hardcoded IP Addresses

# Bad: Hardcoded IPs
ALLOW TCP 10.0.1.5 → Port 3306

# Good: Reference by security group
ALLOW TCP from web-sg → Port 3306

# IP changes don't break rules

Forgetting Return Traffic (NACLs)

# Forgot return traffic
ALLOW INBOUND TCP 443 ✓
ALLOW OUTBOUND TCP 443 ✓
# Replies go to the other side's ephemeral port, not to 443!

# Missing:
ALLOW OUTBOUND TCP 1024-65535  ← Replies to clients that connected to our port 443
ALLOW INBOUND  TCP 1024-65535  ← Replies to our own outbound calls to port 443
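The first mistake is easy to catch automatically. This sketch scans a list of rules for sensitive ports exposed to 0.0.0.0/0. The rule format and the set of "sensitive" ports (SSH, MySQL, RDP, PostgreSQL, Redis) are assumptions chosen for illustration, not any cloud provider's format.

```sh
#!/bin/sh
# Sketch of a rule audit (illustrative only): flag inbound rules that expose
# sensitive ports, or every port, to the whole internet. The rule format and
# the port list below are assumptions, not a provider's export format.
SENSITIVE_PORTS='22 3306 3389 5432 6379'

# Rules: "direction protocol source port" ("all" means every port)
RULES='inbound tcp 0.0.0.0/0 443
inbound tcp 0.0.0.0/0 22
inbound tcp 10.0.1.0/24 22
inbound tcp 0.0.0.0/0 all'

audit_rules() {
  printf '%s\n' "$RULES" | while read -r dir proto src port; do
    [ "$dir" = inbound ] && [ "$src" = 0.0.0.0/0 ] || continue
    if [ "$port" = all ]; then
      echo "CRITICAL: every $proto port is open to the internet"
      continue
    fi
    for p in $SENSITIVE_PORTS; do
      if [ "$port" = "$p" ]; then
        echo "WARNING: $proto port $port is open to the internet"
      fi
    done
  done
}

audit_rules
# prints:
# WARNING: tcp port 22 is open to the internet
# CRITICAL: every tcp port is open to the internet
```

Port 443 open to the world is expected for a public web server and passes silently. SSH from the management CIDR also passes, because its source is not 0.0.0.0/0.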

17.10 Key Takeaways

Defense in Depth

Multiple layers of security. Never rely on a single control.

Stateful vs Stateless

Know when return traffic is automatic (SG) vs explicit (NACL).

Least Privilege

Only allow what’s necessary. Block everything else.

Rule Order Matters

Specific rules first, default deny last.

Next Module

Module 18: Container Networking

Understand how containers communicate: Docker networking, Kubernetes services, and service mesh.

Interview Deep-Dive

Strong Answer: Security groups and NACLs operate at different layers and behave in fundamentally different ways, which is why AWS provides both. Security groups are stateful, instance-level firewalls attached to ENIs (Elastic Network Interfaces). NACLs are stateless, subnet-level firewalls that guard the entrance to an entire subnet. The key architectural difference: security groups evaluate all rules together and allow traffic if any rule matches (an allow-only model), while NACLs evaluate rules in numerical order and stop at the first match (ordered rules with explicit allow and deny).

In practice, security groups are your primary tool. They handle most access control needs because they are stateful (return traffic is automatic), they support security group chaining (reference other groups instead of IPs, which is essential for autoscaling), and they are instance-specific (you can have different rules per instance in the same subnet). I use NACLs as a secondary defense layer, specifically for subnet-wide deny rules, like blocking a known malicious IP range across all instances in a subnet, or adding a temporary block during an active incident.

How they interact: traffic must pass both. An inbound packet first hits the NACL (subnet border), then the security group (instance border). An outbound packet hits the security group first, then the NACL. If either one blocks the traffic, it is dropped. The critical gotcha is that NACLs are stateless: if you allow inbound HTTPS (port 443) in the NACL, you must also allow outbound ephemeral ports (1024-65535) for the responses. With security groups, this is automatic because they track connection state.

The architectural pattern I use: keep NACLs with relatively open default rules (allow common traffic, deny known-bad ranges), and do the fine-grained access control in security groups. This avoids the debugging nightmare of ephemeral port issues in NACLs while still providing the subnet-level deny capability that security groups lack.

Follow-up: “You mentioned security group chaining. What happens during an autoscaling event? Is there a window where the new instance is unreachable?”

When a new EC2 instance launches via autoscaling, it is assigned its security groups at launch time (defined in the launch template). Group membership applies as soon as the network interface carries the group; no rule has to be rewritten for the new IP. However, the instance still needs to pass health checks before the load balancer sends it traffic. If another service references web-sg as the source in its inbound rules, the new instance can connect to that service as soon as its ENI is attached, because membership in web-sg is what matters, not the IP address. This is precisely why security group chaining is superior to hardcoded IPs for dynamic environments.
Strong Answer: I would create three security groups, one per tier, and use security group chaining so that each tier can only talk to the tier directly adjacent to it. This enforces the principle of least privilege at the network level.

The ALB security group (alb-sg) allows inbound HTTPS (port 443) from 0.0.0.0/0; it is the only security group that accepts traffic from the public internet. I would also allow HTTP (port 80) and immediately redirect to HTTPS at the ALB level, or use a NACL to block port 80 if I want to be strict.

The application security group (app-sg) allows inbound traffic on the application port (say, 8080) from alb-sg only: not from 0.0.0.0/0, not from a CIDR range, but from the ALB’s security group. This means only traffic that passes through the load balancer can reach the app servers. I would also allow SSH (port 22) from a bastion-sg for maintenance, or better yet, use AWS Systems Manager Session Manager and skip SSH entirely.

The database security group (db-sg) allows inbound on port 5432 (PostgreSQL) or 3306 (MySQL) from app-sg only. The database cannot be reached directly from the internet, from the ALB, or from the bastion; only the application tier can connect. Outbound rules on db-sg can be restricted to prevent the database from initiating connections to the internet, which is a data exfiltration control.

The key design principles: traffic flows in one direction through the tiers (ALB to app to database), each security group references the previous tier’s group (not IPs), and no tier has broader access than it needs. When autoscaling adds a new app server, it automatically gets the correct access to the database because it is a member of app-sg. When we add a read replica to the database, it accepts traffic from the app servers as long as it is also placed in db-sg.

For defense in depth, I would add NACLs at the subnet level: the public subnet NACL allows 443 inbound and ephemeral ports outbound; the private app subnet NACL allows traffic from the public subnet CIDR only; the private database subnet NACL allows traffic from the app subnet CIDR only (each with matching ephemeral-port rules for return traffic, since NACLs are stateless). This provides a second layer in case someone accidentally modifies a security group.

Follow-up: “How would you handle a scenario where one of your application servers needs to call an external third-party API?”

The app server’s security group already allows all outbound traffic (the AWS security group default). The NACL outbound rules would need to allow the destination port (usually 443), and the NACL inbound rules would need to allow the ephemeral ports for the response. If I want to restrict outbound to specific IPs, I could tighten the security group’s outbound rules to allow HTTPS only to the third-party API’s IP range. For extra control, I would route the outbound traffic through a NAT gateway with a static Elastic IP, so the third party can whitelist our IP, and use VPC Flow Logs to monitor what outbound connections the app servers are making.
Strong Answer: Traditional perimeter security operates on a castle-and-moat model: everything inside the network is trusted, everything outside is untrusted. Zero Trust flips this assumption entirely. Nothing is trusted by default, regardless of where the request originates. Every request must be authenticated, authorized, and encrypted, whether it comes from the public internet or from the server sitting next to you in the same rack. The mantra is “never trust, always verify.”

The motivation is practical, not philosophical. In modern environments, the perimeter has effectively dissolved. Employees work from home on personal devices, applications span multiple cloud providers and on-premises data centers, and third-party SaaS integrations mean trusted partners hold credentials inside your network. The SolarWinds compromise disclosed in 2020 demonstrated this: the attackers arrived inside victim networks through a trusted, signed software update, so perimeter defenses were irrelevant.

In a cloud environment, I would implement Zero Trust across four layers. First, identity-based access: every service has a strong identity (not just an IP address), typically through short-lived certificates or tokens. In AWS, this means IAM roles for EC2 instances and ECS tasks, not long-lived API keys. In Kubernetes, this means service accounts with RBAC and a service mesh providing mTLS between every pod.

Second, micro-segmentation: instead of a flat internal network, each workload gets its own security boundary. In Kubernetes, this means network policies that default-deny all traffic and explicitly allow only the connections each service needs. In AWS, this means fine-grained security groups with security group chaining, plus VPC endpoints for AWS service access so traffic never traverses the public internet.

Third, continuous verification: every request is validated, not just the initial connection. This means short-lived tokens (for example, JWTs with a 15-minute expiry), mutual TLS with certificate rotation, and device posture checks (is this laptop encrypted, patched, and running endpoint detection?). Google’s BeyondCorp is the canonical implementation of this approach.

Fourth, assume breach: design every system as if an attacker already has access to some component. This means encrypting data at rest and in transit (even internal traffic), implementing comprehensive logging and anomaly detection, maintaining least-privilege access so a compromised service cannot escalate, and keeping blast radius small through isolation. If your database service is compromised, it should not be able to read secrets from the authentication service.

Follow-up: “What is the biggest practical challenge when migrating from perimeter security to Zero Trust?”

The biggest challenge is not technical; it is organizational. Zero Trust requires every team to define exactly what their service needs access to, which means someone has to map every inter-service dependency. In a large organization with hundreds of microservices, no one has a complete picture of who talks to whom. Starting with network flow logs (VPC Flow Logs, Cilium Hubble, Istio telemetry) to discover actual traffic patterns before writing deny rules is essential. Teams that skip this step and go straight to default-deny break production services they did not know existed. The migration has to be incremental: start in audit-only mode (log what would be blocked), validate with teams, then enforce.
Strong Answer:
This is a classic “it should work but does not” scenario, and the key is to eliminate layers systematically. I would work from the outside in, checking each network control point.

First, I verify the basics: is the application actually running and listening on port 8080? I would SSH into the instance (or use Session Manager) and run ss -tlnp | grep 8080 or netstat -tlnp | grep 8080. A surprising number of “network issues” turn out to be the application binding to 127.0.0.1 instead of 0.0.0.0, which means it accepts connections only from localhost, not from the network.

Second, I check the security group: not just the rule I added, but the full rule set. Is the source correct? If I allowed 10.0.1.0/24 but the client is in 10.0.2.0/24, the rule does not match. Is the protocol correct? TCP versus UDP is a common mix-up. Is the security group attached to the right instance or ENI? By default, each network interface can have up to five security groups (the quota can be raised), and the rule might be on the wrong one.

Third, I check the NACL on the instance’s subnet. Even if the security group allows port 8080 inbound, the NACL might block it. Critically, because NACLs are stateless, allowing inbound 8080 is not enough: the NACL must also allow outbound traffic to the ephemeral ports (1024-65535) that carry the response. This is the most commonly missed step.

Fourth, I check the route tables. Can the client actually route to the instance’s subnet, and is there a return route? If the instance is in a private subnet, is there a route from the client’s network to that subnet through peering, a transit gateway, or a VPN connection? (A NAT gateway will not help here: it only carries connections initiated from inside the private subnet.)

Fifth, I check the client side. Does the client’s security group allow outbound traffic on port 8080? Security groups allow all outbound traffic by default, but if someone has restricted the outbound rules, this could be the blocker. The client subnet’s NACL also needs the mirror-image rules: outbound to port 8080 and inbound from the ephemeral ports.

Sixth, I use VPC Flow Logs. If I am still stuck, I enable flow logs on the ENI and look for the specific traffic. Flow logs show whether traffic was accepted or rejected (the action field reads ACCEPT or REJECT), and at which interface.
If I see REJECT at the ENI level, it is a security group or NACL issue. If I see no log entry at all, the traffic is not reaching the instance, which points to a routing or upstream filtering issue. (Flow log records are aggregated over an interval of up to ten minutes by default, so allow time before concluding that nothing arrived.)

The tool I always reach for is VPC Reachability Analyzer (in the AWS console or CLI). It analyzes the configuration along the entire path between source and destination, without sending any packets, and tells you exactly which component is blocking traffic. It checks route tables, NACLs, security groups, and peering connections in one pass.

Follow-up: “What if VPC Flow Logs show the traffic as ACCEPT but the application still is not responding?”

If flow logs show ACCEPT, the VPC network layer is fine and the issue is at the host or application level. I would check the host-level firewall (iptables or firewalld on the instance itself). Flow logs are captured at the ENI, so a packet dropped by the host firewall is still logged as ACCEPT; cloud security groups do not replace host firewalls. Then I would check whether the application is healthy: maybe it is running but returning errors, or it is overloaded and timing out. I would run curl -v localhost:8080 from inside the instance to verify that the app responds locally, then tcpdump -i eth0 port 8080 (the interface may be named ens5 or similar on newer instance types) to see whether packets arrive at the host but never reach the application process.
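The order of the server-side checks in the walkthrough above, and why a stateless NACL needs an explicit outbound rule for ephemeral ports while a stateful security group does not, can be modeled in a few lines. This is a toy Python sketch, not how Reachability Analyzer works internally: the Rule and NaclEntry types and the first_blocker() function are hypothetical, only TCP rules with port ranges are modeled, and routing and host firewalls are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """A hypothetical TCP rule: peer CIDR plus an inclusive port range."""
    cidr: str
    port_lo: int
    port_hi: int

    def matches(self, ip: str, port: int) -> bool:
        return (self.port_lo <= port <= self.port_hi
                and ip_address(ip) in ip_network(self.cidr))


@dataclass(frozen=True)
class NaclEntry:
    rule_number: int
    rule: Rule
    allow: bool


def nacl_allows(entries: list[NaclEntry], ip: str, port: int) -> bool:
    # NACL entries are evaluated in ascending rule-number order and the first
    # match wins; if nothing matches, the implicit final rule denies.
    for entry in sorted(entries, key=lambda e: e.rule_number):
        if entry.rule.matches(ip, port):
            return entry.allow
    return False


def sg_allows(rules: list[Rule], ip: str, port: int) -> bool:
    # Security groups contain only allow rules: any match permits the traffic.
    return any(rule.matches(ip, port) for rule in rules)


def first_blocker(client_ip: str, client_port: int, server_port: int,
                  sg_inbound: list[Rule], nacl_in: list[NaclEntry],
                  nacl_out: list[NaclEntry]) -> str | None:
    """Return the first server-side control that blocks a TCP connection, or None."""
    # Inbound packets cross the subnet NACL before reaching the instance's security group.
    if not nacl_allows(nacl_in, client_ip, server_port):
        return "NACL inbound"
    if not sg_allows(sg_inbound, client_ip, server_port):
        return "security group inbound"
    # The security group is stateful, so the reply is allowed automatically.
    # The NACL is stateless: the reply goes to the client's ephemeral port
    # and needs its own outbound rule.
    if not nacl_allows(nacl_out, client_ip, client_port):
        return "NACL outbound (ephemeral ports)"
    return None


if __name__ == "__main__":
    sg = [Rule("10.0.1.0/24", 8080, 8080)]
    nacl_in = [NaclEntry(100, Rule("10.0.0.0/16", 8080, 8080), True)]
    nacl_out = [NaclEntry(100, Rule("10.0.0.0/16", 443, 443), True)]  # ephemeral range forgotten
    print(first_blocker("10.0.1.25", 51514, 8080, sg, nacl_in, nacl_out))  # NACL outbound (ephemeral ports)
    print(first_blocker("10.0.2.25", 51514, 8080, sg, nacl_in, nacl_out))  # security group inbound
```

Widening the outbound NACL entry to cover 1024-65535 makes the first call return None, which mirrors the fix for the third step. The second call fails at the security group because 10.0.2.25 falls outside 10.0.1.0/24, the source-CIDR mismatch from the second step.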