
Chapter 3: Networking Fundamentals - The VPC

In most cloud platforms, a network is a regional construct. In Google Cloud, the VPC (Virtual Private Cloud) is a global resource. This single architectural difference changes how you design IP ranges, how services in different regions talk to each other, and how you extend your network back to on‑premises. This chapter assumes no prior networking expertise. We will start from basic IP and CIDR concepts, then layer on VPC design, firewalling, Shared VPC, hybrid connectivity, and SRE‑grade patterns.

1. From Scratch: IP, Subnets, and CIDR

Before talking about VPCs, we need a solid mental model for IP addressing.

1.1 What is an IP Address?

An IP address is like a phone number for a machine:
  • IPv4 addresses look like 10.0.1.25 (four numbers between 0 and 255)
  • Each address has two parts:
    • Network part – which “street” or subnet
    • Host part – which “house” on that street
In cloud networking you almost always work with private IP ranges, not public Internet addresses.

1.2 Private vs Public IPs

Private IP ranges (RFC1918) are:
  • 10.0.0.0/8 (10.x.x.x)
  • 172.16.0.0/12 (172.16.x.x – 172.31.x.x)
  • 192.168.0.0/16 (192.168.x.x)
These addresses:
  • Are not routable on the public Internet
  • Are reused by many organizations
  • Are ideal for internal service‑to‑service communication
Public IPs are globally unique addresses routable on the Internet. In GCP, public IPs are typically attached to:
  • VM instances
  • Load balancers
  • Certain managed services

1.3 CIDR Notation: 10.0.1.0/24

CIDR (Classless Inter-Domain Routing) expresses a network as:
  • Base address: 10.0.1.0
  • Prefix length: /24 (how many bits are the network)
A /24 means:
  • 24 bits for network, 8 bits for hosts
  • Total addresses: 2^(32-24) = 256
  • Usable addresses: 254 in classic networking (network and broadcast addresses reserved); note that GCP reserves four addresses in each subnet’s primary range, leaving 252 usable in a /24
Common CIDR sizes:
CIDR    Hosts (approx)    Typical Use
/32     1                 Single host or interface
/30     4                 Point‑to‑point links
/24     256               Small subnet (one app tier)
/20     4,096             Region‑level subnet
/16     65,536            Large shared address space
Rule of thumb: start slightly larger (/20 instead of /24) to avoid running out of addresses when your app succeeds.
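If you want to sanity-check these numbers yourself, the arithmetic is easy to reproduce from a shell (this assumes bash and Python 3 on your workstation; it is just a quick check, not part of any GCP setup):
# Host bits = 32 - prefix length; total addresses = 2^(host bits)
echo $((2 ** (32 - 24)))    # 256 addresses in a /24
echo $((2 ** (32 - 20)))    # 4096 addresses in a /20
# The Python standard library gives the same answer plus the netmask
python3 -c "import ipaddress; n = ipaddress.ip_network('10.0.1.0/24'); print(n.num_addresses, n.netmask)"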

2. Global VPCs and Regional Subnets

2.1 The VPC (Global)

A VPC network in GCP is:
  • A global logical network spanning all regions
  • Your private IP address space, routes, and firewall rules
  • Fully managed by Andromeda, Google’s SDN fabric
When you create a VPC, you are not choosing a region. You are defining a global container that can have subnets in many regions.

2.2 Subnets (Regional)

Inside a global VPC, you create subnets, each tied to a specific region:
VPC: prod-vpc (global)
├── Subnet: prod-us-central1 (10.0.0.0/20)
├── Subnet: prod-europe-west1 (10.0.16.0/20)
└── Subnet: prod-asia-southeast1 (10.0.32.0/20)
Key properties:
  • Subnets are regional, not zonal
  • Subnets define which IP range is available in that region
  • VM instances get an IP address from the subnet in their region
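As a sketch of how the topology above could be created with gcloud (the VPC and subnet names mirror the diagram; the exact ranges and regions are assumptions):
# Create the global VPC; note that no region is specified here
gcloud compute networks create prod-vpc --subnet-mode=custom

# Create one regional subnet per region, each carved out of 10.0.0.0/16
gcloud compute networks subnets create prod-us-central1 \
    --network=prod-vpc --region=us-central1 --range=10.0.0.0/20

gcloud compute networks subnets create prod-europe-west1 \
    --network=prod-vpc --region=europe-west1 --range=10.0.16.0/20

gcloud compute networks subnets create prod-asia-southeast1 \
    --network=prod-vpc --region=asia-southeast1 --range=10.0.32.0/20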

2.3 Growing Without Downtime: Subnet Expansion

One powerful GCP feature: you can expand a subnet’s IP range without downtime, as long as:
  • The new range contains the original range (you can only grow the prefix, e.g. /24 → /20)
  • The new range does not overlap with other subnets in the VPC or in peered networks
Example:
  • Start with 10.0.1.0/24 (256 addresses)
  • Later traffic grows; you expand to 10.0.0.0/20 (4096 addresses)
  • Existing VMs keep their IPs; new VMs get addresses in the bigger range
This is a huge improvement over many on‑prem networks where expanding a subnet might involve maintenance windows and IP renumbering.
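The expansion itself is a single command; a minimal sketch, assuming a subnet named private-app-subnet in us-east1 (the same example subnet used in the next section):
# Grow the subnet from /24 to /20 in place; existing VMs keep their IPs
gcloud compute networks subnets expand-ip-range private-app-subnet \
    --region=us-east1 \
    --prefix-length=20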

2.4 Private Google Access

Private Google Access allows VMs without public IPs to reach Google APIs and services (Cloud Storage, BigQuery, etc.) over Google’s private backbone:
  • Enabled per subnet
  • Prevents the need for public IPs on internal workloads
  • Common in regulated environments
gcloud compute networks subnets create private-app-subnet \ 
    --network=prod-vpc \ 
    --region=us-east1 \ 
    --range=10.0.1.0/24 \ 
    --enable-private-ip-google-access

3. Routing Inside a VPC

Every VPC has a routing table that tells Andromeda where to send packets.

3.1 System Routes

GCP automatically creates system routes such as:
  • Subnet routes: one for each subnet, e.g. 10.0.1.0/24 via subnet-us-east1
  • Default Internet route: 0.0.0.0/0 via the Internet gateway (for resources with public IP)
You rarely need to think about these for basic deployments.
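If you do want to see them, the routing table is easy to inspect (prod-vpc is the example network used throughout this chapter):
# List all routes associated with prod-vpc, including automatic subnet and default routes
gcloud compute routes list --filter="network:prod-vpc"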

3.2 Custom Static Routes

For more advanced designs, you add custom static routes, e.g.:
  • Send 10.20.0.0/16 to an on‑prem VPN
  • Send 192.168.100.0/24 to an appliance VM
gcloud compute routes create onprem-route \ 
    --network=prod-vpc \ 
    --destination-range=10.20.0.0/16 \ 
    --next-hop-vpn-tunnel=onprem-vpn-tunnel \ 
    --next-hop-vpn-tunnel-region=us-east1

3.3 Longest Prefix Match (LPM)

When multiple routes match a packet:
  1. The most specific route (largest prefix, e.g. /24 vs /16) wins
  2. If equal, other preferences (such as priority) apply
Example:
  • Route 1: 10.0.0.0/16 → on‑prem
  • Route 2: 10.0.1.0/24 → internal service
Traffic to 10.0.1.10 uses Route 2 because /24 is more specific than /16.

4. Firewall Rules: The Real Guardrails

In GCP, the primary security control is the VPC firewall, not an appliance.

4.1 Firewall Basics

Each VPC has stateful firewall rules that apply to traffic:
  • Ingress rules – control incoming traffic to a VM
  • Egress rules – control outbound traffic from a VM
Every rule has:
  • Direction: ingress / egress
  • Action: allow / deny (the implied defaults are deny all ingress and allow all egress)
  • Priority (lower number = evaluated first)
  • Match criteria (source/destination ranges, protocols, ports, targets)

4.2 Targeting by Tags vs Service Accounts

You can attach firewall rules to instances by:
  1. Network tags (string labels on VMs)
  2. Service accounts (identity‑based targeting)
Tags example:
gcloud compute firewall-rules create allow-web-ingress \ 
    --network=prod-vpc \ 
    --direction=INGRESS \ 
    --priority=1000 \ 
    --action=ALLOW \ 
    --rules=tcp:80,tcp:443 \ 
    --source-ranges=0.0.0.0/0 \ 
    --target-tags=web-server
Service account–based example:
gcloud compute firewall-rules create allow-web-sa \ 
    --network=prod-vpc \ 
    --direction=INGRESS \ 
    --priority=1000 \ 
    --action=ALLOW \ 
    --rules=tcp:80,tcp:443 \ 
    --source-ranges=0.0.0.0/0 \ 
    --target-service-accounts=web-server-sa@PROJECT_ID.iam.gserviceaccount.com
SRE Tip: Prefer service accounts over tags for critical services. Tags are easy to mis‑apply; service accounts tie security to identity.

4.3 Common Firewall Patterns

  • Deny by default: Use default deny, then open only required ports
  • Tiered access: web tier accessible from Internet; app tier only from web tier; db tier only from app tier
  • SSH/Jumphost: Only a bastion host accepts SSH from the Internet; all other VMs accept SSH only from the bastion’s internal IP
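As a concrete illustration of the bastion pattern, here is a sketch of the two rules involved (the tags and the bastion’s internal IP are assumptions, not values from this chapter’s lab):
# Only the bastion accepts SSH from the Internet
gcloud compute firewall-rules create allow-ssh-to-bastion \
    --network=prod-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=0.0.0.0/0 \
    --target-tags=bastion

# Every other VM accepts SSH only from the bastion's internal address
gcloud compute firewall-rules create allow-ssh-from-bastion \
    --network=prod-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=10.0.0.5/32 \
    --target-tags=internal-vm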

5. Enterprise Connectivity: Shared VPC vs VPC Peering

As your company grows, you’ll have many projects and teams. You need a way to connect them without creating a mess of ad‑hoc tunnels.

5.1 Shared VPC: The Enterprise Standard

Shared VPC is the architectural “hub-and-spoke” model for Google Cloud. It lets a central network team own and manage the network while application teams consume it from their own projects.

The Relationship

  • Host Project: The “Owner” of the network. It contains the VPC, subnets, firewall rules, and hybrid connectivity (VPN/Interconnect).
  • Service Projects: Attached to the Host Project. They “borrow” subnets from the host to deploy VMs, GKE clusters, or Cloud SQL instances.

Administrative Delegation (Least Privilege)

A Principal Engineer uses IAM roles to maintain separation of duties:
  1. Compute Network Admin: In the Host Project. Can modify the network itself.
  2. Compute Network User: Granted to service project users/service accounts on a per-subnet basis. This allows them to use the subnet without being able to change firewall rules or IP ranges.
SRE Tip: Use Shared VPC to ensure all egress traffic from multiple projects flows through a single set of Cloud NATs or Firewall Appliances for centralized inspection.
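A minimal sketch of the Shared VPC wiring with gcloud (the project IDs, subnet name, and service account are placeholders):
# In the host project: enable Shared VPC hosting
gcloud compute shared-vpc enable HOST_PROJECT_ID

# Attach a service project to the host project
gcloud compute shared-vpc associated-projects add SERVICE_PROJECT_ID \
    --host-project=HOST_PROJECT_ID

# Grant Compute Network User on a single subnet (least privilege)
gcloud compute networks subnets add-iam-policy-binding prod-us-central1 \
    --project=HOST_PROJECT_ID \
    --region=us-central1 \
    --member="serviceAccount:app-team@SERVICE_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/compute.networkUser"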

5.2 VPC Network Peering (Decentralized)

Concept:
  • Two independent VPCs establish a peering relationship.
  • They exchange routes and can reach each other using private IPs.
Important constraints:
  • Not transitive: if A peers with B, and B peers with C, A cannot reach C unless A–C peering exists.
  • Peering has quotas (e.g., 25 peerings per VPC).
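Peering is configured symmetrically; a sketch for one side (the VPC names and peer project ID are placeholders, and project-b must create the mirror-image peering):
# In project-a: peer vpc-a with vpc-b
gcloud compute networks peerings create a-to-b \
    --network=vpc-a \
    --peer-project=project-b \
    --peer-network=vpc-b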

5.3 Choosing Between Shared VPC and Peering

Shared VPC:
  • Centralized networking team.
  • Strong need for governance and control.
  • Many small projects requiring consistent network policy.
Peering:
  • Separate organizations or business units.
  • Need to connect networks without merging ownership.
  • Limited number of connections.

6. Private Service Connect (PSC): The Modern Bridge

While Shared VPC and Peering connect networks, Private Service Connect (PSC) connects services. It is the evolution beyond VPC Peering for service consumption.

6.1 The Problem with VPC Peering

VPC Peering has significant limitations in large enterprises:
  • IP Planning: Both networks must have non-overlapping IP ranges, which eats into scarce RFC 1918 address space.
  • Transitivity: Peering is not transitive (A -> B -> C doesn’t mean A -> C).
  • Security: Peering exposes the entire network to the peer.

6.2 How PSC Works: Endpoint-Based Consumption

PSC allows you to reach a service (like a Google API or a third-party managed database) using a Private IP address in your own subnet.
  1. Service Producer: Publishes a service via a Service Attachment.
  2. Service Consumer: Creates a PSC Endpoint (a private IP) in their VPC.
  3. The Magic: Andromeda maps that IP directly to the producer’s load balancer. Overlapping IP ranges between consumer and producer are not a problem, because traffic is NATed at the PSC boundary.

6.3 PSC for Google APIs

Instead of using Private Google Access, you can create a specific PSC Endpoint for Google APIs (e.g., 10.0.0.100 maps to storage.googleapis.com).
  • Control: You can apply firewall rules to the PSC endpoint IP.
  • On-Prem Access: Your on-prem servers can reach Google APIs by simply routing to that private IP over VPN/Interconnect.
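A sketch of such an endpoint (the reserved address 10.0.0.100 mirrors the example above; the endpoint and address names are assumptions):
# Reserve an internal IP address for the endpoint
gcloud compute addresses create psc-google-apis \
    --global \
    --purpose=PRIVATE_SERVICE_CONNECT \
    --addresses=10.0.0.100 \
    --network=prod-vpc

# Create the endpoint; 'all-apis' exposes the supported Google API bundle at that IP
gcloud compute forwarding-rules create pscgoogleapis \
    --global \
    --network=prod-vpc \
    --address=psc-google-apis \
    --target-google-apis-bundle=all-apis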

7. Hybrid Connectivity: Bridging On‑Prem and Cloud

Most real‑world deployments are hybrid: some systems remain on‑premises, others move to GCP.

7.1 Cloud VPN (HA VPN)

Cloud VPN securely connects your on‑premises network to your VPC over the Internet.
  • Speed: Typically 1.5–3 Gbps per tunnel
  • Setup time: Minutes
  • Availability: 99.99% SLA when using HA VPN (two tunnels, one per gateway interface, with BGP routing)
Use cases:
  • Quick connectivity for POCs and small workloads
  • Backup path for Interconnect
  • Cost‑effective connectivity for moderate bandwidth
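Creating the HA VPN gateway itself is a one-line operation; the tunnels, peer gateway, and BGP sessions are configured separately (a sketch with assumed names):
# HA VPN gateway: Google allocates two interfaces, each with its own public IP
gcloud compute vpn-gateways create onprem-ha-vpn \
    --network=prod-vpc \
    --region=us-east1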

7.2 Cloud Interconnect (The Fast Lane)

Cloud Interconnect provides private, dedicated connections:
  • Dedicated Interconnect:
    • Physical fiber connection to Google edge
    • Capacities: 10 Gbps, 100 Gbps ports
    • Used by large enterprises and latency‑sensitive workloads
  • Partner Interconnect:
    • Connect to Google via a service provider (e.g., Equinix, Megaport)
    • Capacities from 50 Mbps up to 10 Gbps
    • Faster to provision than Dedicated Interconnect

7.3 Cloud Router & BGP Deep Dive

Cloud Router is a control‑plane service that:
  • Exchanges routes using BGP with your on‑prem router
  • Does not forward data packets (Andromeda does that)
  • Automatically updates routes when networks change
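A sketch of a Cloud Router with a private ASN plus one BGP peer (the ASNs, link-local addresses, and the router interface if-tunnel-0 are assumptions; the interface must already exist on the router):
# Cloud Router with a private ASN for the GCP side of the session
gcloud compute routers create onprem-router \
    --network=prod-vpc \
    --region=us-east1 \
    --asn=65001

# BGP session towards the on-prem router over an existing router interface
gcloud compute routers add-bgp-peer onprem-router \
    --region=us-east1 \
    --peer-name=onprem-peer-0 \
    --interface=if-tunnel-0 \
    --peer-ip-address=169.254.0.2 \
    --peer-asn=65002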

BGP Communities (Traffic Scope)

You can use BGP communities to influence how Google advertises your routes:
  • 15169:10001 – Advertise to local region only
  • 15169:10002 – Advertise to local continent
  • 15169:10003 – Advertise globally (default)

Route Selection (Simplified)

When choosing between multiple paths:
  1. Longest Prefix Match – most specific prefix wins
  2. Priority/Cost – lower numeric priority value wins
Example priorities:
  • VPN: priority 1000
  • Interconnect: priority 100
With both VPN and Interconnect for 10.20.0.0/16, Interconnect is preferred because 100 < 1000. VPN can act as automatic backup if the Interconnect fails.

8. Cloud NAT: Secure Outbound Access

If your VMs do not have public IP addresses, how do they download software updates or reach external APIs? The answer is Cloud NAT.

8.1 What Cloud NAT Does

  • Enables outbound Internet access from private IPs
  • Does not allow unsolicited inbound connections
  • Scales automatically; no single VM acts as a choke‑point

8.2 Cloud NAT Configuration Pattern

# 1. Create a custom VPC
gcloud compute networks create prod-vpc --subnet-mode=custom

# 2. Create a private subnet with Private Google Access
gcloud compute networks subnets create private-app-subnet \ 
    --network=prod-vpc \ 
    --region=us-east1 \ 
    --range=10.0.1.0/24 \ 
    --enable-private-ip-google-access

# 3. Create a Cloud Router
gcloud compute routers create nat-router \ 
    --network=prod-vpc \ 
    --region=us-east1

# 4. Attach Cloud NAT configuration to the router
gcloud compute routers nats create nat-config \ 
    --router=nat-router \ 
    --region=us-east1 \ 
    --auto-allocate-nat-external-ips \ 
    --nat-all-subnet-ip-ranges
Result:
  • VMs in private-app-subnet have no public IP addresses
  • They can still reach package repositories, external APIs, etc.
  • Inbound Internet traffic is blocked by design

9. Lab: Architecting a Multi‑Tier Secure Network

In this lab, you will build a production‑grade network topology using:
  • Custom VPC
  • Private subnets
  • Cloud NAT
  • Firewall rules for tier isolation

9.1 Architecture Overview

We will create:
  • VPC: prod-vpc
  • Subnets:
    • web-subnet (public‑facing, limited public IPs)
    • app-subnet (private)
    • db-subnet (private, most restricted)
  • Cloud NAT for app-subnet and db-subnet
  • Firewall rules enforcing web → app → db traffic flow

9.2 Implementation Steps

# Create VPC
gcloud compute networks create prod-vpc --subnet-mode=custom

# Create subnets
gcloud compute networks subnets create web-subnet \ 
    --network=prod-vpc \ 
    --region=us-east1 \ 
    --range=10.0.0.0/24

gcloud compute networks subnets create app-subnet \ 
    --network=prod-vpc \ 
    --region=us-east1 \ 
    --range=10.0.1.0/24 \ 
    --enable-private-ip-google-access

gcloud compute networks subnets create db-subnet \ 
    --network=prod-vpc \ 
    --region=us-east1 \ 
    --range=10.0.2.0/24 \ 
    --enable-private-ip-google-access

# Create Cloud Router and NAT
gcloud compute routers create nat-router \ 
    --network=prod-vpc \ 
    --region=us-east1

gcloud compute routers nats create nat-config \ 
    --router=nat-router \ 
    --region=us-east1 \ 
    --auto-allocate-nat-external-ips \ 
    --nat-all-subnet-ip-ranges

# Firewall: allow internal traffic within VPC
gcloud compute firewall-rules create allow-internal \ 
    --network=prod-vpc \ 
    --allow=tcp,udp,icmp \ 
    --source-ranges=10.0.0.0/16

# Firewall: allow HTTP/HTTPS to web tier from Internet
gcloud compute firewall-rules create allow-web-from-internet \ 
    --network=prod-vpc \ 
    --direction=INGRESS \ 
    --priority=1000 \ 
    --action=ALLOW \ 
    --rules=tcp:80,tcp:443 \ 
    --source-ranges=0.0.0.0/0 \ 
    --target-tags=web-tier

# Firewall: only allow app tier from web subnet
gcloud compute firewall-rules create allow-web-to-app \ 
    --network=prod-vpc \ 
    --direction=INGRESS \ 
    --priority=1000 \ 
    --action=ALLOW \ 
    --rules=tcp:8080 \ 
    --source-ranges=10.0.0.0/24 \ 
    --target-tags=app-tier

# Firewall: only allow db tier from app subnet
gcloud compute firewall-rules create allow-app-to-db \ 
    --network=prod-vpc \ 
    --direction=INGRESS \ 
    --priority=1000 \ 
    --action=ALLOW \ 
    --rules=tcp:5432 \ 
    --source-ranges=10.0.1.0/24 \ 
    --target-tags=db-tier

9.3 Verification

  1. Deploy simple VMs in each subnet (using appropriate network tags)
  2. Verify:
    • Internet → web: HTTP/HTTPS allowed
    • web → app: port 8080 allowed
    • app → db: port 5432 allowed
    • db → Internet: outbound allowed via NAT (e.g. OS updates)
    • Internet → db: blocked

10. Security at the Edge: Cloud Armor and VPC-SC

While firewalls protect the VM, Cloud Armor and VPC Service Controls protect the perimeter.

10.1 Cloud Armor (The WAF)

Cloud Armor is Google’s edge DDoS (Distributed Denial of Service) protection and Web Application Firewall (WAF), applied to backend services behind external load balancers.
  • L7 Protection: Blocks SQL Injection (SQLi), Cross-Site Scripting (XSS), and other OWASP Top 10 risks.
  • Adaptive Protection: Uses machine learning to detect anomalous traffic patterns and suggest firewall rules.
  • Pre-configured Rules: Ready-to-use rules for common application stacks (WordPress, Drupal, etc.).
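A sketch of a Cloud Armor policy that enables one of the pre-configured WAF rules and attaches it to a backend service (the policy name, rule priority, and backend service name are assumptions):
# Create a security policy and block requests matching the XSS signature set
gcloud compute security-policies create edge-policy \
    --description="Baseline WAF policy"

gcloud compute security-policies rules create 1000 \
    --security-policy=edge-policy \
    --expression="evaluatePreconfiguredExpr('xss-stable')" \
    --action=deny-403

# Attach the policy to the backend service behind the external HTTP(S) load balancer
gcloud compute backend-services update web-backend \
    --security-policy=edge-policy \
    --global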

10.2 VPC Service Controls (VPC-SC) Deep Dive

VPC Service Controls (VPC-SC) let you draw a virtual security perimeter around sensitive services (Cloud Storage, BigQuery, etc.). This is a critical enterprise security feature that prevents data exfiltration.

10.2.1 How it Works: The Perimeter

When a service is protected by a perimeter:
  1. Identity is not enough: Even if an attacker has your credentials, they cannot access the data if the request comes from outside the perimeter.
  2. Data Exfiltration Protection: A compromised VM inside the perimeter cannot copy data to a Cloud Storage bucket outside the perimeter.
  3. Context-Aware Access: You can allow access from specific IP ranges or only from devices that meet security requirements (e.g., encrypted, screen-locked) via Access Context Manager.

10.2.2 Perimeter Types

  • Service Perimeter: The standard boundary.
  • Perimeter Bridge: Allows services in different perimeters to communicate (e.g., Project A in Perimeter 1 needs to read a BigQuery table in Project B in Perimeter 2).
  • Dry-Run Mode: Essential for production. It logs what would have been blocked without actually stopping traffic, allowing you to debug policies before enforcement.

10.2.3 Service Perimeter Configuration (Terraform Example)

resource "google_access_context_manager_service_perimeter" "service-perimeter" {
  parent = "accessPolicies/${var.policy_id}"
  name   = "accessPolicies/${var.policy_id}/servicePerimeters/restrict_storage"
  title  = "restrict_storage"
  status {
    restricted_services = ["storage.googleapis.com"]
    resources           = ["projects/${var.project_number}"]
    
    access_levels = [
      google_access_context_manager_access_level.corp_ips.name
    ]

    vpc_accessible_services {
      enable_restriction = true
      allowed_services   = ["RESTRICTED-SERVICES"]
    }
  }
}

11. Advanced Hybrid Connectivity: BGP and Routing Math

11.1 BGP Community Values

GCP uses BGP communities to give you control over how your routes are prioritized and where they are advertised.
Community       Meaning
15169:10001     Advertise to Local Region only.
15169:10002     Advertise to Local Continent.
15169:10003     Advertise Globally (Default).

11.2 Influencing Inbound Traffic (MED and AS-Path)

When you have two Interconnects (Primary and Backup), how do you ensure Google sends traffic to the Primary?
  1. MED (Multi-Exit Discriminator): A lower MED value is preferred. Set Primary to 100 and Backup to 200 (see the sketch after this list).
  2. AS-Path Prepending: Make the backup path look longer by repeating your AS number multiple times.
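On the GCP side, the MED that Cloud Router advertises is derived from the advertised route priority on each BGP peer; a sketch (the router and peer names are assumptions):
# Primary Interconnect peer: lower value, preferred by the on-prem router
gcloud compute routers update-bgp-peer interconnect-router \
    --region=us-east1 \
    --peer-name=primary-peer \
    --advertised-route-priority=100

# Backup peer: higher value, used only if the primary path fails
gcloud compute routers update-bgp-peer interconnect-router \
    --region=us-east1 \
    --peer-name=backup-peer \
    --advertised-route-priority=200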

11.3 Cloud Router Route Priority Math

Every route learned or advertised by Cloud Router carries a priority between 0 and 65535; when global dynamic routing is enabled, an inter-region cost is added on top of the priority you configure.
  • Formula: Effective Priority = Configured Priority + Inter-Region Cost
  • Rule: Lower effective priority wins.
  • SRE Tip: Always set your VPN priority higher (e.g., 1000) than your Interconnect priority (e.g., 100) to ensure the VPN is only used as a failover.

12. Network Intelligence Center: The SRE’s Radar

12.1 Connectivity Tests

A static analysis tool that tells you why a packet is being dropped. It simulates the packet path through firewalls, routes, and peering.
  • Scenario: “Why can’t VM A talk to Cloud SQL?”
  • Tool: Runs a trace and identifies that Firewall Rule 105 is blocking ingress.
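The same scenario can be expressed with gcloud (the project, zone, instance, and destination IP are placeholders):
# Trace the path from a VM to a Cloud SQL private IP on the PostgreSQL port
gcloud network-management connectivity-tests create vm-to-cloudsql \
    --source-instance=projects/PROJECT_ID/zones/us-east1-b/instances/app-vm \
    --destination-ip-address=10.0.2.5 \
    --destination-port=5432 \
    --protocol=TCP

# Inspect the verdict (reachable or dropped, and which rule/route is responsible)
gcloud network-management connectivity-tests describe vm-to-cloudsql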

12.2 Performance Dashboard

Provides real-time and historical latency/packet loss metrics for:
  • Google-to-Google (Inter-region)
  • Google-to-Internet
  • Google-to-On-prem (via Interconnect)

12.3 Firewall Insights

Identifies shadowed rules (rules that never get hit because a higher priority rule matches first) and overly permissive rules.

13. Private Service Connect (PSC) Deep Dive

PSC is the successor to VPC Peering for service consumption.

13.1 Service Producer vs Consumer

  • Producer: A service provider (e.g., Snowflake, or your own internal Shared Services team) creates a Service Attachment.
  • Consumer: An application team creates a PSC Endpoint (a private IP) in their own VPC.

13.2 Why PSC is Better than Peering

  1. No IP Overlaps: Both networks can use 10.0.0.0/24. PSC uses NAT to translate the traffic.
  2. Uni-directional: The producer cannot initiate traffic into the consumer’s network.
  3. Transitive reachability: Unlike routes learned over peering, PSC endpoints can be reached from on‑prem networks connected via VPN or Interconnect.
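A producer-side sketch: publishing an existing internal load balancer as a Service Attachment (the VPC, forwarding rule, NAT subnet range, and names are assumptions):
# A dedicated NAT subnet is required for PSC producer-side translation
gcloud compute networks subnets create psc-nat-subnet \
    --network=producer-vpc \
    --region=us-east1 \
    --range=192.168.200.0/24 \
    --purpose=PRIVATE_SERVICE_CONNECT

# Publish the internal load balancer's forwarding rule as a service attachment
gcloud compute service-attachments create my-service \
    --region=us-east1 \
    --producer-forwarding-rule=internal-lb-rule \
    --connection-preference=ACCEPT_AUTOMATIC \
    --nat-subnets=psc-nat-subnet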

14. Lab: Implementing a Hub-and-Spoke with Central Firewall

14.1 Architecture

  • Hub Project: Contains a Central Firewall VM (Palo Alto/Fortinet) and the Hybrid Connectivity.
  • Spoke Projects: Attached to the Hub via Shared VPC.
  • Routing: All traffic from Spokes to the Internet or On-prem must be routed through the Hub’s Firewall VM.

14.2 Implementation (Route-Based Redirection)

# In the Hub Project, create a route for 0.0.0.0/0 pointing to the Firewall VM's internal IP
gcloud compute routes create centralized-egress \
    --network=hub-vpc \
    --destination-range=0.0.0.0/0 \
    --next-hop-instance=firewall-vm \
    --next-hop-instance-zone=us-central1-a \
    --priority=900

15. Interview Preparation

Question 1: How is a GCP VPC different from VPCs in other cloud platforms?
Answer: The primary difference is scope. In GCP, a VPC is a global resource, not regional.
  • Global Reach: A single VPC can span all Google regions worldwide.
  • Subnets: Subnets are regional. You can have a subnet in us-east1 and another in asia-east1 within the same VPC.
  • Internal Routing: VMs in different regions can communicate using internal IP addresses over Google’s private backbone (B4) without needing VPNs or peering.
  • Simplicity: This simplifies global application architecture, as you don’t need to manage complex transit gateways or multiple peerings for global connectivity.
Question 2: When would you choose Shared VPC over VPC Peering, and vice versa?
Answer:
  • Shared VPC: Choose this for centralized control. It allows a central host project to manage the network (IPs, firewalls) while service projects deploy resources into subnets. Best for large enterprises with a dedicated network team.
  • VPC Peering: Choose this for decentralized or cross-org connectivity. It connects two independent VPCs (even in different organizations). Best for connecting third-party services or when teams need full autonomy over their own network settings.
Key Difference: Shared VPC follows a “Hub-and-Spoke” model with administrative delegation, whereas Peering is a peer-to-peer relationship between equals.
Question 3: What is Private Google Access and when would you use it?
Answer: Private Google Access allows VMs that only have internal IP addresses to reach the public APIs of Google services (like Cloud Storage, BigQuery, or Pub/Sub).
  • Mechanism: It routes traffic over Google’s internal network to the service endpoints rather than exiting to the public internet.
  • Security: It allows you to keep your VMs completely isolated from the internet (no public IPs) while still being able to use managed cloud services.
  • Requirement: It must be enabled at the subnet level.
Question 4: How does GCP enforce firewall rules without a central firewall appliance?
Answer: Andromeda implements firewall rules at the vNIC (Virtual Network Interface Card) level of each VM.
  • Distributed Enforcement: Rules are enforced on the host machine where the VM runs, before the packet even hits the physical wire.
  • No Bottlenecks: Because enforcement is distributed across all hosts, there is no central “choke point” or firewall appliance that can become a performance bottleneck.
  • Identity-Aware: VPC firewall rules can use Network Tags or Service Accounts rather than just IP ranges, making security policies dynamic and application-centric.
Question 5: Does traffic between two VMs in different regions of the same VPC traverse the public Internet?
Answer: No, it does not use the public internet.
  1. The packet is identified as internal VPC traffic by the Andromeda SDN.
  2. Andromeda encapsulates the packet and routes it over Google’s B4 global backbone (private fiber).
  3. The packet travels directly between data centers.
  4. It is decapsulated at the destination host and delivered to the target VM.
Interview Insight: This provides lower latency, higher security, and better predictable performance compared to routing over the public internet.