Chapter 10: Kubernetes at Scale - Google Kubernetes Engine (GKE)

Google is the birthplace of Kubernetes. Born from Borg, Google’s internal container orchestrator, Kubernetes was donated to the CNCF and became the de facto industry standard. Google Kubernetes Engine (GKE) is among the most mature, automated, and deeply integrated managed Kubernetes services available.

1. GKE Architecture: The Foundation

Control Plane: Zonal vs. Regional

  • Zonal Clusters: A single control plane in one zone. If the zone goes down, the control plane is inaccessible (though your nodes keep running). SLA: 99.5%.
  • Regional Clusters (Production Standard): Three control-plane replicas distributed across three zones in a region. This keeps your API server available even during a zonal outage or Google-initiated upgrades (see the example below). SLA: 99.95%.
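
A minimal sketch of creating a regional cluster (name and region are illustrative):

# --num-nodes is per zone: 1 node x 3 zones = 3 nodes total.
gcloud container clusters create prod-cluster \
    --region us-central1 \
    --num-nodes 1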

Release Channels

GKE offers three release channels to balance stability and features:
  • Rapid: For early adopters and testing.
  • Regular (Default): A balance of stability and new features.
  • Stable: For mission-critical production workloads.
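
An existing cluster can be enrolled in a channel at any time (cluster name illustrative):

# Move the cluster to the Stable channel; GKE then schedules upgrades for you.
gcloud container clusters update prod-cluster \
    --region us-central1 \
    --release-channel stable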

2. Operation Modes: Autopilot vs. Standard

Choosing between Autopilot and Standard is a choice between Operational Simplicity and Total Control.

2.1 GKE Autopilot (The SRE’s Dream)

In Autopilot mode, Google manages the entire cluster infrastructure, including node provisioning, scaling, and security hardening.
  • The “Pod-Only” Contract: You define your pods; Google ensures they have a place to run.
  • Security by Default: Enforces the GKE Hardening Guide (e.g., no privileged containers, mandatory NET_RAW removal).
  • Billing: You are billed per-pod (CPU, RAM, and ephemeral storage requested). You pay for what you use, not the idle space on nodes.
  • Ideal For: Teams that want to focus on code rather than Kubernetes cluster maintenance.
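
Creating an Autopilot cluster is a single command, since there are no node pools to configure (name and region illustrative):

# Autopilot clusters are always regional.
gcloud container clusters create-auto autopilot-cluster \
    --region us-central1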

2.2 GKE Standard (The Architect’s Choice)

In Standard mode, you manage the node pools (GCE Managed Instance Groups).
  • Full Control: You can customize kernel parameters, use privileged containers, and install custom drivers.
  • Hardware Flexibility: Required for GPUs, TPUs, Local SSDs, or Sole-Tenant nodes (see the GPU node-pool sketch after this list).
  • Bin-Packing Efficiency: If you are an expert at optimizing pod density, you can often achieve lower costs than Autopilot by manually managing large node pools.
  • Billing: You pay for the underlying Compute Engine VMs.
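
As a sketch of that hardware flexibility, here is how a GPU node pool could be added to a Standard cluster (cluster, pool, and accelerator choices are illustrative):

# Attach a T4 GPU node pool to an existing Standard cluster.
gcloud container node-pools create gpu-pool \
    --cluster prod-cluster \
    --region us-central1 \
    --machine-type n1-standard-4 \
    --accelerator type=nvidia-tesla-t4,count=1 \
    --num-nodes 1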

2.3 Decision Matrix: Principal’s Guide

Requirement                Autopilot                      Standard
Operational Overhead       Low (Google manages nodes)     High (You manage node pools)
Custom Kernel Modules      No                             Yes
GPU / Machine Learning     Yes (Select regions)           Yes (Full control)
Windows Containers         No                             Yes
Privileged Containers      No (Security risk)             Yes
Cost Model                 Pay-per-Pod (Predictable)      Pay-per-Node (Optimization needed)

3. GKE Networking: Andromeda, PSC, and Multi-Cluster

3.1 VPC-Native Clusters (Alias IP)

Modern GKE clusters use VPC-Native networking. This is the foundation for all high-performance Kubernetes networking in GCP.
  • The Alias IP Mechanism: Every pod is assigned an IP from a secondary range in the VPC subnet. Unlike overlay networks (flannel, calico-vxlan), there is no packet encapsulation overhead.
  • Andromeda Integration: The VPC-Native pod IPs are “known” to the underlying Andromeda SDN. This allows for direct routing at the hardware level, bypassing the host kernel’s bridge for most traffic.
  • Connectivity: Pods can reach any other VPC resource (Cloud SQL, VMs) without NAT.
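
A sketch of creating a VPC-Native cluster explicitly, assuming a subnet my-subnet with secondary ranges named pods and services already exists:

gcloud container clusters create vpc-native-cluster \
    --region us-central1 \
    --enable-ip-alias \
    --subnetwork my-subnet \
    --cluster-secondary-range-name pods \
    --services-secondary-range-name services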

3.2 Private Service Connect (PSC) for GKE

PSC allows you to expose GKE services to other VPCs or projects privately, without VPC Peering or VPNs.
  • Service Attachments: You create a Service Attachment in the GKE project.
  • Endpoints: Consuming projects create a PSC Endpoint (an internal IP) that routes traffic directly to your GKE Internal Load Balancer.
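
On the producer side, the attachment is created against the internal load balancer’s forwarding rule. A sketch, assuming the ILB forwarding rule and a PSC NAT subnet already exist (all names illustrative):

gcloud compute service-attachments create gke-svc-attachment \
    --region us-central1 \
    --producer-forwarding-rule my-ilb-forwarding-rule \
    --connection-preference ACCEPT_AUTOMATIC \
    --nat-subnets psc-nat-subnet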

3.3 Multi-Cluster Ingress (MCI) and Gateway

For global applications, MCI uses a single Global External HTTP(S) Load Balancer to route traffic to multiple clusters in different regions.
  • Fleet: A logical grouping of registered clusters (the multi-cluster Services API calls this a ClusterSet).
  • MCI Controller: A managed controller that watches MultiClusterIngress and MultiClusterService resources in the fleet’s config cluster and programs the load balancer and derived Services across the member clusters.
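
A minimal sketch of the two resources, applied in the fleet’s config cluster (names and ports illustrative):

kubectl apply -f - <<EOF
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: foo-mcs
spec:
  template:
    spec:
      selector:
        app: foo
      ports:
      - name: web
        protocol: TCP
        port: 8080
        targetPort: 8080
---
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: foo-ingress
spec:
  template:
    spec:
      backend:
        serviceName: foo-mcs
        servicePort: 8080
EOF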

4. Security: The Defense-in-Depth Model

4.1 Workload Identity (The Principal Standard)

The mechanics of Workload Identity are demonstrated in the lab at the end of this chapter; here is the architectural why: without Workload Identity, pods authenticate to Google APIs with JSON service-account keys stored as Kubernetes Secrets. Those keys are static, long-lived, and unmanaged. Workload Identity replaces them with short-lived tokens, eliminating the risk of key theft.

4.2 Binary Authorization

Binary Authorization is a deploy-time security control that ensures only trusted container images are deployed on GKE.
  • Attestations: A “digital signature” created by a CI/CD pipeline (e.g., Cloud Build) after passing security scans.
  • Policy: “Require attestation from ‘Security-Scanner-V1’ for all production deployments.”
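
In practice, you export the cluster’s policy, add the attestor requirement, and re-import it. A sketch with a hypothetical attestor name:

# Export the current Binary Authorization policy.
gcloud container binauthz policy export > policy.yaml

# policy.yaml (excerpt) - require the scanner's attestation by default:
#   defaultAdmissionRule:
#     evaluationMode: REQUIRE_ATTESTATION
#     enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
#     requireAttestationsBy:
#     - projects/$PROJECT_ID/attestors/security-scanner-v1

gcloud container binauthz policy import policy.yaml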

4.3 Policy Controller (Config Management)

Based on the Open Policy Agent (OPA) Gatekeeper, Policy Controller lets you enforce “Guardrails” using declarative policies.
  • Example: “Prevent any pod from running with a privileged security context.”
  • Example: “Require all services to have a ‘team-owner’ label.”
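
With the matching constraint template from the Gatekeeper library installed, the first guardrail looks roughly like this:

kubectl apply -f - <<EOF
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: deny-privileged-containers
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
EOF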

4.4 Shielded GKE Nodes

GKE nodes use Shielded VMs to provide:
  • Secure Boot: Ensures only verified software is used during the boot process.
  • Measured Boot: Uses a Virtual Trusted Platform Module (vTPM) to verify the integrity of the node.
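
These protections are enabled by flags at cluster (or node pool) creation; a sketch with an illustrative cluster name:

gcloud container clusters create shielded-cluster \
    --region us-central1 \
    --enable-shielded-nodes \
    --shielded-secure-boot \
    --shielded-integrity-monitoring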

5. Storage: Persistent Data in Kubernetes

5.1 Compute Engine Persistent Disk (PD) CSI Driver

The default storage for GKE.
  • Standard/SSD PD: Block storage for databases.
  • Balanced PD: Price/performance sweet spot for general workloads.
  • Regional PD: Synchronously replicated across two zones for High Availability.
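
A sketch of a StorageClass for regionally replicated Balanced PDs through the PD CSI driver:

kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: balanced-regional
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF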

5.2 Filestore for GKE

For workloads requiring a Shared File System (NFS), GKE provides the Filestore CSI driver.
  • ReadWriteMany (RWX): Allows multiple pods in different zones to read/write to the same volume.
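
A PVC against GKE’s built-in Filestore class (standard-rwx) is enough to provision an NFS-backed RWX volume; note that Filestore Basic instances start at 1 TiB:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti
EOF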

5.3 Backup for GKE

A fully managed service to protect your GKE stateful workloads.
  • What it Backs Up: Both Kubernetes manifests (YAMLs) and the actual data in Persistent Disks.
  • Scenario: Accidental deletion of a namespace or a regional disaster.
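
A sketch of a daily backup plan via the beta CLI; exact flags may differ across gcloud versions, and all names are illustrative:

gcloud beta container backup-restore backup-plans create daily-plan \
    --location us-central1 \
    --cluster "projects/$PROJECT_ID/locations/us-central1/clusters/prod-cluster" \
    --all-namespaces \
    --include-secrets \
    --include-volume-data \
    --cron-schedule "0 2 * * *" \
    --backup-retain-days 30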

6. Advanced Scaling and Cost Optimization

6.1 Node Auto-Provisioning (NAP)

While the Cluster Autoscaler adds nodes from existing pools, NAP can create entirely new node pools on the fly.
  • Logic: If a pod requests a specific T2D machine type or a GPU and no matching node pool exists, NAP creates one, schedules the pod, and deletes the pool when it is no longer needed.
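
NAP is enabled per cluster with resource ceilings that cap what it may provision (limits illustrative):

gcloud container clusters update prod-cluster \
    --region us-central1 \
    --enable-autoprovisioning \
    --min-cpu 1 --max-cpu 128 \
    --min-memory 1 --max-memory 512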

6.2 GKE Usage Metering

To solve the “Who is spending what?” problem, usage metering exports granular consumption data (CPU, RAM, Storage, Egress) to BigQuery.
  • Attribution: You can break down costs by Namespace, Label, or Service.
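
Enabling it takes a BigQuery dataset and one cluster update (dataset name illustrative):

# Create the destination dataset, then point the cluster at it.
bq mk --dataset "$PROJECT_ID:gke_usage_metering"
gcloud container clusters update prod-cluster \
    --region us-central1 \
    --resource-usage-bigquery-dataset gke_usage_metering \
    --enable-resource-consumption-metering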

7. Interview Preparation: Architectural Deep Dive

1. Q: What is the primary difference between GKE Autopilot and GKE Standard?
   A: Autopilot is a fully managed mode where Google manages the nodes, scaling, and security. You pay per-pod. Standard gives you full control over node pools (machine types, GPUs). You pay per-node. Standard is required for custom kernels or specialized hardware like TPUs.

2. Q: How does VPC-Native networking improve performance over Kubenet?
   A: VPC-Native (Alias IP) assigns VPC IPs directly to pods. This allows the Andromeda SDN to route traffic at the hardware level without packet encapsulation (VXLAN) overhead, reducing latency and increasing throughput for pod-to-pod and pod-to-external communication.

3. Q: Explain the role of Binary Authorization in a secure CI/CD pipeline.
   A: Binary Authorization ensures that only images that have been signed (attested) by authorized entities (like a vulnerability scanner) can be deployed. It acts as a final gate in the production environment to prevent the execution of untrusted or unverified code.

4. Q: Why use Regional Persistent Disks in GKE?
   A: Regional PDs synchronously replicate data across two zones in a region. If a zone fails, Kubernetes can quickly re-attach the volume to a node in the second zone without data loss, providing a lower RTO for stateful applications like databases.

5. Q: What is the benefit of the GKE Gateway API over Ingress?
   A: The Gateway API is more expressive and role-based. It separates the infrastructure concerns (GatewayClass/Gateway) from the application routing (HTTPRoute), enabling better collaboration between SREs and developers and supporting advanced features like cross-namespace routing and multi-cluster traffic management.
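
To make the last answer concrete, here is a minimal Gateway plus HTTPRoute sketch using GKE’s managed global external gateway class (resource names and the backend service are illustrative):

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: external-http
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: http
    protocol: HTTP
    port: 80
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: store-route
spec:
  parentRefs:
  - name: external-http
  rules:
  - backendRefs:
    - name: store-v1
      port: 8080
EOF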

Implementation: The “Enterprise Grade” GKE Lab

In this lab, we will build a production-ready GKE Standard cluster using Terraform, including VPC-Native networking, Workload Identity, and a sample application deployment.

Step 1: Terraform Infrastructure

Create a file named main.tf:
resource "google_container_cluster" "primary" {
  name     = "prod-cluster"
  location = "us-central1"

  # Enabling VPC-Native Networking
  networking_mode = "VPC_NATIVE"
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Enabling Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Use a managed node pool
  remove_default_node_pool = true
  initial_node_count       = 1
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "app-pool"
  location   = "us-central1"
  cluster    = google_container_cluster.primary.name
  node_count = 3 # Per zone: a regional pool across 3 zones yields 9 nodes total

  node_config {
    preemptible  = false
    machine_type = "e2-standard-4"

    # Workload Identity Metadata Server
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      env = "production"
    }
  }
}

Step 2: Deploying a Secure Workload

Apply the Terraform, then connect to the cluster:
terraform init
terraform apply
gcloud container clusters get-credentials prod-cluster --region us-central1
Create a Kubernetes Service Account and bind it to a GCP Service Account:
# 0. Set the project ID used by the commands below
export PROJECT_ID=$(gcloud config get-value project)

# 1. Create GCP Service Account
gcloud iam service-accounts create gke-sa --display-name="GKE App SA"

# 2. Allow KSA to act as GSA
gcloud iam service-accounts add-iam-policy-binding gke-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role="roles/iam.workloadIdentityUser" \
    --member="serviceAccount:$PROJECT_ID.svc.id.goog[default/my-app-sa]"

# 3. Create Kubernetes SA and annotate
kubectl create serviceaccount my-app-sa
kubectl annotate serviceaccount my-app-sa \
    iam.gke.io/gcp-service-account=gke-sa@$PROJECT_ID.iam.gserviceaccount.com
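
Finally, a workload references the annotated KSA; any Google API call from these pods authenticates as gke-sa with no JSON key mounted (the image is Google’s public sample app):

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: my-app-sa
      containers:
      - name: app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
EOF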

Step 3: Deploying with Helm

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-ingress ingress-nginx/ingress-nginx

Pro-Tip: The “Zero-Downtime” Upgrade

When upgrading GKE nodes, use Surge Upgrades. GKE creates new nodes on the updated version before draining and deleting the old ones, so your pods always have a destination to move to and availability is maintained throughout the maintenance window.
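
Surge behavior is configured per node pool; a sketch for the lab’s pool:

# One extra node at a time, zero nodes unavailable during the upgrade.
gcloud container node-pools update app-pool \
    --cluster prod-cluster \
    --region us-central1 \
    --max-surge-upgrade 1 \
    --max-unavailable-upgrade 0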