Chapter 10: Kubernetes at Scale - Google Kubernetes Engine (GKE)
Google is the birthplace of Kubernetes. Born from Borg, Google’s internal container orchestrator, Kubernetes was donated to the CNCF and became the world’s standard. Google Kubernetes Engine (GKE) is the most mature, automated, and integrated managed Kubernetes service in the world.
1. GKE Architecture: The Foundation
Control Plane: Zonal vs. Regional
- Zonal Clusters: A single control plane in one zone. If the zone goes down, the control plane is inaccessible (though your nodes keep running). SLA: 99.5%.
- Regional Clusters (Production Standard): Three control planes distributed across three zones in a region. This ensures your API server is always available, even during a zonal outage or Google-initiated upgrades. SLA: 99.95%.
Release Channels
GKE offers three release channels to balance stability and features:
- Rapid: For early adopters and testing.
- Regular (Default): A balance of stability and new features.
- Stable: For mission-critical production workloads.
2. Operation Modes: Autopilot vs. Standard
Choosing between Autopilot and Standard is a choice between Operational Simplicity and Total Control.
2.1 GKE Autopilot (The SRE’s Dream)
In Autopilot mode, Google manages the entire cluster infrastructure, including node provisioning, scaling, and security hardening.
- The “Pod-Only” Contract: You define your pods; Google ensures they have a place to run.
- Security by Default: Enforces the GKE Hardening Guide (e.g., no privileged containers, mandatory removal of the `NET_RAW` capability).
- Billing: You are billed per pod for the CPU, RAM, and ephemeral storage your pods request. You pay for what you request, not for idle capacity on nodes (see the sketch after this list).
- Ideal For: Teams that want to focus on code rather than Kubernetes cluster maintenance.
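A minimal sketch of that billing contract, assuming a hypothetical `billing-demo` Deployment (the Google-hosted `hello-app` sample image stands in for a real workload):

```yaml
# Autopilot bills for what each pod requests (here: 500m CPU, 1Gi RAM,
# 1Gi ephemeral storage), not for the nodes underneath. Autopilot also
# generally sets limits equal to requests, so requests drive the cost.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-demo            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: billing-demo
  template:
    metadata:
      labels:
        app: billing-demo
    spec:
      containers:
        - name: app
          image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
              ephemeral-storage: "1Gi"
```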
2.2 GKE Standard (The Architect’s Choice)
In Standard mode, you manage the node pools (GCE Managed Instance Groups).
- Full Control: You can customize kernel parameters, use `privileged` containers, and install custom drivers.
- Hardware Flexibility: Required for GPUs, TPUs, Local SSDs, or Sole-Tenant nodes.
- Bin-Packing Efficiency: If you are an expert at optimizing pod density, you can often achieve lower costs than Autopilot by manually managing large node pools.
- Billing: You pay for the underlying Compute Engine VMs.
2.3 Decision Matrix: Principal’s Guide
| Requirement | Autopilot | Standard |
|---|---|---|
| Operational Overhead | Low (Google manages nodes) | High (You manage node pools) |
| Custom Kernel Modules | No | Yes |
| GPU / Machine Learning | Yes (Select regions) | Yes (Full control) |
| Windows Containers | No | Yes |
| Privileged Containers | No (Security risk) | Yes |
| Cost Model | Pay-per-Pod (Predictable) | Pay-per-Node (Optimization needed) |
3. GKE Networking: Andromeda, PSC, and Multi-Cluster
3.1 VPC-Native Clusters (Alias IP)
Modern GKE clusters use VPC-Native networking. This is the foundation for all high-performance Kubernetes networking in GCP.
- The Alias IP Mechanism: Every pod is assigned an IP from a secondary range in the VPC subnet. Unlike overlay networks (flannel, Calico with VXLAN), there is no packet-encapsulation overhead.
- Andromeda Integration: The VPC-Native pod IPs are “known” to the underlying Andromeda SDN. This allows for direct routing at the hardware level, bypassing the host kernel’s bridge for most traffic.
- Connectivity: Pods can reach any other VPC resource (Cloud SQL, VMs) without NAT.
3.2 Private Service Connect (PSC) for GKE
PSC allows you to expose GKE services to other VPCs or projects privately, without VPC Peering or VPNs.
- Service Attachments: You create a Service Attachment in the GKE (producer) project, as sketched after this list.
- Endpoints: Consuming projects create a PSC Endpoint (an internal IP) that routes traffic directly to your GKE Internal Load Balancer.
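A sketch of the producer side using GKE's `ServiceAttachment` resource, assuming a pre-created PSC NAT subnet named `psc-nat` and an existing internal passthrough LoadBalancer Service named `orders` (all names are placeholders):

```yaml
apiVersion: networking.gke.io/v1
kind: ServiceAttachment
metadata:
  name: orders-sa               # hypothetical name
  namespace: prod
spec:
  connectionPreference: ACCEPT_AUTOMATIC   # or ACCEPT_MANUAL with an allow-list
  natSubnets:
    - psc-nat                   # assumed PSC NAT subnet in the producer VPC
  proxyProtocol: false
  resourceRef:
    kind: Service
    name: orders                # assumed internal LoadBalancer Service
```

Consumers then create a PSC endpoint pointing at the attachment, which gives them a private IP in their own VPC that routes straight to the GKE load balancer.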
3.3 Multi-Cluster Ingress (MCI) and Gateway
For global applications, MCI uses a single Global External HTTP(S) Load Balancer to route traffic to multiple clusters in different regions.
- ClusterSet: A logical grouping of clusters.
- MCI Controller: A managed service that synchronizes `MultiClusterIngress` and `MultiClusterService` resources across the set (example below).
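Applied to the fleet's config cluster, the resource pair might look like this (names, namespace, and port are illustrative):

```yaml
apiVersion: networking.gke.io/v1
kind: MultiClusterIngress
metadata:
  name: shop-ingress            # hypothetical name
  namespace: shop
spec:
  template:
    spec:
      backend:
        serviceName: shop-mcs   # references the MultiClusterService below
        servicePort: 8080
---
apiVersion: networking.gke.io/v1
kind: MultiClusterService
metadata:
  name: shop-mcs                # hypothetical name
  namespace: shop
spec:
  template:
    spec:
      selector:
        app: shop
      ports:
        - name: web
          protocol: TCP
          port: 8080
          targetPort: 8080
```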
4. Security: The Defense-in-Depth Model
4.1 Workload Identity (The Principal Standard)
Already covered in Section 4.1, but here is the architectural why: without Workload Identity, you would use JSON keys stored as K8s Secrets. These are “static” and “unmanaged.” Workload Identity provides short-lived tokens, eliminating the risk of key theft.
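The mechanism itself is just an annotation plus an IAM binding. A sketch, where the namespace, KSA name, GSA email, and project ID are all placeholders:

```yaml
# The annotation links this Kubernetes ServiceAccount to a Google
# service account; pods running as it receive short-lived GCP tokens.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa                 # hypothetical name
  namespace: prod
  annotations:
    iam.gke.io/gcp-service-account: app-gsa@my-project.iam.gserviceaccount.com
```

On the IAM side, the Google service account must grant `roles/iam.workloadIdentityUser` to the member `serviceAccount:my-project.svc.id.goog[prod/app-ksa]`.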
4.2 Binary Authorization
Binary Authorization is a deploy-time security control that ensures only trusted container images are deployed on GKE.
- Attestations: A “digital signature” created by a CI/CD pipeline (e.g., Cloud Build) after passing security scans.
- Policy: “Require attestation from ‘Security-Scanner-V1’ for all production deployments.”
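In the policy YAML that `gcloud container binauthz policy import` consumes, that rule could look like the following sketch (the project and attestor names are placeholders):

```yaml
# Block and audit-log anything lacking an attestation from the assumed
# security-scanner-v1 attestor; Google-maintained system images are
# exempted via the global policy.
globalPolicyEvaluationMode: ENABLE
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/my-project/attestors/security-scanner-v1
```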
4.3 Policy Controller (Config Management)
Based on the Open Policy Agent (OPA) Gatekeeper, Policy Controller lets you enforce “Guardrails” using declarative policies.
- Example: “Prevent any pod from running with a privileged security context.”
- Example: “Require all services to have a ‘team-owner’ label.”
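With the bundled constraint-template library, the label rule might be written as follows (the constraint name and scope are illustrative):

```yaml
# Uses the K8sRequiredLabels template from the standard Gatekeeper
# library to reject Services that lack a team-owner label.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-owner      # hypothetical name
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Service"]
  parameters:
    labels:
      - key: team-owner
```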
4.4 Shielded GKE Nodes
GKE nodes use Shielded VMs to provide:
- Secure Boot: Ensures only verified software is used during the boot process.
- Measured Boot: Uses a Virtual Trusted Platform Module (vTPM) to verify the integrity of the node.
5. Storage: Persistent Data in Kubernetes
5.1 Compute Engine Persistent Disk (PD) CSI Driver
The default storage for GKE.
- Standard/SSD PD: Block storage for databases.
- Balanced PD: Price/performance sweet spot for general workloads.
- Regional PD: Synchronously replicated across two zones for High Availability.
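A StorageClass sketch for Regional PD via the PD CSI driver (the class name and zone pair are placeholders):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: regional-balanced        # hypothetical name
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-balanced              # price/performance sweet spot
  replication-type: regional-pd  # synchronous replication across two zones
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.gke.io/zone
        values:
          - us-central1-a        # assumed zone pair
          - us-central1-b
```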
5.2 Filestore for GKE
For workloads requiring a Shared File System (NFS), GKE provides the Filestore CSI driver.
- ReadWriteMany (RWX): Allows multiple pods in different zones to read/write to the same volume.
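A PVC sketch, assuming the `standard-rwx` StorageClass that the Filestore CSI driver installs (the claim name is a placeholder; basic Filestore tiers start at 1 TiB):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets           # hypothetical name
spec:
  accessModes:
    - ReadWriteMany             # many pods, across nodes and zones
  storageClassName: standard-rwx
  resources:
    requests:
      storage: 1Ti
```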
5.3 Backup for GKE
A fully managed service to protect your GKE stateful workloads.
- What It Backs Up: Both the Kubernetes manifests (YAML) and the actual data in Persistent Disks.
- Scenario: Accidental deletion of a namespace or a regional disaster.
6. Advanced Scaling and Cost Optimization
6.1 Node Auto-Provisioning (NAP)
While the Cluster Autoscaler adds nodes from existing pools, NAP can create entirely new node pools on the fly.
- Logic: If a pod requires a specific T2D machine type or a GPU and no such pool exists, NAP will create one, run the pod, and delete the pool when it is no longer needed (see the sketch below).
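NAP is triggered by otherwise-unschedulable pods. A sketch of such a trigger, assuming NAP is enabled with GPU resource limits configured on the cluster (the pod name and image are placeholders):

```yaml
# No existing pool has a T4 GPU, so this pending pod causes NAP to
# create a matching node pool; the pool is scaled away when idle.
apiVersion: v1
kind: Pod
metadata:
  name: trainer                 # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-tesla-t4
  containers:
    - name: train
      image: us-docker.pkg.dev/my-project/ml/trainer:latest   # placeholder
      resources:
        limits:
          nvidia.com/gpu: "1"
```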
6.2 GKE Usage Metering
To solve the “Who is spending what?” problem, usage metering exports granular consumption data (CPU, RAM, Storage, Egress) to BigQuery.
- Attribution: You can break down costs by Namespace, Label, or Service.
7. Interview Preparation: Architectural Deep Dive
1. Q: What is the primary difference between GKE Autopilot and GKE Standard?
A: Autopilot is a fully managed mode where Google manages the nodes, scaling, and security; you pay per pod. Standard gives you full control over node pools (machine types, GPUs); you pay per node. Standard is required for custom kernels or specialized hardware like TPUs.
2. Q: How does VPC-Native networking improve performance over Kubenet?
A: VPC-Native (Alias IP) assigns VPC IPs directly to pods. This allows the Andromeda SDN to route traffic at the hardware level without packet encapsulation (VXLAN) overhead, reducing latency and increasing throughput for pod-to-pod and pod-to-external communication.
3. Q: Explain the role of Binary Authorization in a secure CI/CD pipeline.
A: Binary Authorization ensures that only images that have been signed (attested) by authorized entities (like a vulnerability scanner) can be deployed. It acts as a final gate in the production environment to prevent the execution of untrusted or unverified code.
4. Q: Why use Regional Persistent Disks in GKE?
A: Regional PDs synchronously replicate data across two zones in a region. If a zone fails, Kubernetes can quickly re-attach the volume to a node in the second zone without data loss, providing a lower RTO for stateful applications like databases.
5. Q: What is the benefit of the GKE Gateway API over Ingress?
A: The Gateway API is more expressive and role-based. It separates the infrastructure concerns (GatewayClass/Gateway) from the application routing (HTTPRoute), enabling better collaboration between SREs and developers and supporting advanced features like cross-namespace routing and multi-cluster traffic management.
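To make that role separation concrete, a minimal sketch using GKE's managed `gke-l7-global-external-managed` GatewayClass (namespaces, names, and the backend Service are placeholders):

```yaml
# SRE-owned: the Gateway (infrastructure).
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: external-http           # hypothetical name
  namespace: infra
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: All             # permit cross-namespace HTTPRoutes
---
# Developer-owned: the route (application), attached across namespaces.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: store-route             # hypothetical name
  namespace: store
spec:
  parentRefs:
    - name: external-http
      namespace: infra
  rules:
    - backendRefs:
        - name: store-svc       # assumed backend Service
          port: 8080
```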
Implementation: The “Enterprise Grade” GKE Lab
In this lab, we will build a production-ready GKE Standard cluster using Terraform, including VPC-Native networking, Workload Identity, and a sample application deployment.
Step 1: Terraform Infrastructure
Create a file named `main.tf`: