Chapter 5: Virtual Machines at Scale - Compute Engine
Compute Engine (GCE) is Google Cloud’s Infrastructure-as-a-Service (IaaS) flagship. While it looks like standard VMs on the surface, its architecture is a marvel of custom hardware and software integration. From the Titan security chip to the Colossus file system, GCE is designed for consistency, security, and performance at massive scale.

1. Under the Hood: The GCE Infrastructure
1.1 The Titan Security Chip
Every physical server in Google’s data centers contains a Titan chip—a custom hardware root of trust.
- Secure Boot: Titan verifies the integrity of the BIOS and OS bootloaders before the machine is allowed to join the network.
- Identity: It provides a unique, cryptographically verifiable identity to the hardware, preventing “insider threat” hardware tampering.
1.2 Virtualization: KVM and the “Hypervisor”
Google uses a heavily modified version of KVM (Kernel-based Virtual Machine).
- No Overcommit: Unlike some cloud providers, GCP does not overcommit CPU resources on standard machine types. If you buy 4 vCPUs, you get 4 physical threads of execution.
- Live Migration: This is GCE’s “killer feature.” When Google needs to perform hardware maintenance or update a host kernel, it moves your running VM to a new host without a reboot or noticeable downtime.
1.3 The Life of a Packet (Andromeda SDN)
When a VM sends a packet, it doesn’t just “go out.” It undergoes a complex transformation:
- vNIC Interception: The packet is intercepted by the virtual NIC.
- Andromeda Encapsulation: Google’s SDN, Andromeda, encapsulates the packet (usually in a custom GRE or VXLAN-like header).
- Flow Programming: Andromeda checks if this flow is known. If not, it consults the central controller to program the host’s Open vSwitch (OVS).
- Hardware Offload: On modern instances, this encapsulation is offloaded to custom ASICs, ensuring that the host CPU isn’t wasted on networking overhead.
2. Machine Families: The Right Tool for the Job
GCP categorizes VMs into families, each optimized for specific hardware footprints.

2.1 Custom Machine Types
A unique GCP feature: if a predefined machine type (like n2-standard-4) doesn’t fit your needs, you can create a Custom Machine Type.
- The Rule: You must follow the supported CPU-to-Memory ratios (typically 1 vCPU to 1GB–6.5GB of RAM).
- SRE Tip: Use custom machines to match your app’s exact profile (e.g., an app that needs 10 vCPUs but only 12GB of RAM). This saves ~15% compared to paying for the unused RAM in a larger standard instance.
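To make the ratio rule and the savings concrete, here is a small sketch. The shape-validation bounds mirror the 1 GB–6.5 GB rule above (plus the even-vCPU rule for custom types), while the per-vCPU-hour and per-GB-hour rates are purely hypothetical placeholders — real prices vary by machine family and region:

```python
def validate_custom_shape(vcpus: int, memory_gb: float) -> bool:
    """Illustrative custom-shape check: vCPU count must be 1 or an even
    number, and memory must fall between 1.0 and 6.5 GB per vCPU.
    Exact ratios vary by machine family, so treat this as a sketch."""
    if vcpus != 1 and vcpus % 2 != 0:
        return False
    return 1.0 * vcpus <= memory_gb <= 6.5 * vcpus

def monthly_cost(vcpus, memory_gb, cpu_rate=0.033174, ram_rate=0.004446, hours=730):
    # Hypothetical per-vCPU-hour and per-GB-hour rates for illustration only.
    return (vcpus * cpu_rate + memory_gb * ram_rate) * hours

# An app needing 10 vCPUs / 12 GB: compare a custom shape against
# rounding up to a 12-vCPU / 48-GB predefined shape.
assert validate_custom_shape(10, 12)
custom = monthly_cost(10, 12)
predefined = monthly_cost(12, 48)
print(f"custom ${custom:.2f}/mo vs predefined ${predefined:.2f}/mo "
      f"({1 - custom / predefined:.0%} saved)")
```

The exact percentage saved depends entirely on which predefined shape you would otherwise round up to; the point is that unused RAM in an oversized standard instance is billed RAM.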
2.2 General Purpose (Balanced)
- E2: Cost-optimized, uses dynamic resource scheduling. Good for small apps and dev environments. No local SSD support.
- N2 / N2D: The workhorses. N2 uses Intel (Ice Lake), N2D uses AMD (EPYC). Best for web servers, enterprise apps, and databases.
- Tau T2D: Google’s best price-performance for scale-out workloads. Uses AMD EPYC processors.
Tau T2D: The Price-Performance Leader
Tau VMs are specifically designed for “scale-out” workloads (web servers, microservices, media transcoding). Performance Benchmark (Estimated):

| Metric | N2 (Standard) | Tau T2D | % Difference |
|---|---|---|---|
| Price per vCPU/hr | $0.048 | $0.038 | ~20% Cheaper |
| SPECrate®2017_int | 100 | 142 | ~40% Higher |
| Perf per dollar | Baseline | ~1.8× baseline | ~80% Higher |
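It is worth checking the arithmetic: combining the price and SPECrate rows gives the performance-per-dollar gain directly. The figures below are the estimated ones from the table, not measured results:

```python
# Derive price-performance from the (estimated) table above.
n2_price, tau_price = 0.048, 0.038   # $ per vCPU-hour (illustrative)
n2_perf, tau_perf = 100, 142         # SPECrate-style scores (illustrative)

perf_per_dollar_gain = (tau_perf / tau_price) / (n2_perf / n2_price)
print(f"Tau T2D delivers {perf_per_dollar_gain:.2f}x the performance per dollar")
# With these inputs: (142 / 0.038) ÷ (100 / 0.048) ≈ 1.79x
```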
2.3 Compute Optimized (High Frequency)
- C2 / C2D: Optimized for single-threaded performance and memory bandwidth. Features high clock speeds (3.8 GHz and above in sustained turbo).
- C3: The first machine type powered by Intel Sapphire Rapids and Google’s custom IPU (Infrastructure Processing Unit). This offloads storage and networking entirely, leaving the CPU 100% available for your code.
2.4 Accelerator Optimized (GPU & TPU)
- A2 / A3: Designed specifically for NVIDIA A100 and H100 GPUs.
- TPU v4/v5: Google’s custom AI silicon.
- ICI (Inter-Chip Interconnect): TPU pods are connected by a dedicated high-speed 2D/3D torus network, bypassing the standard data center network to achieve ultra-low latency for model weight synchronization.
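To see why a torus keeps synchronization traffic local, consider neighbor lookup in an idealized 3D torus. This is a generic sketch of the topology, not actual TPU addressing:

```python
def torus_neighbors(coord, dims):
    """Neighbors of a chip at `coord` in a torus of shape `dims`.
    Wrap-around links mean every chip has exactly 2 * len(dims)
    neighbors, keeping worst-case hop counts low for all-reduce traffic."""
    out = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around the torus edge
            out.append(tuple(n))
    return out

# A chip on the "edge" of a 4x4x4 torus still has 6 neighbors:
print(torus_neighbors((0, 0, 3), (4, 4, 4)))
```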
3. Advanced VM Management: Spot VMs and Preemption
3.1 Spot VMs (The 91% Discount)
Spot VMs are excess Google capacity offered at a massive discount (60-91%).
- The Catch: They can be reclaimed by Google at any time with only a 30-second notice.
- The Strategy: Use them for fault-tolerant, stateless workloads (e.g., batch processing, rendering, stateless web workers).
3.2 Handling the Termination Signal
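When Compute Engine reclaims a Spot VM, the guest receives an ACPI shutdown that most Linux images deliver to services as SIGTERM. The sketch below shows one way a batch worker might react; `worker_loop` and the cleanup step are hypothetical stand-ins for your own logic:

```python
import signal
import time

shutting_down = False

def handle_sigterm(signum, frame):
    # GCE gives a Spot VM ~30 seconds between the preemption notice
    # and power-off, so keep cleanup short: flush, checkpoint, deregister.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, handle_sigterm)

def worker_loop():
    while not shutting_down:
        time.sleep(0.1)  # stand-in for one unit of batch work
    print("termination notice received; checkpointing and exiting cleanly")

# In production the kernel delivers SIGTERM; for illustration we invoke
# the handler directly so the loop exits on its next iteration:
handle_sigterm(signal.SIGTERM, None)
worker_loop()
```

Alternatively (or additionally), the guest can poll the metadata server’s `instance/preempted` key to detect that reclamation has begun.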
In a production Spot environment, your app MUST handle this termination signal: checkpoint progress, drain in-flight requests, and deregister from load balancers within the 30-second window.

4. Special Use Cases: Nested Virtualization and Argos VCU
4.1 Nested Virtualization
GCP allows you to run a hypervisor (like KVM or VMware) inside a GCE VM.
- Use Case: Dev/test environments for specialized software that requires its own kernel modules or proprietary virtualization.
- Requirement: The L1 (host) VM must run on Haswell-generation CPUs or newer, and the nested virtualization flag must be enabled on the instance or its image.
4.2 Argos VCU (Video Compression Units)
Google uses custom ASICs called Argos for YouTube and Google Meet. These are now being exposed to GCE users for massive-scale video transcoding, offering 20x-40x better performance/watt than standard CPUs.

5. Deployment Artifacts: Custom Images vs. Machine Images
To achieve fast boot times in a MIG, you must bake your application into a bootable artifact.

5.1 Custom Images (The Standard)
A custom image is a disk image that includes the OS, your application, and dependencies.
- Scope: Single disk (the boot disk).
- Best For: Instance Templates, MIGs, and sharing base OS builds across the organization.
- Internals: Stored as compressed tarballs in a hidden GCS bucket. When you boot, GCP uses streaming to start the VM before the entire image is even copied to the PD.
5.2 Machine Images (The Comprehensive Backup)
A machine image is a more “complete” artifact than a custom image.
- Scope: Captures all disks (boot + data), metadata, labels, and the instance configuration.
- Best For: Cloning entire environments or creating consistent backups of complex multi-disk servers.
- Internals: Uses differential compression. If you create multiple machine images of the same VM, only the changed blocks are stored.
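The idea behind differential storage can be sketched with block hashing. This illustrates block-level deduplication in general, not Google’s actual on-disk format:

```python
import hashlib

BLOCK = 4096  # illustrative block size

def block_hashes(data: bytes):
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def incremental_blocks(previous: bytes, current: bytes):
    """Indices of blocks that changed since the previous image.
    A differential store persists only these, referencing the rest."""
    prev, curr = block_hashes(previous), block_hashes(current)
    return [i for i, h in enumerate(curr)
            if i >= len(prev) or prev[i] != h]

disk_v1 = b"A" * BLOCK * 3
disk_v2 = b"A" * BLOCK + b"B" * BLOCK + b"A" * BLOCK  # only block 1 changed
print(incremental_blocks(disk_v1, disk_v2))  # → [1]
```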
6. Managed Instance Groups (MIGs)
A MIG is a collection of identical VM instances that you control as a single entity. It provides the automation layer for high availability.

The Auto-Healing Loop
- Health Check: You define a health check (e.g., “Is port 8080 responding?”).
- Detection: If an instance fails the check, the MIG marks it as unhealthy.
- Recreation: The MIG deletes the unhealthy instance and creates a fresh one from the Instance Template.
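A health check is only as good as the endpoint behind it. Here is a minimal sketch of an app-level endpoint the MIG could probe on port 8080; the path and port are whatever you configure in the health check:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            # Return 200 only when the app is genuinely ready to serve;
            # the MIG treats anything else as a failed check and recreates us.
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

def serve_health(port: int = 8080):
    """Run the health endpoint; call from your service entrypoint."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

In practice, tie the 200 response to real readiness (database connections up, caches warmed) rather than mere process liveness, or the MIG will happily keep a broken instance in rotation.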
Autoscaling Strategies
- CPU Utilization: Scale when average CPU across the group exceeds X%.
- Load Balancing Capacity: Scale based on the number of requests per second reaching the LB.
- Cloud Monitoring Metrics: Scale based on custom metrics (e.g., number of messages in a Pub/Sub queue).
- Predictive Autoscaling: Google uses ML to predict upcoming traffic spikes (based on historical data) and starts scaling before the traffic arrives.
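For CPU-based scaling, the sizing rule is essentially “the smallest group that brings average utilization back under target,” which can be sketched as:

```python
import math

def recommended_size(current_instances: int, avg_utilization: float,
                     target_utilization: float) -> int:
    """Sketch of CPU-based autoscaler sizing: scale the group so that the
    same total load lands at (or just under) the target utilization."""
    total_load = current_instances * avg_utilization
    return max(1, math.ceil(total_load / target_utilization))

# 10 VMs averaging 85% CPU with a 60% target → grow to 15 VMs:
print(recommended_size(10, 0.85, 0.60))
```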
Rolling Updates and Blue/Green
- Max Surge: How many extra instances can be created during an update (e.g., maxSurge=3 means 3 new VMs are built before deleting old ones).
- Max Unavailable: How many instances can be offline during an update.
- Canary Updates: Roll out a new version to only 10% of the group to test for errors before full deployment.
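maxSurge and maxUnavailable together bound how large and how small the group can get mid-update; a trivial helper makes the envelope explicit:

```python
def rolling_update_bounds(target_size: int, max_surge: int,
                          max_unavailable: int):
    """(min, max) instance counts the group can reach during a rolling
    update: it may temporarily grow by max_surge and shrink by
    max_unavailable relative to the target size."""
    return (target_size - max_unavailable, target_size + max_surge)

# A 10-VM group with maxSurge=3, maxUnavailable=0 peaks at 13 VMs and
# never serves with fewer than 10:
print(rolling_update_bounds(10, 3, 0))  # → (10, 13)
```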
7. Sole-Tenant Nodes: Hardware Isolation
For workloads that require physical isolation (compliance) or specific licensing (BYOL), GCP offers Sole-Tenant Nodes.
- Physical Host Reservation: You rent the entire physical server. No “noisy neighbors.”
- Node Groups: Group these hosts and apply placement policies (e.g., ensure two VMs never run on the same physical rack).
- Overcommit (Internal): You can overcommit CPUs on your own sole-tenant nodes to save money if you know your workloads are bursty.
8. Advanced Security: Shielded & Confidential VMs
Shielded VMs
A suite of security features that protect your VMs from boot-level malware (rootkits).
- Secure Boot: Ensures only signed, verified bootloaders and kernels are allowed to run.
- vTPM (Virtual Trusted Platform Module): Stores keys and secrets securely.
- Integrity Monitoring: Alerts you if the boot state of the VM has changed from its known “good” state.
Confidential Computing
Confidential VMs use AMD SEV (Secure Encrypted Virtualization) to encrypt data in use (while it is in RAM).
- The Key: The encryption key is generated by the AMD hardware and is never accessible to Google or the host OS.
- Use Case: Processing highly sensitive data (PII, financial records, medical data) where you don’t even trust the cloud provider.
9. Storage Architecture: Persistent Disks (PD)
Persistent Disks are not local drives; they are network-attached storage distributed across the Colossus file system.
- Standard PD (pd-standard): HDD-backed, best for sequential I/O.
- Balanced PD (pd-balanced): SSD-backed, best for general enterprise apps.
- Performance SSD (pd-ssd): High IOPS, low latency.
- Extreme PD (pd-extreme): For the most demanding databases (up to 100k IOPS).
- Local SSD: Physically attached to the host. Fast but ephemeral (data is lost if the instance is deleted).
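A useful mental model: PD performance is provisioned along with capacity, so IOPS scale with disk size up to an instance-level cap. The per-GB rate and cap below are illustrative (pd-ssd is in the ballpark of 30 IOPS/GB); check current documentation for exact limits on your machine type:

```python
def provisioned_iops(size_gb: int, iops_per_gb: float,
                     instance_cap: int) -> int:
    """PD IOPS grow linearly with provisioned size until an
    instance-level cap is reached. Rates here are illustrative."""
    return min(int(size_gb * iops_per_gb), instance_cap)

# A 500 GB pd-ssd at ~30 IOPS/GB, with a hypothetical 60k instance cap:
print(provisioned_iops(500, 30, 60_000))   # → 15000
# A 4 TB disk would hit the cap instead of scaling further:
print(provisioned_iops(4000, 30, 60_000))  # → 60000
```

This is also why undersized boot disks are a common self-inflicted performance problem: a tiny disk gets proportionally tiny IOPS.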
Regional PD: The High Availability King
Regional PDs synchronously replicate data across two zones in the same region. If an entire zone fails, you can force-attach the disk to a VM in the second zone with zero data loss.

10. Networking: gVNIC and Tiered Networking
gVNIC (Google Virtual NIC)
A modern device driver designed for high-throughput, low-latency networking. It is required for many high-performance machine types (like C2) to reach 50 Gbps+ throughput.

Network Service Tiers
- Premium Tier (Default): Traffic stays on Google’s private global fiber network for as long as possible. Best performance.
- Standard Tier: Traffic exits Google’s network at the nearest PoP and travels over the public internet. Cheaper, but higher latency.
11. Advanced Instance Management: Resource Policies and Schedules
11.1 Snapshot Schedules
Never rely on manual backups. Resource policies allow you to automate data protection.
- Schedule: Daily, weekly, or hourly.
- Retention: Define how many days/weeks of snapshots to keep.
- Consistency: Use Application-Consistent Snapshots for databases (requires the guest agent to freeze the file system briefly).
11.2 Instance Schedules (Cost Savings)
For non-production environments, use Instance Schedules to automatically start and stop VMs.
- Scenario: Turn off dev servers at 6:00 PM and start them at 8:00 AM on weekdays.
- Benefit: With a weekday-only schedule like this, VMs run about 50 of 168 hours, cutting compute costs by roughly two-thirds for dev/test environments.
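The savings follow directly from the hours; a quick sketch of the arithmetic for a weekday-only schedule:

```python
def schedule_savings(weekday_on_hours: float = 10,
                     weekend_on_hours: float = 0) -> float:
    """Fraction of always-on compute cost saved by an instance schedule
    (e.g., on 8 AM-6 PM weekdays = 10 hours/day, off all weekend)."""
    on_hours = 5 * weekday_on_hours + 2 * weekend_on_hours
    return 1 - on_hours / (7 * 24)

print(f"{schedule_savings():.0%} of compute cost saved")  # 50/168 hours on → 70%
```

Persistent Disk and static IP charges continue while the VM is stopped, which is why real-world savings land a little below the raw hour fraction.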
11.3 Placement Policies
Control where your VMs are physically located relative to each other:
- Spread: VMs are placed on different physical racks (reduces correlated failure risk).
- Compact: VMs are placed as close as possible (reduces latency for HPC/clustered workloads).
12. Implementation: Pro-Level VM Configuration
When creating a production VM, never just “click through” the console. Follow these best practices:
- Use Service Accounts: Never use the default “Compute Engine Service Account” with broad permissions. Create a custom SA with the least privilege.
- Enable Shielded VM: It should be the default for all workloads.
- Optimize Boot Time: Use Custom Images with your application pre-installed rather than relying on heavy startup scripts.
- Tag for Firewalls: Use Network Tags to apply firewall rules dynamically (e.g., all VMs with the web-server tag allow port 80).
- Metadata and Labelling: Use labels for cost tracking (e.g., env=prod, team=billing).