Chapter 16: Beyond a Single Cloud - Hybrid and Multi-Cloud
Modern enterprise computing is rarely confined to a single cloud provider. Organizations often maintain on-premises data centers for data sovereignty or use multiple clouds to avoid vendor lock-in. Anthos is Google Cloud’s answer to this complexity: a managed platform that extends Google’s services and operational model to any environment.
1. Anthos: The Unified Control Plane
Anthos is not a single product; it is a suite of technologies built on Kubernetes, Istio, and Knative. It allows you to manage clusters on GCP, AWS, Azure, and on-premises (VMware or bare metal) from a single dashboard.
Connect Gateway
One of the most powerful features of Anthos is the Connect Gateway. It allows you to run kubectl commands or use the GCP Console to manage clusters that are behind firewalls or in private networks, without needing a VPN or complex SSH tunnels. For clusters outside Google Cloud, this works because a Connect Agent running in the cluster opens an outbound connection to Google, so no inbound ports have to be exposed.
Anthos Clusters (GKE Multi-Cloud)
Anthos provides a consistent GKE experience everywhere.
- Anthos on VMware: Runs GKE clusters on your existing vSphere infrastructure.
- Anthos on Bare Metal: Runs GKE directly on physical Linux servers, eliminating the overhead of a hypervisor; this makes it ideal for edge computing and high-performance workloads.
- Anthos on AWS/Azure: Google manages the lifecycle of Kubernetes clusters running on EC2 or Azure VMs.
2. Anthos Service Mesh (ASM)
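ASM is a managed service mesh based on Istio. It solves the “microservices mess” by providing security, observability, and traffic control without requiring code changes.
- Managed Control Plane: Google manages the Istio control plane (istiod, which consolidates the former Pilot and Citadel components), so you mainly deal with the sidecar proxies that run next to your workloads.
- mTLS by Default: ASM automatically uses Mutual TLS (mTLS) between workloads that have sidecar proxies, so service-to-service traffic is encrypted and both sides authenticate each other with certificates. An attacker who can observe the network cannot read or tamper with that traffic, and a STRICT PeerAuthentication policy makes the mesh reject any remaining plaintext connections.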
- Service Graph: A visual representation of how your services communicate, including latency, error rates, and throughput for every link.
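A service graph is essentially a directed graph whose edges carry per-link metrics. As a minimal sketch, assuming a hypothetical request-record format (source, destination, latency, HTTP status), the Python below aggregates observed calls into per-edge request counts, error rates, and median latency. Request and build_service_graph are illustrative names; ASM derives these numbers from sidecar telemetry rather than from code you write.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Request:
    """One observed call between two services (hypothetical record format)."""
    source: str
    destination: str
    latency_ms: float
    status: int


def build_service_graph(requests):
    """Aggregate requests into {(source, destination): metrics} edges."""
    latencies = defaultdict(list)
    errors = defaultdict(int)
    for r in requests:
        edge = (r.source, r.destination)
        latencies[edge].append(r.latency_ms)
        if r.status >= 500:  # count only server-side failures as errors
            errors[edge] += 1

    graph = {}
    for edge, values in latencies.items():
        graph[edge] = {
            "requests": len(values),
            "error_rate": errors[edge] / len(values),
            "median_latency_ms": median(values),
        }
    return graph


if __name__ == "__main__":
    observed = [
        Request("frontend", "cart", 12.0, 200),
        Request("frontend", "cart", 18.0, 503),
        Request("frontend", "cart", 15.0, 200),
        Request("cart", "redis", 2.0, 200),
    ]
    for (src, dst), metrics in build_service_graph(observed).items():
        print(f"{src} -> {dst}: {metrics}")
```

Counting only 5xx responses as errors is a simplifying assumption; a real dashboard typically also tracks traffic volume and latency percentiles such as p99.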
2.1 The ASM Ingress Gateway: TLS Termination
The ASM Ingress Gateway is a standalone Envoy-based proxy that handles traffic entering the mesh. Architectural Flow:
- Client Request: User hits the Global Load Balancer (or a static IP).
- TLS Termination: The Ingress Gateway terminates the external TLS using a Kubernetes Secret (containing the cert/key).
- mTLS Origination: The Gateway then starts a new mTLS connection to the backend sidecar proxy inside the mesh.
Benefits:
- Security: Keeps sensitive TLS private keys in a restricted namespace (e.g., istio-system).
- Offloading: Backend microservices don’t need to manage certificates; their sidecar proxies handle mTLS transparently.
- Policy Enforcement: You can apply global rate limiting or JWT validation at the Gateway before traffic reaches any pod.
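Gateway rate limiting is commonly built on a token bucket: each client’s bucket refills at a fixed rate, and a request is admitted only when a token is available. The sketch below is a minimal single-process version under that assumption; TokenBucket, FakeClock, and the chosen rates are hypothetical. In ASM the limit is configured on the gateway’s Envoy proxy rather than written as application code, and a truly global limit shared across gateway replicas needs a central counter that this sketch does not model.

```python
import time


class TokenBucket:
    """Admit on average `rate` requests/second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable clock so the behavior is testable
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a gateway would answer HTTP 429 Too Many Requests here


class FakeClock:
    """Manually advanced clock for deterministic demos."""

    def __init__(self):
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


if __name__ == "__main__":
    clock = FakeClock()
    bucket = TokenBucket(rate=2.0, capacity=3, clock=clock)
    print([bucket.allow() for _ in range(4)])  # burst of 3 admitted, 4th rejected
    clock.t += 1.0  # one second later, 2 tokens have been refilled
    print([bucket.allow() for _ in range(3)])
```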
3. Anthos Config Management (ACM)
ACM brings GitOps to the enterprise. It ensures that all your clusters (regardless of where they are) stay in sync with a single source of truth: your Git repository.
- Config Sync: Periodically pulls manifests (YAML) from Git, applies them to the fleet, and reverts manual changes that drift from the repository.
- Policy Controller: Built on Open Policy Agent (OPA) Gatekeeper, it allows you to enforce guardrails. For example, you can block developers from creating Services of type LoadBalancer with a public IP in a specific namespace.
- Hierarchy Controller: Allows you to create parent-child relationships between namespaces, making it easier to manage permissions and quotas in large multi-tenant clusters.
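Inheritance is what makes the parent-child model useful: objects such as RoleBindings declared in a parent namespace are propagated into every descendant. The sketch below models that propagation with plain dictionaries; the namespace tree, the PARENT and LOCAL_POLICIES tables, and the policy names are hypothetical, and the real controller copies Kubernetes objects rather than strings.

```python
# Hypothetical namespace tree: each namespace maps to its parent (None = root).
PARENT = {
    "org": None,
    "team-a": "org",
    "team-a-dev": "team-a",
    "team-a-prod": "team-a",
    "team-b": "org",
}

# Objects declared directly in each namespace (hypothetical names).
LOCAL_POLICIES = {
    "org": ["rolebinding/security-auditors"],
    "team-a": ["rolebinding/team-a-admins"],
    "team-a-prod": ["resourcequota/prod-limits"],
}


def ancestors(namespace: str) -> list[str]:
    """Return the chain from the root down to `namespace`, inclusive."""
    chain = []
    current = namespace
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))


def effective_policies(namespace: str) -> list[str]:
    """Objects a namespace ends up with: everything inherited plus its own."""
    result = []
    for ns in ancestors(namespace):
        result.extend(LOCAL_POLICIES.get(ns, []))
    return result


if __name__ == "__main__":
    for ns in PARENT:
        print(ns, effective_policies(ns))
```

Granting access once at "team-a" covers both of its child namespaces, which is the permission-management benefit described above.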
4. Connecting the Clouds: Interconnect and VPN
To make hybrid cloud work, you need a high-performance “pipe” between your data center and GCP.
Dedicated vs. Partner Interconnect
- Dedicated Interconnect: A physical fiber connection between your router and a Google edge PoP (Point of Presence). Capacity comes in 10 Gbps or 100 Gbps circuits, which can be combined for more bandwidth.
- Partner Interconnect: You connect to a supported service provider (like Equinix or AT&T) who already has a high-speed link to Google. Ideal if you are not in a Google peering location.
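A quick way to compare the options is to estimate how long a bulk data transfer takes at a given line rate. The helper below does that arithmetic; the 80% link-efficiency factor is an assumption for illustration, not a Google figure, and real throughput also depends on protocol overhead and TCP tuning.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `data_tb` terabytes over a `link_gbps` gigabit/s link.

    `efficiency` is an assumed fraction of the nominal rate actually achieved.
    """
    bits = data_tb * 1e12 * 8  # decimal terabytes -> bits
    bits_per_second = link_gbps * 1e9 * efficiency
    return bits / bits_per_second / 3600


if __name__ == "__main__":
    links = [("Partner 1 Gbps", 1), ("Dedicated 10 Gbps", 10), ("Dedicated 100 Gbps", 100)]
    for label, gbps in links:
        print(f"{label}: {transfer_hours(100, gbps):.1f} h for 100 TB")
```

Under these assumptions, 100 TB takes about 28 hours over a 10 Gbps circuit and under 3 hours over 100 Gbps.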
Cross-Cloud Interconnect
Cross-Cloud Interconnect brings the same model to multi-cloud: Google provisions dedicated physical links between its network and other cloud providers such as AWS and Azure.
- No Internet: Traffic between GCP and AWS never touches the public internet.
- Performance: Low, predictable latency (which depends on how close the two cloud regions are), enabling near real-time data synchronization.
5. Advanced Anthos Ops: Service Mesh and Policy Controller
5.1 ASM Traffic Management
ASM allows you to decouple traffic routing from deployment.
- VirtualService: Defines where the traffic goes (e.g., “Send 90% to v1 and 10% to v2”).
- DestinationRule: Defines how the traffic is handled at the destination (e.g., “Use random load balancing” or “Eject a backend for 30 seconds after 5 consecutive 5xx errors”).
- Circuit Breakers: Prevents a single failing service from taking down the entire system by failing fast and allowing the service to recover.
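Conceptually, a weighted VirtualService route chooses a destination subset for each request in proportion to its weight. The sketch below reproduces that selection for the 90/10 example; the subset names and the pick_subset helper are hypothetical, and in ASM it is the Envoy sidecars that make this choice based on the route configuration.

```python
import random
from collections import Counter

# Mirrors a weighted route: 90% of requests to subset v1, 10% to v2.
ROUTES = [("v1", 90), ("v2", 10)]


def pick_subset(routes, rng=random):
    """Choose one subset with probability proportional to its weight."""
    subsets = [name for name, _ in routes]
    weights = [weight for _, weight in routes]
    return rng.choices(subsets, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is repeatable
    counts = Counter(pick_subset(ROUTES, rng) for _ in range(10_000))
    print(counts)  # roughly 9,000 v1 and 1,000 v2
```

Because the choice is made per request, the observed split only approaches 90/10 over many requests; a canary that receives little traffic can deviate noticeably in the short term.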
5.2 Policy Controller Guardrails
Policy Controller (built on OPA Gatekeeper) allows you to audit and enforce compliance across your fleet.
- Constraints: Declarative rules (e.g., “All namespaces must have a ‘cost-center’ label”).
- Audit Mode: See which resources are currently violating a policy without actually blocking them—ideal for onboarding legacy clusters.
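Enforcing and auditing a constraint can be modeled as the same check with two outcomes. The sketch below evaluates the ‘cost-center’ label rule against namespace objects represented as dictionaries; with "deny" a violating object is rejected, and with "dryrun" (the Gatekeeper enforcement action that reports without blocking) it is admitted and the violation is only recorded. The check_namespace function and Decision type are hypothetical; real constraints are Kubernetes resources backed by Rego constraint templates.

```python
from dataclasses import dataclass, field

REQUIRED_LABEL = "cost-center"


@dataclass
class Decision:
    allowed: bool
    violations: list = field(default_factory=list)


def check_namespace(namespace: dict, enforcement_action: str = "deny") -> Decision:
    """Apply the 'every namespace needs a cost-center label' rule.

    `enforcement_action` mimics Gatekeeper's "deny" vs "dryrun" behavior.
    """
    metadata = namespace.get("metadata", {})
    labels = metadata.get("labels", {})
    name = metadata.get("name", "<unnamed>")
    violations = []
    if REQUIRED_LABEL not in labels:
        violations.append(f"namespace {name!r} is missing label {REQUIRED_LABEL!r}")

    if enforcement_action == "dryrun":
        # Audit only: admit the object but keep the violation for reporting.
        return Decision(allowed=True, violations=violations)
    return Decision(allowed=not violations, violations=violations)


if __name__ == "__main__":
    legacy = {"metadata": {"name": "legacy-app", "labels": {}}}
    print(check_namespace(legacy, "dryrun"))  # allowed, violation reported
    print(check_namespace(legacy, "deny"))    # rejected
```

Starting a legacy cluster in dry-run mode surfaces every violation first, so teams can fix labels before switching the constraint to deny.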
6. Google Cloud VMware Engine (GCVE)
For organizations that want to move to the cloud without containerizing their apps, GCVE is often the fastest path.
- Native VMware Stack: You get a full VMware environment (vSphere, vCenter, vSAN, NSX-T) running on Google’s bare-metal infrastructure.
- Seamless Migration: Use HCX to “Live Migrate” (vMotion) VMs from your data center to GCP without noticeable downtime.
7. Interview Preparation: Architectural Deep Dive
1. Q: What is “Anthos” and what problem does it solve for the enterprise?
A: Anthos is a managed application platform that provides a consistent operational model across GCP, other clouds (AWS/Azure), and on-premises environments (VMware/bare metal). It solves “operational silos” by allowing teams to use the same Kubernetes-based tools (GKE), the same service mesh (Istio), and the same GitOps workflow (Anthos Config Management) regardless of where the hardware actually lives.
2. Q: Explain the “GitOps” workflow as implemented by Anthos Config Management (ACM).
A: In ACM, the Git repository is the single source of truth.
- A developer commits a YAML manifest to Git.
- The Config Sync agent running in the Anthos clusters detects the change.
- The agent pulls and applies the manifest to the cluster.
This ensures that all clusters in the fleet stay in sync and allows for “infrastructure versioning” and easy rollbacks: reverting a commit restores the previous configuration.
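At its core, this workflow is a reconciliation loop: compare the desired state in Git with the live state of the cluster and apply the difference. The sketch below runs one pass of such a loop over in-memory dictionaries keyed by resource name; the data shapes and the reconcile function are hypothetical stand-ins for Config Sync, which works through the Kubernetes API, and the sketch assumes every live resource is managed from the repository.

```python
def reconcile(desired: dict, live: dict) -> list[str]:
    """Make `live` match `desired` in place and return the actions taken.

    Both dicts map a resource name (e.g. "Deployment/web") to its spec.
    Assumes every resource in `live` is managed from the Git repository.
    """
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(f"create {name}")
            live[name] = spec
        elif live[name] != spec:
            actions.append(f"update {name}")  # reverts manual drift
            live[name] = spec
    for name in list(live):
        if name not in desired:
            actions.append(f"delete {name}")  # pruned because it left Git
            del live[name]
    return actions


if __name__ == "__main__":
    git_repo = {"Deployment/web": {"replicas": 3}, "ConfigMap/web-config": {"LOG_LEVEL": "info"}}
    cluster = {"Deployment/web": {"replicas": 5}, "Service/old-api": {"port": 80}}
    print(reconcile(git_repo, cluster))  # update, create, delete
    print(reconcile(git_repo, cluster))  # second pass: nothing left to do
```

Running the loop repeatedly is what keeps drift from persisting, and a rollback is simply a Git revert that the next pass applies.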
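3. Q: Compare Dedicated Interconnect and Partner Interconnect.
A: Both give you a private, high-performance link to Google Cloud; they differ in who provides the physical connection.
- Dedicated: You have a physical fiber connection between your router and a Google edge Point of Presence (PoP). Supports 10 Gbps or 100 Gbps circuits. Best for high-bandwidth needs.
- Partner: You connect to a service provider (like Equinix) that already has a link to Google. Better for smaller bandwidth needs (50 Mbps to 10 Gbps) or if your data center is not in a Google PoP city.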
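4. Q: What are the main benefits of Anthos Service Mesh (ASM)?
A: ASM provides security, observability, and traffic control without code changes:
- mTLS: Automatically encrypts service-to-service traffic between workloads that have sidecar proxies.
- Observability: Provides a “Service Graph” and golden signals (latency, traffic, errors) out of the box.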
- Resiliency: Handles retries, circuit breakers, and canary rollouts at the infrastructure layer.
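In Istio, circuit breaking is configured on a DestinationRule: connection-pool limits cap load, and outlier detection temporarily ejects an endpoint that keeps failing. The sketch below models only the outlier-detection part, using the rule “eject after N consecutive 5xx responses for a fixed time”. The OutlierDetector class and its parameters are hypothetical and only loosely mirror Istio’s consecutive5xxErrors and baseEjectionTime settings; Envoy’s actual behavior also lengthens repeated ejections and caps how many endpoints may be ejected at once.

```python
class OutlierDetector:
    """Eject an endpoint after `max_consecutive_5xx` straight server errors.

    Simplified model: each ejection lasts a fixed `ejection_seconds`.
    """

    def __init__(self, max_consecutive_5xx: int = 5, ejection_seconds: float = 30.0):
        self.max_consecutive_5xx = max_consecutive_5xx
        self.ejection_seconds = ejection_seconds
        self.consecutive_errors = {}  # endpoint -> current error streak
        self.ejected_until = {}       # endpoint -> time it may receive traffic again

    def record(self, endpoint: str, status: int, now: float) -> None:
        """Update the error streak for `endpoint` after a response."""
        if status >= 500:
            streak = self.consecutive_errors.get(endpoint, 0) + 1
            self.consecutive_errors[endpoint] = streak
            if streak >= self.max_consecutive_5xx:
                # Open the circuit: stop routing to this endpoint for a while.
                self.ejected_until[endpoint] = now + self.ejection_seconds
                self.consecutive_errors[endpoint] = 0
        else:
            self.consecutive_errors[endpoint] = 0  # any success resets the streak

    def healthy(self, endpoints, now: float):
        """Return the endpoints that are not currently ejected."""
        return [e for e in endpoints if self.ejected_until.get(e, 0.0) <= now]


if __name__ == "__main__":
    detector = OutlierDetector(max_consecutive_5xx=3, ejection_seconds=10.0)
    pods = ["10.0.0.1", "10.0.0.2"]
    for t in range(3):
        detector.record("10.0.0.2", 503, now=float(t))
    print(detector.healthy(pods, now=3.0))   # ['10.0.0.1']
    print(detector.healthy(pods, now=13.0))  # both endpoints are back
```

Failing fast against the ejected endpoint is what stops one bad pod from dragging down its callers, while the timed ejection gives it a chance to recover.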