Chapter 11: No Servers to Manage - Cloud Run and Cloud Functions
Serverless in Google Cloud is not just about “functions.” It is a comprehensive ecosystem that allows you to deploy containers, functions, or entire web applications without managing a single virtual machine. The core philosophy is abstraction: you provide the code, and Google provides the scale.

1. Cloud Run: The Future of Serverless
Cloud Run is Google’s premier serverless offering. It is built on Knative, an open-source Kubernetes-based platform, but it abstracts away the Kubernetes complexity entirely.

1.1 Concurrency and Cold Starts: The Performance Trade-off
Cloud Run supports concurrency: a single instance can serve many requests simultaneously.
- Efficiency: A single Cloud Run instance can handle up to 1,000 concurrent requests (the default limit is 80).
- Cold Start Mitigation: Because one instance stays warm for many users, the “Cold Start” penalty is hit far less frequently than in traditional FaaS (Function as a Service) models.
- Cost: You are billed for the resources used during the overlap of requests, making it significantly cheaper for high-traffic APIs.
- Setting: `--min-instances=5` keeps at least 5 instances perpetually warm (see the sketch after this list).
- Trade-off: You pay for 5 instances continuously, even if they are idle. This is a guaranteed cost for guaranteed low latency.
- SRE rule of thumb: If your service has a 100ms p99 cold start time against a 1-second SLA and serves 1M requests/day, keeping 2-3 min instances can prevent ~99% of cold starts while increasing cost by only ~10%.
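Both knobs are ordinary deploy-time flags. A minimal sketch, assuming hypothetical service, project, and image names:

```bash
# --concurrency caps simultaneous requests per instance (default 80).
# --min-instances keeps a warm pool to avoid cold starts (billed while idle).
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest \
  --region=us-central1 \
  --concurrency=250 \
  --min-instances=2
```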
1.2 Cloud Run Services vs. Jobs
- Services: For request-driven workloads (APIs, Web Frontends). They scale to zero when idle and scale up instantly based on traffic.
- Jobs: For data processing, database migrations, or scheduled tasks. They run to completion and do not listen for HTTP requests (a sketch follows this list).
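A hedged sketch of a Job (all names hypothetical): create it once, then execute it on demand. Cloud Scheduler can invoke the same execution API on a cron.

```bash
# Create a Job that runs a container to completion (no HTTP endpoint).
gcloud run jobs create nightly-report \
  --image=us-docker.pkg.dev/my-project/my-repo/report:latest \
  --tasks=1 \
  --max-retries=3 \
  --region=us-central1

# Execute it manually.
gcloud run jobs execute nightly-report --region=us-central1
```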
1.3 Advanced Cloud Run Patterns
Sidecar Containers (Multi-Container)
Cloud Run now supports sidecars: you can run multiple containers within a single service (a YAML sketch follows this list).
- Use Cases:
- Envoy/Nginx: As a local proxy for auth or caching.
- Logging Agents: Shipping custom logs to third-party tools (Datadog, Splunk).
- Cloud SQL Proxy: Running the proxy as a sidecar for better security and performance.
- Constraint: Only one container (the “ingress” container) can listen for HTTP requests on the specified port.
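A minimal multi-container sketch, assuming hypothetical app and logging-agent images; only the first container declares a port, which makes it the ingress container. Deploy it with `gcloud run services replace service.yaml`.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      containers:
        - name: app          # ingress container: the only one with a port
          image: us-docker.pkg.dev/my-project/my-repo/app:latest
          ports:
            - containerPort: 8080
        - name: log-agent    # sidecar: no port, runs alongside the app
          image: us-docker.pkg.dev/my-project/my-repo/log-agent:latest
```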
Direct VPC Egress
The traditional “Serverless VPC Access Connector” was a separate VM-based bottleneck. Cloud Run now supports Direct VPC Egress (flags sketched after this list).
- Performance: Lower latency and higher throughput (up to 10 Gbps).
- Security: No need to manage a separate “Connector” subnet.
- Cost: Eliminates the cost of the Connector VMs.
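Direct VPC Egress is enabled per service at deploy time. A sketch with hypothetical network and subnet names:

```bash
# Route egress straight into the VPC, with no connector VMs in the path.
# --vpc-egress accepts 'private-ranges-only' or 'all-traffic'.
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest \
  --region=us-central1 \
  --network=my-vpc \
  --subnet=my-subnet \
  --vpc-egress=private-ranges-only
```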
2. Cloud Functions (2nd Gen) Deep Dive
Cloud Functions 2nd Gen is a major architectural leap. It is actually built on top of Cloud Run and Eventarc.

2.1 The Event-Driven Heart
2nd Gen functions are designed to react to the world via Eventarc.
- 90+ Event Sources: Including Cloud Storage (File Created), Pub/Sub (Message Published), and BigQuery (Query Finished).
- Architecture: Eventarc captures the event → wraps it in a CloudEvents JSON envelope → POSTs it to the Cloud Function’s HTTP endpoint. Trigger creation is sketched below.
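As a sketch (bucket, service, and service-account names are hypothetical), a trigger that fires a Cloud Run-hosted function whenever an object lands in a bucket:

```bash
# Fire my-service whenever a new object is finalized in my-bucket.
gcloud eventarc triggers create gcs-upload-trigger \
  --location=us-central1 \
  --destination-run-service=my-service \
  --destination-run-region=us-central1 \
  --event-filters="type=google.cloud.storage.object.v1.finalized" \
  --event-filters="bucket=my-bucket" \
  --service-account=trigger-sa@my-project.iam.gserviceaccount.com
```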
2.2 Security: Secret Manager Integration
Never store raw API keys as plaintext environment variables.
- Implementation: Mount secrets as volumes, or expose them as environment variables that reference Secret Manager directly (sketched below).
- Benefit: Rotation requires no redeploy. If you reference the `latest` version, new instances automatically pick up the updated secret from Secret Manager when they start.
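A hedged example (secret names are hypothetical) showing both styles through the `--set-secrets` flag; only a reference, never the secret value, is stored on the service:

```bash
# API_KEY becomes an env var; /secrets/db-pass becomes a mounted file.
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest \
  --region=us-central1 \
  --set-secrets="API_KEY=my-api-key:latest,/secrets/db-pass=db-password:latest"
```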
3. Orchestration: Cloud Workflows
When your serverless architecture grows beyond 3-4 services, you shouldn’t use “Chaining” (where A calls B, B calls C). This creates “Distributed Spaghetti.”

3.1 Why Workflows?
Cloud Workflows is a serverless orchestrator that allows you to chain services with:
- Retries: Automatic backoff if a service is down.
- State Management: Storing variables across different steps.
- Long-Running: A single workflow can wait for up to 1 year.
3.2 Workflow Example (YAML)
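A minimal sketch, assuming two hypothetical Cloud Run URLs; `http.default_retry` is a built-in retry policy with exponential backoff. Deploy it with `gcloud workflows deploy my-workflow --source=workflow.yaml`.

```yaml
main:
  params: [input]
  steps:
    - callServiceA:
        try:
          call: http.get
          args:
            url: https://service-a-xyz.a.run.app/process
            auth:
              type: OIDC
          result: aResult
        retry: ${http.default_retry}   # automatic backoff if Service A is down
    - callServiceB:
        call: http.post
        args:
          url: https://service-b-xyz.a.run.app/save
          auth:
            type: OIDC
          body:
            data: ${aResult.body}      # state carried between steps
        result: bResult
    - done:
        return: ${bResult.body}
```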
4. App Engine: The Mature PaaS
App Engine was Google’s first cloud product (2008). While Cloud Run is the modern choice, App Engine remains a powerful Platform-as-a-Service for traditional web apps.

Standard vs. Flexible
| Feature | Standard Environment | Flexible Environment |
|---|---|---|
| Scaling | Seconds (to Zero) | Minutes (Cannot scale to zero) |
| Runtime | Specific versions (Python 3.10, Node 18, etc.) | Any (Docker-based) |
| Hardware | Sandboxed | GCE Virtual Machines |
| Networking | VPC via Serverless VPC Access connector | Native VPC (VMs live in your network) |
Traffic Splitting
App Engine makes “Canary Deployments” incredibly easy. You can deploy a new version and split traffic (e.g., 95% to v1, 5% to v2) based on IP address, cookies, or random selection, as sketched below.
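For example (version IDs hypothetical), a 5% canary on the default service:

```bash
# Send 5% of traffic to v2, chosen randomly per request.
gcloud app services set-traffic default \
  --splits=v1=0.95,v2=0.05 \
  --split-by=random
```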
5. Performance Tuning: CPU Allocation and Probes
5.1 CPU Allocation: “Always On” vs. “During Requests”
By default, Cloud Run only allocates CPU during request processing.
- CPU Boost: For 2nd generation execution environments, Cloud Run can “boost” CPU during startup to reduce cold start latency.
- Always-on CPU: You can choose to allocate CPU even when no requests are being processed. This is useful for background tasks or maintaining heavy in-memory caches. Both behaviors are deploy-time flags, sketched below.
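A hedged sketch (service and image names hypothetical):

```bash
# --cpu-boost: extra CPU during container startup (faster cold starts).
# --no-cpu-throttling: CPU stays allocated between requests
#   (enables background work, but is billed continuously).
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest \
  --region=us-central1 \
  --cpu-boost \
  --no-cpu-throttling
```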
5.2 Startup Probes
If your container takes a long time to initialize (e.g., a heavy Java app), use Startup Probes. Cloud Run will wait until the probe succeeds before sending any traffic to the instance, ensuring users don’t see 503 errors during scale-up.
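A minimal probe sketch for the service YAML (the path and thresholds are hypothetical); Cloud Run retries the probe until it succeeds or the failure threshold is exhausted:

```yaml
spec:
  template:
    spec:
      containers:
        - image: us-docker.pkg.dev/my-project/my-repo/my-api:latest
          startupProbe:
            httpGet:
              path: /healthz      # app reports readiness here
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 18  # allow up to ~3 minutes to start
```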
6. Event-Driven Architecture with Eventarc

Eventarc is the glue that connects GCP services. It allows you to build decoupled systems where a change in one service triggers an action in another.
- Flow: An Event occurs (e.g., a file is uploaded to GCS) → Eventarc captures it → Eventarc routes it to a Trigger (e.g., a Cloud Function or Cloud Run service).
- Format: It uses the CloudEvents standard, ensuring your event-driven code is portable.
7. Serverless Security and Connectivity
VPC Connector
Serverless services live outside your VPC by default. To access a private Cloud SQL instance or a Redis cache in your VPC, you must use a Serverless VPC Access Connector (or, for Cloud Run, the newer Direct VPC Egress described in section 1.3). A sketch follows.
- It creates a bridge between the serverless environment and your VPC, allowing traffic to flow over private internal IPs.
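A hedged sketch (connector, network, and IP range are hypothetical): create the connector once, then attach it per service at deploy time.

```bash
# One-time: create the connector in the VPC (uses a dedicated /28 range).
gcloud compute networks vpc-access connectors create my-connector \
  --region=us-central1 \
  --network=my-vpc \
  --range=10.8.0.0/28

# Per service: route egress through the connector.
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest \
  --region=us-central1 \
  --vpc-connector=my-connector
```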
IAM and the “Least Privilege” Service Account
By default, serverless services use the “Default Compute Service Account,” which is far too powerful. Always:
1. Create a custom service account.
2. Grant only the necessary roles (e.g., roles/storage.objectViewer).
3. Assign that SA to the Cloud Run service or Cloud Function.
A minimal sequence is sketched below.
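All names here are hypothetical:

```bash
# 1. Create a dedicated runtime identity.
gcloud iam service-accounts create api-runtime

# 2. Grant only what the service actually needs.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:api-runtime@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# 3. Run the service as that identity instead of the default SA.
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest \
  --region=us-central1 \
  --service-account=api-runtime@my-project.iam.gserviceaccount.com
```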
8. Interview Preparation: Architectural Deep Dive
1. Q: How does Cloud Run handle concurrency compared to traditional Functions-as-a-Service (FaaS)?
A: Traditional FaaS (like AWS Lambda or GCF 1st Gen) handles one request per instance. This causes many “cold starts” during traffic spikes. Cloud Run supports concurrency (up to 1,000 requests per instance). This means a single container instance can serve multiple users simultaneously, drastically reducing the number of cold starts and significantly lowering costs for high-traffic services.
2. Q: What is the purpose of the “Serverless VPC Access Connector”?
A: Cloud Run and Cloud Functions live in a Google-managed tenant project outside your VPC. By default, they cannot reach resources with private IPs (like Cloud SQL or Memorystore). The VPC Access Connector creates a bridge (via a small VM subnet) that allows the serverless service to route traffic to your VPC’s internal IP addresses securely, without going over the public internet.
3. Q: When should you use Cloud Run “Jobs” instead of “Services”?
A: Use Services for request-driven workloads that listen for HTTP/gRPC requests (APIs, web apps) and scale to zero. Use Jobs for task-driven workloads that run to completion and do not have an HTTP endpoint. Examples for Jobs include database migrations, scheduled data processing, or batch report generation. Jobs can be triggered manually or via a cron schedule (Cloud Scheduler).
4. Q: Explain the concept of “Min Instances” and the cost-performance trade-off.
A: Setting `--min-instances` (e.g., to 2) keeps a “warm pool” of instances always running. Benefit: It eliminates cold start latency for the first requests. Trade-off: You are billed for these instances even if they are idle. This is a “pay for performance” model used for latency-sensitive production APIs where a 2-second cold start is unacceptable.
5. Q: How does “Traffic Splitting” in App Engine or Cloud Run facilitate Canary deployments?
A: Traffic splitting allows you to deploy a new version (Revision) of your service without routing 100% of traffic to it. You can specify a split (e.g., 95% to v1, 5% to v2) based on random selection or tags. This allows you to monitor the health and performance of the new version in production with a small subset of real users before completing the rollout, minimizing the “Blast Radius” of potential bugs.
Implementation: The “Serverless Pro” Lab
Deploying a Multi-Region Cloud Run API
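A condensed sketch of the pattern (all names hypothetical): deploy the same image to two regions, then front them with a global external load balancer via serverless NEGs. The TLS certificate and forwarding-rule steps are omitted for brevity.

```bash
# Same service, two regions.
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest --region=us-central1
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/my-repo/my-api:latest --region=europe-west1

# One serverless NEG per region, pointing at the service.
gcloud compute network-endpoint-groups create neg-us --region=us-central1 \
  --network-endpoint-type=serverless --cloud-run-service=my-api
gcloud compute network-endpoint-groups create neg-eu --region=europe-west1 \
  --network-endpoint-type=serverless --cloud-run-service=my-api

# A single global backend service fronts both NEGs.
gcloud compute backend-services create my-api-backend --global \
  --load-balancing-scheme=EXTERNAL_MANAGED
gcloud compute backend-services add-backend my-api-backend --global \
  --network-endpoint-group=neg-us --network-endpoint-group-region=us-central1
gcloud compute backend-services add-backend my-api-backend --global \
  --network-endpoint-group=neg-eu --network-endpoint-group-region=europe-west1

# The URL map routes all paths to the backend; the LB sends each
# user to the closest healthy region.
gcloud compute url-maps create my-api-lb --default-service=my-api-backend
```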
Pro-Tip: Cloud Run Revision Tags
When you deploy a new version of a Cloud Run service, use Revision Tags:
`gcloud run deploy --image=... --tag=beta`
This allows you to access the new version at a specific URL (beta---my-service-xyz.a.run.app) for testing before you route any production traffic to it.
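Once the tagged revision checks out, you can shift real traffic gradually (service name and percentage hypothetical):

```bash
# Canary: send 5% of live traffic to the 'beta' revision.
gcloud run services update-traffic my-service --to-tags=beta=5

# Promote: route everything to the latest revision when satisfied.
gcloud run services update-traffic my-service --to-latest
```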