
Chapter 11: No Servers to Manage - Cloud Run and Cloud Functions

Serverless in Google Cloud is not just about “functions.” It is a comprehensive ecosystem that allows you to deploy containers, functions, or entire web applications without managing a single virtual machine. The core philosophy is abstraction: you provide the code, and Google provides the scale.

1. Cloud Run: The Future of Serverless

Cloud Run is Google’s premier serverless offering. It is built on Knative, an open-source Kubernetes-based platform, but it abstracts away the Kubernetes complexity entirely.

Concurrency and Cold Starts: The Performance Trade-off

Cloud Run's defining performance feature is request concurrency.
  • Efficiency: A single Cloud Run instance can handle up to 1,000 concurrent requests (the default is 80).
  • Cold Start Mitigation: Because one instance stays warm for many users, the “Cold Start” penalty is hit far less frequently than in traditional FaaS (Function as a Service) models, where every instance serves exactly one request at a time.
  • Cost: Overlapping requests share a single instance's billed CPU and memory time, making Cloud Run significantly cheaper for high-traffic APIs.
Min Instances: The Warm Pool Strategy

To eliminate cold starts entirely, you can configure Min Instances.
  • Setting: --min-instances=5 keeps at least 5 instances perpetually warm.
  • Trade-off: You pay for 5 instances continuously, even if they are idle. This is a guaranteed cost for guaranteed low latency.
  • Rule of Thumb: For a latency-sensitive API (say, a 1-second SLA on ~1M requests/day), keeping 2-3 min instances warm typically absorbs the vast majority of cold starts while adding only a small, predictable fraction to the bill.
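Both knobs map directly onto deploy flags. A minimal sketch (the service name and image path are placeholders; --concurrency and --min-instances are real gcloud run flags):

```shell
# Deploy with a warm pool of 2 instances, each serving up to 250
# concurrent requests before Cloud Run scales out another instance.
gcloud run deploy my-api \
    --image=us-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=us-central1 \
    --concurrency=250 \
    --min-instances=2
```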

Cloud Run Services vs. Jobs

  • Services: For request-driven workloads (APIs, Web Frontends). They scale to zero when idle and scale up instantly based on traffic.
  • Jobs: For data processing, database migrations, or scheduled tasks. They run to completion and do not listen for HTTP requests.
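A Job can be sketched with the gcloud run jobs commands (the job name and image are placeholders):

```shell
# Create a run-to-completion job: no HTTP endpoint, no listening port.
gcloud run jobs create nightly-migration \
    --image=us-docker.pkg.dev/my-project/my-repo/migrator:v1 \
    --region=us-central1 \
    --max-retries=3

# Execute it on demand (or wire it to Cloud Scheduler for cron-style runs).
gcloud run jobs execute nightly-migration --region=us-central1
```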

Advanced Cloud Run Patterns

Sidecar Containers (Multi-Container)

Cloud Run now supports Sidecars. You can run multiple containers within a single service.
  • Use Cases:
    • Envoy/Nginx: As a local proxy for auth or caching.
    • Logging Agents: Shipping custom logs to third-party tools (Datadog, Splunk).
    • Cloud SQL Proxy: Running the proxy as a sidecar for better security and performance.
  • Constraint: Only one container (the “ingress” container) can listen for HTTP requests on the specified port.
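A multi-container service declared in YAML might look like the sketch below (service and image names are placeholders); only the app container exposes a port, which makes it the ingress container. You would apply it with gcloud run services replace service.yaml.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-service
spec:
  template:
    spec:
      containers:
        - name: app                # ingress container: the only one with a port
          image: us-docker.pkg.dev/my-project/my-repo/api:v1
          ports:
            - containerPort: 8080
        - name: log-shipper        # sidecar: no port; shares localhost with app
          image: us-docker.pkg.dev/my-project/my-repo/log-agent:v1
```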

Direct VPC Egress

The traditional “Serverless VPC Access Connector” was a separate VM-based bottleneck. Cloud Run now supports Direct VPC Egress.
  • Performance: Lower latency and higher throughput (up to 10 Gbps).
  • Security: No need to manage a separate “Connector” subnet.
  • Cost: Eliminates the cost of the Connector VMs.
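Direct VPC Egress is configured with deploy flags rather than a separate connector resource. A sketch with placeholder network and subnet names (--network, --subnet, and --vpc-egress are real gcloud run flags):

```shell
# Route egress straight into the VPC -- no connector VMs involved.
gcloud run deploy my-api \
    --image=us-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=us-central1 \
    --network=my-vpc \
    --subnet=my-subnet \
    --vpc-egress=private-ranges-only
```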

2. Cloud Functions (2nd Gen) Deep Dive

Cloud Functions 2nd Gen is a major architectural leap: it is built on top of Cloud Run and Eventarc.

2.1 The Event-Driven Heart

2nd Gen functions are designed to react to the world via Eventarc.
  • 90+ Event Sources: Including Cloud Storage (object finalized), Pub/Sub (message published), and, via Cloud Audit Logs, services like BigQuery (job completed).
  • Architecture: Eventarc captures the event → wraps it in a CloudEvent JSON envelope → POSTs it to the Cloud Function’s HTTP endpoint.
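To make the flow concrete, here is a minimal Python sketch of handling the CloudEvent JSON that Eventarc POSTs to the function's endpoint. The field names follow the CloudEvents spec and a Cloud Storage "object finalized" event, but the sample payload itself is fabricated for illustration:

```python
# Handle the CloudEvent JSON body delivered by Eventarc. In a real function
# this dict would come from the parsed HTTP request body.

def handle_gcs_event(cloud_event: dict) -> str:
    """Extract the uploaded object's location from a storage CloudEvent."""
    data = cloud_event["data"]
    return f"gs://{data['bucket']}/{data['name']}"

sample = {
    "specversion": "1.0",
    "type": "google.cloud.storage.object.v1.finalized",
    "source": "//storage.googleapis.com/projects/_/buckets/my-bucket",
    "data": {"bucket": "my-bucket", "name": "uploads/report.csv"},
}

print(handle_gcs_event(sample))  # → gs://my-bucket/uploads/report.csv
```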

2.2 Security: Secret Manager Integration

Never bake API keys into plaintext environment variables at deploy time.
  • Implementation: Reference secrets directly from Secret Manager, mounted as volume files or exposed as environment variables.
  • Benefit: Secrets stay out of your source and deploy scripts, and rotation does not require a code change. If you pin to the “latest” version, new instances pick up an updated secret without a redeploy (instances already running keep the value they started with).
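A sketch of the volume-mount pattern in application code: the mount path (e.g. /secrets/api-key) is whatever you configure at deploy time; here a temporary file stands in for the mounted secret so the snippet is self-contained.

```python
import os
import tempfile
from pathlib import Path

def read_secret(path: str) -> str:
    # Re-read on each call rather than caching at import time, so a fresh
    # instance (or a re-mounted file) always yields the current version.
    return Path(path).read_text(encoding="utf-8").strip()

# Simulate the Secret Manager volume mount with a temp file.
with tempfile.TemporaryDirectory() as mount_dir:
    secret_file = os.path.join(mount_dir, "api-key")
    Path(secret_file).write_text("s3cr3t-value\n", encoding="utf-8")
    print(read_secret(secret_file))  # → s3cr3t-value
```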

3. Orchestration: Cloud Workflows

When your serverless architecture grows beyond 3-4 services, avoid direct “chaining” (where A calls B, and B calls C). This creates “distributed spaghetti”: retry logic, error handling, and state end up smeared across every service.

3.1 Why Workflows?

Cloud Workflows is a serverless orchestrator that allows you to chain services with:
  • Retries: Automatic backoff if a service is down.
  • State Management: Storing variables across different steps.
  • Long-Running: A single workflow can wait for up to 1 year.

3.2 Workflow Example (YAML)

main:
  steps:
    - call_translation_service:
        call: http.post
        args:
          url: https://translate-api-xyz.a.run.app
          body:
            text: "Hello World"
        result: translation_output
    - save_to_db:
        call: http.post
        args:
          url: https://db-api-xyz.a.run.app
          body:
            data: ${translation_output.body}
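The workflow above can be deployed and invoked with the gcloud workflows commands (the workflow name and file path are placeholders):

```shell
gcloud workflows deploy translate-pipeline \
    --source=workflow.yaml \
    --location=us-central1

gcloud workflows run translate-pipeline --location=us-central1
```

Note that calling a private Cloud Run service from a workflow typically also requires adding OIDC authentication to the http.post step's args.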

4. App Engine: The Mature PaaS

App Engine was Google’s first cloud product (2008). While Cloud Run is the modern choice, App Engine remains a powerful Platform-as-a-Service for traditional web apps.

Standard vs. Flexible

Feature     | Standard Environment                            | Flexible Environment
Scaling     | Seconds (scales to zero)                        | Minutes (cannot scale to zero)
Runtime     | Specific versions (Python 3.10, Node 18, etc.)  | Any (Docker-based)
Hardware    | Sandboxed                                       | GCE Virtual Machines
Networking  | Internal                                        | VPC-enabled

Traffic Splitting

App Engine makes “Canary Deployments” incredibly easy. You can deploy a new version and split traffic (e.g., 95% to v1, 5% to v2) based on IP address, cookies, or random selection.
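A canary split can be sketched with the gcloud app commands (service and version IDs are placeholders; --splits and --split-by are real flags):

```shell
# Canary: keep 95% of traffic on v1, send 5% to v2 by random assignment.
gcloud app services set-traffic default \
    --splits=v1=0.95,v2=0.05 \
    --split-by=random
```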

5. Performance Tuning: CPU Allocation and Probes

5.1 CPU Allocation: “Always On” vs. “During Requests”

By default, Cloud Run only allocates CPU during request processing.
  • CPU Boost: For 2nd generation execution environments, Cloud Run can “boost” CPU during startup to reduce cold start latency.
  • Always-on CPU: You can choose to allocate CPU even when no requests are being processed. This is useful for background tasks or maintaining heavy in-memory caches.
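Both options correspond to real gcloud run flags; the service name here is a placeholder:

```shell
# --cpu-boost speeds up startup; --no-cpu-throttling keeps CPU
# allocated between requests (always-on CPU).
gcloud run services update my-api \
    --region=us-central1 \
    --cpu-boost \
    --no-cpu-throttling
```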

5.2 Startup Probes

If your container takes a long time to initialize (e.g., a heavy Java app), use Startup Probes. Cloud Run will wait until the probe succeeds before sending any traffic to the instance, ensuring users don’t see 503 errors during scale-up.
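In the service YAML, a startup probe sits inside the container spec. A sketch (the /ready path is a hypothetical endpoint your app would expose):

```yaml
# Inside spec.template.spec.containers[0] of the service YAML:
startupProbe:
  httpGet:
    path: /ready         # hypothetical readiness endpoint in your app
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # tolerate up to ~300s of initialization
```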

6. Event-Driven Architecture with Eventarc

Eventarc is the glue that connects GCP services. It allows you to build decoupled systems where a change in one service triggers an action in another.
  • Flow: An Event occurs (e.g., a file is uploaded to GCS) → Eventarc captures it → Eventarc routes it to a Trigger (e.g., a Cloud Function or Cloud Run service).
  • Format: It uses the CloudEvents standard, ensuring your event-driven code is portable.
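The GCS-upload flow above can be wired up with a single trigger. A sketch with placeholder names (the gcloud eventarc flags and the event type string are real):

```shell
# Fire the Cloud Run service whenever an object lands in the bucket.
gcloud eventarc triggers create upload-trigger \
    --location=us-central1 \
    --destination-run-service=my-service \
    --destination-run-region=us-central1 \
    --event-filters="type=google.cloud.storage.object.v1.finalized" \
    --event-filters="bucket=my-bucket" \
    --service-account=trigger-sa@my-project.iam.gserviceaccount.com
```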

7. Serverless Security and Connectivity

VPC Connector

Serverless services live outside your VPC by default. To access a private Cloud SQL instance or a Redis cache in your VPC, route traffic through a Serverless VPC Access Connector (or, on Cloud Run, the newer Direct VPC Egress covered earlier).
  • The connector creates a bridge between the serverless environment and your VPC, allowing traffic to flow over private internal IPs.
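The connector path can be sketched in two commands (network, range, and service names are placeholders):

```shell
# One-time: create the connector (its /28 range must not overlap the VPC).
gcloud compute networks vpc-access connectors create my-connector \
    --region=us-central1 \
    --network=my-vpc \
    --range=10.8.0.0/28

# Attach it at deploy time.
gcloud run deploy my-api \
    --image=us-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=us-central1 \
    --vpc-connector=my-connector
```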

IAM and the “Least Privilege” Service Account

By default, serverless services use the “Default Compute Service Account,” which is far too powerful. Always:
  1. Create a Custom Service Account.
  2. Grant only the necessary roles (e.g., roles/storage.objectViewer).
  3. Assign that SA to the Cloud Run service or Cloud Function.
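The three steps above map onto three commands (project, account, and service names are placeholders):

```shell
# 1. Create a dedicated service account.
gcloud iam service-accounts create my-api-sa

# 2. Grant only what the service needs.
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-api-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# 3. Run the service as that account.
gcloud run deploy my-api \
    --image=us-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=us-central1 \
    --service-account=my-api-sa@my-project.iam.gserviceaccount.com
```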

8. Interview Preparation: Architectural Deep Dive

1. Q: How does Cloud Run handle concurrency compared to traditional Functions-as-a-Service (FaaS)?
A: Traditional FaaS (like AWS Lambda or GCF 1st Gen) handles one request per instance. This causes many “cold starts” during traffic spikes. Cloud Run supports concurrency (up to 1,000 requests per instance). This means a single container instance can serve multiple users simultaneously, drastically reducing the number of cold starts and significantly lowering costs for high-traffic services.

2. Q: What is the purpose of the “Serverless VPC Access Connector”?
A: Cloud Run and Cloud Functions live in a Google-managed tenant project outside your VPC. By default, they cannot reach resources with private IPs (like Cloud SQL or Memorystore). The VPC Access Connector creates a bridge (via a small VM subnet) that allows the serverless service to route traffic to your VPC’s internal IP addresses securely, without going over the public internet.

3. Q: When should you use Cloud Run “Jobs” instead of “Services”?
A: Use Services for request-driven workloads that listen for HTTP/gRPC requests (APIs, web apps) and scale to zero. Use Jobs for task-driven workloads that run to completion and do not have an HTTP endpoint. Examples for Jobs include database migrations, scheduled data processing, or batch report generation. Jobs can be triggered manually or via a cron schedule (Cloud Scheduler).

4. Q: Explain the concept of “Min Instances” and the cost-performance trade-off.
A: Setting --min-instances (e.g., to 2) keeps a “warm pool” of instances always running. Benefit: it eliminates cold start latency for the first requests. Trade-off: you are billed for these instances even if they are idle. This is a “pay for performance” model used for latency-sensitive production APIs where a 2-second cold start is unacceptable.

5. Q: How does “Traffic Splitting” in App Engine or Cloud Run facilitate Canary deployments?
A: Traffic splitting allows you to deploy a new version (Revision) of your service without routing 100% of traffic to it. You can specify a split (e.g., 95% to v1, 5% to v2) based on random selection or tags. This allows you to monitor the health and performance of the new version in production with a small subset of real users before completing the rollout, minimizing the “Blast Radius” of potential bugs.

Implementation: The “Serverless Pro” Lab

Deploying a Multi-Region Cloud Run API

# 1. Deploy the service to two regions
gcloud run deploy api-us \
    --image=us-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=us-central1 \
    --service-account=my-api-sa@$PROJECT_ID.iam.gserviceaccount.com

gcloud run deploy api-eu \
    --image=europe-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=europe-west1 \
    --service-account=my-api-sa@$PROJECT_ID.iam.gserviceaccount.com

# 2. Use a Global HTTP(S) Load Balancer to route to both
# This provides a single global IP that routes users to the nearest Cloud Run region.

Pro-Tip: Cloud Run Revision Tags

When you deploy a new version of a Cloud Run service, use Revision Tags:

gcloud run deploy --image=... --tag=beta

This allows you to access the new version at a specific URL (beta---my-service-xyz.a.run.app) for testing before you route any production traffic to it.