Chapter 8: NoSQL and Caching - Firestore, Bigtable, and Memorystore
Google Cloud’s NoSQL ecosystem is designed for extreme variety and scale. While Cloud SQL and Spanner handle relational data, NoSQL services like Bigtable, Firestore, and Memorystore handle unstructured data, real-time synchronization, and sub-millisecond caching at petabyte scale.
1. Cloud Bigtable: The Wide-Column Giant
Bigtable is Google’s flagship NoSQL database; it powers Google Search, Maps, and Gmail. It is a sparsely populated table that can scale to billions of rows and thousands of columns.
1.1 Architecture: The Distributed Storage Engine
Bigtable’s performance comes from its complete separation of compute and storage.
- Compute (Nodes): Bigtable nodes handle metadata management and request routing. They do not store data locally.
- Storage (Colossus): Data is stored in SSTables (Sorted String Tables) on Google’s Colossus (distributed file system).
- The Write Path (Life of a Write):
- Commit Log: The write is first recorded in a persistent commit log on Colossus (for durability).
- Memtable: The data is then written to a sorted in-memory buffer called the Memtable.
- Acknowledgement: The client receives a success response (200 OK).
- Flushing: When the Memtable reaches a certain size, it is flushed to Colossus as a new SSTable.
- The Read Path (Bloom Filters):
- To avoid reading every SSTable on disk, Bigtable uses Bloom Filters.
- A Bloom Filter is a probabilistic data structure that can tell Bigtable if a row key might be in an SSTable or definitely isn’t. This significantly reduces unnecessary I/O.
- Compaction: Background processes merge smaller SSTables into larger ones and remove deleted data (Garbage Collection).
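To make the write path concrete, here is a minimal client-side sketch using the Python google-cloud-bigtable library; the project, instance, table, and column-family names are placeholders. From the client's perspective, commit() returns once the commit-log and Memtable steps above have completed.

```python
# pip install google-cloud-bigtable
from google.cloud import bigtable

# Placeholder identifiers for illustration only.
client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("metrics")

# One mutation: Bigtable appends it to the commit log on Colossus,
# buffers it in the Memtable, and only then acknowledges the request.
row = table.direct_row(b"sensor-42#1735732800")
row.set_cell("readings", "temp_c", b"21.5")
row.commit()  # flushing to SSTables happens later, in the background
```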
1.2 Row Key Design: The Principal’s Art
In Bigtable, the Row Key is the only indexed field. Your entire performance profile depends on how you design this string.
1.2.1 Advanced Key Design Patterns
| Pattern | Example Key | Best For | Technical Logic |
|---|---|---|---|
| Reverse Domain | com.google.search | Web Crawling | Groups related subdomains together for range scans. |
| Hashed Prefix | md5(id)_id | High Write Volume | Uniformly distributes data across all nodes by randomizing the start of the key. |
| Reversed Timestamp | 9999999999 - timestamp | Recent Data Access | Ensures the most recent data (largest timestamp) is always at the “top” of the table. |
| Salting | shard_id#timestamp | Extreme Hotspots | Adds a random prefix (0-9) to split a single hot key into multiple tablets. |
1.2.2 The “Walking Hotspot” Problem
Scenario: You are logging sensor data every second using the key timestamp#sensor_id.
- Result: Because keys are sorted, all writes for the current second hit the same Tablet. As time moves, the “heat” moves to the next tablet. This is a “walking hotspot.”
- Solution: Promote the sensor_id to the front of the key: sensor_id#timestamp. This spreads the load across as many tablets as you have sensors.
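A short sketch of how such keys might be constructed in Python; the hashed-prefix helper and the MAX_EPOCH constant are illustrative conventions, not a prescribed format.

```python
import hashlib
import time

MAX_EPOCH = 9_999_999_999  # subtracting from this reverses the sort order

def hashed_prefix_key(entity_id: str) -> str:
    # Hashed Prefix: randomize the start of the key so writes spread
    # evenly across tablets (at the cost of efficient range scans).
    prefix = hashlib.md5(entity_id.encode()).hexdigest()[:8]
    return f"{prefix}_{entity_id}"

def sensor_key(sensor_id: str, epoch_seconds=None) -> str:
    # Field promotion + reversed timestamp: sensor_id first avoids the
    # walking hotspot; the reversed timestamp keeps the newest reading
    # at the top of each sensor's key range.
    ts = int(time.time()) if epoch_seconds is None else epoch_seconds
    return f"{sensor_id}#{MAX_EPOCH - ts}"

print(hashed_prefix_key("user-123"))  # hashed prefix + original id
print(sensor_key("sensor-42"))        # newest readings sort first per sensor
```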
1.3 Schema Design and Performance Tuning
Column Families vs. Column Qualifiers
- Column Family: A logical grouping of related columns (e.g., user_profile, user_settings). Bigtable compresses data by column family, so keep the number of families small (usually fewer than 10).
- Column Qualifier: The individual column name within a family. A table can have millions of distinct qualifiers.
Row Size and Table Shape
- Tall/Thin: Many rows, few columns. This is the ideal shape for Bigtable.
- Short/Wide: Few rows, thousands of columns. This can lead to oversized rows.
- The 100MB Row Limit: While Bigtable can handle rows up to 100MB, performance degrades significantly once a row exceeds 10MB.
- SRE Tip: If your rows are growing too large, move high-cardinality data to a separate table, or store large binary objects (blobs) in GCS and keep only the link in Bigtable (see the sketch below).
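A minimal sketch of that blob-offloading tip, assuming hypothetical bucket, instance, and table names: the binary goes to Cloud Storage and only the gs:// URI is written to Bigtable.

```python
# pip install google-cloud-storage google-cloud-bigtable
from google.cloud import bigtable, storage

# Hypothetical names for illustration.
blob = storage.Client().bucket("my-media-bucket").blob("thumbnails/device-42.png")
blob.upload_from_filename("/tmp/device-42.png")  # the large object lives in GCS

table = bigtable.Client(project="my-project").instance("my-instance").table("devices")
row = table.direct_row(b"device-42")
# Bigtable stores only a small pointer cell, keeping the row well under 10MB.
row.set_cell("metadata", "thumbnail_uri", f"gs://my-media-bucket/{blob.name}".encode())
row.commit()
```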
Performance Scaling
Bigtable performance scales linearly with the number of nodes.
- Throughput: Each SSD node provides ~10,000 QPS (reads/writes) or ~220 MB/s.
- Latency: Sub-10ms for typical operations.
Principal Note: If you see high latency despite low CPU, check for large cells (>10MB) or large rows (>100MB). Bigtable performs best when cells are small and rows are under 100MB.
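For rough capacity planning, those per-node figures can be turned into a back-of-the-envelope node count. This is a sketch only: the 70% utilization target and the sample workload are assumptions, and real throughput depends on row size, key distribution, and read/write mix.

```python
import math

QPS_PER_SSD_NODE = 10_000   # approximate reads/writes per second per node
MBPS_PER_SSD_NODE = 220     # approximate MB/s per node

def ssd_nodes_needed(target_qps: int, target_mbps: float, utilization: float = 0.7) -> int:
    # Size for whichever dimension is the bottleneck, leaving CPU headroom.
    by_qps = target_qps / (QPS_PER_SSD_NODE * utilization)
    by_bandwidth = target_mbps / (MBPS_PER_SSD_NODE * utilization)
    return math.ceil(max(by_qps, by_bandwidth))

print(ssd_nodes_needed(target_qps=50_000, target_mbps=900))  # -> 8
```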
1.4 Garbage Collection: Managing Data Longevity
In Bigtable, you don’t typically “overwrite” data; you write new cells with new timestamps. Without management, your storage costs would explode. Garbage Collection (GC) policies define when old data is automatically deleted, and they are set at the Column Family level:
- Max Versions: Keep only the last N versions of a cell (e.g., keep the last 3 price updates).
- Max Age: Delete data older than a specific duration (e.g., delete logs older than 30 days).
- Complex Logic:
- Union: Delete if EITHER condition is met (Age > 30 days OR Versions > 5).
- Intersection: Delete only if BOTH conditions are met (Age > 30 days AND Versions > 5).
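These policies can be declared with the Python admin client when the column families are created. A minimal sketch, assuming placeholder project, instance, table, and family names.

```python
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("events")

thirty_days = column_family.MaxAgeGCRule(datetime.timedelta(days=30))
five_versions = column_family.MaxVersionsGCRule(5)

# Union: a cell is eligible for deletion if EITHER rule matches it.
union_rule = column_family.GCRuleUnion(rules=[thirty_days, five_versions])

# Intersection: a cell is deleted only when BOTH rules match it.
intersection_rule = column_family.GCRuleIntersection(rules=[thirty_days, five_versions])

# Creates the table with one family per policy (table must not already exist).
table.create(column_families={"logs": union_rule, "audit": intersection_rule})
```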
1.5 Key Visualizer: Diagnosing the Heatmap
The Key Visualizer is the most important diagnostic tool for a Bigtable engineer.
- Bright Horizontal Lines: A single row key is being hammered (a hotspot).
- Diagonal Lines: Sequential keys (like timestamps) are being written. This is a “walking hotspot” that moves across nodes.
- Evenly Distributed Colors: The ideal state. Load is spread across the entire key space.
2. Cloud Firestore: The Serverless Document Store
Firestore is a flexible, scalable NoSQL document database. It is the successor to Cloud Datastore and the core of the Firebase platform.
2.1 Native Mode vs. Datastore Mode
| Feature | Native Mode | Datastore Mode |
|---|---|---|
| Best For | Mobile/Web Apps (Real-time) | Server-side Backends (High Throughput) |
| Real-time Sync | Yes (Live Listeners) | No |
| Write Throughput | 10,000/sec (Soft Limit) | Unlimited |
| Security | Firestore Security Rules | IAM Only |
2.2 Advanced Firestore Internals
2.2.1 Concurrency Models
Firestore supports two ways to handle simultaneous updates to the same document:
- Optimistic Concurrency (Default): Multiple clients can attempt to update a document. If the document has changed since a client last read it, the update fails, and the client must retry.
- Pessimistic Concurrency (Transactions): Firestore places a lock on the document during a transaction. Other clients must wait until the transaction completes.
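A minimal sketch of the pessimistic path using the Python server client; the collection, document, and field names are hypothetical.

```python
from google.cloud import firestore

db = firestore.Client()  # uses Application Default Credentials

@firestore.transactional
def reserve_ticket(transaction, event_ref):
    # The read inside the transaction locks the document; concurrent
    # transactions touching it must wait (or are retried).
    snapshot = event_ref.get(transaction=transaction)
    remaining = snapshot.get("tickets_remaining")
    if remaining <= 0:
        raise ValueError("sold out")
    transaction.update(event_ref, {"tickets_remaining": remaining - 1})

event_ref = db.collection("events").document("launch-party")
reserve_ticket(db.transaction(), event_ref)
```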
2.2.2 The 500-Document Batch Limit
A single Write Batch or Transaction in Firestore is limited to 500 operations.
- SRE Tip: If you need to perform a massive migration (e.g., updating 10,000 documents), you must chunk your work into batches of 500.
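A small helper showing that chunking pattern with the Python client; the collection and field names in the usage line are made up for illustration.

```python
from google.cloud import firestore

db = firestore.Client()
BATCH_LIMIT = 500  # hard cap on operations per WriteBatch or transaction

def set_in_chunks(doc_updates):
    """doc_updates: iterable of (DocumentReference, dict) pairs."""
    batch, pending = db.batch(), 0
    for ref, data in doc_updates:
        batch.set(ref, data, merge=True)
        pending += 1
        if pending == BATCH_LIMIT:
            batch.commit()                 # flush a full batch
            batch, pending = db.batch(), 0
    if pending:
        batch.commit()                     # flush the remainder

# Example: mark every document in a (hypothetical) users collection.
set_in_chunks((doc.reference, {"migrated": True}) for doc in db.collection("users").stream())
```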
2.2.3 Indexing Deep Dive: Single-field vs. Composite
- Single-field Indexes: Automatically created for every field in a document.
- Composite Indexes: Required for queries that use multiple fields or complex ordering (e.g., WHERE city == 'NYC' AND price > 100 ORDER BY price DESC).
- Index Selection: Firestore uses the most efficient index available. If no composite index exists, it may fall back to a Zig-Zag Merge of two single-field indexes, but this is significantly slower.
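As an illustration, the query below (hypothetical collection and field names) filters on city and price and orders by price, so it needs a composite index on (city, price DESC). If the index is missing, the server rejects the query with an error containing a link to create it.

```python
from google.cloud import firestore
from google.cloud.firestore_v1.base_query import FieldFilter

db = firestore.Client()

# Requires a composite index: city ASC, price DESC.
query = (
    db.collection("listings")
    .where(filter=FieldFilter("city", "==", "NYC"))
    .where(filter=FieldFilter("price", ">", 100))
    .order_by("price", direction=firestore.Query.DESCENDING)
)

for doc in query.stream():
    print(doc.id, doc.to_dict())
```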
3. Cloud Memorystore: Managed Caching
Memorystore provides fully managed Redis and Memcached clusters.
3.1 Redis Cluster Mode (Terabyte Scaling)
For massive workloads, Memorystore for Redis supports Cluster Mode.
- Slot Distribution: Redis uses 16,384 hash slots. Each node in the cluster is responsible for a subset of these slots.
- Scaling: You can add nodes to a cluster to increase both memory capacity and throughput. Memorystore handles the “resharding” of slots in the background.
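From the application side, a cluster-aware client hides the slot map entirely. A sketch using the open-source redis-py client; the discovery endpoint address and the keys are placeholders.

```python
# pip install redis
from redis.cluster import RedisCluster

# Connect to the cluster's discovery endpoint (placeholder address); the
# client fetches the slot-to-node map and routes each key automatically.
rc = RedisCluster(host="10.128.0.3", port=6379)

rc.set("session:alice", "cart=3-items")
print(rc.get("session:alice"))

# Hash tags force related keys into the same slot (and therefore the same
# node), which keeps multi-key operations on one shard.
rc.set("{user:42}:profile", "name=Ada")
rc.set("{user:42}:cart", "3 items")
```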
3.2 Persistence: RDB vs. AOF
In a managed service like Memorystore, you have two choices for data durability:
- RDB (Redis Database Backup): Point-in-time snapshots of the dataset. Good for recovery with low performance overhead.
- AOF (Append-Only File): Logs every write operation. Provides much higher durability but can impact performance.
4. NoSQL Selection Decision Tree
- Is sub-millisecond latency the primary requirement?
- Yes → Memorystore (Redis).
- Is it a mobile/web app requiring real-time updates and offline sync?
- Yes → Firestore (Native Mode).
- Is it a massive (10TB+) analytical or time-series workload with sub-10ms latency?
- Yes → Cloud Bigtable.
- Is it a high-throughput server-side backend needing NoSQL scale?
- Yes → Firestore (Datastore Mode).
5. Advanced NoSQL Internals: Sharding and Querying
5.1 Bigtable Tablet Splitting
Bigtable automatically shards your data into Tablets (contiguous ranges of row keys).
- Auto-sharding: When a tablet reaches roughly 100-200MB, or its load (CPU/disk) exceeds a threshold, Bigtable automatically splits it into two.
- Dynamic Rebalancing: Bigtable moves these tablets between nodes in a cluster to ensure every node has a balanced load. This is why you should have at least 1GB of data per node to see the benefits of rebalancing.
5.2 Firestore Zig-Zag Merge
How does Firestore perform a multi-field query (e.g., WHERE city == 'NYC' AND age > 25) if there is no composite index?
- The Algorithm: Firestore uses a Zig-Zag Merge. It takes the results of two independent single-field indexes and “merges” them by hopping between the sorted lists.
- Efficiency: While clever, this is less efficient than a Composite Index. For high-throughput apps, you must create composite indexes for frequently queried field combinations.
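A toy version of the idea in Python: intersect two sorted lists of document IDs by hopping between them. This is a simplification (the real merge seeks forward rather than stepping one entry at a time), and the document IDs are made up.

```python
def zigzag_intersect(index_a, index_b):
    """Intersect two sorted postings lists of document IDs by hopping
    between them; each list stands in for one single-field index."""
    results, i, j = [], 0, 0
    while i < len(index_a) and j < len(index_b):
        if index_a[i] == index_b[j]:
            results.append(index_a[i])   # doc satisfies both predicates
            i, j = i + 1, j + 1
        elif index_a[i] < index_b[j]:
            i += 1                       # hop forward in the lagging list
        else:
            j += 1
    return results

city_nyc    = ["doc03", "doc07", "doc12", "doc21", "doc40"]  # city == 'NYC'
age_over_25 = ["doc05", "doc07", "doc21", "doc33", "doc40"]  # age > 25
print(zigzag_intersect(city_nyc, age_over_25))  # ['doc07', 'doc21', 'doc40']
```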
5.3 Bigtable Replication and App Profiles
Bigtable supports multi-cluster replication for high availability and global reads.
- App Profiles: You define how clients connect.
- Single-cluster routing: Best for strong consistency within a region.
- Multi-cluster routing: Automatically routes to the nearest cluster. Provides “eventual consistency” but much higher availability (resilient to a full cluster or zonal outage).
6. Security in the NoSQL Layer
- Firestore Security Rules: Allow you to define access logic directly on the database, for example: allow read: if request.auth != null && resource.data.owner == request.auth.uid;. This removes the need for a traditional backend for many mobile apps.
- Bigtable IAM: Access is controlled at the Instance or Table level. You cannot restrict access to specific rows using IAM alone.
7. Interview Preparation: Architectural Deep Dive
1. Q: How does Bigtable achieve near-instant scaling without moving data?
A: Bigtable uses a compute-storage separation architecture. Data is stored in SSTables on Colossus (Google’s distributed file system), while Bigtable nodes (compute) only hold metadata (pointers) to those SSTables. When you add a node, Bigtable simply rebalances the pointers (tablets) onto the new node. No actual data is moved across the network during scaling, which is why a cluster can scale from 3 to 30 nodes in seconds.
2. Q: Explain the difference between Bigtable Garbage Collection (GC) “Union” and “Intersection” policies.
A: GC policies manage data longevity at the Column Family level.
- Union: Data is deleted if EITHER condition is met (e.g., Age > 30 days OR Versions > 5).
- Intersection: Data is deleted only if BOTH conditions are met (e.g., Age > 30 days AND Versions > 5).
Warning: GC runs as a background process (compaction); eligible data is not instantly deleted from disk and can still appear in read results until compaction removes it.