Chapter 5: Performance Optimization in DynamoDB
Introduction
Performance optimization in DynamoDB requires understanding how data distribution, access patterns, and throughput settings interact with the underlying distributed architecture. This chapter explores techniques for maximizing throughput, minimizing latency, and efficiently utilizing DynamoDB’s capacity.
Understanding DynamoDB Performance Fundamentals
Read and Write Capacity Units
DynamoDB’s performance is measured in capacity units.
Read Capacity Units (RCUs):
- 1 RCU = one strongly consistent read per second for an item up to 4 KB
- 1 RCU = two eventually consistent reads per second for items up to 4 KB
- A transactional read consumes 2 RCUs for each item up to 4 KB
Write Capacity Units (WCUs):
- 1 WCU = one write per second for an item up to 1 KB
- A transactional write consumes 2 WCUs for each item up to 1 KB
Larger items consume additional units, rounded up to the next 4 KB for reads or the next 1 KB for writes.
Calculating Required Capacity
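The unit definitions above translate directly into arithmetic. The following is a minimal sketch, assuming 1 KB = 1024 bytes and a steady request rate; the function names are illustrative and not part of any AWS SDK.

```python
import math

def read_capacity_units(item_size_bytes, reads_per_second, consistency="strong"):
    """Estimate RCUs for a steady read rate; item sizes round up to 4 KB blocks."""
    blocks = math.ceil(item_size_bytes / 4096)
    # RCUs consumed per 4 KB block for each read type
    per_read = {"strong": 1.0, "eventual": 0.5, "transactional": 2.0}[consistency]
    return math.ceil(blocks * per_read * reads_per_second)

def write_capacity_units(item_size_bytes, writes_per_second, transactional=False):
    """Estimate WCUs for a steady write rate; item sizes round up to 1 KB blocks."""
    blocks = math.ceil(item_size_bytes / 1024)
    per_write = 2 if transactional else 1
    return blocks * per_write * writes_per_second

# Example: 6 KB items read 100 times per second
print(read_capacity_units(6 * 1024, 100, "strong"))    # 200
print(read_capacity_units(6 * 1024, 100, "eventual"))  # 100
```

Because sizes round up per item, a 4.1 KB item costs the same to read as an 8 KB item, which is one reason trimming item size can matter more than it first appears.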
Partition Key Design for Performance
Hot Partition Problem
Strategies for Avoiding Hot Partitions
Strategy 1: Use High-Cardinality Keys
Deep Dive: Burst and Adaptive Capacity
DynamoDB provides built-in mechanisms to handle temporary spikes and sustained imbalances in traffic. Understanding these is critical for production performance tuning.
1. Burst Capacity
DynamoDB allows you to “burst” above your provisioned throughput for short periods. It does this by retaining unused capacity for up to 5 minutes (300 seconds).
- How it works: If you don’t use your full throughput, DynamoDB stores the remainder in a “burst bucket.”
- Benefit: Handles sudden micro-spikes in traffic without throttling.
- Limit: Once the burst bucket is exhausted, requests are throttled back to the provisioned level.
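The burst bucket behaves much like a token bucket. The sketch below is a deliberately simplified model, not DynamoDB's actual algorithm: it assumes a single bucket for the whole table, whereas the real service tracks capacity per partition and does not guarantee burst availability. The function name `simulate_burst` is hypothetical.

```python
def simulate_burst(provisioned_per_sec, demand_per_sec, retention_seconds=300):
    """Return the units throttled in each second under a simple token-bucket model."""
    max_bucket = provisioned_per_sec * retention_seconds  # at most 5 minutes of unused capacity
    bucket = 0
    throttled = []
    for demand in demand_per_sec:
        available = provisioned_per_sec + bucket   # this second's capacity plus saved capacity
        served = min(demand, available)
        throttled.append(demand - served)
        bucket = min(max_bucket, available - served)  # leftover rolls into the bucket
    return throttled

# 10 idle seconds at 100 units/s, then a spike of 600 units/s for 3 seconds
result = simulate_burst(100, [0] * 10 + [600, 600, 600])
print(result[-3:])  # [0, 0, 500]
```

In this model the first two seconds of the spike are absorbed by the 1,000 units saved while idle, and throttling begins only in the third second.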
2. Adaptive Capacity
Adaptive capacity handles sustained imbalances where one partition receives significantly more traffic than others.
- Dynamic Boosting: DynamoDB automatically increases the throughput for a hot partition if the total table-level throughput is not exceeded.
- Isolation: It helps prevent “noisy neighbor” problems within your own table’s partitions.
- Instant vs. Delayed: Modern DynamoDB (since 2019) applies adaptive capacity almost instantly for most workloads.
| Feature | Burst Capacity | Adaptive Capacity |
|---|---|---|
| Duration | Short-term (up to 5 mins) | Long-term / Sustained |
| Trigger | Temporal spikes | Spatial imbalance (hot keys) |
| Limit | Accumulated bucket size | Total Table Throughput |
| Automation | Always on | Always on |
3. Throttling and Error Handling
When capacity (including burst and adaptive) is exhausted, DynamoDB returns a ProvisionedThroughputExceededException (HTTP 400).
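A common client-side response to throttling is to retry with exponential backoff and jitter. The AWS SDKs already retry throttled requests with backoff, so the sketch below only illustrates the technique. `ThrottledError` is a hypothetical stand-in for the SDK's throttling exception, and the delay parameters are illustrative defaults rather than recommended values.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""

def backoff_delays(max_retries=5, base=0.05, cap=2.0, rng=random.random):
    """Yield 'full jitter' delays: a random fraction of min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)

def call_with_backoff(operation, max_retries=5, sleep=time.sleep, **backoff_kwargs):
    """Call operation(), retrying on ThrottledError until the retries run out."""
    delays = backoff_delays(max_retries, **backoff_kwargs)
    while True:
        try:
            return operation()
        except ThrottledError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the throttling error to the caller
            sleep(delay)
```

Randomizing the delay keeps many throttled clients from retrying in lockstep, which would otherwise recreate the spike that caused the throttling.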
Optimizing Read Performance
Using Projection Expressions
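Reduce data transfer by reading only the attributes you need. A projection expression trims the response payload, but consumed read capacity is still based on the full size of each item read.
Batch Operations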
Query Optimization
Parallel Query Pattern
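When data is spread across several partition keys (for example, sharded keys), the individual queries can run concurrently and their results can be merged. The sketch below shows the shape of that pattern. `query_partition` is a hypothetical callable that would wrap a single DynamoDB Query call and return a list of items; it is injected so the sketch stays independent of any SDK.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_query(query_partition, partition_keys, max_workers=8):
    """Run query_partition(pk) for every key concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as partition_keys
        pages = pool.map(query_partition, partition_keys)
        return [item for page in pages for item in page]
```

Threads suit this workload because each query spends most of its time waiting on the network. The merged list is ordered by partition key first, so any global ordering has to be applied afterwards.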
Optimizing Write Performance
BatchWriteItem
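BatchWriteItem accepts at most 25 put or delete requests per call and may return some of them as UnprocessedItems instead of failing the whole batch. The following is a minimal sketch, assuming a boto3-style low-level client whose `batch_write_item` method takes `RequestItems` and items already in DynamoDB's attribute-value format. `chunked` and `batch_write_all` are hypothetical helper names, and a production version would add backoff between passes.

```python
def chunked(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def batch_write_all(client, table_name, items, max_passes=10):
    """Write items in batches of 25, resubmitting anything left unprocessed."""
    for chunk in chunked(items):
        request = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        for _ in range(max_passes):
            response = client.batch_write_item(RequestItems=request)
            request = response.get("UnprocessedItems") or {}
            if not request:
                break  # everything in this chunk was written
        else:
            raise RuntimeError("items still unprocessed after retries")
```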
Conditional Writes for Efficiency
Update Expression Optimization
Caching Strategies
Application-Level Caching
DynamoDB Accelerator (DAX)
Cache-Aside Pattern
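In the cache-aside pattern the application checks the cache first, falls back to the database on a miss, and stores the result for later reads. The sketch below uses an in-process dictionary with a time-to-live so it stays self-contained; a shared cache such as Redis would follow the same flow. `CacheAside` and `load_item` are hypothetical names, with `load_item` standing in for a DynamoDB read.

```python
import time

class CacheAside:
    """Minimal cache-aside wrapper with per-entry expiry."""

    def __init__(self, load_item, ttl_seconds=60.0, clock=time.monotonic):
        self._load_item = load_item      # called on a miss, e.g. a GetItem wrapper
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}               # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit
        value = self._load_item(key)     # miss or expired: read from the source
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry, typically right after writing the item."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached item can become, and calling `invalidate` after a write narrows that window further.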
On-Demand vs Provisioned Capacity
Choosing the Right Mode
Auto-Scaling Configuration
Latency Optimization Techniques
Connection Pooling
Parallel Requests
Regional Endpoints
Monitoring and Performance Metrics
CloudWatch Metrics
Custom Performance Tracking
GSI Performance and Write Amplification
Global Secondary Indexes (GSIs) are powerful but come with performance trade-offs that impact both latency and cost.
1. The Write Amplification Problem
A write to a table with GSIs triggers a “shadow” write to each index whose key or projected attributes it touches.
- Write Cost: A single PutItem that updates a GSI-indexed attribute costs WCUs on the base table plus WCUs on every affected GSI.
- Latency: GSI updates are asynchronous but highly optimized. However, if the GSI is throttled, it can create backpressure or cause the index to become stale.
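The write cost can be estimated with simple arithmetic. The sketch below assumes each affected index receives one write sized by the attributes projected into it, and that 1 KB = 1024 bytes. It does not model the extra delete that occurs when an update changes an index key value. `write_cost_wcus` is a hypothetical helper name.

```python
import math

def write_cost_wcus(base_item_bytes, gsi_projected_bytes):
    """Estimate WCUs for one write: the base table plus one write per affected GSI."""
    def units(size_bytes):
        return max(1, math.ceil(size_bytes / 1024))  # every write costs at least 1 WCU

    base = units(base_item_bytes)
    indexes = sum(units(size) for size in gsi_projected_bytes)
    return base + indexes

# A 2.5 KB item with one KEYS_ONLY-style index (300 bytes) and one ALL index (2.5 KB)
print(write_cost_wcus(2500, [300, 2500]))  # 7
```

In this example the two indexes more than double the write cost of the item, which is why narrow projections matter on write-heavy tables.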
2. Index Projection Strategy
To minimize performance impact, project only the attributes necessary for the index’s specific query.
| Projection Type | Description | Performance Impact |
|---|---|---|
| KEYS_ONLY | Smallest index size. | Lowest write cost; highest read latency when other attributes are needed (requires a separate fetch from the base table). |
| INCLUDE | Selected attributes only. | Medium (balanced cost/latency). |
| ALL | Full item duplication. | Lowest read latency, highest WCU cost. |
3. GSI Backpressure and Index Creation
When creating a new GSI on an existing table:
- Scanning Phase: DynamoDB scans the base table to populate the index.
- Backpressure: If the GSI’s provisioned write capacity is too low during creation, the scan slows down to avoid overwhelming the index.
- Impact on Base Table: GSI creation does not consume base table RCUs (it uses background capacity).
Interview Questions and Answers
Question 1: How do you prevent hot partitions in DynamoDB?
Answer: Hot partitions occur when traffic is unevenly distributed across partition keys. Prevention strategies include:
- Use high-cardinality partition keys
- Write sharding
- Time-based partitioning
- Composite keys
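Write sharding from the list above can be sketched as follows. The idea is to append a suffix to a hot partition key so that writes for one logical key spread over several physical keys, at the cost of reading every suffix back. The `#` separator, the shard count of 10, and the function names are assumptions made for illustration.

```python
import hashlib

def sharded_key(base_key, item_id, shard_count=10):
    """Derive a stable shard suffix from item_id so the same item always maps to the same key."""
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{base_key}#{shard}"

def all_shard_keys(base_key, shard_count=10):
    """List every sharded key, for readers that must query all shards and merge."""
    return [f"{base_key}#{shard}" for shard in range(shard_count)]
```

A hash-based suffix keeps the mapping deterministic, so a single item can still be fetched directly. A random suffix spreads writes just as well but forces every read to fan out across all shards.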
Question 2: Explain the difference between on-demand and provisioned capacity modes.
Answer:
On-Demand Mode:
- Pay per request (no capacity planning)
- Automatically scales to handle traffic
- Higher cost per request ($0.25 per million reads)
- Best for unpredictable or spiky workloads
- No throttling in normal operation (up to the default table quota of 40K RCU/WCU)
Provisioned Mode:
- Pre-define RCU/WCU capacity
- Lower cost per request ($0.00013 per RCU-hour)
- Requires capacity planning or auto-scaling
- Best for steady, predictable workloads
- Can throttle if capacity exceeded
Recommendation:
- Use on-demand for: new apps, dev/test, unpredictable traffic
- Use provisioned for: production, predictable traffic, cost optimization (60%+ cheaper at scale)
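The cost trade-off depends on how much of the provisioned capacity is actually used. The sketch below estimates the break-even point using the two read prices quoted in this answer as default parameters. Actual prices vary by region and change over time, so treat them as inputs rather than facts. It considers only strongly consistent reads of items up to 4 KB, and the function names are hypothetical.

```python
def provisioned_cost_per_million_reads(rcu_hour_price=0.00013, utilization=1.0):
    """Cost of 1M strongly consistent reads on provisioned capacity at a given utilization."""
    reads_per_rcu_hour = 3600 * utilization  # one RCU serves one read per second
    return 1_000_000 / reads_per_rcu_hour * rcu_hour_price

def break_even_utilization(on_demand_per_million=0.25, rcu_hour_price=0.00013):
    """Average utilization above which provisioned capacity costs less than on-demand."""
    return provisioned_cost_per_million_reads(rcu_hour_price) / on_demand_per_million
```

With these inputs the break-even point is roughly 14% average utilization. A table that keeps its provisioned capacity busier than that is cheaper on provisioned mode, while a mostly idle or very spiky table favors on-demand.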
Question 3: How would you optimize a query that retrieves 1,000 items frequently?
Answer: Use a multi-layered approach:
- Caching (most important)
- DAX (DynamoDB Accelerator)
- Projection expressions
- Pagination
- Parallel queries (if sharded)
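For the projection step in the list above, the request needs a `ProjectionExpression` and, to avoid clashes with DynamoDB reserved words such as `name` or `status`, a matching `ExpressionAttributeNames` map. The helper below is a minimal sketch that builds both for top-level attributes only. `build_projection` is a hypothetical name, and nested paths are not handled.

```python
def build_projection(attributes):
    """Build ProjectionExpression and ExpressionAttributeNames for top-level attributes."""
    # Map each attribute to a placeholder such as #a0 so reserved words are safe to use
    names = {f"#a{i}": attr for i, attr in enumerate(attributes)}
    return {
        "ProjectionExpression": ", ".join(names),
        "ExpressionAttributeNames": names,
    }

params = build_projection(["name", "status"])
print(params["ProjectionExpression"])  # #a0, #a1
```

The returned dictionary can be merged into the keyword arguments of a Query or GetItem call. If the key condition also uses name placeholders, the two maps need to be combined.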
Question 4: What causes throttling in DynamoDB and how do you handle it?
Answer:
Causes:
- Exceeding provisioned capacity
- Hot partitions (uneven distribution)
- Burst capacity exhausted
- GSI throttling
Handling:
- Exponential backoff with jitter
- Increase capacity
- Fix hot partitions
- Enable auto-scaling
Question 5: How do you optimize write performance for bulk data loads?
Answer: Strategies:
- Use BatchWriteItem
- Parallel batch writes
- Temporarily increase capacity
- Optimize item size
- Disable streams/triggers temporarily
Question 6: Explain how DynamoDB Accelerator (DAX) improves performance.
Answer: DAX is an in-memory cache for DynamoDB.
Benefits:
- Low-latency reads: about 1-2ms on a cache hit vs 10-20ms from DynamoDB
- Transparent: Drop-in replacement for DynamoDB client
- Automatic cache management: No manual invalidation
- Read-through: Automatically populates cache on misses
- Write-through: Updates cache on writes
Limitations:
- Caches only eventually consistent reads (strongly consistent reads pass through to DynamoDB)
- Requires DAX cluster deployment
- Additional cost ($0.12/hour for t2.small)
Best for:
- Read-heavy workloads
- Low-latency requirements
- Repeated reads of same items
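Question 7: How do you calculate RCUs and WCUs for your table?
Answer:
RCU Calculation: RCUs = reads per second × ceil(item size / 4 KB), halved for eventually consistent reads and doubled for transactional reads.
WCU Calculation: WCUs = writes per second × ceil(item size / 1 KB), doubled for transactional writes.
Question 8: What are the best practices for pagination in DynamoDB?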
Answer: Standard pagination:
- Use Limit to control page size
- Return LastEvaluatedKey as an opaque token
- Don’t expose internal key structure
- Implement timeout handling for large scans
- Consider caching for expensive queries
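One way to return LastEvaluatedKey as an opaque token is to serialize and encode it before handing it to the client. The sketch below assumes the key is JSON-serializable, as it is in the low-level attribute-value format. Base64 only obscures the structure: a service that must truly hide or protect the key would sign or encrypt the token. Both function names are hypothetical.

```python
import base64
import json

def encode_page_token(last_evaluated_key):
    """Turn a LastEvaluatedKey dict into a URL-safe string, or None on the last page."""
    if not last_evaluated_key:
        return None
    raw = json.dumps(last_evaluated_key, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def decode_page_token(token):
    """Recover the dict to pass as ExclusiveStartKey, or None for the first page."""
    if not token:
        return None
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
```

The decoded value is passed back as `ExclusiveStartKey` on the next request, and a missing token means the query starts from the first page.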
Question 9: How do you monitor and optimize table performance?
Answer:
Key metrics to monitor:
- Consumed capacity
- Throttling
- Latency
Optimization steps:
- Identify hot partitions
- Analyze access patterns
- Set up alarms
Question 10: How would you design a high-performance leaderboard system?
Answer:
Requirements:
- Millions of users
- Real-time score updates
- Top 100 leaderboard queries
- User rank queries
Design:
- Main table (for score updates)
- GSI for ranking (limited use)
- ElastiCache for rankings
- Hybrid approach
Summary
Key performance optimization strategies:
- Partition key design: Use high-cardinality keys, avoid hot partitions
- Batch operations: Use BatchGetItem/BatchWriteItem for multiple items
- Caching: Implement Redis/DAX for read-heavy workloads
- Capacity planning: Choose on-demand vs provisioned based on workload
- Query optimization: Use projection expressions, parallel queries
- Monitoring: Track CloudWatch metrics, set up alarms
Typical latency by access path (approximate):
- Application cache (Redis): < 1ms
- DAX: 1-2ms
- DynamoDB eventual read: 5-10ms
- DynamoDB strong read: 10-20ms
- DynamoDB query: 10-50ms
- DynamoDB scan: 100ms+