Chapter 5: Performance Optimization in DynamoDB
Introduction
Performance optimization in DynamoDB requires understanding how data distribution, access patterns, and throughput settings interact with the underlying distributed architecture. Unlike traditional databases where performance tuning often means adding indexes or rewriting queries, DynamoDB performance is fundamentally determined by how well your data model distributes load across the partition infrastructure. The single most common performance problem in DynamoDB is the “hot partition” — a situation where traffic concentrates on a small number of partitions while others sit idle. This happens because DynamoDB distributes capacity evenly across partitions, so a table provisioned for 10,000 RCUs across 10 partitions gives each partition only 1,000 RCUs. If 90% of your traffic hits one partition, you will be throttled even though the table has plenty of aggregate capacity. Understanding and avoiding this pattern is the difference between a DynamoDB deployment that hums along at single-digit millisecond latency and one that drowns in ProvisionedThroughputExceededException errors. DynamoDB’s performance model has evolved significantly since launch. The original 2012 service had rigid partition-level throughput limits that punished even slight imbalances. Over time, AWS introduced burst capacity (2016), adaptive capacity (2018), on-demand mode (2018), and instant adaptive capacity (2019) to soften these sharp edges. Understanding this evolution matters because many older blog posts and Stack Overflow answers describe limitations that no longer exist, and many teams over-engineer their partition strategies for problems that adaptive capacity now handles automatically. This chapter explores techniques for maximizing throughput, minimizing latency, and efficiently utilizing DynamoDB’s capacity — covering both the timeless fundamentals and the modern features that have changed best practices.Understanding DynamoDB Performance Fundamentals
Read and Write Capacity Units
DynamoDB’s performance is measured in capacity units, a concept borrowed from the idea of “request units” that Azure Cosmos DB also adopted. The capacity unit abstraction is DynamoDB’s way of hiding the underlying hardware complexity (SSD IOPS, network bandwidth, replication overhead) behind a simple, predictable billing model. One important nuance that trips up many engineers: capacity units are calculated based on item size rounded up to the nearest boundary (4KB for reads, 1KB for writes), so a 4.1KB item costs the same as an 8KB item for reads. This rounding behavior makes item size optimization one of the highest-leverage performance improvements you can make. Read Capacity Units (RCUs):- 1 RCU = one strongly consistent read per second for items up to 4KB
- 1 RCU = two eventually consistent reads per second for items up to 4KB
- Transactional reads = 2 RCUs per item per second
- 1 WCU = one write per second for items up to 1KB
- Transactional writes = 2 WCUs per item per second
Calculating Required Capacity
Partition Key Design for Performance
Hot Partition Problem
Strategies for Avoiding Hot Partitions
The goal of every partition key strategy is to spread requests as uniformly as possible across partitions. This is the same fundamental challenge that consistent hashing (used in Dynamo, Cassandra, and memcached) was designed to solve at the infrastructure level. At the application level, you control the distribution by choosing partition key values that map roughly uniformly across the hash space. The three strategies below are listed in order of preference: start with high-cardinality keys (which solve most cases), escalate to write sharding only when you have a genuinely hot aggregate (like a global counter), and use time-based partitioning for time-series data where queries are naturally scoped to time windows. Strategy 1: Use High-Cardinality KeysDeep Dive: Burst and Adaptive Capacity
DynamoDB provides built-in mechanisms to handle temporary spikes and sustained imbalances in traffic. These features — burst capacity and adaptive capacity — were added over time as AWS learned from real customer workloads. They represent the DynamoDB team’s recognition that perfectly uniform partition key distributions are rare in practice, and the system needed to be more forgiving. Understanding these mechanisms is critical for production performance tuning, because they can mask underlying design problems during development and testing, only for those problems to surface under sustained production load when burst buckets are exhausted.1. Burst Capacity
DynamoDB allows you to “burst” above your provisioned throughput for short periods. This is achieved by retaining unused capacity for up to 5 minutes (300 seconds).- How it works: If you don’t use your full throughput, DynamoDB stores the remainder in a “burst bucket.”
- Benefit: Handles sudden, micro-spikes in traffic without throttling.
- Limit: Once the 5-minute bucket is exhausted, requests are throttled back to the provisioned level.
2. Adaptive Capacity
Adaptive capacity handles sustained imbalances where one partition receives significantly more traffic than others.- Dynamic Boosting: DynamoDB automatically increases the throughput for a hot partition if the total table-level throughput is not exceeded.
- Isolation: It helps prevent “noisy neighbor” problems within your own table’s partitions.
- Instant vs. Delayed: Modern DynamoDB (since 2019) applies adaptive capacity almost instantly for most workloads.
| Feature | Burst Capacity | Adaptive Capacity |
|---|---|---|
| Duration | Short-term (up to 5 mins) | Long-term / Sustained |
| Trigger | Temporal spikes | Spatial imbalance (hot keys) |
| Limit | Accumulated bucket size | Total Table Throughput |
| Automation | Always on | Always on |
3. Throttling and Error Handling
When capacity (including burst and adaptive) is exhausted, DynamoDB returns aProvisionedThroughputExceededException (HTTP 400).
Optimizing Read Performance
Read optimization in DynamoDB operates on three levers: reduce the data read per operation (projection expressions, smaller items), reduce the number of operations (batch reads, queries instead of scans), and avoid the database entirely (caching). The first lever is unique to DynamoDB because read capacity is billed based on the full item size stored, not just the bytes returned — aProjectionExpression reduces network transfer and deserialization cost but still consumes RCUs based on the stored item size. This is a critical distinction that many engineers miss: projections optimize bandwidth and client-side processing, not RCU consumption. The second lever — reducing operation count — is where batch operations and query design have the biggest impact. The third lever — caching — was covered in its own section above.
Using Projection Expressions
Reduce data transfer by reading only needed attributes (note that this reduces network bandwidth and client processing time, but RCU consumption is still based on the full item size on disk):Batch Operations
Query Optimization
Query is the most important operation in DynamoDB because it is the only way to efficiently retrieve multiple items from a single partition. Unlike Scan (which reads every item in the table), Query operates within a single partition key value and uses the sort key for range filtering — making it O(log n + k) where k is the number of items returned, not O(n) over the entire table. One critical subtlety:FilterExpression is applied after the query reads data from storage, so it reduces the items returned to your application but does not reduce the RCUs consumed. If you find yourself relying heavily on FilterExpression, it usually means your key schema does not match your access pattern and should be redesigned.
Parallel Query Pattern
Optimizing Write Performance
Write optimization in DynamoDB is fundamentally different from read optimization because writes are inherently more expensive (1 WCU covers only 1KB vs 4KB for reads) and cannot be cached away. Every write must be durably committed to at least two of the three Availability Zone replicas before DynamoDB acknowledges success. This replication overhead is the price of DynamoDB’s durability guarantee, and it means that write latency has a harder floor than read latency. The three main levers for write optimization are: reduce the number of round trips (batch writes), reduce the data written per operation (smaller items, targeted updates instead of full-item replacements), and avoid unnecessary writes (conditional writes, atomic counters). One pattern that trips up many teams: usingPutItem to update a single attribute on a large item. This replaces the entire item, consuming WCUs proportional to the full item size. Using UpdateExpression instead writes only the changed attributes at the storage layer, though WCU billing is still based on the new item’s total size.
BatchWriteItem
Conditional Writes for Efficiency
Update Expression Optimization
Caching Strategies
Caching is arguably the single most impactful performance optimization for DynamoDB-backed applications, yet it is the one most teams implement last. The reason caching matters more for DynamoDB than for traditional databases is economic: DynamoDB charges per request, so every cache hit directly reduces your bill in addition to reducing latency. A well-implemented cache with an 80% hit rate does not just make your application 5x faster for cached reads — it reduces your DynamoDB read costs by 80%. AWS recognized this pattern and built DynamoDB Accelerator (DAX) as a first-party solution, but many production systems use external caches (Redis, Memcached) for greater control over eviction policies, TTLs, and cross-service sharing. The choice between DAX and an external cache is one of the most consequential architectural decisions you will make with DynamoDB: DAX offers zero-code-change integration and write-through semantics, but it only supports eventually consistent reads and ties you to a VPC-deployed cluster. An external cache like Redis gives you fine-grained control, sorted sets for leaderboards, pub/sub for invalidation, and cross-service reuse — but you own the consistency model and invalidation logic.Application-Level Caching
DynamoDB Accelerator (DAX)
DAX (launched in 2017) is a fully managed, in-memory cache purpose-built for DynamoDB. It sits between your application and DynamoDB, intercepting API calls and serving cached results with microsecond latency. Architecturally, DAX is a write-through, read-through cache cluster deployed within your VPC. The “write-through” behavior means that when you write through DAX, it updates both the cache and DynamoDB synchronously, keeping the item cache consistent without manual invalidation. The “read-through” behavior means cache misses are automatically populated from DynamoDB. DAX maintains two internal caches: an item cache (for GetItem/PutItem results) and a query cache (for Query/Scan results). The item cache is invalidated on writes, but the query cache uses TTL-based expiration only — a distinction that catches many teams off guard. DAX is best suited for read-heavy workloads that repeatedly access the same items. It is not a good fit for write-heavy workloads, workloads requiring strongly consistent reads, or applications that primarily use Scan operations.Cache-Aside Pattern
On-Demand vs Provisioned Capacity
Before November 2018, DynamoDB only offered provisioned capacity mode, which meant you had to predict your traffic and pre-allocate throughput. Under-provision and you get throttled; over-provision and you waste money. This capacity planning burden was one of the most common complaints about DynamoDB and drove many teams to alternative databases. The introduction of on-demand mode at re:Invent 2018 was a watershed moment: it eliminated capacity planning entirely by charging per-request instead of per-hour. The trade-off is cost — on-demand mode costs roughly 6-7x more per request than well-utilized provisioned capacity. For most production workloads with predictable traffic, provisioned mode with auto-scaling remains the cost-optimal choice. On-demand mode shines for truly unpredictable workloads, new applications where you do not yet know your traffic patterns, and development/test environments where simplicity outweighs cost. One practical pattern many teams use: start with on-demand to observe traffic patterns, then switch to provisioned with auto-scaling once the workload stabilizes. DynamoDB allows one mode switch per table every 24 hours.Choosing the Right Mode
Auto-Scaling Configuration
Latency Optimization Techniques
DynamoDB advertises single-digit millisecond latency, and for well-designed tables with simple GetItem operations, it delivers: p50 latency is typically 3-5ms, and p99 is under 10ms. But these numbers apply to the DynamoDB service itself — your application’s observed latency includes TLS handshake time, SDK overhead, serialization/deserialization, and network round-trip time between your compute and the DynamoDB endpoint. In Lambda functions with cold starts, the first DynamoDB call can take 200-500ms due to TCP connection establishment and TLS negotiation. In long-running services, connection pooling and keep-alive eliminate this overhead for subsequent calls. The techniques below address these infrastructure-level latency sources, which are often larger than DynamoDB’s own processing time.Connection Pooling
Parallel Requests
Regional Endpoints
Monitoring and Performance Metrics
You cannot optimize what you cannot measure, and DynamoDB provides unusually rich observability through CloudWatch. The most important metrics to watch are not the obvious ones (consumed capacity) but the diagnostic ones:ThrottledRequests tells you when you are hitting limits, SuccessfulRequestLatency tells you when the service is performing normally, and — most valuably — Contributor Insights (launched in 2019) tells you which specific partition keys are receiving the most traffic. Before Contributor Insights existed, diagnosing hot partitions required guesswork and instrumentation in application code. Now it is a one-click enablement that reveals your top partition keys by traffic volume, making it the single most important monitoring tool for DynamoDB performance debugging.
CloudWatch Metrics
Custom Performance Tracking
GSI Performance and Write Amplification
Global Secondary Indexes (GSIs) are one of DynamoDB’s most powerful features and one of its most dangerous performance traps. They enable alternative query patterns on your data by maintaining a separate, automatically-synchronized copy of selected attributes with a different key schema. The key word is “copy” — a GSI is not a pointer or a reference; it is a physically separate partition structure that DynamoDB maintains by replicating writes from the base table. This replication is the source of write amplification, and understanding it is essential for cost control and performance planning in production. The write amplification problem in DynamoDB GSIs is conceptually identical to the write amplification in LSM-tree databases (RocksDB, LevelDB, Cassandra) and in secondary indexes of any distributed database — you are paying for the convenience of additional access patterns with additional write overhead. The critical practical implication: a table with 5 GSIs where every write touches all indexes effectively costs 6x the WCUs of the same table with no indexes. Teams that add GSIs casually during development often face sticker shock when production traffic arrives.1. The Write Amplification Problem
Every write to a table with GSIs triggers one or more “shadow” writes to the index partitions. Under the hood, DynamoDB’s replication subsystem detects which attributes changed, determines which GSIs are affected (only indexes whose key or projected attributes were modified), and propagates the changes asynchronously. This is similar to how MySQL’s InnoDB engine maintains secondary indexes, except that in DynamoDB the propagation happens across distributed storage nodes rather than within a single database instance.- Write Cost: A single
PutItemthat updates a GSI-indexed attribute costs WCUs on the base table plus WCUs on every affected GSI. If the old and new GSI key values differ, the operation incurs both a delete on the old GSI partition and an insert on the new one — effectively double the GSI write cost for that index. - Latency: GSI updates are asynchronous but highly optimized (typically propagated within milliseconds). However, if the GSI is throttled due to insufficient provisioned capacity, it can create backpressure that throttles writes to the base table itself — one of the most confusing failure modes in DynamoDB, because the throttling error appears on the base table even though the root cause is GSI capacity.
2. Index Projection Strategy
To minimize performance impact, project only the attributes necessary for the index’s specific query.| Projection Type | Description | Performance Impact |
|---|---|---|
| KEYS_ONLY | Smallest index size. | Highest (requires “Fetch” from base table). |
| INCLUDE | Selected attributes only. | Medium (balanced cost/latency). |
| ALL | Full item duplication. | Lowest latency, Highest WCU cost. |
3. GSI Backpressure and Index Creation
When creating a new GSI on an existing table:- Scanning Phase: DynamoDB scans the base table to populate the index.
- Backpressure: If the GSI’s provisioned write capacity is too low during creation, the scan slows down to avoid overwhelming the index.
- Impact on Base Table: GSI creation does not consume base table RCUs (it uses background capacity).
Interview Questions and Answers
DynamoDB performance questions in system design interviews are testing whether you understand the distributed systems mechanics beneath the API surface. The strongest candidates do not just recite optimization techniques — they explain why each technique works by connecting it to partitioning, replication, or caching fundamentals. Interviewers are particularly interested in hearing you reason about trade-offs: when is on-demand better than provisioned? When should you add a GSI vs. denormalize? When is DAX the right call vs. an external cache? Frame your answers around trade-offs rather than prescriptions.Question 1: How do you prevent hot partitions in DynamoDB?
Answer: Hot partitions occur when traffic is unevenly distributed across partition keys. This matters because DynamoDB distributes provisioned capacity evenly across physical partitions, so a table with 10,000 RCUs across 10 partitions gives each partition only 1,000 RCUs. Adaptive capacity (introduced in 2018) mitigates this to some extent by dynamically reallocating unused capacity from cold partitions to hot ones, but it cannot exceed the table’s total provisioned throughput. Prevention strategies include:- Use high-cardinality partition keys:
- Write sharding:
- Time-based partitioning:
- Composite keys:
Question 2: Explain the difference between on-demand and provisioned capacity modes.
Answer: On-Demand Mode:- Pay per request (no capacity planning)
- Automatically scales to handle traffic
- Higher cost per request (0.25/M reads)
- Best for unpredictable or spiky workloads
- No throttling (up to 40K RCU/WCU)
- Pre-define RCU/WCU capacity
- Lower cost per request (0.00013/RCU-hour)
- Requires capacity planning or auto-scaling
- Best for steady, predictable workloads
- Can throttle if capacity exceeded
- Use on-demand for: new apps, dev/test, unpredictable traffic
- Use provisioned for: production, predictable traffic, cost optimization (60%+ cheaper at scale)
Question 3: How would you optimize a query that retrieves 1,000 items frequently?
Answer: Multi-layered approach:- Caching (most important):
- DAX (DynamoDB Accelerator):
- Projection expressions:
- Pagination:
- Parallel queries (if sharded):
Question 4: What causes throttling in DynamoDB and how do you handle it?
Answer: Causes:- Exceeding provisioned capacity
- Hot partitions (uneven distribution)
- Burst capacity exhausted
- GSI throttling
- Exponential backoff with jitter:
- Increase capacity:
- Fix hot partitions:
- Enable auto-scaling:
Question 5: How do you optimize write performance for bulk data loads?
Answer: Strategies:- Use BatchWriteItem:
- Parallel batch writes:
- Temporarily increase capacity:
- Optimize item size:
- Disable streams/triggers temporarily:
Question 6: Explain how DynamoDB Accelerator (DAX) improves performance.
Answer: DAX is an in-memory cache for DynamoDB that provides: Benefits:- Microsecond latency: 1-2ms vs 10-20ms
- Transparent: Drop-in replacement for DynamoDB client
- Automatic cache management: No manual invalidation
- Read-through: Automatically populates cache on misses
- Write-through: Updates cache on writes
- Only eventually consistent reads
- Requires DAX cluster deployment
- Additional cost ($0.12/hour for t2.small)
- Read-heavy workloads
- Low-latency requirements
- Repeated reads of same items
Question 7: How do you calculate RCUs and WCUs for your table?
Answer: RCU Calculation:Question 8: What are the best practices for pagination in DynamoDB?
Answer: Standard pagination:- Use
Limitto control page size - Return
LastEvaluatedKeyas opaque token - Don’t expose internal key structure
- Implement timeout handling for large scans
- Consider caching for expensive queries
Question 9: How do you monitor and optimize table performance?
Answer: Key metrics to monitor:- Consumed capacity:
- Throttling:
- Latency:
- Identify hot partitions:
- Analyze access patterns:
- Set up alarms:
Question 10: How would you design a high-performance leaderboard system?
Answer: Requirements:- Millions of users
- Real-time score updates
- Top 100 leaderboard queries
- User rank queries
- Main table (for score updates):
- GSI for ranking (limited use):
- ElastiCache for rankings:
- Hybrid approach:
Summary
The overarching principle behind DynamoDB performance is that you are not tuning a database engine — you are tuning a distributed system. Every optimization in this chapter ultimately comes back to one idea: distribute load evenly across partitions, minimize the amount of data read or written per operation, and cache aggressively to avoid hitting the database at all. These principles are not unique to DynamoDB; they apply to any distributed storage system, from Cassandra to Bigtable to CockroachDB. What makes DynamoDB distinctive is that the performance model is explicit and quantified through capacity units, which means you can predict costs and performance characteristics before deploying to production — if you understand the mechanics. Key performance optimization strategies:- Partition key design: Use high-cardinality keys, avoid hot partitions. This is the single most important decision and the hardest to change later
- Batch operations: Use BatchGetItem/BatchWriteItem for multiple items to amortize network round-trip overhead
- Caching: Implement Redis/DAX for read-heavy workloads. An 80% cache hit rate reduces both latency and cost by 5x for reads
- Capacity planning: Choose on-demand vs provisioned based on workload predictability. Start with on-demand, migrate to provisioned once patterns stabilize
- Query optimization: Use projection expressions, parallel queries, and avoid FilterExpression as a substitute for proper key design
- GSI discipline: Treat every GSI as a write amplifier. Limit to 2-3 indexes per table in production, and use sparse indexes where possible
- Monitoring: Track CloudWatch metrics (especially
ThrottledRequestsand per-partition metrics via Contributor Insights), set up alarms before you need them
- Application cache (Redis): less than 1ms
- DAX: 1-2ms
- DynamoDB eventual read: 5-10ms
- DynamoDB strong read: 10-20ms
- DynamoDB query: 10-50ms
- DynamoDB scan: 100ms+