Module Overview
Estimated Time: 4-5 hours | Difficulty: Intermediate-Advanced | Prerequisites: Core Concepts
- DynamoDB fundamentals and architecture
- Data modeling and access patterns
- Primary keys, GSIs, and LSIs
- Capacity modes (On-Demand vs Provisioned)
- Transactions and consistency models
- DynamoDB Accelerator (DAX)
- Streams and change data capture
- Performance optimization and cost management
Why DynamoDB?
Fully Managed
No servers to manage, automatic scaling, built-in backup and restore
Single-Digit Milliseconds
Consistent performance at any scale, from 1 to millions of requests/second
Serverless
Pay-per-request pricing, no idle capacity charges with On-Demand mode
Global Tables
Multi-region, active-active replication for global applications
DynamoDB Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ DynamoDB Table │ │
│ │ │ │
│ │ Table: Orders │ │
│ │ ───────────────────────────────────────────────────────────── │ │
│ │ │ │
│ │ Partition Key (PK): customer_id │ │
│ │ Sort Key (SK): order_date#order_id │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ Partition 1 (customer_id = "C001") │ │ │
│ │ │ ├── 2024-01-15#ORD001 → {amount: 99.99, status: "done"}│ │ │
│ │ │ ├── 2024-01-20#ORD002 → {amount: 149.99, status: "new"}│ │ │
│ │ │ └── 2024-02-01#ORD003 → {amount: 29.99, status: "done"}│ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ Partition 2 (customer_id = "C002") │ │ │
│ │ │ ├── 2024-01-10#ORD004 → {amount: 59.99, status: "done"}│ │ │
│ │ │ └── 2024-01-25#ORD005 → {amount: 199.99, status: "new"}│ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Data Distribution: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Partition 1 │ │ Partition 2 │ │ Partition N │ │
│ │ (10 GB max) │ │ (10 GB max) │ │ (10 GB max) │ │
│ │ 3 AZs │ │ 3 AZs │ │ 3 AZs │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────┴────────────────┘ │
│ │ │
│ Automatic Replication │
│ (3 copies, multi-AZ) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Core Concepts
Primary Keys
DynamoDB supports two types of primary keys:
┌────────────────────────────────────────────────────────────────────────┐
│ Primary Key Types │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. PARTITION KEY (Simple Primary Key) │
│ ───────────────────────────────────── │
│ • Single attribute │
│ • Must be unique across all items │
│ • Used to determine physical partition │
│ │
│ Example: user_id (unique per user) │
│ ┌───────────┬────────────────────────────┐ │
│ │ user_id │ data │ │
│ ├───────────┼────────────────────────────┤ │
│ │ U001 │ {name: "Alice", age: 30} │ │
│ │ U002 │ {name: "Bob", age: 25} │ │
│ └───────────┴────────────────────────────┘ │
│ │
│ 2. COMPOSITE PRIMARY KEY (Partition + Sort Key) │
│ ───────────────────────────────────────────── │
│ • Two attributes: partition key + sort key │
│ • Partition key doesn't need to be unique │
│ • Combination must be unique │
│ • Enables range queries on sort key │
│ │
│ Example: customer_id (PK) + order_date (SK) │
│ ┌─────────────┬─────────────┬──────────────────────┐ │
│ │ customer_id │ order_date │ data │ │
│ ├─────────────┼─────────────┼──────────────────────┤ │
│ │ C001 │ 2024-01-15 │ {amount: 99.99} │ │
│ │ C001 │ 2024-01-20 │ {amount: 149.99} │ │
│ │ C002 │ 2024-01-10 │ {amount: 59.99} │ │
│ └─────────────┴─────────────┴──────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
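For concreteness, here is a sketch of the CreateTable request behind a composite-key table like the Orders example (table and attribute names are illustrative; only key attributes are declared up front because items are otherwise schemaless):

```python
# Sketch of a CreateTable request for a composite-key table.
# In practice this dict is passed to
# boto3.client('dynamodb').create_table(**create_table_params).
create_table_params = {
    "TableName": "Orders",
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_sk", "KeyType": "RANGE"},    # sort key (order_date#order_id)
    ],
    # Only key (and index key) attributes are declared; all other
    # attributes are added freely per item.
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_sk", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",  # On-Demand capacity mode
}
```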
Data Types
# DynamoDB Attribute Types
data_types = {
# Scalar Types
"S": "String", # "Hello World"
"N": "Number", # "123.45" (sent as string)
"B": "Binary", # base64-encoded binary
"BOOL": "Boolean", # true/false
"NULL": "Null", # null
# Document Types
"M": "Map", # {"key": {"S": "value"}}
"L": "List", # [{"S": "a"}, {"N": "1"}]
# Set Types (unique elements, same type)
"SS": "String Set", # ["a", "b", "c"]
"NS": "Number Set", # ["1", "2", "3"]
"BS": "Binary Set", # [binary1, binary2]
}
# Example Item
order_item = {
"PK": {"S": "CUSTOMER#C001"},
"SK": {"S": "ORDER#2024-01-15#ORD001"},
"order_id": {"S": "ORD001"},
"customer_id": {"S": "C001"},
"amount": {"N": "99.99"},
"items": {"L": [
{"M": {"product": {"S": "Widget"}, "qty": {"N": "2"}}},
{"M": {"product": {"S": "Gadget"}, "qty": {"N": "1"}}}
]},
"status": {"S": "COMPLETED"},
"tags": {"SS": ["express", "gift-wrapped"]},
"created_at": {"S": "2024-01-15T10:30:00Z"}
}
Data Modeling Patterns
Single-Table Design
Best Practice: Use single-table design for related entities. This enables fetching all related data in a single query, reducing latency and cost.
┌────────────────────────────────────────────────────────────────────────┐
│ Single-Table Design Example │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ E-Commerce: Customers, Orders, Order Items in ONE table │
│ │
│ ┌─────────────────────┬─────────────────────────┬─────────────────┐ │
│ │ PK │ SK │ Attributes │ │
│ ├─────────────────────┼─────────────────────────┼─────────────────┤ │
│ │ CUSTOMER#C001 │ PROFILE │ name, email... │ │
│ │ CUSTOMER#C001 │ ORDER#2024-01-15#O001 │ total, status │ │
│ │ CUSTOMER#C001 │ ORDER#2024-01-20#O002 │ total, status │ │
│ │ ORDER#O001 │ ITEM#1 │ product, qty │ │
│ │ ORDER#O001 │ ITEM#2 │ product, qty │ │
│ │ ORDER#O002 │ ITEM#1 │ product, qty │ │
│ │ PRODUCT#P001 │ METADATA │ name, price │ │
│ │ PRODUCT#P001 │ INVENTORY │ stock, location │ │
│ └─────────────────────┴─────────────────────────┴─────────────────┘ │
│ │
│ Access Patterns Enabled: │
│ • Get customer profile: PK = "CUSTOMER#C001", SK = "PROFILE" │
│ • Get all customer orders: PK = "CUSTOMER#C001", SK begins "ORDER#" │
│ • Get order items: PK = "ORDER#O001", SK begins "ITEM#" │
│ • Get product info: PK = "PRODUCT#P001" │
│ │
└────────────────────────────────────────────────────────────────────────┘
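A payoff of this layout is that the profile and all orders share one partition, so a single Query on PK = "CUSTOMER#C001" with no sort-key condition returns them together. A small sketch of splitting that result set by SK prefix, following the conventions in the table above:

```python
def split_customer_partition(items):
    """Split the Items of a Query on PK = 'CUSTOMER#<id>' into the
    profile item and the list of order items, using the SK prefixes
    from the single-table design above."""
    profile, orders = None, []
    for item in items:
        if item["SK"] == "PROFILE":
            profile = item
        elif item["SK"].startswith("ORDER#"):
            orders.append(item)
    return profile, orders
```

One round trip to DynamoDB yields both entity types; the application separates them in memory.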
Access Pattern First Design
# Step 1: Define Access Patterns
access_patterns = [
"Get customer by ID",
"Get all orders for a customer",
"Get order details with items",
"Get orders by status (e.g., 'pending')",
"Get orders in date range",
"Get product inventory",
]
# Step 2: Design Keys Based on Patterns
table_design = {
"table_name": "EcommerceTable",
"primary_key": {
"PK": "Partition Key (entity type + ID)",
"SK": "Sort Key (relationship + details)"
},
"gsi1": {
"GSI1PK": "For alternate access patterns",
"GSI1SK": "Enable range queries"
}
}
# Step 3: Define Key Patterns
key_patterns = {
"Customer": {
"PK": "CUSTOMER#<customer_id>",
"SK": "PROFILE"
},
"Order": {
"PK": "CUSTOMER#<customer_id>",
"SK": "ORDER#<date>#<order_id>",
"GSI1PK": "STATUS#<status>",
"GSI1SK": "<date>#<order_id>"
},
"OrderItem": {
"PK": "ORDER#<order_id>",
"SK": "ITEM#<item_number>"
}
}
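Key patterns like these are worth centralizing in small helper functions so item construction stays consistent everywhere; a sketch for the Order entity (key names follow the patterns defined above):

```python
def order_keys(customer_id: str, order_date: str, order_id: str,
               status: str) -> dict:
    """Build the table and GSI1 key attributes for an Order item,
    following the key patterns defined above."""
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"ORDER#{order_date}#{order_id}",
        "GSI1PK": f"STATUS#{status}",
        "GSI1SK": f"{order_date}#{order_id}",
    }
```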
Secondary Indexes
Global Secondary Index (GSI)
┌────────────────────────────────────────────────────────────────────────┐
│ Global Secondary Index (GSI) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ • Different partition key and/or sort key from base table │
│ • Separate throughput capacity (own RCU/WCU) │
│ • Can be created/deleted anytime │
│ • Eventually consistent reads only │
│ • Maximum 20 GSIs per table │
│ │
│ BASE TABLE GSI: StatusDateIndex │
│ ───────────────────────────────── ────────────────────────────── │
│ PK: CUSTOMER#C001 GSI-PK: STATUS#pending │
│ SK: ORDER#2024-01-15#O001 GSI-SK: 2024-01-15#O001 │
│ status: pending customer_id: C001 │
│ ───────────────────────────────── ────────────────────────────── │
│ │
│ Use Case: "Find all pending orders sorted by date" │
│ Query: GSI-PK = "STATUS#pending", GSI-SK > "2024-01-01" │
│ │
└────────────────────────────────────────────────────────────────────────┘
Local Secondary Index (LSI)
┌────────────────────────────────────────────────────────────────────────┐
│ Local Secondary Index (LSI) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ • Same partition key, different sort key │
│ • Shares throughput with base table │
│ • Must be created at table creation time │
│ • Strongly consistent reads available │
│ • Maximum 5 LSIs per table │
│ • 10 GB limit per partition (includes all LSIs) │
│ │
│ BASE TABLE LSI: AmountIndex │
│ ───────────────────────────────── ────────────────────────────── │
│ PK: CUSTOMER#C001 PK: CUSTOMER#C001 (same) │
│ SK: ORDER#2024-01-15#O001 LSI-SK: 99.99 (amount) │
│ amount: 99.99 order_id: O001 │
│ ───────────────────────────────── ────────────────────────────── │
│ │
│ Use Case: "Get customer's highest-value orders" │
│ Query: PK = "CUSTOMER#C001", ordered by amount (descending) │
│ │
└────────────────────────────────────────────────────────────────────────┘
GSI vs LSI Comparison
| Feature | GSI | LSI |
|---|---|---|
| Partition Key | Different from base table | Same as base table |
| Sort Key | Different from base table | Different from base table |
| Capacity | Separate RCU/WCU | Shared with base table |
| Creation | Anytime | Table creation only |
| Consistency | Eventually consistent only | Strong or eventual |
| Limit | 20 per table | 5 per table |
| Size Limit | None | 10 GB per partition |
Capacity Modes
On-Demand vs Provisioned
┌────────────────────────────────────────────────────────────────────────┐
│ Capacity Mode Comparison │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ON-DEMAND MODE PROVISIONED MODE │
│ ──────────────── ──────────────── │
│ │
│ ✓ Pay per request ✓ Pay per capacity unit/hour │
│ ✓ Auto-scales instantly ✓ Reserve capacity (cheaper) │
│ ✓ No capacity planning ✓ Predictable costs │
│ ✓ No throttling (mostly) ✓ Auto Scaling available │
│ │
│ Best For: Best For: │
│ • Unpredictable traffic • Predictable, steady traffic │
│ • New applications • Cost optimization │
│ • Spiky workloads • High-volume applications │
│ • Dev/test environments • Reserved capacity discount │
│ │
│ Pricing (us-east-1): Pricing (us-east-1): │
│ • $1.25 per million WRU • $0.00065 per WCU/hour │
│ • $0.25 per million RRU • $0.00013 per RCU/hour │
│ │
│ Cost Example (1M writes/day): │
│ On-Demand: $1.25/day = $37.50/month │
│ Provisioned: ~12 WCU = $5.62/month (84% savings) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Capacity Units Explained
import math

# Read Capacity Units (RCU)
# 1 RCU = 1 strongly consistent read per second for item up to 4 KB
# 1 RCU = 2 eventually consistent reads per second for item up to 4 KB
def calculate_rcu(item_size_kb: float, reads_per_second: int,
consistent: bool = False) -> int:
"""Calculate required RCUs."""
# Round up to nearest 4 KB
size_units = math.ceil(item_size_kb / 4)
if consistent:
return size_units * reads_per_second
else: # Eventually consistent
return math.ceil(size_units * reads_per_second / 2)
# Write Capacity Units (WCU)
# 1 WCU = 1 write per second for item up to 1 KB
def calculate_wcu(item_size_kb: float, writes_per_second: int) -> int:
"""Calculate required WCUs."""
# Round up to nearest 1 KB
size_units = math.ceil(item_size_kb)
return size_units * writes_per_second
# Examples
print(calculate_rcu(8, 100, consistent=True)) # 200 RCU
print(calculate_rcu(8, 100, consistent=False)) # 100 RCU
print(calculate_wcu(2.5, 50)) # 150 WCU
Operations
Basic CRUD Operations
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('EcommerceTable')
# CREATE - PutItem
def create_order(customer_id: str, order_id: str, amount: float):
table.put_item(
Item={
'PK': f'CUSTOMER#{customer_id}',
'SK': f'ORDER#{datetime.now().isoformat()}#{order_id}',
'order_id': order_id,
'customer_id': customer_id,
'amount': Decimal(str(amount)),
'status': 'PENDING',
'created_at': datetime.now().isoformat()
},
ConditionExpression='attribute_not_exists(PK)' # Prevent overwrite
)
# READ - GetItem (single item, by primary key)
def get_customer(customer_id: str):
response = table.get_item(
Key={
'PK': f'CUSTOMER#{customer_id}',
'SK': 'PROFILE'
},
ConsistentRead=True # Optional: strongly consistent read
)
return response.get('Item')
# READ - Query (multiple items, same partition)
def get_customer_orders(customer_id: str, limit: int = 20):
response = table.query(
KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') &
Key('SK').begins_with('ORDER#'),
ScanIndexForward=False, # Descending order
Limit=limit
)
return response['Items']
# READ - Query with filter (filter applied AFTER read)
def get_pending_orders(customer_id: str):
response = table.query(
KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') &
Key('SK').begins_with('ORDER#'),
FilterExpression=Attr('status').eq('PENDING')
)
return response['Items']
# UPDATE - UpdateItem
def update_order_status(customer_id: str, sk: str, new_status: str):
response = table.update_item(
Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
UpdateExpression='SET #status = :status, updated_at = :updated',
ExpressionAttributeNames={'#status': 'status'}, # 'status' is reserved
ExpressionAttributeValues={
':status': new_status,
':updated': datetime.now().isoformat()
},
ConditionExpression='attribute_exists(PK)', # Ensure item exists
ReturnValues='ALL_NEW'
)
return response['Attributes']
# DELETE - DeleteItem
def delete_order(customer_id: str, sk: str):
table.delete_item(
Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
ConditionExpression='#status <> :completed',
ExpressionAttributeNames={'#status': 'status'},
ExpressionAttributeValues={':completed': 'COMPLETED'}
)
Batch Operations
# BatchWriteItem - Up to 25 items, 16 MB max
def batch_create_items(items: list):
with table.batch_writer() as batch:
for item in items:
batch.put_item(Item=item)
# Handles retries for unprocessed items automatically
# BatchGetItem - Up to 100 items, 16 MB max
def batch_get_customers(customer_ids: list):
keys = [
{'PK': f'CUSTOMER#{cid}', 'SK': 'PROFILE'}
for cid in customer_ids
]
response = dynamodb.batch_get_item(
RequestItems={
'EcommerceTable': {
'Keys': keys,
'ProjectionExpression': 'customer_id, #name, email',
'ExpressionAttributeNames': {'#name': 'name'}
}
}
)
return response['Responses']['EcommerceTable']
Transactions
DynamoDB supports ACID transactions across multiple items and tables.
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Transactions │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ TransactWriteItems: │
│ • Up to 100 items per transaction │
│ • All-or-nothing execution │
│ • Supported actions: Put, Update, Delete, ConditionCheck │
│ │
│ TransactGetItems: │
│ • Up to 100 items per transaction │
│ • Serializable isolation │
│ │
│ Cost: 2x the cost of standard writes (for durability) │
│ │
│ Example: Transfer funds between accounts │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Transaction: │ │
│ │ 1. Check source balance >= amount (ConditionCheck) │ │
│ │ 2. Deduct from source account (Update) │ │
│ │ 3. Add to destination account (Update) │ │
│ │ 4. Create transfer record (Put) │ │
│ │ │ │
│ │ All succeed → Committed │ │
│ │ Any fails → All rolled back │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
import uuid
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.types import TypeSerializer
def transfer_funds(source_id: str, dest_id: str, amount: Decimal):
"""Transfer funds atomically between accounts."""
client = boto3.client('dynamodb')
serializer = TypeSerializer()
response = client.transact_write_items(
TransactItems=[
# 1. Condition check: source has sufficient balance
{
'ConditionCheck': {
'TableName': 'Accounts',
'Key': {
'PK': {'S': f'ACCOUNT#{source_id}'},
'SK': {'S': 'BALANCE'}
},
'ConditionExpression': 'balance >= :amount',
'ExpressionAttributeValues': {
':amount': serializer.serialize(amount)
}
}
},
# 2. Deduct from source
{
'Update': {
'TableName': 'Accounts',
'Key': {
'PK': {'S': f'ACCOUNT#{source_id}'},
'SK': {'S': 'BALANCE'}
},
'UpdateExpression': 'SET balance = balance - :amount',
'ExpressionAttributeValues': {
':amount': serializer.serialize(amount)
}
}
},
# 3. Add to destination
{
'Update': {
'TableName': 'Accounts',
'Key': {
'PK': {'S': f'ACCOUNT#{dest_id}'},
'SK': {'S': 'BALANCE'}
},
'UpdateExpression': 'SET balance = balance + :amount',
'ExpressionAttributeValues': {
':amount': serializer.serialize(amount)
}
}
},
# 4. Record the transfer
{
'Put': {
'TableName': 'Transfers',
'Item': {
'PK': {'S': f'TRANSFER#{uuid.uuid4()}'},
'SK': {'S': datetime.now().isoformat()},
'source': {'S': source_id},
'destination': {'S': dest_id},
'amount': serializer.serialize(amount),
'status': {'S': 'COMPLETED'}
}
}
}
]
)
return response
DynamoDB Accelerator (DAX)
┌────────────────────────────────────────────────────────────────────────┐
│ DAX (DynamoDB Accelerator) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Application ──► DAX Cluster ──► DynamoDB │
│ │ │
│ └── In-memory cache │
│ (microsecond latency) │
│ │
│ Cache Types: │
│ ──────────── │
│ • Item Cache: Individual GetItem results (5-min default TTL) │
│ • Query Cache: Query/Scan results (5-min default TTL) │
│ │
│ Performance: │
│ • DynamoDB: 1-10 milliseconds │
│ • DAX: ~400 microseconds (up to 10x faster) │
│ │
│ When to Use: │
│ ✓ Read-heavy workloads │
│ ✓ Same items read repeatedly │
│ ✓ Microsecond response time required │
│ ✗ Write-heavy workloads (no benefit) │
│ ✗ Strongly consistent reads required │
│ │
│ Pricing: │
│ • dax.r5.large: ~$0.269/hour (~$194/month) │
│ • Minimum 3 nodes for production (multi-AZ) │
│ │
└────────────────────────────────────────────────────────────────────────┘
# Using DAX (drop-in replacement for DynamoDB)
import amazondax
import boto3
# Create DAX client (same API as boto3 DynamoDB resource)
dax_endpoint = 'my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
dax = amazondax.AmazonDaxClient.resource(endpoint_url=dax_endpoint)
table = dax.Table('EcommerceTable')
# Use exactly like DynamoDB
response = table.get_item(
Key={'PK': 'CUSTOMER#C001', 'SK': 'PROFILE'}
)
# First call: ~1ms (cache miss, hits DynamoDB)
# Subsequent calls: ~0.4ms (cache hit)
DynamoDB Streams
Capture item-level changes for event-driven architectures.
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Streams │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ DynamoDB Table ──► Stream ──► Lambda / Kinesis / Application │
│ │
│ Stream Record Types: │
│ ──────────────────── │
│ • KEYS_ONLY: Only partition and sort key │
│ • NEW_IMAGE: Entire item after modification │
│ • OLD_IMAGE: Entire item before modification │
│ • NEW_AND_OLD_IMAGES: Both before and after │
│ │
│ Use Cases: │
│ ──────────── │
│ • Real-time analytics │
│ • Cross-region replication │
│ • Materialized views │
│ • Search index synchronization (OpenSearch) │
│ • Audit logging │
│ • Event-driven workflows │
│ │
│ Retention: 24 hours │
│ Ordering: Per-partition ordering guaranteed │
│ │
└────────────────────────────────────────────────────────────────────────┘
# Lambda function processing DynamoDB Stream
def lambda_handler(event, context):
for record in event['Records']:
event_name = record['eventName'] # INSERT, MODIFY, REMOVE
if event_name == 'INSERT':
new_item = record['dynamodb']['NewImage']
# Process new item
print(f"New item: {new_item}")
elif event_name == 'MODIFY':
old_item = record['dynamodb']['OldImage']
new_item = record['dynamodb']['NewImage']
# Compare and process changes
print(f"Modified: {old_item} -> {new_item}")
elif event_name == 'REMOVE':
old_item = record['dynamodb']['OldImage']
# Handle deletion
print(f"Deleted: {old_item}")
return {'statusCode': 200}
Best Practices
Performance Optimization
Even Key Distribution
Use high-cardinality partition keys to avoid hot partitions
Sparse Indexes
Only include items with indexed attributes in GSIs
Projection Carefully
Only project needed attributes to GSIs (reduce WCU)
Use BatchGetItem
Batch reads instead of multiple GetItem calls
Cost Optimization
cost_tips = {
"capacity_mode": [
"Use On-Demand for unpredictable workloads",
"Use Provisioned with Auto Scaling for steady traffic",
"Use Reserved Capacity for 77% savings on provisioned",
],
"data_modeling": [
"Use single-table design to reduce table count",
"Project only needed attributes to GSIs",
"Use sparse GSIs (items without attributes aren't indexed)",
],
"operations": [
"Use eventually consistent reads (50% cheaper)",
"Batch operations to reduce request overhead",
"Enable TTL for auto-expiring data (free deletions)",
],
"storage": [
"Compress large attributes before storing",
"Use S3 for objects > 400 KB, store reference in DynamoDB",
"Delete unused GSIs",
]
}
🎯 Interview Questions
Q1: When would you choose DynamoDB over RDS?
Choose DynamoDB when you need:
- Predictable, single-digit millisecond latency at any scale
- Simple access patterns (key-value or document)
- Massive scale (millions of requests/second)
- A serverless architecture
- Global distribution (Global Tables)
Choose RDS when you need:
- Complex queries with JOINs
- Strong ACID guarantees across many tables
- To reuse existing SQL skills or an existing SQL codebase
- Complex reporting and ad hoc analytics
Q2: How do you handle hot partitions?
Prevention strategies:
- Use high-cardinality partition keys
- Add a random suffix to the key (write sharding)
- Use composite keys to distribute writes
When reading a sharded key, query all shards in parallel and merge the results.
# Instead of: ORDER#12345
# Use: ORDER#12345#7 (random 0-9)
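The suffix idea above can be sketched as a pair of helpers: one picks a random shard on write, the other enumerates every shard key for the parallel read-back (the shard count of 10 is illustrative; size it to your write throughput):

```python
import random

SHARD_COUNT = 10  # illustrative; pick based on the hot key's write rate

def sharded_pk(order_id: str) -> str:
    """Partition key for writes: spreads one hot logical key across
    SHARD_COUNT physical partitions via a random suffix."""
    return f"ORDER#{order_id}#{random.randrange(SHARD_COUNT)}"

def all_shard_pks(order_id: str) -> list:
    """Every shard's partition key; query all of these in parallel
    and merge the results when reading the logical key back."""
    return [f"ORDER#{order_id}#{s}" for s in range(SHARD_COUNT)]
```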
Q3: Explain GSI vs LSI trade-offs
GSI:
- Flexible (any partition/sort key)
- Own capacity (no throttling impact on base table)
- Eventually consistent reads only
- Can be added or removed anytime
LSI:
- Must share partition key with base table
- Shares capacity with the base table (can throttle it)
- Strongly consistent reads available
- Must be defined at table creation
- 10 GB per-partition size limit
Q4: How do you implement pagination in DynamoDB?
def paginated_query(table, pk_value, page_size=20, last_key=None):
params = {
'KeyConditionExpression': Key('PK').eq(pk_value),
'Limit': page_size
}
if last_key:
params['ExclusiveStartKey'] = last_key
response = table.query(**params)
return {
'items': response['Items'],
'last_key': response.get('LastEvaluatedKey') # None if no more pages
}
- Use Limit for page size
- Use ExclusiveStartKey for continuation
- LastEvaluatedKey indicates more pages exist (it is absent on the last page)
Q5: Design a DynamoDB schema for a social media app
Access patterns:
- Get user profile
- Get user’s posts
- Get user’s followers
- Get user’s following
- Get feed (posts from following)
PK | SK | Data
--------------------|---------------------|------------------
USER#alice | PROFILE | {name, bio, ...}
USER#alice | POST#2024-01-15#001 | {content, likes}
USER#alice | FOLLOWER#bob | {followed_at}
USER#alice | FOLLOWING#charlie | {followed_at}
GSI1:
GSI1PK | GSI1SK | For
--------------------|---------------------|------------------
FOLLOWING#charlie | 2024-01-15#POST#001 | Feed aggregation
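Feed aggregation over this schema means querying GSI1 once per followed user and merging the per-user results, newest first. A sketch of the merge step, assuming GSI1SK sorts lexicographically by date as in the table above:

```python
import heapq

def merge_feed(posts_by_user: dict, limit: int = 20) -> list:
    """Merge per-user post lists (the Items from each followed user's
    GSI1 query) into one newest-first feed, keyed on GSI1SK."""
    all_posts = [post for posts in posts_by_user.values() for post in posts]
    # nlargest returns the top `limit` posts in descending GSI1SK order
    return heapq.nlargest(limit, all_posts, key=lambda p: p["GSI1SK"])
```

This fan-out-on-read approach is the simple option; at large follower counts, real systems often precompute feeds on write instead.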
🧪 Hands-On Lab
1. Create DynamoDB Table — create a table with a composite primary key and enable Streams
2. Implement Single-Table Design — model users, orders, and order items in a single table
3. Create GSI — add a GSI for querying orders by status
4. Implement Transactions — build an atomic order placement with an inventory check
5. Process Streams — create a Lambda function that processes DynamoDB Streams for real-time updates
Next Module
AWS Lambda
Master serverless compute with AWS Lambda