DynamoDB Architecture

Module Overview

Estimated Time: 4-5 hours | Difficulty: Intermediate-Advanced | Prerequisites: Core Concepts
DynamoDB is AWS’s fully managed NoSQL database designed for single-digit millisecond performance at any scale. This module covers everything from data modeling to advanced patterns used in production systems.

What You’ll Learn:
  • DynamoDB fundamentals and architecture
  • Data modeling and access patterns
  • Primary keys, GSIs, and LSIs
  • Capacity modes (On-Demand vs Provisioned)
  • Transactions and consistency models
  • DynamoDB Accelerator (DAX)
  • Streams and change data capture
  • Performance optimization and cost management

Why DynamoDB?

Fully Managed

No servers to manage, automatic scaling, built-in backup and restore

Single-Digit Milliseconds

Consistent performance at any scale, from 1 to millions of requests/second

Serverless

Pay-per-request pricing, no idle capacity charges with On-Demand mode

Global Tables

Multi-region, active-active replication for global applications

DynamoDB Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Architecture                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                        DynamoDB Table                            │  │
│   │                                                                  │  │
│   │   Table: Orders                                                  │  │
│   │   ─────────────────────────────────────────────────────────────  │  │
│   │                                                                  │  │
│   │   Partition Key (PK): customer_id                               │  │
│   │   Sort Key (SK): order_date#order_id                            │  │
│   │                                                                  │  │
│   │   ┌──────────────────────────────────────────────────────────┐  │  │
│   │   │  Partition 1 (customer_id = "C001")                      │  │  │
│   │   │  ├── 2024-01-15#ORD001 → {amount: 99.99, status: "done"}│  │  │
│   │   │  ├── 2024-01-20#ORD002 → {amount: 149.99, status: "new"}│  │  │
│   │   │  └── 2024-02-01#ORD003 → {amount: 29.99, status: "done"}│  │  │
│   │   └──────────────────────────────────────────────────────────┘  │  │
│   │                                                                  │  │
│   │   ┌──────────────────────────────────────────────────────────┐  │  │
│   │   │  Partition 2 (customer_id = "C002")                      │  │  │
│   │   │  ├── 2024-01-10#ORD004 → {amount: 59.99, status: "done"}│  │  │
│   │   │  └── 2024-01-25#ORD005 → {amount: 199.99, status: "new"}│  │  │
│   │   └──────────────────────────────────────────────────────────┘  │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Data Distribution:                                                    │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                   │
│   │ Partition 1 │  │ Partition 2 │  │ Partition N │                   │
│   │ (10 GB max) │  │ (10 GB max) │  │ (10 GB max) │                   │
│   │   3 AZs     │  │   3 AZs     │  │   3 AZs     │                   │
│   └─────────────┘  └─────────────┘  └─────────────┘                   │
│         │                │                │                            │
│         └────────────────┴────────────────┘                            │
│                          │                                              │
│                  Automatic Replication                                  │
│                  (3 copies, multi-AZ)                                   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
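The data-distribution picture above can be made concrete with a minimal sketch of hash-based placement. DynamoDB's real hash function and partition map are internal, so MD5 and the fixed `NUM_PARTITIONS` count used here are assumptions for illustration only.

```python
import hashlib

NUM_PARTITIONS = 3  # hypothetical fixed count; DynamoDB splits partitions dynamically


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition index via a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


for customer_id in ["C001", "C002", "C003"]:
    print(customer_id, "->", partition_for(customer_id))
```

The same key always lands on the same partition, which is why a low-cardinality partition key concentrates traffic on a few partitions.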

Core Concepts

Primary Keys

DynamoDB supports two types of primary keys:
┌────────────────────────────────────────────────────────────────────────┐
│                      Primary Key Types                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   1. PARTITION KEY (Simple Primary Key)                                │
│   ─────────────────────────────────────                                │
│   • Single attribute                                                    │
│   • Must be unique across all items                                    │
│   • Used to determine physical partition                               │
│                                                                         │
│   Example: user_id (unique per user)                                   │
│   ┌───────────┬────────────────────────────┐                           │
│   │ user_id   │ data                       │                           │
│   ├───────────┼────────────────────────────┤                           │
│   │ U001      │ {name: "Alice", age: 30}  │                           │
│   │ U002      │ {name: "Bob", age: 25}    │                           │
│   └───────────┴────────────────────────────┘                           │
│                                                                         │
│   2. COMPOSITE PRIMARY KEY (Partition + Sort Key)                      │
│   ─────────────────────────────────────────────                        │
│   • Two attributes: partition key + sort key                           │
│   • Partition key doesn't need to be unique                            │
│   • Combination must be unique                                         │
│   • Enables range queries on sort key                                  │
│                                                                         │
│   Example: customer_id (PK) + order_date (SK)                          │
│   ┌─────────────┬─────────────┬──────────────────────┐                 │
│   │ customer_id │ order_date  │ data                 │                 │
│   ├─────────────┼─────────────┼──────────────────────┤                 │
│   │ C001        │ 2024-01-15  │ {amount: 99.99}     │                 │
│   │ C001        │ 2024-01-20  │ {amount: 149.99}    │                 │
│   │ C002        │ 2024-01-10  │ {amount: 59.99}     │                 │
│   └─────────────┴─────────────┴──────────────────────┘                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
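Range queries on a sort key work because items within a partition are kept in sort-key order. The sketch below imitates that with a sorted Python list and `bisect`; it is an in-memory stand-in, not how DynamoDB is implemented, and the `~` suffix on the upper bound is just a convenient character that sorts after `#`.

```python
from bisect import bisect_left, bisect_right

# In-memory stand-in for one partition: sort keys kept in sorted order.
sort_keys = sorted([
    "2024-02-01#ORD003",
    "2024-01-15#ORD001",
    "2024-01-20#ORD002",
])


def range_query(keys: list[str], low: str, high: str) -> list[str]:
    """Return sort keys k with low <= k <= high (like SK BETWEEN low AND high)."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


january = range_query(sort_keys, "2024-01-01", "2024-01-31~")
print(january)  # ['2024-01-15#ORD001', '2024-01-20#ORD002']
```

ISO-8601 dates sort lexicographically in chronological order, which is why they make good sort-key prefixes.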

Data Types

# DynamoDB Attribute Types
data_types = {
    # Scalar Types
    "S": "String",              # "Hello World"
    "N": "Number",              # "123.45" (sent as string)
    "B": "Binary",              # base64-encoded binary
    "BOOL": "Boolean",          # true/false
    "NULL": "Null",             # null
    
    # Document Types
    "M": "Map",                 # {"key": {"S": "value"}}
    "L": "List",                # [{"S": "a"}, {"N": "1"}]
    
    # Set Types (unique elements, same type)
    "SS": "String Set",         # ["a", "b", "c"]
    "NS": "Number Set",         # ["1", "2", "3"]
    "BS": "Binary Set",         # [binary1, binary2]
}

# Example Item
order_item = {
    "PK": {"S": "CUSTOMER#C001"},
    "SK": {"S": "ORDER#2024-01-15#ORD001"},
    "order_id": {"S": "ORD001"},
    "customer_id": {"S": "C001"},
    "amount": {"N": "99.99"},
    "items": {"L": [
        {"M": {"product": {"S": "Widget"}, "qty": {"N": "2"}}},
        {"M": {"product": {"S": "Gadget"}, "qty": {"N": "1"}}}
    ]},
    "status": {"S": "COMPLETED"},
    "tags": {"SS": ["express", "gift-wrapped"]},
    "created_at": {"S": "2024-01-15T10:30:00Z"}
}
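To show how native Python values map onto the type descriptors above, here is a simplified converter. It is a sketch, not a replacement for boto3's `TypeSerializer`: it covers only some types, handles only string sets, and (like boto3) expects `Decimal` rather than `float` for non-integer numbers. The function name `to_attribute_value` is hypothetical.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)) and value and all(isinstance(v, str) for v in value):
        return {"SS": sorted(value)}
    raise TypeError(f"Unsupported type: {type(value).__name__}")


print(to_attribute_value({"amount": Decimal("99.99"), "qty": 2, "gift": True}))
# {'M': {'amount': {'N': '99.99'}, 'qty': {'N': '2'}, 'gift': {'BOOL': True}}}
```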

Data Modeling Patterns

Single-Table Design

Best Practice: Use single-table design for related entities. This enables fetching all related data in a single query, reducing latency and cost.
┌────────────────────────────────────────────────────────────────────────┐
│                    Single-Table Design Example                          │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   E-Commerce: Customers, Orders, Order Items in ONE table              │
│                                                                         │
│   ┌─────────────────────┬─────────────────────────┬─────────────────┐  │
│   │ PK                  │ SK                      │ Attributes      │  │
│   ├─────────────────────┼─────────────────────────┼─────────────────┤  │
│   │ CUSTOMER#C001       │ PROFILE                 │ name, email...  │  │
│   │ CUSTOMER#C001       │ ORDER#2024-01-15#O001   │ total, status   │  │
│   │ CUSTOMER#C001       │ ORDER#2024-01-20#O002   │ total, status   │  │
│   │ ORDER#O001          │ ITEM#1                  │ product, qty    │  │
│   │ ORDER#O001          │ ITEM#2                  │ product, qty    │  │
│   │ ORDER#O002          │ ITEM#1                  │ product, qty    │  │
│   │ PRODUCT#P001        │ METADATA                │ name, price     │  │
│   │ PRODUCT#P001        │ INVENTORY               │ stock, location │  │
│   └─────────────────────┴─────────────────────────┴─────────────────┘  │
│                                                                         │
│   Access Patterns Enabled:                                              │
│   • Get customer profile: PK = "CUSTOMER#C001", SK = "PROFILE"         │
│   • Get all customer orders: PK = "CUSTOMER#C001", SK begins "ORDER#"  │
│   • Get order items: PK = "ORDER#O001", SK begins "ITEM#"              │
│   • Get product info: PK = "PRODUCT#P001"                              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
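The access patterns listed above can be tried out against an in-memory list before touching a real table. The `query` helper below is a hypothetical stand-in that mimics the shape of a DynamoDB Query (exact match on PK, optional `begins_with` on SK, results ordered by SK).

```python
items = [
    {"PK": "CUSTOMER#C001", "SK": "PROFILE"},
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-15#O001"},
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-20#O002"},
    {"PK": "ORDER#O001", "SK": "ITEM#1"},
    {"PK": "ORDER#O001", "SK": "ITEM#2"},
    {"PK": "PRODUCT#P001", "SK": "METADATA"},
]


def query(table, pk, sk_prefix=""):
    """Mimic Query: exact match on PK, optional begins_with on SK, sorted by SK."""
    matches = [i for i in table if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


orders = query(items, "CUSTOMER#C001", "ORDER#")
print([o["SK"] for o in orders])
# ['ORDER#2024-01-15#O001', 'ORDER#2024-01-20#O002']
```

Omitting the prefix returns the profile and the orders together, since they share a partition key. Order items live under a different partition key (`ORDER#O001`) and need a second query.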

Access Pattern First Design

# Step 1: Define Access Patterns
access_patterns = [
    "Get customer by ID",
    "Get all orders for a customer",
    "Get order details with items",
    "Get orders by status (e.g., 'pending')",
    "Get orders in date range",
    "Get product inventory",
]

# Step 2: Design Keys Based on Patterns
table_design = {
    "table_name": "EcommerceTable",
    "primary_key": {
        "PK": "Partition Key (entity type + ID)",
        "SK": "Sort Key (relationship + details)"
    },
    "gsi1": {
        "GSI1PK": "For alternate access patterns",
        "GSI1SK": "Enable range queries"
    }
}

# Step 3: Define Key Patterns
key_patterns = {
    "Customer": {
        "PK": "CUSTOMER#<customer_id>",
        "SK": "PROFILE"
    },
    "Order": {
        "PK": "CUSTOMER#<customer_id>",
        "SK": "ORDER#<date>#<order_id>",
        "GSI1PK": "STATUS#<status>",
        "GSI1SK": "<date>#<order_id>"
    },
    "OrderItem": {
        "PK": "ORDER#<order_id>",
        "SK": "ITEM#<item_number>"
    }
}
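One way to keep these key patterns consistent across a codebase is to build them in a single place. The helper names below are hypothetical; the key formats follow the patterns defined in Step 3.

```python
def customer_keys(customer_id: str) -> dict:
    """Keys for a customer profile item."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE"}


def order_keys(customer_id: str, order_date: str, order_id: str, status: str) -> dict:
    """Base-table and GSI1 keys for an order item."""
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"ORDER#{order_date}#{order_id}",
        "GSI1PK": f"STATUS#{status}",
        "GSI1SK": f"{order_date}#{order_id}",
    }


def order_item_keys(order_id: str, item_number: int) -> dict:
    """Keys for a line item belonging to an order."""
    return {"PK": f"ORDER#{order_id}", "SK": f"ITEM#{item_number}"}


print(order_keys("C001", "2024-01-15", "O001", "pending"))
```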

Secondary Indexes

Global Secondary Index (GSI)

┌────────────────────────────────────────────────────────────────────────┐
│                    Global Secondary Index (GSI)                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   • Different partition key and/or sort key from base table            │
│   • Separate throughput capacity (own RCU/WCU)                         │
│   • Can be created/deleted anytime                                      │
│   • Eventually consistent reads only                                    │
│   • Maximum 20 GSIs per table                                          │
│                                                                         │
│   BASE TABLE                           GSI: StatusDateIndex            │
│   ─────────────────────────────────    ──────────────────────────────  │
│   PK: CUSTOMER#C001                    GSI-PK: STATUS#pending          │
│   SK: ORDER#2024-01-15#O001            GSI-SK: 2024-01-15#O001         │
│   status: pending                      customer_id: C001               │
│   ─────────────────────────────────    ──────────────────────────────  │
│                                                                         │
│   Use Case: "Find all pending orders sorted by date"                   │
│   Query: GSI-PK = "STATUS#pending", GSI-SK > "2024-01-01"              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
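The GSI behaviour described above, including the fact that items lacking the index key attributes are simply not indexed, can be sketched in memory. `build_gsi` and `query_gsi` are hypothetical helpers, and the sample items are made up for illustration.

```python
base_items = [
    # No GSI key attributes, so this item never appears in the index.
    {"PK": "CUSTOMER#C001", "SK": "PROFILE", "name": "Alice"},
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-15#O001",
     "GSI1PK": "STATUS#pending", "GSI1SK": "2024-01-15#O001"},
    {"PK": "CUSTOMER#C002", "SK": "ORDER#2023-12-30#O000",
     "GSI1PK": "STATUS#pending", "GSI1SK": "2023-12-30#O000"},
    {"PK": "CUSTOMER#C002", "SK": "ORDER#2024-01-10#O004",
     "GSI1PK": "STATUS#done", "GSI1SK": "2024-01-10#O004"},
]


def build_gsi(items, pk_attr, sk_attr):
    """Index only the items that carry both index key attributes."""
    indexed = [i for i in items if pk_attr in i and sk_attr in i]
    return sorted(indexed, key=lambda i: (i[pk_attr], i[sk_attr]))


def query_gsi(index, pk_attr, sk_attr, pk, sk_greater_than):
    """Mimic: GSI-PK = pk AND GSI-SK > sk_greater_than."""
    return [i for i in index if i[pk_attr] == pk and i[sk_attr] > sk_greater_than]


gsi1 = build_gsi(base_items, "GSI1PK", "GSI1SK")
pending = query_gsi(gsi1, "GSI1PK", "GSI1SK", "STATUS#pending", "2024-01-01")
print([i["SK"] for i in pending])  # ['ORDER#2024-01-15#O001']
```

The query crosses customers: the index regroups orders by status regardless of which base-table partition they live in.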

Local Secondary Index (LSI)

┌────────────────────────────────────────────────────────────────────────┐
│                    Local Secondary Index (LSI)                          │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   • Same partition key, different sort key                             │
│   • Shares throughput with base table                                  │
│   • Must be created at table creation time                             │
│   • Strongly consistent reads available                                │
│   • Maximum 5 LSIs per table                                           │
│   • 10 GB limit per partition key value (incl. LSIs)                   │
│                                                                         │
│   BASE TABLE                           LSI: AmountIndex                │
│   ─────────────────────────────────    ──────────────────────────────  │
│   PK: CUSTOMER#C001                    PK: CUSTOMER#C001 (same)        │
│   SK: ORDER#2024-01-15#O001            LSI-SK: 99.99 (amount)          │
│   amount: 99.99                        order_id: O001                  │
│   ─────────────────────────────────    ──────────────────────────────  │
│                                                                         │
│   Use Case: "Get customer's highest-value orders"                      │
│   Query: PK = "CUSTOMER#C001", ordered by amount (descending)          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

GSI vs LSI Comparison

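┌───────────────┬────────────────────────────┬───────────────────────────────┐
│ Feature       │ GSI                        │ LSI                           │
├───────────────┼────────────────────────────┼───────────────────────────────┤
│ Partition Key │ Different from base table  │ Same as base table            │
│ Sort Key      │ Different from base table  │ Different from base table     │
│ Capacity      │ Separate RCU/WCU           │ Shared with base table        │
│ Creation      │ Anytime                    │ Table creation only           │
│ Consistency   │ Eventually consistent only │ Strong or eventual            │
│ Limit         │ 20 per table               │ 5 per table                   │
│ Size Limit    │ None                       │ 10 GB per partition key value │
└───────────────┴────────────────────────────┴───────────────────────────────┘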

Capacity Modes

On-Demand vs Provisioned

┌────────────────────────────────────────────────────────────────────────┐
│                    Capacity Mode Comparison                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ON-DEMAND MODE                        PROVISIONED MODE               │
│   ────────────────                      ────────────────               │
│                                                                         │
│   ✓ Pay per request                     ✓ Pay per capacity unit/hour   │
│   ✓ Auto-scales instantly               ✓ Reserve capacity (cheaper)   │
│   ✓ No capacity planning                ✓ Predictable costs            │
│   ✓ No throttling (mostly)              ✓ Auto Scaling available       │
│                                                                         │
│   Best For:                             Best For:                      │
│   • Unpredictable traffic               • Predictable, steady traffic  │
│   • New applications                    • Cost optimization            │
│   • Spiky workloads                     • High-volume applications     │
│   • Dev/test environments               • Reserved capacity discount   │
│                                                                         │
│   Pricing (us-east-1):                  Pricing (us-east-1):           │
│   • $1.25 per million WRU               • $0.00065 per WCU/hour        │
│   • $0.25 per million RRU               • $0.00013 per RCU/hour        │
│                                                                         │
│   Cost Example (1M writes/day):                                        │
│   On-Demand: $1.25/day = $37.50/month                                  │
│   Provisioned: ~12 WCU = $5.62/month (85% savings)                     │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
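The cost example in the comparison can be reproduced with a few lines of arithmetic. This sketch uses the prices quoted above as constants; AWS pricing changes over time, so treat them as illustrative. It also assumes perfectly even traffic and items of 1 KB or less, which flatters provisioned mode.

```python
import math

# Prices quoted in the comparison above (us-east-1); illustrative only.
ON_DEMAND_PER_MILLION_WRU = 1.25
PROVISIONED_PER_WCU_HOUR = 0.00065


def on_demand_monthly(writes_per_day: int, days: int = 30) -> float:
    """Monthly On-Demand write cost for 1 KB items."""
    return writes_per_day * days / 1_000_000 * ON_DEMAND_PER_MILLION_WRU


def provisioned_monthly(writes_per_day: int, days: int = 30) -> float:
    """Monthly Provisioned write cost, sized for the average write rate."""
    wcu = math.ceil(writes_per_day / 86_400)  # writes per second, rounded up
    return wcu * PROVISIONED_PER_WCU_HOUR * 24 * days


od = on_demand_monthly(1_000_000)
pr = provisioned_monthly(1_000_000)
print(f"On-Demand: ${od:.2f}/month, Provisioned: ${pr:.2f}/month")
```

Real traffic is rarely flat, so provisioned capacity usually has to be sized for the peak rather than the average, which narrows the gap.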

Capacity Units Explained

import math

# Read Capacity Units (RCU)
# 1 RCU = 1 strongly consistent read per second for item up to 4 KB
# 1 RCU = 2 eventually consistent reads per second for item up to 4 KB

def calculate_rcu(item_size_kb: float, reads_per_second: int, 
                  consistent: bool = False) -> int:
    """Calculate required RCUs."""
    # Round up to nearest 4 KB
    size_units = math.ceil(item_size_kb / 4)
    
    if consistent:
        return size_units * reads_per_second
    else:  # Eventually consistent
        return math.ceil(size_units * reads_per_second / 2)

# Write Capacity Units (WCU)
# 1 WCU = 1 write per second for item up to 1 KB

def calculate_wcu(item_size_kb: float, writes_per_second: int) -> int:
    """Calculate required WCUs."""
    # Round up to nearest 1 KB
    size_units = math.ceil(item_size_kb)
    return size_units * writes_per_second

# Examples
print(calculate_rcu(8, 100, consistent=True))   # 200 RCU
print(calculate_rcu(8, 100, consistent=False))  # 100 RCU
print(calculate_wcu(2.5, 50))                   # 150 WCU

Operations

Basic CRUD Operations

from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('EcommerceTable')

# CREATE - PutItem
def create_order(customer_id: str, order_id: str, amount: float):
    now = datetime.now().isoformat()  # One timestamp for both SK and created_at
    table.put_item(
        Item={
            'PK': f'CUSTOMER#{customer_id}',
            'SK': f'ORDER#{now}#{order_id}',
            'order_id': order_id,
            'customer_id': customer_id,
            'amount': Decimal(str(amount)),  # boto3 rejects float; use Decimal
            'status': 'PENDING',
            'created_at': now
        },
        ConditionExpression='attribute_not_exists(PK)'  # Prevent overwrite
    )

# READ - GetItem (single item, by primary key)
def get_customer(customer_id: str):
    response = table.get_item(
        Key={
            'PK': f'CUSTOMER#{customer_id}',
            'SK': 'PROFILE'
        },
        ConsistentRead=True  # Optional: strongly consistent read
    )
    return response.get('Item')

# READ - Query (multiple items, same partition)
def get_customer_orders(customer_id: str, limit: int = 20):
    response = table.query(
        KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') & 
                              Key('SK').begins_with('ORDER#'),
        ScanIndexForward=False,  # Descending order
        Limit=limit
    )
    return response['Items']

# READ - Query with filter (filter applied AFTER read)
def get_pending_orders(customer_id: str):
    response = table.query(
        KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') & 
                              Key('SK').begins_with('ORDER#'),
        FilterExpression=Attr('status').eq('PENDING')
    )
    return response['Items']

# UPDATE - UpdateItem
def update_order_status(customer_id: str, sk: str, new_status: str):
    response = table.update_item(
        Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
        UpdateExpression='SET #status = :status, updated_at = :updated',
        ExpressionAttributeNames={'#status': 'status'},  # 'status' is reserved
        ExpressionAttributeValues={
            ':status': new_status,
            ':updated': datetime.now().isoformat()
        },
        ConditionExpression='attribute_exists(PK)',  # Ensure item exists
        ReturnValues='ALL_NEW'
    )
    return response['Attributes']

# DELETE - DeleteItem
def delete_order(customer_id: str, sk: str):
    table.delete_item(
        Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
        ConditionExpression='#status <> :completed',
        ExpressionAttributeNames={'#status': 'status'},
        ExpressionAttributeValues={':completed': 'COMPLETED'}
    )

Batch Operations

# BatchWriteItem - Up to 25 items, 16 MB max
def batch_create_items(items: list):
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    # Handles retries for unprocessed items automatically

# BatchGetItem - Up to 100 items, 16 MB max
def batch_get_customers(customer_ids: list):
    keys = [
        {'PK': f'CUSTOMER#{cid}', 'SK': 'PROFILE'}
        for cid in customer_ids
    ]
    
    response = dynamodb.batch_get_item(
        RequestItems={
            'EcommerceTable': {
                'Keys': keys,
                'ProjectionExpression': 'customer_id, #name, email',
                'ExpressionAttributeNames': {'#name': 'name'}
            }
        }
    )
    return response['Responses']['EcommerceTable']
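BatchGetItem can return fewer items than requested: anything it could not read comes back under `UnprocessedKeys`, and requests above 100 keys are rejected. A minimal sketch of chunking plus re-requesting follows. `chunked` and `batch_get_all` are hypothetical helpers; a production version would also add exponential backoff between rounds.

```python
def chunked(seq, size):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def batch_get_all(dynamodb, table_name, keys, max_rounds=5):
    """Fetch keys in chunks of 100, re-requesting any UnprocessedKeys."""
    results = []
    for chunk in chunked(keys, 100):
        request = {table_name: {"Keys": chunk}}
        for _ in range(max_rounds):
            response = dynamodb.batch_get_item(RequestItems=request)
            results.extend(response["Responses"].get(table_name, []))
            # UnprocessedKeys has the same shape as RequestItems.
            request = response.get("UnprocessedKeys") or {}
            if not request:
                break
    return results


print([len(c) for c in chunked(list(range(250)), 100)])  # [100, 100, 50]
```

Here `dynamodb` is expected to be a boto3 DynamoDB resource (or anything exposing a compatible `batch_get_item`).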

Transactions

DynamoDB supports ACID transactions across multiple items and tables.
┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Transactions                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   TransactWriteItems:                                                   │
│   • Up to 100 items per transaction                                    │
│   • All-or-nothing execution                                           │
│   • Supported actions: Put, Update, Delete, ConditionCheck             │
│                                                                         │
│   TransactGetItems:                                                     │
│   • Up to 100 items per transaction                                    │
│   • Serializable isolation                                             │
│                                                                         │
│   Cost: 2x standard reads and writes (prepare + commit)                │
│                                                                         │
│   Example: Transfer funds between accounts                              │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  Transaction:                                                    │  │
│   │  1. Check source balance >= amount (ConditionCheck)             │  │
│   │  2. Deduct from source account (Update)                         │  │
│   │  3. Add to destination account (Update)                         │  │
│   │  4. Create transfer record (Put)                                │  │
│   │                                                                  │  │
│   │  All succeed → Committed                                        │  │
│   │  Any fails → All rolled back                                    │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
import uuid
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.types import TypeSerializer

def transfer_funds(source_id: str, dest_id: str, amount: Decimal):
    """Transfer funds atomically between accounts."""
    
    client = boto3.client('dynamodb')
    amount_value = TypeSerializer().serialize(amount)
    
    response = client.transact_write_items(
        TransactItems=[
            # 1 + 2. Deduct from source only if the balance is sufficient.
            # A transaction cannot include two operations on the same item,
            # so the balance check is a ConditionExpression on the Update
            # rather than a separate ConditionCheck on the same key.
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {
                        'PK': {'S': f'ACCOUNT#{source_id}'},
                        'SK': {'S': 'BALANCE'}
                    },
                    'UpdateExpression': 'SET balance = balance - :amount',
                    'ConditionExpression': 'balance >= :amount',
                    'ExpressionAttributeValues': {':amount': amount_value}
                }
            },
            # 3. Add to destination
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {
                        'PK': {'S': f'ACCOUNT#{dest_id}'},
                        'SK': {'S': 'BALANCE'}
                    },
                    'UpdateExpression': 'SET balance = balance + :amount',
                    'ExpressionAttributeValues': {':amount': amount_value}
                }
            },
            # 4. Record the transfer
            {
                'Put': {
                    'TableName': 'Transfers',
                    'Item': {
                        'PK': {'S': f'TRANSFER#{uuid.uuid4()}'},
                        'SK': {'S': datetime.now().isoformat()},
                        'source': {'S': source_id},
                        'destination': {'S': dest_id},
                        'amount': amount_value,
                        'status': {'S': 'COMPLETED'}
                    }
                }
            }
        ]
    )
    return response
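Because transactional operations consume twice the capacity of their standard counterparts, capacity estimates need a separate calculation for them. The helpers below are a sketch that applies the 2x factor with the usual 1 KB (write) and 4 KB (read) rounding; the function names are hypothetical.

```python
import math


def transactional_wcu(item_size_kb: float, writes_per_second: int) -> int:
    """Transactional writes consume 2 WCU per 1 KB (vs 1 for standard writes)."""
    return 2 * math.ceil(item_size_kb) * writes_per_second


def transactional_rcu(item_size_kb: float, reads_per_second: int) -> int:
    """Transactional reads consume 2 RCU per 4 KB (vs 1 for strongly consistent)."""
    return 2 * math.ceil(item_size_kb / 4) * reads_per_second


print(transactional_wcu(2.5, 50))  # 300
print(transactional_rcu(8, 100))   # 400
```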

DynamoDB Accelerator (DAX)

DAX is an in-memory cache for DynamoDB, providing microsecond response times.
┌────────────────────────────────────────────────────────────────────────┐
│                    DAX (DynamoDB Accelerator)                           │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Application ──► DAX Cluster ──► DynamoDB                             │
│                      │                                                  │
│                      └── In-memory cache                               │
│                          (microsecond latency)                         │
│                                                                         │
│   Cache Types:                                                          │
│   ────────────                                                          │
│   • Item Cache: Individual GetItem results (5-min default TTL)         │
│   • Query Cache: Query/Scan results (5-min default TTL)                │
│                                                                         │
│   Performance:                                                          │
│   • DynamoDB: 1-10 milliseconds                                        │
│   • DAX: ~400 microseconds (up to 10x faster)                          │
│                                                                         │
│   When to Use:                                                          │
│   ✓ Read-heavy workloads                                               │
│   ✓ Same items read repeatedly                                          │
│   ✓ Microsecond response time required                                 │
│   ✗ Write-heavy workloads (no benefit)                                 │
│   ✗ Strongly consistent reads required                                 │
│                                                                         │
│   Pricing:                                                              │
│   • dax.r5.large: ~$0.269/hour (~$194/month)                           │
│   • Minimum 3 nodes for production (multi-AZ)                          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
# Using DAX (drop-in replacement for DynamoDB)
import amazondax
import boto3

# Create DAX client (same API as boto3 DynamoDB resource)
dax_endpoint = 'my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
dax = amazondax.AmazonDaxClient.resource(endpoint_url=dax_endpoint)
table = dax.Table('EcommerceTable')

# Use exactly like DynamoDB
response = table.get_item(
    Key={'PK': 'CUSTOMER#C001', 'SK': 'PROFILE'}
)
# First call: ~1ms (cache miss, hits DynamoDB)
# Subsequent calls: ~0.4ms (cache hit)
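The miss-then-hit behaviour in the comments above is the standard read-through cache pattern. The class below is a tiny illustration of that idea with a TTL, not DAX's actual implementation; `ReadThroughCache` and its `loader` callback are hypothetical names.

```python
import time


class ReadThroughCache:
    """Tiny TTL cache illustrating the item-cache idea (not DAX's implementation)."""

    def __init__(self, loader, ttl_seconds=300.0, clock=time.monotonic):
        self._loader = loader      # called on a miss, e.g. a GetItem wrapper
        self._ttl = ttl_seconds    # 300 s mirrors the 5-minute default TTL
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired: load from the backing store and cache the result.
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value


cache = ReadThroughCache(loader=lambda key: {"PK": key, "SK": "PROFILE"})
cache.get("CUSTOMER#C001")  # miss: goes to the loader
cache.get("CUSTOMER#C001")  # hit: served from memory
print(cache.hits, cache.misses)  # 1 1
```

A cached value can be up to one TTL stale, which is why a cache like this suits eventually consistent reads rather than strongly consistent ones.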

DynamoDB Streams

Capture item-level changes for event-driven architectures.
┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Streams                                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   DynamoDB Table ──► Stream ──► Lambda / Kinesis / Application         │
│                                                                         │
│   Stream View Types:                                                    │
│   ──────────────────                                                    │
│   • KEYS_ONLY: Only partition and sort key                             │
│   • NEW_IMAGE: Entire item after modification                          │
│   • OLD_IMAGE: Entire item before modification                         │
│   • NEW_AND_OLD_IMAGES: Both before and after                          │
│                                                                         │
│   Use Cases:                                                            │
│   ────────────                                                          │
│   • Real-time analytics                                                │
│   • Cross-region replication                                           │
│   • Materialized views                                                 │
│   • Search index synchronization (OpenSearch)                          │
│   • Audit logging                                                      │
│   • Event-driven workflows                                             │
│                                                                         │
│   Retention: 24 hours                                                  │
│   Ordering: Guaranteed per item, not globally                          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
# Lambda function processing a DynamoDB Stream
def lambda_handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']  # INSERT, MODIFY, or REMOVE
        data = record['dynamodb']
        # Images use DynamoDB's typed JSON (e.g. {'S': 'abc'}). Which images
        # are present depends on the stream view type, so read them with .get()
        old_item = data.get('OldImage')
        new_item = data.get('NewImage')

        if event_name == 'INSERT':
            # Process new item
            print(f"New item: {new_item}")

        elif event_name == 'MODIFY':
            # Compare and process changes
            print(f"Modified: {old_item} -> {new_item}")

        elif event_name == 'REMOVE':
            # Handle deletion
            print(f"Deleted: {old_item}")

    return {'statusCode': 200}
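
The images in a stream record arrive in DynamoDB's typed JSON format, where each attribute is wrapped in a type tag such as `{'S': 'abc'}` or `{'N': '42'}`. The sketch below shows one way to turn an image into plain Python values. The helper names `unmarshal` and `unmarshal_image` are hypothetical, and only a subset of type tags is handled; in a real function, boto3's `TypeDeserializer` (in `boto3.dynamodb.types`) is the more complete option.

```python
from decimal import Decimal


def unmarshal(value):
    """Convert one DynamoDB-typed value, e.g. {'S': 'abc'}, to plain Python."""
    (type_tag, raw), = value.items()  # each typed value has exactly one key
    if type_tag == 'S':
        return raw
    if type_tag == 'N':
        return Decimal(raw)  # numbers are transmitted as strings
    if type_tag == 'BOOL':
        return raw
    if type_tag == 'NULL':
        return None
    if type_tag == 'L':
        return [unmarshal(v) for v in raw]
    if type_tag == 'M':
        return {k: unmarshal(v) for k, v in raw.items()}
    if type_tag == 'SS':
        return set(raw)
    if type_tag == 'NS':
        return {Decimal(n) for n in raw}
    # Binary types (B, BS) are left out of this sketch
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def unmarshal_image(image):
    """Convert a whole NewImage/OldImage map; returns None if it is absent."""
    if image is None:
        return None
    return {name: unmarshal(value) for name, value in image.items()}
```

Inside the handler, `unmarshal_image(record['dynamodb'].get('NewImage'))` would then yield an ordinary dictionary that is easier to compare or forward to other services.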

Best Practices

Performance Optimization

Even Key Distribution

Use high-cardinality partition keys to avoid hot partitions

Sparse Indexes

Only include items with indexed attributes in GSIs

Project Carefully

Only project the attributes you need into GSIs (reduces WCU consumption and storage)

Use BatchGetItem

Batch reads instead of multiple GetItem calls
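
As a rough illustration of batching reads, the sketch below splits a list of keys into groups of at most 100 (the per-request limit of BatchGetItem) and resends anything DynamoDB reports back under `UnprocessedKeys`. The function name `batch_get_all` is hypothetical, and `client` is assumed to be a boto3 DynamoDB client or any object exposing the same `batch_get_item` call.

```python
def batch_get_all(client, table_name, keys, chunk_size=100):
    """Fetch many items by key, sending at most `chunk_size` keys per request."""
    items = []
    for start in range(0, len(keys), chunk_size):
        request = {table_name: {'Keys': keys[start:start + chunk_size]}}
        # DynamoDB may serve only part of a batch; resend whatever it
        # reports under UnprocessedKeys until nothing is left.
        while request:
            response = client.batch_get_item(RequestItems=request)
            items.extend(response.get('Responses', {}).get(table_name, []))
            request = response.get('UnprocessedKeys') or None
    return items
```

This sketch retries immediately and without a cap. Production code would normally add exponential backoff and a retry limit between attempts.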

Cost Optimization

cost_tips = {
    "capacity_mode": [
        "Use On-Demand for unpredictable workloads",
        "Use Provisioned with Auto Scaling for steady traffic",
        "Use Reserved Capacity for up to 77% savings on provisioned capacity",
    ],
    "data_modeling": [
        "Use single-table design to reduce table count",
        "Project only needed attributes to GSIs",
        "Use sparse GSIs (items missing the index key attributes aren't indexed)",
    ],
    "operations": [
        "Use eventually consistent reads (half the cost of strongly consistent reads)",
        "Batch operations to reduce request overhead",
        "Enable TTL for auto-expiring data (TTL deletions consume no write capacity)",
    ],
    "storage": [
        "Compress large attributes before storing",
        "Use S3 for objects > 400 KB (the item size limit), store reference in DynamoDB",
        "Delete unused GSIs",
    ],
}
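
To make the read and write cost differences concrete, here is a small sketch of the capacity unit arithmetic: reads are metered in 4 KB blocks and writes in 1 KB blocks, an eventually consistent read costs half a strongly consistent one, and transactional operations cost double. The function names are hypothetical, and the sketch works in whole KB blocks of 1,024 bytes.

```python
import math


def read_capacity_units(item_size_bytes, consistency="eventual"):
    """RCUs consumed by reading one item of the given size."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # reads are metered per 4 KB
    multiplier = {"eventual": 0.5, "strong": 1, "transactional": 2}[consistency]
    return blocks * multiplier


def write_capacity_units(item_size_bytes, transactional=False):
    """WCUs consumed by writing one item of the given size."""
    blocks = max(1, math.ceil(item_size_bytes / 1024))  # writes are metered per 1 KB
    return blocks * (2 if transactional else 1)
```

For example, a 6,000-byte item spans two 4 KB blocks, so a strongly consistent read consumes 2 RCUs while an eventually consistent read of the same item consumes 1.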

🎯 Interview Questions

Choose DynamoDB when:
  • Predictable, single-digit millisecond latency at any scale
  • Simple access patterns (key-value or document)
  • Massive scale requirements (millions of requests/second)
  • Serverless architecture
  • Global distribution needed (Global Tables)
Choose RDS when:
  • Complex queries with JOINs
  • Strong ACID requirements across tables
  • Existing SQL skills/codebase
  • Complex reporting needs
Hot partition prevention strategies:
  1. Use high-cardinality partition keys
  2. Add random suffix (write sharding)
  3. Use composite keys to distribute writes
Example - Order ID with random suffix:
# Instead of: ORDER#12345
# Use: ORDER#12345#7 (random 0-9)
When reading, query all shards in parallel and merge results.
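
A minimal sketch of this write-sharding pattern follows, assuming ten shards to match the 0-9 suffix. The helper names and the `query_one` callable (which stands in for a per-key DynamoDB query returning a list of items) are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 10  # assumed shard count, matching the 0-9 suffix


def sharded_key(base_key, num_shards=NUM_SHARDS):
    """Key to use when writing: the base key plus a random shard suffix."""
    return f"{base_key}#{random.randrange(num_shards)}"


def all_shard_keys(base_key, num_shards=NUM_SHARDS):
    """Every key a reader has to check for this base key."""
    return [f"{base_key}#{shard}" for shard in range(num_shards)]


def query_all_shards(query_one, base_key, num_shards=NUM_SHARDS):
    """Run query_one(key) for each shard in parallel and merge the items."""
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        pages = pool.map(query_one, all_shard_keys(base_key, num_shards))
        return [item for page in pages for item in page]
```

The trade-off is that every read fans out into `num_shards` queries, so the shard count should stay as small as the write volume allows.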
GSI:
  • Flexible (any partition/sort key)
  • Own capacity (no throttling impact on base table)
  • Eventually consistent only
  • Can be added/removed anytime
LSI:
  • Must share partition key with base table
  • Shares capacity (can throttle base table)
  • Strongly consistent available
  • Must be defined at table creation
  • 10 GB limit per item collection (all items sharing one partition key value)
Recommendation: Prefer GSIs unless you need strongly consistent reads on an alternate sort key.
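from boto3.dynamodb.conditions import Key


def paginated_query(table, pk_value, page_size=20, last_key=None):
    """Return one page of items for a partition key, plus the key for the next page."""
    params = {
        'KeyConditionExpression': Key('PK').eq(pk_value),
        'Limit': page_size
    }

    # Resume from where the previous page stopped
    if last_key:
        params['ExclusiveStartKey'] = last_key

    response = table.query(**params)

    return {
        'items': response['Items'],
        'last_key': response.get('LastEvaluatedKey')  # None if no more pages
    }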
Key points:
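  • Use Limit for page size (it caps the number of items evaluated per request)
  • Pass the previous response's LastEvaluatedKey as ExclusiveStartKey to continue
  • A LastEvaluatedKey in the response means more items may remain; its absence means the query is complete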
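
When a caller needs every matching item rather than a single page, the same continuation logic can be wrapped in a generator. This is a sketch under the assumption that `table` behaves like a boto3 `Table` resource (its `query` method returns `Items` and, optionally, `LastEvaluatedKey`); the name `iter_query_items` is hypothetical.

```python
def iter_query_items(table, **query_kwargs):
    """Yield every item matching a query, following LastEvaluatedKey."""
    params = dict(query_kwargs)  # copy so the caller's arguments are untouched
    while True:
        response = table.query(**params)
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break  # no continuation key: the query is complete
        params['ExclusiveStartKey'] = last_key
```

Because it is a generator, a caller can stop iterating early and no further pages are requested.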
Access patterns:
  • Get user profile
  • Get user’s posts
  • Get user’s followers
  • Get user’s following
  • Get feed (posts from following)
Single-table design:
PK                  | SK                  | Data
--------------------|---------------------|------------------
USER#alice          | PROFILE             | {name, bio, ...}
USER#alice          | POST#2024-01-15#001 | {content, likes}
USER#alice          | FOLLOWER#bob        | {followed_at}
USER#alice          | FOLLOWING#charlie   | {followed_at}

GSI1:
GSI1PK              | GSI1SK              | For
--------------------|---------------------|------------------
FOLLOWING#charlie   | 2024-01-15#POST#001 | Feed aggregation
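
The base-table key layout above can be captured in a few small key builders, which keeps the `PK`/`SK` string formats in one place. This is only a sketch: the function names are hypothetical, and the three-digit post sequence is inferred from the `POST#2024-01-15#001` example row.

```python
def user_pk(username):
    return f"USER#{username}"


def profile_key(username):
    return {"PK": user_pk(username), "SK": "PROFILE"}


def post_key(username, date, seq):
    # date is an ISO string (YYYY-MM-DD) so posts sort chronologically
    return {"PK": user_pk(username), "SK": f"POST#{date}#{seq:03d}"}


def follower_key(username, follower):
    return {"PK": user_pk(username), "SK": f"FOLLOWER#{follower}"}


def following_key(username, followee):
    return {"PK": user_pk(username), "SK": f"FOLLOWING#{followee}"}


# Sort key prefix to use for each "get user's ..." access pattern
SK_PREFIXES = {"posts": "POST#", "followers": "FOLLOWER#", "following": "FOLLOWING#"}
```

With boto3, a pattern such as "get user's posts" would then be a query on `PK = user_pk(name)` combined with `Key('SK').begins_with(SK_PREFIXES['posts'])`.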

🧪 Hands-On Lab

  1. Create DynamoDB Table
     Create a table with a composite primary key and enable Streams
  2. Implement Single-Table Design
     Model users, orders, and order items in a single table
  3. Create GSI
     Add a GSI for querying orders by status
  4. Implement Transactions
     Build an atomic order placement with an inventory check
  5. Process Streams
     Create a Lambda function to process DynamoDB Streams for real-time updates
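
As a starting point for the transaction step, the sketch below builds the `TransactItems` list for an order placement that succeeds only if enough stock remains. The table layout (`PRODUCT#...`/`INVENTORY` and `CUSTOMER#...`/`ORDER#...` keys, a numeric `stock` attribute) and the function name are assumptions made for illustration, not part of the lab's reference solution.

```python
def build_order_transaction(table_name, order_id, customer_id, product_id, quantity):
    """Return TransactItems for an all-or-nothing order placement."""
    return [
        {
            # Decrement stock, but only if enough units remain
            'Update': {
                'TableName': table_name,
                'Key': {
                    'PK': {'S': f'PRODUCT#{product_id}'},
                    'SK': {'S': 'INVENTORY'},
                },
                'UpdateExpression': 'SET stock = stock - :qty',
                'ConditionExpression': 'stock >= :qty',
                'ExpressionAttributeValues': {':qty': {'N': str(quantity)}},
            }
        },
        {
            # Create the order unless an item with this key already exists
            'Put': {
                'TableName': table_name,
                'Item': {
                    'PK': {'S': f'CUSTOMER#{customer_id}'},
                    'SK': {'S': f'ORDER#{order_id}'},
                    'product_id': {'S': product_id},
                    'quantity': {'N': str(quantity)},
                },
                'ConditionExpression': 'attribute_not_exists(PK)',
            }
        },
    ]
```

The list would be passed to a boto3 DynamoDB client as `client.transact_write_items(TransactItems=...)`. If either condition fails, the whole transaction is cancelled and neither write is applied.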

Next Module

AWS Lambda

Master serverless compute with AWS Lambda