# Amazon DynamoDB

> Master DynamoDB data modeling, capacity modes, GSIs, LSIs, transactions, and performance optimization

<Frame>
  <img src="https://mintcdn.com/devweeekends/sTu6A4whRFPJo0_g/images/aws/dynamodb-architecture.svg?fit=max&auto=format&n=sTu6A4whRFPJo0_g&q=85&s=c4af0ddc9df5a6e2d04545da5db32cc0" alt="DynamoDB Architecture" width="1080" height="1080" data-path="images/aws/dynamodb-architecture.svg" />
</Frame>

## Module Overview

<Info>
  **Estimated Time**: 4-5 hours | **Difficulty**: Intermediate-Advanced | **Prerequisites**: Core Concepts
</Info>

DynamoDB is AWS's fully managed NoSQL database, designed for single-digit millisecond performance at any scale. The fundamental trade-off: DynamoDB gives you predictable, fast performance and virtually unlimited scale, but in exchange you must design your data model around your access patterns upfront. Unlike a relational database, where you can write an ad hoc SQL query against any column, DynamoDB requires you to know how you will query your data before you create the table. This is the single biggest mental shift for developers coming from PostgreSQL or MySQL, and the source of most DynamoDB frustration. This module covers everything from data modeling to advanced patterns used in production systems.

**What You'll Learn:**

* DynamoDB fundamentals and architecture
* Data modeling and access patterns
* Primary keys, GSIs, and LSIs
* Capacity modes (On-Demand vs Provisioned)
* Transactions and consistency models
* DynamoDB Accelerator (DAX)
* Streams and change data capture
* Performance optimization and cost management

***

## Why DynamoDB?

<CardGroup cols={2}>
  <Card title="Fully Managed" icon="server">
    No servers to manage, automatic scaling, built-in backup and restore
  </Card>

  <Card title="Single-Digit Milliseconds" icon="bolt">
    Consistent performance at any scale, from 1 to millions of requests/second
  </Card>

  <Card title="Serverless" icon="cloud">
    Pay-per-request pricing, no idle capacity charges with On-Demand mode
  </Card>

  <Card title="Global Tables" icon="globe">
    Multi-region, active-active replication for global applications
  </Card>
</CardGroup>

***

## DynamoDB Architecture

```
┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Architecture                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                        DynamoDB Table                            │  │
│   │                                                                  │  │
│   │   Table: Orders                                                  │  │
│   │   ─────────────────────────────────────────────────────────────  │  │
│   │                                                                  │  │
│   │   Partition Key (PK): customer_id                               │  │
│   │   Sort Key (SK): order_date#order_id                            │  │
│   │                                                                  │  │
│   │   ┌──────────────────────────────────────────────────────────┐  │  │
│   │   │  Partition 1 (customer_id = "C001")                      │  │  │
│   │   │  ├── 2024-01-15#ORD001 → {amount: 99.99, status: "done"}│  │  │
│   │   │  ├── 2024-01-20#ORD002 → {amount: 149.99, status: "new"}│  │  │
│   │   │  └── 2024-02-01#ORD003 → {amount: 29.99, status: "done"}│  │  │
│   │   └──────────────────────────────────────────────────────────┘  │  │
│   │                                                                  │  │
│   │   ┌──────────────────────────────────────────────────────────┐  │  │
│   │   │  Partition 2 (customer_id = "C002")                      │  │  │
│   │   │  ├── 2024-01-10#ORD004 → {amount: 59.99, status: "done"}│  │  │
│   │   │  └── 2024-01-25#ORD005 → {amount: 199.99, status: "new"}│  │  │
│   │   └──────────────────────────────────────────────────────────┘  │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Data Distribution:                                                    │
│   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐                   │
│   │ Partition 1 │  │ Partition 2 │  │ Partition N │                   │
│   │ (10 GB max) │  │ (10 GB max) │  │ (10 GB max) │                   │
│   │   3 AZs     │  │   3 AZs     │  │   3 AZs     │                   │
│   └─────────────┘  └─────────────┘  └─────────────┘                   │
│         │                │                │                            │
│         └────────────────┴────────────────┘                            │
│                          │                                              │
│                  Automatic Replication                                  │
│                  (3 copies, multi-AZ)                                   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
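The diagram groups items by partition key value. The sketch below shows the general idea of hash-based placement. The MD5 hash, the fixed `NUM_PARTITIONS`, and the `partition_for` helper are all stand-ins chosen for illustration; DynamoDB's real hash function and partition management are internal and not exposed.

```python
import hashlib

NUM_PARTITIONS = 3  # hypothetical fixed count; DynamoDB adds and splits partitions itself


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition number via a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


orders = [
    ("C001", "2024-01-15#ORD001"),
    ("C001", "2024-01-20#ORD002"),
    ("C002", "2024-01-10#ORD004"),
]

# Group items by partition: every item with the same customer_id lands together.
placement: dict[int, list[tuple[str, str]]] = {}
for customer_id, sort_key in orders:
    placement.setdefault(partition_for(customer_id), []).append((customer_id, sort_key))

for partition, items in sorted(placement.items()):
    print(partition, items)
```

Because placement depends only on the partition key, a key that receives most of the traffic concentrates load on one partition, which is why key choice matters so much later in this module.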

***

## Core Concepts

### Primary Keys

DynamoDB supports two types of primary keys:

```
┌────────────────────────────────────────────────────────────────────────┐
│                      Primary Key Types                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   1. PARTITION KEY (Simple Primary Key)                                │
│   ─────────────────────────────────────                                │
│   • Single attribute                                                    │
│   • Must be unique across all items                                    │
│   • Used to determine physical partition                               │
│                                                                         │
│   Example: user_id (unique per user)                                   │
│   ┌───────────┬────────────────────────────┐                           │
│   │ user_id   │ data                       │                           │
│   ├───────────┼────────────────────────────┤                           │
│   │ U001      │ {name: "Alice", age: 30}  │                           │
│   │ U002      │ {name: "Bob", age: 25}    │                           │
│   └───────────┴────────────────────────────┘                           │
│                                                                         │
│   2. COMPOSITE PRIMARY KEY (Partition + Sort Key)                      │
│   ─────────────────────────────────────────────                        │
│   • Two attributes: partition key + sort key                           │
│   • Partition key doesn't need to be unique                            │
│   • Combination must be unique                                         │
│   • Enables range queries on sort key                                  │
│                                                                         │
│   Example: customer_id (PK) + order_date (SK)                          │
│   ┌─────────────┬─────────────┬──────────────────────┐                 │
│   │ customer_id │ order_date  │ data                 │                 │
│   ├─────────────┼─────────────┼──────────────────────┤                 │
│   │ C001        │ 2024-01-15  │ {amount: 99.99}     │                 │
│   │ C001        │ 2024-01-20  │ {amount: 149.99}    │                 │
│   │ C002        │ 2024-01-10  │ {amount: 59.99}     │                 │
│   └─────────────┴─────────────┴──────────────────────┘                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```

### Data Types

```python theme={null}
# DynamoDB Attribute Types
data_types = {
    # Scalar Types
    "S": "String",              # "Hello World"
    "N": "Number",              # "123.45" (sent as string)
    "B": "Binary",              # base64-encoded binary
    "BOOL": "Boolean",          # true/false
    "NULL": "Null",             # null
    
    # Document Types
    "M": "Map",                 # {"key": {"S": "value"}}
    "L": "List",                # [{"S": "a"}, {"N": "1"}]
    
    # Set Types (unique elements, same type)
    "SS": "String Set",         # ["a", "b", "c"]
    "NS": "Number Set",         # ["1", "2", "3"]
    "BS": "Binary Set",         # [binary1, binary2]
}

# Example Item
order_item = {
    "PK": {"S": "CUSTOMER#C001"},
    "SK": {"S": "ORDER#2024-01-15#ORD001"},
    "order_id": {"S": "ORD001"},
    "customer_id": {"S": "C001"},
    "amount": {"N": "99.99"},
    "items": {"L": [
        {"M": {"product": {"S": "Widget"}, "qty": {"N": "2"}}},
        {"M": {"product": {"S": "Gadget"}, "qty": {"N": "1"}}}
    ]},
    "status": {"S": "COMPLETED"},
    "tags": {"SS": ["express", "gift-wrapped"]},
    "created_at": {"S": "2024-01-15T10:30:00Z"}
}
```
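The typed format above is what the low-level API expects. As a rough illustration of how native Python values map onto it, here is a simplified encoder. `to_attribute_value` is a hypothetical helper that covers only a subset of types; in real code boto3's `TypeSerializer` does this job.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Encode a Python value in DynamoDB's typed attribute-value format (subset only)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}  # numbers travel as strings to preserve precision
    if isinstance(value, float):
        raise TypeError("use Decimal instead of float to avoid rounding surprises")
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, (set, frozenset)) and value and all(isinstance(v, str) for v in value):
        return {"SS": sorted(value)}  # sorted only to make the output deterministic
    raise TypeError(f"unsupported type: {type(value).__name__}")


item = {
    "PK": "CUSTOMER#C001",
    "amount": Decimal("99.99"),
    "items": [{"product": "Widget", "qty": 2}],
    "tags": {"express", "gift-wrapped"},
}
encoded = {name: to_attribute_value(value) for name, value in item.items()}
print(encoded)
```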

***

## Data Modeling Patterns

### Single-Table Design

<Warning>
  **Best Practice**: Use single-table design for related entities. This enables fetching all related data in a single query, reducing latency and cost. However, single-table design is NOT always the right answer. It adds complexity that may not be justified for simple CRUD applications. Use it when: (1) you need to fetch related entities in a single query, (2) you have well-defined access patterns, and (3) your team understands the pattern. For a simple user-profile service with one access pattern, a regular table with a partition key is perfectly fine.
</Warning>

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Single-Table Design Example                          │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   E-Commerce: Customers, Orders, Order Items in ONE table              │
│                                                                         │
│   ┌─────────────────────┬─────────────────────────┬─────────────────┐  │
│   │ PK                  │ SK                      │ Attributes      │  │
│   ├─────────────────────┼─────────────────────────┼─────────────────┤  │
│   │ CUSTOMER#C001       │ PROFILE                 │ name, email...  │  │
│   │ CUSTOMER#C001       │ ORDER#2024-01-15#O001   │ total, status   │  │
│   │ CUSTOMER#C001       │ ORDER#2024-01-20#O002   │ total, status   │  │
│   │ ORDER#O001          │ ITEM#1                  │ product, qty    │  │
│   │ ORDER#O001          │ ITEM#2                  │ product, qty    │  │
│   │ ORDER#O002          │ ITEM#1                  │ product, qty    │  │
│   │ PRODUCT#P001        │ METADATA                │ name, price     │  │
│   │ PRODUCT#P001        │ INVENTORY               │ stock, location │  │
│   └─────────────────────┴─────────────────────────┴─────────────────┘  │
│                                                                         │
│   Access Patterns Enabled:                                              │
│   • Get customer profile: PK = "CUSTOMER#C001", SK = "PROFILE"         │
│   • Get all customer orders: PK = "CUSTOMER#C001", SK begins "ORDER#"  │
│   • Get order items: PK = "ORDER#O001", SK begins "ITEM#"              │
│   • Get product info: PK = "PRODUCT#P001"                              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
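To make the access patterns above concrete, here is an in-memory stand-in for the table. The `query` function is a hypothetical helper that imitates a Query with an optional `begins_with` condition on the sort key; it is a sketch of the behavior, not the DynamoDB API.

```python
ITEMS = [
    {"PK": "CUSTOMER#C001", "SK": "PROFILE", "name": "Alice"},
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-15#O001", "total": "99.99"},
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-20#O002", "total": "149.99"},
    {"PK": "ORDER#O001", "SK": "ITEM#1", "product": "Widget"},
    {"PK": "ORDER#O001", "SK": "ITEM#2", "product": "Gadget"},
]


def query(pk: str, sk_prefix: str = "") -> list[dict]:
    """Return items in one partition whose sort key starts with sk_prefix, sorted by SK."""
    matches = [i for i in ITEMS if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


profile = query("CUSTOMER#C001", "PROFILE")
orders = query("CUSTOMER#C001", "ORDER#")
everything = query("CUSTOMER#C001")  # profile and orders come back from one request
order_items = query("ORDER#O001", "ITEM#")

print([i["SK"] for i in everything])
```

The third call is the payoff of the design: one request against one partition returns the customer profile together with all of that customer's orders.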

### Access Pattern First Design

```python theme={null}
# Step 1: Define Access Patterns
access_patterns = [
    "Get customer by ID",
    "Get all orders for a customer",
    "Get order details with items",
    "Get orders by status (e.g., 'pending')",
    "Get orders in date range",
    "Get product inventory",
]

# Step 2: Design Keys Based on Patterns
table_design = {
    "table_name": "EcommerceTable",
    "primary_key": {
        "PK": "Partition Key (entity type + ID)",
        "SK": "Sort Key (relationship + details)"
    },
    "gsi1": {
        "GSI1PK": "For alternate access patterns",
        "GSI1SK": "Enable range queries"
    }
}

# Step 3: Define Key Patterns
key_patterns = {
    "Customer": {
        "PK": "CUSTOMER#<customer_id>",
        "SK": "PROFILE"
    },
    "Order": {
        "PK": "CUSTOMER#<customer_id>",
        "SK": "ORDER#<date>#<order_id>",
        "GSI1PK": "STATUS#<status>",
        "GSI1SK": "<date>#<order_id>"
    },
    "OrderItem": {
        "PK": "ORDER#<order_id>",
        "SK": "ITEM#<item_number>"
    }
}
```
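One way to keep these key patterns consistent across a codebase is to generate them in a single place. The helper names below (`customer_key`, `order_keys`, `order_item_key`) are hypothetical; only the string patterns come from the design above.

```python
from datetime import date


def customer_key(customer_id: str) -> dict:
    """Primary key of a customer profile item."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE"}


def order_keys(customer_id: str, order_id: str, order_date: date, status: str) -> dict:
    """Base-table and GSI1 key attributes for an Order item."""
    day = order_date.isoformat()  # ISO dates sort chronologically as plain strings
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"ORDER#{day}#{order_id}",
        "GSI1PK": f"STATUS#{status}",
        "GSI1SK": f"{day}#{order_id}",
    }


def order_item_key(order_id: str, item_number: int) -> dict:
    """Primary key of one line item within an order."""
    return {"PK": f"ORDER#{order_id}", "SK": f"ITEM#{item_number}"}


keys = order_keys("C001", "O001", date(2024, 1, 15), "pending")
print(keys)
```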

***

## Secondary Indexes

### Global Secondary Index (GSI)

<Frame>
  <img src="https://mintlify.s3.us-west-1.amazonaws.com/devweeekends/images/aws/dynamodb-gsi.svg" alt="DynamoDB GSI" />
</Frame>

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Global Secondary Index (GSI)                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   • Different partition key and/or sort key from base table            │
│   • Separate throughput capacity (own RCU/WCU)                         │
│   • Can be created/deleted anytime                                      │
│   • Eventually consistent reads only                                    │
│   • Maximum 20 GSIs per table                                          │
│                                                                         │
│   BASE TABLE                           GSI: StatusDateIndex            │
│   ─────────────────────────────────    ──────────────────────────────  │
│   PK: CUSTOMER#C001                    GSI-PK: STATUS#pending          │
│   SK: ORDER#2024-01-15#O001            GSI-SK: 2024-01-15#O001         │
│   status: pending                      customer_id: C001               │
│   ─────────────────────────────────    ──────────────────────────────  │
│                                                                         │
│   Use Case: "Find all pending orders sorted by date"                   │
│   Query: GSI-PK = "STATUS#pending", GSI-SK > "2024-01-01"              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
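Conceptually, a GSI is a second copy of the data that is regrouped and re-sorted by different attributes. The sketch below rebuilds that view in memory. `build_index` and `query_index` are hypothetical helpers; DynamoDB maintains the real index asynchronously, which is why GSI reads are eventually consistent.

```python
from collections import defaultdict

base_table = [
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-15#O001",
     "GSI1PK": "STATUS#pending", "GSI1SK": "2024-01-15#O001"},
    {"PK": "CUSTOMER#C002", "SK": "ORDER#2024-01-10#O004",
     "GSI1PK": "STATUS#pending", "GSI1SK": "2024-01-10#O004"},
    {"PK": "CUSTOMER#C001", "SK": "ORDER#2024-01-20#O002",
     "GSI1PK": "STATUS#done", "GSI1SK": "2024-01-20#O002"},
    {"PK": "CUSTOMER#C001", "SK": "PROFILE"},  # no index key attributes: absent from the index
]


def build_index(items, pk_attr="GSI1PK", sk_attr="GSI1SK"):
    """Group items by the index partition key and sort each group by the index sort key."""
    index = defaultdict(list)
    for item in items:
        if pk_attr in item and sk_attr in item:
            index[item[pk_attr]].append(item)
    for group in index.values():
        group.sort(key=lambda i: i[sk_attr])
    return index


def query_index(index, pk, sk_greater_than="", sk_attr="GSI1SK"):
    """Imitate a Query on the index: one partition key, sort key above a lower bound."""
    return [i for i in index.get(pk, []) if i[sk_attr] > sk_greater_than]


status_index = build_index(base_table)
pending = query_index(status_index, "STATUS#pending", "2024-01-01")
print([i["GSI1SK"] for i in pending])
```

Note that the profile item never appears in the index because it lacks the index key attributes. Only items that carry those attributes are copied, which keeps the index smaller than the base table.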

### Local Secondary Index (LSI)

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Local Secondary Index (LSI)                          │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   • Same partition key, different sort key                             │
│   • Shares throughput with base table                                  │
│   • Must be created at table creation time                             │
│   • Strongly consistent reads available                                │
│   • Maximum 5 LSIs per table                                           │
│   • 10 GB limit per partition key value (base items + LSIs)            │
│                                                                         │
│   BASE TABLE                           LSI: AmountIndex                │
│   ─────────────────────────────────    ──────────────────────────────  │
│   PK: CUSTOMER#C001                    PK: CUSTOMER#C001 (same)        │
│   SK: ORDER#2024-01-15#O001            LSI-SK: 99.99 (amount)          │
│   amount: 99.99                        order_id: O001                  │
│   ─────────────────────────────────    ──────────────────────────────  │
│                                                                         │
│   Use Case: "Get customer's highest-value orders"                      │
│   Query: PK = "CUSTOMER#C001", ordered by amount (descending)          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```

### GSI vs LSI Comparison

| Feature           | GSI                        | LSI                           |
| ----------------- | -------------------------- | ----------------------------- |
| **Partition Key** | Can differ from base table | Same as base table            |
| **Sort Key**      | Can differ from base table | Different from base table     |
| **Capacity**      | Separate RCU/WCU           | Shared with base table        |
| **Creation**      | Anytime                    | Table creation only           |
| **Consistency**   | Eventually consistent only | Strong or eventual            |
| **Limit**         | 20 per table               | 5 per table                   |
| **Size Limit**    | None                       | 10 GB per partition key value |

***

## Capacity Modes

### On-Demand vs Provisioned

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Capacity Mode Comparison                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ON-DEMAND MODE                        PROVISIONED MODE               │
│   ────────────────                      ────────────────               │
│                                                                         │
│   ✓ Pay per request                     ✓ Pay per capacity unit/hour   │
│   ✓ Auto-scales instantly               ✓ Reserve capacity (cheaper)   │
│   ✓ No capacity planning                ✓ Predictable costs            │
│   ✓ No throttling (mostly)              ✓ Auto Scaling available       │
│                                                                         │
│   Best For:                             Best For:                      │
│   • Unpredictable traffic               • Predictable, steady traffic  │
│   • New applications                    • Cost optimization            │
│   • Spiky workloads                     • High-volume applications     │
│   • Dev/test environments               • Reserved capacity discount   │
│                                                                         │
│   Pricing (us-east-1):                  Pricing (us-east-1):           │
│   • $1.25 per million WRU               • $0.00065 per WCU/hour        │
│   • $0.25 per million RRU               • $0.00013 per RCU/hour        │
│                                                                         │
│   Cost Example (1M writes/day):                                        │
│   On-Demand: $1.25/day = $37.50/month                                  │
│   Provisioned: ~12 WCU = $5.62/month (~85% savings, steady traffic)    │
│                                                                         │
│   Common mistake: starting with provisioned mode for a new service.    │
│   Always start with on-demand to learn your actual traffic patterns,   │
│   then switch to provisioned after 2-4 weeks when you have data.       │
│   Switching between modes is free (once every 24 hours).               │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
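The cost example can be reproduced with a few lines of arithmetic. This sketch uses the prices quoted in the comparison above and assumes writes of at most 1 KB spread perfectly evenly over a 30-day (720-hour) month; the function names are illustrative, and current pricing should be checked before relying on the numbers.

```python
import math

# Prices copied from the comparison above (us-east-1); verify against current pricing.
ON_DEMAND_PER_MILLION_WRITES = 1.25
PROVISIONED_PER_WCU_HOUR = 0.00065


def on_demand_monthly(writes_per_day: int, days: int = 30) -> float:
    """Monthly on-demand write cost in USD."""
    return writes_per_day * days / 1_000_000 * ON_DEMAND_PER_MILLION_WRITES


def provisioned_monthly(writes_per_day: int, hours: int = 720) -> float:
    """Monthly provisioned write cost in USD, assuming perfectly even traffic."""
    wcu = math.ceil(writes_per_day / 86_400)  # writes per second, rounded up
    return wcu * PROVISIONED_PER_WCU_HOUR * hours


od = on_demand_monthly(1_000_000)
prov = provisioned_monthly(1_000_000)
print(f"on-demand ${od:.2f}, provisioned ${prov:.2f}, savings {1 - prov / od:.0%}")
```

Real traffic is rarely flat, so a provisioned table usually needs headroom above the average rate, and that headroom narrows the gap.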

### Capacity Units Explained

```python theme={null}
import math

# Read Capacity Units (RCU)
# 1 RCU = 1 strongly consistent read per second for item up to 4 KB
# 1 RCU = 2 eventually consistent reads per second for item up to 4 KB

def calculate_rcu(item_size_kb: float, reads_per_second: int, 
                  consistent: bool = False) -> int:
    """Calculate required RCUs."""
    # Round up to nearest 4 KB
    size_units = math.ceil(item_size_kb / 4)
    
    if consistent:
        return size_units * reads_per_second
    else:  # Eventually consistent
        return math.ceil(size_units * reads_per_second / 2)

# Write Capacity Units (WCU)
# 1 WCU = 1 write per second for item up to 1 KB

def calculate_wcu(item_size_kb: float, writes_per_second: int) -> int:
    """Calculate required WCUs."""
    # Round up to nearest 1 KB
    size_units = math.ceil(item_size_kb)
    return size_units * writes_per_second

# Examples
print(calculate_rcu(8, 100, consistent=True))   # 200 RCU
print(calculate_rcu(8, 100, consistent=False))  # 100 RCU
print(calculate_wcu(2.5, 50))                   # 150 WCU
```

***

## Operations

### Basic CRUD Operations

```python theme={null}
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('EcommerceTable')

# CREATE - PutItem
def create_order(customer_id: str, order_id: str, amount: float):
    table.put_item(
        Item={
            'PK': f'CUSTOMER#{customer_id}',
            # Sort key embeds date for chronological ordering.
            # Pattern: ORDER#<ISO-date>#<order_id> enables efficient range queries
            # like "get all orders after January 2024" using begins_with or between.
            'SK': f'ORDER#{datetime.now().isoformat()}#{order_id}',
            'order_id': order_id,
            'customer_id': customer_id,
            # Always use Decimal for money -- float introduces rounding errors.
            # Common mistake: Decimal(0.1) gives 0.1000000000000000055511...
            # Always pass strings: Decimal(str(amount)) or Decimal("0.10")
            'amount': Decimal(str(amount)),
            'status': 'PENDING',
            'created_at': datetime.now().isoformat()
        },
        # ConditionExpression prevents silent overwrites. Without this,
        # PutItem replaces the entire item if the key already exists.
        # With it, a duplicate key raises ConditionalCheckFailedException,
        # much like a plain SQL INSERT failing on a primary-key conflict.
        ConditionExpression='attribute_not_exists(PK)'
    )

# READ - GetItem (single item, by primary key)
def get_customer(customer_id: str):
    response = table.get_item(
        Key={
            'PK': f'CUSTOMER#{customer_id}',
            'SK': 'PROFILE'
        },
        ConsistentRead=True  # Optional: strongly consistent read
    )
    return response.get('Item')

# READ - Query (multiple items, same partition)
def get_customer_orders(customer_id: str, limit: int = 20):
    response = table.query(
        KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') & 
                              Key('SK').begins_with('ORDER#'),
        ScanIndexForward=False,  # Descending order
        Limit=limit
    )
    return response['Items']

# READ - Query with filter (filter applied AFTER the read, so items it
# discards still consume read capacity)
def get_pending_orders(customer_id: str):
    response = table.query(
        KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') & 
                              Key('SK').begins_with('ORDER#'),
        FilterExpression=Attr('status').eq('PENDING')
    )
    return response['Items']

# UPDATE - UpdateItem
def update_order_status(customer_id: str, sk: str, new_status: str):
    response = table.update_item(
        Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
        UpdateExpression='SET #status = :status, updated_at = :updated',
        ExpressionAttributeNames={'#status': 'status'},  # 'status' is reserved
        ExpressionAttributeValues={
            ':status': new_status,
            ':updated': datetime.now().isoformat()
        },
        ConditionExpression='attribute_exists(PK)',  # Ensure item exists
        ReturnValues='ALL_NEW'
    )
    return response['Attributes']

# DELETE - DeleteItem
def delete_order(customer_id: str, sk: str):
    table.delete_item(
        Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
        ConditionExpression='#status <> :completed',
        ExpressionAttributeNames={'#status': 'status'},
        ExpressionAttributeValues={':completed': 'COMPLETED'}
    )
```
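A single Query call returns at most 1 MB of data (or `Limit` items) and signals that more remain by including `LastEvaluatedKey` in the response. The query functions above return only the first page. A minimal pagination sketch follows; `query_all_pages` is a hypothetical helper that takes the query callable as a parameter, and `fake_query` is a stand-in used here so the example runs without AWS.

```python
from typing import Callable, Iterator


def query_all_pages(query_fn: Callable[..., dict], **query_kwargs) -> Iterator[dict]:
    """Yield every item from a paginated Query by following LastEvaluatedKey."""
    kwargs = dict(query_kwargs)
    while True:
        page = query_fn(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        kwargs["ExclusiveStartKey"] = last_key  # resume where the previous page stopped


def fake_query(**kwargs) -> dict:
    """Stand-in for table.query that serves two pages."""
    if kwargs.get("ExclusiveStartKey") is None:
        return {"Items": [{"SK": "a"}, {"SK": "b"}], "LastEvaluatedKey": {"SK": "b"}}
    return {"Items": [{"SK": "c"}]}


all_items = list(query_all_pages(fake_query))
print([item["SK"] for item in all_items])
```

Against a real table the call would look like `query_all_pages(table.query, KeyConditionExpression=...)`.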

### Batch Operations

```python theme={null}
# BatchWriteItem - Up to 25 items, 16 MB max
def batch_create_items(items: list):
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    # Handles retries for unprocessed items automatically

# BatchGetItem - Up to 100 items, 16 MB max
def batch_get_customers(customer_ids: list):
    keys = [
        {'PK': f'CUSTOMER#{cid}', 'SK': 'PROFILE'}
        for cid in customer_ids
    ]
    
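    request = {
        'EcommerceTable': {
            'Keys': keys,
            'ProjectionExpression': 'customer_id, #name, email',
            'ExpressionAttributeNames': {'#name': 'name'}
        }
    }
    items = []
    # batch_get_item does not retry for you: keys DynamoDB could not serve
    # come back in UnprocessedKeys. Production code should back off between tries.
    while request:
        response = dynamodb.batch_get_item(RequestItems=request)
        items.extend(response['Responses'].get('EcommerceTable', []))
        request = response.get('UnprocessedKeys')
    return items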
```
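The per-request limits mean a long key list has to be split before it is sent. `batch_writer` buffers and splits writes on its own, but a BatchGetItem call needs the caller to do it. A small sketch, where `chunked` is a hypothetical helper and the customer IDs are made up:

```python
from itertools import islice
from typing import Iterable, Iterator

BATCH_GET_MAX_KEYS = 100    # BatchGetItem limit quoted above
BATCH_WRITE_MAX_ITEMS = 25  # BatchWriteItem limit quoted above


def chunked(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` elements."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


customer_ids = [f"C{n:03d}" for n in range(1, 251)]  # 250 hypothetical IDs
batches = list(chunked(customer_ids, BATCH_GET_MAX_KEYS))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each chunk can then be passed to a function like `batch_get_customers` and the results concatenated.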

***

## Transactions

DynamoDB supports ACID transactions across multiple items and tables.

```
┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Transactions                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   TransactWriteItems:                                                   │
│   • Up to 100 items per transaction                                    │
│   • All-or-nothing execution                                           │
│   • Supported actions: Put, Update, Delete, ConditionCheck             │
│                                                                         │
│   TransactGetItems:                                                     │
│   • Up to 100 items per transaction                                    │
│   • Serializable isolation                                             │
│                                                                         │
│   Cost: 2x standard reads and writes (prepare + commit phases)         │
│                                                                         │
│   Example: Transfer funds between accounts                              │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  Transaction:                                                    │  │
│   │  1. Deduct from source if balance >= amount (Update)            │  │
│   │  2. Add to destination account (Update)                         │  │
│   │  3. Create transfer record (Put)                                │  │
│   │  Note: the condition sits on the Update (one action per item)   │  │
│   │                                                                  │  │
│   │  All succeed → Committed                                        │  │
│   │  Any fails → All rolled back                                    │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```

```python theme={null}
import uuid
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.types import TypeSerializer

def transfer_funds(source_id: str, dest_id: str, amount: Decimal):
    """Transfer funds atomically between accounts."""
    
    client = boto3.client('dynamodb')
    serializer = TypeSerializer()
    
    response = client.transact_write_items(
        TransactItems=[
            # 1. Deduct from source, but only if the balance covers the amount.
            #    The check lives on the Update itself: a transaction may not
            #    contain two actions (e.g. ConditionCheck + Update) on one item.
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {
                        'PK': {'S': f'ACCOUNT#{source_id}'},
                        'SK': {'S': 'BALANCE'}
                    },
                    'UpdateExpression': 'SET balance = balance - :amount',
                    'ConditionExpression': 'balance >= :amount',
                    'ExpressionAttributeValues': {
                        ':amount': serializer.serialize(amount)
                    }
                }
            },
            # 2. Add to destination
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {
                        'PK': {'S': f'ACCOUNT#{dest_id}'},
                        'SK': {'S': 'BALANCE'}
                    },
                    'UpdateExpression': 'SET balance = balance + :amount',
                    'ExpressionAttributeValues': {
                        ':amount': serializer.serialize(amount)
                    }
                }
            },
            # 3. Record the transfer
            {
                'Put': {
                    'TableName': 'Transfers',
                    'Item': {
                        'PK': {'S': f'TRANSFER#{uuid.uuid4()}'},
                        'SK': {'S': datetime.now().isoformat()},
                        'source': {'S': source_id},
                        'destination': {'S': dest_id},
                        'amount': serializer.serialize(amount),
                        'status': {'S': 'COMPLETED'}
                    }
                }
            }
        ]
    )
    return response
```
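When a transaction is rejected, it helps to know which action caused the failure. The sketch below assumes the error response carries a `CancellationReasons` list in the same order as `TransactItems`, with a `Code` of `"None"` for actions that did not fail, which is how botocore reports a `TransactionCanceledException`. `failed_actions`, the labels, and the sample response are hypothetical illustrations.

```python
def failed_actions(error_response: dict, labels: list[str]) -> list[tuple[str, str]]:
    """Pair each failed transaction action with its cancellation code."""
    reasons = error_response.get("CancellationReasons", [])
    return [
        (label, reason["Code"])
        for label, reason in zip(labels, reasons)
        if reason.get("Code", "None") != "None"
    ]


# Hand-written sample shaped like the `response` attribute of the raised error.
sample_error = {
    "Error": {"Code": "TransactionCanceledException"},
    "CancellationReasons": [
        {"Code": "ConditionalCheckFailed", "Message": "The conditional request failed"},
        {"Code": "None"},
        {"Code": "None"},
    ],
}

# One label per action, in the order the actions were submitted.
labels = ["debit source", "credit destination", "record transfer"]
print(failed_actions(sample_error, labels))
```

In a caller this would sit inside an `except` block that catches the client error and passes its `response` dictionary to the helper, for example to turn a failed balance condition into an "insufficient funds" message.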

***

## DynamoDB Accelerator (DAX)

<Frame>
  <img src="https://mintlify.s3.us-west-1.amazonaws.com/devweeekends/images/aws/dax-architecture.svg" alt="DAX Architecture" />
</Frame>

DAX is an in-memory cache for DynamoDB, providing microsecond response times.

```
┌────────────────────────────────────────────────────────────────────────┐
│                    DAX (DynamoDB Accelerator)                           │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Application ──► DAX Cluster ──► DynamoDB                             │
│                      │                                                  │
│                      └── In-memory cache                               │
│                          (microsecond latency)                         │
│                                                                         │
│   Cache Types:                                                          │
│   ────────────                                                          │
│   • Item Cache: Individual GetItem results (5-min default TTL)         │
│   • Query Cache: Query/Scan results (5-min default TTL)                │
│                                                                         │
│   Performance:                                                          │
│   • DynamoDB: 1-10 milliseconds                                        │
│   • DAX: ~400 microseconds (up to 10x faster)                          │
│                                                                         │
│   When to Use:                                                          │
│   ✓ Read-heavy workloads                                               │
│   ✓ Same items read repeatedly                                          │
│   ✓ Microsecond response time required                                 │
│   ✗ Write-heavy workloads (no benefit)                                 │
│   ✗ Strongly consistent reads required                                 │
│                                                                         │
│   Pricing:                                                              │
│   • dax.r5.large: ~$0.269/hour (~$194/month)                           │
│   • Minimum 3 nodes for production (multi-AZ)                          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
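
To make the pricing figures concrete, here is a rough back-of-the-envelope calculation. It assumes the ~$0.269/hour rate quoted above and a 720-hour (30-day) month; the helper name `dax_monthly_cost` is made up for illustration, and real rates vary by region and change over time.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month


def dax_monthly_cost(hourly_rate: float, node_count: int,
                     hours: int = HOURS_PER_MONTH) -> float:
    """Estimate monthly cluster cost; every node is billed per hour."""
    return round(hourly_rate * node_count * hours, 2)


single_node = dax_monthly_cost(0.269, 1)   # one dax.r5.large node
three_nodes = dax_monthly_cost(0.269, 3)   # minimum production cluster

print(f"1 node:  ${single_node:.2f}/month")
print(f"3 nodes: ${three_nodes:.2f}/month")
```

Under these assumptions a single node comes to roughly $194/month and a three-node cluster to roughly $581/month, so the cache only pays for itself when it absorbs a substantial share of read traffic.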

```python theme={null}
# Using DAX (drop-in replacement for the boto3 DynamoDB resource)
# Requires the amazon-dax-client package (pip install amazon-dax-client)
import amazondax

# Create DAX client (same API as boto3 DynamoDB resource)
dax_endpoint = 'my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
dax = amazondax.AmazonDaxClient.resource(endpoint_url=dax_endpoint)
table = dax.Table('EcommerceTable')

# Use exactly like DynamoDB
response = table.get_item(
    Key={'PK': 'CUSTOMER#C001', 'SK': 'PROFILE'}
)
# First call: single-digit milliseconds (cache miss, DAX reads from DynamoDB)
# Repeat calls within the TTL: ~0.4 ms (cache hit)
```

***

## DynamoDB Streams

DynamoDB Streams records item-level changes (inserts, updates, and deletes) as an ordered log that consumers can read to drive event-driven architectures.

```
┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Streams                                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   DynamoDB Table ──► Stream ──► Lambda / Kinesis / Application         │
│                                                                         │
│   Stream View Types:                                                    │
│   ──────────────────                                                    │
│   • KEYS_ONLY: Only partition and sort key                             │
│   • NEW_IMAGE: Entire item after modification                          │
│   • OLD_IMAGE: Entire item before modification                         │
│   • NEW_AND_OLD_IMAGES: Both before and after                          │
│                                                                         │
│   Use Cases:                                                            │
│   ────────────                                                          │
│   • Real-time analytics                                                │
│   • Cross-region replication                                           │
│   • Materialized views                                                 │
│   • Search index synchronization (OpenSearch)                          │
│   • Audit logging                                                      │
│   • Event-driven workflows                                             │
│                                                                         │
│   Retention: 24 hours                                                  │
│   Ordering: Per-item ordering guaranteed                               │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
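
A stream is turned on by attaching a stream specification with one of the view types above to the table. The sketch below builds that specification and passes it to boto3's `update_table`. The helper names `stream_specification` and `enable_stream` are hypothetical, and the boto3 import is deferred so the builder can be tried without AWS access.

```python
VALID_VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def stream_specification(view_type: str = "NEW_AND_OLD_IMAGES") -> dict:
    """Build the StreamSpecification argument for create_table/update_table."""
    if view_type not in VALID_VIEW_TYPES:
        raise ValueError(f"Unknown stream view type: {view_type}")
    return {"StreamEnabled": True, "StreamViewType": view_type}


def enable_stream(table_name: str, view_type: str = "NEW_AND_OLD_IMAGES"):
    """Hypothetical helper: enable a stream on an existing table."""
    import boto3  # imported lazily; needs AWS credentials when called

    client = boto3.client("dynamodb")
    response = client.update_table(
        TableName=table_name,
        StreamSpecification=stream_specification(view_type),
    )
    # The stream ARN is what a Lambda event source mapping points at
    return response["TableDescription"].get("LatestStreamArn")
```

Choosing `NEW_AND_OLD_IMAGES` costs nothing extra to enable but produces larger records, so `KEYS_ONLY` can be a reasonable choice when consumers re-read the item anyway.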

```python theme={null}
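# Lambda function processing a DynamoDB Stream
# Assumes the stream view type is NEW_AND_OLD_IMAGES; images arrive in
# DynamoDB JSON, e.g. {'PK': {'S': 'CUSTOMER#C001'}}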
def lambda_handler(event, context):
    for record in event['Records']:
        event_name = record['eventName']  # INSERT, MODIFY, REMOVE
        
        if event_name == 'INSERT':
            new_item = record['dynamodb']['NewImage']
            # Process new item
            print(f"New item: {new_item}")
            
        elif event_name == 'MODIFY':
            old_item = record['dynamodb']['OldImage']
            new_item = record['dynamodb']['NewImage']
            # Compare and process changes
            print(f"Modified: {old_item} -> {new_item}")
            
        elif event_name == 'REMOVE':
            old_item = record['dynamodb']['OldImage']
            # Handle deletion
            print(f"Deleted: {old_item}")
    
    return {'statusCode': 200}
```
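
For `MODIFY` events, the "compare and process changes" step often comes down to finding which attributes differ between the two images. A minimal sketch, assuming the stream uses `NEW_AND_OLD_IMAGES` and working directly on the DynamoDB JSON attribute maps (the function name and sample data are invented for illustration):

```python
def changed_attributes(old_image: dict, new_image: dict) -> dict:
    """Return {attribute: (old_value, new_value)} for attributes that differ.

    A value of None means the attribute was absent in that image.
    """
    changes = {}
    for name in sorted(set(old_image) | set(new_image)):
        old_value = old_image.get(name)
        new_value = new_image.get(name)
        if old_value != new_value:
            changes[name] = (old_value, new_value)
    return changes


# Made-up images shaped like record['dynamodb']['OldImage'] / ['NewImage']
old = {"PK": {"S": "ORDER#1"}, "status": {"S": "PENDING"}, "total": {"N": "25"}}
new = {"PK": {"S": "ORDER#1"}, "status": {"S": "SHIPPED"}, "total": {"N": "25"},
       "carrier": {"S": "UPS"}}

diff = changed_attributes(old, new)
print(diff)
```

Here `diff` contains only `carrier` (newly added) and `status` (changed); unchanged attributes such as `total` are skipped, which keeps downstream handlers from reacting to no-op updates.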

***

## Best Practices

### Performance Optimization

<CardGroup cols={2}>
  <Card title="Even Key Distribution" icon="chart-pie">
    Use high-cardinality partition keys to avoid hot partitions
  </Card>

  <Card title="Sparse Indexes" icon="filter">
    Set GSI key attributes only on items you need to query; items without them are left out of the index
  </Card>

  <Card title="Project Carefully" icon="list">
    Project only the attributes your queries need into GSIs to reduce storage and write costs
  </Card>

  <Card title="Use BatchGetItem" icon="layer-group">
    Batch reads instead of multiple GetItem calls
  </Card>
</CardGroup>
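
As an illustration of the batching tip, the sketch below wraps the low-level client's `batch_get_item` call. It relies on two documented behaviours: a request accepts at most 100 keys, and keys the service could not serve come back under `UnprocessedKeys` and should be retried. The helper name `batch_get_all` and the backoff values are assumptions, not part of any AWS SDK.

```python
import time

MAX_KEYS_PER_BATCH = 100  # BatchGetItem accepts at most 100 keys per request


def batch_get_all(client, table_name, keys, max_retries=5):
    """Fetch items for `keys` in chunks, retrying any UnprocessedKeys.

    `client` is expected to behave like boto3.client('dynamodb');
    `keys` are key maps in DynamoDB JSON, e.g. {'PK': {'S': 'CUSTOMER#C001'}}.
    """
    items = []
    for start in range(0, len(keys), MAX_KEYS_PER_BATCH):
        request = {table_name: {"Keys": keys[start:start + MAX_KEYS_PER_BATCH]}}
        attempt = 0
        while request:
            response = client.batch_get_item(RequestItems=request)
            items.extend(response.get("Responses", {}).get(table_name, []))
            request = response.get("UnprocessedKeys") or {}
            if request:
                attempt += 1
                if attempt > max_retries:
                    raise RuntimeError("BatchGetItem still has unprocessed keys")
                time.sleep(min(0.05 * 2 ** attempt, 1.0))  # simple backoff
    return items
```

Note that `BatchGetItem` does not return items in request order, so callers that care about ordering need to re-sort by key.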

### Cost Optimization

```python theme={null}
cost_tips = {
    "capacity_mode": [
        "Use On-Demand for unpredictable workloads",
        "Use Provisioned with Auto Scaling for steady traffic",
        "Use Reserved Capacity for up to 77% savings on provisioned",
    ],
    "data_modeling": [
        "Use single-table design to reduce table count",
        "Project only needed attributes to GSIs",
        "Use sparse GSIs (items missing the index key attributes aren't indexed)",
    ],
    "operations": [
        "Use eventually consistent reads (50% cheaper)",
        "Batch operations to reduce request overhead",
        "Enable TTL for auto-expiring data (free deletions)",
    ],
    "storage": [
        "Compress large attributes before storing",
        "Use S3 for objects > 400 KB, store reference in DynamoDB",
        "Delete unused GSIs",
    ]
}
```
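
The "50% cheaper" figure for eventually consistent reads follows from how read capacity is metered: reads are billed in 4 KB blocks, a strongly consistent read costs one unit per block, an eventually consistent read costs half a unit, and a transactional read costs two. A small sketch of that arithmetic (the function name is illustrative):

```python
import math


def read_capacity_units(item_size_bytes: int, consistency: str = "eventual") -> float:
    """Read capacity units consumed by reading one item."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # billed per 4 KB block
    multiplier = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}[consistency]
    return blocks * multiplier


six_kb = 6 * 1024
print(read_capacity_units(six_kb, "eventual"))       # 2 blocks * 0.5
print(read_capacity_units(six_kb, "strong"))         # 2 blocks * 1.0
print(read_capacity_units(six_kb, "transactional"))  # 2 blocks * 2.0
```

Because billing rounds up to the next 4 KB block, keeping hot items just under a block boundary can matter as much as the consistency mode.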

***

## 🎯 Interview Questions

<AccordionGroup>
  <Accordion title="Q1: When would you choose DynamoDB over RDS?">
    **Choose DynamoDB when:**

    * Predictable, single-digit millisecond latency at any scale
    * Simple access patterns (key-value or document)
    * Massive scale requirements (millions of requests/second)
    * Serverless architecture
    * Global distribution needed (Global Tables)

    **Choose RDS when:**

    * Complex queries with JOINs
    * Strong ACID requirements across tables
    * Existing SQL skills/codebase
    * Complex reporting needs
  </Accordion>

  <Accordion title="Q2: How do you handle hot partitions?">
    **Prevention strategies:**

    1. Use high-cardinality partition keys
    2. Add a random suffix to the partition key (write sharding)
    3. Use composite keys to distribute writes

    **Example - Order ID with random suffix:**

    ```python theme={null}
    # Instead of: ORDER#12345
    # Use: ORDER#12345#7 (random 0-9)
    ```

    **When reading, query all shards in parallel and merge results.**
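
    A minimal sketch of the write-sharding idea, assuming a table whose partition key attribute is named `PK` and a fixed shard count of 10. The helper names are hypothetical, `table` is expected to behave like a boto3 `Table` resource, and the shards are queried one after another here for simplicity (pagination within each shard is omitted).

    ```python
    import random

    SHARD_COUNT = 10  # matches the 0-9 suffix in the example above


    def sharded_key(base_key: str, shard_count: int = SHARD_COUNT) -> str:
        """Pick a random shard for a write, e.g. ORDER#12345#7."""
        return f"{base_key}#{random.randrange(shard_count)}"


    def all_shard_keys(base_key: str, shard_count: int = SHARD_COUNT) -> list:
        """Every partition key a reader must query to see all writes."""
        return [f"{base_key}#{shard}" for shard in range(shard_count)]


    def query_all_shards(table, base_key: str, shard_count: int = SHARD_COUNT) -> list:
        """Query each shard in turn and merge the items."""
        items = []
        for pk in all_shard_keys(base_key, shard_count):
            response = table.query(
                KeyConditionExpression="PK = :pk",
                ExpressionAttributeValues={":pk": pk},
            )
            items.extend(response["Items"])
        return items
    ```

    The trade-off is explicit: writes spread across `SHARD_COUNT` partitions, while every read fans out to the same number of queries.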
  </Accordion>

  <Accordion title="Q3: Explain GSI vs LSI trade-offs">
    **GSI:**

    * Flexible (any partition/sort key)
    * Own capacity, provisioned separately from the base table (but an under-provisioned GSI can still throttle base table writes)
    * Eventually consistent only
    * Can be added/removed anytime

    **LSI:**

    * Must share partition key with base table
    * Shares capacity (can throttle base table)
    * Strongly consistent available
    * Must be defined at table creation
    * 10 GB limit per item collection (all items sharing a partition key value)

    **Recommendation:** Prefer GSIs unless you need strongly consistent reads on an alternate sort key.
  </Accordion>

  <Accordion title="Q4: How do you implement pagination in DynamoDB?">
    ```python theme={null}
    from boto3.dynamodb.conditions import Key

    def paginated_query(table, pk_value, page_size=20, last_key=None):
        params = {
            'KeyConditionExpression': Key('PK').eq(pk_value),
            'Limit': page_size
        }
        
        if last_key:
            params['ExclusiveStartKey'] = last_key
        
        response = table.query(**params)
        
        return {
            'items': response['Items'],
            'last_key': response.get('LastEvaluatedKey')  # None if no more pages
        }
    ```

    **Key points:**

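    * Use `Limit` for page size; it caps the items evaluated per request and is applied before any filter expression
    * Pass the previous page's key as `ExclusiveStartKey` to continue
    * A `LastEvaluatedKey` in the response means more items may remain; its absence means this was the last page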
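
    When a caller wants every item rather than a single page, the same continuation logic can be wrapped in a generator. This is a sketch that assumes a partition key attribute named `PK` and a `table` object that behaves like a boto3 `Table` resource; the name `iter_all_items` is made up for illustration.

    ```python
    def iter_all_items(table, pk_value, page_size=20):
        """Yield every item for a partition key, fetching one page at a time."""
        params = {
            "KeyConditionExpression": "PK = :pk",
            "ExpressionAttributeValues": {":pk": pk_value},
            "Limit": page_size,
        }
        while True:
            response = table.query(**params)
            yield from response["Items"]
            last_key = response.get("LastEvaluatedKey")
            if last_key is None:
                break  # no LastEvaluatedKey: this was the final page
            params["ExclusiveStartKey"] = last_key
    ```

    Because pages are fetched lazily, a consumer that stops iterating early never pays for the remaining queries.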
  </Accordion>

  <Accordion title="Q5: Design a DynamoDB schema for a social media app">
    **Access patterns:**

    * Get user profile
    * Get user's posts
    * Get user's followers
    * Get user's following
    * Get feed (posts from following)

    **Single-table design:**

    ```
    PK                  | SK                  | Data
    --------------------|---------------------|------------------
    USER#alice          | PROFILE             | {name, bio, ...}
    USER#alice          | POST#2024-01-15#001 | {content, likes}
    USER#alice          | FOLLOWER#bob        | {followed_at}
    USER#alice          | FOLLOWING#charlie   | {followed_at}

    GSI1:
    GSI1PK              | GSI1SK              | For
    --------------------|---------------------|------------------
    FOLLOWING#charlie   | 2024-01-15#POST#001 | Feed aggregation
    ```
  </Accordion>
</AccordionGroup>

***

## 🧪 Hands-On Lab

<Steps>
  <Step title="Create DynamoDB Table">
    Create a table with composite primary key and enable Streams
  </Step>

  <Step title="Implement Single-Table Design">
    Model users, orders, and order items in a single table
  </Step>

  <Step title="Create GSI">
    Add a GSI for querying orders by status
  </Step>

  <Step title="Implement Transactions">
    Build an atomic order placement with inventory check
  </Step>

  <Step title="Process Streams">
    Create Lambda to process DynamoDB Streams for real-time updates
  </Step>
</Steps>

***

## Next Module

<Card title="AWS Lambda" icon="bolt" href="/aws/lambda">
  Master serverless compute with AWS Lambda
</Card>