Module Overview
Estimated Time: 4-5 hours | Difficulty: Intermediate-Advanced | Prerequisites: Core Concepts
- DynamoDB fundamentals and architecture
- Data modeling and access patterns
- Primary keys, GSIs, and LSIs
- Capacity modes (On-Demand vs Provisioned)
- Transactions and consistency models
- DynamoDB Accelerator (DAX)
- Streams and change data capture
- Performance optimization and cost management
Why DynamoDB?
Fully Managed
No servers to manage, automatic scaling, built-in backup and restore
Single-Digit Milliseconds
Consistent performance at any scale, from 1 to millions of requests/second
Serverless
Pay-per-request pricing, no idle capacity charges with On-Demand mode
Global Tables
Multi-region, active-active replication for global applications
DynamoDB Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ DynamoDB Table │ │
│ │ │ │
│ │ Table: Orders │ │
│ │ ───────────────────────────────────────────────────────────── │ │
│ │ │ │
│ │ Partition Key (PK): customer_id │ │
│ │ Sort Key (SK): order_date#order_id │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ Partition 1 (customer_id = "C001") │ │ │
│ │ │ ├── 2024-01-15#ORD001 → {amount: 99.99, status: "done"}│ │ │
│ │ │ ├── 2024-01-20#ORD002 → {amount: 149.99, status: "new"}│ │ │
│ │ │ └── 2024-02-01#ORD003 → {amount: 29.99, status: "done"}│ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ Partition 2 (customer_id = "C002") │ │ │
│ │ │ ├── 2024-01-10#ORD004 → {amount: 59.99, status: "done"}│ │ │
│ │ │ └── 2024-01-25#ORD005 → {amount: 199.99, status: "new"}│ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Data Distribution: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Partition 1 │ │ Partition 2 │ │ Partition N │ │
│ │ (10 GB max) │ │ (10 GB max) │ │ (10 GB max) │ │
│ │ 3 AZs │ │ 3 AZs │ │ 3 AZs │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────┴────────────────┘ │
│ │ │
│ Automatic Replication │
│ (3 copies, multi-AZ) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Core Concepts
Primary Keys
DynamoDB supports two types of primary keys:
┌────────────────────────────────────────────────────────────────────────┐
│ Primary Key Types │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. PARTITION KEY (Simple Primary Key) │
│ ───────────────────────────────────── │
│ • Single attribute │
│ • Must be unique across all items │
│ • Used to determine physical partition │
│ │
│ Example: user_id (unique per user) │
│ ┌───────────┬────────────────────────────┐ │
│ │ user_id │ data │ │
│ ├───────────┼────────────────────────────┤ │
│ │ U001 │ {name: "Alice", age: 30} │ │
│ │ U002 │ {name: "Bob", age: 25} │ │
│ └───────────┴────────────────────────────┘ │
│ │
│ 2. COMPOSITE PRIMARY KEY (Partition + Sort Key) │
│ ───────────────────────────────────────────── │
│ • Two attributes: partition key + sort key │
│ • Partition key doesn't need to be unique │
│ • Combination must be unique │
│ • Enables range queries on sort key │
│ │
│ Example: customer_id (PK) + order_date (SK) │
│ ┌─────────────┬─────────────┬──────────────────────┐ │
│ │ customer_id │ order_date │ data │ │
│ ├─────────────┼─────────────┼──────────────────────┤ │
│ │ C001 │ 2024-01-15 │ {amount: 99.99} │ │
│ │ C001 │ 2024-01-20 │ {amount: 149.99} │ │
│ │ C002 │ 2024-01-10 │ {amount: 59.99} │ │
│ └─────────────┴─────────────┴──────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
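For concreteness, here is a sketch of the CreateTable request behind a composite-key table like the Orders example (table and attribute names are illustrative; only key attributes are declared up front because items are otherwise schemaless):

```python
# Sketch of a CreateTable request for a composite-key table.
# In practice this dict is passed to
# boto3.client('dynamodb').create_table(**create_table_params).
create_table_params = {
    "TableName": "Orders",
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},  # partition key
        {"AttributeName": "order_sk", "KeyType": "RANGE"},    # sort key (order_date#order_id)
    ],
    # Only key (and index key) attributes are declared; all other
    # attributes are added freely per item.
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_sk", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",  # On-Demand capacity mode
}
```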
Data Types
# DynamoDB Attribute Types
data_types = {
# Scalar Types
"S": "String", # "Hello World"
"N": "Number", # "123.45" (sent as string)
"B": "Binary", # base64-encoded binary
"BOOL": "Boolean", # true/false
"NULL": "Null", # null
# Document Types
"M": "Map", # {"key": {"S": "value"}}
"L": "List", # [{"S": "a"}, {"N": "1"}]
# Set Types (unique elements, same type)
"SS": "String Set", # ["a", "b", "c"]
"NS": "Number Set", # ["1", "2", "3"]
"BS": "Binary Set", # [binary1, binary2]
}
# Example Item
order_item = {
"PK": {"S": "CUSTOMER#C001"},
"SK": {"S": "ORDER#2024-01-15#ORD001"},
"order_id": {"S": "ORD001"},
"customer_id": {"S": "C001"},
"amount": {"N": "99.99"},
"items": {"L": [
{"M": {"product": {"S": "Widget"}, "qty": {"N": "2"}}},
{"M": {"product": {"S": "Gadget"}, "qty": {"N": "1"}}}
]},
"status": {"S": "COMPLETED"},
"tags": {"SS": ["express", "gift-wrapped"]},
"created_at": {"S": "2024-01-15T10:30:00Z"}
}
Data Modeling Patterns
Single-Table Design
Best Practice: Use single-table design for related entities. This enables fetching all related data in a single query, reducing latency and cost.
┌────────────────────────────────────────────────────────────────────────┐
│ Single-Table Design Example │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ E-Commerce: Customers, Orders, Order Items in ONE table │
│ │
│ ┌─────────────────────┬─────────────────────────┬─────────────────┐ │
│ │ PK │ SK │ Attributes │ │
│ ├─────────────────────┼─────────────────────────┼─────────────────┤ │
│ │ CUSTOMER#C001 │ PROFILE │ name, email... │ │
│ │ CUSTOMER#C001 │ ORDER#2024-01-15#O001 │ total, status │ │
│ │ CUSTOMER#C001 │ ORDER#2024-01-20#O002 │ total, status │ │
│ │ ORDER#O001 │ ITEM#1 │ product, qty │ │
│ │ ORDER#O001 │ ITEM#2 │ product, qty │ │
│ │ ORDER#O002 │ ITEM#1 │ product, qty │ │
│ │ PRODUCT#P001 │ METADATA │ name, price │ │
│ │ PRODUCT#P001 │ INVENTORY │ stock, location │ │
│ └─────────────────────┴─────────────────────────┴─────────────────┘ │
│ │
│ Access Patterns Enabled: │
│ • Get customer profile: PK = "CUSTOMER#C001", SK = "PROFILE" │
│ • Get all customer orders: PK = "CUSTOMER#C001", SK begins "ORDER#" │
│ • Get order items: PK = "ORDER#O001", SK begins "ITEM#" │
│ • Get product info: PK = "PRODUCT#P001" │
│ │
└────────────────────────────────────────────────────────────────────────┘
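A payoff of this layout is that the profile and all orders share one partition, so a single Query on PK = "CUSTOMER#C001" with no sort-key condition returns them together. A small sketch of splitting that result set by SK prefix, following the conventions in the table above:

```python
def split_customer_partition(items):
    """Split the Items of a Query on PK = 'CUSTOMER#<id>' into the
    profile item and the list of order items, using the SK prefixes
    from the single-table design above."""
    profile, orders = None, []
    for item in items:
        if item["SK"] == "PROFILE":
            profile = item
        elif item["SK"].startswith("ORDER#"):
            orders.append(item)
    return profile, orders
```

One round trip to DynamoDB yields both entity types; the application separates them in memory.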
Access Pattern First Design
# Step 1: Define Access Patterns
access_patterns = [
"Get customer by ID",
"Get all orders for a customer",
"Get order details with items",
"Get orders by status (e.g., 'pending')",
"Get orders in date range",
"Get product inventory",
]
# Step 2: Design Keys Based on Patterns
table_design = {
"table_name": "EcommerceTable",
"primary_key": {
"PK": "Partition Key (entity type + ID)",
"SK": "Sort Key (relationship + details)"
},
"gsi1": {
"GSI1PK": "For alternate access patterns",
"GSI1SK": "Enable range queries"
}
}
# Step 3: Define Key Patterns
key_patterns = {
"Customer": {
"PK": "CUSTOMER#<customer_id>",
"SK": "PROFILE"
},
"Order": {
"PK": "CUSTOMER#<customer_id>",
"SK": "ORDER#<date>#<order_id>",
"GSI1PK": "STATUS#<status>",
"GSI1SK": "<date>#<order_id>"
},
"OrderItem": {
"PK": "ORDER#<order_id>",
"SK": "ITEM#<item_number>"
}
}
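Key patterns like these are worth centralizing in small helper functions so item construction stays consistent everywhere; a sketch for the Order entity (key names follow the patterns defined above):

```python
def order_keys(customer_id: str, order_date: str, order_id: str,
               status: str) -> dict:
    """Build the table and GSI1 key attributes for an Order item,
    following the key patterns defined above."""
    return {
        "PK": f"CUSTOMER#{customer_id}",
        "SK": f"ORDER#{order_date}#{order_id}",
        "GSI1PK": f"STATUS#{status}",
        "GSI1SK": f"{order_date}#{order_id}",
    }
```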
Secondary Indexes
Global Secondary Index (GSI)
┌────────────────────────────────────────────────────────────────────────┐
│ Global Secondary Index (GSI) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ • Different partition key and/or sort key from base table │
│ • Separate throughput capacity (own RCU/WCU) │
│ • Can be created/deleted anytime │
│ • Eventually consistent reads only │
│ • Maximum 20 GSIs per table │
│ │
│ BASE TABLE GSI: StatusDateIndex │
│ ───────────────────────────────── ────────────────────────────── │
│ PK: CUSTOMER#C001 GSI-PK: STATUS#pending │
│ SK: ORDER#2024-01-15#O001 GSI-SK: 2024-01-15#O001 │
│ status: pending customer_id: C001 │
│ ───────────────────────────────── ────────────────────────────── │
│ │
│ Use Case: "Find all pending orders sorted by date" │
│ Query: GSI-PK = "STATUS#pending", GSI-SK > "2024-01-01" │
│ │
└────────────────────────────────────────────────────────────────────────┘
Local Secondary Index (LSI)
┌────────────────────────────────────────────────────────────────────────┐
│ Local Secondary Index (LSI) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ • Same partition key, different sort key │
│ • Shares throughput with base table │
│ • Must be created at table creation time │
│ • Strongly consistent reads available │
│ • Maximum 5 LSIs per table │
│ • 10 GB limit per partition (includes all LSIs) │
│ │
│ BASE TABLE LSI: AmountIndex │
│ ───────────────────────────────── ────────────────────────────── │
│ PK: CUSTOMER#C001 PK: CUSTOMER#C001 (same) │
│ SK: ORDER#2024-01-15#O001 LSI-SK: 99.99 (amount) │
│ amount: 99.99 order_id: O001 │
│ ───────────────────────────────── ────────────────────────────── │
│ │
│ Use Case: "Get customer's highest-value orders" │
│ Query: PK = "CUSTOMER#C001", ordered by amount (descending) │
│ │
└────────────────────────────────────────────────────────────────────────┘
GSI vs LSI Comparison
| Feature | GSI | LSI |
|---|---|---|
| Partition Key | Different from base table | Same as base table |
| Sort Key | Different from base table | Different from base table |
| Capacity | Separate RCU/WCU | Shared with base table |
| Creation | Anytime | Table creation only |
| Consistency | Eventually consistent only | Strong or eventual |
| Limit | 20 per table | 5 per table |
| Size Limit | None | 10 GB per partition |
Capacity Modes
On-Demand vs Provisioned
┌────────────────────────────────────────────────────────────────────────┐
│ Capacity Mode Comparison │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ON-DEMAND MODE PROVISIONED MODE │
│ ──────────────── ──────────────── │
│ │
│ ✓ Pay per request ✓ Pay per capacity unit/hour │
│ ✓ Auto-scales instantly ✓ Reserve capacity (cheaper) │
│ ✓ No capacity planning ✓ Predictable costs │
│ ✓ No throttling (mostly) ✓ Auto Scaling available │
│ │
│ Best For: Best For: │
│ • Unpredictable traffic • Predictable, steady traffic │
│ • New applications • Cost optimization │
│ • Spiky workloads • High-volume applications │
│ • Dev/test environments • Reserved capacity discount │
│ │
│ Pricing (us-east-1): Pricing (us-east-1): │
│ • $1.25 per million WRU • $0.00065 per WCU/hour │
│ • $0.25 per million RRU • $0.00013 per RCU/hour │
│ │
│ Cost Example (1M writes/day): │
│ On-Demand: $1.25/day = $37.50/month │
│ Provisioned: ~12 WCU = $5.62/month (84% savings) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Capacity Units Explained
import math

# Read Capacity Units (RCU)
# 1 RCU = 1 strongly consistent read per second for item up to 4 KB
# 1 RCU = 2 eventually consistent reads per second for item up to 4 KB
def calculate_rcu(item_size_kb: float, reads_per_second: int,
consistent: bool = False) -> int:
"""Calculate required RCUs."""
# Round up to nearest 4 KB
size_units = math.ceil(item_size_kb / 4)
if consistent:
return size_units * reads_per_second
else: # Eventually consistent
return math.ceil(size_units * reads_per_second / 2)
# Write Capacity Units (WCU)
# 1 WCU = 1 write per second for item up to 1 KB
def calculate_wcu(item_size_kb: float, writes_per_second: int) -> int:
"""Calculate required WCUs."""
# Round up to nearest 1 KB
size_units = math.ceil(item_size_kb)
return size_units * writes_per_second
# Examples
print(calculate_rcu(8, 100, consistent=True)) # 200 RCU
print(calculate_rcu(8, 100, consistent=False)) # 100 RCU
print(calculate_wcu(2.5, 50)) # 150 WCU
Operations
Basic CRUD Operations
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key, Attr
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('EcommerceTable')
# CREATE - PutItem
def create_order(customer_id: str, order_id: str, amount: float):
table.put_item(
Item={
'PK': f'CUSTOMER#{customer_id}',
'SK': f'ORDER#{datetime.now().isoformat()}#{order_id}',
'order_id': order_id,
'customer_id': customer_id,
'amount': Decimal(str(amount)),
'status': 'PENDING',
'created_at': datetime.now().isoformat()
},
ConditionExpression='attribute_not_exists(PK)' # Prevent overwrite
)
# READ - GetItem (single item, by primary key)
def get_customer(customer_id: str):
response = table.get_item(
Key={
'PK': f'CUSTOMER#{customer_id}',
'SK': 'PROFILE'
},
ConsistentRead=True # Optional: strongly consistent read
)
return response.get('Item')
# READ - Query (multiple items, same partition)
def get_customer_orders(customer_id: str, limit: int = 20):
response = table.query(
KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') &
Key('SK').begins_with('ORDER#'),
ScanIndexForward=False, # Descending order
Limit=limit
)
return response['Items']
# READ - Query with filter (filter applied AFTER read)
def get_pending_orders(customer_id: str):
response = table.query(
KeyConditionExpression=Key('PK').eq(f'CUSTOMER#{customer_id}') &
Key('SK').begins_with('ORDER#'),
FilterExpression=Attr('status').eq('PENDING')
)
return response['Items']
# UPDATE - UpdateItem
def update_order_status(customer_id: str, sk: str, new_status: str):
response = table.update_item(
Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
UpdateExpression='SET #status = :status, updated_at = :updated',
ExpressionAttributeNames={'#status': 'status'}, # 'status' is reserved
ExpressionAttributeValues={
':status': new_status,
':updated': datetime.now().isoformat()
},
ConditionExpression='attribute_exists(PK)', # Ensure item exists
ReturnValues='ALL_NEW'
)
return response['Attributes']
# DELETE - DeleteItem
def delete_order(customer_id: str, sk: str):
table.delete_item(
Key={'PK': f'CUSTOMER#{customer_id}', 'SK': sk},
ConditionExpression='#status <> :completed',
ExpressionAttributeNames={'#status': 'status'},
ExpressionAttributeValues={':completed': 'COMPLETED'}
)
Batch Operations
# BatchWriteItem - Up to 25 items, 16 MB max
def batch_create_items(items: list):
with table.batch_writer() as batch:
for item in items:
batch.put_item(Item=item)
# Handles retries for unprocessed items automatically
# BatchGetItem - Up to 100 items, 16 MB max
def batch_get_customers(customer_ids: list):
keys = [
{'PK': f'CUSTOMER#{cid}', 'SK': 'PROFILE'}
for cid in customer_ids
]
response = dynamodb.batch_get_item(
RequestItems={
'EcommerceTable': {
'Keys': keys,
'ProjectionExpression': 'customer_id, #name, email',
'ExpressionAttributeNames': {'#name': 'name'}
}
}
)
return response['Responses']['EcommerceTable']
Transactions
DynamoDB supports ACID transactions across multiple items and tables.
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Transactions │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ TransactWriteItems: │
│ • Up to 100 items per transaction │
│ • All-or-nothing execution │
│ • Supported actions: Put, Update, Delete, ConditionCheck │
│ │
│ TransactGetItems: │
│ • Up to 100 items per transaction │
│ • Serializable isolation │
│ │
│ Cost: 2x the cost of standard writes (for durability) │
│ │
│ Example: Transfer funds between accounts │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Transaction: │ │
│ │ 1. Check source balance >= amount (ConditionCheck) │ │
│ │ 2. Deduct from source account (Update) │ │
│ │ 3. Add to destination account (Update) │ │
│ │ 4. Create transfer record (Put) │ │
│ │ │ │
│ │ All succeed → Committed │ │
│ │ Any fails → All rolled back │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
import uuid
from datetime import datetime
from decimal import Decimal

import boto3
from boto3.dynamodb.types import TypeSerializer
def transfer_funds(source_id: str, dest_id: str, amount: Decimal):
"""Transfer funds atomically between accounts."""
client = boto3.client('dynamodb')
serializer = TypeSerializer()
response = client.transact_write_items(
TransactItems=[
# 1. Condition check: source has sufficient balance
{
'ConditionCheck': {
'TableName': 'Accounts',
'Key': {
'PK': {'S': f'ACCOUNT#{source_id}'},
'SK': {'S': 'BALANCE'}
},
'ConditionExpression': 'balance >= :amount',
'ExpressionAttributeValues': {
':amount': serializer.serialize(amount)
}
}
},
# 2. Deduct from source
{
'Update': {
'TableName': 'Accounts',
'Key': {
'PK': {'S': f'ACCOUNT#{source_id}'},
'SK': {'S': 'BALANCE'}
},
'UpdateExpression': 'SET balance = balance - :amount',
'ExpressionAttributeValues': {
':amount': serializer.serialize(amount)
}
}
},
# 3. Add to destination
{
'Update': {
'TableName': 'Accounts',
'Key': {
'PK': {'S': f'ACCOUNT#{dest_id}'},
'SK': {'S': 'BALANCE'}
},
'UpdateExpression': 'SET balance = balance + :amount',
'ExpressionAttributeValues': {
':amount': serializer.serialize(amount)
}
}
},
# 4. Record the transfer
{
'Put': {
'TableName': 'Transfers',
'Item': {
'PK': {'S': f'TRANSFER#{uuid.uuid4()}'},
'SK': {'S': datetime.now().isoformat()},
'source': {'S': source_id},
'destination': {'S': dest_id},
'amount': serializer.serialize(amount),
'status': {'S': 'COMPLETED'}
}
}
}
]
)
return response
DynamoDB Accelerator (DAX)
┌────────────────────────────────────────────────────────────────────────┐
│ DAX (DynamoDB Accelerator) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Application ──► DAX Cluster ──► DynamoDB │
│ │ │
│ └── In-memory cache │
│ (microsecond latency) │
│ │
│ Cache Types: │
│ ──────────── │
│ • Item Cache: Individual GetItem results (5-min default TTL) │
│ • Query Cache: Query/Scan results (5-min default TTL) │
│ │
│ Performance: │
│ • DynamoDB: 1-10 milliseconds │
│ • DAX: ~400 microseconds (up to 10x faster) │
│ │
│ When to Use: │
│ ✓ Read-heavy workloads │
│ ✓ Same items read repeatedly │
│ ✓ Microsecond response time required │
│ ✗ Write-heavy workloads (no benefit) │
│ ✗ Strongly consistent reads required │
│ │
│ Pricing: │
│ • dax.r5.large: ~$0.269/hour (~$194/month) │
│ • Minimum 3 nodes for production (multi-AZ) │
│ │
└────────────────────────────────────────────────────────────────────────┘
# Using DAX (drop-in replacement for DynamoDB)
import amazondax
import boto3
# Create DAX client (same API as boto3 DynamoDB resource)
dax_endpoint = 'my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'
dax = amazondax.AmazonDaxClient.resource(endpoint_url=dax_endpoint)
table = dax.Table('EcommerceTable')
# Use exactly like DynamoDB
response = table.get_item(
Key={'PK': 'CUSTOMER#C001', 'SK': 'PROFILE'}
)
# First call: ~1ms (cache miss, hits DynamoDB)
# Subsequent calls: ~0.4ms (cache hit)
DynamoDB Streams
Capture item-level changes for event-driven architectures.
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Streams │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ DynamoDB Table ──► Stream ──► Lambda / Kinesis / Application │
│ │
│ Stream Record Types: │
│ ──────────────────── │
│ • KEYS_ONLY: Only partition and sort key │
│ • NEW_IMAGE: Entire item after modification │
│ • OLD_IMAGE: Entire item before modification │
│ • NEW_AND_OLD_IMAGES: Both before and after │
│ │
│ Use Cases: │
│ ──────────── │
│ • Real-time analytics │
│ • Cross-region replication │
│ • Materialized views │
│ • Search index synchronization (OpenSearch) │
│ • Audit logging │
│ • Event-driven workflows │
│ │
│ Retention: 24 hours │
│ Ordering: Per-partition ordering guaranteed │
│ │
└────────────────────────────────────────────────────────────────────────┘
# Lambda function processing DynamoDB Stream
def lambda_handler(event, context):
for record in event['Records']:
event_name = record['eventName'] # INSERT, MODIFY, REMOVE
if event_name == 'INSERT':
new_item = record['dynamodb']['NewImage']
# Process new item
print(f"New item: {new_item}")
elif event_name == 'MODIFY':
old_item = record['dynamodb']['OldImage']
new_item = record['dynamodb']['NewImage']
# Compare and process changes
print(f"Modified: {old_item} -> {new_item}")
elif event_name == 'REMOVE':
old_item = record['dynamodb']['OldImage']
# Handle deletion
print(f"Deleted: {old_item}")
return {'statusCode': 200}
Best Practices
Performance Optimization
Even Key Distribution
Use high-cardinality partition keys to avoid hot partitions
Sparse Indexes
Only include items with indexed attributes in GSIs
Projection Carefully
Only project needed attributes to GSIs (reduce WCU)
Use BatchGetItem
Batch reads instead of multiple GetItem calls
Cost Optimization
cost_tips = {
"capacity_mode": [
"Use On-Demand for unpredictable workloads",
"Use Provisioned with Auto Scaling for steady traffic",
"Use Reserved Capacity for 77% savings on provisioned",
],
"data_modeling": [
"Use single-table design to reduce table count",
"Project only needed attributes to GSIs",
"Use sparse GSIs (items without attributes aren't indexed)",
],
"operations": [
"Use eventually consistent reads (50% cheaper)",
"Batch operations to reduce request overhead",
"Enable TTL for auto-expiring data (free deletions)",
],
"storage": [
"Compress large attributes before storing",
"Use S3 for objects > 400 KB, store reference in DynamoDB",
"Delete unused GSIs",
]
}
🎯 Interview Questions
Q1: When would you choose DynamoDB over RDS?
Choose DynamoDB when you need:
- Predictable, single-digit millisecond latency at any scale
- Simple access patterns (key-value or document)
- Massive scale (millions of requests/second)
- A serverless architecture
- Global distribution (Global Tables)
Choose RDS when you need:
- Complex queries with JOINs
- Strong ACID guarantees across many tables
- To reuse existing SQL skills or an existing SQL codebase
- Complex reporting and ad hoc analytics
Q2: How do you handle hot partitions?
Prevention strategies:
- Use high-cardinality partition keys
- Add a random suffix to the key (write sharding)
- Use composite keys to distribute writes
When reading a sharded key, query all shards in parallel and merge the results.
# Instead of: ORDER#12345
# Use: ORDER#12345#7 (random 0-9)
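The suffix idea above can be sketched as a pair of helpers: one picks a random shard on write, the other enumerates every shard key for the parallel read-back (the shard count of 10 is illustrative; size it to your write throughput):

```python
import random

SHARD_COUNT = 10  # illustrative; pick based on the hot key's write rate

def sharded_pk(order_id: str) -> str:
    """Partition key for writes: spreads one hot logical key across
    SHARD_COUNT physical partitions via a random suffix."""
    return f"ORDER#{order_id}#{random.randrange(SHARD_COUNT)}"

def all_shard_pks(order_id: str) -> list:
    """Every shard's partition key; query all of these in parallel
    and merge the results when reading the logical key back."""
    return [f"ORDER#{order_id}#{s}" for s in range(SHARD_COUNT)]
```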
Q3: Explain GSI vs LSI trade-offs
GSI:
- Flexible (any partition/sort key)
- Own capacity (no throttling impact on base table)
- Eventually consistent reads only
- Can be added or removed anytime
LSI:
- Must share partition key with base table
- Shares capacity with the base table (can throttle it)
- Strongly consistent reads available
- Must be defined at table creation
- 10 GB per-partition size limit
Q4: How do you implement pagination in DynamoDB?
def paginated_query(table, pk_value, page_size=20, last_key=None):
params = {
'KeyConditionExpression': Key('PK').eq(pk_value),
'Limit': page_size
}
if last_key:
params['ExclusiveStartKey'] = last_key
response = table.query(**params)
return {
'items': response['Items'],
'last_key': response.get('LastEvaluatedKey') # None if no more pages
}
- Use Limit for page size
- Use ExclusiveStartKey for continuation
- LastEvaluatedKey indicates more pages exist (it is absent on the last page)
Q5: Design a DynamoDB schema for a social media app
Access patterns:
- Get user profile
- Get user’s posts
- Get user’s followers
- Get user’s following
- Get feed (posts from following)
PK | SK | Data
--------------------|---------------------|------------------
USER#alice | PROFILE | {name, bio, ...}
USER#alice | POST#2024-01-15#001 | {content, likes}
USER#alice | FOLLOWER#bob | {followed_at}
USER#alice | FOLLOWING#charlie | {followed_at}
GSI1:
GSI1PK | GSI1SK | For
--------------------|---------------------|------------------
FOLLOWING#charlie | 2024-01-15#POST#001 | Feed aggregation
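Feed aggregation over this schema means querying GSI1 once per followed user and merging the per-user results, newest first. A sketch of the merge step, assuming GSI1SK sorts lexicographically by date as in the table above:

```python
import heapq

def merge_feed(posts_by_user: dict, limit: int = 20) -> list:
    """Merge per-user post lists (the Items from each followed user's
    GSI1 query) into one newest-first feed, keyed on GSI1SK."""
    all_posts = [post for posts in posts_by_user.values() for post in posts]
    # nlargest returns the top `limit` posts in descending GSI1SK order
    return heapq.nlargest(limit, all_posts, key=lambda p: p["GSI1SK"])
```

This fan-out-on-read approach is the simple option; at large follower counts, real systems often precompute feeds on write instead.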
🧪 Hands-On Lab
1. Create DynamoDB Table — create a table with a composite primary key and enable Streams
2. Implement Single-Table Design — model users, orders, and order items in a single table
3. Create GSI — add a GSI for querying orders by status
4. Implement Transactions — build an atomic order placement with an inventory check
5. Process Streams — create a Lambda function that processes DynamoDB Streams for real-time updates
Next Module
AWS Lambda
Master serverless compute with AWS Lambda