Storage & Databases - Dev Weekends

Module Overview

Estimated Time: 5-6 hours | Difficulty: Intermediate | Prerequisites: Core Concepts, Compute

This module covers the core AWS storage and database services. You’ll learn data modeling, performance optimization, cost management, and when to use each service.

What You’ll Learn:

S3 storage classes, lifecycle policies, and security
EBS volumes and snapshots for EC2
EFS for shared file systems
RDS and Aurora for relational databases
DynamoDB for NoSQL at scale
ElastiCache for sub-millisecond response times
Data migration strategies

Storage Service Selection Guide

┌────────────────────────────────────────────────────────────────────────┐
│                  AWS Storage Decision Tree                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   What type of data access pattern?                                     │
│         │                                                               │
│         ├─── Random access to blocks ──────────► EBS (EC2 volumes)     │
│         │    (Database, boot volumes)               │                   │
│         │                                           └── gp3: General   │
│         │                                           └── io2: High IOPS │
│         │                                                               │
│         ├─── Object/file storage ──────────────► S3 (Object Storage)   │
│         │    (Any size, any format)                 │                   │
│         │                                           └── Standard: Hot  │
│         │                                           └── IA: Warm       │
│         │                                           └── Glacier: Cold  │
│         │                                                               │
│         ├─── Shared file system (Linux) ───────► EFS (NFS)             │
│         │    (Multiple EC2, containers)                                 │
│         │                                                               │
│         └─── Shared file system (Windows) ─────► FSx for Windows       │
│              (AD integration, SMB)                                      │
│                                                                         │
│   DATABASE Decision:                                                    │
│   ──────────────────                                                    │
│         │                                                               │
│         ├─── Relational, complex queries ──────► RDS / Aurora          │
│         │    (Joins, transactions, ACID)                                │
│         │                                                               │
│         ├─── Key-value, high scale ────────────► DynamoDB              │
│         │    (Single-digit ms, serverless)                              │
│         │                                                               │
│         ├─── Document store ───────────────────► DocumentDB (MongoDB)  │
│         │                                                               │
│         ├─── Graph relationships ──────────────► Neptune              │
│         │                                                               │
│         └─── Caching layer ────────────────────► ElastiCache           │
│              (Redis/Memcached)                                          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
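The decision tree above can also be read as a lookup table. The sketch below is one illustrative way to encode it; the pattern labels (`block`, `object`, and so on) and the `recommend_service` function are hypothetical names chosen for this example, not AWS terminology.

```python
# Illustrative encoding of the decision tree; the keys are hypothetical labels.
STORAGE_CHOICES = {
    "block": "EBS",
    "object": "S3",
    "shared-linux": "EFS",
    "shared-windows": "FSx for Windows",
}
DATABASE_CHOICES = {
    "relational": "RDS / Aurora",
    "key-value": "DynamoDB",
    "document": "DocumentDB",
    "graph": "Neptune",
    "cache": "ElastiCache",
}


def recommend_service(pattern: str) -> str:
    """Return the service the decision tree points to for an access pattern."""
    choices = {**STORAGE_CHOICES, **DATABASE_CHOICES}
    try:
        return choices[pattern]
    except KeyError:
        raise ValueError(f"unknown access pattern: {pattern!r}") from None


print(recommend_service("key-value"))  # prints: DynamoDB
```

Real selection usually weighs several factors at once (cost, latency, team skills), so treat a table like this as a starting point only.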

Storage Types Comparison

┌───────────────────────────────────────────────────────────────────────┐
│                      AWS Storage Types                                 │
├───────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   BLOCK STORAGE           OBJECT STORAGE          FILE STORAGE         │
│   ─────────────           ─────────────           ─────────────        │
│   ┌─────────────┐         ┌─────────────┐         ┌─────────────┐      │
│   │    EBS      │         │     S3      │         │    EFS      │      │
│   │  (Volumes)  │         │  (Buckets)  │         │   (NFS)     │      │
│   └─────────────┘         └─────────────┘         └─────────────┘      │
│         │                       │                       │              │
│   Characteristics:         Characteristics:       Characteristics:     │
│   • Like a hard disk       • Like Dropbox         • Like network share │
│   • Attach to 1 EC2        • HTTP access          • Multiple EC2s      │
│   • Single AZ              • Global access        • Multi-AZ           │
│   • Low latency            • Unlimited size       • POSIX compliant    │
│   • Boot volumes           • 11 9s durability     • Auto-scaling       │
│                                                                        │
│   Use Cases:               Use Cases:             Use Cases:           │
│   • Databases              • Static websites      • Content management │
│   • Enterprise apps        • Data lakes           • Web serving        │
│   • Boot/root volumes      • Backups/archives     • Container storage  │
│   • High-perf workloads    • Media hosting        • Dev environments   │
│                                                                        │
│   Durability & Performance:                                            │
│   ─────────────────────────                                            │
│   EBS: 99.8-99.9% (io2: 99.999%), single AZ, <1ms latency             │
│   S3:  99.999999999% (11 9s), ~100ms latency                          │
│   EFS: 99.999999999%, few ms latency                                  │
│                                                                        │
└───────────────────────────────────────────────────────────────────────┘

S3 (Simple Storage Service)

Object storage with unlimited capacity. The backbone of AWS storage and data lakes.

S3 Architecture Deep Dive

┌──────────────────────────────────────────────────────────────────────┐
│                         S3 Architecture                               │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│   Bucket: my-company-data-prod (globally unique name)                │
│   Region: us-east-1                                                   │
│   ─────────────────────────────────────────────────────────────────  │
│                                                                       │
│   Object Structure:                                                   │
│   ┌─────────────────────────────────────────────────────────────┐    │
│   │  Key: images/2024/01/photo.jpg                               │    │
│   │  ┌────────────────────────────────────────────────────────┐ │    │
│   │  │  Value: [binary data up to 5 TB]                        │ │    │
│   │  │                                                         │ │    │
│   │  │  Metadata:                                              │ │    │
│   │  │  • System: Content-Type, Last-Modified, ETag            │ │    │
│   │  │  • User-defined: x-amz-meta-*                           │ │    │
│   │  │                                                         │ │    │
│   │  │  Version ID: 3sL4kqtJlcpXroDTDmJ+rmSpXd3dIbrHY+MTRCxf3  │ │    │
│   │  └────────────────────────────────────────────────────────┘ │    │
│   └─────────────────────────────────────────────────────────────┘    │
│                                                                       │
│   Limits & Capabilities:                                              │
│   • Max object size: 5 TB                                            │
│   • Max single PUT: 5 GB (use multipart for larger)                  │
│   • Max parts in multipart: 10,000                                   │
│   • Recommended multipart for > 100 MB                               │
│   • Unlimited objects per bucket                                      │
│   • Bucket names: 3-63 chars, globally unique                        │
│                                                                       │
│   Data Consistency (as of Dec 2020):                                 │
│   • Strong read-after-write consistency for all operations           │
│   • No eventual consistency delays anymore!                          │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘
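The limits above interact: a large object must be split into parts that are big enough to stay under the 10,000-part cap. The sketch below works through that arithmetic. It assumes binary units (MiB/GiB/TiB) and S3's 5 MiB minimum part size, which the diagram does not state; `plan_upload` and its return format are hypothetical.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB          # max object size
MAX_PARTS = 10_000                 # max parts in one multipart upload
MIN_PART_SIZE = 5 * MIB            # assumption: S3 minimum part size (except last part)
MULTIPART_RECOMMENDED = 100 * MIB  # multipart recommended above this size


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large sizes."""
    return -(-a // b)


def plan_upload(size_bytes: int, preferred_part_size: int = 8 * MIB) -> dict:
    """Pick single PUT vs multipart and a part size that respects the limits."""
    if size_bytes <= 0 or size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    if size_bytes <= MULTIPART_RECOMMENDED:
        return {"method": "single_put", "part_size": size_bytes, "parts": 1}
    # Smallest part size that still fits the object into 10,000 parts.
    needed = ceil_div(size_bytes, MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART_SIZE, needed)
    return {
        "method": "multipart",
        "part_size": part_size,
        "parts": ceil_div(size_bytes, part_size),
    }


print(plan_upload(1024 ** 3))  # a 1 GiB object with 8 MiB parts
```

For a 5 TiB object the 8 MiB default would need far more than 10,000 parts, so the function raises the part size until the count fits.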

S3 Storage Classes Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                     S3 Storage Classes                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Storage Class     │ Access         │ Min Storage │ Retrieval │ Cost  │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 Standard       │ Milliseconds   │ None        │ None      │ $$$$  │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 Standard-IA    │ Milliseconds   │ 30 days     │ Per GB    │ $$    │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 One Zone-IA    │ Milliseconds   │ 30 days     │ Per GB    │ $     │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 Intelligent    │ Milliseconds   │ None        │ None      │ Auto  │
│   Tiering           │                │             │ monitoring│       │
│   ─────────────────────────────────────────────────────────────────────│
│   Glacier Instant   │ Milliseconds   │ 90 days     │ Per GB    │ ¢¢    │
│   Retrieval         │                │             │           │       │
│   ─────────────────────────────────────────────────────────────────────│
│   Glacier Flexible  │ 1-5 min to     │ 90 days     │ Per GB +  │ ¢     │
│   Retrieval         │ 12 hours       │             │ per req   │       │
│   ─────────────────────────────────────────────────────────────────────│
│   Glacier Deep      │ 12-48 hours    │ 180 days    │ Per GB +  │ ¢     │
│   Archive           │                │             │ per req   │       │
│   ─────────────────────────────────────────────────────────────────────│
│                                                                         │
│   Cost Comparison (per GB/month, us-east-1):                           │
│   Standard:       $0.023                                               │
│   Standard-IA:    $0.0125 (+ $0.01/GB retrieval)                       │
│   One Zone-IA:    $0.01   (+ $0.01/GB retrieval)                       │
│   Glacier IR:     $0.004  (+ $0.03/GB retrieval)                       │
│   Glacier FR:     $0.0036 (+ $0.01-$0.03/GB retrieval)                 │
│   Deep Archive:   $0.00099 (+ $0.02/GB retrieval)                      │
│                                                                         │
│   USE INTELLIGENT TIERING when access patterns are unknown!            │
│   It automatically moves objects between tiers based on access.        │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
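Because the cheaper classes charge for retrieval, the cheapest class depends on how much data is read back each month. The sketch below compares classes using the per-GB prices listed above. It is a simplification: it ignores request fees, minimum storage durations, and minimum billable object sizes, and it leaves out Glacier Flexible Retrieval because its retrieval price is a range. The function names are hypothetical.

```python
# (storage $/GB-month, retrieval $/GB), taken from the price list above
PRICES = {
    "STANDARD": (0.023, 0.0),
    "STANDARD_IA": (0.0125, 0.01),
    "ONEZONE_IA": (0.01, 0.01),
    "GLACIER_IR": (0.004, 0.03),
    "DEEP_ARCHIVE": (0.00099, 0.02),
}


def monthly_cost(storage_class: str, stored_gb: float, retrieved_gb: float) -> float:
    """Storage plus retrieval cost for one month, in USD."""
    storage_price, retrieval_price = PRICES[storage_class]
    return stored_gb * storage_price + retrieved_gb * retrieval_price


def cheapest_class(stored_gb: float, retrieved_gb: float, candidates) -> str:
    """Return the candidate class with the lowest monthly cost."""
    return min(candidates, key=lambda c: monthly_cost(c, stored_gb, retrieved_gb))


hot_or_warm = ["STANDARD", "STANDARD_IA"]
for retrieved in (100, 2000):
    best = cheapest_class(1000, retrieved, hot_or_warm)
    print(f"1000 GB stored, {retrieved} GB read/month -> {best}")
```

With these prices, Standard-IA stops being cheaper than Standard for 1,000 GB once monthly reads exceed roughly 1,050 GB.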

S3 Lifecycle Policies

# Example lifecycle policy configuration:
# tier logs down over time, clean up failed uploads, expire old versions
import boto3

s3 = boto3.client('s3')

lifecycle_policy = {
    "Rules": [
        {
            "ID": "MoveToIA30Days",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "STANDARD_IA"
                },
                {
                    "Days": 90,
                    "StorageClass": "GLACIER"
                },
                {
                    "Days": 365,
                    "StorageClass": "DEEP_ARCHIVE"
                }
            ],
            "Expiration": {
                "Days": 2555  # Delete after 7 years
            }
        },
        {
            "ID": "DeleteIncompleteMultipartUploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 7
            }
        },
        {
            "ID": "ExpireOldVersions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionTransitions": [
                {
                    "NoncurrentDays": 30,
                    "StorageClass": "GLACIER"
                }
            ],
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 365
            }
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration=lifecycle_policy
)
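To sanity-check a rule before applying it, it can help to trace where an object would sit at a given age. The sketch below mirrors the first rule of the policy above (the `logs/` prefix) in plain Python. `storage_class_at` is a hypothetical helper, and it assumes a transition applies as soon as the age threshold is reached; in practice S3 runs lifecycle actions asynchronously, so the real move can lag behind.

```python
from typing import Optional

# (age in days, target storage class), copied from the "MoveToIA30Days" rule
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555  # about 7 years


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the storage class at a given object age, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


for age in (0, 45, 120, 400, 3000):
    print(age, storage_class_at(age))
```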

S3 Security Best Practices

┌────────────────────────────────────────────────────────────────────────┐
│                     S3 Security Layers                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Layer 1: BLOCK PUBLIC ACCESS (Account & Bucket Level)               │
│   ─────────────────────────────────────────────────────                │
│   Enable all four settings by default:                                 │
│   ✅ BlockPublicAcls                                                   │
│   ✅ IgnorePublicAcls                                                  │
│   ✅ BlockPublicPolicy                                                 │
│   ✅ RestrictPublicBuckets                                             │
│                                                                         │
│   Layer 2: BUCKET POLICY (Resource-Based)                              │
│   ────────────────────────────────────────                             │
│   {                                                                     │
│     "Version": "2012-10-17",                                           │
│     "Statement": [{                                                     │
│       "Sid": "EnforceTLS",                                             │
│       "Effect": "Deny",                                                │
│       "Principal": "*",                                                │
│       "Action": "s3:*",                                                │
│       "Resource": ["arn:aws:s3:::bucket",                              │
│                    "arn:aws:s3:::bucket/*"],                           │
│       "Condition": {                                                    │
│         "Bool": {"aws:SecureTransport": "false"}                       │
│       }                                                                 │
│     }]                                                                  │
│   }                                                                     │
│                                                                         │
│   Layer 3: IAM POLICIES (Identity-Based)                               │
│   ──────────────────────────────────────                               │
│   Attached to users, groups, or roles                                  │
│                                                                         │
│   Layer 4: ENCRYPTION                                                   │
│   ───────────────────                                                   │
│   • SSE-S3: AWS managed keys (default, free)                           │
│   • SSE-KMS: Customer managed keys (audit trail)                       │
│   • SSE-C: Customer provided keys                                      │
│   • Client-side: Encrypt before upload                                 │
│                                                                         │
│   Layer 5: ACCESS POINTS                                               │
│   ──────────────────────                                               │
│   Simplified access management for multi-tenant buckets               │
│                                                                         │
│   Layer 6: OBJECT LOCK                                                 │
│   ─────────────────────                                                │
│   WORM (Write Once Read Many) for compliance                          │
│   • Governance mode: Can be overridden with special permissions       │
│   • Compliance mode: Cannot be overridden by anyone                   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
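Layers 1 and 2 can be applied from code. The sketch below builds the Block Public Access settings and a TLS-only bucket policy as plain dictionaries, then applies them through a boto3 S3 client passed in by the caller. The helper names are hypothetical. The policy lists both the bucket ARN and the object ARN so that bucket-level calls are covered as well. Note that `put_bucket_policy` replaces any existing policy, so merge statements first on a bucket that already has one.

```python
import json


def public_access_block_config() -> dict:
    """All four Block Public Access settings enabled."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def tls_only_policy(bucket: str) -> dict:
    """Bucket policy that denies any request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "EnforceTLS",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


def apply_baseline(s3_client, bucket: str) -> None:
    """Apply both layers; s3_client is a boto3 S3 client created by the caller."""
    s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration=public_access_block_config(),
    )
    s3_client.put_bucket_policy(
        Bucket=bucket,
        Policy=json.dumps(tls_only_policy(bucket)),
    )


print(json.dumps(tls_only_policy("my-bucket"), indent=2))
```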

S3 Performance Optimization

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

s3 = boto3.client('s3')

class S3Uploader:
    """High-performance S3 uploader with best practices."""
    
    def __init__(self, bucket: str):
        self.bucket = bucket
        # Transfer settings: files above the threshold are split into
        # 8 MB parts and uploaded by up to 10 threads in parallel
        self.config = TransferConfig(
            multipart_threshold=8 * 1024 * 1024,  # 8 MB
            max_concurrency=10,
            multipart_chunksize=8 * 1024 * 1024,  # 8 MB
            use_threads=True
        )
    
    def upload_file(self, local_path: str, s3_key: str):
        """Upload with automatic multipart handling."""
        s3.upload_file(
            local_path, 
            self.bucket, 
            s3_key,
            Config=self.config
        )
    
    def upload_large_file_optimized(self, local_path: str, s3_key: str):
        """
        Best practices for large file uploads:
        1. Use multipart upload (automatic for > 8 MB)
        2. Use S3 Transfer Acceleration for cross-region
        3. Use byte-range requests for downloads
        """
        # Enable Transfer Acceleration for the bucket first
        # aws s3api put-bucket-accelerate-configuration \
        #   --bucket my-bucket --accelerate-configuration Status=Enabled
        
        # Let botocore select the accelerate endpoint instead of
        # hardcoding the endpoint URL
        s3_accelerate = boto3.client(
            's3',
            config=Config(s3={'use_accelerate_endpoint': True})
        )
        
        s3_accelerate.upload_file(
            local_path,
            self.bucket,
            s3_key,
            Config=self.config
        )

# S3 Performance Tips:
# 1. Request rates scale per prefix (3,500+ PUT/s each), so spread very
#    hot workloads across multiple prefixes. Random hash prefixes are
#    no longer needed; date-based keys like this are fine:
#    logs/2024/01/15/file1.log
#
# 2. Enable S3 Transfer Acceleration for upload speeds
#    20-200% faster for long-distance transfers
#
# 3. Use byte-range fetches for parallel downloads
#
# 4. S3 can handle 5,500+ GET requests/second per prefix
#
# 5. Use S3 Select to retrieve only needed data (up to 400% faster).
#    AWS has closed S3 Select to new customers, so check availability first.
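Tip 3 relies on splitting an object into byte ranges that separate workers can fetch in parallel. The sketch below only computes the HTTP `Range` header values; `byte_ranges` is a hypothetical helper. Each value could then be passed as the `Range` argument of boto3's `get_object`, with the object size taken from a prior `head_object` call.

```python
from typing import List


def byte_ranges(object_size: int, chunk_size: int) -> List[str]:
    """Split an object into inclusive HTTP Range header values."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # Range ends are inclusive, and the last chunk may be shorter.
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


print(byte_ranges(20, 8))  # ['bytes=0-7', 'bytes=8-15', 'bytes=16-19']
```

The downloaded chunks must be reassembled in range order, since parallel workers can finish in any order.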

# Presigned URLs: time-limited access to a single object
# without sharing AWS credentials
import boto3

s3 = boto3.client('s3')

def generate_presigned_download_url(bucket: str, key: str, 
                                     expiry_seconds: int = 3600) -> str:
    """Generate a presigned URL for downloading."""
    url = s3.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': bucket,
            'Key': key
        },
        ExpiresIn=expiry_seconds
    )
    return url

def generate_presigned_upload_url(bucket: str, key: str,
                                   content_type: str = 'application/octet-stream',
                                   expiry_seconds: int = 3600) -> dict:
    """Generate a presigned POST (URL plus form fields) for uploading."""
    response = s3.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields={
            'Content-Type': content_type
        },
        Conditions=[
            {'Content-Type': content_type},
            ['content-length-range', 1, 104857600]  # 1 byte to 100 MB
        ],
        ExpiresIn=expiry_seconds
    )
    return response  # Returns {'url': ..., 'fields': {...}}

# Usage in API response
download_url = generate_presigned_download_url('my-bucket', 'reports/q1.pdf')
upload_info = generate_presigned_upload_url('my-bucket', 'uploads/user123/file.jpg')

EBS (Elastic Block Store)

Persistent block storage for EC2 instances. Like a high-performance SSD/HDD attached to your server.

EBS Volume Types Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                      EBS Volume Types                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   SSD-BASED (Random I/O):                                              │
│   ─────────────────────────                                            │
│                                                                         │
│   gp3 (General Purpose SSD) - RECOMMENDED DEFAULT                      │
│   ├── Baseline: 3,000 IOPS, 125 MB/s                                   │
│   ├── Max: 16,000 IOPS, 1,000 MB/s                                     │
│   ├── Size: 1 GB - 16 TB                                               │
│   ├── Cost: $0.08/GB + $0.005/IOPS (above 3K)                          │
│   └── Use: Boot volumes, most workloads                                │
│                                                                         │
│   gp2 (Previous Gen) - LEGACY                                          │
│   ├── IOPS: 3 IOPS/GB (burst to 3,000)                                 │
│   ├── Max: 16,000 IOPS                                                 │
│   └── Note: gp3 is usually more cost-effective                         │
│                                                                         │
│   io2 Block Express (Provisioned IOPS) - HIGHEST PERFORMANCE           │
│   ├── Max: 256,000 IOPS, 4,000 MB/s                                    │
│   ├── Latency: Sub-millisecond                                         │
│   ├── Durability: 99.999% (5 9s vs 99.9% for others)                   │
│   ├── Size: 4 GB - 64 TB                                               │
│   ├── Cost: $0.125/GB + $0.065/IOPS                                    │
│   └── Use: Databases (Oracle, SAP), critical I/O workloads            │
│                                                                         │
│   io1 (Provisioned IOPS) - LEGACY                                      │
│   └── Max: 64,000 IOPS, use io2 instead                                │
│                                                                         │
│   HDD-BASED (Sequential I/O):                                          │
│   ──────────────────────────                                           │
│                                                                         │
│   st1 (Throughput Optimized HDD)                                       │
│   ├── Max: 500 IOPS, 500 MB/s                                          │
│   ├── Size: 125 GB - 16 TB                                             │
│   ├── Cost: $0.045/GB                                                  │
│   └── Use: Big data, log processing, data warehouses                  │
│                                                                         │
│   sc1 (Cold HDD) - LOWEST COST                                         │
│   ├── Max: 250 IOPS, 250 MB/s                                          │
│   ├── Size: 125 GB - 16 TB                                             │
│   ├── Cost: $0.015/GB                                                  │
│   └── Use: Infrequent access, archival                                 │
│                                                                         │
│   BOOT VOLUME ELIGIBILITY:                                             │
│   • gp2, gp3, io1, io2 ✅ Can be boot volumes                          │
│   • st1, sc1 ❌ Cannot be boot volumes                                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
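The gp3 and gp2 figures above translate into simple arithmetic. The sketch below estimates monthly gp3 cost from size and provisioned IOPS, and the baseline IOPS a gp2 volume of the same size would get. It uses the prices listed above and ignores gp3 throughput charges; the 100 IOPS floor for gp2 is an assumption not shown in the diagram, and the function names are hypothetical.

```python
GP3_PRICE_PER_GB = 0.08
GP3_PRICE_PER_EXTRA_IOPS = 0.005   # charged only above the 3,000 baseline
GP3_BASELINE_IOPS = 3_000
GP3_MAX_IOPS = 16_000

GP2_IOPS_PER_GB = 3
GP2_MIN_IOPS = 100                 # assumption: gp2 floor for small volumes
GP2_MAX_IOPS = 16_000


def gp3_monthly_cost(size_gb: int, iops: int = GP3_BASELINE_IOPS) -> float:
    """Monthly gp3 cost in USD for storage plus IOPS above the baseline."""
    if not GP3_BASELINE_IOPS <= iops <= GP3_MAX_IOPS:
        raise ValueError("gp3 IOPS must be between 3,000 and 16,000")
    extra_iops = iops - GP3_BASELINE_IOPS
    return size_gb * GP3_PRICE_PER_GB + extra_iops * GP3_PRICE_PER_EXTRA_IOPS


def gp2_baseline_iops(size_gb: int) -> int:
    """Baseline IOPS of a gp2 volume, which scales with its size."""
    return min(max(size_gb * GP2_IOPS_PER_GB, GP2_MIN_IOPS), GP2_MAX_IOPS)


print(f"gp3 500 GB @ 6,000 IOPS: ${gp3_monthly_cost(500, 6000):.2f}/month")
print(f"gp2 500 GB baseline IOPS: {gp2_baseline_iops(500)}")
```

A 500 GB gp2 volume gets 1,500 baseline IOPS, while a gp3 volume gets 3,000 regardless of size, which is one reason gp3 is the recommended default.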

EBS Snapshots and Backup

import boto3
from datetime import datetime
from typing import Optional

ec2 = boto3.client('ec2')

def create_ebs_snapshot(volume_id: str, description: Optional[str] = None):
    """Create EBS snapshot (incremental backup to S3)."""
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description or f"Backup-{datetime.now().isoformat()}",
        TagSpecifications=[
            {
                'ResourceType': 'snapshot',
                'Tags': [
                    {'Key': 'Name', 'Value': 'AutomatedBackup'},
                    {'Key': 'CreatedBy', 'Value': 'Python Script'},
                ]
            }
        ]
    )
    return response['SnapshotId']

def copy_snapshot_cross_region(snapshot_id: str, 
                                source_region: str,
                                destination_region: str):
    """Copy snapshot to another region for DR."""
    dest_ec2 = boto3.client('ec2', region_name=destination_region)
    
    response = dest_ec2.copy_snapshot(
        SourceSnapshotId=snapshot_id,
        SourceRegion=source_region,
        Description=f"DR copy from {source_region}",
        Encrypted=True  # Always encrypt DR copies
    )
    return response['SnapshotId']

def create_volume_from_snapshot(snapshot_id: str, 
                                 az: str,
                                 volume_type: str = 'gp3'):
    """Create new volume from snapshot (for restore or DR)."""
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=az,
        VolumeType=volume_type,
        Encrypted=True,
        TagSpecifications=[
            {
                'ResourceType': 'volume',
                'Tags': [
                    {'Key': 'Name', 'Value': 'RestoredVolume'},
                ]
            }
        ]
    )
    return response['VolumeId']

# Key Points:
# 1. Snapshots are incremental (only changed blocks stored)
# 2. Snapshots stored in S3 (managed by AWS, not visible)
# 3. Can create volumes in any AZ of the snapshot's region
#    (copy the snapshot first to use it in another region)
# 4. First snapshot takes longest, subsequent are fast
# 5. Deleting snapshots only removes data not needed by others
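Key points 1 and 5 are easier to see in a toy model. The sketch below is not how AWS implements snapshots; it is a simplified illustration in which each snapshot references a set of (block index, content) chunks and a chunk is stored only once. The `SnapshotStore` class is hypothetical.

```python
from typing import Dict, Set, Tuple

Chunk = Tuple[int, str]  # (block index, block content)


class SnapshotStore:
    """Toy model of incremental snapshots sharing unchanged blocks."""

    def __init__(self) -> None:
        self.snapshots: Dict[str, Set[Chunk]] = {}

    def stored_chunks(self) -> Set[Chunk]:
        """Every chunk referenced by at least one snapshot."""
        return set().union(*self.snapshots.values())

    def create(self, snapshot_id: str, volume: Dict[int, str]) -> int:
        """Snapshot the volume; return how many new chunks had to be stored."""
        chunks = set(volume.items())
        new_chunks = chunks - self.stored_chunks()
        self.snapshots[snapshot_id] = chunks
        return len(new_chunks)

    def delete(self, snapshot_id: str) -> int:
        """Delete a snapshot; return how many chunks were actually freed."""
        chunks = self.snapshots.pop(snapshot_id)
        return len(chunks - self.stored_chunks())


store = SnapshotStore()
volume = {0: "a", 1: "b", 2: "c"}
print(store.create("snap-1", volume))  # 3: the first snapshot stores every block
volume[1] = "B"                        # one block changes
print(store.create("snap-2", volume))  # 1: only the changed block is stored
print(store.delete("snap-1"))          # 1: blocks still used by snap-2 are kept
```

After deleting `snap-1`, `snap-2` still references all three of its blocks, so it remains fully restorable.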

EBS Encryption

┌────────────────────────────────────────────────────────────────────────┐
│                      EBS Encryption                                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   What's encrypted:                                                     │
│   ✅ Data at rest on volume                                            │
│   ✅ Data in transit between EC2 and EBS                               │
│   ✅ All snapshots of an encrypted volume                               │
│   ✅ All volumes created from encrypted snapshots                       │
│                                                                         │
│   Encryption Keys:                                                      │
│   • Default: AWS managed key (aws/ebs)                                 │
│   • Custom: Your KMS CMK (better audit, control)                       │
│                                                                         │
│   Converting Unencrypted to Encrypted:                                 │
│   ┌──────────────────────────────────────────────────────────────┐    │
│   │  1. Create snapshot of unencrypted volume                     │    │
│   │  2. Copy snapshot with encryption enabled                     │    │
│   │  3. Create encrypted volume from encrypted snapshot           │    │
│   │  4. Attach new volume, migrate data, detach old               │    │
│   └──────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Best Practice: Enable "Encrypt new EBS volumes by default"          │
│   in EC2 → EBS → Settings                                              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
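The first three conversion steps above could be scripted roughly as follows. This is a sketch under stated assumptions: `encrypt_volume_copy` and its parameters are hypothetical, the EC2 client is passed in by the caller (for example `boto3.client('ec2')`), and attaching the new volume and migrating data (step 4) is left out.

```python
from typing import Optional


def encrypt_volume_copy(ec2, volume_id: str, az: str, region: str,
                        kms_key_id: Optional[str] = None) -> str:
    """Create an encrypted copy of an unencrypted volume; return its volume ID."""
    # Step 1: snapshot the unencrypted volume and wait for it to complete.
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Pre-encryption snapshot of {volume_id}",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

    # Step 2: copy the snapshot with encryption enabled.
    copy_args = {
        "SourceSnapshotId": snapshot["SnapshotId"],
        "SourceRegion": region,
        "Encrypted": True,
    }
    if kms_key_id:
        copy_args["KmsKeyId"] = kms_key_id  # otherwise the default EBS key is used
    encrypted = ec2.copy_snapshot(**copy_args)
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[encrypted["SnapshotId"]])

    # Step 3: a volume created from an encrypted snapshot is itself encrypted.
    volume = ec2.create_volume(
        SnapshotId=encrypted["SnapshotId"],
        AvailabilityZone=az,
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
    return volume["VolumeId"]
```

The new volume must be created in the same AZ as the instance it will be attached to. Remember to delete the intermediate unencrypted snapshot once the migration is verified.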

EBS is AZ-specific! A volume exists in exactly one AZ. To move it across AZs or regions:

1. Create snapshot
2. Copy to target region (if cross-region)
3. Create volume in target AZ from snapshot

EFS (Elastic File System)

Managed NFS file system that scales automatically. Shared storage for Linux workloads.

┌────────────────────────────────────────────────────────────────────────┐
│                         EFS Architecture                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                    EFS File System                             │    │
│   │                   /efs-shared-data                             │    │
│   │                                                                │    │
│   │   Storage Classes:                                             │    │
│   │   ┌────────────────┐  ┌────────────────┐                      │    │
│   │   │ Standard       │  │ Standard-IA    │                      │    │
│   │   │ (Frequent)     │  │ (Infrequent)   │                      │    │
│   │   │ $0.30/GB       │  │ $0.016/GB +    │                      │    │
│   │   │                │  │ $0.01/GB read  │                      │    │
│   │   └────────────────┘  └────────────────┘                      │    │
│   │                                                                │    │
│   │   Lifecycle: Auto-move to IA after 7/14/30/60/90 days         │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                         │                                               │
│            ┌────────────┼────────────┬────────────┐                    │
│            │            │            │            │                    │
│            ▼            ▼            ▼            ▼                    │
│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│   │Mount Target │ │Mount Target │ │Mount Target │ │ Lambda      │     │
│   │  (AZ-1a)    │ │  (AZ-1b)    │ │  (AZ-1c)    │ │ (via VPC)   │     │
│   └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘     │
│         │               │               │               │              │
│      EC2-1           EC2-2           EC2-3          Lambda            │
│     (/mnt/efs)     (/mnt/efs)      (/mnt/efs)     Functions           │
│                                                                        │
│   ┌─────────────────────────────────────────────────────────────┐     │
│   │  Performance Modes:                                          │     │
│   │  • General Purpose: Low latency (default)                    │     │
│   │  • Max I/O: Higher latency, higher throughput (big data)    │     │
│   │                                                              │     │
│   │  Throughput Modes:                                           │     │
│   │  • Bursting: Scales with size (50 MB/s per TB)              │     │
│   │  • Provisioned: Set fixed throughput (1-3000+ MB/s)         │     │
│   │  • Elastic: Auto-scales (up to 10+ GB/s reads)              │     │
│   └─────────────────────────────────────────────────────────────┘     │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
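The prices and the Bursting rule in the diagram can be turned into quick estimates. The sketch below assumes the IA access charge is billed per GB read and treats 1 TB as 1,000 GB for simplicity; both function names are hypothetical.

```python
EFS_STANDARD_PER_GB = 0.30   # $/GB-month, Standard
EFS_IA_PER_GB = 0.016        # $/GB-month, Standard-IA
EFS_IA_READ_PER_GB = 0.01    # assumption: access charge is per GB read
BURST_MBPS_PER_TB = 50       # Bursting mode baseline throughput


def bursting_baseline_mbps(stored_gb: float) -> float:
    """Baseline throughput in Bursting mode, which scales with stored data."""
    return stored_gb / 1000 * BURST_MBPS_PER_TB


def efs_monthly_cost(total_gb: float, ia_fraction: float,
                     ia_read_gb: float = 0.0) -> float:
    """Monthly cost in USD when a fraction of the data has moved to IA."""
    if not 0 <= ia_fraction <= 1:
        raise ValueError("ia_fraction must be between 0 and 1")
    ia_gb = total_gb * ia_fraction
    standard_gb = total_gb - ia_gb
    return (standard_gb * EFS_STANDARD_PER_GB
            + ia_gb * EFS_IA_PER_GB
            + ia_read_gb * EFS_IA_READ_PER_GB)


print(f"All Standard:  ${efs_monthly_cost(1000, 0.0):.2f}")
print(f"80% in IA:     ${efs_monthly_cost(1000, 0.8, ia_read_gb=50):.2f}")
print(f"Baseline MB/s: {bursting_baseline_mbps(2000):.0f}")
```

Under these assumptions, moving 80% of a 1,000 GB file system to IA cuts the monthly bill from about $300 to about $73, which is why a lifecycle policy is usually worth enabling.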

EFS vs EBS vs S3

Feature	EFS	EBS	S3
Type	File (NFS)	Block	Object
Access	Multi-AZ, Multi-EC2	Single AZ, Single EC2	Global, HTTP
Scaling	Automatic (petabytes)	Manual (16 TB max; 64 TB on io2)	Unlimited
Cost	$0.30/GB (Standard)	$0.08-0.125/GB	$0.023/GB
Latency	~ms	sub-ms	~100ms
Use Case	Shared files	Databases	Static content
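
To make the cost row concrete, here is a small sketch that prices the same amount of data on each service. It assumes the per-GB monthly list prices from the table (taking $0.08 as the gp3 figure) and ignores request, throughput, and transfer charges, so treat the output as a rough comparison only.

```python
# Per-GB monthly prices taken from the comparison table. These are
# assumptions for illustration; check current AWS pricing before relying on them.
PRICE_PER_GB_MONTH = {
    "EFS Standard": 0.30,
    "EBS gp3": 0.08,
    "S3 Standard": 0.023,
}


def monthly_storage_cost(size_gb: float) -> dict:
    """Return the storage-only monthly cost in USD for each service."""
    return {
        service: round(size_gb * price, 2)
        for service, price in PRICE_PER_GB_MONTH.items()
    }


for service, cost in monthly_storage_cost(500).items():
    print(f"{service}: ${cost:.2f}/month")
```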

RDS (Relational Database Service)

Managed relational databases: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora.

RDS Architecture and Features

┌────────────────────────────────────────────────────────────────────────┐
│                      RDS Architecture                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                    RDS Instance                                │    │
│   │                                                                │    │
│   │   What AWS Manages:             What You Manage:               │    │
│   │   ✅ Hardware provisioning      ✅ Schema design               │    │
│   │   ✅ OS patching                ✅ Query optimization          │    │
│   │   ✅ Database patching          ✅ Index creation              │    │
│   │   ✅ Automated backups          ✅ Application tuning          │    │
│   │   ✅ Multi-AZ failover          ✅ Security groups             │    │
│   │   ✅ Scaling                    ✅ Parameter groups            │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Supported Engines:                                                   │
│   ┌─────────────┬────────────────────────────────────────────────┐    │
│   │ MySQL       │ 5.7, 8.0 - Most popular open source           │    │
│   │ PostgreSQL  │ 11-16 - Advanced features, extensions          │    │
│   │ MariaDB     │ 10.x - MySQL fork, community driven            │    │
│   │ Oracle      │ Enterprise, Standard - BYOL or License Included│    │
│   │ SQL Server  │ Express, Web, Standard, Enterprise             │    │
│   │ Aurora      │ MySQL/PostgreSQL compatible, up to 5x faster   │    │
│   └─────────────┴────────────────────────────────────────────────┘    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Multi-AZ vs Read Replicas

┌────────────────────────────────────────────────────────────────────────┐
│                    Multi-AZ Deployment                                  │
│                   (High Availability)                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Purpose: AUTOMATIC FAILOVER for high availability                    │
│                                                                         │
│   ┌─────────────────────┐  Synchronous  ┌─────────────────────┐       │
│   │    Primary DB       │◄─────────────►│    Standby DB       │       │
│   │    (AZ-1a)          │  Replication  │    (AZ-1b)          │       │
│   │                     │               │                     │       │
│   │  ✅ All reads       │               │  ❌ No reads        │       │
│   │  ✅ All writes      │               │  ❌ No writes       │       │
│   └─────────────────────┘               └─────────────────────┘       │
│            │                                    │                      │
│            │    Automatic DNS failover          │                      │
│            │    (60-120 seconds)                │                      │
│            │                                    │                      │
│            ▼                                    ▼                      │
│   ┌─────────────────────────────────────────────────────────────┐     │
│   │  Application connects to: mydb.abc123.us-east-1.rds.aws    │     │
│   │  (Single endpoint, AWS handles failover automatically)      │     │
│   └─────────────────────────────────────────────────────────────┘     │
│                                                                         │
│   When failover happens:                                               │
│   • AZ outage or instance failure                                      │
│   • Instance type change                                               │
│   • Manual failover (for testing)                                      │
│   • OS patching                                                        │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
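
Because failover can take 60-120 seconds, the application has to tolerate a window in which connections to the endpoint fail. The following is a minimal sketch of reconnecting with exponential backoff. `connect` is a hypothetical zero-argument callable that wraps your driver, and the 180-second budget is an assumed value. Real drivers raise their own exception types, so the `except` clause would need adjusting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       max_wait_seconds: float = 180.0,
                       initial_delay: float = 1.0,
                       max_delay: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `connect` with exponential backoff until it succeeds or the
    total time slept would exceed max_wait_seconds."""
    delay = initial_delay
    waited = 0.0
    while True:
        try:
            return connect()
        except ConnectionError:
            if waited + delay > max_wait_seconds:
                raise  # failover took longer than we are willing to wait
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)
```

The endpoint name stays the same after failover, so the retry loop only needs to call the same `connect` again. Keeping DNS caching short on the client side helps it pick up the new primary.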

┌────────────────────────────────────────────────────────────────────────┐
│                    Read Replicas                                        │
│                   (Read Scaling)                                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Purpose: SCALE READS by offloading queries                           │
│                                                                         │
│   ┌─────────────────────┐ Asynchronous ┌─────────────────────┐        │
│   │    Primary DB       │─────────────►│   Read Replica 1    │        │
│   │  (writes + reads)   │              │   (reads only)      │        │
│   └─────────────────────┘              └─────────────────────┘        │
│            │                                                           │
│            │              Async        ┌─────────────────────┐        │
│            └──────────────────────────►│   Read Replica 2    │        │
│                                        │   (reads only)      │        │
│                                        └─────────────────────┘        │
│                                                                         │
│   Key Points:                                                          │
│   • Up to 15 replicas (5 for Oracle and SQL Server)                    │
│   • Can be cross-region (for DR or local reads)                       │
│   • Each replica has own endpoint                                      │
│   • Replication is ASYNC (eventual consistency)                        │
│   • Can be promoted to standalone DB                                   │
│                                                                         │
│   Application Pattern:                                                 │
│   ┌─────────────────────────────────────────────────────────────┐     │
│   │  def get_connection(is_read_only=False):                     │     │
│   │      if is_read_only:                                        │     │
│   │          return connect("replica.abc123.us-east-1.rds.aws") │     │
│   │      return connect("primary.abc123.us-east-1.rds.aws")     │     │
│   └─────────────────────────────────────────────────────────────┘     │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
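
Since replication is asynchronous, a read that immediately follows a write may not see that write on a replica. One common workaround is sketched below: spread reads across replicas, but pin reads to the primary for a short window after each write. The class name, the endpoint strings, and the 2-second window are all assumptions for illustration. In practice the window should be based on measured replica lag.

```python
import itertools
import time


class ReplicaRouter:
    """Round-robin reads across replicas; pin reads to the primary for a
    short window after a write so callers can read their own writes."""

    def __init__(self, primary: str, replicas: list,
                 pin_seconds: float = 2.0, clock=time.monotonic):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None
        self._pin_seconds = pin_seconds
        self._clock = clock
        self._last_write = None  # time of the most recent write, if any

    def endpoint_for_write(self) -> str:
        """Writes always go to the primary; remember when it happened."""
        self._last_write = self._clock()
        return self.primary

    def endpoint_for_read(self) -> str:
        """Use a replica unless a write happened within the pin window."""
        recently_wrote = (
            self._last_write is not None
            and self._clock() - self._last_write < self._pin_seconds
        )
        if self._replicas is None or recently_wrote:
            return self.primary
        return next(self._replicas)


router = ReplicaRouter("primary.example.internal",
                       ["replica-1.example.internal", "replica-2.example.internal"])
print(router.endpoint_for_read())
```

This version pins per router instance, which is the simplest choice. A per-user or per-session pin would send fewer reads to the primary.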

RDS Backup and Recovery

import boto3
from datetime import datetime

rds = boto3.client('rds')

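# Automated Backups (AWS Managed)
# - Daily snapshot during the backup window
# - Transaction logs uploaded every 5 minutes
# - Retention: 0-35 days (0 disables automated backups)
# - Point-in-time recovery to any second within the retention period,
#   up to the latest restorable time (typically the last 5 minutes)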

def restore_to_point_in_time(source_db: str,
                             target_db: str,
                             restore_time: datetime):
    """Restore RDS to a specific point in time.

    Creates a NEW instance (target_db); the source instance is left untouched.
    restore_time should be a timezone-aware UTC datetime.
    """
    response = rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_db,
        TargetDBInstanceIdentifier=target_db,
        RestoreTime=restore_time,
        UseLatestRestorableTime=False,
        DBInstanceClass='db.t3.medium',  # Example value; size for your workload
        PubliclyAccessible=False,
        MultiAZ=True,
    )
    return response

# Manual Snapshots
# - User-initiated
# - Kept until you delete them
# - Can share across accounts/regions

def create_manual_snapshot(db_identifier: str):
    """Create manual DB snapshot."""
    snapshot_id = f"{db_identifier}-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
    response = rds.create_db_snapshot(
        DBInstanceIdentifier=db_identifier,
        DBSnapshotIdentifier=snapshot_id,
        Tags=[
            {'Key': 'Purpose', 'Value': 'ManualBackup'},
        ]
    )
    return snapshot_id

Aurora

AWS’s cloud-native relational database. It is compatible with MySQL and PostgreSQL and, according to AWS, delivers up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL.

Aurora Architecture Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                      Aurora Architecture                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                      Aurora Cluster                              │  │
│   │                                                                  │  │
│   │   Writer Endpoint: mydb.cluster-abc123.us-east-1.rds.aws        │  │
│   │   Reader Endpoint: mydb.cluster-ro-abc123.us-east-1.rds.aws     │  │
│   │                                                                  │  │
│   │   ┌───────────────────┐   ┌───────────────────┐                 │  │
│   │   │  Writer Instance  │   │ Reader Instance 1 │                 │  │
│   │   │    (Primary)      │   │    (Replica)      │                 │  │
│   │   │     (AZ-1a)       │   │     (AZ-1b)       │                 │  │
│   │   └─────────┬─────────┘   └─────────┬─────────┘                 │  │
│   │             │                       │                            │  │
│   │             └───────────┬───────────┘                            │  │
│   │                         │                                        │  │
│   │                         ▼                                        │  │
│   │   ┌─────────────────────────────────────────────────────────┐   │  │
│   │   │              Shared Cluster Storage                      │   │  │
│   │   │           (Aurora Storage Engine)                        │   │  │
│   │   │                                                          │   │  │
│   │   │   ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  │   │  │
│   │   │   │10GB │  │10GB │  │10GB │  │10GB │  │10GB │  │10GB │  │   │  │
│   │   │   │AZ-1a│  │AZ-1b│  │AZ-1c│  │AZ-1a│  │AZ-1b│  │AZ-1c│  │   │  │
│   │   │   └─────┘  └─────┘  └─────┘  └─────┘  └─────┘  └─────┘  │   │  │
│   │   │                                                          │   │  │
│   │   │   • 6 copies of data across 3 AZs                       │   │  │
│   │   │   • Can lose 2 copies and still write                   │   │  │
│   │   │   • Can lose 3 copies and still read                    │   │  │
│   │   │   • Auto-heals damaged segments                         │   │  │
│   │   │   • Storage auto-scales 10 GB → 128 TB                  │   │  │
│   │   └─────────────────────────────────────────────────────────┘   │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   AURORA ADVANTAGES:                                                   │
│   • Up to 5x throughput of MySQL, 3x of PostgreSQL                    │
│   • Up to 15 read replicas on shared storage (low lag)                │
│   • Failover in < 30 seconds                                          │
│   • Continuous backup to S3 (no performance impact)                   │
│   • Point-in-time recovery to any second                              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
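
The "lose 2 and still write, lose 3 and still read" rule follows from a quorum over the six copies: writes need 4 of 6 and reads need 3 of 6. Here is a minimal sketch of that rule; the function name is illustrative.

```python
TOTAL_COPIES = 6
WRITE_QUORUM = 4  # 6 copies minus 2 lost still satisfies writes
READ_QUORUM = 3   # 6 copies minus 3 lost still satisfies reads


def storage_status(healthy_copies: int) -> str:
    """Classify what the storage layer can do with the given healthy copies."""
    if not 0 <= healthy_copies <= TOTAL_COPIES:
        raise ValueError("healthy_copies must be between 0 and 6")
    if healthy_copies >= WRITE_QUORUM:
        return "read-write"
    if healthy_copies >= READ_QUORUM:
        return "read-only"
    return "unavailable"


# Losing a whole AZ removes 2 copies; losing an AZ plus one more copy removes 3.
print(storage_status(TOTAL_COPIES - 2))
print(storage_status(TOTAL_COPIES - 3))
```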

Aurora Serverless v2

┌────────────────────────────────────────────────────────────────────────┐
│                    Aurora Serverless v2                                 │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Auto-scaling capacity based on demand                                │
│                                                                         │
│   Capacity (ACUs)                                                       │
│   128 ┤                                                                │
│   100 ┤              ████████                                          │
│    80 ┤            ██        ██                                        │
│    60 ┤          ██            ██                                      │
│    40 ┤        ██                ██                                    │
│    20 ┤──────██────────────────────██────────────                      │
│       └─────────────────────────────────────────► Time                 │
│                Peak usage period                                        │
│                                                                         │
│   Configuration:                                                        │
│   • Min ACU: 0.5 (0 with auto-pause on newer engine versions)          │
│   • Max ACU: 128 (choose based on peak needs)                          │
│   • Scales in seconds (not minutes like v1)                            │
│                                                                         │
│   Pricing (us-east-1):                                                 │
│   • $0.12/ACU-hour (Aurora MySQL)                                      │
│   • Storage: $0.10/GB-month                                            │
│   • I/O: $0.20 per million requests                                    │
│                                                                         │
│   Best For:                                                             │
│   ✅ Variable/unpredictable workloads                                  │
│   ✅ Development/test databases                                        │
│   ✅ Multi-tenant SaaS applications                                    │
│   ✅ Infrequently used applications                                    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
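
A rough monthly estimate can be built from the three pricing lines above. This sketch assumes the listed us-east-1 prices and a hypothetical usage profile (20 quiet hours at 2 ACUs and 4 peak hours at 16 ACUs per day). Actual bills depend on how capacity scales minute to minute.

```python
ACU_HOUR_USD = 0.12          # Aurora MySQL price quoted above
STORAGE_GB_MONTH_USD = 0.10
IO_PER_MILLION_USD = 0.20


def estimate_monthly_cost(hourly_acus: list, storage_gb: float,
                          io_requests: int) -> float:
    """Sum compute (one ACU value per hour), storage, and I/O charges in USD."""
    compute = sum(hourly_acus) * ACU_HOUR_USD
    storage = storage_gb * STORAGE_GB_MONTH_USD
    io = io_requests / 1_000_000 * IO_PER_MILLION_USD
    return round(compute + storage + io, 2)


# Hypothetical profile: 30 days of 20 quiet hours and 4 peak hours.
profile = ([2.0] * 20 + [16.0] * 4) * 30
print(estimate_monthly_cost(profile, storage_gb=100, io_requests=50_000_000))
```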

DynamoDB

Fully managed NoSQL database with single-digit millisecond latency at any scale.

DynamoDB Data Model

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Data Model                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   TABLE: Orders                                                         │
│   ──────────────                                                        │
│                                                                         │
│   Primary Key:                                                          │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  PARTITION KEY (HASH)    │  SORT KEY (RANGE)                    │  │
│   │  Required, determines    │  Optional, enables                   │  │
│   │  data distribution       │  range queries                       │  │
│   │                          │                                       │  │
│   │  customer_id             │  order_date                          │  │
│   │  "CUST#12345"           │  "2024-01-15T10:30:00Z"               │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Item (Document):                                                      │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │ {                                                                │  │
│   │   "customer_id": "CUST#12345",      // Partition key            │  │
│   │   "order_date": "2024-01-15T10:30:00Z", // Sort key             │  │
│   │   "order_id": "ORD#98765",          // Attribute                │  │
│   │   "total": 299.99,                  // Attribute                │  │
│   │   "items": [                        // Nested list              │  │
│   │     {"sku": "ABC123", "qty": 2},                                │  │
│   │     {"sku": "XYZ789", "qty": 1}                                 │  │
│   │   ],                                                             │  │
│   │   "status": "SHIPPED"                                           │  │
│   │ }                                                                │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   KEY DESIGN PATTERNS:                                                 │
│   ────────────────────                                                 │
│   1. One-to-Many: Use composite sort key                              │
│      PK: USER#123, SK: ORDER#001, ORDER#002, PROFILE                  │
│                                                                         │
│   2. Many-to-Many: Use GSI with inverted index                        │
│      PK: EMPLOYEE#1, SK: PROJECT#A                                    │
│      GSI: PK: PROJECT#A, SK: EMPLOYEE#1                               │
│                                                                         │
│   3. Hierarchical: Use sort key prefixes                              │
│      SK: COUNTRY#USA#STATE#CA#CITY#LA                                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
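
The key patterns above are mostly string conventions, so small helper functions keep them consistent. This is a sketch with hypothetical helper names; the `PK`/`SK` attribute names are an assumption commonly used in single-table designs.

```python
def make_key(entity: str, entity_id: str) -> str:
    """Build a prefixed key such as USER#123."""
    return f"{entity.upper()}#{entity_id}"


def hierarchical_sk(**levels: str) -> str:
    """Join levels in the order given, e.g. country, state, city."""
    return "#".join(f"{name.upper()}#{value}" for name, value in levels.items())


def invert_keys(item: dict) -> dict:
    """Swap PK and SK, which is the view an inverted-index GSI provides."""
    return {"PK": item["SK"], "SK": item["PK"]}


full = hierarchical_sk(country="USA", state="CA", city="LA")
prefix = hierarchical_sk(country="USA", state="CA")
print(full)
print(full.startswith(prefix))
print(invert_keys({"PK": make_key("employee", "1"), "SK": make_key("project", "A")}))
```

A prefix built this way is what you would pass to a `begins_with` condition on the sort key. Appending a trailing `#` to the prefix avoids matching values that merely start with the same letters (for example `CA` versus `CAL`).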

DynamoDB Operations (Best Practices)

import boto3
from boto3.dynamodb.conditions import Key, Attr
from decimal import Decimal
import json

# Initialize
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# =====================================
# SINGLE ITEM OPERATIONS
# =====================================

def put_item_example():
    """Write an item (creates or replaces)."""
    table.put_item(
        Item={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z',
            'order_id': 'ORD#98765',
            'total': Decimal('299.99'),  # Use Decimal for numbers
            'status': 'PENDING',
            'items': [
                {'sku': 'ABC123', 'qty': 2, 'price': Decimal('99.99')},
            ]
        },
        # Conditional write - fails if an item with this primary key already exists
        ConditionExpression='attribute_not_exists(customer_id)'
    )

def get_item_example():
    """Read a single item by primary key."""
    response = table.get_item(
        Key={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z'
        },
        # Strongly consistent read (vs eventually consistent)
        ConsistentRead=True,
        # Only return specific attributes
        ProjectionExpression='order_id, #s, #t',
        # 'status' and 'total' are DynamoDB reserved words, so alias them
        ExpressionAttributeNames={'#s': 'status', '#t': 'total'}
    )
    return response.get('Item')

def update_item_example(expected_version: int):
    """Update specific attributes atomically, guarded by optimistic locking."""
    response = table.update_item(
        Key={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z'
        },
        UpdateExpression='SET #s = :status, updated_at = :now ADD version :inc',
        ExpressionAttributeNames={'#s': 'status'},
        ExpressionAttributeValues={
            ':status': 'SHIPPED',
            ':now': '2024-01-16T14:00:00Z',
            ':inc': 1,
            ':expected_version': expected_version
        },
        # Optimistic locking: the write is rejected with
        # ConditionalCheckFailedException if another writer changed the version
        ConditionExpression='version = :expected_version',
        ReturnValues='ALL_NEW'
    )
    return response['Attributes']

# =====================================
# QUERY (Efficient - uses partition key)
# =====================================

def query_customer_orders(customer_id: str, start_date: str = None):
    """Get all orders for a customer (uses partition key)."""
    key_condition = Key('customer_id').eq(customer_id)
    
    if start_date:
        key_condition = key_condition & Key('order_date').gte(start_date)
    
    response = table.query(
        KeyConditionExpression=key_condition,
        ScanIndexForward=False,  # Descending order (newest first)
        # Limit caps items read BEFORE the filter runs, so fewer than 20 may come back
        Limit=20,
        # Filter runs after the read (use sparingly - filtered-out items still cost RCUs)
        FilterExpression=Attr('status').ne('CANCELLED')
    )
    return response['Items']

# =====================================
# BATCH OPERATIONS
# =====================================

def batch_write_items(items: list):
    """Write any number of items; requests are sent in chunks of up to 25."""
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    # boto3 handles batching, retries, and unprocessed items

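def batch_get_items(keys: list):
    """Read up to 100 items in one request."""
    response = dynamodb.batch_get_item(
        RequestItems={
            'Orders': {
                'Keys': keys,
                # 'total' and 'status' are reserved words, so alias them
                'ProjectionExpression': 'order_id, #t, #s',
                'ExpressionAttributeNames': {'#t': 'total', '#s': 'status'}
            }
        }
    )
    # Throttled keys come back in response['UnprocessedKeys'];
    # retry them with backoff in production code
    return response['Responses']['Orders']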

# =====================================
# TRANSACTIONS (ACID)
# =====================================

def transfer_with_transaction():
    """Atomic multi-item transaction."""
    dynamodb_client = boto3.client('dynamodb')
    
    dynamodb_client.transact_write_items(
        TransactItems=[
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {'account_id': {'S': 'ACC#001'}},
                    'UpdateExpression': 'SET balance = balance - :amount',
                    'ConditionExpression': 'balance >= :amount',
                    'ExpressionAttributeValues': {':amount': {'N': '100'}}
                }
            },
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {'account_id': {'S': 'ACC#002'}},
                    'UpdateExpression': 'SET balance = balance + :amount',
                    'ExpressionAttributeValues': {':amount': {'N': '100'}}
                }
            }
        ]
    )
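
A single Query call returns at most one page of results (bounded by `Limit` or by 1 MB of data), and the response carries a `LastEvaluatedKey` when more items remain. The helper below is a minimal sketch of following those pages. `query_all_pages` is a hypothetical name, and `query_fn` is expected to behave like `table.query`.

```python
from typing import Any, Callable, Iterator


def query_all_pages(query_fn: Callable[..., dict],
                    **query_kwargs: Any) -> Iterator[dict]:
    """Yield every item from a Query, following LastEvaluatedKey across pages."""
    kwargs = dict(query_kwargs)
    while True:
        page = query_fn(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return  # no more pages
        # Resume the next request where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key
```

With the table defined earlier, usage would look like `list(query_all_pages(table.query, KeyConditionExpression=Key('customer_id').eq('CUST#12345')))`. Because it is a generator, callers can stop early without fetching the remaining pages.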

Global Secondary Indexes (GSI)

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Indexes                                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   BASE TABLE: Orders                                                    │
│   PK: customer_id, SK: order_date                                       │
│                                                                         │
│   Access Pattern: "Get orders by customer" ✅ Query on PK               │
│                                                                         │
│   NEW Access Pattern: "Get orders by status"                            │
│   ❌ Can't query - status is not a key!                                 │
│   ✅ Solution: Create GSI                                               │
│                                                                         │
│   GLOBAL SECONDARY INDEX (GSI):                                         │
│   ─────────────────────────────                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  GSI: orders-by-status                                          │  │
│   │  PK: status      SK: order_date                                 │  │
│   │                                                                  │  │
│   │  Projected Attributes: order_id, customer_id, total             │  │
│   │  (Can project ALL, KEYS_ONLY, or specific attributes)           │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   # Query GSI                                                           │
│   table.query(                                                          │
│       IndexName='orders-by-status',                                     │
│       KeyConditionExpression=Key('status').eq('PENDING')                │
│   )                                                                     │
│                                                                         │
│   GSI vs LSI:                                                           │
│   ┌──────────────────────────────────────────────────────────────────┐ │
│   │  GSI (Global Secondary)      │  LSI (Local Secondary)           │ │
│   │  ────────────────────────────┼───────────────────────────────── │ │
│   │  Different partition key     │  Same partition key              │ │
│   │  Create anytime              │  Create at table creation only   │ │
│   │  Own RCU/WCU                 │  Shares table RCU/WCU            │ │
│   │  Eventually consistent only  │  Strongly consistent available   │ │
│   │  Up to 20 per table          │  Up to 5 per table              │ │
│   └──────────────────────────────────────────────────────────────────┘ │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
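
One way to build intuition for a GSI is to think of it as a second copy of the table, re-keyed and trimmed to the projected attributes. The in-memory model below is only an illustration of that idea, not how DynamoDB is implemented. `build_index` and the sample orders are hypothetical.

```python
from collections import defaultdict


def build_index(items, pk_attr, sk_attr, projected):
    """Group items by the index partition key, sorted by the index sort key.
    Items missing either index key attribute are left out (a sparse index)."""
    keep = {pk_attr, sk_attr, *projected}
    index = defaultdict(list)
    for item in items:
        if pk_attr not in item or sk_attr not in item:
            continue
        index[item[pk_attr]].append({k: v for k, v in item.items() if k in keep})
    for rows in index.values():
        rows.sort(key=lambda row: row[sk_attr])
    return dict(index)


orders = [
    {"customer_id": "CUST#1", "order_date": "2024-01-15", "status": "PENDING",
     "total": 50, "notes": "gift"},
    {"customer_id": "CUST#2", "order_date": "2024-01-10", "status": "PENDING",
     "total": 20},
    {"customer_id": "CUST#1", "order_date": "2024-01-12", "status": "SHIPPED",
     "total": 75},
]
by_status = build_index(orders, "status", "order_date",
                        projected=["customer_id", "total"])
print(by_status["PENDING"])
```

The model also hints at a design concern: a key with few distinct values, such as `status`, concentrates many items under one partition key value, which can become a hot spot on a busy table.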

DynamoDB Capacity and Pricing

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Capacity Modes                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ON-DEMAND MODE:                                                       │
│   ───────────────                                                       │
│   • Pay per request ($1.25 per million writes, $0.25 per million reads)│
│   • No capacity planning                                               │
│   • Scales automatically (instantly up to 2x the previous peak)        │
│   • Best for: Unpredictable traffic, new applications                  │
│                                                                         │
│   PROVISIONED MODE:                                                     │
│   ─────────────────                                                     │
│   • Pre-allocate RCU (Read Capacity Units) and WCU (Write Capacity)    │
│   • Cheaper at scale (~5-7x cheaper at steady state)                   │
│   • Auto Scaling available                                              │
│   • Reserved Capacity for 1-3 years (up to 70% discount)               │
│                                                                         │
│   CAPACITY UNITS:                                                       │
│   ───────────────                                                       │
│   1 RCU = 1 strongly consistent read/sec (up to 4 KB)                  │
│         = 2 eventually consistent reads/sec (up to 4 KB)               │
│   1 WCU = 1 write/sec (up to 1 KB)                                     │
│                                                                         │
│   Example Calculation:                                                  │
│   ────────────────────                                                  │
│   100 reads/sec × 8 KB items × strongly consistent                     │
│   = 100 × (8 KB / 4 KB) × 1 = 200 RCU                                  │
│                                                                         │
│   50 writes/sec × 3 KB items                                           │
│   = 50 × ceil(3 KB / 1 KB) = 150 WCU                                   │
│                                                                         │
│   COST COMPARISON (us-east-1):                                         │
│   ────────────────────────────                                         │
│   On-Demand: $0.25/million reads, $1.25/million writes                 │
│   Provisioned: $0.00013/RCU-hr, $0.00065/WCU-hr                        │
│                                                                         │
│   Break-even: ~520 requests/hr per provisioned unit (~14% utilized)    │
│   Below this → On-Demand cheaper                                       │
│   Above this → Provisioned cheaper                                     │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
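
The capacity arithmetic can be checked with a few lines of code. This sketch reproduces the worked example and derives the on-demand versus provisioned break-even per capacity unit from the prices listed in the cost comparison. The function names are illustrative, and the prices are the ones quoted in this section, which may be out of date.

```python
import math


def required_rcu(reads_per_sec: int, item_kb: float,
                 strongly_consistent: bool = True) -> int:
    """Each read costs one unit per 4 KB (rounded up); eventually
    consistent reads cost half as much."""
    rcu = reads_per_sec * math.ceil(item_kb / 4)
    return rcu if strongly_consistent else math.ceil(rcu / 2)


def required_wcu(writes_per_sec: int, item_kb: float) -> int:
    """Each write costs one unit per 1 KB (rounded up)."""
    return writes_per_sec * math.ceil(item_kb)


def break_even_requests_per_hour(unit_hour_price: float,
                                 on_demand_price_per_million: float) -> float:
    """Requests per hour at which one provisioned unit costs the same
    as paying per request."""
    return unit_hour_price / (on_demand_price_per_million / 1_000_000)


print(required_rcu(100, 8))   # worked example from the diagram
print(required_wcu(50, 3))
reads = break_even_requests_per_hour(0.00013, 0.25)
print(f"{reads:.0f} reads/hr, {reads / 3600:.0%} of one RCU's hourly capacity")
```

One unit can serve 3,600 requests per hour, so the break-even sits at roughly 14% sustained utilization of whatever capacity is provisioned.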

ElastiCache

Managed in-memory caching for sub-millisecond response times: Redis or Memcached.

ElastiCache Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                    ElastiCache Architecture                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   APPLICATION CACHING PATTERN:                                         │
│   ────────────────────────────                                         │
│                                                                         │
│   ┌────────────┐                                                       │
│   │   Client   │                                                       │
│   └──────┬─────┘                                                       │
│          │ 1. Request                                                   │
│          ▼                                                              │
│   ┌────────────┐      2. Check cache     ┌─────────────────────┐       │
│   │ Application│ ────────────────────► │   ElastiCache       │       │
│   │   Server   │                        │   (Redis/Memcached) │       │
│   └──────┬─────┘ ◄──────────────────── └─────────────────────┘       │
│          │        3a. Cache HIT                     │                  │
│          │           (return)            3b. Cache MISS                │
│          │                                          │                  │
│          │ 4. Query if miss              ┌──────────┘                  │
│          ▼                               ▼                              │
│   ┌─────────────────────────────────────────────────┐                  │
│   │              Database (RDS/DynamoDB)             │                  │
│   └─────────────────────────────────────────────────┘                  │
│                               │                                         │
│                               │ 5. Return data                          │
│                               ▼                                         │
│   ┌────────────────────────────────────────────────────────────────┐   │
│   │  Application: Store in cache with TTL, return to client        │   │
│   └────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Redis vs Memcached Comparison

Feature	Redis	Memcached
Data Structures	Strings, Lists, Sets, Sorted Sets, Hashes, Streams	Simple key-value only
Persistence	Yes (RDB, AOF)	No
Replication	Yes (Multi-AZ)	No
Pub/Sub	Yes	No
Lua Scripting	Yes	No
Cluster Mode	Yes (sharding)	Yes (sharding)
Multi-threading	Single-threaded (per shard)	Multi-threaded
Use Case	Sessions, leaderboards, queues	Simple caching, high throughput

Caching Strategies

import redis
import json
from datetime import timedelta

# Connect to ElastiCache Redis
redis_client = redis.Redis(
    host='my-cluster.abc123.cache.amazonaws.com',
    port=6379,
    ssl=True,
    decode_responses=True
)

# ======================
# CACHE-ASIDE (Lazy Loading)
# ======================
def get_user_with_cache_aside(user_id: str) -> dict:
    """
    Cache-Aside: Application manages cache
    
    Pros: Only requested data cached, cache failures don't break app
    Cons: Cache miss = slow, data can be stale
    """
    cache_key = f"user:{user_id}"
    
    # 1. Try cache first
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss - query database (`db` is a placeholder for your data access layer)
    user = db.query_user(user_id)
    
    # 3. Store in cache with TTL
    redis_client.setex(
        cache_key,
        timedelta(hours=1),  # TTL
        json.dumps(user)
    )
    
    return user

# ======================
# WRITE-THROUGH
# ======================
def update_user_write_through(user_id: str, data: dict) -> dict:
    """
    Write-Through: Write to cache AND database together
    
    Pros: Cache refreshed on every write, so reads rarely see stale data
    Cons: Write latency (two writes), cache churn for data that is never read
    """
    cache_key = f"user:{user_id}"
    
    # 1. Write to database
    user = db.update_user(user_id, data)
    
    # 2. Write to cache
    redis_client.setex(
        cache_key,
        timedelta(hours=1),
        json.dumps(user)
    )
    
    return user

# ======================
# WRITE-BEHIND (Write-Back)
# ======================
def update_user_write_behind(user_id: str, data: dict) -> dict:
    """
    Write-Behind: Write to cache, async write to database
    
    Pros: Fast writes, reduced database load
    Cons: Data loss risk, complex implementation
    """
    cache_key = f"user:{user_id}"
    
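    # 1. Write to cache immediately (merge the change into the current record)
    user = {**get_user_with_cache_aside(user_id), **data}
    redis_client.setex(cache_key, timedelta(hours=1), json.dumps(user))
    
    # 2. Queue database write (async); `write_queue` is a placeholder, e.g. an SQS wrapper
    write_queue.send({'user_id': user_id, 'data': data})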
    
    return user

# ======================
# CACHE INVALIDATION
# ======================
def invalidate_user_cache(user_id: str):
    """Delete cache entry when data changes."""
    redis_client.delete(f"user:{user_id}")

def invalidate_pattern(pattern: str):
    """Delete all keys matching pattern (use with caution)."""
    cursor = 0
    while True:
        cursor, keys = redis_client.scan(cursor, match=pattern, count=100)
        if keys:
            redis_client.delete(*keys)
        if cursor == 0:
            break
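
To try the cache-aside flow without a running ElastiCache cluster, the sketch below swaps Redis for a small in-memory stand-in. `TTLCache`, `FAKE_DB`, `load_user`, and `get_user_cached` are hypothetical names made up for this example; they are not part of redis-py or any AWS SDK.

```python
import json
import time
from typing import Callable, Optional


class TTLCache:
    """Tiny in-memory stand-in for Redis GET/SETEX (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None or entry[0] <= self._clock():
            self._store.pop(key, None)  # drop expired entries lazily
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]

    def setex(self, key: str, ttl_seconds: float, value: str) -> None:
        self._store[key] = (self._clock() + ttl_seconds, value)


# Hypothetical "database": a dict plus a counter to show how often it is queried.
FAKE_DB = {"42": {"id": "42", "name": "Ada"}}
db_calls = 0


def load_user(user_id: str) -> Optional[dict]:
    global db_calls
    db_calls += 1
    return FAKE_DB.get(user_id)


def get_user_cached(cache: TTLCache, user_id: str, ttl_seconds: float = 3600) -> Optional[dict]:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:       # cache hit
        return json.loads(cached)
    user = load_user(user_id)    # cache miss: fall back to the database
    if user is not None:
        cache.setex(key, ttl_seconds, json.dumps(user))
    return user


cache = TTLCache()
for _ in range(5):
    get_user_cached(cache, "42")
hit_ratio = cache.hits / (cache.hits + cache.misses)
print(f"db calls={db_calls}, hit ratio={hit_ratio:.0%}")  # db calls=1, hit ratio=80%
```

Only the first of the five reads reaches the database. The same hit/miss counters are what the CloudWatch `CacheHits` and `CacheMisses` metrics report for a real cluster.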

🎯 Interview Questions

Q1: S3 vs EBS vs EFS - when to use each?

S3 (Object Storage):

Static files, backups, data lakes
HTTP access from anywhere
Unlimited storage, 11 9s durability

EBS (Block Storage):

EC2 boot volumes, databases
Attached to one EC2 instance at a time (io1/io2 Multi-Attach excepted), single AZ
Low latency (single-digit ms; sub-ms on io2 Block Express), resizable

EFS (File Storage):

Shared file systems across EC2
Multi-AZ, POSIX compliant
Auto-scaling, Linux only

Decision Matrix:

Need HTTP access globally → S3
Need database storage → EBS
Need shared NFS mount → EFS

Q2: How would you design DynamoDB for a social media app?

Access Patterns:

Get user profile
Get user’s posts (sorted by date)
Get post’s comments
Get user’s followers
Get posts by hashtag

Single Table Design:

PK: USER#123, SK: PROFILE       → User profile
PK: USER#123, SK: POST#2024...  → User's posts
PK: POST#456, SK: COMMENT#...   → Post comments
PK: USER#123, SK: FOLLOWS#789   → Accounts that user 123 follows

GSI1: hashtag, created_at       → Posts by hashtag
GSI2: post_id, created_at       → Comments on post
GSI3: SK, PK (inverted index)   → Followers of a user (query SK = FOLLOWS#789)

Key Points:

Denormalize for read performance
Use composite sort keys
Create GSIs for access patterns
Use sparse indexes where appropriate
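
A minimal sketch of how the composite keys above turn into queries, using an in-memory list instead of a real table. The sample items and the `query` helper are hypothetical; in DynamoDB the equivalent is a Query with an exact partition-key match plus `begins_with` on the sort key, and `ScanIndexForward=False` for newest-first ordering.

```python
from typing import Dict, List

# Items in one table, keyed by (PK, SK). Values are made-up sample data.
TABLE: List[Dict[str, str]] = [
    {"PK": "USER#123", "SK": "PROFILE", "name": "Ada"},
    {"PK": "USER#123", "SK": "POST#2024-01-05T10:00:00Z", "text": "first"},
    {"PK": "USER#123", "SK": "POST#2024-03-01T09:30:00Z", "text": "second"},
    {"PK": "USER#123", "SK": "FOLLOWS#789"},
    {"PK": "POST#456", "SK": "COMMENT#2024-03-02T08:00:00Z", "text": "nice"},
]


def query(pk: str, sk_prefix: str = "", newest_first: bool = False) -> List[Dict[str, str]]:
    """Mimic a DynamoDB Query: exact PK match plus begins_with on SK,
    returned in sort-key order (reverse order models ScanIndexForward=False)."""
    items = [i for i in TABLE if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(items, key=lambda i: i["SK"], reverse=newest_first)


profile = query("USER#123", "PROFILE")
posts = query("USER#123", "POST#", newest_first=True)
comments = query("POST#456", "COMMENT#")
print([p["text"] for p in posts])  # ['second', 'first']
```

ISO 8601 timestamps sort lexicographically, which is why putting the date in the sort key gives "posts sorted by date" without any extra index.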

Q3: RDS Multi-AZ vs Read Replicas - what's the difference?

Multi-AZ (High Availability):

Purpose: High availability (automatic recovery from an instance or AZ failure)
Synchronous replication
Automatic failover (typically 60-120s)
Standby NOT readable (Multi-AZ instance deployment)
Same region only

Read Replicas (Read Scaling):

Purpose: Performance/offload reads
Asynchronous replication
Manual promotion (becomes primary)
Replicas ARE readable
Can be cross-region

Best Practice: Use BOTH

Multi-AZ for production HA
Read replicas for read scaling
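
When both are in use, the application has to decide which endpoint each statement goes to. The sketch below is one naive way to do that, assuming hypothetical hostnames and a made-up `EndpointRouter` class; real drivers and proxies offer more robust read/write splitting.

```python
import itertools
from typing import List


class EndpointRouter:
    """Send writes to the primary and spread reads across replicas (round-robin)."""

    def __init__(self, primary: str, replicas: List[str]):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._reads = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: anything that starts with SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._reads) if is_read else self.primary


router = EndpointRouter(
    primary="mydb.example.internal",               # hypothetical endpoints
    replicas=["mydb-ro-1.example.internal", "mydb-ro-2.example.internal"],
)
print(router.endpoint_for("UPDATE users SET name = 'Ada' WHERE id = 1"))
```

Because replica replication is asynchronous, a read that must see a write made moments earlier should still go to the primary.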

Q4: How do you handle cache invalidation?

Strategies:

TTL-Based (Simplest)
- Set expiration on cache entries
- Accept eventual consistency
Event-Driven
- Publish events on data changes
- Consumers invalidate cache
Write-Through
- Update cache on every write
- Cache stays fresh, but writes are slower
Cache-Aside with Versioning
- Include version in cache key
- Bump version on update

Code Pattern:

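import json
import boto3

sns = boto3.client("sns")

def update_product(product_id, data):
    # Update database (`db` and `cache` are placeholders for your own clients)
    db.update(product_id, data)
    # Invalidate cache so the next read reloads fresh data
    cache.delete(f"product:{product_id}")
    # Or publish an event so other services can invalidate (topic_arn: your SNS topic ARN)
    sns.publish(TopicArn=topic_arn, Message=json.dumps({"product_id": product_id}))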
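
The "Cache-Aside with Versioning" strategy can be sketched as follows. A plain dict stands in for Redis here, and every function name is hypothetical. With a real Redis client the version bump would typically be an atomic `INCR`, and the data entries would carry a TTL so superseded versions age out.

```python
from typing import Dict, Optional

cache: Dict[str, str] = {}  # stand-in for Redis


def _version_key(product_id: str) -> str:
    return f"product:{product_id}:version"


def _data_key(product_id: str) -> str:
    # The current version number is part of the key that holds the data.
    version = cache.get(_version_key(product_id), "0")
    return f"product:{product_id}:v{version}"


def get_cached(product_id: str) -> Optional[str]:
    return cache.get(_data_key(product_id))


def put_cached(product_id: str, payload: str) -> None:
    cache[_data_key(product_id)] = payload


def bump_version(product_id: str) -> None:
    """Call after a database update: readers switch to a new key,
    so the old entry is never served again."""
    key = _version_key(product_id)
    cache[key] = str(int(cache.get(key, "0")) + 1)


put_cached("p1", '{"price": 10}')
bump_version("p1")
print(get_cached("p1"))  # None
```

Nothing is deleted on update; the old entry simply becomes unreachable, which avoids a delete racing with a concurrent cache fill.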

Q5: Design a cost-effective storage strategy for 100TB data

Strategy: Tiered Storage with Lifecycle

Hot Data (Recent 30 days): S3 Standard
Warm Data (30-90 days): S3 Standard-IA
Cold Data (90+ days): S3 Glacier
Archive (1+ year): S3 Glacier Deep Archive

Lifecycle Policy:

{
  "Rules": [{
    "ID": "tiered-archive",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}
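
Applying a policy like this with boto3 might look like the sketch below. S3 expects each rule to carry a `Status` and a `Filter` (or prefix), so the builder adds them. The rule ID `tiered-archive`, the helper names, and the sanity check are assumptions for illustration. `put_bucket_lifecycle_configuration` is the boto3 S3 client call, imported lazily so the builder can run without AWS credentials.

```python
from typing import Any, Dict, List


def build_lifecycle(transitions: List[Dict[str, Any]], expire_days: int) -> Dict[str, Any]:
    """Build a single-rule lifecycle configuration covering every object."""
    days = [t["Days"] for t in transitions]
    if days != sorted(days) or expire_days <= max(days, default=0):
        raise ValueError("transitions must be ascending and precede expiration")
    return {
        "Rules": [{
            "ID": "tiered-archive",       # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},     # empty prefix = all objects in the bucket
            "Transitions": transitions,
            "Expiration": {"Days": expire_days},
        }]
    }


def apply_lifecycle(bucket: str, config: Dict[str, Any]) -> None:
    import boto3  # lazy import: only needed when actually calling AWS
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = build_lifecycle(
    [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
    ],
    expire_days=2555,
)
# apply_lifecycle("my-example-bucket", config)  # bucket name is a placeholder
```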

Cost Estimate (100 TB, us-east-1):

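All Standard: ~$2,300/month
With lifecycle: ~$200/month (about 91% savings), assuming most of the data is older than a year and sits in Deep Archive; request, retrieval, and transition fees are extra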
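
The arithmetic behind an estimate like this can be sketched as below. The per-GB prices are assumed us-east-1 list prices at a flat rate (no volume discounts, so the Standard figure lands slightly above $2,300), and the age distribution of the 100 TB is purely hypothetical. Plug in your own numbers from the S3 pricing page.

```python
# Assumed storage prices in USD per GB-month (flat rate, storage only).
PRICE_PER_GB = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}
GB_PER_TB = 1024


def monthly_cost(tb_by_class: dict) -> float:
    """Sum storage cost across classes; ignores request and retrieval fees."""
    return sum(tb * GB_PER_TB * PRICE_PER_GB[cls] for cls, tb in tb_by_class.items())


all_standard = monthly_cost({"STANDARD": 100})
# Hypothetical split of the same 100 TB by age.
tiered = monthly_cost({"STANDARD": 2, "STANDARD_IA": 4, "GLACIER": 20, "DEEP_ARCHIVE": 74})
savings = 1 - tiered / all_standard
print(f"all Standard: ${all_standard:,.0f}  tiered: ${tiered:,.0f}  savings: {savings:.0%}")
```

The savings figure is dominated by how much data reaches Deep Archive, so the age distribution matters more than the exact prices.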

🧪 Hands-On Lab: Build a Caching Layer

Objective: Add Redis caching to reduce database load by 80%

Create ElastiCache Redis Cluster

Use cache.t3.micro for testing, enable encryption in transit

Modify Security Groups

Allow port 6379 from application servers

Implement Cache-Aside Pattern

Add caching layer to your application

Add TTL and Invalidation

Set appropriate TTLs, invalidate on writes

Monitor with CloudWatch

Track cache hit ratio, memory usage

Storage Comparison Summary

Service	Type	Durability	Latency	Best For
S3	Object	11 9s	~100ms	Files, backups, static sites
EBS	Block	99.8-99.9% (gp3), 99.999% (io2)	single-digit ms (sub-ms on io2 Block Express)	EC2 volumes, databases
EFS	File	11 9s	~ms	Shared file systems
DynamoDB	NoSQL	11 9s	sub-10ms	Key-value, high scale
RDS/Aurora	SQL	99.95% (Multi-AZ availability SLA)	~ms	Relational, complex queries
ElastiCache	In-memory	N/A	sub-ms	Caching, sessions

Next Module

Networking

Master VPC, subnets, security groups, and load balancers
