S3 Storage Classes

Module Overview

Estimated Time: 5-6 hours | Difficulty: Intermediate | Prerequisites: Core Concepts, Compute
This module covers the core AWS storage and database services. You’ll learn data modeling, performance optimization, cost management, and when to use each service.
What You’ll Learn:
  • S3 storage classes, lifecycle policies, and security
  • EBS volumes and snapshots for EC2
  • EFS for shared file systems
  • RDS and Aurora for relational databases
  • DynamoDB for NoSQL at scale
  • ElastiCache for sub-millisecond response times
  • Data migration strategies

Storage Service Selection Guide

┌────────────────────────────────────────────────────────────────────────┐
│                  AWS Storage Decision Tree                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   What type of data access pattern?                                     │
│         │                                                               │
│         ├─── Random access to blocks ──────────► EBS (EC2 volumes)     │
│         │    (Database, boot volumes)               │                   │
│         │                                           └── gp3: General   │
│         │                                           └── io2: High IOPS │
│         │                                                               │
│         ├─── Object/file storage ──────────────► S3 (Object Storage)   │
│         │    (Any size, any format)                 │                   │
│         │                                           └── Standard: Hot  │
│         │                                           └── IA: Warm       │
│         │                                           └── Glacier: Cold  │
│         │                                                               │
│         ├─── Shared file system (Linux) ───────► EFS (NFS)             │
│         │    (Multiple EC2, containers)                                 │
│         │                                                               │
│         └─── Shared file system (Windows) ─────► FSx for Windows       │
│              (AD integration, SMB)                                      │
│                                                                         │
│   DATABASE Decision:                                                    │
│   ──────────────────                                                    │
│         │                                                               │
│         ├─── Relational, complex queries ──────► RDS / Aurora          │
│         │    (Joins, transactions, ACID)                                │
│         │                                                               │
│         ├─── Key-value, high scale ────────────► DynamoDB              │
│         │    (Single-digit ms, serverless)                              │
│         │                                                               │
│         ├─── Document store ───────────────────► DocumentDB            │
│         │    (MongoDB-compatible)                                       │
│         │                                                               │
│         ├─── Graph relationships ──────────────► Neptune               │
│         │                                                               │
│         └─── Caching layer ────────────────────► ElastiCache           │
│              (Redis/Memcached)                                          │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Storage Types Comparison

┌───────────────────────────────────────────────────────────────────────┐
│                      AWS Storage Types                                 │
├───────────────────────────────────────────────────────────────────────┤
│                                                                        │
│   BLOCK STORAGE           OBJECT STORAGE          FILE STORAGE         │
│   ─────────────           ─────────────           ─────────────        │
│   ┌─────────────┐         ┌─────────────┐         ┌─────────────┐      │
│   │    EBS      │         │     S3      │         │    EFS      │      │
│   │  (Volumes)  │         │  (Buckets)  │         │   (NFS)     │      │
│   └─────────────┘         └─────────────┘         └─────────────┘      │
│         │                       │                       │              │
│   Characteristics:         Characteristics:       Characteristics:     │
│   • Like a hard disk       • Like Dropbox         • Like network share │
│   • Attach to 1 EC2        • HTTP access          • Multiple EC2s      │
│   • Single AZ              • Regional             • Multi-AZ           │
│   • Low latency            • Unlimited capacity   • POSIX compliant    │
│   • Boot volumes           • 11 9s durability     • Auto-scaling       │
│                                                                        │
│   Use Cases:               Use Cases:             Use Cases:           │
│   • Databases              • Static websites      • Content management │
│   • Enterprise apps        • Data lakes           • Web serving        │
│   • Boot/root volumes      • Backups/archives     • Container storage  │
│   • High-perf workloads    • Media hosting        • Dev environments   │
│                                                                        │
│   Durability & Performance:                                            │
│   ─────────────────────────                                            │
│   EBS: 99.8-99.9% (io2: 99.999%), single AZ, <1ms latency              │
│   S3:  99.999999999% (11 9s), ~100-200ms first-byte latency            │
│   EFS: 99.999999999%, few ms latency                                   │
│                                                                        │
└───────────────────────────────────────────────────────────────────────┘

S3 (Simple Storage Service)

Object storage with unlimited capacity, and the backbone of most AWS storage and data-lake architectures. S3 is deceptively simple on the surface: just PUT and GET objects. But underneath it powers some of the largest data platforms in the world and has more configuration knobs than almost any other AWS service. The biggest cost mistake teams make with S3 is not setting up lifecycle policies: storing 5 years of logs in S3 Standard instead of Glacier Deep Archive costs over 20x more in storage charges ($0.023 vs $0.00099 per GB-month in us-east-1).

S3 Architecture Deep Dive

┌──────────────────────────────────────────────────────────────────────┐
│                         S3 Architecture                               │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│   Bucket: my-company-data-prod (globally unique name)                │
│   Region: us-east-1                                                   │
│   ─────────────────────────────────────────────────────────────────  │
│                                                                       │
│   Object Structure:                                                   │
│   ┌─────────────────────────────────────────────────────────────┐    │
│   │  Key: images/2024/01/photo.jpg                               │    │
│   │  ┌────────────────────────────────────────────────────────┐ │    │
│   │  │  Value: [binary data up to 5 TB]                        │ │    │
│   │  │                                                         │ │    │
│   │  │  Metadata:                                              │ │    │
│   │  │  • System: Content-Type, Last-Modified, ETag            │ │    │
│   │  │  • User-defined: x-amz-meta-*                           │ │    │
│   │  │                                                         │ │    │
│   │  │  Version ID: 3sL4kqtJlcpXroDTDmJ+rmSpXd3dIbrHY+MTRCxf3  │ │    │
│   │  └────────────────────────────────────────────────────────┘ │    │
│   └─────────────────────────────────────────────────────────────┘    │
│                                                                       │
│   Limits & Capabilities:                                              │
│   • Max object size: 5 TB                                            │
│   • Max single PUT: 5 GB (use multipart for larger)                  │
│   • Max parts in multipart: 10,000                                   │
│   • Recommended multipart for > 100 MB                               │
│   • Unlimited objects per bucket                                      │
│   • Bucket names: 3-63 chars, globally unique                        │
│                                                                       │
│   Data Consistency (as of Dec 2020):                                 │
│   • Strong read-after-write consistency for object PUT, DELETE,      │
│     and LIST operations (no more eventual-consistency delays)        │
│   • Bucket configuration changes are still eventually consistent     │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘
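The multipart limits above interact: at 10,000 parts, a fixed 8 MiB part size tops out at about 78 GiB per object. High-level helpers such as boto3's `upload_file` are designed to handle part sizing for you, but if you drive `create_multipart_upload`/`upload_part` yourself you need the arithmetic. A minimal sketch, assuming S3's documented per-part limits of 5 MiB minimum (except the last part) and 5 GiB maximum; `choose_part_size` is a hypothetical helper, not an AWS API:

```python
import math

MiB = 1024 ** 2
GiB = 1024 ** 3
MIN_PART = 5 * MiB           # every part except the last must be >= 5 MiB
MAX_PART = 5 * GiB           # no part may exceed 5 GiB
MAX_PARTS = 10_000           # parts per multipart upload
MAX_OBJECT = 5 * 1024 * GiB  # 5 TiB maximum object size


def choose_part_size(object_size: int, preferred: int = 8 * MiB) -> int:
    """Smallest whole-MiB part size >= preferred that needs <= 10,000 parts."""
    if object_size > MAX_OBJECT:
        raise ValueError("object exceeds the 5 TB S3 object size limit")
    needed = math.ceil(object_size / MAX_PARTS)   # bytes per part at the part cap
    part = max(preferred, needed, MIN_PART)
    part = math.ceil(part / MiB) * MiB            # round up to a whole MiB
    return min(part, MAX_PART)


for size in (100 * MiB, 1024 * GiB, MAX_OBJECT):
    part = choose_part_size(size)
    print(f"{size / GiB:8.1f} GiB -> {part // MiB} MiB parts, "
          f"{math.ceil(size / part)} parts")
```

Small objects keep the preferred 8 MiB parts; only objects beyond roughly 78 GiB force larger parts, and a 5 TiB object needs parts of about 525 MiB.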

S3 Storage Classes Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                     S3 Storage Classes                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Storage Class     │ Access         │ Min Storage │ Retrieval │ Cost  │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 Standard       │ Milliseconds   │ None        │ None      │ $$$$  │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 Standard-IA    │ Milliseconds   │ 30 days     │ Per GB    │ $$    │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 One Zone-IA    │ Milliseconds   │ 30 days     │ Per GB    │ $     │
│   ─────────────────────────────────────────────────────────────────────│
│   S3 Intelligent    │ Milliseconds   │ None        │ None      │ Auto  │
│   Tiering           │                │             │ monitoring│       │
│   ─────────────────────────────────────────────────────────────────────│
│   Glacier Instant   │ Milliseconds   │ 90 days     │ Per GB    │ ¢¢    │
│   Retrieval         │                │             │           │       │
│   ─────────────────────────────────────────────────────────────────────│
│   Glacier Flexible  │ 1-5 min to     │ 90 days     │ Per GB +  │ ¢     │
│   Retrieval         │ 12 hours       │             │ per req   │       │
│   ─────────────────────────────────────────────────────────────────────│
│   Glacier Deep      │ 12-48 hours    │ 180 days    │ Per GB +  │ ¢     │
│   Archive           │                │             │ per req   │       │
│   ─────────────────────────────────────────────────────────────────────│
│                                                                         │
│   Cost Comparison (per GB/month, us-east-1):                           │
│   Standard:       $0.023                                               │
│   Standard-IA:    $0.0125 (+ $0.01/GB retrieval)                       │
│   One Zone-IA:    $0.01   (+ $0.01/GB retrieval)                       │
│   Glacier IR:     $0.004  (+ $0.03/GB retrieval)                       │
│   Glacier FR:     $0.0036 (+ $0.01-$0.03/GB retrieval)                 │
│   Deep Archive:   $0.00099 (+ $0.02/GB retrieval)                      │
│                                                                         │
│   USE INTELLIGENT TIERING when access patterns are unknown!            │
│   It automatically moves objects between tiers based on access.        │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
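To see what the price table means for a concrete workload, here is a rough estimator using the us-east-1 figures listed above. It is a sketch that deliberately ignores request fees, minimum storage durations, the 128 KB minimum billable object size of the IA and Glacier Instant Retrieval classes, and Intelligent-Tiering's monitoring fee, so treat its output as a first approximation; the function names are hypothetical.

```python
# us-east-1 list prices from the table above:
# (storage USD per GB-month, retrieval USD per GB).
PRICES = {
    "STANDARD": (0.023, 0.0),
    "STANDARD_IA": (0.0125, 0.01),
    "ONEZONE_IA": (0.01, 0.01),
    "GLACIER_IR": (0.004, 0.03),
    "GLACIER": (0.0036, 0.01),        # Glacier Flexible, standard retrieval
    "DEEP_ARCHIVE": (0.00099, 0.02),  # standard retrieval
}

# Classes that return data in milliseconds (no restore step).
MILLISECOND_CLASSES = ("STANDARD", "STANDARD_IA", "ONEZONE_IA", "GLACIER_IR")


def monthly_cost(storage_class: str, stored_gb: float, retrieved_gb: float = 0.0) -> float:
    """Storage + retrieval for one month. Ignores request fees, minimum
    storage durations, and minimum billable object sizes."""
    storage_price, retrieval_price = PRICES[storage_class]
    return stored_gb * storage_price + retrieved_gb * retrieval_price


def cheapest_instant_class(stored_gb: float, retrieved_gb: float) -> str:
    """Cheapest millisecond-access class for a given monthly read volume."""
    return min(MILLISECOND_CLASSES,
               key=lambda c: monthly_cost(c, stored_gb, retrieved_gb))


if __name__ == "__main__":
    # 1,000 GB stored and never read: Standard vs Deep Archive.
    print(monthly_cost("STANDARD", 1000), monthly_cost("DEEP_ARCHIVE", 1000))
    # Never-read data vs data read in full twice a month.
    print(cheapest_instant_class(1000, 0), cheapest_instant_class(1000, 2000))
```

With these prices, 1,000 GB costs $23/month in Standard versus about $0.99 in Deep Archive, which is where the roughly 20x figure for long-lived logs comes from. The second comparison shows why retrieval fees matter: once data is read often enough, the cheaper-to-store classes stop being cheaper overall.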

S3 Lifecycle Policies

# Example lifecycle policy configuration
import boto3

s3 = boto3.client('s3')

# Lifecycle policies are the single most impactful cost optimization for S3.
# Real-world arithmetic (us-east-1 list prices): 50 TB of logs in S3 Standard
# costs ~$1,150/month. The same 50 TB in Glacier Flexible Retrieval costs
# ~$180/month, and in Glacier Deep Archive ~$50/month. The first rule below
# steps logs down gradually: Standard-IA at 30 days, Glacier at 90 days,
# Deep Archive at 1 year.
lifecycle_policy = {
    "Rules": [
        {
            "ID": "MoveToIA30Days",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "STANDARD_IA"
                },
                {
                    "Days": 90,
                    "StorageClass": "GLACIER"
                },
                {
                    "Days": 365,
                    "StorageClass": "DEEP_ARCHIVE"
                }
            ],
            "Expiration": {
                "Days": 2555  # Delete after 7 years
            }
        },
        {
            # This rule is critical and often overlooked. Failed multipart
            # uploads leave invisible partial data in your bucket that you
            # still pay for. Some teams discover gigabytes of orphaned parts
            # eating their S3 budget. 7 days is generous; most uploads
            # either complete in minutes or never will.
            "ID": "DeleteIncompleteMultipartUploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 7
            }
        },
        {
            # Applies only to buckets with versioning enabled.
            "ID": "ExpireOldVersions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionTransitions": [
                {
                    "NoncurrentDays": 30,
                    "StorageClass": "GLACIER"
                }
            ],
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 365
            }
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration=lifecycle_policy
)
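Transition schedules are easy to get subtly wrong: if objects leave a class before its minimum storage duration, you are still billed for the remainder of that minimum. The sketch below checks one rule against the minimum durations from the storage-class table. It assumes objects start in S3 Standard and that transitions use `Days` (not `Date`); `check_rule` is a hypothetical helper and does not replicate every validation S3 itself performs.

```python
from typing import Dict, List

# Minimum storage durations (days) from the storage-class table above.
MIN_DAYS: Dict[str, int] = {
    "STANDARD": 0,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER_IR": 90,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def check_rule(rule: dict) -> List[str]:
    """Flag transitions or expirations that cut a minimum storage duration short."""
    warnings: List[str] = []
    current, entered = "STANDARD", 0  # assumption: objects start in Standard on day 0
    for t in sorted(rule.get("Transitions", []), key=lambda x: x["Days"]):
        target, day = t["StorageClass"], t["Days"]
        # The IA classes only accept objects that are at least 30 days old.
        if target in ("STANDARD_IA", "ONEZONE_IA") and day < 30:
            warnings.append(f"{target} at day {day}: objects must be at least 30 days old")
        held = day - entered
        if held < MIN_DAYS[current]:
            warnings.append(f"{current} held {held} days before {target} "
                            f"(minimum {MIN_DAYS[current]})")
        current, entered = target, day
    expire = rule.get("Expiration", {}).get("Days")
    if expire is not None and expire - entered < MIN_DAYS[current]:
        warnings.append(f"{current} held {expire - entered} days before expiration "
                        f"(minimum {MIN_DAYS[current]})")
    return warnings
```

Running it over each entry in `lifecycle_policy["Rules"]` from the example above should report no warnings for the log rule, since each class is held at least as long as its minimum (60 days in Standard-IA, 275 in Glacier, 2,190 in Deep Archive).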

S3 Security Best Practices

┌────────────────────────────────────────────────────────────────────────┐
│                     S3 Security Layers                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Layer 1: BLOCK PUBLIC ACCESS (Account & Bucket Level)               │
│   ─────────────────────────────────────────────────────                │
│   Enable all four settings by default:                                 │
│   ✅ BlockPublicAcls                                                   │
│   ✅ IgnorePublicAcls                                                  │
│   ✅ BlockPublicPolicy                                                 │
│   ✅ RestrictPublicBuckets                                             │
│                                                                         │
│   Layer 2: BUCKET POLICY (Resource-Based)                              │
│   ────────────────────────────────────────                             │
│   {                                                                     │
│     "Version": "2012-10-17",                                           │
│     "Statement": [{                                                     │
│       "Sid": "EnforceTLS",                                             │
│       "Effect": "Deny",                                                │
│       "Principal": "*",                                                │
│       "Action": "s3:*",                                                │
│       "Resource": ["arn:aws:s3:::bucket",                              │
│                    "arn:aws:s3:::bucket/*"],                           │
│       "Condition": {                                                    │
│         "Bool": {"aws:SecureTransport": "false"}                       │
│       }                                                                 │
│     }]                                                                  │
│   }                                                                     │
│                                                                         │
│   Layer 3: IAM POLICIES (Identity-Based)                               │
│   ──────────────────────────────────────                               │
│   Attached to users, groups, or roles                                  │
│                                                                         │
│   Layer 4: ENCRYPTION                                                   │
│   ───────────────────                                                   │
│   • SSE-S3: S3-managed keys (default since Jan 2023, no extra cost)    │
│   • SSE-KMS: KMS keys (AWS or customer managed; CloudTrail audit)      │
│   • SSE-C: Customer-provided keys (sent with each request)             │
│   • Client-side: Encrypt before upload                                 │
│                                                                         │
│   Layer 5: ACCESS POINTS                                               │
│   ──────────────────────                                               │
│   Simplified access management for multi-tenant buckets               │
│                                                                         │
│   Layer 6: OBJECT LOCK                                                 │
│   ─────────────────────                                                │
│   WORM (Write Once Read Many) for compliance                          │
│   • Governance mode: Can be overridden with special permissions       │
│   • Compliance mode: Cannot be overridden by anyone                   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
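Layers 1, 2, and 4 can be applied in a few API calls. A minimal sketch, assuming you pass in an existing boto3 S3 client; `tls_only_policy` and `harden_bucket` are hypothetical helper names. It defaults to SSE-S3 and switches to SSE-KMS with an S3 Bucket Key when a KMS key ID is supplied. Note that `put_bucket_policy` replaces any existing bucket policy, so merge statements first on a bucket that already has one.

```python
import json
from typing import Any, Optional


def tls_only_policy(bucket: str) -> dict:
    """Bucket policy that denies every S3 request made without TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "EnforceTLS",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            # Cover bucket-level actions (e.g. ListBucket) and object-level ones.
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


def harden_bucket(s3: Any, bucket: str, kms_key_id: Optional[str] = None) -> None:
    """Apply Block Public Access, the TLS-only policy, and default encryption.

    s3 is a boto3 S3 client. Without kms_key_id the bucket uses SSE-S3.
    """
    # Layer 1: all four Block Public Access settings.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
    # Layer 2: replaces any existing bucket policy.
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(tls_only_policy(bucket)))
    # Layer 4: default encryption for new objects.
    if kms_key_id:
        rule = {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_id,
            },
            "BucketKeyEnabled": True,  # fewer KMS requests, lower KMS cost
        }
    else:
        rule = {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration={"Rules": [rule]},
    )
```

Including both the bucket ARN and the `/*` object ARN matters: a Deny on objects alone would still let bucket-level calls such as ListBucket through over plain HTTP.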

S3 Performance Optimization

import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

s3 = boto3.client('s3')

class S3Uploader:
    """High-performance S3 uploader with best practices."""

    def __init__(self, bucket: str):
        self.bucket = bucket
        # Files above the threshold are split into 8 MB parts that are
        # uploaded by up to 10 threads in parallel.
        self.config = TransferConfig(
            multipart_threshold=8 * 1024 * 1024,  # 8 MB
            max_concurrency=10,
            multipart_chunksize=8 * 1024 * 1024,  # 8 MB
            use_threads=True
        )

    def upload_file(self, local_path: str, s3_key: str):
        """Upload with automatic multipart handling."""
        s3.upload_file(
            local_path,
            self.bucket,
            s3_key,
            Config=self.config
        )

    def upload_large_file_optimized(self, local_path: str, s3_key: str):
        """
        Best practices for large file uploads:
        1. Use multipart upload (automatic above the 8 MB threshold)
        2. Use S3 Transfer Acceleration when clients are far from the
           bucket's region (long-distance uploads over the internet)
        3. Use byte-range requests for parallel downloads
        """
        # Enable Transfer Acceleration for the bucket first:
        # aws s3api put-bucket-accelerate-configuration \
        #   --bucket my-bucket --accelerate-configuration Status=Enabled

        # Route requests through the accelerate endpoint
        # (<bucket>.s3-accelerate.amazonaws.com) via the client config.
        s3_accelerate = boto3.client(
            's3',
            config=Config(s3={'use_accelerate_endpoint': True})
        )

        s3_accelerate.upload_file(
            local_path,
            self.bucket,
            s3_key,
            Config=self.config
        )

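# S3 Performance Tips:
# 1. Each prefix supports at least 3,500 PUT/COPY/POST/DELETE and
#    5,500 GET/HEAD requests per second. Spread higher request rates
#    across multiple prefixes. Random hash prefixes are no longer needed
#    (that advice predates S3's 2018 request-rate increase); any distinct
#    prefixes work, e.g. logs/shard-00/... and logs/shard-01/...
#
# 2. Enable S3 Transfer Acceleration for long-distance uploads; the gain
#    varies with distance and network, so measure before relying on it
#
# 3. Use byte-range fetches for parallel downloads
#
# 4. Use S3 Select to retrieve only the needed data (AWS closed S3 Select
#    to new customers in July 2024; Amazon Athena is an alternative)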
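Byte-range fetches split one download into several concurrent GETs for different slices of the object. boto3's `download_file` already does this internally for large objects; the sketch below only makes the mechanism visible. It assumes a boto3 S3 client is passed in (low-level clients can be shared across threads) and that the whole object fits in memory; `byte_ranges` and `parallel_download` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Tuple


def byte_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split an object of `size` bytes into inclusive (start, end) ranges."""
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def parallel_download(s3: Any, bucket: str, key: str,
                      chunk_size: int = 8 * 1024 * 1024, workers: int = 8) -> bytes:
    """Download an object with concurrent ranged GETs (s3 = boto3 S3 client)."""
    head = s3.head_object(Bucket=bucket, Key=key)
    size, etag = head["ContentLength"], head["ETag"]

    def fetch(rng: Tuple[int, int]) -> bytes:
        start, end = rng
        resp = s3.get_object(
            Bucket=bucket, Key=key,
            Range=f"bytes={start}-{end}",  # HTTP Range bounds are inclusive
            IfMatch=etag,                  # fail (412) if the object changes mid-download
        )
        return resp["Body"].read()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the parts join correctly.
        parts = list(pool.map(fetch, byte_ranges(size, chunk_size)))
    return b"".join(parts)
```

Pinning every range to the ETag from `head_object` prevents stitching together slices from two different versions of an object that is overwritten during the download.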

Presigned URLs for Secure Sharing

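import boto3

s3 = boto3.client('s3')

def generate_presigned_download_url(bucket: str, key: str,
                                     expiry_seconds: int = 3600) -> str:
    """Generate a presigned URL for downloading.

    The URL carries the signer's permissions and stops working after
    expiry_seconds, or earlier if the signing credentials expire first
    (e.g. temporary role credentials).
    """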
    url = s3.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': bucket,
            'Key': key
        },
        ExpiresIn=expiry_seconds
    )
    return url

def generate_presigned_upload_url(bucket: str, key: str,
                                   content_type: str = 'application/octet-stream',
                                   expiry_seconds: int = 3600) -> dict:
    """Generate a presigned URL for uploading."""
    response = s3.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields={
            'Content-Type': content_type
        },
        Conditions=[
            {'Content-Type': content_type},
            ['content-length-range', 1, 104857600]  # 1 byte to 100 MB
        ],
        ExpiresIn=expiry_seconds
    )
    return response  # Returns {'url': ..., 'fields': {...}}

# Usage in API response
download_url = generate_presigned_download_url('my-bucket', 'reports/q1.pdf')
upload_info = generate_presigned_upload_url('my-bucket', 'uploads/user123/file.jpg')
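A presigned URL is self-describing: with Signature Version 4, the signing time and lifetime travel as the `X-Amz-Date` and `X-Amz-Expires` query parameters. A small sketch that reads them back, assuming the URL was signed with SigV4; `presigned_url_expiry` and `is_expired` are hypothetical helpers. The computed time is an upper bound, because the URL also stops working when the signing credentials expire.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC expiry time of a SigV4 presigned URL."""
    query = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time in ISO 8601 basic format, e.g. 20240115T120000Z.
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True once the URL's ExpiresIn window has elapsed."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

This is handy in an API that caches presigned URLs: regenerate a cached URL shortly before `presigned_url_expiry` instead of handing clients a link that is about to die.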

EBS (Elastic Block Store)

Persistent block storage for EC2 instances. Like a high-performance SSD/HDD attached to your server. The key mental model: EBS volumes are network-attached storage that happens to be very fast. They persist independently of the EC2 instance (unlike instance store, which is physically attached and lost on stop/terminate). This means you can snapshot, detach, and reattach volumes — but it also means there is a small network latency penalty compared to local NVMe drives.

EBS Volume Types Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                      EBS Volume Types                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   SSD-BASED (Random I/O):                                              │
│   ─────────────────────────                                            │
│                                                                         │
│   gp3 (General Purpose SSD) - RECOMMENDED DEFAULT                      │
│   ├── Baseline: 3,000 IOPS, 125 MB/s (free, included in GB price)     │
│   ├── Max: 16,000 IOPS, 1,000 MB/s                                     │
│   ├── Size: 1 GB - 16 TB                                               │
│   ├── Cost: $0.08/GB + $0.005/IOPS (above 3K) + $0.04/MB/s (above 125) │
│   ├── Cost tip: gp3 is ~20% cheaper per GB than gp2 ($0.08 vs $0.10),  │
│   │   and its 3K IOPS baseline is free (gp2 needs 1 TB for 3K IOPS)    │
│   └── Use: Boot volumes, most workloads, databases under 16K IOPS      │
│                                                                         │
│   gp2 (Previous Gen) - LEGACY                                          │
│   ├── IOPS: 3 IOPS/GB (burst to 3,000)                                 │
│   ├── Max: 16,000 IOPS                                                 │
│   └── Note: gp3 is usually more cost-effective                         │
│                                                                         │
│   io2 Block Express (Provisioned IOPS) - HIGHEST PERFORMANCE           │
│   ├── Max: 256,000 IOPS, 4,000 MB/s                                    │
│   ├── Latency: Sub-millisecond                                         │
│   ├── Durability: 99.999% (vs 99.8-99.9% for other types)              │
│   ├── Size: 4 GB - 64 TB                                               │
│   ├── Cost: $0.125/GB + $0.065/IOPS                                    │
│   └── Use: Databases (Oracle, SAP), critical I/O workloads            │
│                                                                         │
│   io1 (Provisioned IOPS) - LEGACY                                      │
│   └── Max: 64,000 IOPS, use io2 instead                                │
│                                                                         │
│   HDD-BASED (Sequential I/O):                                          │
│   ──────────────────────────                                           │
│                                                                         │
│   st1 (Throughput Optimized HDD)                                       │
│   ├── Max: 500 IOPS, 500 MB/s                                          │
│   ├── Size: 125 GB - 16 TB                                             │
│   ├── Cost: $0.045/GB                                                  │
│   └── Use: Big data, log processing, data warehouses                  │
│                                                                         │
│   sc1 (Cold HDD) - LOWEST COST                                         │
│   ├── Max: 250 IOPS, 250 MB/s                                          │
│   ├── Size: 125 GB - 16 TB                                             │
│   ├── Cost: $0.015/GB                                                  │
│   └── Use: Infrequent access, archival                                 │
│                                                                         │
│   BOOT VOLUME ELIGIBILITY:                                             │
│   • gp2, gp3, io1, io2 ✅ Can be boot volumes                          │
│   • st1, sc1 ❌ Cannot be boot volumes                                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
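The gp3 cost tip is easy to check with arithmetic. A sketch using the gp3 prices in the table above plus an assumed gp2 list price of $0.10/GB-month (us-east-1); it compares each gp2 size against a gp3 volume provisioned to at least gp2's baseline IOPS and ignores gp2's burst credits, so it is an approximation. The function names are hypothetical.

```python
GP2_PER_GB = 0.10     # assumption: gp2 us-east-1 list price (USD per GB-month)
GP3_PER_GB = 0.08     # gp3 prices from the table above
GP3_PER_IOPS = 0.005  # per provisioned IOPS above the free 3,000
GP3_PER_MBPS = 0.04   # per provisioned MB/s above the free 125


def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 earns 3 IOPS per GB, with a floor of 100 and a cap of 16,000."""
    return min(max(100, 3 * size_gb), 16_000)


def gp2_monthly_cost(size_gb: int) -> float:
    return size_gb * GP2_PER_GB


def gp3_monthly_cost(size_gb: int, iops: int = 3000, throughput_mbps: int = 125) -> float:
    extra_iops = max(0, iops - 3000)
    extra_mbps = max(0, throughput_mbps - 125)
    return size_gb * GP3_PER_GB + extra_iops * GP3_PER_IOPS + extra_mbps * GP3_PER_MBPS


def gp3_matching_gp2(size_gb: int) -> float:
    """gp3 cost at the same size with at least gp2's baseline IOPS."""
    return gp3_monthly_cost(size_gb, iops=max(3000, gp2_baseline_iops(size_gb)))


for size in (500, 2000):
    print(size, "GB:", gp2_monthly_cost(size), "(gp2) vs", gp3_matching_gp2(size), "(gp3)")
```

At 500 GB, gp2 costs $50/month for a 1,500 IOPS baseline while gp3 costs $40 for 3,000 IOPS; at 2,000 GB, matching gp2's 6,000 IOPS on gp3 costs $175 versus $200.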

EBS Snapshots and Backup

import boto3
from datetime import datetime
from typing import Optional

ec2 = boto3.client('ec2')

def create_ebs_snapshot(volume_id: str, description: Optional[str] = None):
    """Create EBS snapshot (incremental backup to S3)."""
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description or f"Backup-{datetime.now().isoformat()}",
        TagSpecifications=[
            {
                'ResourceType': 'snapshot',
                'Tags': [
                    {'Key': 'Name', 'Value': 'AutomatedBackup'},
                    {'Key': 'CreatedBy', 'Value': 'Python Script'},
                ]
            }
        ]
    )
    return response['SnapshotId']

def copy_snapshot_cross_region(snapshot_id: str, 
                                source_region: str,
                                destination_region: str):
    """Copy snapshot to another region for DR."""
    dest_ec2 = boto3.client('ec2', region_name=destination_region)
    
    response = dest_ec2.copy_snapshot(
        SourceSnapshotId=snapshot_id,
        SourceRegion=source_region,
        Description=f"DR copy from {source_region}",
        Encrypted=True  # Always encrypt DR copies
    )
    return response['SnapshotId']

def create_volume_from_snapshot(snapshot_id: str, 
                                 az: str,
                                 volume_type: str = 'gp3'):
    """Create new volume from snapshot (for restore or DR)."""
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=az,
        VolumeType=volume_type,
        Encrypted=True,
        TagSpecifications=[
            {
                'ResourceType': 'volume',
                'Tags': [
                    {'Key': 'Name', 'Value': 'RestoredVolume'},
                ]
            }
        ]
    )
    return response['VolumeId']

# Key Points:
# 1. Snapshots are incremental (only changed blocks are stored)
# 2. Snapshots are stored in S3, managed by AWS (not visible in your buckets)
# 3. Snapshots are regional: you can create volumes from one in any AZ of
#    its region (copy it first to use it in another region)
# 4. The first snapshot takes longest; later ones copy only changed blocks
# 5. Deleting a snapshot removes only data no other snapshot still needs
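Because deleting a snapshot only removes blocks no other snapshot needs, pruning old snapshots is safe to automate. A minimal retention sketch, assuming the `Name=AutomatedBackup` tag set by `create_ebs_snapshot` above and a boto3 EC2 client passed in; `snapshots_to_delete` and `prune_snapshots` are hypothetical helpers. For production, Amazon Data Lifecycle Manager can manage snapshot schedules and retention for you.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List


def snapshots_to_delete(snapshots: Iterable[dict], now: datetime,
                        keep_days: int) -> List[str]:
    """IDs of snapshots whose StartTime is older than the retention window."""
    cutoff = now - timedelta(days=keep_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


def prune_snapshots(ec2: Any, keep_days: int = 30, dry_run: bool = True) -> List[str]:
    """Delete this account's 'AutomatedBackup' snapshots older than keep_days.

    ec2 is a boto3 EC2 client. With dry_run=True nothing is deleted; the
    IDs that would be deleted are returned so you can review them first.
    """
    snapshots: List[dict] = []
    paginator = ec2.get_paginator("describe_snapshots")
    for page in paginator.paginate(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:Name", "Values": ["AutomatedBackup"]}],
    ):
        snapshots.extend(page["Snapshots"])

    # boto3 returns StartTime as a timezone-aware datetime, so compare in UTC.
    doomed = snapshots_to_delete(snapshots, datetime.now(timezone.utc), keep_days)
    if not dry_run:
        for snapshot_id in doomed:
            ec2.delete_snapshot(SnapshotId=snapshot_id)
    return doomed
```

Snapshots that back a registered AMI cannot be deleted until the AMI is deregistered, so expect `delete_snapshot` to raise an error for those.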

EBS Encryption

┌────────────────────────────────────────────────────────────────────────┐
│                      EBS Encryption                                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   What's encrypted:                                                     │
│   ✅ Data at rest on volume                                            │
│   ✅ Data in transit between EC2 and EBS                               │
│   ✅ All snapshots                                                      │
│   ✅ All volumes created from encrypted snapshots                       │
│                                                                         │
│   Encryption Keys:                                                      │
│   • Default: AWS managed key (aws/ebs)                                 │
│   • Custom: Your KMS CMK (better audit, control)                       │
│                                                                         │
│   Converting Unencrypted to Encrypted:                                 │
│   ┌──────────────────────────────────────────────────────────────┐    │
│   │  1. Create snapshot of unencrypted volume                     │    │
│   │  2. Copy snapshot with encryption enabled                     │    │
│   │  3. Create encrypted volume from encrypted snapshot           │    │
│   │  4. Swap volumes: detach the old one, attach the new one      │    │
│   │     (stop the instance first if it is the root volume)        │    │
│   └──────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Best Practice: Enable EBS encryption by default                      │
│   (a per-Region account setting; turn it on in each Region you use)    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
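Encryption by default is a per-region account setting, so it has to be enabled in every region you use. A sketch, assuming a caller-supplied factory that returns a boto3 EC2 client for a region (for example `lambda r: boto3.client("ec2", region_name=r)`); `enable_default_ebs_encryption` is a hypothetical helper. To use a customer managed key instead of `aws/ebs`, the EC2 API also offers `modify_ebs_default_kms_key_id`.

```python
from typing import Any, Callable, Dict, Iterable


def enable_default_ebs_encryption(make_client: Callable[[str], Any],
                                  regions: Iterable[str]) -> Dict[str, bool]:
    """Enable EBS encryption by default in each region; return the final state."""
    state: Dict[str, bool] = {}
    for region in regions:
        ec2 = make_client(region)  # e.g. boto3.client("ec2", region_name=region)
        if not ec2.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]:
            ec2.enable_ebs_encryption_by_default()
        # Read the setting back to confirm it took effect.
        state[region] = ec2.get_ebs_encryption_by_default()["EbsEncryptionByDefault"]
    return state
```

The setting affects only volumes created afterward; existing unencrypted volumes still need the snapshot-and-restore conversion described in the box.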
EBS is AZ-specific! Volumes only exist in one AZ. To move across AZs or regions:
  1. Create snapshot
  2. Copy to target region (if cross-region)
  3. Create volume in target AZ from snapshot
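The same snapshot-and-restore flow covers both moving a volume to another AZ and the unencrypted-to-encrypted conversion shown in the box above, because `create_volume` can produce an encrypted volume directly from an unencrypted snapshot. A minimal sketch for the same-region case, assuming a boto3 EC2 client is passed in; `move_volume_to_az` is a hypothetical helper.

```python
from typing import Any


def move_volume_to_az(ec2: Any, volume_id: str, target_az: str,
                      volume_type: str = "gp3") -> str:
    """Snapshot a volume and recreate it, encrypted, in another AZ of the same region.

    ec2 is a boto3 EC2 client. Returns the new volume ID; the old volume is
    left untouched so you can verify the copy before deleting it.
    """
    snap = ec2.create_snapshot(VolumeId=volume_id,
                               Description=f"Move {volume_id} to {target_az}")
    # Volumes can be created only from a completed snapshot.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    vol = ec2.create_volume(
        SnapshotId=snap["SnapshotId"],
        AvailabilityZone=target_az,
        VolumeType=volume_type,
        Encrypted=True,  # also encrypts data restored from an unencrypted snapshot
    )
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])
    return vol["VolumeId"]
```

For another region, first copy the snapshot with a client in the destination region (as `copy_snapshot_cross_region` does), then create the volume with that same client.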

EFS (Elastic File System)

Managed NFS file system that scales automatically. Shared storage for Linux workloads. EFS solves the problem of “I need multiple EC2 instances (or Lambda functions, or containers) to read and write the same files.” Think of it as a shared network drive that auto-scales from kilobytes to petabytes. The trade-off is cost: EFS Standard at $0.30/GB is roughly 4x more expensive than EBS gp3, so only use it when you genuinely need shared access.
┌────────────────────────────────────────────────────────────────────────┐
│                         EFS Architecture                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                    EFS File System                             │    │
│   │                   /efs-shared-data                             │    │
│   │                                                                │    │
│   │   Storage Classes:                                             │    │
│   │   ┌────────────────┐  ┌────────────────┐                      │    │
│   │   │ Standard       │  │ Standard-IA    │                      │    │
│   │   │ (Frequent)     │  │ (Infrequent)   │                      │    │
│   │   │ $0.30/GB       │  │ $0.016/GB +    │                      │    │
│   │   │                │  │ $0.01/GB read  │                      │    │
│   │   └────────────────┘  └────────────────┘                      │    │
│   │                                                                │    │
│   │   Lifecycle: Auto-move to IA after 7/14/30/60/90 days         │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                         │                                               │
│            ┌────────────┼────────────┬────────────┐                    │
│            │            │            │            │                    │
│            ▼            ▼            ▼            ▼                    │
│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│   │Mount Target │ │Mount Target │ │Mount Target │ │ Lambda      │     │
│   │  (AZ-1a)    │ │  (AZ-1b)    │ │  (AZ-1c)    │ │ (via VPC)   │     │
│   └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘     │
│         │               │               │               │              │
│      EC2-1           EC2-2           EC2-3          Lambda            │
│     (/mnt/efs)     (/mnt/efs)      (/mnt/efs)     Functions           │
│                                                                        │
│   ┌─────────────────────────────────────────────────────────────┐     │
│   │  Performance Modes:                                          │     │
│   │  • General Purpose: Low latency (default)                    │     │
│   │  • Max I/O: Higher latency, higher throughput (big data)    │     │
│   │                                                              │     │
│   │  Throughput Modes:                                           │     │
│   │  • Bursting: Scales with size (50 MB/s per TB)              │     │
│   │  • Provisioned: Set fixed throughput (1-3000+ MB/s)         │     │
│   │  • Elastic: Auto-scales (up to 10+ GB/s reads)              │     │
│   └─────────────────────────────────────────────────────────────┘     │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
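The lifecycle setting shown in the diagram is configured per file system. Here is a minimal sketch. It assumes a boto3 EFS client (`boto3.client('efs')`) and uses a hypothetical file system ID. It limits the choices to the day counts listed above, and it also moves files back to Standard on their first access.

```python
def enable_efs_lifecycle(efs, file_system_id, days_to_ia=30):
    """Tier idle files to Standard-IA and bring them back on first access."""
    if days_to_ia not in (7, 14, 30, 60, 90):  # day counts shown in the diagram
        raise ValueError(f"unsupported transition age: {days_to_ia} days")
    policies = [
        # Each LifecyclePolicies entry holds exactly one transition rule
        {"TransitionToIA": f"AFTER_{days_to_ia}_DAYS"},
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ]
    # Replaces any existing lifecycle configuration on the file system
    efs.put_lifecycle_configuration(
        FileSystemId=file_system_id, LifecyclePolicies=policies
    )
    return policies


# Usage (hypothetical file system ID):
# import boto3
# enable_efs_lifecycle(boto3.client("efs"), "fs-0123456789abcdef0", days_to_ia=14)
```

Moving files back on first access avoids paying the Standard-IA per-GB read charge over and over for data that has become hot again.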

EFS vs EBS vs S3

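| Feature | EFS | EBS | S3 |
|---|---|---|---|
| Type | File (NFS) | Block | Object |
| Access | Multi-AZ; many EC2 instances, containers, or Lambda functions at once | Single AZ; one instance (io1/io2 Multi-Attach allows several in the same AZ) | Global, over HTTP(S) |
| Scaling | Automatic (petabytes) | Manual resize (gp3 up to 16 TiB, io2 Block Express up to 64 TiB) | Unlimited |
| Cost | $0.30/GB-month (Standard) | $0.08-0.125/GB-month (gp3 to io2) | $0.023/GB-month (Standard) |
| Latency | Low milliseconds | Sub-ms to single-digit ms | ~100 ms to first byte |
| Use case | Shared files | Databases, boot volumes | Static content |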
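To make the cost row concrete, this sketch prices the same amount of data at the per-GB-month list prices from the table (EFS Standard, EBS gp3, S3 Standard). It deliberately ignores request, throughput, IOPS, and transfer charges, so treat the result as a storage-only lower bound.

```python
# Storage list prices (USD per GB-month) taken from the table above
PRICE_PER_GB_MONTH = {
    "EFS Standard": 0.30,
    "EBS gp3": 0.08,
    "S3 Standard": 0.023,
}


def monthly_storage_cost(gb):
    """Storage-only monthly cost for `gb` gigabytes in each service."""
    return {name: round(gb * price, 2) for name, price in PRICE_PER_GB_MONTH.items()}


if __name__ == "__main__":
    for name, cost in monthly_storage_cost(500).items():
        print(f"{name:13s} ${cost:,.2f}")
```

One caveat the numbers hide: EBS bills the provisioned volume size whether or not it is full, while EFS and S3 bill only the bytes actually stored.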

RDS (Relational Database Service)

Managed relational databases: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora. RDS handles the undifferentiated heavy lifting of database administration — patching, backups, failover, replication — so your team can focus on schema design and query optimization. The most common mistake is treating RDS like a self-managed database: teams that manually manage backups or avoid Multi-AZ to save money end up paying much more in downtime and engineering hours when things go wrong.

RDS Architecture and Features

┌────────────────────────────────────────────────────────────────────────┐
│                      RDS Architecture                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                    RDS Instance                                │    │
│   │                                                                │    │
│   │   What AWS Manages:             What You Manage:               │    │
│   │   ✅ Hardware provisioning      ✅ Schema design               │    │
│   │   ✅ OS patching                ✅ Query optimization          │    │
│   │   ✅ Database patching          ✅ Index creation              │    │
│   │   ✅ Automated backups          ✅ Application tuning          │    │
│   │   ✅ Multi-AZ failover          ✅ Security groups             │    │
│   │   ✅ Storage autoscaling        ✅ Parameter groups            │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Supported Engines:                                                   │
│   ┌─────────────┬────────────────────────────────────────────────┐    │
│   │ MySQL       │ 5.7, 8.0 - Most popular open source           │    │
│   │ PostgreSQL  │ 11-16 - Advanced features, extensions          │    │
│   │ MariaDB     │ 10.x - MySQL fork, community driven            │    │
│   │ Oracle      │ Enterprise, Standard - BYOL or License Included│    │
│   │ SQL Server  │ Express, Web, Standard, Enterprise             │    │
│   │ Aurora      │ MySQL/PostgreSQL compatible, up to 5x faster   │    │
│   └─────────────┴────────────────────────────────────────────────┘    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Multi-AZ vs Read Replicas

┌────────────────────────────────────────────────────────────────────────┐
│                    Multi-AZ Deployment                                  │
│                   (High Availability)                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Purpose: AUTOMATIC FAILOVER for disaster recovery (HA, not scaling)  │
│                                                                         │
│   ┌─────────────────────┐  Synchronous  ┌─────────────────────┐       │
│   │    Primary DB       │◄─────────────►│    Standby DB       │       │
│   │    (AZ-1a)          │  Replication  │    (AZ-1b)          │       │
│   │                     │               │                     │       │
│   │  ✅ All reads       │               │  ❌ No reads        │       │
│   │  ✅ All writes      │               │  ❌ No writes       │       │
│   └─────────────────────┘               └─────────────────────┘       │
│            │                                    │                      │
│            │    Automatic DNS failover          │                      │
│            │    (60-120 seconds)                │                      │
│            │                                    │                      │
│            ▼                                    ▼                      │
│   ┌─────────────────────────────────────────────────────────────┐     │
│   │  Application connects to: mydb.abc123.us-east-1.rds.aws    │     │
│   │  (Single endpoint, AWS handles failover automatically)      │     │
│   └─────────────────────────────────────────────────────────────┘     │
│                                                                         │
│   When failover happens:                                               │
│   • AZ outage or instance failure                                      │
│   • Instance type change                                               │
│   • Manual failover (for testing)                                      │
│   • OS patching                                                        │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
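The "Manual failover (for testing)" trigger is a reboot with forced failover. Here is a minimal drill sketch. It assumes a boto3 RDS client and a hypothetical instance identifier. It refuses to run on a Single-AZ instance, where the same reboot would simply cause downtime.

```python
def failover_drill(rds, db_identifier):
    """Force a Multi-AZ failover and return (old primary AZ, expected new AZ)."""
    instance = rds.describe_db_instances(DBInstanceIdentifier=db_identifier)[
        "DBInstances"
    ][0]
    if not instance.get("MultiAZ"):
        raise ValueError(f"{db_identifier} is not Multi-AZ; a reboot would only cause downtime")

    # ForceFailover=True promotes the standby instead of rebooting in place
    rds.reboot_db_instance(DBInstanceIdentifier=db_identifier, ForceFailover=True)

    # The standby's AZ should become the primary AZ once the DNS flip completes
    return instance["AvailabilityZone"], instance.get("SecondaryAvailabilityZone")
```

Run it in a maintenance window and measure how long your application's connection pool takes to reconnect. That reconnect time, plus the DNS flip, is what users actually experience.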

┌────────────────────────────────────────────────────────────────────────┐
│                    Read Replicas                                        │
│                   (Read Scaling)                                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Purpose: SCALE READS by offloading queries                           │
│                                                                         │
│   ┌─────────────────────┐ Asynchronous ┌─────────────────────┐        │
│   │    Primary DB       │─────────────►│   Read Replica 1    │        │
│   │  (writes + reads)   │              │   (reads only)      │        │
│   └─────────────────────┘              └─────────────────────┘        │
│            │                                                           │
│            │              Async        ┌─────────────────────┐        │
│            └──────────────────────────►│   Read Replica 2    │        │
│                                        │   (reads only)      │        │
│                                        └─────────────────────┘        │
│                                                                         │
│   Key Points:                                                          │
│   • Up to 5 replicas (15 for Aurora)                                   │
│   • Can be cross-region (for DR or local reads)                       │
│   • Each replica has own endpoint                                      │
│   • Replication is ASYNC (eventual consistency) -- reads from          │
│     replicas may be milliseconds to seconds behind the primary.       │
│     Never read from a replica immediately after a write and expect    │
│     to see the new data. This is the most common read-replica bug.    │
│   • Can be promoted to standalone DB (for DR or migration)            │
│                                                                         │
│   Application Pattern:                                                 │
│   ┌─────────────────────────────────────────────────────────────┐     │
│   │  def get_connection(is_read_only=False):                     │     │
│   │      if is_read_only:                                        │     │
│   │          return connect("replica.abc123.us-east-1.rds.aws") │     │
│   │      return connect("primary.abc123.us-east-1.rds.aws")     │     │
│   └─────────────────────────────────────────────────────────────┘     │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
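The `get_connection` pattern above still has the stale-read bug called out in Key Points. One common mitigation is to pin a session's reads to the primary for a short window after it writes. Here is a minimal sketch. The endpoint strings and the 5-second window are assumptions; tune the window against your observed replica lag (for example, the `ReplicaLag` CloudWatch metric).

```python
import time


class ReplicaAwareRouter:
    """Choose an endpoint per query, giving one session read-your-writes."""

    def __init__(self, primary, replica, stickiness_seconds=5.0, clock=time.monotonic):
        self.primary = primary
        self.replica = replica
        self.stickiness_seconds = stickiness_seconds
        self._clock = clock          # injectable for testing
        self._last_write_at = None   # time of this session's most recent write

    def for_write(self):
        self._last_write_at = self._clock()
        return self.primary

    def for_read(self):
        # Shortly after a write, the replica may not have the new rows yet
        if (self._last_write_at is not None
                and self._clock() - self._last_write_at < self.stickiness_seconds):
            return self.primary
        return self.replica


router = ReplicaAwareRouter(
    "primary.abc123.us-east-1.rds.aws",  # hypothetical endpoints
    "replica.abc123.us-east-1.rds.aws",
)
```

Create one router per session (per user or request), not one global instance. A single shared instance would send everyone's reads to the primary after any write.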

RDS Backup and Recovery

import boto3
from datetime import datetime

rds = boto3.client('rds')

# Automated Backups (AWS Managed)
# - Daily during backup window
# - Transaction logs every 5 minutes
# - Retention: 0-35 days (0 disables)
# - Point-in-time recovery to any second

def restore_to_point_in_time(source_db: str, 
                              target_db: str,
                              restore_time: datetime):
    """Restore RDS to a specific point in time."""
    response = rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_db,
        TargetDBInstanceIdentifier=target_db,
        RestoreTime=restore_time,
        UseLatestRestorableTime=False,
        DBInstanceClass='db.t3.medium',
        PubliclyAccessible=False,
        MultiAZ=True,
    )
    return response

# Manual Snapshots
# - User-initiated
# - Kept until you delete them
# - Can share across accounts/regions

def create_manual_snapshot(db_identifier: str):
    """Create manual DB snapshot."""
    snapshot_id = f"{db_identifier}-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
    response = rds.create_db_snapshot(
        DBInstanceIdentifier=db_identifier,
        DBSnapshotIdentifier=snapshot_id,
        Tags=[
            {'Key': 'Purpose', 'Value': 'ManualBackup'},
        ]
    )
    return snapshot_id
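Because manual snapshots can be copied to other regions, a common DR step is to wait for the snapshot to finish and then copy it. Here is a minimal sketch. It assumes two boto3 RDS clients, one for the source region and one for the destination region. The copy is issued from the destination region and references the source snapshot by ARN. For an encrypted snapshot, you must also pass a KMS key that exists in the destination region.

```python
def copy_snapshot_to_region(src_rds, dst_rds, snapshot_id, source_region,
                            target_snapshot_id, kms_key_id=None):
    """Wait for a manual snapshot, then copy it into dst_rds's region."""
    # Snapshot creation is asynchronous; copying an unfinished snapshot fails
    src_rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snapshot_id)
    snapshot = src_rds.describe_db_snapshots(DBSnapshotIdentifier=snapshot_id)[
        "DBSnapshots"
    ][0]

    params = {
        "SourceDBSnapshotIdentifier": snapshot["DBSnapshotArn"],  # ARN for cross-region
        "TargetDBSnapshotIdentifier": target_snapshot_id,
        "SourceRegion": source_region,  # lets boto3 generate the presigned URL
        "CopyTags": True,
    }
    if kms_key_id:
        params["KmsKeyId"] = kms_key_id  # required when the snapshot is encrypted
    response = dst_rds.copy_db_snapshot(**params)
    return response["DBSnapshot"]["DBSnapshotIdentifier"]
```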

Aurora

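AWS’s cloud-native relational database, compatible with MySQL and PostgreSQL. AWS cites up to 5x the throughput of standard MySQL and up to 3x that of standard PostgreSQL; actual gains depend on the workload.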

Aurora Architecture Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                      Aurora Architecture                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                      Aurora Cluster                              │  │
│   │                                                                  │  │
│   │   Writer Endpoint: mydb.cluster-abc123.us-east-1.rds.aws        │  │
│   │   Reader Endpoint: mydb.cluster-ro-abc123.us-east-1.rds.aws     │  │
│   │                                                                  │  │
│   │   ┌───────────────────┐   ┌───────────────────┐                 │  │
│   │   │  Writer Instance  │   │ Reader Instance 1 │                 │  │
│   │   │    (Primary)      │   │    (Replica)      │                 │  │
│   │   │     (AZ-1a)       │   │     (AZ-1b)       │                 │  │
│   │   └─────────┬─────────┘   └─────────┬─────────┘                 │  │
│   │             │                       │                            │  │
│   │             └───────────┬───────────┘                            │  │
│   │                         │                                        │  │
│   │                         ▼                                        │  │
│   │   ┌─────────────────────────────────────────────────────────┐   │  │
│   │   │              Shared Cluster Storage                      │   │  │
│   │   │           (Aurora Storage Engine)                        │   │  │
│   │   │                                                          │   │  │
│   │   │   ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  ┌─────┐  │   │  │
│   │   │   │10GB │  │10GB │  │10GB │  │10GB │  │10GB │  │10GB │  │   │  │
│   │   │   │AZ-1a│  │AZ-1b│  │AZ-1c│  │AZ-1a│  │AZ-1b│  │AZ-1c│  │   │  │
│   │   │   └─────┘  └─────┘  └─────┘  └─────┘  └─────┘  └─────┘  │   │  │
│   │   │                                                          │   │  │
│   │   │   • 6 copies of data across 3 AZs                       │   │  │
│   │   │   • Can lose 2 copies and still write                   │   │  │
│   │   │   • Can lose 3 copies and still read                    │   │  │
│   │   │   • Auto-heals damaged segments                         │   │  │
│   │   │   • Storage auto-scales 10 GB → 128 TB                  │   │  │
│   │   └─────────────────────────────────────────────────────────┘   │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   AURORA ADVANTAGES:                                                   │
│   • Up to 5x throughput of MySQL, 3x of PostgreSQL (AWS figures)      │
│   • Up to 15 read replicas (vs 5 for RDS MySQL)                       │
│   • Failover typically < 30 seconds                                   │
│   • Continuous backup to S3 (no performance impact)                   │
│   • Point-in-time recovery to any second                              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
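The "lose 2 / lose 3" figures follow from quorum arithmetic over the six storage copies: a write needs 4 of 6 acknowledgements, and a read needs 3 of 6. The small sketch below checks those rules and tabulates fault tolerance. The quorum sizes are the ones implied by the diagram.

```python
COPIES, WRITE_QUORUM, READ_QUORUM = 6, 4, 3

# Every read quorum overlaps every write quorum, so reads see the latest write
assert READ_QUORUM + WRITE_QUORUM > COPIES
# Any two write quorums overlap, so two conflicting writes cannot both succeed
assert 2 * WRITE_QUORUM > COPIES


def can_write(lost_copies):
    return COPIES - lost_copies >= WRITE_QUORUM


def can_read(lost_copies):
    return COPIES - lost_copies >= READ_QUORUM


if __name__ == "__main__":
    for lost in range(COPIES + 1):
        print(f"lost={lost}: write={can_write(lost)}, read={can_read(lost)}")
```

With two copies per AZ, losing an entire AZ (2 copies) keeps the cluster writable. Losing an AZ plus one more copy (3 copies) still leaves it readable.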

Aurora Serverless v2

┌────────────────────────────────────────────────────────────────────────┐
│                    Aurora Serverless v2                                 │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Auto-scaling capacity based on demand                                │
│                                                                         │
│   Capacity (ACUs)                                                       │
│   128 ┤                                                                │
│   100 ┤              ████████                                          │
│    80 ┤            ██        ██                                        │
│    60 ┤          ██            ██                                      │
│    40 ┤        ██                ██                                    │
│    20 ┤──────██────────────────────██────────────                      │
│       └─────────────────────────────────────────► Time                 │
│                Peak usage period                                        │
│                                                                         │
│   Configuration:                                                        │
│   • Min ACU: 0.5, or 0 with auto-pause on supported engine versions    │
│   • Max ACU: 128 (choose based on peak needs)                          │
│   • Scales in seconds (not minutes like v1)                            │
│                                                                         │
│   Pricing (us-east-1):                                                 │
│   • $0.12/ACU-hour (Aurora MySQL)                                      │
│   • Storage: $0.10/GB-month                                            │
│   • I/O: $0.20 per million requests                                    │
│                                                                         │
│   Best For:                                                             │
│   ✅ Variable/unpredictable workloads                                  │
│   ✅ Development/test databases                                        │
│   ✅ Multi-tenant SaaS applications                                    │
│   ✅ Infrequently used applications                                    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
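Since Serverless v2 bills by ACU-hour, a daily capacity profile gives a rough compute estimate. This sketch uses the $0.12/ACU-hour us-east-1 Aurora MySQL figure above. The 24-entry profile is a hypothetical example, and storage and I/O charges are excluded.

```python
ACU_HOUR_USD = 0.12  # us-east-1 Aurora MySQL rate quoted above


def daily_compute_cost(hourly_acus):
    """Compute-only cost for one day given the average ACUs in each hour."""
    if len(hourly_acus) != 24:
        raise ValueError("expected 24 hourly ACU values")
    return round(sum(hourly_acus) * ACU_HOUR_USD, 2)


# Hypothetical profile: 2 ACUs off-peak for 20 hours, 16 ACUs for a 4-hour peak
profile = [2] * 20 + [16] * 4
print(daily_compute_cost(profile))  # 104 ACU-hours x $0.12 = 12.48
```

Compare the result with a provisioned instance's hourly price x 24 to see whether the variability actually saves money. Steady workloads often don't benefit.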

DynamoDB

Fully managed NoSQL database with single-digit millisecond latency at any scale.

DynamoDB Data Model

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Data Model                                  │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   TABLE: Orders                                                         │
│   ──────────────                                                        │
│                                                                         │
│   Primary Key:                                                          │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  PARTITION KEY (HASH)    │  SORT KEY (RANGE)                    │  │
│   │  Required, determines    │  Optional, enables                   │  │
│   │  data distribution       │  range queries                       │  │
│   │                          │                                       │  │
│   │  customer_id             │  order_date                          │  │
│   │  "CUST#12345"           │  "2024-01-15T10:30:00Z"               │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Item (Document):                                                      │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │ {                                                                │  │
│   │   "customer_id": "CUST#12345",      // Partition key            │  │
│   │   "order_date": "2024-01-15T10:30:00Z", // Sort key             │  │
│   │   "order_id": "ORD#98765",          // Attribute                │  │
│   │   "total": 299.99,                  // Attribute                │  │
│   │   "items": [                        // Nested list              │  │
│   │     {"sku": "ABC123", "qty": 2},                                │  │
│   │     {"sku": "XYZ789", "qty": 1}                                 │  │
│   │   ],                                                             │  │
│   │   "status": "SHIPPED"                                           │  │
│   │ }                                                                │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   KEY DESIGN PATTERNS:                                                 │
│   ────────────────────                                                 │
│   1. One-to-Many: Use composite sort key                              │
│      PK: USER#123, SK: ORDER#001, ORDER#002, PROFILE                  │
│                                                                         │
│   2. Many-to-Many: Use GSI with inverted index                        │
│      PK: EMPLOYEE#1, SK: PROJECT#A                                    │
│      GSI: PK: PROJECT#A, SK: EMPLOYEE#1                               │
│                                                                         │
│   3. Hierarchical: Use sort key prefixes                              │
│      SK: COUNTRY#USA#STATE#CA#CITY#LA                                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
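The key patterns above are just string conventions. Here is a minimal sketch of helper functions that build them. It handles two details that often cause bugs. First, numbers are zero-padded so that lexicographic sort order matches numeric order. Second, a prefix ends with `#` so that a `begins_with` query on `STATE#CA` does not also match a hypothetical `STATE#CAN`. The helper names and the `LOCATIONS` partition key in the usage comment are illustrative.

```python
def order_sk(order_number, width=6):
    # Zero-pad so ORDER#000002 sorts before ORDER#000010 (sort keys compare as strings)
    return f"ORDER#{order_number:0{width}d}"


def location_sk(*parts):
    """Build COUNTRY#USA#STATE#CA#CITY#LA from ('USA', 'CA', 'LA')."""
    labels = ("COUNTRY", "STATE", "CITY")
    if not 1 <= len(parts) <= len(labels):
        raise ValueError("expected country, optional state, optional city")
    return "#".join(f"{label}#{value}" for label, value in zip(labels, parts))


def location_prefix(*parts):
    # For partial paths: the trailing '#' stops 'STATE#CA' matching 'STATE#CAN...'
    return location_sk(*parts) + "#"


# With boto3 the prefix feeds a range query, e.g. (hypothetical PK value):
#   table.query(KeyConditionExpression=Key("PK").eq("LOCATIONS")
#               & Key("SK").begins_with(location_prefix("USA", "CA")))
```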

DynamoDB Operations (Best Practices)

import boto3
from boto3.dynamodb.conditions import Key, Attr
from decimal import Decimal
import json

# Initialize
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# =====================================
# SINGLE ITEM OPERATIONS
# =====================================

def put_item_example():
    """Create an item. put_item replaces by default; the condition makes it create-only."""
    table.put_item(
        Item={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z',
            'order_id': 'ORD#98765',
            'total': Decimal('299.99'),  # Use Decimal for numbers (float is rejected)
            'status': 'PENDING',
            'version': 1,  # Starting version for the optimistic locking below
            'items': [
                {'sku': 'ABC123', 'qty': 2, 'price': Decimal('99.99')},
            ]
        },
        # Conditional write - fail if an item with this primary key already exists
        ConditionExpression='attribute_not_exists(customer_id)'
    )

def get_item_example():
    """Read a single item by primary key."""
    response = table.get_item(
        Key={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z'
        },
        # Strongly consistent read (vs eventually consistent)
        ConsistentRead=True,
        # Only return specific attributes
        ProjectionExpression='order_id, #s, #t',
        # Placeholders avoid clashes with DynamoDB reserved words ('status' is one)
        ExpressionAttributeNames={'#s': 'status', '#t': 'total'}
    )
    return response.get('Item')

def update_item_example(expected_version: int):
    """Update specific attributes atomically, with optimistic locking."""
    response = table.update_item(
        Key={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z'
        },
        UpdateExpression='SET #s = :status, updated_at = :now ADD #v :inc',
        ExpressionAttributeNames={'#s': 'status', '#v': 'version'},
        ExpressionAttributeValues={
            ':status': 'SHIPPED',
            ':now': '2024-01-16T14:00:00Z',
            ':inc': 1,
            ':expected_version': expected_version,
        },
        # Optimistic locking: raises ConditionalCheckFailedException if another
        # writer changed the item (and bumped the version) since we read it
        ConditionExpression='#v = :expected_version',
        ReturnValues='ALL_NEW'
    )
    return response['Attributes']

# =====================================
# QUERY (Efficient - uses partition key)
# =====================================

def query_customer_orders(customer_id: str, start_date: str = None):
    """Get all orders for a customer (uses partition key)."""
    key_condition = Key('customer_id').eq(customer_id)
    
    if start_date:
        key_condition = key_condition & Key('order_date').gte(start_date)
    
    response = table.query(
        KeyConditionExpression=key_condition,
        ScanIndexForward=False,  # Descending order (newest first)
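        Limit=20,  # Caps items READ per page, before the filter is applied
        # Filter after query (use sparingly): filtered-out items still consume
        # RCUs, and a page can return fewer than 20 items
        FilterExpression=Attr('status').ne('CANCELLED')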
    )
    return response['Items']

# =====================================
# BATCH OPERATIONS
# =====================================

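def batch_write_items(items: list):
    """Write any number of items; batch_writer sends them as 25-item BatchWriteItem calls."""
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    # boto3 handles chunking, resending unprocessed items, and the final flush

def batch_get_items(keys: list):
    """Read up to 100 items (the BatchGetItem per-request limit)."""
    request = {
        'Orders': {
            'Keys': keys,
            'ProjectionExpression': 'order_id, #t, #s',
            # 'status' is a reserved word, so attribute names go through placeholders
            'ExpressionAttributeNames': {'#t': 'total', '#s': 'status'},
        }
    }
    results = []
    while request:
        response = dynamodb.batch_get_item(RequestItems=request)
        results.extend(response['Responses'].get('Orders', []))
        # Throttled or oversized (16 MB) responses leave keys in UnprocessedKeys;
        # retry them (add exponential backoff in production)
        request = response.get('UnprocessedKeys') or {}
    return results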

# =====================================
# TRANSACTIONS (ACID)
# =====================================

def transfer_with_transaction():
    """Atomic multi-item transaction."""
    dynamodb_client = boto3.client('dynamodb')
    
    dynamodb_client.transact_write_items(
        TransactItems=[
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {'account_id': {'S': 'ACC#001'}},
                    'UpdateExpression': 'SET balance = balance - :amount',
                    'ConditionExpression': 'balance >= :amount',
                    'ExpressionAttributeValues': {':amount': {'N': '100'}}
                }
            },
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {'account_id': {'S': 'ACC#002'}},
                    'UpdateExpression': 'SET balance = balance + :amount',
                    'ExpressionAttributeValues': {':amount': {'N': '100'}}
                }
            }
        ]
    )
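`transact_write_items` also accepts a `ClientRequestToken`, which makes a retried transaction idempotent. If a network timeout leaves you unsure whether the transfer committed, resending with the same token (within 10 minutes) will not debit the account twice. Below is a parameterized sketch of the transfer above, assuming the same `Accounts` table. The function name is illustrative, and the `attribute_exists` check on the credit side is an added safeguard that is not in the original.

```python
import uuid


def transfer(client, from_account, to_account, amount, request_token=None):
    """Move `amount` between accounts atomically; returns the idempotency token."""
    token = request_token or str(uuid.uuid4())  # reuse the SAME token on retries
    value = {":amount": {"N": str(amount)}}  # low-level client uses typed values
    client.transact_write_items(
        ClientRequestToken=token,
        TransactItems=[
            {"Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": from_account}},
                "UpdateExpression": "SET balance = balance - :amount",
                "ConditionExpression": "balance >= :amount",  # no overdrafts
                "ExpressionAttributeValues": value,
            }},
            {"Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": to_account}},
                "UpdateExpression": "SET balance = balance + :amount",
                # Fail the whole transaction cleanly if the destination is missing
                "ConditionExpression": "attribute_exists(account_id)",
                "ExpressionAttributeValues": value,
            }},
        ],
    )
    return token
```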

Global Secondary Indexes (GSI)

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Indexes                                     │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   BASE TABLE: Orders                                                    │
│   PK: customer_id, SK: order_date                                       │
│                                                                         │
│   Access Pattern: "Get orders by customer" ✅ Query on PK               │
│                                                                         │
│   NEW Access Pattern: "Get orders by status"                            │
│   ❌ Can't query - status is not a key!                                 │
│   ✅ Solution: Create GSI                                               │
│                                                                         │
│   GLOBAL SECONDARY INDEX (GSI):                                         │
│   ─────────────────────────────                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  GSI: orders-by-status                                          │  │
│   │  PK: status      SK: order_date                                 │  │
│   │                                                                  │  │
│   │  Projected Attributes: order_id, customer_id, total             │  │
│   │  (Can project ALL, KEYS_ONLY, or specific attributes)           │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   # Query GSI                                                           │
│   table.query(                                                          │
│       IndexName='orders-by-status',                                     │
│       KeyConditionExpression=Key('status').eq('PENDING')                │
│   )                                                                     │
│                                                                         │
│   GSI vs LSI:                                                           │
│   ┌──────────────────────────────────────────────────────────────────┐ │
│   │  GSI (Global Secondary)      │  LSI (Local Secondary)           │ │
│   │  ────────────────────────────┼───────────────────────────────── │ │
│   │  Different partition key     │  Same partition key              │ │
│   │  Create anytime              │  Create at table creation only   │ │
│   │  Own RCU/WCU                 │  Shares table RCU/WCU            │ │
│   │  Eventually consistent only  │  Strongly consistent available   │ │
│   │  Up to 20 (default quota)    │  Up to 5 per table               │ │
│   └──────────────────────────────────────────────────────────────────┘ │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
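A single query call returns at most 1 MB of data, so a status such as PENDING with many orders will span several pages. Here is a minimal pagination sketch for the GSI query above. It assumes a boto3 `Table` resource and a key condition built by the caller (for example, `Key('status').eq('PENDING')`).

```python
def query_all(table, index_name, key_condition, page_size=100):
    """Follow LastEvaluatedKey until the query is exhausted."""
    kwargs = {
        "IndexName": index_name,
        "KeyConditionExpression": key_condition,
        "Limit": page_size,  # items per page (each page is also capped at 1 MB)
    }
    items = []
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        kwargs["ExclusiveStartKey"] = last_key  # resume where the last page stopped


# Usage (requires boto3):
# from boto3.dynamodb.conditions import Key
# pending = query_all(table, "orders-by-status", Key("status").eq("PENDING"))
```

For very large result sets, yield each page instead of accumulating everything in memory. Because GSI reads are eventually consistent, an order that just changed status may briefly still appear under its old value.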

DynamoDB Capacity and Pricing

┌────────────────────────────────────────────────────────────────────────┐
│                    DynamoDB Capacity Modes                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ON-DEMAND MODE:                                                       │
│   ───────────────                                                       │
│   • Pay per request ($0.625/million writes, $0.125/million reads)      │
│   • No capacity planning                                               │
│   • Instantly absorbs up to 2x the previous traffic peak               │
│   • Best for: Unpredictable traffic, new applications                  │
│                                                                         │
│   PROVISIONED MODE:                                                     │
│   ─────────────────                                                     │
│   • Pre-allocate RCU (Read Capacity Units) and WCU (Write Capacity)    │
│   • Cheaper for steady traffic (~3.5x at 100% utilization)             │
│   • Auto Scaling available                                              │
│   • Reserved Capacity for 1-3 years (up to 70% discount)               │
│                                                                         │
│   CAPACITY UNITS:                                                       │
│   ───────────────                                                       │
│   1 RCU = 1 strongly consistent read/sec (up to 4 KB)                  │
│         = 2 eventually consistent reads/sec (up to 4 KB)               │
│   1 WCU = 1 write/sec (up to 1 KB)                                     │
│                                                                         │
│   Example Calculation:                                                  │
│   ────────────────────                                                  │
│   100 reads/sec × 8 KB items × strongly consistent                     │
│   = 100 × (8 KB / 4 KB) × 1 = 200 RCU                                  │
│                                                                         │
│   50 writes/sec × 3 KB items                                           │
│   = 50 × ceil(3 KB / 1 KB) = 150 WCU                                   │
│                                                                         │
│   COST COMPARISON (us-east-1):                                         │
│   ────────────────────────────                                         │
│   On-Demand: $0.125/million reads, $0.625/million writes               │
│   Provisioned: $0.00013/RCU-hr, $0.00065/WCU-hr                        │
│                                                                         │
│   Break-even depends on utilization, not raw volume:                   │
│   $0.00065 / ($0.625 / 1M) = 1,040 writes per WCU-hour                 │
│   $0.00013 / ($0.125 / 1M) = 1,040 reads per RCU-hour                  │
│   A unit supports 3,600 ops/hr, so break-even ≈ 29% utilization        │
│   Below ~29% average utilization → On-Demand cheaper                   │
│   Above ~29% average utilization → Provisioned cheaper                 │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
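The capacity-unit rules can be wrapped in two helpers for sizing provisioned tables. The sketch below reproduces the example calculation. Item sizes round up to the next 4 KB for reads or 1 KB for writes, and eventually consistent reads cost half as much.

```python
import math


def rcu_needed(reads_per_sec, item_kb, strongly_consistent=True):
    """Read capacity units for a steady read rate of items of item_kb KB."""
    units_per_read = math.ceil(item_kb / 4)  # 1 RCU covers up to 4 KB
    total = reads_per_sec * units_per_read
    # Eventually consistent reads cost half a unit each
    return total if strongly_consistent else math.ceil(total / 2)


def wcu_needed(writes_per_sec, item_kb):
    """Write capacity units for a steady write rate of items of item_kb KB."""
    return writes_per_sec * math.ceil(item_kb)  # 1 WCU covers up to 1 KB


print(rcu_needed(100, 8))          # 200, matching the example above
print(rcu_needed(100, 8, False))   # 100
print(wcu_needed(50, 3))           # 150
```

Transactional reads and writes (TransactGetItems / TransactWriteItems) consume twice these amounts, so size tables that use transactions accordingly.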

ElastiCache

Managed in-memory caching for sub-millisecond response times, with Valkey, Redis OSS, or Memcached as the engine.
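The cache-aside (lazy loading) flow in the diagram below fits in a few lines. Here is a minimal sketch. It assumes a redis-py style client (`get` and `setex`) and a caller-supplied database lookup function. The key format, the 300-second TTL, the hostname, and `load_product_from_db` are all illustrative.

```python
import json


def get_cached(cache, db_lookup, product_id, ttl_seconds=300):
    """Cache-aside read: try the cache, fall back to the database, then populate."""
    key = f"product:{product_id}"  # hypothetical key scheme
    cached = cache.get(key)  # redis-py returns bytes or None
    if cached is not None:
        return json.loads(cached)  # cache HIT

    value = db_lookup(product_id)  # cache MISS: query the database
    if value is not None:
        # TTL bounds how long stale data can be served after the DB changes
        cache.setex(key, ttl_seconds, json.dumps(value))
    return value


# Usage (requires a reachable Redis/Valkey endpoint; names are hypothetical):
# import redis
# cache = redis.Redis(host="my-cache.xxxxxx.cache.amazonaws.com", port=6379)
# product = get_cached(cache, load_product_from_db, "ABC123")
```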

ElastiCache Architecture

┌────────────────────────────────────────────────────────────────────────┐
│                    ElastiCache Architecture                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   APPLICATION CACHING PATTERN:                                         │
│   ────────────────────────────                                         │
│                                                                         │
│   ┌────────────┐                                                       │
│   │   Client   │                                                       │
│   └──────┬─────┘                                                       │
│          │ 1. Request                                                   │
│          ▼                                                              │
│   ┌────────────┐      2. Check cache     ┌─────────────────────┐       │
│   │ Application│ ────────────────────► │   ElastiCache       │       │
│   │   Server   │                        │   (Redis/Memcached) │       │
│   └──────┬─────┘ ◄──────────────────── └─────────────────────┘       │
│          │        3a. Cache HIT                     │                  │
│          │           (return)            3b. Cache MISS                │
│          │                                          │                  │
│          │ 4. Query if miss              ┌──────────┘                  │
│          ▼                               ▼                              │
│   ┌─────────────────────────────────────────────────┐                  │
│   │              Database (RDS/DynamoDB)             │                  │
│   └─────────────────────────────────────────────────┘                  │
│                               │                                         │
│                               │ 5. Return data                          │
│                               ▼                                         │
│   ┌────────────────────────────────────────────────────────────────┐   │
│   │  Application: Store in cache with TTL, return to client        │   │
│   └────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Redis vs Memcached Comparison

| Feature | Redis | Memcached |
|---|---|---|
| Data structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams | Simple key-value only |
| Persistence | Yes (snapshot backups) | No |
| Replication | Yes (Multi-AZ) | No |
| Pub/Sub | Yes | No |
| Lua scripting | Yes | No |
| Sharding | Yes (cluster mode) | Yes (client-side, across nodes) |
| Threading | Single-threaded command execution (per shard) | Multi-threaded |
| Use case | Sessions, leaderboards, queues | Simple caching, high throughput |

Caching Strategies

import redis
import json
from datetime import timedelta

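# Connect to ElastiCache Redis (the endpoint below is an example).
# `db` and `write_queue` are placeholders for your own data-access layer
# and message queue; they are not defined in this snippet.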
redis_client = redis.Redis(
    host='my-cluster.abc123.cache.amazonaws.com',
    port=6379,
    ssl=True,
    decode_responses=True
)

# ======================
# CACHE-ASIDE (Lazy Loading)
# ======================
def get_user_with_cache_aside(user_id: str) -> dict:
    """
    Cache-Aside: Application manages cache
    
    Pros: Only requested data cached, cache failures don't break app
    Cons: Cache miss = slow, data can be stale
    """
    cache_key = f"user:{user_id}"
    
    # 1. Try cache first
    cached = redis_client.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # 2. Cache miss - query database
    user = db.query_user(user_id)  # Your DB call
    
    # 3. Store in cache with TTL
    redis_client.setex(
        cache_key,
        timedelta(hours=1),  # TTL
        json.dumps(user)
    )
    
    return user

# ======================
# WRITE-THROUGH
# ======================
def update_user_write_through(user_id: str, data: dict) -> dict:
    """
    Write-Through: Write to the database, then update the cache in the same call
    
    Pros: Reads see fresh data after every write that goes through this path
    Cons: Higher write latency (two writes), cache churn for data never read
    """
    cache_key = f"user:{user_id}"
    
    # 1. Write to database
    user = db.update_user(user_id, data)
    
    # 2. Write to cache
    redis_client.setex(
        cache_key,
        timedelta(hours=1),
        json.dumps(user)
    )
    
    return user

# ======================
# WRITE-BEHIND (Write-Back)
# ======================
def update_user_write_behind(user_id: str, data: dict) -> dict:
    """
    Write-Behind: Write to cache, async write to database
    
    Pros: Fast writes, reduced database load
    Cons: Data loss risk, complex implementation
    """
    cache_key = f"user:{user_id}"
    
    # 1. Write to cache immediately
    user = {**get_user_with_cache_aside(user_id), **data}
    redis_client.setex(cache_key, timedelta(hours=1), json.dumps(user))
    
    # 2. Queue database write (async)
    write_queue.send({'user_id': user_id, 'data': data})
    
    return user

# ======================
# CACHE INVALIDATION
# ======================
def invalidate_user_cache(user_id: str):
    """Delete cache entry when data changes."""
    redis_client.delete(f"user:{user_id}")

def invalidate_pattern(pattern: str):
    """Delete all keys matching pattern (use with caution: SCAN walks the entire keyspace)."""
    cursor = 0
    while True:
        cursor, keys = redis_client.scan(cursor, match=pattern, count=100)
        if keys:
            redis_client.delete(*keys)
        if cursor == 0:
            break
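The write-behind function above only enqueues the database write; the consumer side is left undefined. A minimal sketch of that consumer, assuming an in-process `queue.Queue` and a dict standing in for the database (both hypothetical; a production setup would typically use a durable queue such as SQS plus a worker, and `enqueue_write` here replaces the placeholder `write_queue.send`), shows how write-behind can coalesce several cache updates into one database write per key.

```python
import queue
from typing import Dict

# Hypothetical stand-ins: a dict plays the database, queue.Queue plays the write queue.
database: Dict[str, dict] = {}
write_queue: "queue.Queue[dict]" = queue.Queue()


def enqueue_write(user_id: str, data: dict) -> None:
    """Producer side: record a change after the cache has already been updated."""
    write_queue.put({"user_id": user_id, "data": data})


def flush_writes(max_batch: int = 100) -> int:
    """Worker side: drain up to max_batch queued changes into the database.

    Changes for the same user are merged in arrival order (last write wins),
    so N queued updates to one user become a single database write.
    Returns the number of queued messages consumed.
    """
    pending: Dict[str, dict] = {}
    consumed = 0
    while consumed < max_batch:
        try:
            msg = write_queue.get_nowait()
        except queue.Empty:
            break
        pending.setdefault(msg["user_id"], {}).update(msg["data"])
        consumed += 1
    for user_id, changes in pending.items():
        # One write per user; anything still queued is lost if the process dies
        database[user_id] = {**database.get(user_id, {}), **changes}
    return consumed
```

The in-memory queue makes the data-loss risk from the docstring concrete: any change not yet flushed disappears with the process, which is why real write-behind setups rely on a durable queue.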

🎯 Interview Questions

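Q: S3 vs EBS vs EFS: when would you use each?
S3 (Object Storage):
  • Static files, backups, data lakes
  • HTTP access from anywhere
  • Virtually unlimited storage, 11 9s durability
EBS (Block Storage):
  • EC2 boot volumes, databases
  • Attached to one EC2 instance in one AZ (io1/io2 Multi-Attach aside)
  • Low latency (sub-ms to single-digit ms), resizable
EFS (File Storage):
  • Shared file system mounted by many EC2 instances over NFS
  • Multi-AZ, POSIX compliant
  • Capacity scales automatically, Linux clients only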
Decision Matrix:
  • Need HTTP access globally → S3
  • Need database storage → EBS
  • Need shared NFS mount → EFS
Q: Design a DynamoDB single-table schema for a social media app
Access Patterns:
  1. Get user profile
  2. Get user’s posts (sorted by date)
  3. Get post’s comments
  4. Get user’s followers
  5. Get posts by hashtag
Single Table Design:
PK: USER#123, SK: PROFILE       → User profile
PK: USER#123, SK: POST#2024...  → User's posts
PK: POST#456, SK: COMMENT#...   → Post comments
PK: USER#123, SK: FOLLOWER#789  → Followers of user 123

GSI1: hashtag, created_at       → Posts by hashtag
GSI2: post_id, created_at       → Comments on post
Key Points:
  • Denormalize for read performance
  • Use composite sort keys
  • Create GSIs for access patterns
  • Use sparse indexes where appropriate
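To see why composite sort keys matter, here is a sketch that emulates a DynamoDB `Query` (partition-key equality plus `begins_with` on the sort key) over an in-memory list. The helper names and the `POST#<timestamp>#<id>` sort-key layout are illustrative assumptions, not a DynamoDB API. With boto3 the same lookup would be a `Query` using `Key('PK').eq(...) & Key('SK').begins_with(...)`, with `ScanIndexForward=False` for newest-first.

```python
from typing import Dict, List

Item = Dict[str, str]  # hypothetical in-memory stand-in for a DynamoDB item


def user_profile(user_id: str, name: str) -> Item:
    return {"PK": f"USER#{user_id}", "SK": "PROFILE", "name": name}


def user_post(user_id: str, created_at: str, post_id: str) -> Item:
    # ISO-8601 timestamps sort lexicographically, so sort-key order is date order
    return {"PK": f"USER#{user_id}", "SK": f"POST#{created_at}#{post_id}"}


def post_comment(post_id: str, created_at: str, comment_id: str) -> Item:
    return {"PK": f"POST#{post_id}", "SK": f"COMMENT#{created_at}#{comment_id}"}


def query(table: List[Item], pk: str, sk_prefix: str = "", newest_first: bool = False) -> List[Item]:
    """Emulates Query: PK equality plus begins_with(SK, prefix), sorted by SK."""
    matches = [item for item in table if item["PK"] == pk and item["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda item: item["SK"], reverse=newest_first)
```

One partition (`USER#123`) answers both "get profile" and "get posts sorted by date" with a single key condition each, which is the point of denormalizing into one table.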
Q: RDS Multi-AZ vs Read Replicas
Multi-AZ (High Availability):
  • Purpose: High availability within a Region (not cross-Region DR)
  • Synchronous replication
  • Automatic failover (60-120s)
  • Standby NOT readable (Multi-AZ DB instance deployment)
  • Same region only
Read Replicas (Read Scaling):
  • Purpose: Performance/offload reads
  • Asynchronous replication
  • Manual promotion (becomes a standalone DB instance)
  • Replicas ARE readable
  • Can be cross-region
Best Practice: Use BOTH
  • Multi-AZ for production HA
  • Read replicas for read scaling
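Read replicas only help if the application actually sends reads to them. A minimal sketch, assuming hypothetical endpoint hostnames and a naive `SELECT`-prefix check to classify statements (real drivers or proxies classify more robustly), shows the routing logic: writes go to the primary, reads rotate across replicas, and reads that must see the latest write stay on the primary because asynchronous replication can lag. The Multi-AZ standby does not appear here because it serves no reads; on failover, RDS repoints the primary's DNS endpoint to it.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator, List


@dataclass
class DbRouter:
    """Routes writes to the primary and round-robins reads across read replicas."""
    primary: str
    replicas: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # With no replicas configured, reads fall back to the primary
        self._read_pool: Iterator[str] = cycle(self.replicas or [self.primary])

    def endpoint_for(self, sql: str, needs_fresh_read: bool = False) -> str:
        # Naive classification: anything that does not start with SELECT is a write
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or needs_fresh_read:
            # Replicas lag the primary (asynchronous replication), so
            # read-your-own-write queries must go to the primary
            return self.primary
        return next(self._read_pool)
```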
Q: How do you handle cache invalidation?
Strategies:
  1. TTL-Based (Simplest)
    • Set expiration on cache entries
    • Accept eventual consistency
  2. Event-Driven
    • Publish events on data changes
    • Consumers invalidate cache
  3. Write-Through
    • Update cache on every write
    • Fresh after every write through the cache path, but slower writes
  4. Cache-Aside with Versioning
    • Include version in cache key
    • Bump version on update
Code Pattern:
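import json
import boto3
sns = boto3.client("sns")  # `db`, `cache`, and `topic_arn` below are placeholders
def update_product(product_id, data):
    # Update the database (source of truth) first
    db.update(product_id, data)
    # Invalidate cache so the next read repopulates it
    cache.delete(f"product:{product_id}")
    # Or publish an event so other consumers invalidate their own caches
    sns.publish(
        TopicArn=topic_arn,
        Message=json.dumps({"product_id": product_id}),
    )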
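Strategy 4 (cache-aside with versioning) can be sketched without a Redis server by using dicts as stand-ins. The class and method names are hypothetical. With Redis, the version counter would typically live in its own key updated with `INCR`, and entries under old versions are left to expire via TTL rather than being deleted.

```python
from typing import Callable, Dict


class VersionedCache:
    """Cache-aside with versioned keys: bumping a version makes old entries unreachable."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}      # stand-in for Redis GET/SET
        self._versions: Dict[str, int] = {}  # per-entity version counters (like Redis INCR)

    def _key(self, entity: str) -> str:
        return f"{entity}:v{self._versions.get(entity, 0)}"

    def get(self, entity: str, load: Callable[[], str]) -> str:
        key = self._key(entity)
        if key not in self._data:  # miss: load from the source of truth
            self._data[key] = load()
        return self._data[key]

    def bump(self, entity: str) -> None:
        """Call after a write; readers now build a new key and miss once."""
        self._versions[entity] = self._versions.get(entity, 0) + 1
```

The trade-off is an extra lookup (the version) per read in exchange for never having to find and delete stale keys.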
Q: How would you optimize S3 costs for data that cools over time?
Strategy: Tiered Storage with Lifecycle
  1. Hot Data (Recent 30 days): S3 Standard
  2. Warm Data (30-90 days): S3 Standard-IA
  3. Cold Data (90-365 days): S3 Glacier Flexible Retrieval
  4. Archive (1+ year): S3 Glacier Deep Archive
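Lifecycle Policy (objects expire after 2,555 days, about 7 years):
{
  "Rules": [{
    "ID": "tiered-archive",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}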
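Cost Estimate (100 TB, us-east-1, storage only):
  • All Standard: ~$2,300/month
  • With lifecycle: roughly $200/month (~91% savings) once most data has aged into Deep Archive; retrieval and transition fees are extra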
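The lifecycle estimate depends on how much of the data has aged into the colder tiers. A sketch of the calculation, assuming data arrives at a steady rate (so object ages are spread evenly over the 7-year retention), 100 TB treated as 100,000 GB to match the $2,300 figure, and the assumed us-east-1 per-GB list prices in `PRICE_PER_GB_MONTH`. Request, transition, and retrieval fees are deliberately left out.

```python
from typing import Dict, List, Tuple

# Assumed us-east-1 list prices, USD per GB-month (first Standard tier only).
# Request, transition, retrieval, and minimum-duration charges are ignored.
PRICE_PER_GB_MONTH: Dict[str, float] = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.0036,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 0.00099,
}

# (age in days at which the tier starts, storage class), mirroring the lifecycle rule
LIFECYCLE_TIERS: List[Tuple[int, str]] = [
    (0, "STANDARD"),
    (30, "STANDARD_IA"),
    (90, "GLACIER"),
    (365, "DEEP_ARCHIVE"),
]


def monthly_cost(total_gb: float, retention_days: int,
                 tiers: List[Tuple[int, str]] = LIFECYCLE_TIERS) -> float:
    """Steady-state monthly storage cost when object ages are spread evenly over retention."""
    cost = 0.0
    for i, (start, storage_class) in enumerate(tiers):
        end = tiers[i + 1][0] if i + 1 < len(tiers) else retention_days
        # Fraction of all stored bytes whose age falls inside this tier's window
        share = max(0, min(end, retention_days) - start) / retention_days
        cost += total_gb * share * PRICE_PER_GB_MONTH[storage_class]
    return cost
```

Under these assumptions `monthly_cost(100_000, 2555)` comes to roughly $180/month, against about $2,300 with `tiers=[(0, "STANDARD")]`. A younger bucket, with most objects under a year old, keeps less in Deep Archive, so its savings are smaller.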

🧪 Hands-On Lab: Build a Caching Layer

Objective: Add Redis caching to reduce database load by 80%
  1. Create ElastiCache Redis Cluster: use a cache.t3.micro node for testing and enable encryption in transit.
  2. Modify Security Groups: allow inbound TCP 6379 from the application servers' security group.
  3. Implement Cache-Aside Pattern: add a caching layer to your application.
  4. Add TTL and Invalidation: set appropriate TTLs and invalidate entries on writes.
  5. Monitor with CloudWatch: track cache hit ratio (CacheHitRate) and memory usage (DatabaseMemoryUsagePercentage).
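The cache-aside, TTL, and invalidation steps of the lab can be rehearsed locally before touching a cluster. This sketch (class name hypothetical, with an injectable clock so expiry can be tested) counts hits and misses the way ElastiCache's `CacheHits` and `CacheMisses` CloudWatch metrics do, so hit ratio = hits / (hits + misses). Every miss is one database query, so the 80% load-reduction objective corresponds to a sustained hit ratio of about 0.8 on the cached reads.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for cache-aside against ElastiCache, with hit/miss counters."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, object]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, load: Callable[[], object]) -> object:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self.clock():
            self.hits += 1
            return entry[1]
        self.misses += 1  # each miss costs one database query
        value = load()
        self._store[key] = (self.clock() + self.ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop the entry after a write so the next read reloads it."""
        self._store.pop(key, None)

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```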

Storage Comparison Summary

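| Service | Type | Durability / availability | Latency | Best For |
|---|---|---|---|---|
| S3 | Object | 11 9s durability | ~100 ms (first byte) | Files, backups, static sites |
| EBS | Block | 99.8-99.9% durability (gp3), 99.999% (io2) | sub-ms (io2) to single-digit ms (gp3) | EC2 volumes, databases |
| EFS | File | 11 9s durability | ~ms | Shared file systems |
| DynamoDB | NoSQL | Replicated across 3 AZs, 99.99% availability SLA | single-digit ms | Key-value, high scale |
| RDS/Aurora | SQL | 99.95% availability SLA (Multi-AZ) | ~ms | Relational, complex queries |
| ElastiCache | In-memory | N/A (cache, not a system of record) | sub-ms | Caching, sessions |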

Next Module

Networking

Master VPC, subnets, security groups, and load balancers