Module Overview
Estimated Time: 5-6 hours | Difficulty: Intermediate | Prerequisites: Core Concepts, Compute
- S3 storage classes, lifecycle policies, and security
- EBS volumes and snapshots for EC2
- EFS for shared file systems
- RDS and Aurora for relational databases
- DynamoDB for NoSQL at scale
- ElastiCache for sub-millisecond response times
- Data migration strategies
Storage Service Selection Guide
┌────────────────────────────────────────────────────────────────────────┐
│ AWS Storage Decision Tree │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ What type of data access pattern? │
│ │ │
│ ├─── Random access to blocks ──────────► EBS (EC2 volumes) │
│ │ (Database, boot volumes) │ │
│ │ └── gp3: General │
│ │ └── io2: High IOPS │
│ │ │
│ ├─── Object/file storage ──────────────► S3 (Object Storage) │
│ │ (Any size, any format) │ │
│ │ └── Standard: Hot │
│ │ └── IA: Warm │
│ │ └── Glacier: Cold │
│ │ │
│ ├─── Shared file system (Linux) ───────► EFS (NFS) │
│ │ (Multiple EC2, containers) │
│ │ │
│ └─── Shared file system (Windows) ─────► FSx for Windows │
│ (AD integration, SMB) │
│ │
│ DATABASE Decision: │
│ ────────────────── │
│ │ │
│ ├─── Relational, complex queries ──────► RDS / Aurora │
│ │ (Joins, transactions, ACID) │
│ │ │
│ ├─── Key-value, high scale ────────────► DynamoDB │
│ │ (Single-digit ms, serverless) │
│ │ │
│ ├─── Document store ───────────────────► DocumentDB (MongoDB) │
│ │ │
│ ├─── Graph relationships ──────────────► Neptune │
│ │ │
│ └─── Caching layer ────────────────────► ElastiCache │
│ (Redis/Memcached) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Storage Types Comparison
┌───────────────────────────────────────────────────────────────────────┐
│ AWS Storage Types │
├───────────────────────────────────────────────────────────────────────┤
│ │
│ BLOCK STORAGE OBJECT STORAGE FILE STORAGE │
│ ───────────── ───────────── ───────────── │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ EBS │ │ S3 │ │ EFS │ │
│ │ (Volumes) │ │ (Buckets) │ │ (NFS) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ Characteristics: Characteristics: Characteristics: │
│ • Like a hard disk • Like Dropbox • Like network share │
│ • Attach to 1 EC2 • HTTP access • Multiple EC2s │
│ • Single AZ • Global access • Multi-AZ │
│ • Low latency • Unlimited size • POSIX compliant │
│ • Boot volumes • 11 9s durability • Auto-scaling │
│ │
│ Use Cases: Use Cases: Use Cases: │
│ • Databases • Static websites • Content management │
│ • Enterprise apps • Data lakes • Web serving │
│ • Boot/root volumes • Backups/archives • Container storage │
│ • High-perf workloads • Media hosting • Dev environments │
│ │
│ Durability & Performance: │
│ ───────────────────────── │
│ EBS: 99.8-99.9% (io2: 99.999%), single AZ, <1ms latency │
│ S3: 99.999999999% (11 9s), ~100ms latency │
│ EFS: 99.999999999%, few ms latency │
│ │
└───────────────────────────────────────────────────────────────────────┘
S3 (Simple Storage Service)
Object storage with unlimited capacity, and the backbone of AWS storage and data lakes. S3 is deceptively simple on the surface: just PUT and GET objects. Underneath, it powers some of the largest data platforms in the world and has more configuration knobs than almost any other AWS service. The biggest cost mistake teams make with S3 is skipping lifecycle policies: storing 5 years of logs in S3 Standard instead of Glacier Deep Archive can cost roughly 20x more than necessary.
S3 Architecture Deep Dive
┌──────────────────────────────────────────────────────────────────────┐
│ S3 Architecture │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ Bucket: my-company-data-prod (globally unique name) │
│ Region: us-east-1 │
│ ───────────────────────────────────────────────────────────────── │
│ │
│ Object Structure: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Key: images/2024/01/photo.jpg │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Value: [binary data up to 5 TB] │ │ │
│ │ │ │ │ │
│ │ │ Metadata: │ │ │
│ │ │ • System: Content-Type, Last-Modified, ETag │ │ │
│ │ │ • User-defined: x-amz-meta-* │ │ │
│ │ │ │ │ │
│ │ │ Version ID: 3sL4kqtJlcpXroDTDmJ+rmSpXd3dIbrHY+MTRCxf3 │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Limits & Capabilities: │
│ • Max object size: 5 TB │
│ • Max single PUT: 5 GB (use multipart for larger) │
│ • Max parts in multipart: 10,000 │
│ • Recommended multipart for > 100 MB │
│ • Unlimited objects per bucket │
│ • Bucket names: 3-63 chars, globally unique │
│ │
│ Data Consistency (as of Dec 2020): │
│ • Strong read-after-write consistency for all operations │
│ • No eventual consistency delays anymore! │
│ │
└──────────────────────────────────────────────────────────────────────┘
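The multipart limits in the diagram imply a minimum part size for very large objects: with at most 10,000 parts, a 5 TB object cannot be uploaded in 8 MB parts. A minimal sketch of that arithmetic follows. The helper name `plan_multipart` is hypothetical, and the 5 MiB minimum part size is an S3 limit that the diagram does not list.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TIB   # max object size
MAX_PARTS = 10_000          # max parts in one multipart upload
MIN_PART_SIZE = 5 * MIB     # S3 minimum part size (the last part may be smaller)


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact for large byte counts."""
    return -(-a // b)


def plan_multipart(object_size: int, preferred_part_size: int = 8 * MIB) -> tuple:
    """Return (part_size, part_count) that stays within the multipart limits."""
    if not 0 < object_size <= MAX_OBJECT_SIZE:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    # Smallest part size that keeps the upload within 10,000 parts
    floor_for_part_count = ceil_div(object_size, MAX_PARTS)
    part_size = max(preferred_part_size, MIN_PART_SIZE, floor_for_part_count)
    return part_size, ceil_div(object_size, part_size)


if __name__ == "__main__":
    for size in (100 * MIB, 1 * TIB, 5 * TIB):
        part_size, parts = plan_multipart(size)
        print(f"{size / MIB:>12,.0f} MiB -> {parts:>6} parts of {part_size / MIB:,.1f} MiB")
```

boto3's managed transfers choose part sizes for you, so a calculation like this matters mainly when you drive the multipart API by hand.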
S3 Storage Classes Deep Dive
┌────────────────────────────────────────────────────────────────────────┐
│ S3 Storage Classes │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Storage Class │ Access │ Min Storage │ Retrieval │ Cost │
│ ─────────────────────────────────────────────────────────────────────│
│ S3 Standard │ Milliseconds │ None │ None │ $$$$ │
│ ─────────────────────────────────────────────────────────────────────│
│ S3 Standard-IA │ Milliseconds │ 30 days │ Per GB │ $$ │
│ ─────────────────────────────────────────────────────────────────────│
│ S3 One Zone-IA │ Milliseconds │ 30 days │ Per GB │ $ │
│ ─────────────────────────────────────────────────────────────────────│
│ S3 Intelligent │ Milliseconds │ None │ None │ Auto │
│ Tiering │ │ │ monitoring│ │
│ ─────────────────────────────────────────────────────────────────────│
│ Glacier Instant │ Milliseconds │ 90 days │ Per GB │ ¢¢ │
│ Retrieval │ │ │ │ │
│ ─────────────────────────────────────────────────────────────────────│
│ Glacier Flexible │ 1-5 min to │ 90 days │ Per GB + │ ¢ │
│ Retrieval │ 12 hours │ │ per req │ │
│ ─────────────────────────────────────────────────────────────────────│
│ Glacier Deep │ 12-48 hours │ 180 days │ Per GB + │ ¢ │
│ Archive │ │ │ per req │ │
│ ─────────────────────────────────────────────────────────────────────│
│ │
│ Cost Comparison (per GB/month, us-east-1): │
│ Standard: $0.023 │
│ Standard-IA: $0.0125 (+ $0.01/GB retrieval) │
│ One Zone-IA: $0.01 (+ $0.01/GB retrieval) │
│ Glacier IR: $0.004 (+ $0.03/GB retrieval) │
│ Glacier FR: $0.0036 (+ $0.01-$0.03/GB retrieval) │
│ Deep Archive: $0.00099 (+ $0.02/GB retrieval) │
│ │
│ USE INTELLIGENT TIERING when access patterns are unknown! │
│ It automatically moves objects between tiers based on access. │
│ │
└────────────────────────────────────────────────────────────────────────┘
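To make the price table concrete, here is a small sketch that turns the per-GB figures above into monthly storage costs. It covers storage only: retrieval, request, and minimum-duration charges are deliberately left out, and the function name is illustrative.

```python
# Storage prices per GB-month (us-east-1), copied from the table above
PRICE_PER_GB = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "ONEZONE_IA": 0.01,
    "GLACIER_IR": 0.004,
    "GLACIER": 0.0036,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(size_gb: float, storage_class: str) -> float:
    """Storage-only monthly cost in USD; excludes retrieval and request fees."""
    return size_gb * PRICE_PER_GB[storage_class]


if __name__ == "__main__":
    size_gb = 50_000  # 50 TB, counted as 1 TB = 1,000 GB
    for storage_class in PRICE_PER_GB:
        cost = monthly_storage_cost(size_gb, storage_class)
        print(f"{storage_class:<13} ${cost:>9,.2f}/month")
```

For infrequently read data the ordering shown here holds, but a class with a per-GB retrieval fee can end up costing more than Standard if the data is read often.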
S3 Lifecycle Policies
# Example lifecycle policy configuration
import boto3

s3 = boto3.client('s3')

# Lifecycle policies are the single most impactful cost optimization for S3.
# A real-world example: a company storing 50 TB of logs at S3 Standard costs
# ~$1,150/month. Moving 30-day-old logs to Glacier cuts that to ~$180/month.
# Adding Deep Archive after 1 year brings long-term storage to ~$50/month.
lifecycle_policy = {
    "Rules": [
        {
            "ID": "MoveToIA30Days",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {
                    "Days": 30,
                    "StorageClass": "STANDARD_IA"
                },
                {
                    "Days": 90,
                    "StorageClass": "GLACIER"
                },
                {
                    "Days": 365,
                    "StorageClass": "DEEP_ARCHIVE"
                }
            ],
            "Expiration": {
                "Days": 2555  # Delete after 7 years
            }
        },
        {
            # This rule is critical and often overlooked. Failed multipart
            # uploads leave invisible partial data in your bucket that you
            # still pay for. Some teams discover gigabytes of orphaned parts
            # eating their S3 budget. 7 days is generous -- most uploads
            # either complete in minutes or never will.
            "ID": "DeleteIncompleteMultipartUploads",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 7
            }
        },
        {
            "ID": "ExpireOldVersions",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionTransitions": [
                {
                    "NoncurrentDays": 30,
                    "StorageClass": "GLACIER"
                }
            ],
            "NoncurrentVersionExpiration": {
                "NoncurrentDays": 365
            }
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration=lifecycle_policy
)
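To reason about what the `logs/` rule does to a given object, it can help to model the transitions locally. The sketch below mirrors the day thresholds from the policy above. The function name is hypothetical, and because S3 applies lifecycle actions asynchronously, real objects may move somewhat later than the exact day shown.

```python
from typing import Optional

# (minimum age in days, storage class), mirroring the "logs/" rule above
TRANSITIONS = [
    (30, "STANDARD_IA"),
    (90, "GLACIER"),
    (365, "DEEP_ARCHIVE"),
]
EXPIRATION_DAYS = 2555  # about 7 years


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the expected storage class at a given age, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


if __name__ == "__main__":
    for age in (0, 30, 90, 365, 2555):
        print(age, storage_class_at(age))
```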
S3 Security Best Practices
┌────────────────────────────────────────────────────────────────────────┐
│ S3 Security Layers │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Layer 1: BLOCK PUBLIC ACCESS (Account & Bucket Level) │
│ ───────────────────────────────────────────────────── │
│ Enable all four settings by default: │
│ ✅ BlockPublicAcls │
│ ✅ IgnorePublicAcls │
│ ✅ BlockPublicPolicy │
│ ✅ RestrictPublicBuckets │
│ │
│ Layer 2: BUCKET POLICY (Resource-Based) │
│ ──────────────────────────────────────── │
│ { │
│ "Version": "2012-10-17", │
│ "Statement": [{ │
│ "Sid": "EnforceTLS", │
│ "Effect": "Deny", │
│ "Principal": "*", │
│ "Action": "s3:*", │
│ "Resource": ["arn:aws:s3:::bucket/*"], │
│ "Condition": { │
│ "Bool": {"aws:SecureTransport": "false"} │
│ } │
│ }] │
│ } │
│ │
│ Layer 3: IAM POLICIES (Identity-Based) │
│ ────────────────────────────────────── │
│ Attached to users, groups, or roles │
│ │
│ Layer 4: ENCRYPTION │
│ ─────────────────── │
│ • SSE-S3: AWS managed keys (default, free) │
│ • SSE-KMS: Customer managed keys (audit trail) │
│ • SSE-C: Customer provided keys │
│ • Client-side: Encrypt before upload │
│ │
│ Layer 5: ACCESS POINTS │
│ ────────────────────── │
│ Simplified access management for multi-tenant buckets │
│ │
│ Layer 6: OBJECT LOCK │
│ ───────────────────── │
│ WORM (Write Once Read Many) for compliance │
│ • Governance mode: Can be overridden with special permissions │
│ • Compliance mode: Cannot be overridden by anyone │
│ │
└────────────────────────────────────────────────────────────────────────┘
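Layers 1 and 2 can be expressed as plain Python data before they are sent to AWS. The sketch below builds the TLS-enforcement policy for a given bucket name. One deliberate difference from the diagram: it lists both the bucket ARN and the object ARN, on the assumption that bucket-level actions such as listing should also be denied over plain HTTP. The function and constant names are illustrative.

```python
import json


def enforce_tls_policy(bucket: str) -> str:
    """Return a bucket policy (JSON string) that denies all non-TLS requests."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EnforceTLS",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",    # bucket-level actions
                    f"arn:aws:s3:::{bucket}/*",  # object-level actions
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Layer 1: the four Block Public Access settings, all enabled
BLOCK_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}

if __name__ == "__main__":
    print(enforce_tls_policy("my-bucket"))
```

With boto3, these would typically be applied with `put_bucket_policy(Bucket=..., Policy=...)` and `put_public_access_block(Bucket=..., PublicAccessBlockConfiguration=...)` on an S3 client.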
S3 Performance Optimization
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.config import Config

s3 = boto3.client('s3')


class S3Uploader:
    """High-performance S3 uploader with best practices."""

    def __init__(self, bucket: str):
        self.bucket = bucket
        # Configure transfer settings for optimal performance
        self.config = TransferConfig(
            multipart_threshold=8 * 1024 * 1024,  # 8 MB
            max_concurrency=10,
            multipart_chunksize=8 * 1024 * 1024,  # 8 MB
            use_threads=True
        )

    def upload_file(self, local_path: str, s3_key: str):
        """Upload with automatic multipart handling."""
        s3.upload_file(
            local_path,
            self.bucket,
            s3_key,
            Config=self.config
        )

    def upload_large_file_optimized(self, local_path: str, s3_key: str):
        """
        Best practices for large file uploads:
        1. Use multipart upload (automatic for > 8 MB)
        2. Use S3 Transfer Acceleration for cross-region
        3. Use byte-range requests for downloads
        """
        # Enable Transfer Acceleration for the bucket first
        # aws s3api put-bucket-accelerate-configuration \
        #   --bucket my-bucket --accelerate-configuration Status=Enabled
        # Let botocore route requests to the accelerate endpoint instead of
        # hard-coding the endpoint URL.
        s3_accelerate = boto3.client(
            's3',
            config=Config(s3={'use_accelerate_endpoint': True})
        )
        s3_accelerate.upload_file(
            local_path,
            self.bucket,
            s3_key,
            Config=self.config
        )


# S3 Performance Tips:
# 1. Request-rate limits apply per prefix (3,500+ PUT/s, 5,500+ GET/s).
#    Spread keys across several prefixes to scale beyond that; random
#    hash prefixes have not been required since the 2018 limit increase.
#
# 2. Enable S3 Transfer Acceleration for upload speeds
#    20-200% faster for long-distance transfers
#
# 3. Use byte-range fetches for parallel downloads
#
# 4. S3 can handle 5,500+ GET requests/second per prefix
#
# 5. Use S3 Select to retrieve only needed data (up to 400% faster).
#    AWS closed S3 Select to new customers in 2024, so check availability.
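Tip 3 (byte-range fetches) comes down to splitting an object into non-overlapping HTTP `Range` values that can be requested in parallel. A minimal sketch of that split follows. The function name and the 8 MB default chunk size are assumptions chosen to match the upload settings above.

```python
def byte_ranges(object_size: int, chunk_size: int = 8 * 1024 * 1024) -> list:
    """Split an object of object_size bytes into HTTP Range header values."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("object_size and chunk_size must be positive")
    ranges = []
    for start in range(0, object_size, chunk_size):
        # HTTP byte ranges are inclusive on both ends
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    # A 20 MB object split into 8 MB chunks yields three ranges
    for value in byte_ranges(20 * 1024 * 1024):
        print(value)
```

Each value can be passed as the `Range` argument of the S3 client's `get_object` call, with the downloaded chunks written back at their start offsets.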
Presigned URLs for Secure Sharing
import boto3

s3 = boto3.client('s3')


def generate_presigned_download_url(bucket: str, key: str,
                                    expiry_seconds: int = 3600) -> str:
    """Generate a presigned URL for downloading."""
    url = s3.generate_presigned_url(
        'get_object',
        Params={
            'Bucket': bucket,
            'Key': key
        },
        ExpiresIn=expiry_seconds
    )
    return url


def generate_presigned_upload_url(bucket: str, key: str,
                                  content_type: str = 'application/octet-stream',
                                  expiry_seconds: int = 3600) -> dict:
    """Generate a presigned POST (URL plus form fields) for uploading."""
    response = s3.generate_presigned_post(
        Bucket=bucket,
        Key=key,
        Fields={
            'Content-Type': content_type
        },
        Conditions=[
            {'Content-Type': content_type},
            ['content-length-range', 1, 104857600]  # 1 byte to 100 MB
        ],
        ExpiresIn=expiry_seconds
    )
    return response  # Returns {'url': ..., 'fields': {...}}


# Usage in API response
download_url = generate_presigned_download_url('my-bucket', 'reports/q1.pdf')
upload_info = generate_presigned_upload_url('my-bucket', 'uploads/user123/file.jpg')
EBS (Elastic Block Store)
Persistent block storage for EC2 instances, like a high-performance SSD/HDD attached to your server. The key mental model: EBS volumes are network-attached storage that happens to be very fast. They persist independently of the EC2 instance (unlike instance store, which is physically attached and lost on stop/terminate). This means you can snapshot, detach, and reattach volumes, but it also means there is a small network latency penalty compared to local NVMe drives.
EBS Volume Types Deep Dive
┌────────────────────────────────────────────────────────────────────────┐
│ EBS Volume Types │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ SSD-BASED (Random I/O): │
│ ───────────────────────── │
│ │
│ gp3 (General Purpose SSD) - RECOMMENDED DEFAULT │
│ ├── Baseline: 3,000 IOPS, 125 MB/s (free, included in GB price) │
│ ├── Max: 16,000 IOPS, 1,000 MB/s │
│ ├── Size: 1 GB - 16 TB │
│ ├── Cost: $0.08/GB + $0.005/IOPS (above 3K) │
│ ├── Cost tip: gp3 is almost always cheaper than gp2 -- about 20% │
│ │ lower price per GB, and the 3K IOPS baseline is free, whereas │
│ │ gp2 needs 1 TB to reach 3K IOPS │
│ └── Use: Boot volumes, most workloads, databases under 16K IOPS │
│ │
│ gp2 (Previous Gen) - LEGACY │
│ ├── IOPS: 3 IOPS/GB (burst to 3,000) │
│ ├── Max: 16,000 IOPS │
│ └── Note: gp3 is usually more cost-effective │
│ │
│ io2 Block Express (Provisioned IOPS) - HIGHEST PERFORMANCE │
│ ├── Max: 256,000 IOPS, 4,000 MB/s │
│ ├── Latency: Sub-millisecond │
│ ├── Durability: 99.999% (5 9s vs 99.9% for others) │
│ ├── Size: 4 GB - 64 TB │
│ ├── Cost: $0.125/GB + $0.065/IOPS │
│ └── Use: Databases (Oracle, SAP), critical I/O workloads │
│ │
│ io1 (Provisioned IOPS) - LEGACY │
│ └── Max: 64,000 IOPS, use io2 instead │
│ │
│ HDD-BASED (Sequential I/O): │
│ ────────────────────────── │
│ │
│ st1 (Throughput Optimized HDD) │
│ ├── Max: 500 IOPS, 500 MB/s │
│ ├── Size: 125 GB - 16 TB │
│ ├── Cost: $0.045/GB │
│ └── Use: Big data, log processing, data warehouses │
│ │
│ sc1 (Cold HDD) - LOWEST COST │
│ ├── Max: 250 IOPS, 250 MB/s │
│ ├── Size: 125 GB - 16 TB │
│ ├── Cost: $0.015/GB │
│ └── Use: Infrequent access, archival │
│ │
│ BOOT VOLUME ELIGIBILITY: │
│ • gp2, gp3, io1, io2 ✅ Can be boot volumes │
│ • st1, sc1 ❌ Cannot be boot volumes │
│ │
└────────────────────────────────────────────────────────────────────────┘
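The gp3 pricing model (a flat per-GB price plus a charge for IOPS above the free 3,000 baseline) is easy to check with a few lines of arithmetic. The sketch below uses the gp3 figures from the diagram. The gp2 price of $0.10/GB is an assumption based on us-east-1 list pricing and does not appear in the diagram, and extra gp3 throughput above 125 MB/s is ignored.

```python
GP3_PRICE_PER_GB = 0.08           # from the diagram above
GP3_PRICE_PER_EXTRA_IOPS = 0.005  # per provisioned IOPS above the baseline
GP3_BASELINE_IOPS = 3_000
GP3_MAX_IOPS = 16_000
GP2_PRICE_PER_GB = 0.10           # assumption: us-east-1 list price


def gp3_monthly_cost(size_gb: int, iops: int = GP3_BASELINE_IOPS) -> float:
    """Monthly gp3 cost in USD, ignoring extra provisioned throughput."""
    if not GP3_BASELINE_IOPS <= iops <= GP3_MAX_IOPS:
        raise ValueError("gp3 IOPS must be between 3,000 and 16,000")
    extra_iops = iops - GP3_BASELINE_IOPS
    return size_gb * GP3_PRICE_PER_GB + extra_iops * GP3_PRICE_PER_EXTRA_IOPS


def gp2_baseline_iops(size_gb: int) -> int:
    """gp2 baseline: 3 IOPS per GB, with a floor of 100 and a cap of 16,000."""
    return min(16_000, max(100, 3 * size_gb))


def gp2_monthly_cost(size_gb: int) -> float:
    """Monthly gp2 cost in USD; performance is tied to volume size."""
    return size_gb * GP2_PRICE_PER_GB


if __name__ == "__main__":
    size = 500
    print(f"gp2: {gp2_baseline_iops(size)} baseline IOPS, ${gp2_monthly_cost(size):.2f}/month")
    print(f"gp3: {GP3_BASELINE_IOPS} baseline IOPS, ${gp3_monthly_cost(size):.2f}/month")
```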
EBS Snapshots and Backup
import boto3
from datetime import datetime
from typing import Optional

ec2 = boto3.client('ec2')


def create_ebs_snapshot(volume_id: str, description: Optional[str] = None):
    """Create EBS snapshot (incremental backup to S3)."""
    response = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=description or f"Backup-{datetime.now().isoformat()}",
        TagSpecifications=[
            {
                'ResourceType': 'snapshot',
                'Tags': [
                    {'Key': 'Name', 'Value': 'AutomatedBackup'},
                    {'Key': 'CreatedBy', 'Value': 'Python Script'},
                ]
            }
        ]
    )
    return response['SnapshotId']


def copy_snapshot_cross_region(snapshot_id: str,
                               source_region: str,
                               destination_region: str):
    """Copy snapshot to another region for DR."""
    # The copy is requested from the destination region
    dest_ec2 = boto3.client('ec2', region_name=destination_region)
    response = dest_ec2.copy_snapshot(
        SourceSnapshotId=snapshot_id,
        SourceRegion=source_region,
        Description=f"DR copy from {source_region}",
        Encrypted=True  # Always encrypt DR copies
    )
    return response['SnapshotId']


def create_volume_from_snapshot(snapshot_id: str,
                                az: str,
                                volume_type: str = 'gp3'):
    """Create new volume from snapshot (for restore or DR)."""
    response = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=az,
        VolumeType=volume_type,
        Encrypted=True,
        TagSpecifications=[
            {
                'ResourceType': 'volume',
                'Tags': [
                    {'Key': 'Name', 'Value': 'RestoredVolume'},
                ]
            }
        ]
    )
    return response['VolumeId']


# Key Points:
# 1. Snapshots are incremental (only changed blocks stored)
# 2. Snapshots stored in S3 (managed by AWS, not visible)
# 3. Can create volumes in any AZ from snapshot
# 4. First snapshot takes longest, subsequent are fast
# 5. Deleting snapshots only removes data not needed by others
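Key points 1 and 5 are easier to see with a toy model. The sketch below is not how EBS is implemented. It is a simplified illustration, with a hypothetical `SnapshotStore` class, of snapshots that share unchanged blocks, so that deleting one snapshot frees only the blocks no other snapshot still references.

```python
class SnapshotStore:
    """Toy model of incremental snapshots that share unchanged blocks."""

    def __init__(self):
        # snapshot name -> {block index: block contents at snapshot time}
        self._snapshots = {}

    def create(self, name: str, volume: dict) -> None:
        """Record which block contents the volume holds right now."""
        self._snapshots[name] = dict(volume)

    def delete(self, name: str) -> None:
        """Remove a snapshot; blocks shared with other snapshots survive."""
        del self._snapshots[name]

    def stored_blocks(self) -> set:
        """Unique (index, contents) pairs that must be kept on disk."""
        return {
            (index, contents)
            for snapshot in self._snapshots.values()
            for index, contents in snapshot.items()
        }


if __name__ == "__main__":
    store = SnapshotStore()
    volume = {0: "a", 1: "b", 2: "c"}
    store.create("snap-1", volume)     # first snapshot: all 3 blocks stored
    volume[1] = "B"                    # one block changes
    store.create("snap-2", volume)     # second snapshot adds only 1 new block
    print(len(store.stored_blocks()))  # 4 unique blocks, not 6
    store.delete("snap-1")             # frees only the old copy of block 1
    print(sorted(store.stored_blocks()))
```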
EBS Encryption
┌────────────────────────────────────────────────────────────────────────┐
│ EBS Encryption │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ What's encrypted: │
│ ✅ Data at rest on volume │
│ ✅ Data in transit between EC2 and EBS │
│ ✅ All snapshots created from the volume │
│ ✅ All volumes created from encrypted snapshots │
│ │
│ Encryption Keys: │
│ • Default: AWS managed key (aws/ebs) │
│ • Custom: Your KMS CMK (better audit, control) │
│ │
│ Converting Unencrypted to Encrypted: │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ 1. Create snapshot of unencrypted volume │ │
│ │ 2. Copy snapshot with encryption enabled │ │
│ │ 3. Create encrypted volume from encrypted snapshot │ │
│ │ 4. Detach the old volume and attach the new encrypted one │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ Best Practice: Enable "Encrypt new EBS volumes by default" │
│ in EC2 → EBS → Settings │
│ │
└────────────────────────────────────────────────────────────────────────┘
EBS is AZ-specific! Volumes only exist in one AZ. To move across AZs or regions:
- Create snapshot
- Copy to target region (if cross-region)
- Create volume in target AZ from snapshot
EFS (Elastic File System)
Managed NFS file system that scales automatically. Shared storage for Linux workloads. EFS solves the problem of “I need multiple EC2 instances (or Lambda functions, or containers) to read and write the same files.” Think of it as a shared network drive that auto-scales from kilobytes to petabytes. The trade-off is cost: EFS Standard at $0.30/GB is roughly 4x more expensive than EBS gp3, so only use it when you genuinely need shared access.
┌────────────────────────────────────────────────────────────────────────┐
│ EFS Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ EFS File System │ │
│ │ /efs-shared-data │ │
│ │ │ │
│ │ Storage Classes: │ │
│ │ ┌────────────────┐ ┌────────────────┐ │ │
│ │ │ Standard │ │ Standard-IA │ │ │
│ │ │ (Frequent) │ │ (Infrequent) │ │ │
│ │ │ $0.30/GB │ │ $0.016/GB + │ │ │
│ │ │ │ │ $0.01/GB read │ │ │
│ │ └────────────────┘ └────────────────┘ │ │
│ │ │ │
│ │ Lifecycle: Auto-move to IA after 7/14/30/60/90 days │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────┼────────────┬────────────┐ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │Mount Target │ │Mount Target │ │Mount Target │ │ Lambda │ │
│ │ (AZ-1a) │ │ (AZ-1b) │ │ (AZ-1c) │ │ (via VPC) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │ │
│ EC2-1 EC2-2 EC2-3 Lambda │
│ (/mnt/efs) (/mnt/efs) (/mnt/efs) Functions │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Performance Modes: │ │
│ │ • General Purpose: Low latency (default) │ │
│ │ • Max I/O: Higher latency, higher throughput (big data) │ │
│ │ │ │
│ │ Throughput Modes: │ │
│ │ • Bursting: Scales with size (50 MB/s per TB) │ │
│ │ • Provisioned: Set fixed throughput (1-3000+ MB/s) │ │
│ │ • Elastic: Auto-scales (up to 10+ GB/s reads) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
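The storage-class prices and the bursting formula in the diagram can be combined into a rough cost and throughput estimate. The sketch below uses the diagram's figures and treats the Infrequent Access charge as $0.01 per GB read, which is an assumption about the unit. The function names are illustrative.

```python
EFS_STANDARD_PER_GB = 0.30       # per GB-month, from the diagram above
EFS_IA_PER_GB = 0.016            # per GB-month, from the diagram above
EFS_IA_READ_PER_GB = 0.01        # assumption: access fee is charged per GB read
BURSTING_MBPS_PER_TB = 50        # baseline throughput in Bursting mode


def efs_monthly_cost(standard_gb: float, ia_gb: float = 0.0,
                     ia_read_gb: float = 0.0) -> float:
    """Estimated monthly EFS cost in USD for a Standard / IA split."""
    return (standard_gb * EFS_STANDARD_PER_GB
            + ia_gb * EFS_IA_PER_GB
            + ia_read_gb * EFS_IA_READ_PER_GB)


def bursting_baseline_mbps(stored_tb: float) -> float:
    """Baseline throughput in Bursting mode, which scales with stored data."""
    return stored_tb * BURSTING_MBPS_PER_TB


if __name__ == "__main__":
    all_standard = efs_monthly_cost(1000)
    tiered = efs_monthly_cost(standard_gb=200, ia_gb=800, ia_read_gb=50)
    print(f"1 TB all Standard: ${all_standard:.2f}/month")
    print(f"1 TB with 80% in IA: ${tiered:.2f}/month")
    print(f"Bursting baseline at 2 TB: {bursting_baseline_mbps(2):.0f} MB/s")
```

The gap between the two totals is why the lifecycle setting in the diagram matters for file systems where most data goes cold.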
EFS vs EBS vs S3
| Feature | EFS | EBS | S3 |
|---|---|---|---|
| Type | File (NFS) | Block | Object |
| Access | Multi-AZ, Multi-EC2 | Single AZ, Single EC2 | Global, HTTP |
| Scaling | Automatic (petabytes) | Manual (16 TB max; 64 TB for io2 Block Express) | Unlimited |
| Cost | $0.30/GB (Standard) | $0.08-0.125/GB | $0.023/GB |
| Latency | ~ms | sub-ms | ~100ms |
| Use Case | Shared files | Databases | Static content |
RDS (Relational Database Service)
Managed relational databases: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, Aurora. RDS handles the undifferentiated heavy lifting of database administration (patching, backups, failover, replication) so your team can focus on schema design and query optimization. The most common mistake is treating RDS like a self-managed database: teams that manually manage backups or avoid Multi-AZ to save money end up paying much more in downtime and engineering hours when things go wrong.
RDS Architecture and Features
┌────────────────────────────────────────────────────────────────────────┐
│ RDS Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ RDS Instance │ │
│ │ │ │
│ │ What AWS Manages: What You Manage: │ │
│ │ ✅ Hardware provisioning ✅ Schema design │ │
│ │ ✅ OS patching ✅ Query optimization │ │
│ │ ✅ Database patching ✅ Index creation │ │
│ │ ✅ Automated backups ✅ Application tuning │ │
│ │ ✅ Multi-AZ failover ✅ Security groups │ │
│ │ ✅ Scaling ✅ Parameter groups │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
│ Supported Engines: │
│ ┌─────────────┬────────────────────────────────────────────────┐ │
│ │ MySQL │ 5.7, 8.0 - Most popular open source │ │
│ │ PostgreSQL │ 11-16 - Advanced features, extensions │ │
│ │ MariaDB │ 10.x - MySQL fork, community driven │ │
│ │ Oracle │ Enterprise, Standard - BYOL or License Included│ │
│ │ SQL Server │ Express, Web, Standard, Enterprise │ │
│ │ Aurora │ MySQL/PostgreSQL compatible, 5x faster │ │
│ └─────────────┴────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
Multi-AZ vs Read Replicas
┌────────────────────────────────────────────────────────────────────────┐
│ Multi-AZ Deployment │
│ (High Availability) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Purpose: AUTOMATIC FAILOVER for disaster recovery (HA, not scaling) │
│ │
│ ┌─────────────────────┐ Synchronous ┌─────────────────────┐ │
│ │ Primary DB │◄─────────────►│ Standby DB │ │
│ │ (AZ-1a) │ Replication │ (AZ-1b) │ │
│ │ │ │ │ │
│ │ ✅ All reads │ │ ❌ No reads │ │
│ │ ✅ All writes │ │ ❌ No writes │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │ │ │
│ │ Automatic DNS failover │ │
│ │ (60-120 seconds) │ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Application connects to: mydb.abc123.us-east-1.rds.aws │ │
│ │ (Single endpoint, AWS handles failover automatically) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ When failover happens: │
│ • AZ outage or instance failure │
│ • Instance type change │
│ • Manual failover (for testing) │
│ • OS patching │
│ │
└────────────────────────────────────────────────────────────────────────┘
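During the 60-120 second failover window, new connections to the endpoint fail until DNS points at the promoted standby, so applications usually need to retry rather than crash. Here is a minimal sketch of retrying with exponential back-off. The `connect` callable is a placeholder for whatever your database driver provides, and catching `ConnectionError` is an assumption, since real drivers raise their own exception types.

```python
import time


def connect_with_retry(connect, max_wait_seconds: float = 180.0,
                       initial_delay: float = 1.0, max_delay: float = 15.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the back-off budget is used up."""
    waited = 0.0
    delay = initial_delay
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up once another wait would exceed the total budget
            if waited + delay > max_wait_seconds:
                raise
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)  # exponential back-off, capped


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_connect():
        """Simulated endpoint that fails twice before accepting connections."""
        attempts["count"] += 1
        if attempts["count"] <= 2:
            raise ConnectionError("failover in progress")
        return "connection"

    print(connect_with_retry(flaky_connect, sleep=lambda seconds: None))
```

The default budget of 180 seconds is chosen to sit slightly above the upper end of the failover window. It is a starting point, not a recommendation from AWS.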
┌────────────────────────────────────────────────────────────────────────┐
│ Read Replicas │
│ (Read Scaling) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Purpose: SCALE READS by offloading queries │
│ │
│ ┌─────────────────────┐ Asynchronous ┌─────────────────────┐ │
│ │ Primary DB │─────────────►│ Read Replica 1 │ │
│ │ (writes + reads) │ │ (reads only) │ │
│ └─────────────────────┘ └─────────────────────┘ │
│ │ │
│ │ Async ┌─────────────────────┐ │
│ └──────────────────────────►│ Read Replica 2 │ │
│ │ (reads only) │ │
│ └─────────────────────┘ │
│ │
│ Key Points: │
│ • Up to 5 replicas (15 for Aurora) │
│ • Can be cross-region (for DR or local reads) │
│ • Each replica has own endpoint │
│ • Replication is ASYNC (eventual consistency) -- reads from │
│   replicas may be milliseconds to seconds behind the primary. │
│ • Never read from a replica immediately after a write and expect │
│   to see the new data. This is the most common read-replica bug. │
│ • Can be promoted to standalone DB (for DR or migration) │
│ │
│ Application Pattern: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ def get_connection(is_read_only=False): │ │
│ │ if is_read_only: │ │
│ │ return connect("replica.abc123.us-east-1.rds.aws") │ │
│ │ return connect("primary.abc123.us-east-1.rds.aws") │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
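One common way to avoid the read-after-write bug described above is to send a session's reads to the primary for a short time after that session writes. The sketch below shows the idea. The class name and the 2-second window are assumptions for illustration, and the endpoint strings are the placeholders from the diagram. In practice the window should be derived from the observed replica lag.

```python
import time


class ReplicaAwareRouter:
    """Route reads to the primary for a short window after a session writes."""

    def __init__(self, primary: str, replica: str,
                 sticky_seconds: float = 2.0, clock=time.monotonic):
        self._primary = primary
        self._replica = replica
        self._sticky_seconds = sticky_seconds
        self._clock = clock
        self._last_write = {}  # session id -> time of that session's last write

    def endpoint_for_write(self, session_id: str) -> str:
        """Writes always go to the primary; remember when this session wrote."""
        self._last_write[session_id] = self._clock()
        return self._primary

    def endpoint_for_read(self, session_id: str) -> str:
        """Use the primary if this session wrote recently, else the replica."""
        last_write = self._last_write.get(session_id)
        if last_write is not None and self._clock() - last_write < self._sticky_seconds:
            return self._primary
        return self._replica


if __name__ == "__main__":
    router = ReplicaAwareRouter(
        primary="primary.abc123.us-east-1.rds.aws",
        replica="replica.abc123.us-east-1.rds.aws",
    )
    router.endpoint_for_write("session-1")
    print(router.endpoint_for_read("session-1"))  # primary: just wrote
    print(router.endpoint_for_read("session-2"))  # replica: no recent write
```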
RDS Backup and Recovery
import boto3
from datetime import datetime

rds = boto3.client('rds')

# Automated Backups (AWS Managed)
# - Daily during backup window
# - Transaction logs every 5 minutes
# - Retention: 0-35 days (0 disables)
# - Point-in-time recovery to any second


def restore_to_point_in_time(source_db: str,
                             target_db: str,
                             restore_time: datetime):
    """Restore RDS to a specific point in time (creates a new instance)."""
    response = rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_db,
        TargetDBInstanceIdentifier=target_db,
        RestoreTime=restore_time,
        UseLatestRestorableTime=False,
        DBInstanceClass='db.t3.medium',
        PubliclyAccessible=False,
        MultiAZ=True,
    )
    return response


# Manual Snapshots
# - User-initiated
# - Kept until you delete them
# - Can share across accounts/regions


def create_manual_snapshot(db_identifier: str):
    """Create manual DB snapshot."""
    snapshot_id = f"{db_identifier}-{datetime.now().strftime('%Y%m%d-%H%M%S')}"
    rds.create_db_snapshot(
        DBInstanceIdentifier=db_identifier,
        DBSnapshotIdentifier=snapshot_id,
        Tags=[
            {'Key': 'Purpose', 'Value': 'ManualBackup'},
        ]
    )
    return snapshot_id
Aurora
AWS’s cloud-native relational database, MySQL- and PostgreSQL-compatible, with up to 5x the throughput of standard MySQL and 3x that of standard PostgreSQL.
Aurora Architecture Deep Dive
┌────────────────────────────────────────────────────────────────────────┐
│ Aurora Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Aurora Cluster │ │
│ │ │ │
│ │ Writer Endpoint: mydb.cluster-abc123.us-east-1.rds.aws │ │
│ │ Reader Endpoint: mydb.cluster-ro-abc123.us-east-1.rds.aws │ │
│ │ │ │
│ │ ┌───────────────────┐ ┌───────────────────┐ │ │
│ │ │ Writer Instance │ │ Reader Instance 1 │ │ │
│ │ │ (Primary) │ │ (Replica) │ │ │
│ │ │ (AZ-1a) │ │ (AZ-1b) │ │ │
│ │ └─────────┬─────────┘ └─────────┬─────────┘ │ │
│ │ │ │ │ │
│ │ └───────────┬───────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Shared Cluster Storage │ │ │
│ │ │ (Aurora Storage Engine) │ │ │
│ │ │ │ │ │
│ │ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │ │ │
│ │ │ │10GB │ │10GB │ │10GB │ │10GB │ │10GB │ │10GB │ │ │ │
│ │ │ │AZ-1a│ │AZ-1b│ │AZ-1c│ │AZ-1a│ │AZ-1b│ │AZ-1c│ │ │ │
│ │ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ │ │ │
│ │ │ │ │ │
│ │ │ • 6 copies of data across 3 AZs │ │ │
│ │ │ • Can lose 2 copies and still write │ │ │
│ │ │ • Can lose 3 copies and still read │ │ │
│ │ │ • Auto-heals damaged segments │ │ │
│ │ │ • Storage auto-scales 10 GB → 128 TB │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ AURORA ADVANTAGES: │
│ • 5x throughput of MySQL, 3x of PostgreSQL │
│ • Up to 15 read replicas (vs 5 for RDS MySQL) │
│ • Failover in < 30 seconds │
│ • Continuous backup to S3 (no performance impact) │
│ • Point-in-time recovery to any second │
│ │
└────────────────────────────────────────────────────────────────────────┘
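The "lose 2 copies and still write, lose 3 and still read" behavior follows from quorum arithmetic over the six copies: writes need 4 of 6 acknowledgements and reads need 3 of 6. A short sketch that checks those thresholds (the function names are illustrative):

```python
COPIES = 6        # 2 copies in each of 3 AZs
WRITE_QUORUM = 4  # a write must reach 4 of 6 copies
READ_QUORUM = 3   # a read must reach 3 of 6 copies


def can_write(failed_copies: int) -> bool:
    """True if enough copies remain to form a write quorum."""
    return COPIES - failed_copies >= WRITE_QUORUM


def can_read(failed_copies: int) -> bool:
    """True if enough copies remain to form a read quorum."""
    return COPIES - failed_copies >= READ_QUORUM


if __name__ == "__main__":
    for failed in range(5):
        print(f"{failed} copies lost: write={can_write(failed)} read={can_read(failed)}")
```

Because the read and write quorums sum to more than six, any read quorum overlaps the most recent write quorum. Losing an entire AZ removes only two copies, so writes continue.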
Aurora Serverless v2
┌────────────────────────────────────────────────────────────────────────┐
│ Aurora Serverless v2 │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Auto-scaling capacity based on demand │
│ │
│ Capacity (ACUs) │
│ 128 ┤ │
│ 100 ┤ ████████ │
│ 80 ┤ ██ ██ │
│ 60 ┤ ██ ██ │
│ 40 ┤ ██ ██ │
│ 20 ┤──────██────────────────────██──────────── │
│ └─────────────────────────────────────────► Time │
│ Peak usage period │
│ │
│ Configuration: │
│ • Min ACU: 0.5 (0 with auto-pause on supported engine versions) │
│ • Max ACU: 128 (choose based on peak needs) │
│ • Scales in seconds (not minutes like v1) │
│ │
│ Pricing (us-east-1): │
│ • $0.12/ACU-hour (Aurora MySQL) │
│ • Storage: $0.10/GB-month │
│ • I/O: $0.20 per million requests │
│ │
│ Best For: │
│ ✅ Variable/unpredictable workloads │
│ ✅ Development/test databases │
│ ✅ Multi-tenant SaaS applications │
│ ✅ Infrequently used applications │
│ │
└────────────────────────────────────────────────────────────────────────┘
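Since Serverless v2 bills per ACU-hour, the compute cost of a workload can be estimated from its hourly capacity profile. The sketch below uses the $0.12/ACU-hour figure and the 0.5 to 128 ACU range shown above. The function names and the example demand profile are hypothetical, and storage and I/O charges are not included.

```python
ACU_HOUR_PRICE = 0.12  # USD per ACU-hour (Aurora MySQL, us-east-1), from above


def billed_acus(demand: float, min_acu: float = 0.5, max_acu: float = 128.0) -> float:
    """Clamp demanded capacity to the configured min/max ACU range."""
    return min(max(demand, min_acu), max_acu)


def compute_cost(hourly_demand: list, min_acu: float = 0.5,
                 max_acu: float = 128.0) -> float:
    """Compute-only cost in USD for a list of average ACU demand per hour."""
    acu_hours = sum(billed_acus(d, min_acu, max_acu) for d in hourly_demand)
    return acu_hours * ACU_HOUR_PRICE


if __name__ == "__main__":
    # Hypothetical day: 20 quiet hours at 2 ACUs, 4 peak hours at 40 ACUs
    profile = [2.0] * 20 + [40.0] * 4
    print(f"Serverless v2 compute: ${compute_cost(profile):.2f}/day")
    # The same day with capacity held at the 40 ACU peak for all 24 hours
    print(f"Held at peak:          ${compute_cost([40.0] * 24):.2f}/day")
```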
DynamoDB
Fully managed NoSQL database with single-digit millisecond latency at any scale.
DynamoDB Data Model
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Data Model │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ TABLE: Orders │
│ ────────────── │
│ │
│ Primary Key: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PARTITION KEY (HASH) │ SORT KEY (RANGE) │ │
│ │ Required, determines │ Optional, enables │ │
│ │ data distribution │ range queries │ │
│ │ │ │ │
│ │ customer_id │ order_date │ │
│ │ "CUST#12345" │ "2024-01-15T10:30:00Z" │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Item (Document): │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ { │ │
│ │ "customer_id": "CUST#12345", // Partition key │ │
│ │ "order_date": "2024-01-15T10:30:00Z", // Sort key │ │
│ │ "order_id": "ORD#98765", // Attribute │ │
│ │ "total": 299.99, // Attribute │ │
│ │ "items": [ // Nested list │ │
│ │ {"sku": "ABC123", "qty": 2}, │ │
│ │ {"sku": "XYZ789", "qty": 1} │ │
│ │ ], │ │
│ │ "status": "SHIPPED" │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ KEY DESIGN PATTERNS: │
│ ──────────────────── │
│ 1. One-to-Many: Use composite sort key │
│ PK: USER#123, SK: ORDER#001, ORDER#002, PROFILE │
│ │
│ 2. Many-to-Many: Use GSI with inverted index │
│ PK: EMPLOYEE#1, SK: PROJECT#A │
│ GSI: PK: PROJECT#A, SK: EMPLOYEE#1 │
│ │
│ 3. Hierarchical: Use sort key prefixes │
│ SK: COUNTRY#USA#STATE#CA#CITY#LA │
│ │
└────────────────────────────────────────────────────────────────────────┘
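The hierarchical pattern works because a sort-key prefix selects a whole subtree. The sketch below builds such keys and filters them locally with `str.startswith`, which stands in for DynamoDB's `begins_with` key condition. The helper names and the sample cities are illustrative only.

```python
def build_key(*parts) -> str:
    """Join (label, value) pairs into a key like 'COUNTRY#USA#STATE#CA'."""
    return "#".join(f"{label}#{value}" for label, value in parts)


def subtree_prefix(*parts) -> str:
    """Prefix matching everything below a level in the hierarchy.

    The trailing '#' stops 'STATE#CA' from also matching 'STATE#CAL...'.
    """
    return build_key(*parts) + "#"


if __name__ == "__main__":
    sort_keys = [
        build_key(("COUNTRY", "USA"), ("STATE", "CA"), ("CITY", "LA")),
        build_key(("COUNTRY", "USA"), ("STATE", "CA"), ("CITY", "SF")),
        build_key(("COUNTRY", "USA"), ("STATE", "TX"), ("CITY", "AUSTIN")),
    ]
    prefix = subtree_prefix(("COUNTRY", "USA"), ("STATE", "CA"))
    # Local stand-in for a begins_with condition on the sort key
    print([key for key in sort_keys if key.startswith(prefix)])
```

In a real query, the same prefix would be passed to `Key('SK').begins_with(prefix)` from `boto3.dynamodb.conditions`, together with an equality condition on the partition key.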
DynamoDB Operations (Best Practices)
import boto3
from boto3.dynamodb.conditions import Key, Attr
from decimal import Decimal
import json
# Initialize
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')
# =====================================
# SINGLE ITEM OPERATIONS
# =====================================
def put_item_example():
"""Write an item (creates or replaces)."""
table.put_item(
Item={
'customer_id': 'CUST#12345',
'order_date': '2024-01-15T10:30:00Z',
'order_id': 'ORD#98765',
'total': Decimal('299.99'), # Use Decimal for numbers
'status': 'PENDING',
'items': [
{'sku': 'ABC123', 'qty': 2, 'price': Decimal('99.99')},
]
},
# Conditional write - only if doesn't exist
ConditionExpression='attribute_not_exists(order_id)'
)
def get_item_example():
    """Read a single item by primary key."""
    response = table.get_item(
        Key={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z'
        },
        # Strongly consistent read (vs eventually consistent)
        ConsistentRead=True,
        # Only return specific attributes
        ProjectionExpression='order_id, #s, #t',
        # 'status' and 'total' are reserved words, so alias them
        ExpressionAttributeNames={'#s': 'status', '#t': 'total'}
    )
    return response.get('Item')
def update_item_example(expected_version: int):
    """Update specific attributes atomically, guarded by a version check."""
    response = table.update_item(
        Key={
            'customer_id': 'CUST#12345',
            'order_date': '2024-01-15T10:30:00Z'
        },
        UpdateExpression='SET #s = :status, updated_at = :now ADD version :inc',
        ExpressionAttributeNames={'#s': 'status'},
        ExpressionAttributeValues={
            ':status': 'SHIPPED',
            ':now': '2024-01-16T14:00:00Z',
            ':inc': 1,
            ':expected_version': expected_version  # Version the caller last read
        },
        # Optimistic locking: raises ConditionalCheckFailedException
        # if another writer changed the version in the meantime
        ConditionExpression='version = :expected_version',
        ReturnValues='ALL_NEW'
    )
    return response['Attributes']
# =====================================
# QUERY (Efficient - uses partition key)
# =====================================
def query_customer_orders(customer_id: str, start_date: str = None):
    """Get a customer's orders, newest first (Query on the partition key)."""
    key_condition = Key('customer_id').eq(customer_id)
    if start_date:
        key_condition = key_condition & Key('order_date').gte(start_date)
    response = table.query(
        KeyConditionExpression=key_condition,
        ScanIndexForward=False,  # Descending order (newest first)
        Limit=20,  # Applied before the filter, so fewer than 20 items may come back
        # Filter runs after the read (use sparingly: filtered-out items still cost RCUs)
        FilterExpression=Attr('status').ne('CANCELLED')
    )
    return response['Items']
# =====================================
# BATCH OPERATIONS
# =====================================
def batch_write_items(items: list):
    """Write any number of items; boto3 sends them in batches of up to 25."""
    with table.batch_writer() as batch:
        for item in items:
            batch.put_item(Item=item)
    # batch_writer handles chunking, retries, and unprocessed items

def batch_get_items(keys: list):
    """Read up to 100 items in one request."""
    response = dynamodb.batch_get_item(
        RequestItems={
            'Orders': {
                'Keys': keys,
                # 'total' and 'status' are reserved words, so alias them
                'ProjectionExpression': 'order_id, #t, #s',
                'ExpressionAttributeNames': {'#t': 'total', '#s': 'status'}
            }
        }
    )
    # Throttled keys come back in response['UnprocessedKeys'] and need a retry
    return response['Responses']['Orders']
# =====================================
# TRANSACTIONS (ACID)
# =====================================
def transfer_with_transaction():
    """Atomic multi-item transaction."""
    dynamodb_client = boto3.client('dynamodb')
    dynamodb_client.transact_write_items(
        TransactItems=[
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {'account_id': {'S': 'ACC#001'}},
                    'UpdateExpression': 'SET balance = balance - :amount',
                    'ConditionExpression': 'balance >= :amount',
                    'ExpressionAttributeValues': {':amount': {'N': '100'}}
                }
            },
            {
                'Update': {
                    'TableName': 'Accounts',
                    'Key': {'account_id': {'S': 'ACC#002'}},
                    'UpdateExpression': 'SET balance = balance + :amount',
                    'ExpressionAttributeValues': {':amount': {'N': '100'}}
                }
            }
        ]
    )
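BatchGetItem accepts at most 100 keys per request and BatchWriteItem at most 25 items, so a larger key list has to be split before it is sent. A minimal sketch of that splitting step follows; `chunked` is a hypothetical helper, not part of boto3, and the keys are made-up sample data.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(seq: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(seq), size):
        yield list(seq[start:start + size])


# Hypothetical keys for the Orders table used above
keys = [
    {"customer_id": f"CUST#{n}", "order_date": "2024-01-15T10:30:00Z"}
    for n in range(250)
]

# Each batch is small enough for one BatchGetItem request
batches = list(chunked(keys, 100))
```

Each element of `batches` could then be passed to a function like `batch_get_items`, with any `UnprocessedKeys` retried separately.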
Global Secondary Indexes (GSI)
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Indexes │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ BASE TABLE: Orders │
│ PK: customer_id, SK: order_date │
│ │
│ Access Pattern: "Get orders by customer" ✅ Query on PK │
│ │
│ NEW Access Pattern: "Get orders by status" │
│ ❌ Can't query - status is not a key! │
│ ✅ Solution: Create GSI │
│ │
│ GLOBAL SECONDARY INDEX (GSI): │
│ ───────────────────────────── │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ GSI: orders-by-status │ │
│ │ PK: status SK: order_date │ │
│ │ │ │
│ │ Projected Attributes: order_id, customer_id, total │ │
│ │ (Can project ALL, KEYS_ONLY, or specific attributes) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ # Query GSI │
│ table.query( │
│ IndexName='orders-by-status', │
│ KeyConditionExpression=Key('status').eq('PENDING') │
│ ) │
│ │
│ GSI vs LSI: │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ GSI (Global Secondary) │ LSI (Local Secondary) │ │
│ │ ────────────────────────────┼───────────────────────────────── │ │
│ │ Different partition key │ Same partition key │ │
│ │ Create anytime │ Create at table creation only │ │
│ │ Own RCU/WCU │ Shares table RCU/WCU │ │
│ │ Eventually consistent only │ Strongly consistent available │ │
│ │ Up to 20 per table │ Up to 5 per table │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
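Since a GSI can be created at any time, the `orders-by-status` index can be added to an existing table with `update_table`. The sketch below only builds the request parameters. It assumes both key attributes are strings and that the table uses on-demand billing (a provisioned table would also need `ProvisionedThroughput` inside `Create`); `build_gsi_update` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def build_gsi_update(index_name: str, partition_key: str, sort_key: str,
                     projected: List[str]) -> Dict[str, Any]:
    """Build update_table keyword arguments that create one GSI."""
    return {
        # Every attribute used in the index key schema must be declared
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"},
            {"AttributeName": sort_key, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [{
            "Create": {
                "IndexName": index_name,
                "KeySchema": [
                    {"AttributeName": partition_key, "KeyType": "HASH"},
                    {"AttributeName": sort_key, "KeyType": "RANGE"},
                ],
                # Table key attributes are always projected, so list only extras
                "Projection": {
                    "ProjectionType": "INCLUDE",
                    "NonKeyAttributes": list(projected),
                },
            }
        }],
    }


params = build_gsi_update("orders-by-status", "status", "order_date",
                          ["order_id", "total"])
# With AWS credentials configured, this could be applied with:
# boto3.client("dynamodb").update_table(TableName="Orders", **params)
```

Projecting only the attributes a query needs keeps the index smaller and cheaper to write than `ALL`.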
DynamoDB Capacity and Pricing
┌────────────────────────────────────────────────────────────────────────┐
│ DynamoDB Capacity Modes │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ON-DEMAND MODE: │
│ ─────────────── │
│ • Pay per request ($1.25 per million writes, $0.25 per million reads)│
│ • No capacity planning │
│ • Scales automatically (absorbs up to 2x the previous peak at once) │
│ • Best for: Unpredictable traffic, new applications │
│ │
│ PROVISIONED MODE: │
│ ───────────────── │
│ • Pre-allocate RCU (Read Capacity Units) and WCU (Write Capacity) │
│ • Cheaper at scale (up to ~7x cheaper at full, steady utilization) │
│ • Auto Scaling available │
│ • Reserved Capacity for 1- or 3-year terms (up to 70% discount) │
│ │
│ CAPACITY UNITS: │
│ ─────────────── │
│ 1 RCU = 1 strongly consistent read/sec (up to 4 KB) │
│ = 2 eventually consistent reads/sec (up to 4 KB) │
│ 1 WCU = 1 write/sec (up to 1 KB) │
│ │
│ Example Calculation: │
│ ──────────────────── │
│ 100 reads/sec × 8 KB items × strongly consistent │
│ = 100 × (8 KB / 4 KB) × 1 = 200 RCU │
│ │
│ 50 writes/sec × 3 KB items │
│ = 50 × ceil(3 KB / 1 KB) = 150 WCU │
│ │
│ COST COMPARISON (us-east-1): │
│ ──────────────────────────── │
│ On-Demand: $0.25/million reads, $1.25/million writes │
│ Provisioned: $0.00013/RCU-hr, $0.00065/WCU-hr │
│ │
│ Break-even: ~520 requests/hr per provisioned unit (~14% utilization) │
│ Below this → On-Demand cheaper │
│ Above this → Provisioned cheaper │
│ │
└────────────────────────────────────────────────────────────────────────┘
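The capacity arithmetic in the diagram can be written as a small calculator. This is a sketch under the rules stated above (4 KB per read unit, 1 KB per write unit, eventually consistent reads at half cost); the function names are hypothetical, and the prices passed to the break-even function are the us-east-1 figures quoted in the diagram, which may change.

```python
import math


def read_capacity_units(reads_per_sec: int, item_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: each read consumes ceil(item size / 4 KB) units."""
    rcu = reads_per_sec * math.ceil(item_kb / 4)
    # Eventually consistent reads cost half as much
    return rcu if strongly_consistent else math.ceil(rcu / 2)


def write_capacity_units(writes_per_sec: int, item_kb: float) -> int:
    """WCUs needed: each write consumes ceil(item size / 1 KB) units."""
    return writes_per_sec * math.ceil(item_kb)


def break_even_requests_per_hour(on_demand_per_million: float,
                                 provisioned_per_unit_hour: float) -> float:
    """Requests per hour at which one provisioned unit costs the same as on-demand."""
    return provisioned_per_unit_hour / (on_demand_per_million / 1_000_000)


rcu = read_capacity_units(100, 8)    # 100 reads/sec of 8 KB items
wcu = write_capacity_units(50, 3)    # 50 writes/sec of 3 KB items
read_break_even = break_even_requests_per_hour(0.25, 0.00013)
write_break_even = break_even_requests_per_hour(1.25, 0.00065)
```

One provisioned unit can serve 3,600 requests per hour, so dividing the break-even figure by 3,600 gives the utilization above which provisioned capacity is the cheaper option.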
ElastiCache
Managed in-memory caching for sub-millisecond response times, using Redis or Memcached.
ElastiCache Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ ElastiCache Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ APPLICATION CACHING PATTERN: │
│ ──────────────────────────── │
│ │
│ ┌────────────┐ │
│ │ Client │ │
│ └──────┬─────┘ │
│ │ 1. Request │
│ ▼ │
│ ┌────────────┐ 2. Check cache ┌─────────────────────┐ │
│ │ Application│ ────────────────────► │ ElastiCache │ │
│ │ Server │ │ (Redis/Memcached) │ │
│ └──────┬─────┘ ◄──────────────────── └─────────────────────┘ │
│ │ 3a. Cache HIT │ │
│ │ (return) 3b. Cache MISS │
│ │ │ │
│ │ 4. Query if miss ┌──────────┘ │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Database (RDS/DynamoDB) │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ │ 5. Return data │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Application: Store in cache with TTL, return to client │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
Redis vs Memcached Comparison
| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams | Simple key-value only |
| Persistence | Yes (RDB, AOF) | No |
| Replication | Yes (Multi-AZ) | No |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes | No |
| Cluster Mode | Yes (sharding) | Yes (sharding) |
| Multi-threading | Single-threaded (per shard) | Multi-threaded |
| Use Case | Sessions, leaderboards, queues | Simple caching, high throughput |
Caching Strategies
import redis
import json
from datetime import timedelta
# Connect to ElastiCache Redis
redis_client = redis.Redis(
    host='my-cluster.abc123.cache.amazonaws.com',
    port=6379,
    ssl=True,
    decode_responses=True
)
# ======================
# CACHE-ASIDE (Lazy Loading)
# ======================
def get_user_with_cache_aside(user_id: str) -> dict:
    """
    Cache-Aside: Application manages cache
    Pros: Only requested data cached, cache failures don't break app
    Cons: Cache miss = slow, data can be stale
    """
    cache_key = f"user:{user_id}"
    # 1. Try cache first
    cached = redis_client.get(cache_key)
    if cached is not None:
        return json.loads(cached)
    # 2. Cache miss - query database
    user = db.query_user(user_id)  # Your DB call (db is a placeholder)
    # 3. Store in cache with TTL
    redis_client.setex(
        cache_key,
        timedelta(hours=1),  # TTL
        json.dumps(user)
    )
    return user
# ======================
# WRITE-THROUGH
# ======================
def update_user_write_through(user_id: str, data: dict) -> dict:
    """
    Write-Through: Write to cache AND database together
    Pros: Cache is refreshed on every write, so reads rarely see stale data
    Cons: Write latency (two writes), cache churn for unused data
    """
    cache_key = f"user:{user_id}"
    # 1. Write to database
    user = db.update_user(user_id, data)
    # 2. Write to cache
    redis_client.setex(
        cache_key,
        timedelta(hours=1),
        json.dumps(user)
    )
    return user
# ======================
# WRITE-BEHIND (Write-Back)
# ======================
def update_user_write_behind(user_id: str, data: dict) -> dict:
    """
    Write-Behind: Write to cache, async write to database
    Pros: Fast writes, reduced database load
    Cons: Data loss risk, complex implementation
    """
    cache_key = f"user:{user_id}"
    # 1. Write to cache immediately (merge the change into the current record)
    user = {**get_user_with_cache_aside(user_id), **data}
    redis_client.setex(cache_key, timedelta(hours=1), json.dumps(user))
    # 2. Queue database write (async); write_queue is a placeholder, e.g. SQS
    write_queue.send({'user_id': user_id, 'data': data})
    return user
# ======================
# CACHE INVALIDATION
# ======================
def invalidate_user_cache(user_id: str):
    """Delete cache entry when data changes."""
    redis_client.delete(f"user:{user_id}")

def invalidate_pattern(pattern: str):
    """Delete all keys matching pattern (use with caution)."""
    cursor = 0
    while True:
        cursor, keys = redis_client.scan(cursor, match=pattern, count=100)
        if keys:
            redis_client.delete(*keys)
        if cursor == 0:
            break
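The cache-aside flow can be exercised without a Redis cluster to see how much load it takes off the database. The sketch below is an in-memory stand-in, not ElastiCache: `TTLCache`, `cache_aside`, and `fake_db_lookup` are hypothetical names, and the injectable clock exists only so that expiry can be tested deterministically.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache with per-entry expiry and hit/miss counters."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is not None and entry[0] > self._clock():
            self.hits += 1
            return entry[1]
        self._store.pop(key, None)  # Drop the expired entry, if any
        self.misses += 1
        return None

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def cache_aside(cache: TTLCache, key: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached value, loading and caching it on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.set(key, value)
    return value


db_calls: List[str] = []


def fake_db_lookup(key: str) -> dict:
    """Stand-in for a database query; records each call."""
    db_calls.append(key)
    return {"id": key}


cache = TTLCache(ttl_seconds=3600)
for _ in range(10):
    cache_aside(cache, "user:42", fake_db_lookup)
hit_ratio = cache.hits / (cache.hits + cache.misses)
```

Only the first of the ten requests reaches the stand-in database; the hit ratio over a real workload depends on the TTL and on how often the same keys are requested.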
🎯 Interview Questions
Q1: S3 vs EBS vs EFS - when to use each?
S3 (Object Storage):
- Static files, backups, data lakes
- HTTP access from anywhere
- Unlimited storage, 11 9s durability
EBS (Block Storage):
- EC2 boot volumes, databases
- Attached to a single EC2 instance, within a single AZ
- Low latency (sub-ms), resizable
EFS (File Storage):
- Shared file systems across EC2
- Multi-AZ, POSIX compliant
- Auto-scaling, Linux only
Decision:
- Need HTTP access globally → S3
- Need database storage → EBS
- Need shared NFS mount → EFS
Q2: How would you design DynamoDB for a social media app?
Access Patterns:
- Get user profile
- Get user's posts (sorted by date)
- Get post's comments
- Get user's followers
- Get posts by hashtag
Table Design:
PK: USER#123, SK: PROFILE → User profile
PK: USER#123, SK: POST#2024... → User's posts
PK: POST#456, SK: COMMENT#... → Post comments
PK: USER#123, SK: FOLLOWER#789 → User's followers
GSI1: hashtag, created_at → Posts by hashtag
GSI2: post_id, created_at → Comments on post
Key Points:
- Denormalize for read performance
- Use composite sort keys
- Create GSIs for access patterns
- Use sparse indexes where appropriate
Q3: RDS Multi-AZ vs Read Replicas - what's the difference?
Multi-AZ (High Availability):
- Purpose: High availability through automatic failover
- Synchronous replication
- Automatic failover (60-120s)
- Standby NOT readable
- Same region only
Read Replicas (Read Scaling):
- Purpose: Performance/offload reads
- Asynchronous replication
- Manual promotion (becomes a standalone primary)
- Replicas ARE readable
- Can be cross-region
Use both together:
- Multi-AZ for production HA
- Read replicas for read scaling
Q4: How do you handle cache invalidation?
Strategies:
1. TTL-Based (Simplest)
   - Set expiration on cache entries
   - Accept eventual consistency
2. Event-Driven
   - Publish events on data changes
   - Consumers invalidate cache
3. Write-Through
   - Update cache on every write
   - Never stale, but slower writes
4. Cache-Aside with Versioning
   - Include version in cache key
   - Bump version on update
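import json

def update_product(product_id, data):
    # Update database (db, cache, sns, and topic_arn are placeholders)
    db.update(product_id, data)
    # Invalidate cache
    cache.delete(f"product:{product_id}")
    # Or publish event so other consumers can invalidate their own caches
    sns.publish(TopicArn=topic_arn, Message=json.dumps({"product_id": product_id}))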
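The versioning strategy (include a version in the cache key, bump it on update) can be sketched with plain dictionaries standing in for the cache. `VersionedCache` and its methods are hypothetical names; with Redis, the version counter would typically be its own key incremented with `INCR`, and old entries would be left to expire through their TTL.

```python
from typing import Any, Dict, Optional


class VersionedCache:
    """Cache whose keys embed a per-entity version number."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}      # Stands in for the cache store
        self._versions: Dict[str, int] = {}  # Current version per entity

    def _key(self, product_id: str) -> str:
        version = self._versions.get(product_id, 0)
        return f"product:{product_id}:v{version}"

    def get(self, product_id: str) -> Optional[Any]:
        return self._data.get(self._key(product_id))

    def set(self, product_id: str, value: Any) -> None:
        self._data[self._key(product_id)] = value

    def bump(self, product_id: str) -> None:
        """Invalidate by moving readers to a new key; the old entry is orphaned."""
        self._versions[product_id] = self._versions.get(product_id, 0) + 1


cache = VersionedCache()
cache.set("42", {"name": "Widget", "price": 10})
before = cache.get("42")   # Served from the v0 key
cache.bump("42")           # Called after the database update
after = cache.get("42")    # Reads the v1 key, which is empty, so this is a miss
```

Nothing is deleted on invalidation, which avoids a race between a delete and a concurrent read repopulating the same key, at the cost of stale entries occupying memory until they expire.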
Q5: Design a cost-effective storage strategy for 100TB data
Strategy: Tiered Storage with Lifecycle
- Hot Data (Recent 30 days): S3 Standard
- Warm Data (30-90 days): S3 Standard-IA
- Cold Data (90-365 days): S3 Glacier Flexible Retrieval
- Archive (1+ year): S3 Glacier Deep Archive
{
  "Rules": [{
    "ID": "tiered-storage",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [
      {"Days": 30, "StorageClass": "STANDARD_IA"},
      {"Days": 90, "StorageClass": "GLACIER"},
      {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
    ],
    "Expiration": {"Days": 2555}
  }]
}
Cost Estimate (100 TB, us-east-1):
- All Standard: ~$2,300/month
- With lifecycle: ~$200/month once most data has aged into the Glacier tiers (~91% savings)
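The estimate can be reproduced with a few lines of arithmetic. Everything in this sketch is an assumption for illustration: the per-GB prices are us-east-1 list prices at the time of writing (first pricing tier, storage only), and the split of the 100 TB across classes is a hypothetical steady state. Request, retrieval, transition, and minimum-storage-duration charges are ignored.

```python
from typing import Dict

# Assumed storage prices in USD per GB-month (us-east-1, first pricing tier)
PRICE_PER_GB_MONTH: Dict[str, float] = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER": 0.0036,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_cost(tb_by_class: Dict[str, float]) -> float:
    """Monthly storage cost for a mapping of storage class to terabytes."""
    return sum(tb * 1024 * PRICE_PER_GB_MONTH[storage_class]
               for storage_class, tb in tb_by_class.items())


all_standard = monthly_cost({"STANDARD": 100})

# Hypothetical steady-state distribution after the lifecycle rule has run a while
tiered = monthly_cost({
    "STANDARD": 2,
    "STANDARD_IA": 4,
    "GLACIER": 14,
    "DEEP_ARCHIVE": 80,
})

savings = 1 - tiered / all_standard
print(f"All Standard: ${all_standard:,.0f}/month")
print(f"Tiered:       ${tiered:,.0f}/month ({savings:.0%} savings)")
```

The savings depend almost entirely on how much data stays hot, so the distribution should come from real access logs or S3 Storage Lens rather than from a guess.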
🧪 Hands-On Lab: Build a Caching Layer
Objective: Add Redis caching to reduce database load by 80%
Storage Comparison Summary
| Service | Type | Durability | Latency | Best For |
|---|---|---|---|---|
| S3 | Object | 11 9s | ~100ms | Files, backups, static sites |
| EBS | Block | 99.8-99.9% (gp3), 99.999% (io2) | sub-ms to low ms | EC2 volumes, databases |
| EFS | File | 11 9s | ~ms | Shared file systems |
| DynamoDB | NoSQL | Replicated across 3 AZs | sub-10ms | Key-value, high scale |
| RDS/Aurora | SQL | 99.95% availability SLA (RDS Multi-AZ) | ~ms | Relational, complex queries |
| ElastiCache | In-memory | N/A | sub-ms | Caching, sessions |
Next Module
Networking
Master VPC, subnets, security groups, and load balancers