Module Overview
Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Core Concepts
What you'll learn:
- EC2 instance types, AMIs, and advanced configurations
- Lambda functions for serverless computing
- Container orchestration with ECS and EKS
- Auto Scaling strategies for elasticity
- Cost optimization techniques for compute
Compute Service Selection Guide
Choose the right compute service for your workload:
┌──────────────────────────────────────────────────────────────────────┐
│ AWS Compute Decision Tree │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ What type of workload? │
│ │ │
│ ├─── Short-lived, event-driven ──────► Lambda (Serverless) │
│ │ (< 15 min, stateless) │
│ │ │
│ ├─── Containers needed ─────┬────────► ECS (AWS Native) │
│ │ │ │
│ │ └────────► EKS (Kubernetes) │
│ │ │
│ ├─── Full control needed ────────────► EC2 (Virtual Servers) │
│ │ (OS, networking, GPUs) │
│ │ │
│ └─── Simple web app ─────────────────► Elastic Beanstalk │
│ (PaaS) or App Runner │
│ │
│ Control vs Simplicity Spectrum: │
│ ────────────────────────────── │
│ More Control ◄────────────────────────────────────► Less Management │
│ EC2 │ ECS/EKS │ Fargate │ Lambda │ App Runner │
│ │
└──────────────────────────────────────────────────────────────────────┘
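The decision tree above can be sketched as a small helper. The flag names are invented for illustration, and the rules are deliberate simplifications of the diagram:

```python
def suggest_compute_service(short_lived: bool = False,
                            needs_containers: bool = False,
                            prefers_kubernetes: bool = False,
                            needs_full_control: bool = False) -> str:
    """Walk the decision tree above, top to bottom."""
    if short_lived:                  # < 15 min, stateless, event-driven
        return "Lambda"
    if needs_containers:
        return "EKS" if prefers_kubernetes else "ECS"
    if needs_full_control:           # OS access, custom networking, GPUs
        return "EC2"
    return "Elastic Beanstalk or App Runner"  # simple web app (PaaS)

print(suggest_compute_service(short_lived=True))       # Lambda
print(suggest_compute_service(needs_containers=True))  # ECS
```

Real workloads rarely reduce to four booleans, but the ordering matters: control requirements (EC2) should only win after the cheaper-to-operate options are ruled out.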
EC2 (Elastic Compute Cloud)
Virtual servers in the cloud. The most fundamental and flexible AWS compute service.
Instance Type Deep Dive
AWS offers 500+ instance types optimized for different workloads:
┌──────────────────────────────────────────────────────────────────────┐
│ EC2 Instance Families │
├──────────────────────────────────────────────────────────────────────┤
│ │
│ GENERAL PURPOSE (M, T) │
│ ────────────────────── │
│ M-series: Balanced compute, memory, networking │
│ • m5.large 2 vCPU, 8 GB → Web servers, small databases │
│ • m5.xlarge 4 vCPU, 16 GB → Application servers │
│ • m5.4xlarge 16 vCPU, 64 GB → Medium workloads │
│ • m7g.* Graviton3 (ARM) → 40% better price/performance │
│ │
│ T-series: Burstable performance (for variable workloads) │
│ • t3.micro 2 vCPU, 1 GB → Free tier, dev/test │
│ • t3.medium 2 vCPU, 4 GB → Light production │
│ • t3.xlarge 4 vCPU, 16 GB → Moderate workloads │
│ │
│ COMPUTE OPTIMIZED (C) │
│ ───────────────────── │
│ High CPU-to-memory ratio for compute-intensive tasks │
│ • c5.large 2 vCPU, 4 GB → Batch processing │
│ • c5.4xlarge 16 vCPU, 32 GB → Scientific computing │
│ • c7g.* Graviton3 → Best compute price/perf │
│ │
│ MEMORY OPTIMIZED (R, X) │
│ ─────────────────────── │
│ High memory-to-CPU ratio for in-memory workloads │
│ • r5.large 2 vCPU, 16 GB → Caching, in-memory DB │
│ • r5.4xlarge 16 vCPU,128 GB → SAP HANA, Redis │
│ • x1e.xlarge 4 vCPU,122 GB → Extreme memory │
│ │
│ STORAGE OPTIMIZED (I, D) │
│ ──────────────────────── │
│ High sequential read/write access to large datasets │
│ • i3.large 2 vCPU, 15 GB, 475 GB NVMe → Databases │
│ • d2.xlarge 4 vCPU, 31 GB, 6 TB HDD → Data warehousing │
│ │
│ ACCELERATED COMPUTING (P, G, Inf) │
│ ───────────────────────────────── │
│ GPU and custom hardware for ML/graphics │
│ • p4d.24xlarge 8x A100 GPUs → ML training │
│ • g4dn.xlarge 1x T4 GPU → ML inference, graphics │
│ • inf1.xlarge 4x Inferentia→ Cost-effective inference │
│ │
└──────────────────────────────────────────────────────────────────────┘
Instance Naming Convention
# Decoding instance type names
def decode_instance_type(instance_type: str) -> dict:
    """
    Example: m5dn.2xlarge
      m       = Instance family (General Purpose)
      5       = Generation (5th gen; higher = newer)
      d       = Additional capability (NVMe SSD)
      n       = Network optimized
      2xlarge = Size (vCPUs and memory)
    Size progression:
      nano → micro → small → medium → large → xlarge → 2xlarge → ... → metal
    """
    families = {
        'm': 'General Purpose',
        't': 'Burstable',
        'c': 'Compute Optimized',
        'r': 'Memory Optimized',
        'x': 'Memory Optimized (Extreme)',
        'i': 'Storage Optimized (NVMe)',
        'd': 'Storage Optimized (Dense)',
        'p': 'GPU (Training)',
        'g': 'GPU (Graphics/Inference)',
    }
    modifiers = {
        'a': 'AMD processor',
        'g': 'AWS Graviton (ARM)',
        'd': 'NVMe SSD storage',
        'n': 'Network optimized',
        'e': 'Extended memory',
        'z': 'High frequency',
    }
    prefix, _, size = instance_type.partition('.')
    return {
        'family': families.get(prefix[0], 'Unknown'),
        'generation': prefix[1] if len(prefix) > 1 else '?',
        'modifiers': [modifiers[ch] for ch in prefix[2:] if ch in modifiers],
        'size': size,
    }

print(decode_instance_type('m5dn.2xlarge'))
# {'family': 'General Purpose', 'generation': '5',
#  'modifiers': ['NVMe SSD storage', 'Network optimized'], 'size': '2xlarge'}
Pro Tip: Use Graviton (ARM) instances (m7g, c7g, r7g) for 40% better price/performance on compatible workloads. Most applications work without modification.
T-Series Burstable Instances
T-series instances use CPU credits for burstable performance:
┌────────────────────────────────────────────────────────────────────┐
│ T3 CPU Credit System │
├────────────────────────────────────────────────────────────────────┤
│ │
│ Baseline Performance: │
│ • t3.micro: 10% CPU baseline (earns 12 credits/hour) │
│ • t3.small: 20% CPU baseline (earns 24 credits/hour) │
│ • t3.medium: 20% CPU baseline (earns 24 credits/hour) │
│ • t3.large: 30% CPU baseline (earns 36 credits/hour) │
│ │
│ Credit Usage: │
│ • 1 credit = 1 vCPU at 100% for 1 minute │
│ • Below baseline: Earn credits │
│ • Above baseline: Spend credits │
│ • Credits expire after 24 hours │
│ • Max credit balance: varies by instance size │
│ │
│ CPU Usage Graph: │
│ 100% ┤ ████ │
│ 80% ┤ █ █ Burst period │
│ 60% ┤ █ █ (spending credits) │
│ 40% ┤ █ █ │
│ 20% ┤──────█──────────█────── ← Baseline │
│ 0% ┤ │
│ └─────────────────────────► Time │
│ │
│ Modes: │
│ • Standard: Can burst only with credits │
│ • Unlimited: Can burst beyond credits (pay extra) │
│ │
└────────────────────────────────────────────────────────────────────┘
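The credit mechanics above can be modeled in a few lines. This is a rough sketch of standard mode: it ignores launch credits and the fact that an instance at zero credits is throttled to baseline rather than going negative:

```python
def simulate_t3_credits(hourly_cpu_pct, baseline_pct=20.0, vcpus=2,
                        start_credits=60.0, max_credits=576.0):
    """Rough model of a standard-mode T3 credit balance, hour by hour.

    1 credit = 1 vCPU at 100% for 1 minute, so per hour the instance earns
    baseline_pct% x vcpus x 60 credits and spends cpu% x vcpus x 60.
    """
    balance, history = start_credits, []
    for pct in hourly_cpu_pct:
        earned = baseline_pct / 100 * vcpus * 60
        spent = pct / 100 * vcpus * 60
        balance = min(max_credits, max(0.0, balance + earned - spent))
        history.append(round(balance, 1))
    return history

# Three idle hours bank credits; two burst hours at 80% CPU drain them
print(simulate_t3_credits([5, 5, 5, 80, 80, 5]))
# → [78.0, 96.0, 114.0, 42.0, 0.0, 18.0]
```

Note how the balance hits zero in the second burst hour: on a real standard-mode instance this is the point where CPU is clamped back to the 20% baseline.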
AMI (Amazon Machine Image)
AMIs are templates containing OS, application server, and applications.
# AMI Selection Best Practices
ami_best_practices = {
"use_aws_provided": [
"Amazon Linux 2023", # AWS optimized, free
"Ubuntu 22.04 LTS", # Popular, well-supported
"Windows Server 2022", # For .NET workloads
],
"create_custom_ami_when": [
"Need pre-installed software",
"Custom security hardening",
"Faster instance boot time",
"Consistent deployments",
],
"golden_ami_pipeline": """
Base AMI → Install packages → Configure → Test → Create AMI → Share
Automate with:
- EC2 Image Builder (AWS native)
- Packer (HashiCorp)
""",
}
# Launch EC2 with specific AMI
import boto3
ec2 = boto3.client('ec2')
response = ec2.run_instances(
ImageId='ami-0c55b159cbfafe1f0',  # Example AMI ID (AMI IDs are region-specific)
InstanceType='t3.medium',
MinCount=1,
MaxCount=1,
KeyName='my-key-pair',
SecurityGroupIds=['sg-0123456789abcdef0'],
SubnetId='subnet-0123456789abcdef0',
TagSpecifications=[
{
'ResourceType': 'instance',
'Tags': [
{'Key': 'Name', 'Value': 'WebServer'},
{'Key': 'Environment', 'Value': 'Production'},
]
}
],
UserData='''#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello from $(hostname)</h1>" > /var/www/html/index.html
'''
)
EC2 Instance Metadata Service (IMDS)
Access instance info from within the instance:
# IMDSv2 (recommended - more secure)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Get instance metadata
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/instance-id
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/public-ipv4
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/iam/security-credentials/MyRole
# Python - Using requests with IMDSv2
import requests
def get_instance_metadata(path: str) -> str:
"""Get EC2 instance metadata using IMDSv2."""
# Get token
token_response = requests.put(
"http://169.254.169.254/latest/api/token",
headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"}
)
token = token_response.text
# Get metadata
response = requests.get(
f"http://169.254.169.254/latest/meta-data/{path}",
headers={"X-aws-ec2-metadata-token": token}
)
return response.text
# Usage
instance_id = get_instance_metadata("instance-id")
public_ip = get_instance_metadata("public-ipv4")
az = get_instance_metadata("placement/availability-zone")
Lambda (Serverless Functions)
Run code without provisioning servers. Pay only for compute time used.
Lambda Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ Lambda Execution Model │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Request arrives │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Lambda Service │ │
│ │ │ │
│ │ Is there a warm container? │ │
│ │ │ │ │
│ │ ├── YES ──► Reuse container (warm start: ~1ms) │ │
│ │ │ │ │
│ │ └── NO ───► Create container (cold start: 100ms-10s) │ │
│ │ │ │ │
│ │ ├── Download code from S3 │ │
│ │ ├── Start runtime (Python, Node, etc.) │ │
│ │ ├── Run initialization code │ │
│ │ └── Execute handler │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Cold Start Factors: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Runtime │ Cold Start │ Notes │ │
│ │─────────────────────────────────────────────────────────────────│ │
│ │ Python │ 100-300ms │ Fast, great for most use cases │ │
│ │ Node.js │ 100-300ms │ Fast, good for APIs │ │
│ │ Go │ 50-100ms │ Fastest cold starts │ │
│ │ Java │ 3-10s │ Slow, use SnapStart or GraalVM │ │
│ │ .NET │ 2-5s │ Slower, consider ReadyToRun │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Mitigation Strategies: │
│ • Provisioned Concurrency (pre-warm containers) │
│ • Smaller deployment packages │
│ • Move initialization outside handler │
│ • Use SnapStart for Java │
│ │
└────────────────────────────────────────────────────────────────────────┘
Lambda Function Best Practices
import json
import boto3
import os
from datetime import datetime
# ✅ Initialize outside handler (runs once per container)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])
def lambda_handler(event, context):
"""
Best Practices:
1. Keep handlers small and focused
2. Initialize SDK clients outside handler
3. Use environment variables for configuration
4. Handle errors gracefully
5. Log structured data for observability
"""
# Log incoming event (structured logging)
print(json.dumps({
'level': 'INFO',
'message': 'Processing request',
'request_id': context.aws_request_id,
'event_type': event.get('httpMethod', 'unknown'),
'timestamp': datetime.utcnow().isoformat()
}))
try:
# Parse input
if 'body' in event and event['body']:
body = json.loads(event['body'])
else:
body = event
# Process request
user_id = body.get('user_id')
if not user_id:
return response(400, {'error': 'user_id is required'})
# Database operation
result = table.get_item(Key={'user_id': user_id})
if 'Item' not in result:
return response(404, {'error': 'User not found'})
return response(200, result['Item'])
except json.JSONDecodeError:
return response(400, {'error': 'Invalid JSON'})
except Exception as e:
# Log error for debugging
print(json.dumps({
'level': 'ERROR',
'message': str(e),
'request_id': context.aws_request_id
}))
return response(500, {'error': 'Internal server error'})
def response(status_code: int, body: dict) -> dict:
"""Create API Gateway compatible response."""
return {
'statusCode': status_code,
'headers': {
'Content-Type': 'application/json',
'Access-Control-Allow-Origin': '*'
},
'body': json.dumps(body)
}
Lambda Limits and Quotas
| Resource | Limit | Notes |
|---|---|---|
| Timeout | 15 minutes | Use Step Functions for longer workflows |
| Memory | 128 MB - 10 GB | More memory = more CPU proportionally |
| Package Size | 50 MB (zip), 250 MB (unzipped) | Use layers for dependencies |
| Concurrent Executions | 1,000 (default) | Can request increase |
| Payload Size | 6 MB (sync), 256 KB (async) | Use S3 for larger payloads |
| Ephemeral Storage | 512 MB - 10 GB (/tmp) | For temporary files |
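The 6 MB synchronous payload limit is commonly worked around by staging large responses in S3 and returning a presigned URL instead. A sketch, where the bucket name and key are placeholders:

```python
import json

PAYLOAD_LIMIT = 6 * 1024 * 1024  # Lambda's 6 MB synchronous response limit

def respond(body: dict, bucket: str = "my-results-bucket") -> dict:
    """Return small payloads inline; stage large ones in S3 behind a presigned URL.

    The bucket name is a placeholder; the S3 branch assumes the function's
    execution role can write to it.
    """
    data = json.dumps(body)
    if len(data.encode()) < PAYLOAD_LIMIT:
        return {"statusCode": 200, "body": data}
    import boto3  # imported lazily so the small-payload path has no AWS dependency
    s3 = boto3.client("s3")
    key = "results/response.json"
    s3.put_object(Bucket=bucket, Key=key, Body=data)
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=3600)
    return {"statusCode": 303, "headers": {"Location": url}, "body": ""}
```

The same pattern applies on the input side: callers upload large requests to S3 and pass the object key in the (small) event payload.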
Lambda with Container Images
# Dockerfile for Lambda container
FROM public.ecr.aws/lambda/python:3.11
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"
# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
# Set the handler
CMD [ "app.lambda_handler" ]
# Build and push to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_URI
docker build -t my-lambda .
docker tag my-lambda:latest $ECR_URI/my-lambda:latest
docker push $ECR_URI/my-lambda:latest
ECS (Elastic Container Service)
AWS-native container orchestration for Docker containers.
ECS Architecture Deep Dive
┌────────────────────────────────────────────────────────────────────────┐
│ ECS Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ECS Cluster │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Service │ │ │
│ │ │ Desired Count: 3 Running: 3 Pending: 0 │ │ │
│ │ │ │ │ │
│ │ │ Task Definition: my-app:5 │ │ │
│ │ │ • Container: nginx (256 CPU, 512 MB) │ │ │
│ │ │ • Container: app (512 CPU, 1024 MB) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────────┐ │ │ │
│ │ │ │ Load Balancer │ │ │ │
│ │ │ │ (ALB with Target Group) │ │ │ │
│ │ │ └─────────────────┬────────────────────────────────┘ │ │ │
│ │ │ ┌───────────┼───────────┐ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ ▼ ▼ ▼ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ Task 1 │ │ Task 2 │ │ Task 3 │ │ │ │
│ │ │ │ (AZ-1a) │ │ (AZ-1b) │ │ (AZ-1c) │ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Launch Types: │ │
│ │ ┌────────────────────────┐ ┌────────────────────────┐ │ │
│ │ │ EC2 │ │ Fargate │ │ │
│ │ │ ──────────────────── │ │ ──────────────────── │ │ │
│ │ │ • You manage EC2 │ │ • Serverless │ │ │
│ │ │ • More control │ │ • No EC2 management │ │ │
│ │ │ • Use Reserved/Spot │ │ • Pay per task │ │ │
│ │ │ • GPU workloads │ │ • Faster scaling │ │ │
│ │ └────────────────────────┘ └────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
Task Definition Example
{
"family": "my-web-app",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "512",
"memory": "1024",
"executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
"taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
"containerDefinitions": [
{
"name": "web",
"image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
"essential": true,
"portMappings": [
{
"containerPort": 8080,
"protocol": "tcp"
}
],
"environment": [
{"name": "NODE_ENV", "value": "production"}
],
"secrets": [
{
"name": "DB_PASSWORD",
"valueFrom": "arn:aws:secretsmanager:us-east-1:123:secret:db-password"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/my-web-app",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
},
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3
}
}
]
}
Auto Scaling
Automatically adjust compute capacity to match demand.
Auto Scaling Strategies
┌────────────────────────────────────────────────────────────────────────┐
│ Auto Scaling Strategies │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. TARGET TRACKING (Recommended) │
│ ────────────────────────────── │
│ "Keep CPU at 50%" │
│ │
│ CPU Usage │
│ 80% ┤ ████ │
│ 60% ┤ █ █ │
│ 50% ┼─────────────────█──────█──────── Target │
│ 40% ┤ █ █ │
│ 20% ┤███████████████ ████ │
│ └────────────────────────────────► Time │
│ │
│ Automatically adds/removes instances to maintain target │
│ │
│ 2. STEP SCALING │
│ ─────────────── │
│ "If CPU > 80%, add 3. If CPU > 60%, add 1." │
│ │
│ Alarm Threshold │ Scaling Action │
│ ────────────────┼──────────────── │
│ CPU < 30% │ Remove 2 instances │
│ CPU 30-50% │ Do nothing │
│ CPU 50-70% │ Add 1 instance │
│ CPU 70-90% │ Add 2 instances │
│ CPU > 90% │ Add 4 instances │
│ │
│ 3. SCHEDULED SCALING │
│ ──────────────────── │
│ "Scale to 10 instances at 9 AM, scale to 3 at 6 PM" │
│ │
│ Instances │
│ 10 ┤ ██████████████████ │
│ 8 ┤ █ █ │
│ 6 ┤ █ █ │
│ 3 ┼███ ███████ │
│ └──────────────────────────────────► Time │
│ 6AM 9AM 5PM 8PM │
│ │
│ 4. PREDICTIVE SCALING │
│ ───────────────────── │
│ Uses ML to predict demand and scale proactively │
│ Best for cyclical patterns (daily, weekly) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Auto Scaling Configuration (Terraform)
# Auto Scaling Group
resource "aws_autoscaling_group" "web" {
name = "web-asg"
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = [aws_lb_target_group.web.arn]
health_check_type = "ELB"
min_size = 2
max_size = 10
desired_capacity = 3
launch_template {
id = aws_launch_template.web.id
version = "$Latest"
}
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
tag {
key = "Name"
value = "web-server"
propagate_at_launch = true
}
}
# Target Tracking Policy
resource "aws_autoscaling_policy" "cpu" {
name = "cpu-target-tracking"
autoscaling_group_name = aws_autoscaling_group.web.name
policy_type = "TargetTrackingScaling"
target_tracking_configuration {
predefined_metric_specification {
predefined_metric_type = "ASGAverageCPUUtilization"
}
target_value = 50.0
}
}
# Scale on Request Count
resource "aws_autoscaling_policy" "requests" {
name = "request-target-tracking"
autoscaling_group_name = aws_autoscaling_group.web.name
policy_type = "TargetTrackingScaling"
target_tracking_configuration {
predefined_metric_specification {
predefined_metric_type = "ALBRequestCountPerTarget"
resource_label = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.web.arn_suffix}"
}
target_value = 1000.0 # 1000 requests per target
}
}
Cost Optimization
Compute Cost Strategies
┌────────────────────────────────────────────────────────────────────────┐
│ Compute Cost Optimization │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. RIGHT-SIZING (15-30% savings) │
│ ───────────────────────────────── │
│ • Use AWS Compute Optimizer recommendations │
│ • Monitor CloudWatch metrics (CPU, memory) │
│ • Downsize underutilized instances │
│ │
│ Example: │
│ m5.xlarge (15% CPU avg) → m5.large = 50% cost reduction │
│ │
│ 2. RESERVED + SAVINGS PLANS (30-72% savings) │
│ ───────────────────────────────────────────── │
│ • Reserved Instances for steady-state │
│ • Savings Plans for flexibility │
│ • Cover 60-80% of baseline with commitments │
│ │
│ 3. SPOT INSTANCES (60-90% savings) │
│ ───────────────────────────────── │
│ Use for: │
│ ✅ CI/CD workers │
│ ✅ Batch processing │
│ ✅ Dev/test environments │
│ ✅ Stateless web servers (behind ASG) │
│ │
│ 4. GRAVITON INSTANCES (40% better price/perf) │
│ ──────────────────────────────────────────── │
│ • m7g, c7g, r7g instance families │
│ • Most applications work without changes │
│ │
│ 5. LAMBDA OPTIMIZATION │
│ ────────────────────── │
│ • Right-size memory (affects CPU) │
│ • Use Graviton (arm64) for 34% lower cost │
│ • Minimize cold starts │
│ │
│ COMBINED STRATEGY EXAMPLE: │
│ ────────────────────────── │
│ Original: 10x m5.xlarge On-Demand = $1,382/month │
│ │
│ Optimized: │
│ • 4x m7g.large Reserved (60% base) = $230/month │
│ • 4x m7g.large Spot (30% variable) = $69/month │
│ • 2x m7g.large On-Demand (10% buffer) = $130/month │
│ Total: $429/month (69% savings!) │
│ │
└────────────────────────────────────────────────────────────────────────┘
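The combined-strategy arithmetic generalizes to a small helper. The discount figures below are illustrative assumptions, not quoted AWS prices:

```python
def blended_monthly_cost(on_demand_monthly: float, mix: dict) -> float:
    """Blend a fleet's monthly cost across purchase options.

    mix maps option name -> (share_of_fleet, discount_vs_on_demand).
    """
    return sum(on_demand_monthly * share * (1 - discount)
               for share, discount in mix.values())

# Rework the combined-strategy example roughly: assume moving the fleet to
# Graviton first cuts the ~$1,382/month On-Demand bill by ~40%, then split
# the fleet across purchase options (discounts are illustrative).
graviton_on_demand = 1382 * 0.6
cost = blended_monthly_cost(graviton_on_demand, {
    "reserved":  (0.6, 0.55),   # 60% of fleet, ~55% off
    "spot":      (0.3, 0.72),   # 30% of fleet, ~72% off
    "on_demand": (0.1, 0.00),   # 10% buffer at full price
})
print(f"~${cost:,.0f}/month vs $1,382/month On-Demand "
      f"({1 - cost / 1382:.0%} savings)")
```

Plugging in your own On-Demand baseline and negotiated discounts gives a quick sanity check before committing to Reserved Instances or Savings Plans.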
🎯 Interview Questions
Q1: When would you choose Lambda over EC2?
Lambda is better when:
- Event-driven, short-running tasks (< 15 min)
- Unpredictable or spiky traffic
- You want zero server management
- Cost matters more than consistent latency

EC2 is better when:
- Long-running processes
- Need specific OS/hardware (GPUs)
- Consistent, predictable traffic
- Cost optimization with Reserved Instances
- Need persistent connections (WebSockets)
Q2: How do you reduce Lambda cold starts?
Strategies:
- Provisioned Concurrency - Pre-warm containers
- Smaller packages - Reduce initialization time
- Initialize outside handler - SDK clients, DB connections
- Use lighter runtimes - Go, Python, Node.js
- SnapStart for Java - Checkpoint/restore
- Keep functions warm - Scheduled pings (not ideal)
# Initialize OUTSIDE handler
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE'])
def handler(event, context):
# Only process logic here
return table.get_item(...)
Q3: ECS vs EKS - when to use each?
Choose ECS when:
- AWS-native, simpler setup
- Smaller team without K8s expertise
- Tighter AWS integration needed
- Lower operational overhead

Choose EKS when:
- Multi-cloud strategy
- Team has Kubernetes expertise
- Need K8s ecosystem (Helm, operators)
- Complex microservices architectures
- Portability is important
Q4: Design an auto-scaling strategy for an e-commerce site
Multi-layered approach:

Policies:
- Predictive Scaling: Scale up before known peaks (Black Friday); enable for daily patterns
- Target Tracking: Maintain 50% CPU average
- Step Scaling: Add 4 instances if CPU > 80% for 2 min (handles sudden spikes)
- Scheduled: Scale to 20 at 8 AM, 10 at 10 PM

Capacity:
- Minimum: 4 instances (2 per AZ)
- Desired: 6 instances (normal load)
- Maximum: 50 instances (peak capacity)

Cooldowns:
- Scale-out: 60 seconds
- Scale-in: 300 seconds (avoid thrashing)
Q5: How do you optimize EC2 costs?
Framework (in order of impact):

1. Right-size (15-30% savings)
   - Use Compute Optimizer
   - Monitor actual utilization
2. Purchase options (30-72% savings)
   - Reserved for baseline (60-70% of capacity)
   - Spot for stateless/batch
   - Savings Plans for flexibility
3. Instance selection (20-40% savings)
   - Graviton (ARM) for compatible workloads
   - Latest generation (m7 vs m5)
4. Shutdown automation
   - Stop dev/test outside hours
   - Auto-scaling to zero when possible
5. Regular review
   - Monthly cost reviews
   - Tag-based cost allocation
🧪 Hands-On Lab: Deploy Scalable Web App
Objective: Deploy a Node.js application with Auto Scaling and Load Balancing

1. Create Launch Template: Configure EC2 instance with user data for automatic setup
2. Create Auto Scaling Group: Set min=2, max=6, desired=3 across 2 AZs
3. Create Application Load Balancer: Configure health checks and target group
4. Configure Scaling Policies: Add target tracking (CPU 50%) and step scaling
5. Test Scaling: Generate load with ab or hey and watch scaling in action

Next Module
Storage & Databases
Master S3, EBS, RDS, DynamoDB, and ElastiCache