Compute Services

Module Overview

Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Core Concepts
This module covers all AWS compute services in depth. You’ll learn when to use each service, how to optimize for cost and performance, and real-world architecture patterns.

What You’ll Learn:
  • EC2 instance types, AMIs, and advanced configurations
  • Lambda functions for serverless computing
  • Container orchestration with ECS and EKS
  • Auto Scaling strategies for elasticity
  • Cost optimization techniques for compute

Compute Service Selection Guide

Choose the right compute service for your workload:
┌──────────────────────────────────────────────────────────────────────┐
│                   AWS Compute Decision Tree                          │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│   What type of workload?                                              │
│         │                                                             │
│         ├─── Short-lived, event-driven ──────► Lambda (Serverless)   │
│         │    (< 15 min, stateless)                                   │
│         │                                                             │
│         ├─── Containers needed ─────┬────────► ECS (AWS Native)      │
│         │                           │                                 │
│         │                           └────────► EKS (Kubernetes)      │
│         │                                                             │
│         ├─── Full control needed ────────────► EC2 (Virtual Servers) │
│         │    (OS, networking, GPUs)                                  │
│         │                                                             │
│         └─── Simple web app ─────────────────► Elastic Beanstalk     │
│              (PaaS)                               or App Runner       │
│                                                                       │
│   Control vs Simplicity Spectrum:                                     │
│   ──────────────────────────────                                      │
│   More Control ◄────────────────────────────────────► Less Management │
│   EC2  │  ECS/EKS  │  Fargate  │  Lambda  │  App Runner              │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘
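To make the tree above concrete, here is a minimal sketch of the same decision logic in Python. The function name and flags are illustrative, not an AWS API:

```python
def choose_compute_service(
    event_driven: bool = False,
    max_runtime_minutes: float = 0,
    needs_containers: bool = False,
    needs_kubernetes: bool = False,
    needs_full_control: bool = False,
) -> str:
    """Walk the decision tree above, top to bottom."""
    if event_driven and max_runtime_minutes <= 15:
        return "Lambda"                          # short-lived, stateless
    if needs_containers:
        return "EKS" if needs_kubernetes else "ECS"
    if needs_full_control:
        return "EC2"                             # OS, networking, GPUs
    return "Elastic Beanstalk / App Runner"      # simple web app (PaaS)
```

For example, `choose_compute_service(needs_containers=True)` returns `"ECS"`.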

EC2 (Elastic Compute Cloud)

Virtual servers in the cloud. The most fundamental and flexible AWS compute service.

Instance Type Deep Dive

AWS offers 500+ instance types optimized for different workloads:
┌──────────────────────────────────────────────────────────────────────┐
│                    EC2 Instance Families                              │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│   GENERAL PURPOSE (M, T)                                              │
│   ──────────────────────                                              │
│   M-series: Balanced compute, memory, networking                      │
│   • m5.large    2 vCPU,  8 GB  → Web servers, small databases        │
│   • m5.xlarge   4 vCPU, 16 GB  → Application servers                 │
│   • m5.4xlarge 16 vCPU, 64 GB  → Medium workloads                    │
│   • m7g.*      Graviton3 (ARM) → 40% better price/performance        │
│                                                                       │
│   T-series: Burstable performance (for variable workloads)           │
│   • t3.micro   2 vCPU,  1 GB   → Free tier, dev/test                 │
│   • t3.medium  2 vCPU,  4 GB   → Light production                    │
│   • t3.xlarge  4 vCPU, 16 GB   → Moderate workloads                  │
│                                                                       │
│   COMPUTE OPTIMIZED (C)                                               │
│   ─────────────────────                                               │
│   High CPU-to-memory ratio for compute-intensive tasks               │
│   • c5.large    2 vCPU,  4 GB  → Batch processing                    │
│   • c5.4xlarge 16 vCPU, 32 GB  → Scientific computing                │
│   • c7g.*      Graviton3       → Best compute price/perf             │
│                                                                       │
│   MEMORY OPTIMIZED (R, X)                                             │
│   ───────────────────────                                             │
│   High memory-to-CPU ratio for in-memory workloads                   │
│   • r5.large    2 vCPU, 16 GB  → Caching, in-memory DB               │
│   • r5.4xlarge 16 vCPU,128 GB  → SAP HANA, Redis                     │
│   • x1e.xlarge  4 vCPU,122 GB  → Extreme memory                      │
│                                                                       │
│   STORAGE OPTIMIZED (I, D)                                            │
│   ────────────────────────                                            │
│   High sequential read/write access to large datasets                │
│   • i3.large    2 vCPU, 15 GB, 475 GB NVMe → Databases              │
│   • d2.xlarge   4 vCPU, 31 GB, 6 TB HDD   → Data warehousing        │
│                                                                       │
│   ACCELERATED COMPUTING (P, G, Inf)                                   │
│   ─────────────────────────────────                                   │
│   GPU and custom hardware for ML/graphics                            │
│   • p4d.24xlarge  8x A100 GPUs → ML training                        │
│   • g4dn.xlarge   1x T4 GPU    → ML inference, graphics             │
│   • inf1.xlarge   4x Inferentia→ Cost-effective inference           │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘

Instance Naming Convention

# Decoding instance type names
import re

def decode_instance_type(instance_type: str) -> dict:
    """
    Example: m5dn.2xlarge

    m       = Instance family (General Purpose)
    5       = Generation (5th gen, higher = newer)
    d       = Additional capability (NVMe SSD)
    n       = Network optimized
    2xlarge = Size (vCPUs and memory)

    Size progression:
    nano → micro → small → medium → large → xlarge → 2xlarge → ... → metal
    """
    families = {
        'm': 'General Purpose',
        't': 'Burstable',
        'c': 'Compute Optimized',
        'r': 'Memory Optimized',
        'x': 'Memory Optimized (Extreme)',
        'i': 'Storage Optimized (NVMe)',
        'd': 'Storage Optimized (Dense)',
        'p': 'GPU (Training)',
        'g': 'GPU (Graphics/Inference)',
    }

    modifiers = {
        'a': 'AMD processor',
        'g': 'AWS Graviton (ARM)',
        'd': 'NVMe SSD storage',
        'n': 'Network optimized',
        'e': 'Extended memory',
        'z': 'High frequency',
    }

    # Split "m5dn.2xlarge" into family letter, generation, modifiers, size
    prefix, _, size = instance_type.partition('.')
    match = re.match(r'([a-z]+?)(\d+)([a-z]*)$', prefix)
    if not match:
        raise ValueError(f'Unrecognized instance type: {instance_type}')
    family, generation, extras = match.groups()

    return {
        'family': families.get(family, 'Unknown'),
        'generation': int(generation),
        'modifiers': [modifiers.get(m, 'Unknown') for m in extras],
        'size': size or 'unknown',
    }

# decode_instance_type('m5dn.2xlarge')
# → {'family': 'General Purpose', 'generation': 5,
#    'modifiers': ['NVMe SSD storage', 'Network optimized'], 'size': '2xlarge'}

Pro Tip: Use Graviton (ARM) instances (m7g, c7g, r7g) for up to 40% better price/performance on compatible workloads. Most applications run without modification.

T-Series Burstable Instances

T-series instances use CPU credits for burstable performance:
┌────────────────────────────────────────────────────────────────────┐
│                    T3 CPU Credit System                             │
├────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   Baseline Performance:                                             │
│   • t3.micro:  10% CPU baseline  (earns 12 credits/hour)           │
│   • t3.small:  20% CPU baseline  (earns 24 credits/hour)           │
│   • t3.medium: 20% CPU baseline  (earns 24 credits/hour)           │
│   • t3.large:  30% CPU baseline  (earns 36 credits/hour)           │
│                                                                     │
│   Credit Usage:                                                     │
│   • 1 credit = 1 vCPU at 100% for 1 minute                         │
│   • Below baseline: Earn credits                                    │
│   • Above baseline: Spend credits                                   │
│   • Max credit balance: 24 hours of earned credits                 │
│   • Unspent credits persist for 7 days if the instance stops       │
│                                                                     │
│   CPU Usage Graph:                                                  │
│   100% ┤          ████                                             │
│    80% ┤         █    █          Burst period                      │
│    60% ┤        █      █         (spending credits)                │
│    40% ┤       █        █                                          │
│    20% ┤──────█──────────█──────  ← Baseline                       │
│     0% ┤                                                           │
│        └─────────────────────────► Time                            │
│                                                                     │
│   Modes:                                                            │
│   • Standard: Can burst only with credits                          │
│   • Unlimited: Can burst beyond credits (pay extra)                │
│                                                                     │
└────────────────────────────────────────────────────────────────────┘
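The credit mechanics above can be sketched as an hour-by-hour simulation. This is a simplification: it models Standard mode only, ignores Unlimited-mode surplus charges, and treats earn/spend rates as hourly averages:

```python
def simulate_t3_credits(
    baseline_pct: float,        # e.g. 0.20 for t3.medium
    vcpus: int,                 # T3 instances have 2 vCPUs
    cpu_usage_pct: list,        # average CPU utilization per hour, 0.0-1.0
) -> list:
    """Return the CPU credit balance at the end of each hour.

    1 credit = 1 vCPU at 100% for 1 minute, so per hour the instance
    earns baseline_pct * vcpus * 60 credits and spends usage * vcpus * 60.
    """
    earn_rate = baseline_pct * vcpus * 60
    max_balance = earn_rate * 24          # cap: ~24 hours of earned credits
    balance, history = 0.0, []
    for usage in cpu_usage_pct:
        spent = usage * vcpus * 60
        balance = min(max_balance, max(0.0, balance + earn_rate - spent))
        history.append(balance)
    return history

# t3.medium at 10% CPU for two hours: earns 24/hr, spends 12/hr
# simulate_t3_credits(0.20, 2, [0.1, 0.1]) → [12.0, 24.0]
```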

AMI (Amazon Machine Image)

AMIs are templates containing OS, application server, and applications.
# AMI Selection Best Practices
ami_best_practices = {
    "use_aws_provided": [
        "Amazon Linux 2023",      # AWS optimized, free
        "Ubuntu 22.04 LTS",       # Popular, well-supported
        "Windows Server 2022",    # For .NET workloads
    ],
    
    "create_custom_ami_when": [
        "Need pre-installed software",
        "Custom security hardening",
        "Faster instance boot time",
        "Consistent deployments",
    ],
    
    "golden_ami_pipeline": """
    Base AMI → Install packages → Configure → Test → Create AMI → Share
    
    Automate with:
    - EC2 Image Builder (AWS native)
    - Packer (HashiCorp)
    """,
}

# Launch EC2 with specific AMI
import boto3

ec2 = boto3.client('ec2')

response = ec2.run_instances(
    ImageId='ami-0c55b159cbfafe1f0',  # example ID - look up the current AL2023 AMI for your region
    InstanceType='t3.medium',
    MinCount=1,
    MaxCount=1,
    KeyName='my-key-pair',
    SecurityGroupIds=['sg-0123456789abcdef0'],
    SubnetId='subnet-0123456789abcdef0',
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                {'Key': 'Name', 'Value': 'WebServer'},
                {'Key': 'Environment', 'Value': 'Production'},
            ]
        }
    ],
    UserData='''#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello from $(hostname)</h1>" > /var/www/html/index.html
'''
)

EC2 Instance Metadata Service (IMDS)

Access instance info from within the instance:
# IMDSv2 (recommended - more secure)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Get instance metadata
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/public-ipv4

curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/MyRole

# Python - Using requests with IMDSv2
import requests

def get_instance_metadata(path: str) -> str:
    """Get EC2 instance metadata using IMDSv2."""
    # Get token
    token_response = requests.put(
        "http://169.254.169.254/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"}
    )
    token = token_response.text
    
    # Get metadata
    response = requests.get(
        f"http://169.254.169.254/latest/meta-data/{path}",
        headers={"X-aws-ec2-metadata-token": token}
    )
    return response.text

# Usage
instance_id = get_instance_metadata("instance-id")
public_ip = get_instance_metadata("public-ipv4")
az = get_instance_metadata("placement/availability-zone")

Lambda (Serverless Functions)

Run code without provisioning servers. Pay only for compute time used.

Lambda Event-Driven Architecture


┌────────────────────────────────────────────────────────────────────────┐
│                      Lambda Execution Model                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Request arrives                                                       │
│        │                                                                │
│        ▼                                                                │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                    Lambda Service                                │  │
│   │                                                                  │  │
│   │   Is there a warm container?                                     │  │
│   │        │                                                         │  │
│   │        ├── YES ──► Reuse container (warm start: ~1ms)           │  │
│   │        │                                                         │  │
│   │        └── NO ───► Create container (cold start: 100ms-10s)     │  │
│   │                    │                                             │  │
│   │                    ├── Download code from S3                     │  │
│   │                    ├── Start runtime (Python, Node, etc.)        │  │
│   │                    ├── Run initialization code                   │  │
│   │                    └── Execute handler                           │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Cold Start Factors:                                                   │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  Runtime     │ Cold Start │ Notes                               │  │
│   │─────────────────────────────────────────────────────────────────│  │
│   │  Python      │  100-300ms │ Fast, great for most use cases      │  │
│   │  Node.js     │  100-300ms │ Fast, good for APIs                 │  │
│   │  Go          │   50-100ms │ Fastest cold starts                 │  │
│   │  Java        │  3-10s     │ Slow, use SnapStart or GraalVM      │  │
│   │  .NET        │  2-5s      │ Slower, consider ReadyToRun         │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Mitigation Strategies:                                                │
│   • Provisioned Concurrency (pre-warm containers)                       │
│   • Smaller deployment packages                                         │
│   • Move initialization outside handler                                 │
│   • Use SnapStart for Java                                              │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Lambda Function Best Practices

import json
import boto3
import os
from datetime import datetime

# ✅ Initialize outside handler (runs once per container)
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):
    """
    Best Practices:
    1. Keep handlers small and focused
    2. Initialize SDK clients outside handler
    3. Use environment variables for configuration
    4. Handle errors gracefully
    5. Log structured data for observability
    """
    
    # Log incoming event (structured logging)
    print(json.dumps({
        'level': 'INFO',
        'message': 'Processing request',
        'request_id': context.aws_request_id,
        'event_type': event.get('httpMethod', 'unknown'),
        'timestamp': datetime.utcnow().isoformat()
    }))
    
    try:
        # Parse input
        if 'body' in event and event['body']:
            body = json.loads(event['body'])
        else:
            body = event
        
        # Process request
        user_id = body.get('user_id')
        if not user_id:
            return response(400, {'error': 'user_id is required'})
        
        # Database operation
        result = table.get_item(Key={'user_id': user_id})
        
        if 'Item' not in result:
            return response(404, {'error': 'User not found'})
        
        return response(200, result['Item'])
        
    except json.JSONDecodeError:
        return response(400, {'error': 'Invalid JSON'})
    except Exception as e:
        # Log error for debugging
        print(json.dumps({
            'level': 'ERROR',
            'message': str(e),
            'request_id': context.aws_request_id
        }))
        return response(500, {'error': 'Internal server error'})


def response(status_code: int, body: dict) -> dict:
    """Create API Gateway compatible response."""
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type': 'application/json',
            'Access-Control-Allow-Origin': '*'
        },
        'body': json.dumps(body)
    }

Lambda Limits and Quotas

Resource              | Limit                           | Notes
----------------------|---------------------------------|----------------------------------------
Timeout               | 15 minutes                      | Use Step Functions for longer workflows
Memory                | 128 MB - 10 GB                  | More memory = more CPU proportionally
Package Size          | 50 MB (zip), 250 MB (unzipped)  | Use layers for dependencies
Concurrent Executions | 1,000 (default)                 | Can request increase
Payload Size          | 6 MB (sync), 256 KB (async)     | Use S3 for larger payloads
Ephemeral Storage     | 512 MB - 10 GB (/tmp)           | For temporary files
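Because memory is the single sizing knob (CPU scales with it), Lambda cost follows directly from the pricing model. A rough estimator; the default rates below are the published us-east-1 x86 prices at the time of writing, so treat them as assumptions and check the current price list:

```python
def lambda_monthly_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_gb_second: float = 0.0000166667,   # assumed us-east-1 x86 rate
    price_per_request: float = 0.20 / 1_000_000, # assumed $0.20 per 1M requests
) -> float:
    """Estimate monthly cost: GB-seconds of compute plus request charges."""
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * price_per_gb_second + invocations * price_per_request

# 1M invocations/month, 100 ms each, 128 MB → well under a dollar
cost = lambda_monthly_cost(1_000_000, 100, 128)
```

Note the trade-off: doubling memory doubles the GB-second rate but also roughly doubles CPU, so CPU-bound functions may finish faster and cost about the same.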

Lambda with Container Images

# Dockerfile for Lambda container
FROM public.ecr.aws/lambda/python:3.11

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}

# Set the handler
CMD [ "app.lambda_handler" ]

# Build and push to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin $ECR_URI
docker build -t my-lambda .
docker tag my-lambda:latest $ECR_URI/my-lambda:latest
docker push $ECR_URI/my-lambda:latest

ECS (Elastic Container Service)

AWS-native container orchestration for Docker containers.

ECS Architecture Deep Dive

┌────────────────────────────────────────────────────────────────────────┐
│                        ECS Architecture                                 │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                         ECS Cluster                              │  │
│   │                                                                  │  │
│   │   ┌─────────────────────────────────────────────────────────┐   │  │
│   │   │                      Service                             │   │  │
│   │   │   Desired Count: 3    Running: 3    Pending: 0          │   │  │
│   │   │                                                          │   │  │
│   │   │   Task Definition: my-app:5                              │   │  │
│   │   │   • Container: nginx (256 CPU, 512 MB)                   │   │  │
│   │   │   • Container: app   (512 CPU, 1024 MB)                  │   │  │
│   │   │                                                          │   │  │
│   │   │   ┌──────────────────────────────────────────────────┐  │   │  │
│   │   │   │              Load Balancer                        │  │   │  │
│   │   │   │        (ALB with Target Group)                    │  │   │  │
│   │   │   └─────────────────┬────────────────────────────────┘  │   │  │
│   │   │         ┌───────────┼───────────┐                       │   │  │
│   │   │         │           │           │                       │   │  │
│   │   │         ▼           ▼           ▼                       │   │  │
│   │   │   ┌──────────┐ ┌──────────┐ ┌──────────┐               │   │  │
│   │   │   │  Task 1  │ │  Task 2  │ │  Task 3  │               │   │  │
│   │   │   │  (AZ-1a) │ │  (AZ-1b) │ │  (AZ-1c) │               │   │  │
│   │   │   └──────────┘ └──────────┘ └──────────┘               │   │  │
│   │   │                                                          │   │  │
│   │   └─────────────────────────────────────────────────────────┘   │  │
│   │                                                                  │  │
│   │   Launch Types:                                                  │  │
│   │   ┌────────────────────────┐  ┌────────────────────────┐        │  │
│   │   │         EC2            │  │       Fargate          │        │  │
│   │   │  ────────────────────  │  │  ────────────────────  │        │  │
│   │   │  • You manage EC2      │  │  • Serverless          │        │  │
│   │   │  • More control        │  │  • No EC2 management   │        │  │
│   │   │  • Use Reserved/Spot   │  │  • Pay per task        │        │  │
│   │   │  • GPU workloads       │  │  • Faster scaling      │        │  │
│   │   └────────────────────────┘  └────────────────────────┘        │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Task Definition Example

{
  "family": "my-web-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "web",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "environment": [
        {"name": "NODE_ENV", "value": "production"}
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ]
}
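Fargate only accepts certain CPU/memory pairings; the task definition above uses 512 CPU units with 1024 MB, which is valid. A small checker for the classic size matrix (the larger 8- and 16-vCPU sizes added later are omitted here, so verify against current AWS documentation):

```python
# Classic Fargate task sizes: CPU units → allowed memory (MB)
FARGATE_SIZES = {
    256:  [512, 1024, 2048],                       # 0.25 vCPU
    512:  [1024, 2048, 3072, 4096],                # 0.5 vCPU
    1024: list(range(2048, 8192 + 1, 1024)),       # 1 vCPU: 2-8 GB
    2048: list(range(4096, 16384 + 1, 1024)),      # 2 vCPU: 4-16 GB
    4096: list(range(8192, 30720 + 1, 1024)),      # 4 vCPU: 8-30 GB
}

def is_valid_fargate_size(cpu: int, memory: int) -> bool:
    """True if the CPU/memory pair is an allowed Fargate task size."""
    return memory in FARGATE_SIZES.get(cpu, [])
```

An invalid pairing (e.g. 256 CPU with 4096 MB) is rejected at task registration, so catching it early in CI saves a deploy cycle.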

Auto Scaling

Automatically adjust compute capacity to match demand.

Auto Scaling Strategies

┌────────────────────────────────────────────────────────────────────────┐
│                    Auto Scaling Strategies                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   1. TARGET TRACKING (Recommended)                                      │
│   ──────────────────────────────                                        │
│   "Keep CPU at 50%"                                                     │
│                                                                         │
│   CPU Usage                                                             │
│   80% ┤                    ████                                        │
│   60% ┤                   █    █                                       │
│   50% ┼─────────────────█──────█──────── Target                        │
│   40% ┤                █        █                                      │
│   20% ┤███████████████          ████                                   │
│       └────────────────────────────────► Time                          │
│                                                                         │
│   Automatically adds/removes instances to maintain target               │
│                                                                         │
│   2. STEP SCALING                                                       │
│   ───────────────                                                       │
│   "If CPU > 80%, add 3. If CPU > 60%, add 1."                          │
│                                                                         │
│   Alarm Threshold │ Scaling Action                                     │
│   ────────────────┼────────────────                                    │
│   CPU < 30%       │ Remove 2 instances                                 │
│   CPU 30-50%      │ Do nothing                                         │
│   CPU 50-70%      │ Add 1 instance                                     │
│   CPU 70-90%      │ Add 2 instances                                    │
│   CPU > 90%       │ Add 4 instances                                    │
│                                                                         │
│   3. SCHEDULED SCALING                                                  │
│   ────────────────────                                                  │
│   "Scale to 10 instances at 9 AM, scale to 3 at 6 PM"                  │
│                                                                         │
│   Instances                                                             │
│   10 ┤     ██████████████████                                         │
│    8 ┤    █                  █                                        │
│    6 ┤   █                    █                                       │
│    3 ┼███                      ███████                                │
│      └──────────────────────────────────► Time                         │
│        6AM  9AM           5PM  8PM                                     │
│                                                                         │
│   4. PREDICTIVE SCALING                                                 │
│   ─────────────────────                                                 │
│   Uses ML to predict demand and scale proactively                      │
│   Best for cyclical patterns (daily, weekly)                           │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
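The step-scaling table above maps directly to code. A minimal evaluator (thresholds come from the table; how boundaries are assigned is a choice):

```python
def step_scaling_action(cpu_pct: float) -> int:
    """Instance delta for a given CPU %, per the step table above."""
    if cpu_pct < 30:
        return -2   # underutilized: remove 2 instances
    if cpu_pct < 50:
        return 0    # healthy range: do nothing
    if cpu_pct < 70:
        return 1    # warming up: add 1
    if cpu_pct <= 90:
        return 2    # hot: add 2
    return 4        # critical: add 4
```

In practice each step is a CloudWatch alarm threshold on the scaling policy; this sketch only shows the decision logic.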

Auto Scaling Configuration (Terraform)

# Auto Scaling Group
resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  vpc_zone_identifier = var.private_subnet_ids
  target_group_arns   = [aws_lb_target_group.web.arn]
  health_check_type   = "ELB"
  
  min_size         = 2
  max_size         = 10
  desired_capacity = 3
  
  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }
  
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }
  
  tag {
    key                 = "Name"
    value               = "web-server"
    propagate_at_launch = true
  }
}

# Target Tracking Policy
resource "aws_autoscaling_policy" "cpu" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"
  
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0
  }
}

# Scale on Request Count
resource "aws_autoscaling_policy" "requests" {
  name                   = "request-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"
  
  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.main.arn_suffix}/${aws_lb_target_group.web.arn_suffix}"
    }
    target_value = 1000.0  # 1000 requests per target
  }
}
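Target tracking can be reasoned about with a proportional rule: scale capacity by the ratio of the current metric to the target. The real service works through CloudWatch alarms with cooldowns and conservative scale-in, so this is a mental model rather than the exact algorithm:

```python
import math

def target_tracking_capacity(
    current_capacity: int,
    current_metric: float,   # e.g. average CPU %
    target: float,           # e.g. 50.0
    min_size: int,
    max_size: int,
) -> int:
    """Approximate new desired capacity: scale proportionally to the
    metric/target ratio, clamped to the group's min/max bounds."""
    desired = math.ceil(current_capacity * current_metric / target)
    return max(min_size, min(max_size, desired))

# 3 instances at 75% CPU with a 50% target → ceil(4.5) = 5 instances
```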

Cost Optimization

Compute Cost Strategies

┌────────────────────────────────────────────────────────────────────────┐
│                  Compute Cost Optimization                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   1. RIGHT-SIZING (15-30% savings)                                      │
│   ─────────────────────────────────                                     │
│   • Use AWS Compute Optimizer recommendations                           │
│   • Monitor CloudWatch metrics (CPU, memory)                            │
│   • Downsize underutilized instances                                    │
│                                                                         │
│   Example:                                                              │
│   m5.xlarge (15% CPU avg) → m5.large = 50% cost reduction             │
│                                                                         │
│   2. RESERVED + SAVINGS PLANS (30-72% savings)                          │
│   ─────────────────────────────────────────────                         │
│   • Reserved Instances for steady-state                                 │
│   • Savings Plans for flexibility                                       │
│   • Cover 60-80% of baseline with commitments                          │
│                                                                         │
│   3. SPOT INSTANCES (60-90% savings)                                    │
│   ─────────────────────────────────                                     │
│   Use for:                                                              │
│   ✅ CI/CD workers                                                      │
│   ✅ Batch processing                                                   │
│   ✅ Dev/test environments                                              │
│   ✅ Stateless web servers (behind ASG)                                 │
│                                                                         │
│   4. GRAVITON INSTANCES (40% better price/perf)                         │
│   ────────────────────────────────────────────                          │
│   • m7g, c7g, r7g instance families                                    │
│   • Most applications work without changes                              │
│                                                                         │
│   5. LAMBDA OPTIMIZATION                                                │
│   ──────────────────────                                                │
│   • Right-size memory (affects CPU)                                     │
│   • Use Graviton (arm64): up to 34% better price/perf                 │
│   • Minimize cold starts                                                │
│                                                                         │
│   COMBINED STRATEGY EXAMPLE:                                            │
│   ──────────────────────────                                            │
│   Original: 10x m5.xlarge On-Demand = $1,382/month                     │
│                                                                         │
│   Optimized:                                                            │
│   • 4x m7g.large Reserved (60% base) = $230/month                      │
│   • 4x m7g.large Spot (30% variable) = $69/month                       │
│   • 2x m7g.large On-Demand (10% buffer) = $130/month                   │
│   Total: $429/month (69% savings!)                                     │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
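The combined-strategy math above is just a blended sum. Using the document's illustrative monthly figures (not live AWS prices):

```python
def blended_monthly_cost(tiers: dict) -> float:
    """Total monthly compute cost across purchasing tiers."""
    return sum(tiers.values())

# Illustrative figures from the example above (not current pricing)
on_demand_only = 1382.0                  # 10x m5.xlarge On-Demand
optimized = blended_monthly_cost({
    "reserved  (4x m7g.large)": 230.0,   # 60% baseline
    "spot      (4x m7g.large)": 69.0,    # 30% variable
    "on_demand (2x m7g.large)": 130.0,   # 10% buffer
})
savings = 1 - optimized / on_demand_only   # ≈ 0.69, i.e. ~69% saved
```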

🎯 Interview Questions

Q: When would you choose Lambda over EC2?

Lambda is better when:
  • Event-driven, short-running tasks (< 15 min)
  • Unpredictable or spiky traffic
  • You want zero server management
  • Cost matters more than consistent latency
EC2 is better when:
  • Long-running processes
  • Need specific OS/hardware (GPUs)
  • Consistent, predictable traffic
  • Cost optimization with Reserved Instances
  • Need persistent connections (WebSockets)

Q: How do you reduce Lambda cold starts?

Strategies:
  1. Provisioned Concurrency - Pre-warm containers
  2. Smaller packages - Reduce initialization time
  3. Initialize outside handler - SDK clients, DB connections
  4. Use lighter runtimes - Go, Python, Node.js
  5. SnapStart for Java - Checkpoint/restore
  6. Keep functions warm - Scheduled pings (not ideal)
Code pattern:
# Initialize OUTSIDE handler
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE'])

def handler(event, context):
    # Only process logic here
    return table.get_item(...)

Q: ECS or EKS - how do you choose?

Choose ECS when:
  • AWS-native, simpler setup
  • Smaller team without K8s expertise
  • Tighter AWS integration needed
  • Lower operational overhead
Choose EKS when:
  • Multi-cloud strategy
  • Team has Kubernetes expertise
  • Need K8s ecosystem (Helm, operators)
  • Complex microservices architectures
  • Portability is important
Cost Note: EKS adds $0.10/hour per cluster (~$72/month)

Q: How would you design Auto Scaling for a workload with daily peaks and sudden spikes?

Multi-layered approach:
  1. Predictive Scaling: Scale up before known peaks (Black Friday)
  2. Target Tracking: Maintain 50% CPU average
  3. Step Scaling: Add extra capacity for sudden spikes
Configuration:
  • Minimum: 4 instances (2 per AZ)
  • Desired: 6 instances (normal load)
  • Maximum: 50 instances (peak capacity)
Policies:
  • Target tracking: CPU at 50%
  • Step: +4 if CPU > 80% for 2 min
  • Scheduled: Scale to 20 at 8 AM, 10 at 10 PM
  • Predictive: Enable for daily patterns
Cool-downs:
  • Scale-out: 60 seconds
  • Scale-in: 300 seconds (avoid thrashing)

Q: What is your framework for compute cost optimization?

Framework (in order of impact):
  1. Right-size (15-30% savings)
    • Use Compute Optimizer
    • Monitor actual utilization
  2. Purchase options (30-72% savings)
    • Reserved for baseline (60-70% of capacity)
    • Spot for stateless/batch
    • Savings Plans for flexibility
  3. Instance selection (20-40% savings)
    • Graviton (ARM) for compatible workloads
    • Latest generation (m7 vs m5)
  4. Shutdown automation
    • Stop dev/test outside hours
    • Auto-scaling to zero when possible
  5. Regular review
    • Monthly cost reviews
    • Tag-based cost allocation

🧪 Hands-On Lab: Deploy Scalable Web App

Objective: Deploy a Node.js application with Auto Scaling and Load Balancing
1. Create Launch Template
   Configure EC2 instance with user data for automatic setup

2. Create Auto Scaling Group
   Set min=2, max=6, desired=3 across 2 AZs

3. Create Application Load Balancer
   Configure health checks and target group

4. Configure Scaling Policies
   Add target tracking (CPU 50%) and step scaling

5. Test Scaling
   Generate load with ab or hey and watch scaling in action

Next Module

Storage & Databases

Master S3, EBS, RDS, DynamoDB, and ElastiCache