AWS Well-Architected 6 Pillars

Module Overview

Estimated Time: 3-4 hours | Difficulty: Intermediate | Prerequisites: Previous AWS modules
The Well-Architected Framework is critical for AWS certifications and real-world architecture. It provides best practices and design principles for building cloud systems.

What You’ll Learn:
  • The 6 pillars and their design principles
  • Common architectural patterns
  • Trade-offs between pillars
  • How to perform architecture reviews
  • Practical implementation guidance

Overview

The AWS Well-Architected Framework provides a consistent approach for evaluating architectures and implementing designs that scale.
┌────────────────────────────────────────────────────────────────┐
│               Well-Architected Framework Pillars                │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│   │ Operational  │  │   Security   │  │  Reliability │        │
│   │  Excellence  │  │              │  │              │        │
│   │     🔧       │  │     🔒       │  │     💪       │        │
│   └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                 │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│   │ Performance  │  │    Cost      │  │Sustainability│        │
│   │  Efficiency  │  │ Optimization │  │              │        │
│   │     ⚡       │  │     💰       │  │     🌱       │        │
│   └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

1. Operational Excellence

Run and monitor systems to deliver business value

Design Principles

  • Perform operations as code - Infrastructure as Code (IaC)
  • Make frequent, small, reversible changes
  • Refine operations procedures frequently
  • Anticipate failure - Pre-mortems
  • Learn from operational failures

Key Practices

┌────────────────────────────────────────────────────────────────┐
│               Operational Excellence Practices                  │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Infrastructure as Code                                        │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐             │
│   │CloudForm- │    │ Terraform │    │    CDK    │             │
│   │  ation    │    │           │    │           │             │
│   └───────────┘    └───────────┘    └───────────┘             │
│                                                                 │
│   Observability                                                 │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐             │
│   │CloudWatch │    │  X-Ray    │    │  Logs     │             │
│   │  Metrics  │    │ Tracing   │    │ Insights  │             │
│   └───────────┘    └───────────┘    └───────────┘             │
│                                                                 │
│   Automation                                                    │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐             │
│   │  Lambda   │    │   SSM     │    │EventBridge│             │
│   │ Functions │    │ Runbooks  │    │   Rules   │             │
│   └───────────┘    └───────────┘    └───────────┘             │
│                                                                 │
└────────────────────────────────────────────────────────────────┘
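To make "perform operations as code" concrete, here is a minimal sketch that builds a CloudFormation template as a Python dictionary and serializes it to JSON. The function name `build_template` and the logical ID `LogsBucket` are hypothetical, chosen only for illustration; a real template would usually live in version control and be deployed through a pipeline.

```python
import json


def build_template(enable_versioning: bool = True) -> dict:
    """Return a minimal CloudFormation template describing one S3 bucket."""
    bucket_properties = {}
    if enable_versioning:
        # Versioning keeps prior object versions, which makes changes reversible.
        bucket_properties["VersioningConfiguration"] = {"Status": "Enabled"}

    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative template: a single S3 bucket",
        "Resources": {
            "LogsBucket": {  # hypothetical logical ID
                "Type": "AWS::S3::Bucket",
                "Properties": bucket_properties,
            }
        },
    }


# Serialize the template so it can be reviewed, diffed, and deployed like any other code.
template_json = json.dumps(build_template(), indent=2)
print(template_json)
```

Because the infrastructure is described as data, a change becomes a small, reviewable diff rather than a manual console action.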

2. Security

Protect information, systems, and assets

Design Principles

  • Implement a strong identity foundation - Least privilege
  • Enable traceability - Monitor, alert, audit
  • Apply security at all layers
  • Automate security best practices
  • Protect data in transit and at rest
  • Keep people away from data - Reduce manual access

Security Architecture

┌────────────────────────────────────────────────────────────────┐
│                    Defense in Depth                             │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Edge                                                          │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  CloudFront + WAF + Shield                              │   │
│   └────────────────────────────────────────────────────────┘   │
│                             │                                   │
│   Network                   ▼                                   │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  VPC + Security Groups + NACLs + Flow Logs             │   │
│   └────────────────────────────────────────────────────────┘   │
│                             │                                   │
│   Compute                   ▼                                   │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  IAM Roles + Instance Metadata + Patching              │   │
│   └────────────────────────────────────────────────────────┘   │
│                             │                                   │
│   Application               ▼                                   │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  Secrets Manager + Parameter Store + Code Scanning     │   │
│   └────────────────────────────────────────────────────────┘   │
│                             │                                   │
│   Data                      ▼                                   │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  KMS Encryption + Backup + Versioning + Access Logging │   │
│   └────────────────────────────────────────────────────────┘   │
│                                                                 │
└────────────────────────────────────────────────────────────────┘
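As one illustration of the least-privilege principle, the sketch below scans an IAM policy document for Allow statements that grant a bare `*` action or resource. The helper `find_wildcard_statements` and both sample policies are hypothetical. This is a simplified check, not a replacement for tools such as IAM Access Analyzer, and it does not catch partial wildcards like `s3:*`.

```python
def find_wildcard_statements(policy: dict) -> list:
    """Return Allow statements whose Action or Resource is a bare '*'."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if "*" in actions or "*" in resources:
            flagged.append(stmt)
    return flagged


# Hypothetical policies for illustration only.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
broad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

print(len(find_wildcard_statements(scoped_policy)))
print(len(find_wildcard_statements(broad_policy)))
```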

3. Reliability

Recover from failures and meet demand

Design Principles

  • Automatically recover from failure
  • Test recovery procedures
  • Scale horizontally
  • Stop guessing capacity
  • Manage change through automation

High Availability Pattern

┌────────────────────────────────────────────────────────────────┐
│                Multi-AZ Architecture                            │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│                    ┌───────────────┐                           │
│                    │  Route 53     │                           │
│                    │  (DNS + HC)   │                           │
│                    └───────┬───────┘                           │
│                            │                                    │
│                    ┌───────▼───────┐                           │
│                    │     ALB       │                           │
│                    │  (Multi-AZ)   │                           │
│                    └───────┬───────┘                           │
│              ┌─────────────┼─────────────┐                     │
│              │             │             │                     │
│              ▼             ▼             ▼                     │
│        ┌──────────┐  ┌──────────┐  ┌──────────┐               │
│        │   EC2    │  │   EC2    │  │   EC2    │               │
│        │  (AZ-1)  │  │  (AZ-2)  │  │  (AZ-3)  │               │
│        └────┬─────┘  └────┬─────┘  └────┬─────┘               │
│             │             │             │                      │
│             └─────────────┼─────────────┘                      │
│                           │                                    │
│                    ┌──────▼──────┐                             │
│                    │    RDS      │                             │
│                    │  Multi-AZ   │                             │
│                    │(Primary+    │                             │
│                    │ Standby)    │                             │
│                    └─────────────┘                             │
│                                                                 │
│   RTO: Minutes  |  RPO: Near-zero  |  Availability: 99.99%    │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

Reliability Metrics

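Metric   Definition                    What It Measures
------   ---------------------------   ---------------------------------
RTO      Recovery Time Objective       How fast to recover
RPO      Recovery Point Objective      How much data loss is acceptable
MTTR     Mean Time To Recovery         Average recovery time
MTBF     Mean Time Between Failures    Average uptime between failures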
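MTTR and MTBF can be derived from an outage log, and together they give a steady-state availability estimate of MTBF / (MTBF + MTTR). The following is a minimal sketch; the function name `reliability_metrics` and the sample numbers are assumptions made for illustration.

```python
def reliability_metrics(total_hours: float, outage_hours: list) -> dict:
    """Compute MTTR, MTBF, and availability from a list of outage durations."""
    failures = len(outage_hours)
    if failures == 0:
        raise ValueError("at least one outage is needed to compute MTTR/MTBF")

    downtime = sum(outage_hours)
    uptime = total_hours - downtime
    mttr = downtime / failures  # average time to recover per failure
    mtbf = uptime / failures    # average uptime between failures
    return {
        "mttr_hours": mttr,
        "mtbf_hours": mtbf,
        "availability": mtbf / (mtbf + mttr),
    }


# Hypothetical 30-day month (720 hours) with two outages of 0.5 h and 1.5 h.
metrics = reliability_metrics(720, [0.5, 1.5])
print(metrics)
```

In this example the two outages total 2 hours, so MTTR is 1 hour and MTBF is 359 hours.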

4. Performance Efficiency

Use resources efficiently as demand changes

Design Principles

  • Democratize advanced technologies - Use managed services
  • Go global in minutes
  • Use serverless architectures
  • Experiment more often
  • Consider mechanical sympathy - Match architecture to workload

Performance Patterns

┌────────────────────────────────────────────────────────────────┐
│              Performance Optimization Stack                     │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Caching Layers                                                │
│   ┌──────────────────────────────────────────────────────────┐ │
│   │                                                           │ │
│   │   Browser    →   CDN      →   API Cache   →   DB Cache   │ │
│   │   (headers)     (CloudFront)  (API GW)      (ElastiCache) │ │
│   │                                                           │ │
│   └──────────────────────────────────────────────────────────┘ │
│                                                                 │
│   Compute Selection                                             │
│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐             │
│   │ Right-size  │ │ Graviton    │ │ Spot for    │             │
│   │ instances   │ │ (ARM)       │ │ batch       │             │
│   └─────────────┘ └─────────────┘ └─────────────┘             │
│                                                                 │
│   Database Optimization                                         │
│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐             │
│   │ Read        │ │ Connection  │ │ Query       │             │
│   │ Replicas    │ │ Pooling     │ │ Optimization│             │
│   └─────────────┘ └─────────────┘ └─────────────┘             │
│                                                                 │
└────────────────────────────────────────────────────────────────┘
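The caching layers above all follow the same idea: serve a stored result while it is fresh, and fall back to the slower source otherwise. Below is a minimal in-process sketch of that cache-aside pattern with a time-to-live. The class `TTLCache` and the loader `load_profile` are hypothetical; in practice a shared cache such as ElastiCache would sit in this position.

```python
import time


class TTLCache:
    """Tiny cache-aside helper: return a fresh cached value or reload it."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}     # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive call
        value = loader(key)  # cache miss or expired entry: go to the source
        self._store[key] = (now + self._ttl, value)
        return value


def load_profile(user_id):
    """Hypothetical stand-in for a slow database query."""
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load(42, load_profile)   # miss: calls load_profile
second = cache.get_or_load(42, load_profile)  # hit: served from the cache
print(first is second)
```

The trade-off is staleness: a longer TTL reduces load on the database but delays visibility of updates.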

5. Cost Optimization

Avoid unnecessary costs

Design Principles

  • Implement cloud financial management
  • Adopt a consumption model - Pay only for what you use
  • Measure overall efficiency
  • Stop spending on undifferentiated heavy lifting
  • Analyze and attribute expenditure

Cost Optimization Strategies

┌────────────────────────────────────────────────────────────────┐
│                 Cost Optimization Levers                        │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│   Right-Sizing                                                  │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  • Use Cost Explorer recommendations                    │   │
│   │  • Monitor with CloudWatch                              │   │
│   │  • Downsize underutilized instances                     │   │
│   └────────────────────────────────────────────────────────┘   │
│                                                                 │
│   Pricing Models (RI/Savings Plans: up to 72% off)             │
│   ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│   │   Reserved   │  │    Spot      │  │   Savings    │        │
│   │  Instances   │  │  Instances   │  │    Plans     │        │
│   │  (1-3 year)  │  │ (up to 90%)  │  │  (flexible)  │        │
│   └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                 │
│   Architecture Optimization                                     │
│   ┌────────────────────────────────────────────────────────┐   │
│   │  • Serverless (Lambda, Fargate)                         │   │
│   │  • S3 Lifecycle policies                                │   │
│   │  • Auto-scaling                                         │   │
│   │  • Delete unused resources                              │   │
│   └────────────────────────────────────────────────────────┘   │
│                                                                 │
└────────────────────────────────────────────────────────────────┘
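To compare the pricing levers, the sketch below estimates monthly cost for a fleet under different discount levels. The hourly rate and instance count are made-up numbers, and the 72% and 90% discounts are the "up to" ceilings quoted above, not typical results; actual savings depend on instance type, term, and Region.

```python
HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def monthly_cost(on_demand_hourly: float, instance_count: int, discount: float = 0.0) -> float:
    """Estimate monthly cost for instances running 24/7 at a given discount."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    return on_demand_hourly * (1 - discount) * instance_count * HOURS_PER_MONTH


# Hypothetical fleet: 10 instances at $0.10/hour On-Demand.
scenarios = {
    "On-Demand": 0.0,
    "Reserved / Savings Plans (best case)": 0.72,
    "Spot (best case)": 0.90,
}
for name, discount in scenarios.items():
    print(f"{name}: ${monthly_cost(0.10, 10, discount):.2f}/month")
```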

6. Sustainability

Minimize environmental impact

Design Principles

  • Understand your impact
  • Establish sustainability goals
  • Maximize utilization
  • Adopt efficient hardware/software
  • Use managed services
  • Reduce downstream impact

Sustainability Practices

Area      Practice
-------   ----------------------------------
Compute   Right-size, Graviton (ARM), Spot
Storage   Lifecycle policies, compression
Data      Efficient formats, cold storage
Code      Optimize algorithms, reduce calls
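The storage practices above (lifecycle policies, cold storage) can be expressed as an S3 lifecycle configuration. The sketch below only builds the configuration dictionary; the helper `lifecycle_rule`, the `logs/` prefix, and the day thresholds are assumptions for illustration. A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument.

```python
def lifecycle_rule(prefix: str, ia_after_days: int, glacier_after_days: int,
                   expire_after_days: int) -> dict:
    """Build one S3 lifecycle rule that tiers objects down and then expires them."""
    if not ia_after_days < glacier_after_days < expire_after_days:
        raise ValueError("thresholds must increase: IA < Glacier < expiration")
    return {
        "ID": f"tier-and-expire-{prefix.strip('/') or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: logs move to infrequent access after 30 days,
# to Glacier after 90 days, and are deleted after one year.
lifecycle_configuration = {"Rules": [lifecycle_rule("logs/", 30, 90, 365)]}
print(lifecycle_configuration)
```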

Architecture Review Checklist

well_architected_review = {
    "operational_excellence": [
        "□ Infrastructure as Code?",
        "□ Monitoring and alerting?",
        "□ Runbooks for incidents?",
        "□ CI/CD pipeline?",
    ],
    "security": [
        "□ Least privilege IAM?",
        "□ Encryption at rest/transit?",
        "□ Network segmentation?",
        "□ Audit logging?",
    ],
    "reliability": [
        "□ Multi-AZ deployment?",
        "□ Health checks?",
        "□ Backup strategy?",
        "□ Disaster recovery tested?",
    ],
    "performance": [
        "□ Right-sized resources?",
        "□ Caching strategy?",
        "□ CDN for static content?",
        "□ Database optimized?",
    ],
    "cost": [
        "□ Reserved/Spot usage?",
        "□ Unused resources cleaned?",
        "□ Cost allocation tags?",
        "□ Budget alerts?",
    ],
    "sustainability": [
        "□ Resource utilization >60%?",
        "□ Efficient regions selected?",
        "□ Data lifecycle policies?",
    ]
}
Pro Tip: Use the AWS Well-Architected Tool in the console to perform self-assessments and get improvement recommendations based on your workload.
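A checklist like the one above can be turned into a rough per-pillar score. This is a minimal sketch: the function `pillar_coverage`, the shortened sample checklist, and the set of satisfied items are all hypothetical, and a simple ratio is only a starting point for a real review.

```python
def pillar_coverage(checklist: dict, satisfied: set) -> dict:
    """Return the fraction of checklist items satisfied for each pillar."""
    coverage = {}
    for pillar, items in checklist.items():
        done = sum(1 for item in items if item in satisfied)
        coverage[pillar] = done / len(items) if items else 0.0
    return coverage


# Shortened, hypothetical checklist and answers.
sample_checklist = {
    "security": ["Least privilege IAM?", "Audit logging?"],
    "reliability": ["Multi-AZ deployment?", "Health checks?"],
}
sample_satisfied = {"Least privilege IAM?", "Multi-AZ deployment?", "Health checks?"}

coverage = pillar_coverage(sample_checklist, sample_satisfied)
weakest = min(coverage, key=coverage.get)  # pillar to prioritize first
print(coverage, weakest)
```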

🎯 Interview Questions

Q: What are the six pillars of the AWS Well-Architected Framework?
  1. Operational Excellence: Run and monitor systems (IaC, automation)
  2. Security: Protect information and assets (IAM, encryption)
  3. Reliability: Recover from failures, meet demand (Multi-AZ, backup)
  4. Performance Efficiency: Use resources efficiently (right-sizing, caching)
  5. Cost Optimization: Avoid unnecessary costs (Reserved, Spot, Savings Plans)
  6. Sustainability: Minimize environmental impact (efficiency, managed services)
Each pillar has design principles and best practices.

Q: How would you design a highly available architecture on AWS?
Strategy:
  1. Multi-AZ deployment across at least 2 AZs
  2. Load balancer for traffic distribution
  3. Auto Scaling for capacity
  4. RDS Multi-AZ for database HA
  5. Route 53 health checks for DNS failover
Metrics:
  • 99.9% = about 8.76 hours downtime/year
  • 99.99% = about 52.6 minutes/year
  • 99.999% = about 5.26 minutes/year
Trade-off: Higher availability = higher cost
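The downtime figures follow directly from the availability percentage: allowed downtime is the unavailable fraction multiplied by the hours in a year. A minimal sketch, assuming a 365-day year and a hypothetical helper name `downtime_hours_per_year`:

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why cost tends to rise steeply with the availability target.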

Q: How would you optimize costs for an AWS workload?
Strategies:
  1. Right-size first - don’t over-provision
  2. Reserved/Savings Plans for baseline (60-70%)
  3. Spot instances for fault-tolerant workloads
  4. Serverless for variable workloads
  5. Caching to reduce compute/database load
Trade-offs:
  • Reserved = commitment vs discount
  • Spot = savings vs interruption risk
  • Caching = complexity vs performance

Q: What is defense in depth, and how would you implement it on AWS?
Layers:
  1. Edge: CloudFront, WAF, Shield
  2. Network: VPC, Security Groups, NACLs
  3. Compute: IAM roles, patching, hardening
  4. Application: Input validation, secrets management
  5. Data: Encryption at rest/transit, backup
Principle: If one layer fails, others still protect

Q: What is the difference between RTO and RPO?
RTO (Recovery Time Objective):
  • How long to recover from failure
  • “How much downtime is acceptable?”
  • Example: 4 hours RTO = must be back in 4 hours
RPO (Recovery Point Objective):
  • How much data loss is acceptable
  • “How far back in time to recover?”
  • Example: 1 hour RPO = max 1 hour of data loss
Trade-offs:
  • Lower RTO/RPO = more expensive (Multi-AZ, continuous backup)
  • Higher RTO/RPO = cheaper (single-AZ, daily backup)
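One way to reason about these objectives: with periodic backups, the worst-case data loss equals the backup interval (a failure just before the next backup), and the worst-case downtime is the time needed to restore. The sketch below checks a plan against both targets; the function name `meets_objectives` and the sample durations are assumptions for illustration.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, restore_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> dict:
    """Check a backup/restore plan against RPO and RTO targets."""
    return {
        # Worst case: failure occurs just before the next scheduled backup.
        "rpo_met": backup_interval <= rpo,
        # Recovery must complete within the allowed downtime.
        "rto_met": restore_time <= rto,
    }


# Hypothetical plan: daily backups and a 2-hour restore,
# checked against a 1-hour RPO and a 4-hour RTO.
result = meets_objectives(
    backup_interval=timedelta(hours=24),
    restore_time=timedelta(hours=2),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
)
print(result)
```

Here the RTO is met but the RPO is not, so the plan would need more frequent or continuous backups.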

🧪 Hands-On Lab: Well-Architected Review

Objective: Perform a Well-Architected review on an existing workload
  1. Access the Well-Architected Tool: Go to AWS Console → Well-Architected Tool → Define Workload
  2. Answer Pillar Questions: Go through each pillar’s questions, marking current state
  3. Review High-Risk Issues: Identify HRIs (High Risk Issues) flagged by the tool
  4. Create Improvement Plan: Prioritize and create action items for improvements
  5. Implement Changes: Address the top 3 issues and re-run the review
