AWS Lambda Architecture

Module Overview

Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Core Concepts
AWS Lambda is the foundation of serverless computing on AWS. Think of it as the difference between owning a car and calling an Uber: with EC2, you own the car (the server), pay insurance and maintenance whether you drive or not, and handle every repair yourself. With Lambda, you pay only for the ride (the execution), somebody else maintains the vehicle, and you never think about parking. The trade-off is that you give up control over the engine — you cannot customize the operating system, you are limited to 15-minute trips, and if you need the car all day every day, owning becomes cheaper. This module covers everything from basic function creation to advanced optimization patterns used in production.
Cost crossover point: Lambda is cheaper than a comparable Fargate or EC2 setup up to roughly 1-2 million invocations per day for a typical 200ms function at 512 MB memory. Beyond that, the per-invocation pricing starts losing to always-on compute. Always model both options before committing to Lambda for high-throughput services.
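A minimal sketch of that modeling step. It assumes on-demand x86_64 us-east-1 prices ($0.0000166667 per GB-second and $0.20 per million requests), a 30-day month, no free tier, and a hypothetical always-on alternative billed at a flat $60/month. All function names are illustrative.

```python
# Assumed prices: us-east-1, x86_64, on-demand, free tier ignored.
# Check the current AWS Lambda pricing page before relying on these numbers.
PRICE_PER_GB_SECOND = 0.0000166667    # USD per GB-second of duration
PRICE_PER_REQUEST = 0.20 / 1_000_000  # USD per invocation request
DAYS_PER_MONTH = 30


def lambda_cost_per_invocation(memory_mb: int, duration_ms: float) -> float:
    """Duration charge plus request charge for one invocation."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST


def lambda_monthly_cost(invocations_per_day: float, memory_mb: int, duration_ms: float) -> float:
    """Monthly Lambda bill for a steady daily volume."""
    return invocations_per_day * DAYS_PER_MONTH * lambda_cost_per_invocation(memory_mb, duration_ms)


def crossover_invocations_per_day(always_on_monthly_usd: float, memory_mb: int, duration_ms: float) -> float:
    """Daily volume at which Lambda costs the same as a flat-priced always-on service."""
    monthly_cost_per_daily_invocation = DAYS_PER_MONTH * lambda_cost_per_invocation(memory_mb, duration_ms)
    return always_on_monthly_usd / monthly_cost_per_daily_invocation


if __name__ == "__main__":
    # Hypothetical always-on alternative costing $60/month (e.g., a small Fargate service).
    daily = crossover_invocations_per_day(60.0, memory_mb=512, duration_ms=200)
    print(f"Lambda matches a $60/month service at ~{daily:,.0f} invocations/day")
```

Under these assumptions the crossover lands near 1.07 million invocations per day, inside the range quoted above. A cheaper always-on option or a longer-running function pulls it lower.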
What You’ll Learn:
  • Lambda execution model and lifecycle
  • Event sources and triggers
  • Lambda Layers and custom runtimes
  • Cold starts and performance optimization
  • VPC configuration and networking
  • Concurrency and scaling
  • Best practices and production patterns
  • Cost optimization strategies

Why Lambda?

No Servers

Zero infrastructure management — AWS handles provisioning, patching, and scaling the underlying fleet. Compared to Azure Functions or GCP Cloud Functions, Lambda has the deepest integration with its parent ecosystem.

Auto-Scaling

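Scales from zero to thousands of concurrent executions automatically. Unlike EC2 Auto Scaling, which needs minutes to boot new instances, Lambda adds execution environments as requests arrive: each concurrent request runs in its own environment, and idle environments are reused for later requests. New environments still pay a cold start, and total concurrency is capped by your account quota (1,000 per Region by default).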

Pay Per Use

Billed per 1ms of execution time. A function using 512 MB and running 200ms consumes 0.1 GB-second, which costs roughly $0.0000017 per invocation at the x86 us-east-1 rate of $0.0000166667 per GB-second. A million invocations/month at that profile costs about $1.67 in duration charges plus $0.20 in request charges, cheaper than a cup of coffee.

Event-Driven

Integrates with 200+ AWS services as event sources. S3 uploads, DynamoDB changes, API Gateway requests, SQS messages — Lambda is the universal glue of AWS architectures.

Lambda Execution Model

┌────────────────────────────────────────────────────────────────────────┐
│                    Lambda Execution Lifecycle                           │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                     COLD START                                   │  │
│   │  ┌───────────┐  ┌────────────┐  ┌─────────────┐  ┌───────────┐  │  │
│   │  │ Download  │  │ Start      │  │ Initialize  │  │ Run       │  │  │
│   │  │ Code      │─►│ Container  │─►│ Runtime     │─►│ Handler   │  │  │
│   │  │           │  │            │  │ + Init Code │  │           │  │  │
│   │  └───────────┘  └────────────┘  └─────────────┘  └───────────┘  │  │
│   │       │              │                │               │          │  │
│   │      AWS           AWS              YOUR CODE       YOUR CODE    │  │
│   │    (100-500ms)   (50-100ms)        (varies)        (billed)     │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                     WARM START                                   │  │
│   │  ┌───────────────────────────────────────────┐  ┌───────────┐   │  │
│   │  │      Container Already Running            │  │ Run       │   │  │
│   │  │      (execution environment reused)       │─►│ Handler   │   │  │
│   │  └───────────────────────────────────────────┘  └───────────┘   │  │
│   │                     SKIPPED                        YOUR CODE     │  │
│   │                   (milliseconds)                   (billed)      │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Execution Environment:                                                │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  /tmp (512 MB - 10 GB)  │  Memory (128 MB - 10 GB)              │  │
│   │  Persisted between      │  CPU proportional to memory           │  │
│   │  warm invocations       │  1769 MB = 1 vCPU                     │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Handler Function Structure

The handler is the entry point Lambda calls. Everything outside the handler runs once per container (cold start) and persists across warm invocations — this is the single most important optimization you can make. Think of it like a restaurant kitchen: the init code sets up the stations and heats the ovens (once), while the handler is the cook who plates each individual order.
import json
import boto3
from typing import Any, Dict

# -----------------------------------------------------------------
# INIT CODE -- Runs ONCE per container lifecycle (cold start only).
# This is "free" on warm invocations. Put expensive setup here:
# database connections, SDK clients, loaded ML models, config.
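# Billing note: init-phase billing differs between on-demand and
# provisioned concurrency and has changed over time; check current
# AWS pricing rather than assuming this code is free.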
# -----------------------------------------------------------------
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')
# boto3 reuses HTTP connections under the hood via urllib3 pooling.
# Creating the client outside the handler means warm invocations
# skip the TLS handshake -- saving 50-100ms per call.

def lambda_handler(event: Dict[str, Any], context) -> Dict[str, Any]:
    """
    Lambda handler function -- called on EVERY invocation.

    Args:
        event: Event data from trigger (S3, API Gateway, etc.)
               The shape of this dict changes depending on the source.
        context: Runtime information provided by Lambda:
            - context.function_name        # e.g., "order-service-prod"
            - context.memory_limit_in_mb   # what you configured (128-10240)
            - context.invoked_function_arn  # full ARN including alias/version
            - context.aws_request_id       # unique per invocation -- use for tracing
            - context.get_remaining_time_in_millis()  # time before Lambda kills you

    Returns:
        Response format depends on trigger type. API Gateway expects
        statusCode + body. S3/SQS triggers ignore the return value.
    """

    # Always log the request ID -- this is how you correlate logs
    # in CloudWatch with X-Ray traces and downstream service calls.
    print(f"Request ID: {context.aws_request_id}")
    print(f"Remaining time: {context.get_remaining_time_in_millis()}ms")

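    try:
        # API Gateway sends the body as a JSON string, not a dict, and sends
        # "body": null when there is none (e.g., GET), so `or '{}'` covers
        # both a missing key and a null value.
        # This trips up almost every new Lambda developer.
        body = json.loads(event.get('body') or '{}')
        order_id = body.get('order_id')
        if not order_id:
            # Fail fast with a 400 instead of sending a null key to DynamoDB.
            return {
                'statusCode': 400,
                'headers': {'Content-Type': 'application/json',
                            'Access-Control-Allow-Origin': '*'},
                'body': json.dumps({'error': 'order_id is required'})
            }

        # DynamoDB get_item is a single-digit-millisecond operation.
        # If this were a scan or query, you would want pagination.
        result = table.get_item(Key={'order_id': order_id})

        return {
            'statusCode': 200,
            # With proxy integration, the function itself must return the CORS
            # headers. Forgetting this causes "No 'Access-Control-Allow-Origin'"
            # errors that are painful to debug in production.
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
            # DynamoDB returns numbers as decimal.Decimal, which json.dumps
            # cannot serialize on its own; default=str turns them into strings.
            'body': json.dumps(result.get('Item', {}), default=str)
        }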

    except Exception as e:
        # In production, use structured logging (aws-lambda-powertools)
        # instead of print(). Print works, but you cannot query it
        # efficiently in CloudWatch Logs Insights.
        print(f"Error: {str(e)}")
        return {
            'statusCode': 500,
            # NEVER return raw exception messages to API consumers
            # in production -- they can leak internal details.
            # Return a generic error and log the real one.
            'body': json.dumps({'error': 'Internal server error'})
        }
Production pitfall: The event['body'] from API Gateway is a string, not a dict. If you forget json.loads(), you will get AttributeError: 'str' object has no attribute 'get' and spend 30 minutes wondering why your perfectly valid JSON is not working. Every Lambda developer hits this exactly once.
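The parsing step can be pulled into a small helper. This is a sketch, not a library API: parse_json_body is a hypothetical name. Beyond the string-vs-dict issue, it handles a missing or null body, the isBase64Encoded flag API Gateway sets when it base64-encodes a payload, and malformed JSON, which it answers with a 400 rather than a 500.

```python
import base64
import json
from typing import Any, Dict, Optional


def parse_json_body(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the JSON object in an API Gateway proxy event's body.

    Returns {} when there is no body, and None when the body is not a JSON object.
    """
    raw = event.get("body")
    if not raw:  # missing key, null (e.g., GET requests), or empty string
        return {}
    try:
        if event.get("isBase64Encoded"):  # set when API Gateway base64-encodes the payload
            raw = base64.b64decode(raw).decode("utf-8")
        parsed = json.loads(raw)
    except ValueError:  # covers JSONDecodeError, bad base64, and bad UTF-8
        return None
    return parsed if isinstance(parsed, dict) else None


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    body = parse_json_body(event)
    if body is None:
        # A malformed request is the client's fault: 400, not 500.
        return {"statusCode": 400, "body": json.dumps({"error": "Body must be a JSON object"})}
    return {"statusCode": 200, "body": json.dumps({"order_id": body.get("order_id")})}
```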

Event Sources and Triggers

Understanding event source types is critical because they determine your error handling strategy, retry behavior, and scaling characteristics. A senior engineer would say: “The invocation type dictates everything downstream — your idempotency requirements, your DLQ strategy, and your concurrency model.”
Lambda Event Sources
┌────────────────────────────────────────────────────────────────────────┐
│                    Lambda Event Source Types                            │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   SYNCHRONOUS (Request-Response)                                       │
│   ─────────────────────────────                                        │
│   • API Gateway (REST, HTTP, WebSocket)                                │
│   • Application Load Balancer                                          │
│   • Cognito User Pools                                                 │
│   • Alexa Skills                                                       │
│   • CloudFront (Lambda@Edge)                                           │
│   → Caller waits for response                                          │
│   → Errors returned to caller                                          │
│                                                                         │
│   ASYNCHRONOUS (Fire-and-Forget)                                       │
│   ─────────────────────────────                                        │
│   • S3 Events                                                          │
│   • SNS                                                                │
│   • EventBridge                                                        │
│   • SES                                                                │
│   • CloudWatch Logs                                                    │
│   → Caller gets 202 Accepted immediately                               │
│   → Lambda retries on failure (2 retries by default)                   │
│   → Can configure DLQ for failed events                                │
│                                                                         │
│   POLL-BASED (Event Source Mapping)                                    │
│   ────────────────────────────────                                     │
│   • SQS                                                                │
│   • DynamoDB Streams                                                   │
│   • Kinesis Data Streams                                               │
│   • Amazon MQ                                                          │
│   • Kafka (MSK, self-managed)                                          │
│   → Lambda polls the source                                            │
│   → Batch processing supported                                         │
│   → Lambda manages checkpointing                                       │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
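Why three invocation types matter for interviews: Interviewers often ask “what happens when your Lambda fails?” The correct answer depends entirely on the invocation type. For synchronous (API Gateway), the caller gets the error and decides whether to retry. For asynchronous (S3, SNS), Lambda retries twice, then sends the event to a DLQ or on-failure destination if you configured one; otherwise the event is dropped. For poll-based sources, it depends on the source: a failed SQS message becomes visible again after the visibility timeout and is retried until the queue's redrive policy (maxReceiveCount) moves it to a DLQ or the message expires, while for Kinesis and DynamoDB Streams Lambda retries the batch until the records expire or hit the MaximumRetryAttempts / MaximumRecordAgeInSeconds limits you set, blocking that shard in the meantime. Getting this wrong in a design review is a red flag.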

Common Event Formats

Each trigger sends a different event shape. Memorizing these is less important than understanding the pattern: every event source wraps its data differently, and your handler’s first job is to unwrap the payload correctly.
# API Gateway Event (REST API)
# NOTE: HTTP API (v2) uses a completely different format!
# REST API = "version 1.0" payload, HTTP API = "version 2.0" payload.
# Mixing these up is one of the most common Lambda debugging issues.
api_gateway_event = {
    "httpMethod": "POST",
    "path": "/orders/123",                   # Actual request path (resource: /orders/{id})
    "pathParameters": {"id": "123"},         # From URL path: /orders/{id}
    "queryStringParameters": {"status": "pending"},  # From ?status=pending
    "headers": {"Content-Type": "application/json"},
    "body": "{\"item\": \"widget\", \"qty\": 2}",   # Always a STRING, not dict!
    "requestContext": {
        "requestId": "abc-123",              # API Gateway's request ID (not Lambda's)
        "authorizer": {"claims": {"sub": "user-123"}}  # From Cognito/JWT
    }
}

# S3 Event -- triggered by object creation, deletion, etc.
# GOTCHA: The object key is URL-encoded. "my file.jpg" arrives as "my+file.jpg".
# You MUST decode it: urllib.parse.unquote_plus(key)
s3_event = {
    "Records": [{                            # Can contain multiple records
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-bucket"},
            "object": {
                "key": "uploads/image.jpg",  # URL-encoded! Decode before using.
                "size": 1024                 # Bytes -- useful for validation
            }
        }
    }]
}

# SQS Event -- Lambda polls SQS and delivers batches
# IMPORTANT: If ANY record in the batch fails and you raise an exception,
# ALL records in the batch become visible again and get reprocessed.
# Use "partial batch response" (ReportBatchItemFailures) to avoid this.
sqs_event = {
    "Records": [{
        "messageId": "msg-123",
        "body": "{\"order_id\": \"12345\"}",  # Your application payload
        "attributes": {
            "ApproximateReceiveCount": "1"    # How many times this was delivered
        }
    }]
}

# DynamoDB Stream Event -- captures table changes in near real-time
# Think of this as a CDC (Change Data Capture) mechanism.
# Azure equivalent: Cosmos DB Change Feed. GCP equivalent: Firestore triggers.
dynamodb_stream_event = {
    "Records": [{
        "eventName": "INSERT",               # INSERT, MODIFY, or REMOVE
        "dynamodb": {
            "NewImage": {                     # The item AFTER the change
                "PK": {"S": "ORDER#123"},
                "status": {"S": "CREATED"}
            },
            # OldImage only available if StreamViewType = NEW_AND_OLD_IMAGES
            "StreamViewType": "NEW_AND_OLD_IMAGES"
        }
    }]
}

# EventBridge Event -- the preferred way to build event-driven architectures
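# Unlike SNS, which delivers every message to all subscribers (optionally
# narrowed by subscription filter policies), EventBridge matches each event
# against rule patterns and routes it only to the matching rules' targets.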
# Azure equivalent: Event Grid. GCP equivalent: Eventarc.
eventbridge_event = {
    "source": "custom.myapp",
    "detail-type": "OrderCreated",           # Used for rule matching
    "detail": {
        "order_id": "12345",
        "amount": 99.99
    }
}
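Unwrapping the S3 shape, including the URL-decoding GOTCHA, might look like this sketch. s3_objects is a hypothetical helper, and the sample key is made up to show both `+` and percent-encoding being decoded.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def s3_objects(event: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (bucket, decoded_key, size) for every record in an S3 notification."""
    objects = []
    for record in event.get("Records", []):  # one event may carry several records
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])  # "my+file.jpg" -> "my file.jpg"
        objects.append((s3["bucket"]["name"], key, s3["object"].get("size", 0)))
    return objects


event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "my-bucket"},
            "object": {"key": "uploads/my+file%282%29.jpg", "size": 1024},
        },
    }]
}
for bucket, key, size in s3_objects(event):
    print(bucket, key, size)  # my-bucket uploads/my file(2).jpg 1024
```

A literal `+` in the original key arrives as `%2B`, which unquote_plus turns back into `+`, so decoding is safe in both cases.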
SQS partial batch failure trap: By default, if your Lambda processes 10 SQS messages and 1 fails, all 10 become visible again and get reprocessed. This causes duplicate processing for the 9 that succeeded. Enable FunctionResponseTypes: ["ReportBatchItemFailures"] in your event source mapping and return batchItemFailures with only the failed message IDs. This single configuration change prevents a class of production bugs that are extremely painful to debug.
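A minimal sketch of the partial batch response, assuming the event source mapping already has FunctionResponseTypes set to ReportBatchItemFailures (without it, Lambda ignores the return value). process_message is hypothetical business logic.

```python
import json
from typing import Any, Dict, List


def process_message(payload: Dict[str, Any]) -> None:
    """Hypothetical business logic: raise to signal that this message failed."""
    if "order_id" not in payload:
        raise ValueError("missing order_id")


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[Dict[str, str]]]:
    failures: List[Dict[str, str]] = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception as exc:
            print(f"Message {record['messageId']} failed: {exc}")
            # Report only this message; Lambda deletes the rest of the batch.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells Lambda that every message succeeded.
    return {"batchItemFailures": failures}
```

For FIFO queues, AWS recommends stopping at the first failure and reporting that message and every later one so ordering is preserved; this sketch keeps going, which only suits standard queues.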

Lambda Layers

Layers allow you to share code and dependencies across multiple functions. Think of layers like shared libraries on a Linux system — instead of every application bundling its own copy of libc, they all reference a single shared installation. In Lambda terms, instead of every function including its own copy of boto3 or requests in its deployment package, they all reference a shared layer at /opt/.
┌────────────────────────────────────────────────────────────────────────┐
│                    Lambda Layers Architecture                           │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Lambda Function                                                       │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  Your Code (handler.py)                                          │  │
│   ├─────────────────────────────────────────────────────────────────┤  │
│   │  Layer 1: Shared Libraries (boto3, requests)                     │  │
│   ├─────────────────────────────────────────────────────────────────┤  │
│   │  Layer 2: Custom Utilities (logging, auth)                       │  │
│   ├─────────────────────────────────────────────────────────────────┤  │
│   │  Layer 3: AWS SDK Extensions (X-Ray, Powertools)                │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│   Benefits:                                                             │
│   • Reduce deployment package size                                     │
│   • Share code across functions                                        │
│   • Separate dependencies from business logic                          │
│   • Faster deployments (layers cached)                                 │
│                                                                         │
│   Limits:                                                               │
│   • Maximum 5 layers per function                                      │
│   • Total unzipped size: 250 MB (function + layers)                   │
│   • Layer path: /opt/                                                   │
│     - Python: /opt/python/                                             │
│     - Node.js: /opt/nodejs/node_modules/                               │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Creating a Lambda Layer

# Create layer structure for Python.
# The directory structure MUST match exactly or Lambda will not find your packages.
# Python layers expect: python/lib/python3.x/site-packages/
# A simpler alternative that also works: python/ (Lambda adds /opt/python to sys.path)
mkdir -p python/lib/python3.11/site-packages

# Install dependencies into the layer directory.
# IMPORTANT: If you develop on macOS/Windows, some packages (e.g., numpy, pandas)
# include platform-specific binaries. You MUST build on Amazon Linux 2 or use
# --platform manylinux2014_x86_64 to get Linux-compatible binaries.
pip install requests boto3 aws-lambda-powertools \
    -t python/lib/python3.11/site-packages \
    --platform manylinux2014_x86_64 --only-binary=:all:

# Package the layer -- watch the size! Unzipped total (function + all layers)
# cannot exceed 250 MB. Use 'du -sh python/' to check before zipping.
zip -r layer.zip python/

# Deploy layer -- each publish creates a new immutable version.
# Old versions are NOT deleted automatically (they accumulate and can hit
# the 75 GB per-Region storage quota if you deploy frequently via CI/CD).
# The wheels above were built for x86_64, so declare only that architecture;
# build a separate layer with --platform manylinux2014_aarch64 for arm64.
aws lambda publish-layer-version \
    --layer-name my-dependencies \
    --zip-file fileb://layer.zip \
    --compatible-runtimes python3.11 \
    --compatible-architectures x86_64 \
    --description "Common Python dependencies v1.2"
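Before zipping, the 250 MB unzipped budget can also be checked from Python. This is a sketch with hypothetical helper names. It sums apparent file sizes, which can differ a little from the disk usage du reports, and treats 250 MB as 262,144,000 bytes.

```python
import os

# 250 MB unzipped quota for function code plus all layers (262,144,000 bytes).
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(path: str) -> int:
    """Sum the apparent sizes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):  # don't count symlink targets twice
                total += os.path.getsize(full_path)
    return total


def remaining_layer_budget(layer_dir: str, other_bytes: int = 0) -> int:
    """Bytes left under the limit; other_bytes covers function code and other layers."""
    remaining = UNZIPPED_LIMIT_BYTES - directory_size_bytes(layer_dir) - other_bytes
    if remaining < 0:
        raise ValueError(f"{-remaining / (1024 * 1024):.1f} MB over the unzipped limit")
    return remaining


if __name__ == "__main__":
    print(f"{remaining_layer_budget('python') / (1024 * 1024):.1f} MB left")
```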
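Cost tip: If your layer includes large ML libraries (numpy, pandas, scikit-learn), consider the arm64 architecture. Graviton2 (arm64) Lambda duration is priced about 20% lower per GB-second than x86_64, and AWS cites up to 34% better price-performance; actual speed depends on your workload, so benchmark before switching. Native wheels must then be built for arm64 (e.g., pip's --platform manylinux2014_aarch64). For many functions, that is free money on the table.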

AWS Lambda Powertools

Lambda Powertools is the single most impactful library you can add to any production Lambda function. It provides structured logging, distributed tracing, and custom metrics with minimal boilerplate. If you are not using it, you are writing all of this plumbing yourself (and probably getting it wrong). The Azure equivalent is Azure Functions Extensions; the GCP equivalent is the Functions Framework with OpenTelemetry.
# Using AWS Lambda Powertools for production-ready Lambda.
# Install via layer or pip: pip install aws-lambda-powertools
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.event_handler import APIGatewayRestResolver
from aws_lambda_powertools.utilities.typing import LambdaContext

# These are initialized ONCE per container (cold start).
# The service name appears in every log line and X-Ray trace --
# this is how you find your function's logs in a sea of microservices.
logger = Logger(service="order-service")
tracer = Tracer(service="order-service")
metrics = Metrics(service="order-service", namespace="MyApp")
app = APIGatewayRestResolver()


def process_order(order: dict) -> str:
    """Hypothetical placeholder for your business logic; returns an order ID."""
    return str(order.get("order_id", "unknown"))


@app.post("/orders")
@tracer.capture_method  # Automatically creates an X-Ray subsegment
def create_order():
    """Create a new order."""
    body = app.current_event.json_body or {}  # json_body is None for an empty body

    # Structured logging: this outputs JSON, not plain text.
    # You can query it in CloudWatch Logs Insights:
    # fields @timestamp, order_data.order_id | filter level = "INFO"
    logger.info("Creating order", extra={"order_data": body})

    # Custom metrics go to CloudWatch Metrics. You can set alarms on these.
    # Cost: $0.30 per custom metric per month. Do not create high-cardinality
    # metric names (e.g., one per user_id) or your CloudWatch bill explodes.
    metrics.add_metric(name="OrdersCreated", unit="Count", value=1)

    # Business logic (process_order above is a stand-in).
    order_id = process_order(body)

    return {"order_id": order_id}

# Stacking order used in the Powertools docs: inject_lambda_context outermost.
@logger.inject_lambda_context    # Adds function_request_id, function_name, cold_start to every log
@tracer.capture_lambda_handler   # Creates root X-Ray segment
@metrics.log_metrics             # Flushes metrics at the end of invocation
def lambda_handler(event: dict, context: LambdaContext) -> dict:
    return app.resolve(event, context)
Production pitfall — metric cardinality: Each unique metric name + dimension combination counts as a separate CloudWatch metric at $0.30/month. If you accidentally create a metric per user_id or per request_id, you can generate thousands of metrics and a surprise CloudWatch bill in the hundreds of dollars. Always use bounded dimensions like environment, function_name, or status_code.
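The cardinality math is multiplicative, which is why it surprises people. This sketch computes the worst case (every combination of dimension values actually appears) at an assumed $0.30 per metric per month. CloudWatch's higher-volume tiers are cheaper, so read the cost as an upper bound. Function names are illustrative.

```python
from math import prod
from typing import Sequence

# Assumption: $0.30 per metric per month (first pricing tier); higher tiers are cheaper.
PRICE_PER_METRIC_MONTH = 0.30


def worst_case_metric_count(metric_names: int, dimension_cardinalities: Sequence[int]) -> int:
    """Upper bound on distinct metrics: every name x every dimension-value combination."""
    return metric_names * prod(dimension_cardinalities)


def worst_case_monthly_cost(metric_names: int, dimension_cardinalities: Sequence[int]) -> float:
    return worst_case_metric_count(metric_names, dimension_cardinalities) * PRICE_PER_METRIC_MONTH


# Bounded: 3 metric names x environment (3 values) x status_code (5 values)
print(worst_case_metric_count(3, [3, 5]))   # 45
# Unbounded: 1 metric name with a user_id dimension over 1,000 active users
print(worst_case_metric_count(1, [1000]))   # 1000
```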

Cold Starts and Optimization

Cold starts are the single most discussed topic in serverless computing, and also the most misunderstood. A cold start happens when Lambda must create a new execution environment to handle your request — downloading your code, starting the runtime, and running your initialization code. The restaurant analogy: a cold start is like opening the kitchen from scratch (turning on ovens, prepping stations), while a warm start is an already-running kitchen that just needs to plate the next order. The critical nuance: cold starts affect only a small percentage of invocations in most workloads, but they affect 100% of invocations if your function is rarely called. A function handling 100 requests/second rarely cold-starts. A function handling 1 request/hour cold-starts almost every time.
Lambda Cold Start Optimization

Understanding Cold Starts

┌────────────────────────────────────────────────────────────────────────┐
│                    Cold Start Factors                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Factor              │ Impact          │ How to Optimize              │
│   ────────────────────┼─────────────────┼───────────────────────────── │
│   Package Size        │ High            │ Minimize dependencies        │
│   Runtime             │ Medium          │ Python/Node faster than Java │
│   Memory              │ High            │ More memory = faster init    │
│   VPC                 │ Very High       │ Use VPC only if needed       │
│   Provisioned Conc.   │ Eliminates      │ Pre-warm for critical paths  │
│                                                                         │
│   Typical Cold Start Times:                                             │
│   ─────────────────────────                                            │
│   Python (no VPC):     100-300ms                                       │
│   Node.js (no VPC):    100-300ms                                       │
│   Java (no VPC):       500-3000ms                                      │
│   .NET (no VPC):       200-500ms                                       │
│                                                                         │
│   With VPC (before improvements):  +10-30 seconds                      │
│   With VPC (after ENI improvements): +200-500ms                        │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
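To see how often your own function cold-starts, a common trick is a module-level flag, since module scope runs once per execution environment. A minimal sketch with hypothetical names; if you already use Powertools, its Logger adds a cold_start field when you decorate the handler with inject_lambda_context.

```python
import time

# Module scope runs once per execution environment, during the init phase.
_INIT_STARTED = time.monotonic()
_cold_start = True


def lambda_handler(event, context):
    global _cold_start
    is_cold = _cold_start
    _cold_start = False  # later invocations in this environment are warm
    if is_cold:
        # With provisioned concurrency, init may have run long before this request,
        # so this gap measures environment age, not user-facing latency.
        print(f"Cold start: {time.monotonic() - _INIT_STARTED:.3f}s since init began")
    return {"statusCode": 200, "cold_start": is_cold}
```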

Optimization Strategies

import json
import boto3

# -----------------------------------------------------------------
# STRATEGY 1: Initialize outside handler.
# This code runs ONCE on cold start and persists across warm invocations.
# A DynamoDB client creation takes ~50-100ms. If you put it inside the
# handler, you pay that cost on EVERY invocation instead of once.
# -----------------------------------------------------------------
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Orders')

# -----------------------------------------------------------------
# STRATEGY 2: Lazy loading for conditional dependencies.
# If only 10% of invocations need a heavy library, why load it for
# the other 90%? Lazy loading defers the import until first use.
# This is especially valuable for ML libraries like pandas (~200ms import).
# -----------------------------------------------------------------
_heavy_library = None

def get_heavy_library():
    # `heavy_library` is a placeholder for any slow-to-import module.
    # Python also caches imports in sys.modules; the global just makes
    # the "load once per container" intent explicit.
    global _heavy_library
    if _heavy_library is None:
        import heavy_library  # Only imported on first call
        _heavy_library = heavy_library
    return _heavy_library

def lambda_handler(event, context):
    # !! BAD: Creating clients inside handler.
    # dynamodb = boto3.resource('dynamodb')
    # This forces a new HTTP connection + TLS handshake on EVERY invocation.
    # On warm starts, you are throwing away the existing connection pool.

    # GOOD: boto3 clients created outside the handler automatically
    # reuse TCP connections via urllib3 connection pooling.

    # STRATEGY 2 in practice: pay the import cost only on the code path
    # that needs it. (`process` is a placeholder method.)
    result = None
    if event.get('needs_heavy_processing'):
        lib = get_heavy_library()
        result = lib.process(event)

    return {'statusCode': 200, 'body': json.dumps({'processed': result is not None})}
The memory-CPU trick that most people miss: Lambda allocates CPU proportionally to memory. At 128 MB, you get a sliver of one vCPU. At 1769 MB, you get exactly 1 full vCPU. At 10 GB, you get 6 vCPUs. This means a function that is CPU-bound (not I/O-bound) may actually cost LESS at higher memory because it finishes faster. A function that takes 3 seconds at 128 MB ($0.0000063 in duration charges) might take 0.4 seconds at 1024 MB ($0.0000067), nearly the same cost but about 7x faster. Use AWS Lambda Power Tuning (an open-source tool) to find the sweet spot.
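The arithmetic behind that comparison, as a sketch: duration cost is memory in GB times seconds times the per-GB-second rate. The rate is an assumed x86_64 us-east-1 on-demand price, the durations are the illustrative ones from the paragraph rather than measurements, and the request charge is left out because it is identical at every memory size.

```python
from typing import List, Tuple

PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86_64 us-east-1 on-demand rate


def duration_cost(memory_mb: int, duration_s: float) -> float:
    """Duration charge for one invocation (request charge excluded)."""
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND


def cheapest(measurements: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Pick the (memory_mb, duration_s) pair with the lowest duration cost."""
    return min(measurements, key=lambda m: duration_cost(*m))


# Durations are the hypothetical figures from the paragraph above, not benchmarks.
runs = [(128, 3.0), (1024, 0.4)]
for memory_mb, duration_s in runs:
    print(f"{memory_mb:>5} MB  {duration_s:.1f}s  ${duration_cost(memory_mb, duration_s):.7f}")
```

Here 128 MB is marginally cheaper, but 1024 MB returns 7.5x sooner for about 7% more per invocation. Power Tuning automates the part this sketch skips: measuring real durations at each memory size.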

Provisioned Concurrency

Provisioned Concurrency (PC) is Lambda’s answer to “I cannot tolerate cold starts.” It pre-initializes execution environments so they are always warm and ready. Think of it as keeping a fleet of taxis idling at the curb — you pay for the idling, but the moment a passenger arrives, the ride starts instantly.
┌────────────────────────────────────────────────────────────────────────┐
│                    Provisioned Concurrency                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                                                                  │  │
│   │   Pre-warmed Execution Environments                             │  │
│   │   ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐                 │  │
│   │   │ Warm │ │ Warm │ │ Warm │ │ Warm │ │ Warm │  = 5 PC         │  │
│   │   │  #1  │ │  #2  │ │  #3  │ │  #4  │ │  #5  │                 │  │
│   │   └──────┘ └──────┘ └──────┘ └──────┘ └──────┘                 │  │
│   │       │        │        │        │        │                     │  │
│   │       └────────┴────────┴────────┴────────┘                     │  │
│   │                        │                                         │  │
│   │              Always ready to handle requests                     │  │
│   │              (no cold starts)                                    │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
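│   Pricing (us-east-1, x86_64):                                         │
│   • $0.0000041667 per GB-second provisioned (charged even when idle)   │
│   • + $0.0000097222 per GB-second of duration while serving requests   │
│     (lower than the on-demand rate of $0.0000166667 per GB-second)     │
│   • Example: 1 GB, 100 PC, 24/7 = ~$1,100/month JUST for keeping warm  │
│     (100 GB x 2,628,000 s/month x $0.0000041667 per GB-second)         │
│   • An on-demand t3.medium on EC2 runs 24/7 for roughly $30/month      │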
│   • Decision: if PC cost > equivalent Fargate task, use Fargate        │
│                                                                         │
│   Use Cases:                                                            │
│   • Latency-critical APIs (payment processing, auth endpoints)         │
│   • Predictable traffic (schedule PC up/down with Auto Scaling)        │
│   • Functions with heavy initialization (Java, .NET, ML models)        │
│   • NOT for functions called a few times per day (wasteful)            │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
Cost trap with Provisioned Concurrency: PC charges you for every second the environments exist, whether they handle requests or not. If you configure 50 PC at 1 GB and your function only gets traffic 8 hours/day, you are paying for 16 hours of idle warm containers daily. Use Application Auto Scaling with a scheduled action to ramp PC up during business hours and down to zero at night. This alone can cut PC costs by 60-70%.
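A sketch of that scheduled scaling with Application Auto Scaling. It assumes a hypothetical order-service function with a live alias (provisioned concurrency is configured on an alias or version), UTC business hours of 08:00-18:00 on weekdays, and illustrative capacities. Building the requests is separated from the API calls so they can be inspected without AWS credentials.

```python
from typing import Any, Dict, List

# Hypothetical function and alias; provisioned concurrency attaches to an alias or version.
FUNCTION_NAME = "order-service"
ALIAS = "live"
RESOURCE_ID = f"function:{FUNCTION_NAME}:{ALIAS}"
DIMENSION = "lambda:function:ProvisionedConcurrency"


def scheduled_actions(business_pc: int, off_hours_pc: int) -> List[Dict[str, Any]]:
    """kwargs for put_scheduled_action: scale up 08:00 UTC, down 18:00 UTC, Mon-Fri."""
    windows = [
        ("pc-business-hours", "cron(0 8 ? * MON-FRI *)", business_pc),
        ("pc-off-hours", "cron(0 18 ? * MON-FRI *)", off_hours_pc),
    ]
    return [
        {
            "ServiceNamespace": "lambda",
            "ScheduledActionName": name,
            "ResourceId": RESOURCE_ID,
            "ScalableDimension": DIMENSION,
            "Schedule": schedule,
            "ScalableTargetAction": {"MinCapacity": capacity, "MaxCapacity": capacity},
        }
        for name, schedule, capacity in windows
    ]


def apply_schedule(client: Any, business_pc: int, off_hours_pc: int) -> None:
    """client is a boto3 'application-autoscaling' client."""
    client.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId=RESOURCE_ID,
        ScalableDimension=DIMENSION,
        MinCapacity=off_hours_pc,
        MaxCapacity=business_pc,
    )
    for kwargs in scheduled_actions(business_pc, off_hours_pc):
        client.put_scheduled_action(**kwargs)
```

Call it with apply_schedule(boto3.client("application-autoscaling"), business_pc=50, off_hours_pc=5). The Friday 18:00 action stays in effect through the weekend until Monday 08:00.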

VPC Configuration

Putting Lambda in a VPC is one of the most consequential decisions you will make, and one of the most commonly over-applied. The default rule should be: do NOT put Lambda in a VPC unless you must access private resources like RDS, ElastiCache, or internal EC2 instances. Lambda functions outside a VPC can already reach public AWS service endpoints (DynamoDB, S3, SQS) without any VPC overhead. Inside a VPC, the function loses that direct path, so it needs NAT Gateways (about $32/month per gateway, typically one per AZ, plus per-GB data processing charges) or VPC endpoints to reach those services and the internet. The VPC also historically added significant cold start latency, though AWS improved this dramatically in 2019 with Hyperplane ENI sharing. Azure comparison: the Azure Functions Premium plan supports VNet integration. GCP Cloud Functions use Serverless VPC Access connectors. All three clouds face the same fundamental tension: serverless wants to be stateless and isolated, but databases live in private networks.
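If you do need VPC access, the function's VpcConfig names the private subnets and security groups it attaches to. This sketch builds the keyword arguments for boto3's update_function_configuration. The helper name and the subnet and security-group IDs in the usage line are hypothetical, and the two-subnet check is a high-availability convention, not an API requirement.

```python
from typing import Any, Dict, List


def vpc_config_kwargs(function_name: str, subnet_ids: List[str],
                      security_group_ids: List[str]) -> Dict[str, Any]:
    """kwargs for lambda_client.update_function_configuration(**kwargs)."""
    if len(set(subnet_ids)) < 2:
        # Only counts subnets; it cannot verify they sit in different AZs.
        raise ValueError("use private subnets in at least two Availability Zones")
    if not security_group_ids:
        raise ValueError("at least one security group is required")
    return {
        "FunctionName": function_name,
        "VpcConfig": {"SubnetIds": subnet_ids, "SecurityGroupIds": security_group_ids},
    }
```

Apply it with boto3.client("lambda").update_function_configuration(**vpc_config_kwargs("order-service", ["subnet-aaa", "subnet-bbb"], ["sg-123"])). The execution role also needs permission to manage network interfaces, for example via the AWSLambdaVPCAccessExecutionRole managed policy.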
┌────────────────────────────────────────────────────────────────────────┐
│                    Lambda in VPC                                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   When to Use VPC:                                                      │
│   ─────────────────                                                    │
│   ✓ Access RDS in private subnet                                       │
│   ✓ Access ElastiCache                                                 │
│   ✓ Access EC2 instances                                               │
│   ✓ Compliance requirements                                            │
│   ✗ Don't use VPC if not needed (adds latency)                         │
│                                                                         │
│   Architecture:                                                         │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  VPC                                                             │  │
│   │  ┌───────────────────────┐  ┌───────────────────────┐           │  │
│   │  │  Private Subnet (AZ1) │  │  Private Subnet (AZ2) │           │  │
│   │  │  ┌─────────────────┐  │  │  ┌─────────────────┐  │           │  │
│   │  │  │  Lambda ENI     │  │  │  │  Lambda ENI     │  │           │  │
│   │  │  └────────┬────────┘  │  │  └────────┬────────┘  │           │  │
│   │  │           │           │  │           │           │           │  │
│   │  │  ┌────────▼────────┐  │  │  ┌────────▼────────┐  │           │  │
│   │  │  │     RDS         │  │  │  │  ElastiCache    │  │           │  │
│   │  │  └─────────────────┘  │  │  └─────────────────┘  │           │  │
│   │  └───────────────────────┘  └───────────────────────┘           │  │
│   │                                                                  │  │
│   │  To access internet (S3, DynamoDB, external APIs):               │  │
│   │  ┌───────────────────┐                                          │  │
│   │  │  NAT Gateway      │ ──► Internet Gateway ──► Internet        │  │
│   │  └───────────────────┘                                          │  │
│   │  OR                                                              │  │
│   │  ┌───────────────────┐                                          │  │
│   │  │  VPC Endpoints    │ ──► S3, DynamoDB (private, no NAT)       │  │
│   │  └───────────────────┘                                          │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

VPC Configuration Best Practices

# Lambda VPC configuration (Terraform)
# IMPORTANT: Lambda in VPC requires the function's IAM role to have
# ec2:CreateNetworkInterface, ec2:DescribeNetworkInterfaces, and
# ec2:DeleteNetworkInterface permissions (the AWS-managed policy
# AWSLambdaVPCAccessExecutionRole grants them). Without these, creating or
# updating the function fails with an error saying the execution role lacks
# permission to call CreateNetworkInterface on EC2.

resource "aws_lambda_function" "vpc_lambda" {
  function_name = "my-vpc-function"
  runtime       = "python3.11"
  handler       = "handler.lambda_handler"
  filename      = "build/function.zip" # hypothetical deployment package path
  role          = aws_iam_role.lambda_vpc_role.arn

  vpc_config {
    # Use at least 2 subnets in different AZs for high availability.
    # If AZ-a goes down and your Lambda is only in AZ-a, it is unreachable.
    # Lambda creates Hyperplane ENIs in these subnets when the function is
    # created or its VPC config changes, and functions with the same subnet +
    # security group combination share them. Invocations therefore no longer
    # pay the 10s+ per-environment ENI setup that VPC cold starts had pre-2019.
    subnet_ids         = [aws_subnet.private_a.id, aws_subnet.private_b.id]
    security_group_ids = [aws_security_group.lambda_sg.id]
  }
}

# Security group for Lambda.
# Lambda functions act as network clients, not servers -- they make
# outbound connections to databases, APIs, etc. You almost never need
# inbound rules. The egress rule below allows all outbound traffic.
resource "aws_security_group" "lambda_sg" {
  name        = "lambda-sg"
  vpc_id      = aws_vpc.main.id

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # No ingress rules needed -- Lambda initiates connections, not receives them.
}

# VPC Endpoint for DynamoDB -- this is a Gateway endpoint (FREE).
# Without this, a VPC-attached Lambda must route DynamoDB traffic through
# your NAT Gateway, costing $0.045/GB in data processing fees.
# At 100 GB/month of DynamoDB traffic, that is $4.50/month wasted.
# S3 and DynamoDB Gateway endpoints cost $0. Always create them.
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

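# For other AWS services (SQS, SNS, Secrets Manager), use Interface
# endpoints. These cost $0.01/hour/AZ (~$7.20/month per endpoint per AZ)
# plus $0.01/GB processed. Cheaper than NAT Gateway if you only need a
# few services.

# Unlike lambda_sg, the endpoint's security group DOES need an ingress rule:
# the endpoint ENIs receive HTTPS (443) connections from the Lambda ENIs.
resource "aws_security_group" "vpc_endpoint_sg" {
  name   = "vpc-endpoint-sg"
  vpc_id = aws_vpc.main.id

  ingress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [aws_security_group.lambda_sg.id]
  }
}

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private_a.id, aws_subnet.private_b.id]
  security_group_ids  = [aws_security_group.vpc_endpoint_sg.id]
  private_dns_enabled = true  # Allows SDK calls to resolve to the endpoint
}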
Cost tip (NAT Gateway vs VPC Endpoints): A NAT Gateway costs about $32/month per AZ plus $0.045/GB of data processed. If your Lambda in VPC only needs to reach AWS services (not the public internet), replace the NAT Gateway with VPC Endpoints for each service. For a typical setup accessing DynamoDB, S3, SQS, and Secrets Manager, you save $25+/month and eliminate the NAT Gateway as a single point of failure.
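The trade-off in that tip is simple arithmetic, sketched below. The prices are assumptions: us-east-1 list prices at the time of writing (NAT Gateway $0.045/hour and $0.045/GB, Interface endpoints $0.01/hour per AZ and $0.01/GB, Gateway endpoints free). The 730-hour month is a common convention. Check current pricing for your region before relying on the numbers.

```python
HOURS_PER_MONTH = 730  # common convention for monthly estimates

# Assumed us-east-1 list prices (USD); verify against current AWS pricing.
NAT_HOURLY = 0.045
NAT_PER_GB = 0.045
INTERFACE_HOURLY_PER_AZ = 0.01
INTERFACE_PER_GB = 0.01


def nat_gateway_monthly(azs: int, gb_processed: float) -> float:
    """One NAT Gateway per AZ, plus a per-GB processing fee on all traffic."""
    return azs * NAT_HOURLY * HOURS_PER_MONTH + gb_processed * NAT_PER_GB


def endpoints_monthly(interface_services: int, azs: int, interface_gb: float) -> float:
    """Interface endpoints are billed per service per AZ. S3/DynamoDB Gateway
    endpoints are free, so their traffic is simply left out."""
    hourly = interface_services * azs * INTERFACE_HOURLY_PER_AZ * HOURS_PER_MONTH
    return hourly + interface_gb * INTERFACE_PER_GB


if __name__ == "__main__":
    # Two AZs; SQS + Secrets Manager need Interface endpoints; DynamoDB and S3
    # use free Gateway endpoints. 100 GB total, 10 GB of it via Interface.
    print(round(nat_gateway_monthly(azs=2, gb_processed=100), 2))  # 70.2
    print(round(endpoints_monthly(interface_services=2, azs=2, interface_gb=10), 2))  # 29.3
```

Once you need more than a handful of Interface endpoints across several AZs, the hourly charges add up. At that point the NAT Gateway can win again, so rerun the comparison with your own service count.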

Concurrency and Scaling

Lambda scaling is fundamentally different from EC2 or container scaling. With EC2, you scale by adding more machines (horizontal) or bigger machines (vertical). With Lambda, each concurrent request gets its own isolated execution environment. There is no load balancer — AWS handles request routing internally. The scaling model is beautifully simple in theory, but the account-level concurrency limit creates real production risks that catch teams off guard.
┌────────────────────────────────────────────────────────────────────────┐
│                    Lambda Concurrency                                   │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Concurrent Executions = Invocations/sec × Duration (sec)             │
│                                                                         │
│   Example: 100 requests/sec × 0.5 sec duration = 50 concurrent         │
│                                                                         │
│   Account Limits:                                                       │
│   ───────────────                                                       │
│   • Default: 1,000 concurrent executions (per region)                  │
│   • Can request increase to 10,000+                                    │
│   • Scaling: each function adds up to 1,000 concurrent every 10 sec    │
│     (before Nov 2023: burst of 500-3,000, then +500/min)               │
│                                                                         │
│   Concurrency Types:                                                    │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                                                                  │  │
│   │  Unreserved (default)     Reserved          Provisioned         │  │
│   │  ─────────────────────    ─────────         ────────────        │  │
│   │  Shared pool for all      Guaranteed        Pre-initialized     │  │
│   │  functions in account     minimum for       (no cold starts)    │  │
│   │                           this function                          │  │
│   │                                                                  │  │
│   │  Account Limit: 1000                                            │  │
│   │  ┌────────────────────────────────────────────────────────────┐ │  │
│   │  │████████████████│████████│██████████████│                   │ │  │
│   │  │   Unreserved   │ Func A │    Func B    │  Not allocated    │ │  │
│   │  │     (600)      │ (100)  │    (200)     │      (100)        │ │  │
│   │  └────────────────────────────────────────────────────────────┘ │  │
│   │                                                                  │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
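The formula in the box is Little's law (in-flight work = arrival rate × time in system). Below is a minimal Python sketch of it, plus a headroom check against whatever pool the function draws from. The function names are illustrative, not an AWS API.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_s: float) -> int:
    """Concurrent executions = invocations/sec x duration (sec), rounded up.

    round(..., 9) absorbs float noise in the product, so a value such as
    3.0000000000000004 is not pushed up to a 4th execution environment.
    """
    if requests_per_second < 0 or avg_duration_s < 0:
        raise ValueError("rate and duration must be non-negative")
    return math.ceil(round(requests_per_second * avg_duration_s, 9))


def headroom(requests_per_second: float, avg_duration_s: float, pool: int) -> int:
    """Spare environments in the pool; a negative result means throttling."""
    return pool - required_concurrency(requests_per_second, avg_duration_s)


print(required_concurrency(100, 0.5))  # 50, the example in the box
print(headroom(5_000, 0.2, 1_000))     # 0: 200 ms at 5,000 req/s fills a default account
```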

Reserved Concurrency

Reserved concurrency is both a guarantee and a limit. It guarantees that this function will always have N execution environments available (even if other functions in the account are maxed out), but it also caps this function at N concurrent executions (even if the account has spare capacity). This dual nature is counterintuitive and a frequent source of production incidents.
# Reserved concurrency: guarantee a minimum AND enforce a maximum
# for a specific function. This is a double-edged sword.
import boto3

lambda_client = boto3.client('lambda')

# Reserve 100 concurrent executions for the payment processor.
# This means: even if your account's other functions consume 900 of
# the 1000 account limit, this function still gets its 100.
# BUT: this function can NEVER exceed 100, even if the account is idle.
lambda_client.put_function_concurrency(
    FunctionName='critical-payment-processor',
    ReservedConcurrentExecutions=100
)

# Setting to 0 = emergency kill switch. Throttles ALL invocations.
# Useful during incidents: "this function is causing cascading failures,
# shut it off NOW" without needing to delete or redeploy.
lambda_client.put_function_concurrency(
    FunctionName='temporarily-disabled-function',
    ReservedConcurrentExecutions=0
)
The concurrency math that bites teams: Your account has a 1000 limit. You set 200 reserved for Function A and 300 reserved for Function B. That leaves only 500 unreserved for ALL other functions in the account and region. If you have 50 other functions sharing that pool during a traffic spike, they will get throttled. Lambda itself rejects any reservation that would leave fewer than 100 unreserved, but that floor is not a safety margin. Keep a real buffer for your unreserved functions, and request a limit increase from AWS before you need it (approval takes 1-3 business days).
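The arithmetic in that warning, including the 100-execution floor Lambda enforces on unreserved concurrency, fits in a few lines. Here is a minimal sketch; the function name and the dictionary of reservations are illustrative.

```python
MIN_UNRESERVED = 100  # Lambda rejects reservations that leave less than this


def unreserved_pool(account_limit: int, reservations: dict) -> int:
    """Concurrency left for every function that has NO reservation."""
    if any(r < 0 for r in reservations.values()):
        raise ValueError("reservations must be non-negative")
    remaining = account_limit - sum(reservations.values())
    if remaining < MIN_UNRESERVED:
        raise ValueError(
            f"{remaining} unreserved would remain; Lambda requires at least "
            f"{MIN_UNRESERVED}"
        )
    return remaining


print(unreserved_pool(1_000, {"function-a": 200, "function-b": 300}))  # 500
```

A reservation of 0 (the kill switch) consumes nothing from the pool. It only blocks that one function.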

Error Handling and Retries

Error handling in Lambda is not just about try/catch — it is about understanding that different invocation types have fundamentally different retry behaviors. If you design your error handling for synchronous invocations and then switch to asynchronous, your function will silently retry and potentially process events multiple times. A senior engineer would say: “Every Lambda function must be idempotent because you cannot guarantee exactly-once delivery for any invocation type.”
┌────────────────────────────────────────────────────────────────────────┐
│                    Lambda Error Handling                                │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   SYNCHRONOUS INVOCATIONS (API Gateway, ALB):                          │
│   ─────────────────────────────────────────                            │
│   • No automatic retries                                               │
│   • Errors returned to caller                                          │
│   • Implement retry logic in client                                    │
│                                                                         │
│   ASYNCHRONOUS INVOCATIONS (S3, SNS, EventBridge):                     │
│   ──────────────────────────────────────────────                       │
│   • 2 automatic retries (total 3 attempts)                             │
│   • Exponential backoff between retries                                │
│   • Configure DLQ or On-Failure destination                            │
│                                                                         │
│   EVENT SOURCE MAPPINGS (SQS, DynamoDB, Kinesis):                      │
│   ──────────────────────────────────────────────                       │
│   • Retry until record expires or succeeds                             │
│   • Can configure max age, retry attempts                              │
│   • Bisect batch on failure (find problematic record)                  │
│                                                                         │
│   Destinations (async only):                                            │
│   ┌───────────────────┐    Success    ┌───────────────────┐            │
│   │  Lambda Function  │──────────────►│  SQS/SNS/Lambda   │            │
│   │                   │               │  EventBridge      │            │
│   │                   │               └───────────────────┘            │
│   │                   │    Failure    ┌───────────────────┐            │
│   │                   │──────────────►│  SQS/SNS/Lambda   │            │
│   └───────────────────┘               │  EventBridge      │            │
│                                       └───────────────────┘            │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
import json
from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer, IdempotencyConfig, idempotent_function
)

logger = Logger()

# Idempotency uses DynamoDB to track which events have already been processed.
# When an event arrives, Powertools hashes the event (or a subset of it),
# checks if that hash exists in DynamoDB, and either:
# (a) returns the cached result if already processed, or
# (b) processes and stores the result for future lookups.
# This prevents the classic "customer charged twice" bug from retries.
# The table needs a partition key named "id" (the Powertools default) and
# TTL enabled on the "expiration" attribute so old records are cleaned up.
persistence_layer = DynamoDBPersistenceLayer(table_name="IdempotencyTable")
config = IdempotencyConfig(
    # JMESPath expression: use the request body as the idempotency key.
    # For payment processing, prefer a specific field such as
    # "powertools_json(body).transaction_id" (body is a JSON *string* in
    # API Gateway events, so it must be decoded first). Using the entire body
    # means ANY field change (even a timestamp) creates a new idempotency key.
    event_key_jmespath="body",
    # How long to remember processed events. 1 hour is typical for APIs.
    # Too short: retries after expiry will reprocess. Too long: the DynamoDB
    # table grows and costs more. Make it longer than the window in which a
    # retry or duplicate delivery can still arrive.
    expires_after_seconds=3600
)

# @idempotent only wraps a handler with the (event, context) signature.
# For any other function use @idempotent_function, and pass the protected
# data as the named keyword argument.
@idempotent_function(
    data_keyword_argument="event", config=config, persistence_store=persistence_layer
)
def process_payment(event: dict) -> dict:
    """
    Idempotent payment processing.
    If this exact event was already processed (within 1 hour),
    the cached result is returned WITHOUT re-executing this function.
    Safe to retry -- will not charge customer twice.
    """
    body = json.loads(event['body'])

    # Process payment using body...
    return {"status": "success", "transaction_id": "tx-123"}

def lambda_handler(event, context):
    # Lets Powertools record the remaining invocation time, so a record left
    # IN_PROGRESS by a timed-out invocation can expire and be retried.
    config.register_lambda_context(context)
    try:
        result = process_payment(event=event)
        return {
            'statusCode': 200,
            'body': json.dumps(result)
        }
    except Exception:
        # logger.exception() captures the full stack trace in structured JSON.
        # In CloudWatch Logs Insights you can query:
        # fields @timestamp, @message | filter level = "ERROR"
        logger.exception("Payment processing failed")
        # Re-raise for async invocations: Lambda will retry and eventually
        # send to DLQ. For sync invocations (API Gateway), the caller gets a
        # 5xx error instead of a normal response.
        raise
The idempotency DynamoDB table costs: Assuming on-demand billing, a function processing 1 million events/month with a 1-hour expiry costs roughly $1-3/month in DynamoDB charges. Each event produces about two writes (Powertools stores an in-progress record, then the result), plus occasional reads; the exact figure depends on your region's current on-demand price. Enable TTL on the table so expired records are automatically deleted. Otherwise the table grows indefinitely and your storage costs creep up.
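The asynchronous retry settings and on-failure destination from the error-handling diagram are configured per function with the Lambda API call `put_function_event_invoke_config`. Here is a minimal sketch. It assumes the destination ARN (an SQS queue here) already exists and that the function's execution role is allowed to send to it. The client is passed in, and the 1-hour event age is an example value, not an AWS recommendation.

```python
def configure_async_failures(client, function_name: str, on_failure_arn: str,
                             max_retries: int = 2, max_event_age_s: int = 3_600) -> dict:
    """Set retry count, maximum event age, and an on-failure destination.

    `client` is a Lambda client, e.g. boto3.client("lambda").
    """
    if not 0 <= max_retries <= 2:
        raise ValueError("async invocations allow 0-2 retries")
    if not 60 <= max_event_age_s <= 21_600:
        raise ValueError("maximum event age must be between 60 s and 6 hours")
    params = {
        "FunctionName": function_name,
        "MaximumRetryAttempts": max_retries,
        "MaximumEventAgeInSeconds": max_event_age_s,
        # Events that exhaust retries or age out go here instead of vanishing.
        "DestinationConfig": {"OnFailure": {"Destination": on_failure_arn}},
    }
    client.put_function_event_invoke_config(**params)
    return params
```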

Best Practices

Keep Functions Focused

Single responsibility — one function per task. A function that processes orders AND sends emails is two functions pretending to be one. When the email service goes down, your order processing stops too.

Minimize Package Size

Only include necessary dependencies. Every MB added to your deployment package adds ~1-2ms to cold start. A 50 MB package with unused libraries is paying a tax on every cold start.

Use Environment Variables

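Store configuration in env vars, secrets in Secrets Manager. NEVER put database passwords in environment variables. Anyone with lambda:GetFunctionConfiguration can read them, they are shown in plain text in the Lambda console, and they tend to leak into infrastructure-as-code state files and deployment logs.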

Implement Idempotency

Every Lambda function can be retried. If your function charges a credit card, creates a database record, or sends an email, it MUST be idempotent. Use Powertools idempotency or design your own deduplication.

Set Realistic Timeouts

Default 3s is almost always too short for production. Set timeout to your p99 latency plus a 50% buffer. A function that usually takes 2s but occasionally takes 8s should have a 12s timeout. But never set it to 15 minutes “just in case” — that wastes money on hung connections.
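The timeout rule above (p99 plus 50%) can be computed from observed durations. Here is a minimal sketch using the nearest-rank percentile. Where the samples come from (for example, the duration values in the function's REPORT log lines) is left open. Rounding up to whole seconds reflects that Lambda timeouts are set in seconds, between 1 and 900.

```python
import math


def p99(samples_s: list) -> float:
    """Nearest-rank 99th percentile of durations in seconds."""
    if not samples_s:
        raise ValueError("need at least one duration sample")
    ordered = sorted(samples_s)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n), 1-based
    return ordered[rank - 1]


def recommended_timeout_s(samples_s: list, buffer: float = 0.5) -> int:
    """p99 plus a buffer, rounded up to whole seconds, clamped to 1-900 s."""
    raw = p99(samples_s) * (1 + buffer)
    return min(900, max(1, math.ceil(raw)))


# "Usually 2 s, occasionally 8 s": two slow runs out of 100.
print(recommended_timeout_s([2.0] * 98 + [8.0] * 2))  # 12
```

With only one slow run in 100, the nearest-rank p99 stays at 2 s and the result drops to 3 s. Feed it enough samples to actually capture the tail.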

Monitor Everything

Enable X-Ray tracing, set CloudWatch alarms on Errors, Throttles, and Duration metrics. The three metrics that matter most: error rate, p99 latency, and concurrent executions approaching the account limit.
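Below is a minimal sketch of two of those alarms, built as keyword arguments for CloudWatch's `put_metric_alarm`. The first is a per-function alarm on `Throttles`. The second is an account-level alarm on `ConcurrentExecutions`, which CloudWatch also publishes without a dimension as the regional total. The alarm names, the SNS topic, the 80% threshold, and the periods are all assumptions to adjust.

```python
def throttle_alarm(function_name: str, topic_arn: str) -> dict:
    """Alarm as soon as any invocation of this function is throttled."""
    return {
        "AlarmName": f"{function_name}-throttles",
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # treat gaps in the metric as OK
        "AlarmActions": [topic_arn],
    }


def account_concurrency_alarm(account_limit: int, topic_arn: str,
                              ratio: float = 0.8) -> dict:
    """Warn before the whole region runs out of concurrency."""
    return {
        "AlarmName": "lambda-account-concurrency-high",
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",  # no Dimensions: account-wide total
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 3,
        "Threshold": account_limit * ratio,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
    }
```

With boto3 these would be applied as `boto3.client("cloudwatch").put_metric_alarm(**throttle_alarm("checkout-api", topic_arn))`, where the function name and `topic_arn` are placeholders.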

Production Checklist

production_checklist = {
    "code": [
        "Handler is focused and small (< 100 lines of business logic)",
        "Dependencies minimized -- no unused imports in deployment package",
        "SDK clients and DB connections initialized OUTSIDE handler",
        "Structured logging with correlation IDs (use Powertools Logger)",
        "All mutating operations are idempotent (especially payments, writes)",
        "Graceful handling of context.get_remaining_time_in_millis() < threshold",
    ],
    "configuration": [
        "Memory right-sized using Lambda Power Tuning (not guessed)",
        "Timeout = p99 latency + 50% buffer (never 15 min 'just in case')",
        "Environment variables for config (stage, region, table names)",
        "Secrets Manager for secrets (NOT env vars -- readable by anyone with GetFunctionConfiguration)",
        "X-Ray tracing enabled (TracingConfig mode Active, not PassThrough)",
        "ARM64 architecture selected where possible (20% cheaper, often faster)",
    ],
    "reliability": [
        "DLQ or on-failure destination configured for async invocations",
        "Reserved concurrency set for critical functions (payment, auth)",
        "Provisioned concurrency for latency-sensitive paths with schedule",
        "Circuit breaker pattern for downstream service failures",
        "Partial batch response enabled for SQS event source mappings",
    ],
    "security": [
        "Least-privilege IAM role (no wildcard actions/resources, no AdministratorAccess)",
        "VPC only when accessing private resources (RDS, ElastiCache)",
        "No hardcoded credentials, tokens, or API keys anywhere in code",
        "Input validation on every event field before processing",
        "Function URL auth set to AWS_IAM (never NONE in production)",
    ],
    "cost": [
        "Memory right-sized -- over-provisioning 512 MB vs optimal 256 MB doubles cost",
        "VPC Endpoints instead of NAT Gateway where possible (saves $32+/month/AZ)",
        "ARM (Graviton2) architecture for 20% cost reduction",
        "CloudWatch alarm on invocation count spikes (catch runaway recursion)",
        "Review Lambda cost in Cost Explorer monthly -- look for unused functions",
    ]
}
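The checklist item on `context.get_remaining_time_in_millis()` has one common shape: stop pulling new work once the remaining time drops below a safety margin, and hand back what is left. Here is a minimal sketch. The 10-second margin, the `handle_item` callback, and how the leftovers get re-queued are all assumptions.

```python
SAFETY_MARGIN_MS = 10_000  # assumed time needed to checkpoint/re-queue before timeout


def process_until_deadline(items: list, context, handle_item) -> dict:
    """Process items in order until the remaining time falls below the margin.

    Returns the processed items and the untouched remainder, so the caller can
    re-queue the remainder instead of being killed mid-item by the timeout.
    """
    done = []
    for index, item in enumerate(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return {"processed": done, "remaining": items[index:]}
        handle_item(item)
        done.append(item)
    return {"processed": done, "remaining": []}
```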
The runaway recursion trap: A Lambda function that writes to an S3 bucket, triggered by S3 events on that same bucket, creates an infinite loop. Each invocation triggers another invocation. Do not count on AWS to stop it for you. Lambda's recursive loop detection covers only certain service combinations, and without it the loop scales to your concurrency limit and keeps running until you notice. One team ran up a $12,000 bill in 3 hours this way. Always ensure your trigger source and output destination are different, or add explicit deduplication logic.
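The most robust fix is structural: write output to a different bucket, or scope the trigger with a prefix filter that excludes your output. As defense in depth, the handler can also skip keys it wrote itself. Here is a minimal sketch, assuming a hypothetical `thumbnails/` output prefix in the same bucket. It also URL-decodes keys, since S3 event notifications deliver object keys URL-encoded.

```python
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "thumbnails/"  # hypothetical prefix this function writes to


def keys_to_process(event: dict) -> list:
    """Return decoded object keys from an S3 event, minus our own output."""
    keys = []
    for record in event.get("Records", []):
        # S3 URL-encodes keys in notifications ("my photo.jpg" -> "my+photo.jpg").
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(OUTPUT_PREFIX):
            continue  # processing our own output would trigger us again
        keys.append(key)
    return keys
```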

🎯 Interview Questions

Optimization strategies:
  1. Code level:
    • Minimize deployment package size
    • Lazy load heavy dependencies
    • Initialize SDK clients outside handler
  2. Configuration:
    • Increase memory (faster CPU = faster init)
    • Use compiled languages carefully
    • Avoid VPC unless necessary
  3. Provisioned Concurrency:
    • Pre-warm execution environments
    • Eliminate cold starts for critical paths
    • Use scheduled scaling for traffic patterns
  4. Architecture:
    • Keep functions warm with scheduled pings (anti-pattern, prefer PC)
    • Use Lambda SnapStart for Java
Scaling behavior:
  • Lambda scales by creating more execution environments
  • Scaling rate: each function can add up to 1,000 concurrent executions every 10 seconds (before late 2023: an initial burst of 500-3,000 by region, then 500 more per minute)
Formula: Concurrent Executions = (invocations/sec) × (avg duration in sec)
Limits:
  • Account limit: 1,000 default (can increase)
  • Function reserved concurrency: up to the account limit minus 100 (Lambda keeps at least 100 unreserved)
  • Provisioned concurrency: pre-warmed instances
Lambda:
  • Short-lived, event-driven workloads
  • Unpredictable/spiky traffic
  • < 15 minutes execution
  • No server management needed
ECS (Fargate):
  • Long-running containers
  • Microservices architecture
  • Consistent traffic patterns
  • Need more control than Lambda
EC2:
  • Maximum control/customization
  • Specialized hardware needs
  • Persistent workloads
  • Legacy applications
Best practices:
# Use Secrets Manager (cached in memory by Powertools)
from aws_lambda_powertools.utilities.parameters import get_parameter, get_secret

# Cached for only 5 seconds by default; max_age (in seconds) extends that,
# here to 5 minutes, so warm invocations skip the Secrets Manager call.
db_password = get_secret("prod/db/password", max_age=300)

# Or use Parameter Store for non-sensitive config
api_endpoint = get_parameter("/myapp/api/endpoint")
Never:
  • Hardcode secrets in code
  • Store secrets in environment variables
  • Log secrets
Lambda@Edge:
  • Runs at CloudFront edge locations
  • Lower latency (closer to users)
  • Limits: viewer triggers get 128 MB memory and a 5s timeout; origin triggers allow up to 10 GB memory and a 30s timeout
  • Use cases: URL rewrite, A/B testing, auth, headers
Regular Lambda:
  • Runs in a single region
  • Full capabilities: 10GB memory, 15min timeout
  • More triggers and integrations
CloudFront Functions:
  • Even faster, cheaper than Lambda@Edge
  • 10 KB code, 2 MB memory, sub-millisecond execution budget
  • Lightweight request/response manipulation only (headers, URL rewrites, redirects)

🧪 Hands-On Lab

  1. Create Basic Lambda: Create a Python Lambda function with API Gateway trigger
  2. Add Dependencies with Layers: Create a layer with requests and boto3, attach to function
  3. Configure VPC Access: Set up Lambda in VPC to access RDS, configure VPC endpoints
  4. Implement Error Handling: Add DLQ, structured logging, and X-Ray tracing
  5. Optimize Performance: Configure provisioned concurrency and measure cold start impact

Interview Deep-Dive

Strong Answer:
  • First, understand the math. At 500 req/s with an average duration of 200ms, we need 100 concurrent executions (500 x 0.2). At 5,000 req/s, we need 1,000 concurrent executions. The default account limit is 1,000, and a function can only add new environments at a bounded rate: today up to 1,000 every 10 seconds, and before late 2023 an initial burst of 500-3,000 depending on region. So we are hitting the account concurrency ceiling during the spike.
  • Immediate fix — request a concurrency limit increase. This is a soft limit. I would request an increase to 5,000-10,000 via the AWS Service Quotas console. Approval typically takes 1-3 business days, so this is not an in-the-moment fix.
  • Short-term mitigation — add an SQS queue in front of Lambda. Instead of API Gateway invoking Lambda directly (synchronous, throttled requests return 429 to the customer), route orders through SQS. API Gateway writes to SQS (which has virtually unlimited throughput), and Lambda polls SQS with a batch size of 10. This decouples ingestion from processing. The customer gets a 202 Accepted immediately, and orders are never lost because SQS retains messages for up to 14 days. The trade-off: orders are now processed asynchronously, so the response cannot include the order confirmation — you need a notification mechanism (WebSocket, polling, or email).
  • Set reserved concurrency on the order processor. Reserve 800 of the 1,000 account limit for this function so other non-critical functions (analytics, logging) cannot steal its capacity during the spike. This guarantees capacity but caps other functions at 200.
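  • For future flash sales, use Provisioned Concurrency with scheduled scaling. If the sale starts at 9 AM, schedule PC to ramp to 500 at 8:45 AM and back to 0 at 11 PM. This eliminates cold starts during the critical window. At 512 MB memory, 500 PC for roughly 14 hours is about 12.6 million GB-seconds. At about $0.0000041667/GB-s, that comes to roughly $50-55 in provisioned-concurrency charges for the day (before the usual request and duration charges), which is cheap insurance for a flash sale.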
  • Long-term architecture: evaluate whether Lambda is the right tool. At a sustained 5,000 req/s, Lambda compute alone costs roughly $720/day, or about $21,600/month (5,000 x 86,400 seconds x 0.2s x 0.5 GB x $0.0000166667/GB-s x 30 days). Request charges add about $2,600/month more at $0.20 per million. The same workload on Fargate with 10 tasks at 1 vCPU/2 GB costs a few hundred dollars a month in task charges (about $360 at us-east-1 list prices, plus load balancer costs), assuming 10 tasks can sustain the throughput. If flash sales happen weekly, Fargate is dramatically cheaper for the steady state, and Lambda handles the overflow via a hybrid architecture.
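The cost arithmetic in these answers can be reproduced with a short calculator. Here is a minimal sketch. The prices are assumed us-east-1 x86 list prices: $0.0000166667 per GB-second, $0.20 per million requests, and $0.0000041667 per GB-second for provisioned concurrency. It ignores the free tier, volume discount tiers, and Graviton pricing.

```python
SECONDS_PER_DAY = 86_400

# Assumed us-east-1 x86 list prices (USD); verify against current pricing.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20
PC_PRICE_PER_GB_SECOND = 0.0000041667  # charge for keeping PC environments warm


def on_demand_monthly(rps: float, duration_s: float, memory_gb: float,
                      days: int = 30) -> dict:
    """Compute and request charges for a steady request rate."""
    requests = rps * SECONDS_PER_DAY * days
    compute = requests * duration_s * memory_gb * PRICE_PER_GB_SECOND
    request_fee = requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return {"compute": compute, "requests": request_fee, "total": compute + request_fee}


def provisioned_concurrency_cost(environments: int, memory_gb: float, hours: float) -> float:
    """Warm-capacity charge only; invocations on PC are billed separately."""
    return environments * memory_gb * hours * 3_600 * PC_PRICE_PER_GB_SECOND


steady = on_demand_monthly(rps=5_000, duration_s=0.2, memory_gb=0.5)
print({k: round(v) for k, v in steady.items()})
print(round(provisioned_concurrency_cost(500, 0.5, 14), 2))
```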
Follow-up: Orders are no longer lost, but customers complain that some orders were processed twice during the spike. What happened and how do you fix it?
During the throttling window, SQS messages that were delivered to Lambda but failed (due to the function hitting a downstream timeout or partial failure) became visible again after the visibility timeout expired. Lambda reprocessed them, causing duplicates. The fix is two-fold: (1) implement idempotency using Powertools with a DynamoDB-backed persistence layer keyed on order_id, so reprocessing the same order is a no-op, and (2) enable ReportBatchItemFailures on the SQS event source mapping so only the individual failed messages retry, not the entire batch.
What impresses interviewers: Showing the cost math (concurrent executions formula, Lambda vs Fargate crossover), knowing the SQS buffering pattern, and immediately addressing idempotency without being prompted. Candidates who only say “increase the limit” miss the architectural thinking interviewers are looking for.
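The second half of that fix, reporting individual failures with ReportBatchItemFailures, changes what the handler returns. Here is a minimal sketch. It assumes the event source mapping has `FunctionResponseTypes` set to `ReportBatchItemFailures`, and it uses a hypothetical `process_order` in place of real business logic. Powertools also ships a batch-processing utility for this pattern.

```python
import json


def process_order(order: dict) -> None:
    """Hypothetical business logic; raises on bad input."""
    if "order_id" not in order:
        raise ValueError("order_id missing")


def lambda_handler(event, context):
    """Return only the failed message IDs so SQS retries just those."""
    failures = []
    for record in event["Records"]:
        try:
            process_order(json.loads(record["body"]))
        except Exception:
            # Messages not listed here count as processed and are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```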
Strong Answer:
  • Do not migrate everything at once. Start with the strangler fig pattern: put an ALB in front of both the old EC2 monolith and the new Lambda functions, then migrate endpoints one at a time. Route /api/v1/orders to Lambda while /api/v1/reports still hits EC2. This lets you validate each endpoint in production with real traffic before cutting over.
  • The 30-second report endpoints should not go to Lambda behind API Gateway. API Gateway has a 29-second default integration timeout. Since mid-2024, Regional and private REST APIs can request a higher limit, but only by trading away account-level throttle quota, so treat 29 seconds as the practical ceiling. For these endpoints, you have three options: (1) move the report generation to an asynchronous pattern: the API returns a 202 Accepted with a report_id, a Step Functions workflow generates the report, and the client polls or receives a webhook when done; (2) use Lambda Function URLs instead of API Gateway (no 29-second limit, just the Lambda 15-minute limit); (3) keep those endpoints on ECS/Fargate. Option 1 is the cleanest architecture. Option 2 is the quickest migration path but sacrifices API Gateway features like throttling, caching, and WAF integration.
  • RDS connection pooling is the critical challenge. Each Lambda execution environment opens its own database connection. At 100 concurrent Lambda invocations, you have 100 connections to PostgreSQL. During a traffic spike to 1,000 concurrent, you can approach or exceed the instance's max_connections. For RDS PostgreSQL the default is derived from instance memory, so check the value for your instance class. Every connection also consumes database memory well before you hit that limit. Use RDS Proxy ($0.015 per vCPU-hour of the underlying database instance) to pool connections: Lambda connects to the proxy, which maintains a stable pool of 50-100 actual database connections. Without RDS Proxy, this migration will cause cascading database failures.
  • Cold starts matter for a customer-facing API. Python/Node.js functions with RDS Proxy in a VPC will add 200-500ms on cold starts. For the most latency-sensitive endpoints (login, checkout), use Provisioned Concurrency. For less critical endpoints (profile updates, settings), let cold starts happen — users will not notice an occasional extra 300ms.
  • Migrate in this order: stateless read endpoints first (low risk, easy to validate), then stateless write endpoints (need idempotency), then stateful/complex endpoints last. Keep the reporting endpoints on Fargate permanently — Lambda is the wrong tool for 30-second synchronous operations.
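Option 1 above (202 Accepted plus a Step Functions workflow) might look like the following minimal sketch. The state machine ARN, the `/api/v1/reports/{id}` status path, and the input shape are hypothetical. `start_execution` is the Step Functions API call that launches the workflow. The client is created lazily, so the module imports without AWS credentials.

```python
import json
import uuid

# Hypothetical state machine that builds the report and stores the result.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:report-builder"
_sfn_client = None


def start_report(event: dict, sfn_client) -> dict:
    """Kick off report generation and return 202 with a status URL."""
    report_id = str(uuid.uuid4())
    params = json.loads(event.get("body") or "{}")
    sfn_client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=report_id,  # execution names must be unique per state machine
        input=json.dumps({"report_id": report_id, "params": params}),
    )
    return {
        "statusCode": 202,
        "headers": {"Location": f"/api/v1/reports/{report_id}"},
        "body": json.dumps({"report_id": report_id, "status": "PENDING"}),
    }


def lambda_handler(event, context):
    global _sfn_client
    if _sfn_client is None:  # created once per execution environment
        import boto3
        _sfn_client = boto3.client("stepfunctions")
    return start_report(event, _sfn_client)
```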
Follow-up: Three months after migration, your RDS Proxy bill is $400/month and your DBA says database connections are still spiking. What is going wrong?
RDS Proxy pools connections per proxy (per target group), not per Lambda function, so the usual culprit is connection pinning. Sometimes a client session does something the proxy cannot safely share across clients, such as SET statements that change session state, certain prepared-statement usage, or temporary tables. The proxy then pins that client to a dedicated database connection for the rest of the session, and under load the pinned sessions behave like direct connections again. Check the DatabaseConnectionsCurrentlySessionPinned CloudWatch metric and remove the session-state changes that cause pinning. Also review the proxy's MaxConnectionsPercent setting. The default of 100% means the proxy will use up to the database's full max_connections, which defeats the purpose if other proxies, applications, or admin sessions also connect. Consolidating related endpoints into fewer Lambda functions (e.g., one function for all order-related endpoints using Powertools APIGatewayRestResolver with route decorators) can still help, because it reduces the number of idle warm environments each holding an open client connection.
What impresses interviewers: Knowing the API Gateway 29-second default integration limit, immediately identifying RDS connection exhaustion as the biggest risk, and proposing the strangler fig pattern instead of a big-bang migration. Mentioning RDS Proxy and its connection-pinning behavior shows production experience.
Strong Answer:
  • Silent failures in S3-triggered Lambda usually mean the function was never invoked. S3 event notifications are asynchronous. If the notification itself fails to deliver to Lambda, there is no DLQ because the failure happens before Lambda is involved. First, check: is the S3 event notification configuration correct? In the S3 bucket properties, verify the event type (s3:ObjectCreated:*), the prefix/suffix filters, and the Lambda function ARN. A common mistake is setting a prefix filter of uploads/ but the files are uploaded to upload/ (no trailing ‘s’).
  • Check for Lambda permission issues. S3 needs permission to invoke your Lambda function: the function's resource-based policy must allow lambda:InvokeFunction for the s3.amazonaws.com principal, scoped to the specific bucket ARN. S3 checks this when you save the notification configuration. If the permission is later removed or no longer matches, though, invocations fail silently. Run aws lambda get-policy --function-name my-function and verify the S3 principal and bucket ARN are listed. If the policy was set up via the console but the bucket was later recreated under a different name (different ARN), the policy is stale.
  • Check for filename encoding issues. S3 event notifications URL-encode the object key. A file named my photo (1).jpg arrives as my+photo+%281%29.jpg. If your function does not call urllib.parse.unquote_plus() on the key, the S3 GetObject call fails with a NoSuchKey error. But here is the subtle part: if your error handling catches the exception and returns successfully (no re-raise), Lambda considers the invocation successful. No DLQ, no retry, no error metric. The image is silently skipped.
  • Check for concurrency throttling. If the function hit the account concurrency limit, S3 async invocations retry with backoff for up to 6 hours. If the function is still throttled after 6 hours, the event is dropped unless a DLQ or on-failure destination is configured. Check the Throttles CloudWatch metric for the function during the time window when images failed.
  • Check for event notification delivery limits. S3 event notifications can silently fail if the destination Lambda function no longer exists, is in a different region from the bucket, or is in a failed state. S3 does not log these failures anywhere obvious. To see whether S3 actually invoked the function, enable CloudTrail data events for the function's Invoke calls (data events cost $0.10 per 100,000 events).
  • The fix: add a reconciliation process. Never rely solely on event-driven processing for critical workloads. Run a scheduled Lambda (every 15 minutes) that lists objects in the source bucket, compares against the thumbnails bucket, and reprocesses any missing items. This catch-all pattern costs almost nothing and prevents silent data loss.
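For the first check in the list above, a small script can confirm whether a given key would actually pass the bucket’s notification filter. This is a minimal sketch, assuming the dict shape that boto3’s `get_bucket_notification_configuration` returns for each `LambdaFunctionConfigurations` entry; `filter_rules`, `key_matches`, and the sample configuration are hypothetical. S3 filter rules are plain prefix/suffix string matches with no wildcard support, which is why `uploads/` silently misses `upload/`.

```python
from typing import Optional, Tuple


def filter_rules(lambda_config: dict) -> Tuple[Optional[str], Optional[str]]:
    """Extract (prefix, suffix) from one LambdaFunctionConfigurations entry."""
    prefix: Optional[str] = None
    suffix: Optional[str] = None
    rules = lambda_config.get("Filter", {}).get("Key", {}).get("FilterRules", [])
    for rule in rules:
        # Rule names may come back as "prefix" or "Prefix"; normalize before comparing.
        name = rule["Name"].lower()
        if name == "prefix":
            prefix = rule["Value"]
        elif name == "suffix":
            suffix = rule["Value"]
    return prefix, suffix


def key_matches(key: str, prefix: Optional[str], suffix: Optional[str]) -> bool:
    """True if the object key passes the filter (plain string prefix/suffix match)."""
    if prefix and not key.startswith(prefix):
        return False
    if suffix and not key.endswith(suffix):
        return False
    return True


# Hypothetical configuration, shaped like one entry of a boto3 response.
config = {
    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "Events": ["s3:ObjectCreated:*"],
    "Filter": {"Key": {"FilterRules": [
        {"Name": "Prefix", "Value": "uploads/"},
        {"Name": "Suffix", "Value": ".jpg"},
    ]}},
}
prefix, suffix = filter_rules(config)
print(key_matches("uploads/cat.jpg", prefix, suffix))  # True
print(key_matches("upload/cat.jpg", prefix, suffix))   # False
```

Against a live bucket, the entries would come from `boto3.client("s3").get_bucket_notification_configuration(Bucket="my-bucket").get("LambdaFunctionConfigurations", [])`, and the sample keys would be real object keys that failed to process.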
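The permission check can be scripted the same way. A minimal sketch, assuming the `Policy` JSON string that `aws lambda get-policy` (or boto3’s `get_policy`) returns; the hypothetical `grants_bucket` helper only recognizes the common shape created by the console or `aws lambda add-permission` (service principal `s3.amazonaws.com`, action `lambda:InvokeFunction`, an `AWS:SourceArn` condition naming the bucket) and is not a general IAM policy evaluator.

```python
import json


def grants_bucket(policy_json: str, bucket_name: str) -> bool:
    """True if some Allow statement lets S3 invoke the function for this exact bucket.

    Only the common add-permission shape is recognized; this is not a full IAM
    evaluator. A statement with no SourceArn condition is ignored here, because
    it would let any bucket invoke the function and should be tightened anyway.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    for stmt in json.loads(policy_json).get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        services = principal.get("Service", []) if isinstance(principal, dict) else []
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "s3.amazonaws.com" not in services or "lambda:InvokeFunction" not in actions:
            continue
        # Condition key names are case-insensitive in IAM, so compare lowercased keys.
        for block in stmt.get("Condition", {}).values():
            for key, value in block.items():
                values = value if isinstance(value, list) else [value]
                if key.lower() == "aws:sourcearn" and bucket_arn in values:
                    return True
    return False
```

With boto3, the input would be `boto3.client("lambda").get_policy(FunctionName="my-function")["Policy"]`; if the function has no resource-based policy at all, that call raises `ResourceNotFoundException`, which is itself the answer.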
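The URL-encoding and exception-swallowing pitfalls are both handled at the top of the handler. A minimal sketch, assuming the standard S3 notification event shape; `s3_objects` is an illustrative helper and `process_image` is a hypothetical placeholder for the real thumbnailing code.

```python
import logging
import urllib.parse

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs with keys decoded.

    S3 URL-encodes keys in notifications (a space arrives as '+'), so the raw
    key must go through unquote_plus before it is passed to GetObject.
    """
    return [
        (r["s3"]["bucket"]["name"], urllib.parse.unquote_plus(r["s3"]["object"]["key"]))
        for r in event.get("Records", [])
    ]


def process_image(bucket: str, key: str) -> None:
    """Hypothetical placeholder for the real thumbnailing logic."""
    raise NotImplementedError(f"thumbnail s3://{bucket}/{key}")


def handler(event, context):
    for bucket, key in s3_objects(event):
        try:
            process_image(bucket, key)
        except Exception:
            # Log with the decoded key, then re-raise. Returning normally here
            # would mark the invocation successful: no retry, no DLQ, no Errors metric.
            logger.exception("failed to process s3://%s/%s", bucket, key)
            raise
```

Re-raising is what gives Lambda’s asynchronous retries (two by default for function errors) and any configured DLQ or on-failure destination a chance to act.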
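For the throttling case, the fix is to give expired events somewhere to land instead of being dropped. A minimal sketch that builds the request for Lambda’s `put_function_event_invoke_config` API, assuming a hypothetical SQS queue as the on-failure destination; the `event_invoke_config` builder is illustrative. `MaximumEventAgeInSeconds` (60 to 21,600) bounds how long throttled events are retried, and `MaximumRetryAttempts` (0 to 2) applies to function errors.

```python
from typing import Optional


def event_invoke_config(
    function_name: str,
    on_failure_arn: Optional[str] = None,
    max_event_age_s: int = 21600,
    max_retries: int = 2,
) -> dict:
    """Build keyword arguments for lambda.put_function_event_invoke_config."""
    if not 60 <= max_event_age_s <= 21600:
        raise ValueError("MaximumEventAgeInSeconds must be between 60 and 21600")
    if not 0 <= max_retries <= 2:
        raise ValueError("MaximumRetryAttempts must be between 0 and 2")
    kwargs = {
        "FunctionName": function_name,
        "MaximumEventAgeInSeconds": max_event_age_s,  # bounds retries of throttled events
        "MaximumRetryAttempts": max_retries,          # retries after function errors
    }
    if on_failure_arn:
        # Events that exhaust retries or exceed the max age go here instead of being dropped.
        kwargs["DestinationConfig"] = {"OnFailure": {"Destination": on_failure_arn}}
    return kwargs
```

Applied with `boto3.client("lambda").put_function_event_invoke_config(**event_invoke_config("my-function", on_failure_arn="arn:aws:sqs:us-east-1:123456789012:thumbnail-failures"))`; the function’s execution role also needs permission to send messages to that queue.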
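The core of the reconciliation job is a set difference between the two buckets. A minimal sketch, assuming a hypothetical naming convention where `uploads/a.jpg` becomes `thumbnails/a.jpg` and an S3 client passed in by the caller (for example `boto3.client("s3")`); the helper names, bucket names, and prefixes are placeholders.

```python
from typing import Callable, Iterable, Iterator, List


def list_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under a prefix, following list_objects_v2 pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):  # empty pages have no "Contents"
            yield obj["Key"]


def thumbnail_key(source_key: str) -> str:
    """Hypothetical naming convention: uploads/a.jpg -> thumbnails/a.jpg."""
    name = source_key[len("uploads/"):] if source_key.startswith("uploads/") else source_key
    return "thumbnails/" + name


def missing_thumbnails(
    source_keys: Iterable[str],
    thumbnail_keys: Iterable[str],
    to_thumb: Callable[[str], str] = thumbnail_key,
) -> List[str]:
    """Source keys whose expected thumbnail does not exist yet."""
    existing = set(thumbnail_keys)
    return sorted(k for k in source_keys if to_thumb(k) not in existing)
```

In the scheduled function, `missing_thumbnails(list_keys(s3, "source-bucket", "uploads/"), list_keys(s3, "thumbnails-bucket", "thumbnails/"))` yields the work list. In practice you would also skip objects uploaded in the last few minutes, so the job does not race the event-driven path.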
Follow-up: You add the reconciliation process and discover that 2% of images consistently fail thumbnailing. The files are valid JPEGs. What could cause this?
The most likely cause is the Lambda function running out of memory or timing out on large images. A 20 MB JPEG decompressed into memory for thumbnailing can consume 200-500 MB of RAM, because the decoded bitmap is far larger than the compressed file. If the function is configured with 256 MB of memory, it is OOM-killed when it exceeds that limit. This shows up as a Runtime.ExitError, or as no application log at all if the kill happens before the error handler runs. Check the Max Memory Used value in the function’s REPORT log lines: if it equals the configured Memory Size, the function is being OOM-killed. Increase memory to 1024 MB or 2048 MB and add a file size check at the start of the handler to reject images above a safe threshold.
What impresses interviewers: Systematically working through the failure modes from “event never arrived” to “event arrived but processing failed silently.” Knowing about S3 key URL-encoding, the 6-hour retry window, and the reconciliation pattern shows you have debugged real S3-Lambda pipelines. The OOM-kill scenario (no logs, no error) is a classic production gotcha that only people with hands-on experience know about.
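Every invocation ends with a `REPORT` log line that records both the configured memory and the peak usage, so spotting OOM kills can be automated. A minimal sketch, assuming the standard tab-separated `REPORT` line format; `memory_usage` and `likely_oom` are hypothetical helpers. CloudWatch Logs Insights also exposes these values as the `@memorySize` and `@maxMemoryUsed` fields for Lambda log groups, which is the easier route for ad-hoc queries.

```python
import re
from typing import Optional, Tuple

# Matches the memory fields of a Lambda REPORT line (fields are tab-separated).
_MEMORY = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")


def memory_usage(report_line: str) -> Optional[Tuple[int, int]]:
    """Return (configured_mb, max_used_mb), or None if the line is not a REPORT line."""
    if not report_line.startswith("REPORT"):
        return None
    match = _MEMORY.search(report_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def likely_oom(report_line: str) -> bool:
    """Peak usage at (or above) the configured size is the OOM-kill signature."""
    usage = memory_usage(report_line)
    return usage is not None and usage[1] >= usage[0]
```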
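The gap between file size and memory use comes from decoding: a JPEG is compressed, but the bitmap a thumbnailer works on is roughly width * height * channels bytes. A back-of-the-envelope guard, assuming 8-bit RGB and a hypothetical allowance of three full-size buffers (decoded original plus intermediates) within half of the function’s memory, gives a dimension-based variant of the file size check. The function names and both allowances are illustrative.

```python
def decoded_bytes(width: int, height: int, channels: int = 3) -> int:
    """Approximate size of one fully decoded 8-bit image buffer."""
    return width * height * channels


def fits_in_memory(
    width: int,
    height: int,
    memory_mb: int,
    buffers: int = 3,       # hypothetical: decoded original plus intermediate copies
    headroom: float = 0.5,  # hypothetical: leave half the memory for runtime and libraries
    channels: int = 3,
) -> bool:
    budget = memory_mb * 1024 * 1024 * headroom
    return decoded_bytes(width, height, channels) * buffers <= budget


# A 24-megapixel photo: 6000 * 4000 * 3 = 72,000,000 bytes per decoded copy.
print(decoded_bytes(6000, 4000))         # 72000000
print(fits_in_memory(6000, 4000, 256))   # False
print(fits_in_memory(6000, 4000, 1024))  # True
```

With Pillow, `Image.open()` reads the header lazily, so `img.size` is available before the pixel data is decoded, which makes it a cheap place to apply this check.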
Strong Answer:
  • Use DynamoDB Global Tables for multi-region reads. Global Tables replicate data across regions with typically sub-second replication lag. Each region’s Lambda reads from its local replica, achieving single-digit-millisecond DynamoDB read latency. This is the critical enabler for sub-100ms global latency. The trade-off: Global Tables use eventual consistency for cross-region replication, so a write in us-east-1 may not be visible in eu-west-1 for 200-1000ms. If your API needs read-your-writes consistency, you must either route that user’s subsequent reads to the region that accepted the write (latency-based DNS does not guarantee this on its own, so the application has to carry the pinning, for example in the session) or accept the consistency window.
  • S3 replication for multi-region writes. Use S3 Cross-Region Replication (CRR) to replicate objects written in one region to others. CRR replicates most objects within 15 minutes, but there is no time guarantee unless you enable S3 Replication Time Control (RTC), which is designed to replicate 99.99% of new objects within 15 minutes. If the application needs to read the object immediately after writing, the reading Lambda must read from the same region that wrote it, or use S3 Transfer Acceleration for faster cross-region access.
  • Deploy with a multi-region CI/CD pipeline. Use AWS CodePipeline or a tool like Serverless Framework with multi-region deployment stages. Deploy to a canary region first (e.g., ap-southeast-1 with lowest traffic), validate health metrics for 10 minutes, then deploy to the remaining regions. A bad deployment to all 3 regions simultaneously is a global outage.
  • Route 53 latency-based routing for global distribution. Create latency-based DNS records pointing to each region’s API Gateway. Users are automatically routed to the lowest-latency healthy region. Add health checks that test the full stack (API Gateway through Lambda through DynamoDB) so Route 53 can fail over if one region degrades.
  • The hidden cost trap: 3x everything that is provisioned or replicated. Per-request charges (Lambda invocations, API Gateway requests) follow traffic, which is split across regions, but everything provisioned or replicated is paid in every region: DynamoDB Global Tables charge replicated writes in every replica region, NAT Gateways (if using VPC) and any provisioned concurrency are per region, and cross-region replication traffic is billed on top. A setup that costs $2,000/month in one region may cost $7,000-8,000/month in three regions after accounting for replication traffic. Model the full cost before committing.
  • Conflict resolution is the hardest problem. With DynamoDB Global Tables, if two users update the same item in different regions simultaneously, the “last writer wins” policy applies based on the timestamp. For most applications (user profiles, preferences), this is fine. For financial transactions or inventory counts, this is dangerous. For those workloads, designate one region as the primary writer and use the others as read replicas only.
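For the read-your-writes option in the Global Tables item, the routing decision itself is small. A minimal sketch, assuming the application records where and when each user last wrote (for example in a signed session cookie); the storage mechanism, the names `LastWrite` and `read_region`, and the one-second window (sized to the 200-1000ms replication lag quoted above) are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LastWrite:
    region: str  # region that accepted the user's most recent write
    at: float    # epoch seconds of that write


def read_region(
    user_id: str,
    local_region: str,
    last_writes: Mapping[str, LastWrite],
    window_s: float = 1.0,
    now: Optional[float] = None,
) -> str:
    """Pick the region to read from: the writing region while replication may lag, else local."""
    now = time.time() if now is None else now
    last = last_writes.get(user_id)
    if last is not None and now - last.at < window_s:
        return last.region  # the local replica may not have this write yet
    return local_region     # outside the window, the local replica is assumed current
```

The trade-off is visible in the code: reads inside the window pay cross-region latency, so this only makes sense for the few requests that immediately follow a write.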
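The canary-first rollout from the deployment item reduces to a loop that stops at the first unhealthy region. A minimal sketch, assuming `deploy`, `healthy`, and `bake` are hypothetical callables wrapping your pipeline tooling; it is a conservative variant that also bakes each remaining region in turn rather than deploying them all at once.

```python
from typing import Callable, List


def staged_rollout(
    regions: List[str],
    canary: str,
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    bake: Callable[[str], None],
) -> List[str]:
    """Deploy to the canary first, then the rest, halting on the first failed health check."""
    if canary not in regions:
        raise ValueError(f"canary {canary!r} is not one of the target regions")
    order = [canary] + [r for r in regions if r != canary]
    done: List[str] = []
    for region in order:
        deploy(region)
        done.append(region)
        bake(region)  # e.g. wait ~10 minutes while health metrics accumulate
        if not healthy(region):
            # Regions after this one never receive the bad build.
            raise RuntimeError(f"unhealthy after deploy to {region}; deployed so far: {done}")
    return done
```

In practice `bake` might be `lambda r: time.sleep(600)` and `healthy` a check of the CloudWatch alarms in that region; rolling back the failed region is left to the pipeline.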
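A Route 53 health check only proves the full stack works if the endpoint it probes touches every layer. A minimal sketch of such a deep health check behind API Gateway, assuming a boto3 DynamoDB client is passed in and a hypothetical probe item exists; the table name, key, and 200 ms threshold are placeholders. Route 53 HTTP health checks treat 2xx and 3xx responses as healthy, so a slow or failing dependency is reported as 503.

```python
import json
import time


def _response(status: int, body: dict) -> dict:
    """API Gateway proxy-integration response."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


def deep_health_check(ddb_client, table_name: str, probe_key: dict, max_ms: float = 200.0) -> dict:
    """Return 200 only if a DynamoDB read succeeds within max_ms; otherwise 503."""
    start = time.monotonic()
    try:
        ddb_client.get_item(TableName=table_name, Key=probe_key)
    except Exception as exc:  # any dependency failure marks the region unhealthy
        return _response(503, {"status": "unhealthy", "error": type(exc).__name__})
    elapsed_ms = (time.monotonic() - start) * 1000
    if elapsed_ms > max_ms:
        return _response(503, {"status": "slow", "dynamodb_ms": round(elapsed_ms, 1)})
    return _response(200, {"status": "ok", "dynamodb_ms": round(elapsed_ms, 1)})
```

Wired up as `def handler(event, context): return deep_health_check(ddb, "api-table", {"pk": {"S": "healthcheck"}})`, with `ddb = boto3.client("dynamodb")` created outside the handler so warm invocations reuse it.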
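The cost trap is easier to see with a rough model. A minimal sketch under stated assumptions: per-request charges are split across regions (each request is served, and billed, in one region), while provisioned resources, replicated DynamoDB writes, and storage are paid in every region, plus cross-region transfer for each extra replica. The dollar inputs are hypothetical placeholders chosen to reproduce the $2,000 to roughly $7,000 example, not real AWS prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyInputs:
    request_based: float         # Lambda + API Gateway + local reads for all traffic ($)
    fixed_per_region: float      # NAT Gateways, provisioned concurrency, etc. ($ per region)
    writes_per_region: float     # DynamoDB writes, replicated into every region ($ per region)
    storage_per_region: float    # replicated table and bucket storage ($ per region)
    transfer_per_replica: float  # cross-region replication traffic ($ per extra region)


def monthly_cost(inputs: MonthlyInputs, regions: int) -> float:
    per_region = inputs.fixed_per_region + inputs.writes_per_region + inputs.storage_per_region
    return (
        inputs.request_based                           # split across regions, billed once
        + regions * per_region                         # paid again in every region
        + (regions - 1) * inputs.transfer_per_replica  # replication traffic
    )


example = MonthlyInputs(200, 600, 800, 400, 700)  # hypothetical numbers
print(monthly_cost(example, 1))  # 2000
print(monthly_cost(example, 3))  # 7000
```

In this model the multiplier comes entirely from the per-region and transfer terms, so write-heavy and VPC-heavy workloads see the steepest increase.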
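Why last-writer-wins is dangerous for counters shows up in a tiny simulation. A minimal sketch of timestamp-based LWW reconciliation, where `Version` and `lww_merge` are illustrative and the tie-break on region name is a hypothetical deterministic rule, not DynamoDB’s documented behavior: two regions both read an inventory of 10 and sell concurrently, and one sale disappears.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: int
    updated_at: float  # writer's timestamp, in seconds
    region: str


def lww_merge(a: Version, b: Version) -> Version:
    """Keep the later write; ties broken by region name (hypothetical rule)."""
    return max(a, b, key=lambda v: (v.updated_at, v.region))


stock = 10
us = Version(stock - 1, updated_at=1.000, region="us-east-1")  # sold 1
eu = Version(stock - 2, updated_at=1.001, region="eu-west-1")  # sold 2
merged = lww_merge(us, eu)
print(merged.value)  # 8, but the correct remaining stock is 7
```

With a single writer region, both sales go through the same replica, where an atomic `UpdateItem` with `SET stock = stock - :n` and a condition such as `stock >= :n` serializes them.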
Follow-up: During a regional outage in us-east-1, your failover to eu-west-1 works, but users report stale data. What is happening and what is your communication strategy?
DynamoDB Global Tables replication is asynchronous. When us-east-1 went down, any writes that were in flight (accepted by us-east-1 but not yet replicated) became unavailable until us-east-1 recovers. Users routed to eu-west-1 see data as of the last successful replication point, potentially 1-5 seconds behind. The communication strategy: (1) immediately acknowledge the regional degradation on the status page, (2) clearly state “data from the last few seconds may be delayed” rather than hiding the inconsistency, (3) when us-east-1 recovers, Global Tables automatically reconcile using last-writer-wins, so the “lost” writes will reappear unless a newer write to the same item in eu-west-1 supersedes them; communicate this recovery timeline. The architectural lesson: for data that absolutely cannot tolerate this window (payment confirmations), do not acknowledge the write until a custom synchronous replication layer has confirmed it in a second region, accepting higher write latency (200-300ms) for stronger consistency. DynamoDB transactions alone cannot provide this, because they are scoped to a single region.
What impresses interviewers: Articulating the eventual consistency trade-off with specific numbers (replication lag in milliseconds, not vague “might be stale”). Knowing the last-writer-wins conflict resolution model and when it breaks down. Calling out the cost multiplier shows you have actually operated multi-region architectures, not just read about them. Proposing a graceful degradation communication strategy rather than pretending the data is always consistent demonstrates operational maturity.
