Module Overview
Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Lambda
- State machine concepts and design
- State types (Task, Choice, Parallel, Map)
- Error handling and retries
- Standard vs Express workflows
- Service integrations
- Workflow patterns for common use cases
Why Step Functions?
Visual Workflows
Design and visualize complex business processes as state machines
Built-in Error Handling
Automatic retries, catch blocks, and compensation logic
200+ Integrations
Native integration with Lambda, DynamoDB, SQS, SNS, and more
Audit Trail
Complete execution history for debugging and compliance
State Machine Concepts
┌────────────────────────────────────────────────────────────────────────┐
│                        State Machine Components                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  STATE MACHINE                                                         │
│  ─────────────                                                         │
│  • Collection of states that define a workflow                         │
│  • Starts at StartAt; ends at a terminal state                         │
│  • Executes synchronously or asynchronously                            │
│                                                                        │
│        ┌──────────┐                                                    │
│        │  START   │                                                    │
│        └────┬─────┘                                                    │
│             │                                                          │
│             ▼                                                          │
│        ┌──────────┐                                                    │
│        │ Validate │ ──── Task State (Lambda)                           │
│        │  Input   │                                                    │
│        └────┬─────┘                                                    │
│             │                                                          │
│             ▼                                                          │
│        ┌──────────┐  no  ┌──────────┐                                  │
│        │  Valid?  │─────►│  Reject  │ ──── Choice State                │
│        └────┬─────┘      └──────────┘                                  │
│             │ yes                                                      │
│             ▼                                                          │
│        ┌──────────┐                                                    │
│        │ Process  │ ──── Task State (DynamoDB)                         │
│        │  Order   │                                                    │
│        └────┬─────┘                                                    │
│             │                                                          │
│             ▼                                                          │
│        ┌──────────┐                                                    │
│        │   END    │                                                    │
│        └──────────┘                                                    │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
State Types
Task State
Performs work by invoking an AWS service or activity.
{
  "ValidateOrder": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
    "InputPath": "$.order",
    "ResultPath": "$.validation",
    "OutputPath": "$",
    "TimeoutSeconds": 30,
    "Next": "CheckValidation"
  }
}
Choice State
Branches to a different next state based on the input. Rules are evaluated in order, and Default applies when none match.
{
  "CheckValidation": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.validation.isValid",
        "BooleanEquals": true,
        "Next": "ProcessPayment"
      },
      {
        "Variable": "$.validation.errorCode",
        "StringEquals": "INSUFFICIENT_STOCK",
        "Next": "NotifyOutOfStock"
      }
    ],
    "Default": "RejectOrder"
  }
}
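To make the rule ordering concrete, here is a minimal Python sketch that evaluates the CheckValidation rules locally. It is not how the service is implemented: the helper names (get_path, choose_next) are hypothetical, only the two comparison operators used above are handled, and a missing field is treated as a non-match, which is a simplification of what the real service does.

```python
# The Choice state from the example above, as a plain dict.
CHECK_VALIDATION = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.validation.isValid", "BooleanEquals": True,
         "Next": "ProcessPayment"},
        {"Variable": "$.validation.errorCode", "StringEquals": "INSUFFICIENT_STOCK",
         "Next": "NotifyOutOfStock"},
    ],
    "Default": "RejectOrder",
}

_MISSING = object()  # sentinel for "path not present in the input"


def get_path(data, path):
    """Resolve a simple dotted path such as '$.validation.isValid'."""
    current = data
    for key in path.removeprefix("$.").split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def choose_next(state, data):
    """Return the name of the next state: first matching rule wins."""
    for rule in state["Choices"]:
        value = get_path(data, rule["Variable"])
        if "BooleanEquals" in rule:
            matched = isinstance(value, bool) and value == rule["BooleanEquals"]
        elif "StringEquals" in rule:
            matched = isinstance(value, str) and value == rule["StringEquals"]
        else:
            raise ValueError(f"Unsupported comparison in rule: {rule}")
        if matched:
            return rule["Next"]
    return state["Default"]


if __name__ == "__main__":
    print(choose_next(CHECK_VALIDATION, {"validation": {"isValid": True}}))
```

Because the first matching rule wins, put the most specific rules first and keep Default as the catch-all.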
Parallel State
Execute multiple branches simultaneously.
{
  "ProcessInParallel": {
    "Type": "Parallel",
    "Branches": [
      {
        "StartAt": "SendEmail",
        "States": {
          "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:send-email",
            "End": true
          }
        }
      },
      {
        "StartAt": "UpdateInventory",
        "States": {
          "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:update-inventory",
            "End": true
          }
        }
      }
    ],
    "Next": "FinalizeOrder"
  }
}
Map State
Iterate over an array and process each item. The example uses ItemProcessor, the current name for the field formerly called Iterator.
{
  "ProcessLineItems": {
    "Type": "Map",
    "InputPath": "$.order.items",
    "ItemsPath": "$",
    "MaxConcurrency": 10,
    "ItemProcessor": {
      "StartAt": "ProcessItem",
      "States": {
        "ProcessItem": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-item",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedItems",
    "Next": "CalculateTotal"
  }
}
Wait State
Pause execution for a specified time.
{
  "WaitForDelivery": {
    "Type": "Wait",
    "Seconds": 3600,
    "Next": "CheckDeliveryStatus"
  },
  "WaitUntilShipDate": {
    "Type": "Wait",
    "TimestampPath": "$.order.shipDate",
    "Next": "StartShipping"
  }
}
Other States
{
  "PassThrough": {
    "Type": "Pass",
    "Result": {"status": "processed"},
    "ResultPath": "$.result",
    "Next": "NextState"
  },
  "OrderFailed": {
    "Type": "Fail",
    "Cause": "Order processing failed",
    "Error": "OrderError"
  },
  "OrderComplete": {
    "Type": "Succeed"
  }
}
Input/Output Processing
┌────────────────────────────────────────────────────────────────────────┐
│                        Data Flow Through States                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  State Input (raw)                                                     │
│  {                                                                     │
│    "order": {"id": "123", "items": [...], "total": 99.99},             │
│    "customer": {"id": "C001", "email": "..."}                          │
│  }                                                                     │
│      │                                                                 │
│      │  InputPath: "$.order"                                           │
│      ▼                                                                 │
│  Task Input                                                            │
│  {"id": "123", "items": [...], "total": 99.99}                         │
│      │                                                                 │
│      │  Lambda executes                                                │
│      ▼                                                                 │
│  Task Result                                                           │
│  {"validation": "success", "discount": 10.00}                          │
│      │                                                                 │
│      │  ResultPath: "$.orderValidation"                                │
│      ▼                                                                 │
│  State with Result                                                     │
│  {                                                                     │
│    "order": {"id": "123", "items": [...], "total": 99.99},             │
│    "customer": {"id": "C001", "email": "..."},                         │
│    "orderValidation": {"validation": "success", "discount": 10.00}     │
│  }                                                                     │
│      │                                                                 │
│      │  OutputPath: "$"                                                │
│      ▼                                                                 │
│  State Output (passed to next state)                                   │
│  (same as above)                                                       │
│                                                                        │
│  Path Reference:                                                       │
│    InputPath:  What to send to the task                                │
│    ResultPath: Where to put the task result (null = discard)           │
│    OutputPath: What to pass to the next state                          │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
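The data flow in the diagram can be imitated locally. The sketch below is an approximation, not the service's implementation: it supports only "$" and simple dotted paths, and the names run_state, get_path and set_path are hypothetical helpers introduced for illustration.

```python
import copy


def _keys(path):
    """Split '$' or '$.a.b' into a list of keys ('$' gives an empty list)."""
    if path != "$" and not path.startswith("$."):
        raise ValueError(f"Unsupported path: {path}")
    return [key for key in path[2:].split(".") if key]


def get_path(data, path):
    """Select the part of data that the path points at."""
    for key in _keys(path):
        data = data[key]
    return data


def set_path(data, path, value):
    """Write value into data at the given path, creating parents as needed."""
    keys = _keys(path)
    for key in keys[:-1]:
        data = data.setdefault(key, {})
    data[keys[-1]] = value


def run_state(raw_input, task, input_path="$", result_path="$", output_path="$"):
    """Apply InputPath, run the task, then apply ResultPath and OutputPath."""
    task_input = get_path(raw_input, input_path)
    result = task(copy.deepcopy(task_input))
    if result_path is None:
        combined = copy.deepcopy(raw_input)      # null: discard the result
    elif result_path == "$":
        combined = result                        # result replaces the input
    else:
        combined = copy.deepcopy(raw_input)      # result merged into raw input
        set_path(combined, result_path, result)
    return get_path(combined, output_path)


if __name__ == "__main__":
    raw = {
        "order": {"id": "123", "items": [], "total": 99.99},
        "customer": {"id": "C001"},
    }
    output = run_state(
        raw,
        lambda order: {"validation": "success", "discount": 10.00},
        input_path="$.order",
        result_path="$.orderValidation",
    )
    print(sorted(output))
```

Note that ResultPath merges into the raw state input, not into the InputPath-filtered view, which is why "customer" survives in the output.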
Intrinsic Functions
{
  "TransformData": {
    "Type": "Pass",
    "Parameters": {
      "orderId.$": "States.UUID()",
      "orderDate.$": "States.Format('Order placed at {}', $$.State.EnteredTime)",
      "itemCount.$": "States.ArrayLength($.items)",
      "fullName.$": "States.Format('{} {}', $.firstName, $.lastName)",
      "items.$": "States.ArrayPartition($.allItems, 10)",
      "jsonString.$": "States.JsonToString($.data)",
      "parsedJson.$": "States.StringToJson($.jsonString)"
    },
    "Next": "ProcessOrder"
  }
}
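For readers more at home in Python, the following sketch gives rough local equivalents of a few of the intrinsic functions used above. The function names are hypothetical, and edge cases such as escape handling in States.Format or the exact string produced by States.JsonToString are not reproduced.

```python
import json
import uuid


def states_format(template, *args):
    """Fill each '{}' placeholder in order, like States.Format."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("Number of arguments must match number of '{}' placeholders")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += str(arg) + tail
    return out


def states_array_partition(items, size):
    """Split a list into chunks of at most `size`, like States.ArrayPartition."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def states_array_length(items):
    """Return the number of elements, like States.ArrayLength."""
    return len(items)


def states_json_to_string(value):
    """Serialize a value to a JSON string, like States.JsonToString."""
    return json.dumps(value)


def states_string_to_json(text):
    """Parse a JSON string, like States.StringToJson."""
    return json.loads(text)


def states_uuid():
    """Generate a random version 4 UUID string, like States.UUID."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    print(states_format("{} {}", "Ada", "Lovelace"))
    print([len(chunk) for chunk in states_array_partition(list(range(25)), 10)])
```

States.ArrayPartition is the one to reach for when a downstream Map state should work on batches rather than single items.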
Error Handling
┌────────────────────────────────────────────────────────────────────────┐
│                         Error Handling Pattern                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ProcessPayment                                                        │
│    Lambda: process-payment                                             │
│                                                                        │
│    Retry:                                                              │
│      • Up to 3 retry attempts                                          │
│      • 1s → 2s → 4s (exponential backoff)                              │
│                                                                        │
│    Catch:                                                              │
│      • PaymentDeclined → HandleDeclined                                │
│      • States.ALL → FallbackHandler                                    │
│                                                                        │
│  Error Types:                                                          │
│  ────────────                                                          │
│  States.ALL          Matches any error                                 │
│  States.Timeout      Task timed out                                    │
│  States.TaskFailed   Task failed during execution                      │
│  States.Permissions  Insufficient permissions                          │
│  PaymentDeclined     Custom error raised by Lambda                     │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
Retry Configuration
{
  "ProcessPayment": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:process-payment",
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
        "MaxDelaySeconds": 30
      },
      {
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 5,
        "MaxAttempts": 2,
        "BackoffRate": 1.0
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["PaymentDeclined"],
        "ResultPath": "$.error",
        "Next": "HandleDeclinedPayment"
      },
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "NotifyFailure"
      }
    ],
    "Next": "ConfirmOrder"
  }
}
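The wait before each retry follows from IntervalSeconds, BackoffRate and MaxDelaySeconds. As a rough illustration (the function name retry_delays is hypothetical, and jitter is ignored), the schedule for the two retriers above can be computed like this:

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0,
                 max_delay_seconds=None):
    """Return the wait in seconds before each retry attempt.

    The n-th retry (counting from 0) waits
    interval_seconds * backoff_rate ** n, capped at max_delay_seconds.
    """
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    # First retrier: Lambda service errors
    print(retry_delays(1, 3, 2.0, 30))
    # Second retrier: States.Timeout, constant 5 second interval
    print(retry_delays(5, 2, 1.0))
```

With the first retrier a failing task is attempted four times in total (the original call plus three retries) before any Catch rule is considered.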
Lambda Error Throwing
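# Hypothetical payment module; substitute your own client and its exceptions.
from payment_client import CardDeclinedError, InsufficientFundsError, process_payment


class PaymentDeclined(Exception):
    """Reported to Step Functions under the error name "PaymentDeclined"."""


class InsufficientFundsException(Exception):
    """Reported to Step Functions as "InsufficientFundsException"."""


def lambda_handler(event, context):
    try:
        return process_payment(event)
    except CardDeclinedError:
        # The exception class name becomes the error name, so this matches
        # the "PaymentDeclined" entry in the Catch block.
        raise PaymentDeclined("Card was declined")
    except InsufficientFundsError:
        # No dedicated catcher exists for this name, so States.ALL handles it.
        raise InsufficientFundsException("Insufficient funds in account")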
Standard vs Express Workflows
┌────────────────────────────────────────────────────────────────────────┐
│                        Workflow Type Comparison                        │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  Feature            │ Standard             │ Express                   │
│  ───────────────────┼──────────────────────┼──────────────────────     │
│  Duration           │ Up to 1 year         │ Up to 5 minutes           │
│  Execution History  │ Stored (90 days)     │ CloudWatch Logs only      │
│  Start Rate         │ 2,000/sec            │ 100,000/sec               │
│  Pricing            │ Per state transition │ Per request + duration    │
│  Semantics          │ Exactly-once         │ At-least-once             │
│  Execution Type     │ Async (default)      │ Sync or Async             │
│                                                                        │
│  When to Use Standard:                                                 │
│  ─────────────────────                                                 │
│  ✓ Long-running workflows (hours/days)                                 │
│  ✓ Need execution history for audit                                    │
│  ✓ Require exactly-once semantics                                      │
│  ✓ Human approval workflows                                            │
│                                                                        │
│  When to Use Express:                                                  │
│  ────────────────────                                                  │
│  ✓ High-volume, short-duration workflows                               │
│  ✓ Event processing pipelines                                          │
│  ✓ API orchestration                                                   │
│  ✓ Cost-sensitive scenarios                                            │
│                                                                        │
│  Pricing Example (1M executions, 5 state transitions each):            │
│    Standard: 5M transitions × $0.000025 = $125                         │
│    Express:  1M × $1.00/M + duration charges = ~$10-20                 │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
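The pricing example can be reproduced with a small calculator. This is a sketch under stated assumptions: the per-transition and per-million-request prices come from the table above, while the duration price (price_per_gb_second) and the 64 MB memory figure are assumed defaults that vary by region and tier, so check current AWS pricing before relying on the numbers.

```python
def standard_cost(executions, transitions_per_execution,
                  price_per_transition=0.000025):
    """Standard workflows: billed per state transition."""
    return executions * transitions_per_execution * price_per_transition


def express_cost(executions, avg_duration_seconds, memory_mb=64,
                 price_per_million_requests=1.00,
                 price_per_gb_second=0.00001667):  # assumed rate, verify it
    """Express workflows: request charge plus a duration charge."""
    request_charge = executions / 1_000_000 * price_per_million_requests
    gb_seconds = executions * avg_duration_seconds * (memory_mb / 1024)
    return request_charge + gb_seconds * price_per_gb_second


if __name__ == "__main__":
    print(f"Standard: ${standard_cost(1_000_000, 5):.2f}")
    for seconds in (1, 10):
        print(f"Express at {seconds}s average: ${express_cost(1_000_000, seconds):.2f}")
```

The Express total is driven almost entirely by average duration and memory, so the gap to Standard narrows as executions get longer.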
Service Integrations
Direct Service Integrations
{
  "Comment": "Direct integrations without Lambda",
  "StartAt": "PutItemInDynamoDB",
  "States": {
    "PutItemInDynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "order_id": {"S.$": "$.orderId"},
          "status": {"S": "CREATED"},
          "created_at": {"S.$": "$$.State.EnteredTime"}
        }
      },
      "Next": "SendNotification"
    },
    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-notifications",
        "Message.$": "States.Format('Order {} created', $.orderId)"
      },
      "Next": "AddToQueue"
    },
    "AddToQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        "MessageBody.$": "States.JsonToString($.order)"
      },
      "Next": "InvokeLambda"
    },
    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "process-order",
        "Payload.$": "$"
      },
      "Next": "StartECSTask"
    },
    "StartECSTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "my-cluster",
        "TaskDefinition": "process-batch",
        "LaunchType": "FARGATE"
      },
      "Next": "Done"
    },
    "Done": {
      "Type": "Succeed"
    }
  }
}
Wait for Callback Pattern
{
  "WaitForHumanApproval": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "send-approval-request",
      "Payload": {
        "orderId.$": "$.orderId",
        "approver": "[email protected]",
        "taskToken.$": "$$.Task.Token"
      }
    },
    "TimeoutSeconds": 86400,
    "Next": "ProcessApprovedOrder"
  }
}
import json
import os

import boto3

# The sender address is read from an assumed environment variable
# (APPROVAL_SENDER) because SES requires a verified Source address.
approvals = boto3.resource('dynamodb').Table('PendingApprovals')
ses = boto3.client('ses')
stepfunctions = boto3.client('stepfunctions')


# Lambda that sends approval request
def send_approval_request(event, context):
    task_token = event['taskToken']
    order_id = event['orderId']

    # Store token for later callback
    approvals.put_item(
        Item={
            'order_id': order_id,
            'task_token': task_token,
            'status': 'PENDING'
        }
    )

    # Send email with approval link (parameter names match handle_approval)
    link = f'https://api.../approve?order_id={order_id}&action=approve'
    ses.send_email(
        Source=os.environ['APPROVAL_SENDER'],
        Destination={'ToAddresses': [event['approver']]},
        Message={
            'Subject': {'Data': f'Approval needed for order {order_id}'},
            'Body': {'Text': {'Data': f'Click to approve: {link}'}}
        }
    )


# API endpoint that handles approval
def handle_approval(event, context):
    params = event['queryStringParameters']
    order_id = params['order_id']
    action = params['action']  # approve/reject

    # Get stored token
    item = approvals.get_item(Key={'order_id': order_id})
    task_token = item['Item']['task_token']

    # Resume Step Function
    if action == 'approve':
        stepfunctions.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        stepfunctions.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )
    return {'statusCode': 200, 'body': json.dumps({'order_id': order_id, 'action': action})}
Common Workflow Patterns
Saga Pattern (Distributed Transaction)
┌────────────────────────────────────────────────────────────────────────┐
│                              Saga Pattern                              │
├────────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  Each step has a compensating action for rollback:                     │
│                                                                        │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐                       │
│  │  Reserve  │───►│  Charge   │───►│   Ship    │───► Success           │
│  │   Stock   │    │  Payment  │    │   Order   │                       │
│  └─────┬─────┘    └─────┬─────┘    └─────┬─────┘                       │
│        │ Fail           │ Fail           │ Fail                        │
│        ▼                ▼                ▼                             │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐                       │
│  │  Cancel   │◄───│  Refund   │◄───│  Cancel   │                       │
│  │  Reserve  │    │  Payment  │    │ Shipment  │                       │
│  └───────────┘    └───────────┘    └───────────┘                       │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
{
  "StartAt": "ReserveStock",
  "States": {
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserve-stock",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "OrderFailed"
      }],
      "Next": "ChargePayment"
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:charge-payment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CancelReservation"
      }],
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ship-order",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "RefundPayment"
      }],
      "Next": "OrderComplete"
    },
    "CancelReservation": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:cancel-reservation",
      "Next": "OrderFailed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:refund-payment",
      "Next": "CancelReservation"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be completed"
    }
  }
}
Fan-Out/Fan-In Pattern
{
  "ProcessAllOrders": {
    "Type": "Map",
    "InputPath": "$.orders",
    "MaxConcurrency": 50,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "EXPRESS"
      },
      "StartAt": "ProcessOrder",
      "States": {
        "ProcessOrder": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-order",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedOrders",
    "Next": "AggregateResults"
  }
}
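As a local analogy for fan-out/fan-in, the sketch below processes orders concurrently with a bounded worker pool and then aggregates the results. Threads stand in for the child executions a distributed Map state would start; the function names and the order fields (price_cents, qty) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def process_order(order):
    """Hypothetical per-item worker: compute one order's total in cents."""
    total = sum(item["price_cents"] * item["qty"] for item in order["items"])
    return {"orderId": order["id"], "totalCents": total}


def process_all_orders(state_input, max_concurrency=50):
    """Fan out over $.orders with bounded concurrency, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        processed = list(pool.map(process_order, state_input["orders"]))
    output = dict(state_input)
    output["processedOrders"] = processed  # mirrors ResultPath
    return output


def aggregate_results(state_input):
    """Fan in: summarize the per-order results."""
    results = state_input["processedOrders"]
    return {
        "orderCount": len(results),
        "grandTotalCents": sum(r["totalCents"] for r in results),
    }


if __name__ == "__main__":
    data = {"orders": [
        {"id": "A", "items": [{"price_cents": 500, "qty": 2}]},
        {"id": "B", "items": [{"price_cents": 250, "qty": 4}]},
    ]}
    print(aggregate_results(process_all_orders(data)))
```

As in the Map state, results come back in the same order as the input items regardless of which worker finishes first.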
Best Practices
Design for Idempotency
Tasks may be retried, so make sure operations are safe to repeat
Use ResultPath Wisely
Preserve input data while adding task results
Set Timeouts
Always set TimeoutSeconds to prevent stuck executions
Use Express for High Volume
Express workflows usually cost less for short, high-volume executions
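One common way to make a task safe to repeat is to key each operation on an idempotency token and return the stored result on a repeat call. The sketch below uses an in-memory dict as the store and a hypothetical idempotencyKey field in the event; a real task would need a durable store with an atomic conditional write.

```python
def make_idempotent(handler, store):
    """Wrap handler so repeated events with the same key run it only once."""
    def wrapper(event):
        key = event["idempotencyKey"]   # assumed field supplied by the workflow
        if key in store:
            return store[key]           # retry: return the earlier result
        result = handler(event)
        store[key] = result
        return result
    return wrapper


if __name__ == "__main__":
    charges = []

    def charge(event):
        charges.append(event["amount"])
        return {"charged": event["amount"]}

    safe_charge = make_idempotent(charge, store={})
    event = {"idempotencyKey": "order-123", "amount": 99}
    safe_charge(event)
    safe_charge(event)                  # simulated retry
    print(len(charges))
```

The execution name or an order ID usually makes a good key, since it stays the same across retries of the same task.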
🎯 Interview Questions
Q1: When to use Step Functions vs SQS + Lambda?
Step Functions:
- Complex orchestration with branching
- Need visibility into workflow state
- Error handling with retries and fallbacks
- Long-running workflows
SQS + Lambda:
- Simple event processing
- High-volume, independent tasks
- Don’t need orchestration
- Cost-sensitive (cheaper at high scale)
Q2: How to handle long-running operations?
Options:
- Wait for Task Token: Pause execution, resume via callback
- Activity Tasks: Worker polls for tasks, reports completion
- Async Lambda: Start Lambda, use callback pattern
Q3: Standard vs Express - how to choose?
Use Standard when:
- Execution longer than 5 minutes
- Need execution history for audit
- Require exactly-once semantics
Use Express when:
- High volume (over 1000/sec)
- Short duration (under 5 min)
- Cost-sensitive
- At-least-once is acceptable
Next Module
AWS SAM
Build serverless applications with SAM