Module Overview
Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Lambda
- State machine concepts and design
- State types (Task, Choice, Parallel, Map)
- Error handling and retries
- Standard vs Express workflows
- Service integrations
- Workflow patterns for common use cases
Why Step Functions?
Visual Workflows
Design and visualize complex business processes as state machines
Built-in Error Handling
Automatic retries, catch blocks, and compensation logic
200+ Integrations
Native integration with Lambda, DynamoDB, SQS, SNS, and more
Audit Trail
Complete execution history for debugging and compliance
State Machine Concepts
State Machine Components

STATE MACHINE
  • Collection of states that define workflow
  • Starts at StartAt state, ends at End state
  • Executes synchronously or asynchronously

    +----------+
    |  START   |
    +----+-----+
         |
         v
    +----------+
    | Validate |  <-- Task State (Lambda)
    |  Input   |
    +----+-----+
         |
         v
    +----------+  no   +----------+
    |  Valid?  |------>|  Reject  |  <-- Choice State
    +----+-----+       +----------+
         | yes
         v
    +----------+
    | Process  |  <-- Task State (DynamoDB)
    |  Order   |
    +----+-----+
         |
         v
    +----------+
    |   END    |
    +----------+
State Types
Task State
Performs work by invoking an AWS service or activity.
{
  "ValidateOrder": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
    "InputPath": "$.order",
    "ResultPath": "$.validation",
    "OutputPath": "$",
    // Always set TimeoutSeconds. Without it, a stuck task (e.g., a Lambda
    // waiting on a downstream service that never responds) holds the
    // execution open until something else times out, because the default
    // task timeout is effectively unlimited. The execution stays in
    // "Running" state the whole time, which counts against your open
    // executions quota (1M default, but still finite).
    "TimeoutSeconds": 30,
    "Next": "CheckValidation"
  }
}
Choice State
Branching logic based on input.
{
  "CheckValidation": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.validation.isValid",
        "BooleanEquals": true,
        "Next": "ProcessPayment"
      },
      {
        "Variable": "$.validation.errorCode",
        "StringEquals": "INSUFFICIENT_STOCK",
        "Next": "NotifyOutOfStock"
      }
    ],
    "Default": "RejectOrder"
  }
}
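To make the evaluation order concrete, here is a minimal Python sketch of how a Choice state picks its next state: rules are checked in the order listed, the first match wins, and `Default` is used when nothing matches. The helper names (`resolve_path`, `rule_matches`, `next_state`) are hypothetical, only the two comparison operators used above are modeled, and treating a missing field as "no match" is a simplifying assumption of this sketch (the real service may instead fail the execution, so check the Amazon States Language documentation).

```python
from typing import Any

_MISSING = object()  # sentinel for "path not present in the input"

# Python mirror of the CheckValidation state shown above.
CHECK_VALIDATION = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.validation.isValid", "BooleanEquals": True,
         "Next": "ProcessPayment"},
        {"Variable": "$.validation.errorCode", "StringEquals": "INSUFFICIENT_STOCK",
         "Next": "NotifyOutOfStock"},
    ],
    "Default": "RejectOrder",
}


def resolve_path(data: Any, path: str) -> Any:
    """Resolve a simple dotted path such as "$.a.b" (no filters or indexes)."""
    if path == "$":
        return data
    current = data
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def rule_matches(rule: dict, data: Any) -> bool:
    """Evaluate one choice rule; only two operators are modeled here."""
    value = resolve_path(data, rule["Variable"])
    if value is _MISSING:
        return False  # assumption of this sketch, see the note above
    if "BooleanEquals" in rule:
        return isinstance(value, bool) and value == rule["BooleanEquals"]
    if "StringEquals" in rule:
        return isinstance(value, str) and value == rule["StringEquals"]
    raise ValueError("comparison operator not modeled in this sketch")


def next_state(choice_state: dict, data: Any) -> str:
    """Return the Next of the first matching rule, else the Default."""
    for rule in choice_state["Choices"]:
        if rule_matches(rule, data):
            return rule["Next"]
    return choice_state["Default"]


print(next_state(CHECK_VALIDATION, {"validation": {"isValid": True}}))
```

Because the first matching rule wins, put the most specific rules first and always provide a `Default` so unexpected input has somewhere to go.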
Parallel State
Execute multiple branches simultaneously.
{
  "ProcessInParallel": {
    "Type": "Parallel",
    "Branches": [
      {
        "StartAt": "SendEmail",
        "States": {
          "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:send-email",
            "End": true
          }
        }
      },
      {
        "StartAt": "UpdateInventory",
        "States": {
          "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:update-inventory",
            "End": true
          }
        }
      }
    ],
    "Next": "FinalizeOrder"
  }
}
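A Parallel state hands the same input to every branch and collects the branch outputs into one array, ordered by branch position rather than by which branch finished first. The sketch below imitates that behavior locally with threads; `send_email`, `update_inventory`, and `run_parallel` are hypothetical stand-ins, not Step Functions APIs.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def send_email(state_input: dict) -> dict:
    """Stand-in for the SendEmail branch."""
    return {"emailSent": True, "orderId": state_input["orderId"]}


def update_inventory(state_input: dict) -> dict:
    """Stand-in for the UpdateInventory branch."""
    return {"inventoryUpdated": True, "orderId": state_input["orderId"]}


def run_parallel(branches, state_input: dict) -> list:
    """Run every branch on its own copy of the input.

    Results are returned in branch order. If a branch raises, the
    exception propagates, loosely mirroring a failed Parallel state.
    """
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, copy.deepcopy(state_input))
                   for branch in branches]
        return [future.result() for future in futures]


results = run_parallel([send_email, update_inventory], {"orderId": "123"})
print(results)
```

The state that follows (`FinalizeOrder` above) therefore receives a list with one element per branch, which is worth remembering when writing its input handling.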
Map State
Iterate over an array and process each item.
{
  "ProcessLineItems": {
    "Type": "Map",
    "InputPath": "$.order.items",
    "ItemsPath": "$",
    "MaxConcurrency": 10,
    "Iterator": {
      "StartAt": "ProcessItem",
      "States": {
        "ProcessItem": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-item",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedItems",
    "Next": "CalculateTotal"
  }
}
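As a rough local analogy, a Map state behaves like a bounded worker pool: at most `MaxConcurrency` items are in flight at once, and the results come back as an array in the same order as the input items. The sketch below assumes a hypothetical `process_item` function and sample line items; it illustrates the semantics only.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item: dict) -> dict:
    """Stand-in for the ProcessItem task: compute a line total."""
    return {"sku": item["sku"], "lineTotal": item["qty"] * item["price"]}


def run_map(items: list, worker, max_concurrency: int) -> list:
    """Apply worker to each item with at most max_concurrency in flight.

    Executor.map yields results in input order, regardless of which
    item finishes first.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(worker, items))


line_items = [
    {"sku": "A", "qty": 2, "price": 5},
    {"sku": "B", "qty": 1, "price": 7},
]
print(run_map(line_items, process_item, max_concurrency=10))
```

Lowering the concurrency limit is the usual lever when the per-item task calls a downstream service that throttles.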
Wait State
Pause execution for a specified time.
{
  "WaitForDelivery": {
    "Type": "Wait",
    "Seconds": 3600,
    "Next": "CheckDeliveryStatus"
  },
  "WaitUntilShipDate": {
    "Type": "Wait",
    "TimestampPath": "$.order.shipDate",
    "Next": "StartShipping"
  }
}
Other States
{
  "PassThrough": {
    "Type": "Pass",
    "Result": {"status": "processed"},
    "ResultPath": "$.result",
    "Next": "NextState"
  },
  "OrderFailed": {
    "Type": "Fail",
    "Cause": "Order processing failed",
    "Error": "OrderError"
  },
  "OrderComplete": {
    "Type": "Succeed"
  }
}
Input/Output Processing
Data Flow Through States

State Input (raw)
{
  "order": {"id": "123", "items": [...], "total": 99.99},
  "customer": {"id": "C001", "email": "..."}
}
    |
    |  InputPath: "$.order"
    v
Task Input
{"id": "123", "items": [...], "total": 99.99}
    |
    |  Lambda executes
    v
Task Result
{"validation": "success", "discount": 10.00}
    |
    |  ResultPath: "$.orderValidation"
    v
State with Result
{
  "order": {"id": "123", "items": [...], "total": 99.99},
  "customer": {"id": "C001", "email": "..."},
  "orderValidation": {"validation": "success", "discount": 10.00}
}
    |
    |  OutputPath: "$"
    v
State Output (passed to next state)
(same as above)

Path Reference:
  InputPath:  What to send to task
  ResultPath: Where to put task result (null = discard)
  OutputPath: What to pass to next state
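The same data flow can be traced in a few lines of Python. This is a simplified sketch under stated assumptions: paths are plain dotted paths such as `$.order` (no array indexes or filters), and the helpers `get_path`, `set_path`, and `run_task_state` are hypothetical names, not part of any AWS SDK.

```python
import copy
from typing import Any, Callable, Optional


def get_path(data: Any, path: str) -> Any:
    """Select the value at a simple dotted path; "$" selects everything."""
    if path == "$":
        return data
    current = data
    for key in path[2:].split("."):
        current = current[key]
    return current


def set_path(data: Any, path: str, value: Any) -> Any:
    """Return a copy of data with value placed at path; "$" replaces it all."""
    if path == "$":
        return value
    result = copy.deepcopy(data)
    keys = path[2:].split(".")
    current = result
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value
    return result


def run_task_state(raw_input: Any, task: Callable[[Any], Any],
                   input_path: str = "$", result_path: Optional[str] = "$",
                   output_path: str = "$") -> Any:
    """Apply InputPath, run the task, then apply ResultPath and OutputPath."""
    task_input = get_path(raw_input, input_path)
    task_result = task(task_input)
    if result_path is None:
        combined = raw_input  # null ResultPath: discard the task result
    else:
        combined = set_path(raw_input, result_path, task_result)
    return get_path(combined, output_path)


def validate_order(order: dict) -> dict:
    """Stand-in for the Lambda; it only ever sees the "order" object."""
    return {"validation": "success", "discount": 10.0}


RAW_INPUT = {
    "order": {"id": "123", "total": 99.99},
    "customer": {"id": "C001"},
}

output = run_task_state(RAW_INPUT, validate_order,
                        input_path="$.order",
                        result_path="$.orderValidation")
print(sorted(output))
```

Note the two edge cases the sketch makes visible: a `ResultPath` of `$` replaces the whole input with the task result, while a null `ResultPath` keeps the input and throws the result away.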
Intrinsic Functions
{
  "TransformData": {
    "Type": "Pass",
    "Parameters": {
      "orderId.$": "States.UUID()",
      "orderDate.$": "States.Format('Order placed at {}', $$.State.EnteredTime)",
      "itemCount.$": "States.ArrayLength($.items)",
      "fullName.$": "States.Format('{} {}', $.firstName, $.lastName)",
      "items.$": "States.ArrayPartition($.allItems, 10)",
      "jsonString.$": "States.JsonToString($.data)",
      "parsedJson.$": "States.StringToJson($.jsonString)"
    },
    "Next": "ProcessOrder"
  }
}
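For readers more at home in Python, the sketch below shows rough equivalents of three of these intrinsics. The function names are hypothetical and the behavior is an approximation for illustration (for example, the exact string formatting produced by `States.JsonToString` is not modeled).

```python
import json
from typing import Any, List


def array_partition(items: List[Any], chunk_size: int) -> List[List[Any]]:
    """Rough equivalent of States.ArrayPartition: split into fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def json_to_string(value: Any) -> str:
    """Rough equivalent of States.JsonToString: serialize JSON to a string."""
    return json.dumps(value)


def string_to_json(text: str) -> Any:
    """Rough equivalent of States.StringToJson: parse a JSON string."""
    return json.loads(text)


chunks = array_partition(list(range(1, 10)), 4)
print(chunks)  # the last chunk holds the remainder

data = {"orderId": "123", "items": ["A", "B"]}
print(string_to_json(json_to_string(data)) == data)
```

Partitioning is handy in front of a Map state when each iteration should handle a batch of items rather than a single one.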
Error Handling
Error Handling Pattern

ProcessPayment (Lambda: process-payment)
  Retry:
    • 3 attempts
    • 1s → 2s → 4s (exponential backoff)
  Catch:
    • PaymentDeclined → HandleDeclined
    • States.ALL → FallbackHandler

Error Types:
  States.ALL            - Matches any error
  States.Timeout        - Task timed out
  States.TaskFailed     - Lambda execution error
  States.Permissions    - Permission error
  Custom.PaymentFailed  - Custom error from Lambda
Retry Configuration
{
  "ProcessPayment": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:process-payment",
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
        "MaxDelaySeconds": 30
      },
      {
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 5,
        "MaxAttempts": 2,
        "BackoffRate": 1.0
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["PaymentDeclined"],
        "ResultPath": "$.error",
        "Next": "HandleDeclinedPayment"
      },
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "NotifyFailure"
      }
    ],
    "Next": "ConfirmOrder"
  }
}
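The wait before each retry follows from three fields: the first wait is `IntervalSeconds`, each later wait is multiplied by `BackoffRate`, and `MaxDelaySeconds` caps the result. The helper below (a hypothetical name, and a sketch that ignores optional jitter) computes the schedule for the two retriers above.

```python
from typing import List, Optional


def retry_delays(interval_seconds: float = 1, max_attempts: int = 3,
                 backoff_rate: float = 2.0,
                 max_delay_seconds: Optional[float] = None) -> List[float]:
    """Return the wait (in seconds) before each retry attempt.

    max_attempts counts retries, not the initial attempt.
    """
    delays = []
    delay = float(interval_seconds)
    for _ in range(max_attempts):
        if max_delay_seconds is None:
            delays.append(delay)
        else:
            delays.append(min(delay, float(max_delay_seconds)))
        delay *= backoff_rate
    return delays


# Lambda service errors: 1s, then 2s, then 4s (the 30s cap is never reached)
print(retry_delays(1, 3, 2.0, 30))

# States.Timeout: a BackoffRate of 1.0 means a constant 5s wait
print(retry_delays(5, 2, 1.0))
```

Summing the list gives the extra time retries can add to a task in the worst case, which is useful when choosing the state's `TimeoutSeconds` and any upstream timeouts.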
Lambda Error Throwing
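class CardDeclinedError(Exception):
    """Placeholder for the payment provider's decline error."""

class InsufficientFundsError(Exception):
    """Placeholder for the payment provider's insufficient-funds error."""

# Step Functions matches ErrorEquals against the exception class name,
# so this name must match the "PaymentDeclined" Catch block.
class PaymentDeclined(Exception):
    pass

class InsufficientFundsException(Exception):
    pass

def process_payment(event):
    """Placeholder: call the payment provider here."""
    raise NotImplementedError

def lambda_handler(event, context):
    try:
        return process_payment(event)
    except CardDeclinedError:
        # Step Functions will catch this
        raise PaymentDeclined("Card was declined")
    except InsufficientFundsError:
        raise InsufficientFundsException("Insufficient funds in account")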
Standard vs Express Workflows
Workflow Type Comparison

Feature           | Standard             | Express
------------------+----------------------+-------------------------
Duration          | Up to 1 year         | Up to 5 minutes
Execution History | Stored (90 days)     | CloudWatch Logs only
Start Rate        | 2,000/sec            | 100,000/sec
Pricing           | Per state transition | Per execution + duration
Idempotency       | Exactly-once         | At-least-once
Execution Type    | Async (default)      | Sync or Async

When to Use Standard:
  ✓ Long-running workflows (hours/days)
  ✓ Need execution history for audit
  ✓ Require exactly-once semantics
  ✓ Human approval workflows

When to Use Express:
  ✓ High-volume, short-duration workflows
  ✓ Event processing pipelines
  ✓ API orchestration
  ✓ Cost-sensitive scenarios

Pricing Example (1M executions, 5 state transitions each):
  Standard: 5M transitions x $0.000025 = $125
  Express:  1M x $1.00/M + duration charges = ~$10-20

Cost mistake: Using Standard workflows for high-volume, short-lived
tasks (like API orchestration). A team processing 10M API requests/
month with 8 transitions each pays $2,000 on Standard vs ~$100 on
Express. Rule of thumb: if it finishes in under 5 minutes and you
can tolerate at-least-once semantics, use Express.
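The arithmetic behind these figures is easy to reproduce. The sketch below uses only the two rates quoted above ($0.000025 per Standard state transition and $1.00 per million Express requests); Express duration charges depend on memory and run time and are deliberately left out, so treat the Express number as a lower bound. Check current AWS pricing before relying on any of these rates.

```python
# Rates quoted in the comparison above; verify against current AWS pricing.
STANDARD_PRICE_PER_TRANSITION = 0.000025
EXPRESS_PRICE_PER_MILLION_REQUESTS = 1.00


def standard_cost(executions: int, transitions_per_execution: int) -> float:
    """Standard workflows: pay per state transition."""
    transitions = executions * transitions_per_execution
    return transitions * STANDARD_PRICE_PER_TRANSITION


def express_request_cost(executions: int) -> float:
    """Express workflows: request charge only, duration charges excluded."""
    return executions / 1_000_000 * EXPRESS_PRICE_PER_MILLION_REQUESTS


# 1M executions with 5 transitions each
print(f"Standard: ${standard_cost(1_000_000, 5):,.2f}")
print(f"Express (requests only): ${express_request_cost(1_000_000):,.2f}")

# 10M executions with 8 transitions each (the "cost mistake" scenario)
print(f"Standard: ${standard_cost(10_000_000, 8):,.2f}")
```

The key difference is what scales the bill: Standard cost grows with the number of states per execution, while the Express request charge does not.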
Service Integrations
Direct Service Integrations
{
  "Comment": "Direct integrations without Lambda",
  "States": {
    "PutItemInDynamoDB": {
      "Type": "Task",
      // Direct service integration -- no Lambda needed. This calls
      // DynamoDB directly from Step Functions, saving both the cost of
      // a Lambda invocation (~$0.20/M) and ~100ms of cold-start latency.
      // Use direct integrations whenever the operation is a simple
      // service call that doesn't need custom business logic.
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "order_id": {"S.$": "$.orderId"},
          "status": {"S": "CREATED"},
          "created_at": {"S.$": "$$.State.EnteredTime"}
        }
      },
      "Next": "SendNotification"
    },
    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-notifications",
        "Message.$": "States.Format('Order {} created', $.orderId)"
      },
      "Next": "AddToQueue"
    },
    "AddToQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        "MessageBody.$": "States.JsonToString($.order)"
      },
      "Next": "Done"
    },
    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "process-order",
        "Payload.$": "$"
      },
      "Next": "Done"
    },
    "StartECSTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "my-cluster",
        "TaskDefinition": "process-batch",
        "LaunchType": "FARGATE"
      },
      "Next": "Done"
    }
  }
}
Wait for Callback Pattern
This is one of the most powerful Step Functions patterns. The workflow pauses and waits for an external system (a human, a webhook, a third-party API) to call back with a result. While it waits, the execution consumes no compute and costs nothing beyond the initial state transition. Real-world use cases include manager approval for expenses, waiting for a payment processor webhook, or pausing until a manual QA review is complete.
{
  "WaitForHumanApproval": {
    "Type": "Task",
    // waitForTaskToken pauses the workflow until an external system calls
    // SendTaskSuccess or SendTaskFailure with the token. The workflow
    // consumes no compute while waiting -- you only pay for the state
    // transition, not the wait time. This is the pattern for human
    // approvals, external webhooks, or any async callback.
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "send-approval-request",
      "Payload": {
        "orderId.$": "$.orderId",
        "approver": "manager@company.com",
        "taskToken.$": "$$.Task.Token"
      }
    },
    // Common mistake: Not setting a timeout on callback tasks. Without
    // TimeoutSeconds, the execution keeps waiting if the callback never
    // arrives (e.g., the approver ignores the email). It counts against
    // your open executions quota until the workflow's own maximum
    // duration (up to 1 year for Standard) finally ends it. Always set a
    // reasonable timeout with a Catch block for States.Timeout that
    // sends a reminder or auto-rejects.
    "TimeoutSeconds": 86400,
    "Next": "ProcessApprovedOrder"
  }
}
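import json

import boto3

# Clients are created once per container and reused across invocations.
approvals_table = boto3.resource('dynamodb').Table('PendingApprovals')
ses = boto3.client('ses')
stepfunctions = boto3.client('stepfunctions')

# Placeholder sender address; it must be an identity verified in SES.
SENDER = 'approvals@company.com'


# Lambda that sends approval request
def send_approval_request(event, context):
    task_token = event['taskToken']
    order_id = event['orderId']

    # Store token for later callback
    approvals_table.put_item(
        Item={
            'order_id': order_id,
            'task_token': task_token,
            'status': 'PENDING'
        }
    )

    # Send email with approval link (query parameters match handle_approval)
    ses.send_email(
        Source=SENDER,
        Destination={'ToAddresses': [event['approver']]},
        Message={
            'Subject': {'Data': f'Approval needed for order {order_id}'},
            'Body': {'Text': {'Data': (
                'Click to approve: '
                f'https://api.../approve?order_id={order_id}&action=approve'
            )}}
        }
    )


# API endpoint that handles approval
def handle_approval(event, context):
    params = event['queryStringParameters']
    order_id = params['order_id']
    action = params['action']  # approve/reject

    # Get stored token
    item = approvals_table.get_item(Key={'order_id': order_id})
    task_token = item['Item']['task_token']

    # Resume Step Function
    if action == 'approve':
        stepfunctions.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        stepfunctions.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    # Respond to the API caller
    return {
        'statusCode': 200,
        'body': json.dumps({'orderId': order_id, 'action': action})
    }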
Common Workflow Patterns
Saga Pattern (Distributed Transaction)
Saga Pattern

Each step has a compensating action for rollback:

+-----------+     +-----------+     +-----------+
|  Reserve  |---->|  Charge   |---->|   Ship    |----> Success
|   Stock   |     |  Payment  |     |   Order   |
+-----+-----+     +-----+-----+     +-----+-----+
      |                 |                 |
      | Fail            | Fail            | Fail
      v                 v                 v
+-----------+     +-----------+     +-----------+
|  Cancel   |<----|  Refund   |<----|  Cancel   |
|  Reserve  |     |  Payment  |     | Shipment  |
+-----------+     +-----------+     +-----------+
{
  "StartAt": "ReserveStock",
  "States": {
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserve-stock",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "OrderFailed"
      }],
      "Next": "ChargePayment"
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:charge-payment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CancelReservation"
      }],
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ship-order",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "RefundPayment"
      }],
      "Next": "OrderComplete"
    },
    "CancelReservation": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:cancel-reservation",
      "Next": "OrderFailed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:refund-payment",
      "Next": "CancelReservation"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be completed"
    }
  }
}
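The control flow of this state machine can be summarized in plain Python: run the steps in order, and when one fails, undo the steps that already succeeded in reverse order. The sketch below is an in-memory illustration with hypothetical functions (`run_saga` and the step stand-ins); in the real workflow each action and compensation is a Task state wired together with Catch.

```python
from typing import Callable, List, Tuple

# (step name, action, compensating action)
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def reserve_stock(order: dict) -> None:
    order["reserved"] = True


def cancel_reservation(order: dict) -> None:
    order["reserved"] = False


def charge_payment(order: dict) -> None:
    order["charged"] = True


def refund_payment(order: dict) -> None:
    order["charged"] = False


def ship_order(order: dict) -> None:
    # Simulated failure so the compensation path is exercised.
    raise RuntimeError("carrier unavailable")


def cancel_shipment(order: dict) -> None:
    order["shipped"] = False


def run_saga(steps: List[Step], order: dict, log: List[str]) -> str:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for name, action, compensate in steps:
        try:
            action(order)
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo(order)
                log.append(f"undone:{done_name}")
            return "OrderFailed"
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return "OrderComplete"


STEPS: List[Step] = [
    ("ReserveStock", reserve_stock, cancel_reservation),
    ("ChargePayment", charge_payment, refund_payment),
    ("ShipOrder", ship_order, cancel_shipment),
]

order, log = {}, []
print(run_saga(STEPS, order, log))
print(log)
```

The failed step itself is not compensated, only the steps before it, which matches the JSON above where a `ShipOrder` failure goes to `RefundPayment` and then `CancelReservation`. Compensating actions should be idempotent, since they may be retried too.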
Fan-Out/Fan-In Pattern
{
  "ProcessAllOrders": {
    "Type": "Map",
    "InputPath": "$.orders",
    "MaxConcurrency": 50,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "EXPRESS"
      },
      "StartAt": "ProcessOrder",
      "States": {
        "ProcessOrder": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-order",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedOrders",
    "Next": "AggregateResults"
  }
}
Best Practices
Design for Idempotency
Tasks may retry, so ensure operations are safe to repeat
Use ResultPath Wisely
Preserve input data while adding task results
Set Timeouts
Always set TimeoutSeconds to prevent stuck executions
Use Express for High Volume
Express workflows are much cheaper for short tasks
🎯 Interview Questions
Q1: When to use Step Functions vs SQS + Lambda?
Step Functions:
- Complex orchestration with branching
- Need visibility into workflow state
- Error handling with retries and fallbacks
- Long-running workflows
SQS + Lambda:
- Simple event processing
- High-volume, independent tasks
- Don't need orchestration
- Cost-sensitive (cheaper at high scale)
Q2: How to handle long-running operations?
Options:
- Wait for Task Token: Pause execution, resume via callback
- Activity Tasks: Worker polls for tasks, reports completion
- Async Lambda: Start Lambda, use callback pattern
Q3: Standard vs Express - how to choose?
Use Standard when:
- Execution longer than 5 minutes
- Need execution history for audit
- Require exactly-once semantics
Use Express when:
- High volume (over 1000/sec)
- Short duration (under 5 min)
- Cost-sensitive
- At-least-once is acceptable
Next Module
AWS SAM
Build serverless applications with SAM