Step Functions Architecture

Module Overview

Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Lambda
AWS Step Functions lets you coordinate multiple AWS services into serverless workflows using visual state machines. This module covers workflow design patterns, error handling, and production best practices.

What You’ll Learn:
  • State machine concepts and design
  • State types (Task, Choice, Parallel, Map)
  • Error handling and retries
  • Standard vs Express workflows
  • Service integrations
  • Workflow patterns for common use cases

Why Step Functions?

Visual Workflows

Design and visualize complex business processes as state machines

Built-in Error Handling

Automatic retries, catch blocks, and compensation logic

200+ Integrations

Native integration with Lambda, DynamoDB, SQS, SNS, and more

Audit Trail

Complete execution history for debugging and compliance

State Machine Concepts

┌────────────────────────────────────────────────────────────────────────┐
│                    State Machine Components                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   STATE MACHINE                                                         │
│   ─────────────                                                        │
│   • Collection of states that define workflow                          │
│   • Starts at StartAt; ends at an End, Succeed, or Fail state          │
│   • Executes synchronously or asynchronously                           │
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                                                                │    │
│   │    ┌──────────┐                                               │    │
│   │    │  START   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │ Validate │ ──── Task State (Lambda)                      │    │
│   │    │  Input   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐     ┌──────────┐                              │    │
│   │    │  Valid?  │────►│ Reject   │ ──── Choice State            │    │
│   │    │  (yes)   │ no  │          │                              │    │
│   │    └────┬─────┘     └──────────┘                              │    │
│   │         │ yes                                                  │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │ Process  │ ──── Task State (DynamoDB)                    │    │
│   │    │  Order   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │   END    │                                               │    │
│   │    └──────────┘                                               │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

State Types

Task State

Performs work by invoking an AWS service or activity.
{
  "ValidateOrder": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
    "InputPath": "$.order",
    "ResultPath": "$.validation",
    "OutputPath": "$",
    "TimeoutSeconds": 30,
    "Next": "CheckValidation"
  }
}
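
To make the path fields concrete, here is a minimal sketch of what a validate-order handler might look like. Because of InputPath "$.order", the handler receives only the order object, and whatever it returns is stored under $.validation. The field names quantity and inStock and the EMPTY_ORDER code are assumptions for illustration; isValid and errorCode are chosen to line up with the Choice state example that follows.

```python
def lambda_handler(event, context):
    """Hypothetical validate-order handler.

    With InputPath "$.order", `event` is the order object itself,
    not the full state input.
    """
    items = event.get("items", [])
    if not items:
        # The returned dict is stored at $.validation by ResultPath.
        return {"isValid": False, "errorCode": "EMPTY_ORDER"}
    # quantity / inStock are assumed item fields, not part of any AWS API.
    if any(item.get("quantity", 0) > item.get("inStock", 0) for item in items):
        return {"isValid": False, "errorCode": "INSUFFICIENT_STOCK"}
    return {"isValid": True}
```

Returning a plain dict, rather than raising, keeps business-rule outcomes on the Choice path and leaves Retry/Catch for genuine failures.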

Choice State

Branching logic based on input.
{
  "CheckValidation": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.validation.isValid",
        "BooleanEquals": true,
        "Next": "ProcessPayment"
      },
      {
        "Variable": "$.validation.errorCode",
        "StringEquals": "INSUFFICIENT_STOCK",
        "Next": "NotifyOutOfStock"
      }
    ],
    "Default": "RejectOrder"
  }
}
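
Choice rules are checked in the order listed; the first match wins, and Default is used when nothing matches. The sketch below mimics that behaviour for the two comparison operators used above. It is a simplified model, not the service's implementation: it only understands simple "$.a.b" paths, and it treats a missing value as a non-match, whereas the real service generally fails the execution when a referenced Variable is absent.

```python
CHECK_VALIDATION = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.validation.isValid", "BooleanEquals": True, "Next": "ProcessPayment"},
        {"Variable": "$.validation.errorCode", "StringEquals": "INSUFFICIENT_STOCK",
         "Next": "NotifyOutOfStock"},
    ],
    "Default": "RejectOrder",
}


def resolve(path, data):
    """Resolve a simple '$.a.b' reference path; return None if a key is missing."""
    current = data
    for key in path.lstrip("$").strip(".").split("."):
        if not key:
            continue  # path was just "$"
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def next_state(choice_state, state_input):
    """Return the name of the state the Choice would transition to."""
    for rule in choice_state["Choices"]:
        value = resolve(rule["Variable"], state_input)
        if "BooleanEquals" in rule and value is rule["BooleanEquals"]:
            return rule["Next"]
        if "StringEquals" in rule and value == rule["StringEquals"]:
            return rule["Next"]
    return choice_state["Default"]
```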

Parallel State

Execute multiple branches simultaneously.
{
  "ProcessInParallel": {
    "Type": "Parallel",
    "Branches": [
      {
        "StartAt": "SendEmail",
        "States": {
          "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:send-email",
            "End": true
          }
        }
      },
      {
        "StartAt": "UpdateInventory",
        "States": {
          "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:update-inventory",
            "End": true
          }
        }
      }
    ],
    "Next": "FinalizeOrder"
  }
}
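
A Parallel state outputs an array with one element per branch, in the same order as Branches. As a sketch, a FinalizeOrder handler could unpack that array as shown below. The sent and updated fields are assumed return values of the two branch Lambdas, not something the example above defines.

```python
def lambda_handler(event, context):
    """Hypothetical FinalizeOrder handler.

    `event` is a list: [SendEmail output, UpdateInventory output].
    """
    email_result, inventory_result = event  # same order as "Branches"
    return {
        "emailSent": email_result.get("sent", False),
        "inventoryUpdated": inventory_result.get("updated", False),
    }
```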

Map State

Iterate over an array and process each item.
{
  "ProcessLineItems": {
    "Type": "Map",
    "InputPath": "$.order.items",
    "ItemsPath": "$",
    "MaxConcurrency": 10,
    "ItemProcessor": {
      "StartAt": "ProcessItem",
      "States": {
        "ProcessItem": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-item",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedItems",
    "Next": "CalculateTotal"
  }
}
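
Conceptually, a Map state behaves like a bounded worker pool: at most MaxConcurrency items are processed at once, and the results come back as an array in input order. The following local simulation illustrates the idea. process_item stands in for the process-item Lambda, and the sku, price, and quantity fields are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def process_item(item):
    """Stand-in for the process-item Lambda; the fields are hypothetical."""
    return {"sku": item["sku"], "lineTotal": item["price"] * item["quantity"]}


def run_map(items, max_concurrency=10):
    """Process every item with at most `max_concurrency` in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order, like the Map state's output array.
        return list(pool.map(process_item, items))
```

Calling run_map(order["items"]) would produce the list that ResultPath stores at $.processedItems.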

Wait State

Pause execution for a specified time.
{
  "WaitForDelivery": {
    "Type": "Wait",
    "Seconds": 3600,
    "Next": "CheckDeliveryStatus"
  },
  "WaitUntilShipDate": {
    "Type": "Wait",
    "TimestampPath": "$.order.shipDate",
    "Next": "StartShipping"
  }
}

Other States

{
  "PassThrough": {
    "Type": "Pass",
    "Result": {"status": "processed"},
    "ResultPath": "$.result",
    "Next": "NextState"
  },
  "OrderFailed": {
    "Type": "Fail",
    "Cause": "Order processing failed",
    "Error": "OrderError"
  },
  "OrderComplete": {
    "Type": "Succeed"
  }
}

Input/Output Processing

┌────────────────────────────────────────────────────────────────────────┐
│                    Data Flow Through States                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   State Input (raw)                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},            │
│     "customer": {"id": "C001", "email": "..."}                         │
│   }                                                                     │
│         │                                                               │
│         │ InputPath: "$.order"                                         │
│         ▼                                                               │
│   Task Input                                                            │
│   {"id": "123", "items": [...], "total": 99.99}                        │
│         │                                                               │
│         │ Lambda executes                                               │
│         ▼                                                               │
│   Task Result                                                           │
│   {"validation": "success", "discount": 10.00}                         │
│         │                                                               │
│         │ ResultPath: "$.orderValidation"                              │
│         ▼                                                               │
│   State with Result                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},            │
│     "customer": {"id": "C001", "email": "..."},                        │
│     "orderValidation": {"validation": "success", "discount": 10.00}    │
│   }                                                                     │
│         │                                                               │
│         │ OutputPath: "$"                                              │
│         ▼                                                               │
│   State Output (passed to next state)                                   │
│   (same as above)                                                       │
│                                                                         │
│   Path Reference:                                                       │
│   InputPath:  What to send to task                                     │
│   ResultPath: Where to put task result (null = discard)                │
│   OutputPath: What to pass to next state                               │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
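
The data flow above can be reproduced in a few lines of Python. This is a simplified model that assumes plain "$.a.b" paths only (no array indexes, wildcards, or Parameters); apply_state and get_path are illustrative names, not part of any SDK.

```python
import copy


def get_path(data, path):
    """Resolve a simple '$.a.b' path; '$' returns the whole document."""
    for key in [k for k in path.split(".")[1:] if k]:
        data = data[key]
    return data


def apply_state(raw_input, task, input_path="$", result_path="$", output_path="$"):
    """Run `task` with InputPath / ResultPath / OutputPath applied in order."""
    task_input = get_path(raw_input, input_path)   # InputPath: what the task sees
    result = task(task_input)

    if result_path is None:
        combined = raw_input                       # null: discard the result
    elif result_path == "$":
        combined = result                          # default: result replaces input
    else:
        combined = copy.deepcopy(raw_input)        # merge result into the raw input
        keys = result_path.split(".")[1:]
        target = combined
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = result

    return get_path(combined, output_path)         # OutputPath: what moves on
```

With input_path="$.order" and result_path="$.orderValidation", the function returns the original order and customer objects plus the new orderValidation key, matching the diagram.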

Intrinsic Functions

{
  "TransformData": {
    "Type": "Pass",
    "Parameters": {
      "orderId.$": "States.UUID()",
      "orderDate.$": "States.Format('Order placed at {}', $$.State.EnteredTime)",
      "itemCount.$": "States.ArrayLength($.items)",
      "fullName.$": "States.Format('{} {}', $.firstName, $.lastName)",
      "items.$": "States.ArrayPartition($.allItems, 10)",
      "jsonString.$": "States.JsonToString($.data)",
      "parsedJson.$": "States.StringToJson($.jsonString)"
    },
    "Next": "ProcessOrder"
  }
}
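
If the intrinsic functions are unfamiliar, rough Python equivalents may help when reasoning about what a Parameters block produces. These are approximations for intuition only; for example, this States.Format stand-in ignores escape sequences.

```python
import json
import uuid


def states_format(template, *args):
    """Fill each '{}' placeholder in order, roughly like States.Format."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder count must match argument count")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += str(arg) + tail
    return out


def array_partition(items, size):
    """Split a list into chunks of at most `size`, like States.ArrayPartition."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def states_uuid():
    """Random version 4 UUID string, like States.UUID."""
    return str(uuid.uuid4())


# States.ArrayLength, States.JsonToString, and States.StringToJson map
# roughly onto len, json.dumps, and json.loads.
array_length = len
json_to_string = json.dumps
string_to_json = json.loads
```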

Error Handling

┌────────────────────────────────────────────────────────────────────────┐
│                    Error Handling Pattern                               │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │ ProcessPayment                                                 │    │
│   │ ┌─────────────────────────────────────────────────────────┐   │    │
│   │ │ Lambda: process-payment                                  │   │    │
│   │ │                                                          │   │    │
│   │ │ Retry:                                                   │   │    │
│   │ │ • 3 attempts                                             │   │    │
│   │ │ • 1s → 2s → 4s (exponential backoff)                    │   │    │
│   │ │                                                          │   │    │
│   │ │ Catch:                                                   │   │    │
│   │ │ • PaymentDeclined → HandleDeclined                       │   │    │
│   │ │ • States.ALL → FallbackHandler                          │   │    │
│   │ └─────────────────────────────────────────────────────────┘   │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Error Types:                                                          │
│   ─────────────                                                        │
│   States.ALL          - Matches any error                              │
│   States.Timeout      - Task timed out                                 │
│   States.TaskFailed   - Lambda execution error                         │
│   States.Permissions  - Permission error                               │
│   Custom.PaymentFailed - Custom error from Lambda                      │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

Retry Configuration

{
  "ProcessPayment": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:process-payment",
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
        "MaxDelaySeconds": 30
      },
      {
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 5,
        "MaxAttempts": 2,
        "BackoffRate": 1.0
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["PaymentDeclined"],
        "ResultPath": "$.error",
        "Next": "HandleDeclinedPayment"
      },
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "NotifyFailure"
      }
    ],
    "Next": "ConfirmOrder"
  }
}
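
The wait before each retry is IntervalSeconds multiplied by BackoffRate raised to the number of retries already made, capped at MaxDelaySeconds when that field is set. A small helper makes the schedule easy to check. retry_delays is an illustrative name, and the sketch ignores jitter.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0,
                 max_delay_seconds=None):
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)  # cap the exponential growth
        delays.append(delay)
    return delays
```

For the first retrier above, retry_delays(1, 3, 2.0, 30) gives waits of 1, 2, and 4 seconds; the States.Timeout retrier with BackoffRate 1.0 waits a flat 5 seconds twice.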

Lambda Error Throwing

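# Step Functions matches "ErrorEquals" against the exception class name,
# so this class is named PaymentDeclined to match the Catch block above.
class PaymentDeclined(Exception):
    pass

class InsufficientFundsException(Exception):
    pass

def lambda_handler(event, context):
    # process_payment, CardDeclinedError, and InsufficientFundsError are
    # assumed to come from your payment provider's client code.
    try:
        return process_payment(event)

    except CardDeclinedError as exc:
        # Caught by the "PaymentDeclined" catcher -> HandleDeclinedPayment
        raise PaymentDeclined("Card was declined") from exc

    except InsufficientFundsError as exc:
        # No dedicated catcher, so States.ALL routes this to NotifyFailure
        raise InsufficientFundsException("Insufficient funds in account") from exc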
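
When a Catch block fires with ResultPath "$.error", the next state receives an object with Error and Cause fields under error. For Lambda failures, Cause is typically a JSON string containing errorMessage and errorType. As a sketch, a HandleDeclinedPayment handler could unpack it like this; the handler and its return shape are assumptions.

```python
import json


def lambda_handler(event, context):
    """Hypothetical HandleDeclinedPayment handler."""
    error = event["error"]  # placed there by ResultPath "$.error"
    try:
        cause = json.loads(error.get("Cause", ""))
    except (json.JSONDecodeError, TypeError):
        cause = None
    if not isinstance(cause, dict):
        # Cause was plain text rather than a JSON object.
        cause = {"errorMessage": error.get("Cause")}
    return {
        "errorName": error["Error"],
        "message": cause.get("errorMessage"),
    }
```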

Standard vs Express Workflows

┌────────────────────────────────────────────────────────────────────────┐
│                    Workflow Type Comparison                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Feature              │ Standard              │ Express               │
│   ─────────────────────┼──────────────────────┼────────────────────── │
│   Duration             │ Up to 1 year         │ Up to 5 minutes       │
│   Execution History    │ Stored (90 days)     │ CloudWatch Logs only  │
│   Start Rate           │ 2,000/sec            │ 100,000/sec           │
│   Pricing              │ Per state transition │ Per execution + dur.  │
│   Execution Semantics  │ Exactly-once         │ At-least-once         │
│   Execution Type       │ Async (default)      │ Sync or Async         │
│                                                                         │
│   When to Use Standard:                                                │
│   ─────────────────────                                                │
│   ✓ Long-running workflows (hours/days)                                │
│   ✓ Need execution history for audit                                   │
│   ✓ Require exactly-once semantics                                     │
│   ✓ Human approval workflows                                           │
│                                                                         │
│   When to Use Express:                                                  │
│   ────────────────────                                                 │
│   ✓ High-volume, short-duration workflows                              │
│   ✓ Event processing pipelines                                         │
│   ✓ API orchestration                                                  │
│   ✓ Cost-sensitive scenarios                                           │
│                                                                         │
│   Pricing Example (1M executions, 5 state transitions each):          │
│   Standard: 5M transitions × $0.000025 = $125                          │
│   Express:  1M × $1.00/M + duration charges = ~$10-20                  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
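
The pricing example can be checked with a short calculation. The rates below are the ones quoted in the table and may not match current AWS pricing. The Express duration charge depends on memory use and run time, so this sketch takes it as an input rather than guessing a rate.

```python
def standard_cost(executions, transitions_per_execution,
                  price_per_transition=0.000025):
    """Standard workflows: billed per state transition."""
    return round(executions * transitions_per_execution * price_per_transition, 2)


def express_cost(executions, duration_charge, price_per_million_requests=1.00):
    """Express workflows: request charge plus a caller-supplied duration charge."""
    request_charge = executions / 1_000_000 * price_per_million_requests
    return round(request_charge + duration_charge, 2)
```

Here standard_cost(1_000_000, 5) reproduces the $125 figure, while express_cost(1_000_000, duration_charge) shows that the request portion is only $1.00 and the remainder of the estimate comes from duration.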

Service Integrations

Direct Service Integrations

{
  "Comment": "Direct service integrations chained into one workflow",
  "StartAt": "PutItemInDynamoDB",
  "States": {
    "PutItemInDynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "order_id": {"S.$": "$.orderId"},
          "status": {"S": "CREATED"},
          "created_at": {"S.$": "$$.State.EnteredTime"}
        }
      },
      "ResultPath": null,
      "Next": "SendNotification"
    },

    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-notifications",
        "Message.$": "States.Format('Order {} created', $.orderId)"
      },
      "ResultPath": null,
      "Next": "AddToQueue"
    },

    "AddToQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        "MessageBody.$": "States.JsonToString($.order)"
      },
      "ResultPath": null,
      "Next": "InvokeLambda"
    },

    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "process-order",
        "Payload.$": "$"
      },
      "ResultPath": null,
      "Next": "StartECSTask"
    },

    "StartECSTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "my-cluster",
        "TaskDefinition": "process-batch",
        "LaunchType": "FARGATE"
      },
      "Next": "Done"
    },

    "Done": {
      "Type": "Succeed"
    }
  }
}

Wait for Callback Pattern

{
  "WaitForHumanApproval": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "send-approval-request",
      "Payload": {
        "orderId.$": "$.orderId",
        "approver": "manager@example.com",
        "taskToken.$": "$$.Task.Token"
      }
    },
    "TimeoutSeconds": 86400,
    "Next": "ProcessApprovedOrder"
  }
}
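# Lambda that sends approval request
import json

import boto3

# Assumes the PendingApprovals table uses order_id as its partition key
approvals = boto3.resource('dynamodb').Table('PendingApprovals')
ses = boto3.client('ses')
stepfunctions = boto3.client('stepfunctions')

# Placeholder: replace with an address verified in SES
SENDER = 'noreply@example.com'

def send_approval_request(event, context):
    task_token = event['taskToken']
    order_id = event['orderId']

    # Store token for later callback
    approvals.put_item(
        Item={
            'order_id': order_id,
            'task_token': task_token,
            'status': 'PENDING'
        }
    )

    # Send email with approval link (query parameters match handle_approval)
    link = f'https://api.../approve?order_id={order_id}&action=approve'
    ses.send_email(
        Source=SENDER,
        Destination={'ToAddresses': [event['approver']]},
        Message={
            'Subject': {'Data': f'Approval needed for order {order_id}'},
            'Body': {'Text': {'Data': f'Click to approve: {link}'}}
        }
    )

# API endpoint that handles approval
def handle_approval(event, context):
    params = event['queryStringParameters']
    order_id = params['order_id']
    action = params['action']  # approve/reject

    # Get stored token
    item = approvals.get_item(Key={'order_id': order_id})
    task_token = item['Item']['task_token']

    # Resume Step Function
    if action == 'approve':
        stepfunctions.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        stepfunctions.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    # Response shape expected by an API Gateway proxy integration
    return {'statusCode': 200, 'body': json.dumps({'order_id': order_id, 'action': action})}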

Common Workflow Patterns

Saga Pattern (Distributed Transaction)

┌────────────────────────────────────────────────────────────────────────┐
│                    Saga Pattern                                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Each step has a compensating action for rollback:                    │
│                                                                         │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                      │
│   │  Reserve  │───►│  Charge   │───►│  Ship     │───► Success          │
│   │  Stock    │    │  Payment  │    │  Order    │                      │
│   └─────┬─────┘    └─────┬─────┘    └─────┬─────┘                      │
│         │                │                │                            │
│         │ Fail           │ Fail           │ Fail                       │
│         ▼                ▼                ▼                            │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                      │
│   │  Cancel   │◄───│  Refund   │◄───│  Cancel   │                      │
│   │  Reserve  │    │  Payment  │    │  Shipment │                      │
│   └───────────┘    └───────────┘    └───────────┘                      │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
{
  "StartAt": "ReserveStock",
  "States": {
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserve-stock",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "OrderFailed"
      }],
      "Next": "ChargePayment"
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:charge-payment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CancelReservation"
      }],
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ship-order",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "RefundPayment"
      }],
      "Next": "OrderComplete"
    },
    "CancelReservation": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:cancel-reservation",
      "Next": "OrderFailed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:refund-payment",
      "Next": "CancelReservation"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be completed"
    }
  }
}
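
The control flow of this state machine can be modelled as a list of (action, compensation) pairs: when an action fails, the compensations for the steps that already completed run in reverse order. The sketch below is a local simulation of that idea, not how Step Functions executes it; run_saga and the demo step names are illustrative.

```python
def run_saga(steps, order):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []
    for action, compensation in steps:
        try:
            action(order)
        except Exception as exc:
            # Roll back in reverse order, like RefundPayment -> CancelReservation.
            for undo in reversed(completed):
                undo(order)
            return {"status": "FAILED", "error": str(exc)}
        if compensation is not None:
            completed.append(compensation)
    return {"status": "SUCCEEDED"}


# Demo mirroring the state machine above: shipping fails after payment succeeded.
log = []


def record(name):
    def step(order):
        log.append(name)
    return step


def failing_ship(order):
    raise RuntimeError("carrier unavailable")


steps = [
    (record("reserve_stock"), record("cancel_reservation")),
    (record("charge_payment"), record("refund_payment")),
    (failing_ship, None),
]
result = run_saga(steps, {"orderId": "123"})
```

After the demo runs, log shows the refund executing before the reservation is cancelled, which is the same path the Catch blocks take when ShipOrder fails.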

Fan-Out/Fan-In Pattern

{
  "ProcessAllOrders": {
    "Type": "Map",
    "InputPath": "$.orders",
    "MaxConcurrency": 50,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "EXPRESS"
      },
      "StartAt": "ProcessOrder",
      "States": {
        "ProcessOrder": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-order",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedOrders",
    "Next": "AggregateResults"
  }
}
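
The fan-in half of the pattern is the AggregateResults state, which receives the collected results under processedOrders. A minimal sketch of such a handler follows. The status field with the value "OK" is an assumed return shape for the process-order Lambda.

```python
def lambda_handler(event, context):
    """Hypothetical AggregateResults handler."""
    results = event["processedOrders"]  # written by ResultPath "$.processedOrders"
    succeeded = [r for r in results if r.get("status") == "OK"]
    return {
        "total": len(results),
        "succeeded": len(succeeded),
        "failed": len(results) - len(succeeded),
    }
```

For very large fan-outs the combined results may not fit in a single state payload, so treat this in-memory aggregation as suitable for modest result sets only.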

Best Practices

Design for Idempotency

Tasks may retry—ensure operations are safe to repeat

Use ResultPath Wisely

Preserve input data while adding task results

Set Timeouts

Always set TimeoutSeconds to prevent stuck executions

Use Express for High Volume

Express workflows are usually much cheaper for short, high-volume executions
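
To illustrate the idempotency practice above: one common approach is to key each side effect on a stable identifier, such as the order ID, and return the stored result when the same key is seen again. The sketch below uses an in-memory dict purely for demonstration. A real task would need a durable store with a conditional write, and charge_once is a hypothetical helper.

```python
_processed = {}  # stand-in for a durable store such as a database table


def charge_once(idempotency_key, amount, charge):
    """Call `charge(amount)` at most once per idempotency key."""
    if idempotency_key in _processed:
        # A retry of the same task: return the earlier result, no second charge.
        return _processed[idempotency_key]
    result = charge(amount)
    _processed[idempotency_key] = result
    return result
```

With this guard in place, a Retry that re-runs the task after a timeout returns the original result instead of charging the customer twice.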

🎯 Interview Questions

Q: When would you use Step Functions instead of SQS + Lambda?
Step Functions:
  • Complex orchestration with branching
  • Need visibility into workflow state
  • Error handling with retries and fallbacks
  • Long-running workflows
SQS + Lambda:
  • Simple event processing
  • High-volume, independent tasks
  • No orchestration needed
  • Cost-sensitive (cheaper at high scale)
Q: How do you handle long-running tasks or waits on external systems?
Options:
  1. Wait for Task Token: Pause execution, resume via callback
  2. Activity Tasks: Worker polls for tasks, reports completion
  3. Async Lambda: Start Lambda, use callback pattern
Example: Human approval, external system integration
Q: When would you choose Standard vs Express workflows?
Use Standard when:
  • Executions run longer than 5 minutes
  • Need execution history for audit
  • Require exactly-once semantics
Use Express when:
  • High volume (over 1,000/sec)
  • Short duration (under 5 min)
  • Cost-sensitive
  • At-least-once is acceptable

Next Module

AWS SAM

Build serverless applications with SAM