Step Functions Architecture

Module Overview

Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Lambda
AWS Step Functions lets you coordinate multiple AWS services into serverless workflows using visual state machines. Think of Step Functions as a recipe card for your cloud kitchen: each step says "do this, then check the result, then decide what to do next." Without it, you would wire up Lambda-to-Lambda calls with SQS queues, build your own retry logic, and pray that error handling works. Step Functions gives you that orchestration layer with built-in retries, timeouts, and a visual execution history so you can see exactly where a workflow failed at 3 AM. This module covers workflow design patterns, error handling, and production best practices.

What You'll Learn:
  • State machine concepts and design
  • State types (Task, Choice, Parallel, Map)
  • Error handling and retries
  • Standard vs Express workflows
  • Service integrations
  • Workflow patterns for common use cases

Why Step Functions?

Visual Workflows

Design and visualize complex business processes as state machines

Built-in Error Handling

Automatic retries, catch blocks, and compensation logic

200+ Integrations

Native integration with Lambda, DynamoDB, SQS, SNS, and more

Audit Trail

Complete execution history for debugging and compliance

State Machine Concepts

┌─────────────────────────────────────────────────────────────────────────┐
│                    State Machine Components                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   STATE MACHINE                                                         │
│   ─────────────                                                         │
│   • Collection of states that define workflow                           │
│   • Starts at StartAt state, ends at End state                          │
│   • Executes synchronously or asynchronously                            │
│                                                                         │
│   ┌────────────────────────────────────────────────────────────────┐    │
│   │                                                                │    │
│   │    ┌──────────┐                                                │    │
│   │    │  START   │                                                │    │
│   │    └────┬─────┘                                                │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                                │    │
│   │    │ Validate │ ──── Task State (Lambda)                       │    │
│   │    │  Input   │                                                │    │
│   │    └────┬─────┘                                                │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐     ┌──────────┐                               │    │
│   │    │  Valid?  │────►│  Reject  │ ──── Choice State             │    │
│   │    │  (yes)   │ no  │          │                               │    │
│   │    └────┬─────┘     └──────────┘                               │    │
│   │         │ yes                                                  │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                                │    │
│   │    │ Process  │ ──── Task State (DynamoDB)                     │    │
│   │    │  Order   │                                                │    │
│   │    └────┬─────┘                                                │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                                │    │
│   │    │   END    │                                                │    │
│   │    └──────────┘                                                │    │
│   │                                                                │    │
│   └────────────────────────────────────────────────────────────────┘    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

State Types

Task State

Performs work by invoking an AWS service or activity. Always set TimeoutSeconds: without it, a stuck task (for example, a Lambda waiting on a downstream service that never responds) can hold the execution open far longer than intended. The execution stays in the "Running" state the whole time, which counts against your open-execution quota (1M by default, but still finite).
{
  "ValidateOrder": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
    "InputPath": "$.order",
    "ResultPath": "$.validation",
    "OutputPath": "$",
    "TimeoutSeconds": 30,
    "Next": "CheckValidation"
  }
}

Choice State

Branching logic based on input.
{
  "CheckValidation": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.validation.isValid",
        "BooleanEquals": true,
        "Next": "ProcessPayment"
      },
      {
        "Variable": "$.validation.errorCode",
        "StringEquals": "INSUFFICIENT_STOCK",
        "Next": "NotifyOutOfStock"
      }
    ],
    "Default": "RejectOrder"
  }
}
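Choice rules are checked in the order listed and the first match wins; if nothing matches, the Default state is used. The following is a minimal Python sketch of that evaluation order, not the service's implementation. It supports only the two comparison operators used above, and the helper names `get_path` and `evaluate_choice` are made up for illustration. One simplification is called out in the comments: this sketch treats a missing variable as "no match", whereas the real service reports a runtime error for an unresolved path.

```python
CHECK_VALIDATION = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.validation.isValid", "BooleanEquals": True,
         "Next": "ProcessPayment"},
        {"Variable": "$.validation.errorCode", "StringEquals": "INSUFFICIENT_STOCK",
         "Next": "NotifyOutOfStock"},
    ],
    "Default": "RejectOrder",
}


def get_path(data, path):
    """Resolve a simple '$.a.b' reference path; return None when a key is missing.

    Simplification: Step Functions itself fails the execution on an
    unresolved path instead of returning a placeholder value.
    """
    for key in path.split(".")[1:]:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def evaluate_choice(state, data):
    """Return the name of the next state: first matching rule, else Default."""
    for rule in state["Choices"]:
        value = get_path(data, rule["Variable"])
        if "BooleanEquals" in rule and value is rule["BooleanEquals"]:
            return rule["Next"]
        if "StringEquals" in rule and value == rule["StringEquals"]:
            return rule["Next"]
    return state["Default"]


next_state = evaluate_choice(CHECK_VALIDATION, {"validation": {"isValid": True}})
print(next_state)
```

Because the rules are ordered, put the most specific conditions first and rely on Default for everything else.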

Parallel State

Execute multiple branches simultaneously.
{
  "ProcessInParallel": {
    "Type": "Parallel",
    "Branches": [
      {
        "StartAt": "SendEmail",
        "States": {
          "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:send-email",
            "End": true
          }
        }
      },
      {
        "StartAt": "UpdateInventory",
        "States": {
          "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:update-inventory",
            "End": true
          }
        }
      }
    ],
    "Next": "FinalizeOrder"
  }
}
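A Parallel state hands the same input to every branch and returns one array with the branch outputs in branch order. The sketch below imitates that shape locally with a thread pool; `send_email` and `update_inventory` are stand-ins for the two Lambda functions above, and their return values are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def send_email(order):
    # Stand-in for the send-email Lambda (hypothetical return value)
    return {"emailSent": True, "orderId": order["id"]}


def update_inventory(order):
    # Stand-in for the update-inventory Lambda (hypothetical return value)
    return {"inventoryUpdated": True, "orderId": order["id"]}


def run_parallel(branches, state_input):
    """Run every branch with the same input; return outputs in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, state_input) for branch in branches]
        # result() re-raises a branch exception, so one failing branch
        # fails the whole call, much like a failed Parallel state.
        return [future.result() for future in futures]


results = run_parallel([send_email, update_inventory], {"id": "123"})
print(results)
```

The state that follows a Parallel state (FinalizeOrder above) therefore receives a list, not a single object, unless ResultPath places it somewhere else.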

Map State

Iterate over an array and process each item.
{
  "ProcessLineItems": {
    "Type": "Map",
    "ItemsPath": "$.order.items",
    "MaxConcurrency": 10,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "INLINE"
      },
      "StartAt": "ProcessItem",
      "States": {
        "ProcessItem": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-item",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedItems",
    "Next": "CalculateTotal"
  }
}

Wait State

Pause execution for a specified time.
{
  "WaitForDelivery": {
    "Type": "Wait",
    "Seconds": 3600,
    "Next": "CheckDeliveryStatus"
  },
  "WaitUntilShipDate": {
    "Type": "Wait",
    "TimestampPath": "$.order.shipDate",
    "Next": "StartShipping"
  }
}

Other States

{
  "PassThrough": {
    "Type": "Pass",
    "Result": {"status": "processed"},
    "ResultPath": "$.result",
    "Next": "NextState"
  },
  "OrderFailed": {
    "Type": "Fail",
    "Cause": "Order processing failed",
    "Error": "OrderError"
  },
  "OrderComplete": {
    "Type": "Succeed"
  }
}

Input/Output Processing

┌─────────────────────────────────────────────────────────────────────────┐
│                    Data Flow Through States                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   State Input (raw)                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},             │
│     "customer": {"id": "C001", "email": "..."}                          │
│   }                                                                     │
│         │                                                               │
│         │ InputPath: "$.order"                                          │
│         ▼                                                               │
│   Task Input                                                            │
│   {"id": "123", "items": [...], "total": 99.99}                         │
│         │                                                               │
│         │ Lambda executes                                               │
│         ▼                                                               │
│   Task Result                                                           │
│   {"validation": "success", "discount": 10.00}                          │
│         │                                                               │
│         │ ResultPath: "$.orderValidation"                               │
│         ▼                                                               │
│   State with Result                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},             │
│     "customer": {"id": "C001", "email": "..."},                         │
│     "orderValidation": {"validation": "success", "discount": 10.00}     │
│   }                                                                     │
│         │                                                               │
│         │ OutputPath: "$"                                               │
│         ▼                                                               │
│   State Output (passed to next state)                                   │
│   (same as above)                                                       │
│                                                                         │
│   Path Reference:                                                       │
│   InputPath:  What to send to task                                      │
│   ResultPath: Where to put task result (null = discard)                 │
│   OutputPath: What to pass to next state                                │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
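The data flow in the diagram can be traced with a few lines of Python. This is a rough sketch under simplifying assumptions: paths are limited to plain `$` and `$.a.b` forms (no array indexes or filters), Parameters and ResultSelector are ignored, and `get_path` and `apply_paths` are illustrative helper names rather than any AWS API.

```python
import copy


def get_path(data, path):
    """Resolve '$' or '$.a.b' against a dict."""
    for key in path.split(".")[1:]:
        data = data[key]
    return data


def apply_paths(raw_input, task, input_path="$", result_path="$", output_path="$"):
    """Apply InputPath -> task -> ResultPath -> OutputPath to one state."""
    task_input = get_path(raw_input, input_path)
    result = task(task_input)

    if result_path is None:
        # null ResultPath: discard the result, keep the raw input
        combined = copy.deepcopy(raw_input)
    elif result_path == "$":
        # default: the result replaces the input entirely
        combined = result
    else:
        # insert the result into a copy of the raw input
        combined = copy.deepcopy(raw_input)
        keys = result_path.split(".")[1:]
        target = combined
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = result

    return get_path(combined, output_path)


def validate(order):
    # Stand-in for the validation Lambda in the diagram
    return {"validation": "success", "discount": 10.00}


state_input = {
    "order": {"id": "123", "total": 99.99},
    "customer": {"id": "C001"},
}
state_output = apply_paths(
    state_input, validate,
    input_path="$.order", result_path="$.orderValidation",
)
print(state_output)
```

Note that the result is merged into the raw state input, not into the slice selected by InputPath, which is why `customer` survives in the output.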

Intrinsic Functions

{
  "TransformData": {
    "Type": "Pass",
    "Parameters": {
      "orderId.$": "States.UUID()",
      "orderDate.$": "States.Format('Order placed at {}', $$.State.EnteredTime)",
      "itemCount.$": "States.ArrayLength($.items)",
      "fullName.$": "States.Format('{} {}', $.firstName, $.lastName)",
      "items.$": "States.ArrayPartition($.allItems, 10)",
      "jsonString.$": "States.JsonToString($.data)",
      "parsedJson.$": "States.StringToJson($.jsonString)"
    },
    "Next": "ProcessOrder"
  }
}
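To make the semantics of a few of these intrinsics concrete, here are rough Python equivalents. They are illustrative only: the real functions run inside Step Functions, the Python function names are invented, and details such as escape handling in `States.Format` are left out.

```python
import json


def states_format(template, *args):
    """Fill each '{}' placeholder in order, similar to States.Format."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("argument count must match the number of {} placeholders")
    out = parts[0]
    for arg, tail in zip(args, parts[1:]):
        out += str(arg) + tail
    return out


def states_array_partition(items, chunk_size):
    """Split a list into chunks of at most chunk_size, similar to States.ArrayPartition."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def states_array_length(items):
    """Similar to States.ArrayLength."""
    return len(items)


full_name = states_format("{} {}", "Ada", "Lovelace")
batches = states_array_partition([1, 2, 3, 4, 5], 2)
# JsonToString / StringToJson correspond to a dumps/loads round trip
round_trip = json.loads(json.dumps({"sku": "A1", "qty": 2}))
print(full_name, batches, round_trip)
```

Partitioning is handy before a Map state: ten-item batches mean one task invocation per batch instead of one per item.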

Error Handling

┌─────────────────────────────────────────────────────────────────────────┐
│                    Error Handling Pattern                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌────────────────────────────────────────────────────────────────┐    │
│   │ ProcessPayment                                                 │    │
│   │ ┌──────────────────────────────────────────────────────────┐   │    │
│   │ │ Lambda: process-payment                                  │   │    │
│   │ │                                                          │   │    │
│   │ │ Retry:                                                   │   │    │
│   │ │ • Up to 3 retries                                        │   │    │
│   │ │ • 1s → 2s → 4s (exponential backoff)                     │   │    │
│   │ │                                                          │   │    │
│   │ │ Catch:                                                   │   │    │
│   │ │ • PaymentDeclined → HandleDeclinedPayment                │   │    │
│   │ │ • States.ALL → NotifyFailure                             │   │    │
│   │ └──────────────────────────────────────────────────────────┘   │    │
│   └────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Error Types:                                                          │
│   ────────────                                                          │
│   States.ALL           - Matches any error                              │
│   States.Timeout       - Task timed out                                 │
│   States.TaskFailed    - Lambda execution error                         │
│   States.Permissions   - Permission error                               │
│   Custom.PaymentFailed - Custom error from Lambda                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Retry Configuration

{
  "ProcessPayment": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:process-payment",
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
        "MaxDelaySeconds": 30
      },
      {
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 5,
        "MaxAttempts": 2,
        "BackoffRate": 1.0
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["PaymentDeclined"],
        "ResultPath": "$.error",
        "Next": "HandleDeclinedPayment"
      },
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "NotifyFailure"
      }
    ],
    "Next": "ConfirmOrder"
  }
}
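The wait before each retry follows directly from IntervalSeconds, BackoffRate, MaxAttempts, and MaxDelaySeconds. A small sketch of that arithmetic, assuming no jitter is configured (the function name `retry_delays` is just for illustration):

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0,
                 max_delay_seconds=None):
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    delay = float(interval_seconds)
    for _ in range(max_attempts):
        # MaxDelaySeconds caps the wait without stopping the growth underneath
        capped = delay if max_delay_seconds is None else min(delay, max_delay_seconds)
        delays.append(capped)
        delay *= backoff_rate
    return delays


# First retrier above: IntervalSeconds 1, MaxAttempts 3, BackoffRate 2.0
service_error_delays = retry_delays(1, 3, 2.0, max_delay_seconds=30)
# Second retrier above: IntervalSeconds 5, MaxAttempts 2, BackoffRate 1.0
timeout_delays = retry_delays(5, 2, 1.0)
print(service_error_delays, timeout_delays)
```

Summing a schedule gives the worst-case time a state spends waiting on retries, which is worth comparing against any timeout the caller has.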

Lambda Error Throwing

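class PaymentDeclined(Exception):
    """Step Functions matches on the exception class name, so this name
    must equal the ErrorEquals value in the Catch block ("PaymentDeclined")."""


class InsufficientFundsException(Exception):
    """No dedicated Catch entry above, so the States.ALL fallback handles it."""


def lambda_handler(event, context):
    # process_payment, CardDeclinedError, and InsufficientFundsError are
    # assumed to come from your payment client code.
    try:
        return process_payment(event)

    except CardDeclinedError as exc:
        # Step Functions will catch this by name
        raise PaymentDeclined("Card was declined") from exc

    except InsufficientFundsError as exc:
        raise InsufficientFundsException("Insufficient funds in account") from exc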
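For a Python Lambda, the error name Step Functions sees is the exception class name, and Catch entries are checked in order against that name. The sketch below mirrors the Catch list from the Retry Configuration example; `find_catcher` is an illustrative helper, and it simplifies by letting `States.ALL` match everything.

```python
CATCHERS = [
    {"ErrorEquals": ["PaymentDeclined"], "Next": "HandleDeclinedPayment"},
    {"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"},
]


def find_catcher(error_name, catchers):
    """Return the Next state of the first catcher matching error_name, else None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        # Names must match exactly; States.ALL acts as a wildcard
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None


declined_target = find_catcher("PaymentDeclined", CATCHERS)
fallback_target = find_catcher("PaymentDeclinedException", CATCHERS)
print(declined_target, fallback_target)
```

The second lookup shows why naming matters: a class called `PaymentDeclinedException` does not match `"PaymentDeclined"` and lands in the generic fallback instead.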

Standard vs Express Workflows

┌─────────────────────────────────────────────────────────────────────────┐
│                    Workflow Type Comparison                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Feature              │ Standard             │ Express                 │
│   ─────────────────────┼──────────────────────┼───────────────────────  │
│   Duration             │ Up to 1 year         │ Up to 5 minutes         │
│   Execution History    │ Stored (90 days)     │ CloudWatch Logs only    │
│   Start Rate           │ 2,000/sec            │ 100,000/sec             │
│   Pricing              │ Per state transition │ Per execution + dur.    │
│   Execution Semantics  │ Exactly-once         │ At-least-once           │
│   Execution Type       │ Async (default)      │ Sync or Async           │
│                                                                         │
│   When to Use Standard:                                                 │
│   ─────────────────────                                                 │
│   ✓ Long-running workflows (hours/days)                                 │
│   ✓ Need execution history for audit                                    │
│   ✓ Require exactly-once semantics                                      │
│   ✓ Human approval workflows                                            │
│                                                                         │
│   When to Use Express:                                                  │
│   ────────────────────                                                  │
│   ✓ High-volume, short-duration workflows                               │
│   ✓ Event processing pipelines                                          │
│   ✓ API orchestration                                                   │
│   ✓ Cost-sensitive scenarios                                            │
│                                                                         │
│   Pricing Example (1M executions, 5 state transitions each):            │
│   Standard: 5M transitions x $0.000025 = $125                           │
│   Express:  1M x $1.00/M + duration charges = ~$10-20                   │
│                                                                         │
│   Cost mistake: Using Standard workflows for high-volume, short-lived   │
│   tasks (like API orchestration). A team processing 10M API requests/   │
│   month with 8 transitions each pays $2,000 on Standard vs ~$100 on     │
│   Express. Rule of thumb: if it finishes in under 5 minutes and you     │
│   can tolerate at-least-once semantics, use Express.                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
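The pricing arithmetic above is easy to reproduce. The sketch below uses only the two rates quoted in the example; the Express figure covers the request charge alone, because duration charges depend on memory and run time and the example gives no rate for them. Treat the constants as the example's figures, not as current AWS pricing.

```python
STANDARD_PRICE_PER_TRANSITION = 0.000025   # USD, from the pricing example
EXPRESS_PRICE_PER_MILLION_REQUESTS = 1.00  # USD, from the pricing example


def standard_cost(executions, transitions_per_execution):
    """Standard workflows: billed per state transition."""
    total_transitions = executions * transitions_per_execution
    return round(total_transitions * STANDARD_PRICE_PER_TRANSITION, 2)


def express_request_cost(executions):
    """Express workflows: request charge only; duration charges come on top."""
    return round(executions / 1_000_000 * EXPRESS_PRICE_PER_MILLION_REQUESTS, 2)


example_standard = standard_cost(1_000_000, 5)
api_team_standard = standard_cost(10_000_000, 8)
api_team_express_requests = express_request_cost(10_000_000)
print(example_standard, api_team_standard, api_team_express_requests)
```

The gap grows with the number of transitions per execution, since Standard pricing scales with transitions while the Express request charge does not.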

Service Integrations

Direct Service Integrations

Direct integrations call a service straight from Step Functions, with no Lambda in between. That saves the cost of a Lambda invocation (about $0.20 per million requests) plus the latency a Lambda hop adds, including cold starts. Use them whenever the operation is a simple service call that needs no custom business logic. Setting "ResultPath": null discards the service response so that later states still see the original input.
{
  "Comment": "Direct integrations without Lambda",
  "StartAt": "PutItemInDynamoDB",
  "States": {
    "PutItemInDynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "order_id": {"S.$": "$.orderId"},
          "status": {"S": "CREATED"},
          "created_at": {"S.$": "$$.State.EnteredTime"}
        }
      },
      "ResultPath": null,
      "Next": "SendNotification"
    },

    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-notifications",
        "Message.$": "States.Format('Order {} created', $.orderId)"
      },
      "ResultPath": null,
      "Next": "AddToQueue"
    },

    "AddToQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        "MessageBody.$": "States.JsonToString($.order)"
      },
      "ResultPath": null,
      "Next": "InvokeLambda"
    },

    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "process-order",
        "Payload.$": "$"
      },
      "Next": "StartECSTask"
    },

    "StartECSTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "my-cluster",
        "TaskDefinition": "process-batch",
        "LaunchType": "FARGATE"
      },
      "Next": "Done"
    },

    "Done": {
      "Type": "Succeed"
    }
  }
}

Wait for Callback Pattern

This is one of the most powerful Step Functions patterns. The workflow pauses and waits for an external system (a human, a webhook, a third-party API) to call back with a result. The execution stays in a "Waiting" state without consuming compute or costing money beyond the initial state transition. Real-world use cases include manager approval for expenses, waiting for a payment processor webhook, or pausing until a manual QA review is complete.
The .waitForTaskToken suffix pauses the workflow until an external system calls SendTaskSuccess or SendTaskFailure with the token. A common mistake is leaving out the timeout on a callback task. Without TimeoutSeconds, the execution keeps waiting if the callback never arrives (for example, the approver ignores the email), and it counts against your open-executions quota the whole time. Always set a reasonable timeout and add a Catch for States.Timeout that sends a reminder or auto-rejects.
{
  "WaitForHumanApproval": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "send-approval-request",
      "Payload": {
        "orderId.$": "$.orderId",
        "approver": "manager@company.com",
        "taskToken.$": "$$.Task.Token"
      }
    },
    "TimeoutSeconds": 86400,
    "Next": "ProcessApprovedOrder"
  }
}
import json

import boto3

# The table name, sender address, and approval URL are placeholders.
approvals = boto3.resource('dynamodb').Table('PendingApprovals')
ses = boto3.client('ses')
stepfunctions = boto3.client('stepfunctions')


# Lambda that sends approval request
def send_approval_request(event, context):
    task_token = event['taskToken']
    order_id = event['orderId']

    # Store token for later callback
    approvals.put_item(
        Item={
            'order_id': order_id,
            'task_token': task_token,
            'status': 'PENDING'
        }
    )

    # Send email with approval link (query parameters match handle_approval)
    link = f'https://api.../approve?order_id={order_id}&action=approve'
    ses.send_email(
        Source='noreply@company.com',  # placeholder sender address
        Destination={'ToAddresses': [event['approver']]},
        Message={
            'Subject': {'Data': f'Approval needed for order {order_id}'},
            'Body': {'Text': {'Data': f'Click to approve: {link}'}}
        }
    )


# API endpoint that handles approval
def handle_approval(event, context):
    params = event['queryStringParameters']
    order_id = params['order_id']
    action = params['action']  # approve/reject

    # Get stored token
    item = approvals.get_item(Key={'order_id': order_id})
    task_token = item['Item']['task_token']

    # Resume Step Function
    if action == 'approve':
        stepfunctions.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        stepfunctions.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    return {'statusCode': 200, 'body': json.dumps({'order_id': order_id, 'action': action})}

Common Workflow Patterns

Saga Pattern (Distributed Transaction)

┌─────────────────────────────────────────────────────────────────────────┐
│                    Saga Pattern                                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Each step has a compensating action for rollback:                     │
│                                                                         │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                       │
│   │  Reserve  │───►│  Charge   │───►│   Ship    │───► Success           │
│   │   Stock   │    │  Payment  │    │   Order   │                       │
│   └─────┬─────┘    └─────┬─────┘    └─────┬─────┘                       │
│         │                │                │                             │
│         │ Fail           │ Fail           │ Fail                        │
│         ▼                ▼                ▼                             │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                       │
│   │  Cancel   │◄───│  Refund   │◄───│  Cancel   │                       │
│   │  Reserve  │    │  Payment  │    │ Shipment  │                       │
│   └───────────┘    └───────────┘    └───────────┘                       │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘
{
  "StartAt": "ReserveStock",
  "States": {
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserve-stock",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "OrderFailed"
      }],
      "Next": "ChargePayment"
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:charge-payment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CancelReservation"
      }],
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ship-order",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "RefundPayment"
      }],
      "Next": "OrderComplete"
    },
    "CancelReservation": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:cancel-reservation",
      "Next": "OrderFailed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:refund-payment",
      "Next": "CancelReservation"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be completed"
    }
  }
}
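The compensation order encoded in the Catch/Next wiring above (a failure in ShipOrder triggers RefundPayment, then CancelReservation) amounts to "undo completed steps in reverse". A minimal local sketch of that idea follows; `run_saga` and the stub step functions are hypothetical and stand in for the Lambda tasks.

```python
def run_saga(steps, order):
    """Run (name, action, compensation) steps; on failure undo completed steps in reverse.

    Returns a (status, log) tuple so the execution order is visible.
    """
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            action(order)
        except Exception:
            log.append(f"{name} FAILED")
            # Only steps that finished successfully need compensating
            for done_name, undo in reversed(completed):
                undo(order)
                log.append(f"undo {done_name}")
            return "FAILED", log
        log.append(name)
        completed.append((name, compensation))
    return "SUCCEEDED", log


def succeed(order):
    # Stub for a step or compensation that works
    return None


def fail(order):
    # Stub for a step that raises, e.g. the carrier API is down
    raise RuntimeError("carrier unavailable")


steps = [
    ("ReserveStock", succeed, succeed),
    ("ChargePayment", succeed, succeed),
    ("ShipOrder", fail, succeed),
]
status, log = run_saga(steps, {"id": "123"})
print(status, log)
```

In the state machine the same ordering is expressed declaratively, so every compensation task should itself be idempotent and, ideally, have its own Retry block.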

Fan-Out/Fan-In Pattern

{
  "ProcessAllOrders": {
    "Type": "Map",
    "InputPath": "$.orders",
    "MaxConcurrency": 50,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "EXPRESS"
      },
      "StartAt": "ProcessOrder",
      "States": {
        "ProcessOrder": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-order",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedOrders",
    "Next": "AggregateResults"
  }
}
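Conceptually, this Map state fans one array out to many workers with a concurrency cap and then fans the results back into a single list for the next state. A local sketch of that shape, using a thread pool in place of child workflow executions; `process_order`, `aggregate`, and the sample orders are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_order(order):
    # Stand-in for the process-order Lambda (hypothetical logic)
    return {"orderId": order["id"], "total": order["qty"] * order["price"]}


def map_state(items, processor, max_concurrency):
    """Fan out: run the processor on each item, at most max_concurrency at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order regardless of completion order
        return list(pool.map(processor, items))


def aggregate(results):
    """Fan in: reduce the per-item results to one summary."""
    return {"count": len(results), "revenue": sum(r["total"] for r in results)}


orders = [{"id": i, "qty": i, "price": 10} for i in range(1, 6)]
processed = map_state(orders, process_order, max_concurrency=2)
summary = aggregate(processed)
print(summary)
```

MaxConcurrency plays the role of `max_workers` here: it protects downstream services from being hit by every item at once.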

Best Practices

Design for Idempotency

Tasks may be retried, so ensure operations are safe to repeat

Use ResultPath Wisely

Preserve input data while adding task results

Set Timeouts

Always set TimeoutSeconds to prevent stuck executions

Use Express for High Volume

Express workflows are much cheaper for short tasks

🎯 Interview Questions

Q: When would you use Step Functions instead of SQS + Lambda?
Step Functions:
  • Complex orchestration with branching
  • Need visibility into workflow state
  • Error handling with retries and fallbacks
  • Long-running workflows
SQS + Lambda:
  • Simple event processing
  • High-volume, independent tasks
  • Don't need orchestration
  • Cost-sensitive (cheaper at high scale)

Q: How do you handle long-running or externally completed tasks?
Options:
  1. Wait for Task Token: Pause execution, resume via callback
  2. Activity Tasks: Worker polls for tasks, reports completion
  3. Async Lambda: Start Lambda, use callback pattern
Example: Human approval, external system integration

Q: When would you choose Standard vs Express workflows?
Use Standard when:
  • Execution longer than 5 minutes
  • Need execution history for audit
  • Require exactly-once semantics
Use Express when:
  • High volume (over 1000/sec)
  • Short duration (under 5 min)
  • Cost-sensitive
  • At-least-once is acceptable

Next Module

AWS SAM

Build serverless applications with SAM