# AWS Step Functions

> Master serverless workflow orchestration with state machines

<Frame>
  <img src="https://mintcdn.com/devweeekends/sTu6A4whRFPJo0_g/images/aws/step-functions-architecture.svg?fit=max&auto=format&n=sTu6A4whRFPJo0_g&q=85&s=35447a89380880190e09926f32e7232c" alt="Step Functions Architecture" width="1080" height="1080" data-path="images/aws/step-functions-architecture.svg" />
</Frame>

## Module Overview

<Info>
  **Estimated Time**: 4-5 hours | **Difficulty**: Intermediate | **Prerequisites**: Lambda
</Info>

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows using visual state machines. Think of Step Functions as a recipe card for your cloud kitchen -- each step says "do this, then check the result, then decide what to do next." Without it, you would wire up Lambda-to-Lambda calls with SQS queues, build your own retry logic, and pray that error handling works. Step Functions gives you that orchestration layer with built-in retries, timeouts, and a visual execution history so you can see exactly where a workflow failed at 3 AM. This module covers workflow design patterns, error handling, and production best practices.

**What You'll Learn:**

* State machine concepts and design
* State types (Task, Choice, Parallel, Map)
* Error handling and retries
* Standard vs Express workflows
* Service integrations
* Workflow patterns for common use cases

***

## Why Step Functions?

<CardGroup cols={2}>
  <Card title="Visual Workflows" icon="diagram-project">
    Design and visualize complex business processes as state machines
  </Card>

  <Card title="Built-in Error Handling" icon="shield-check">
    Automatic retries, catch blocks, and compensation logic
  </Card>

  <Card title="200+ Integrations" icon="plug">
    Native integration with Lambda, DynamoDB, SQS, SNS, and more
  </Card>

  <Card title="Audit Trail" icon="clipboard-list">
    Complete execution history for debugging and compliance
  </Card>
</CardGroup>

***

## State Machine Concepts

```
┌────────────────────────────────────────────────────────────────────────┐
│                    State Machine Components                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   STATE MACHINE                                                         │
│   ─────────────                                                        │
│   • Collection of states that define workflow                          │
│   • Starts at the StartAt state; ends at a terminal state              │
│     ("End": true, Succeed, or Fail)                                    │
│   • Executes synchronously or asynchronously                           │
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                                                                │    │
│   │    ┌──────────┐                                               │    │
│   │    │  START   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │ Validate │ ──── Task State (Lambda)                      │    │
│   │    │  Input   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐     ┌──────────┐                              │    │
│   │    │  Valid?  │────►│ Reject   │ ──── Choice State            │    │
│   │    │  (yes)   │ no  │          │                              │    │
│   │    └────┬─────┘     └──────────┘                              │    │
│   │         │ yes                                                  │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │ Process  │ ──── Task State (DynamoDB)                    │    │
│   │    │  Order   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │   END    │                                               │    │
│   │    └──────────┘                                               │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```

***

## State Types

### Task State

Performs work by invoking an AWS service or activity.

Always set `TimeoutSeconds`. Without it, a stuck Lambda (for example, one waiting on a downstream service that never responds) holds the execution open indefinitely. The execution stays in the "Running" state the whole time and counts against your open executions quota (1M by default, but still finite).

```json theme={null}
{
  "ValidateOrder": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
    "InputPath": "$.order",
    "ResultPath": "$.validation",
    "OutputPath": "$",
    "TimeoutSeconds": 30,
    "Next": "CheckValidation"
  }
}
```

### Choice State

Branching logic based on input.

```json theme={null}
{
  "CheckValidation": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.validation.isValid",
        "BooleanEquals": true,
        "Next": "ProcessPayment"
      },
      {
        "Variable": "$.validation.errorCode",
        "StringEquals": "INSUFFICIENT_STOCK",
        "Next": "NotifyOutOfStock"
      }
    ],
    "Default": "RejectOrder"
  }
}
```
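The evaluation order matters here: rules are checked top to bottom, the first match wins, and `Default` is used only when nothing matches. The following is a minimal Python sketch of that behavior, not the Step Functions implementation. It supports only the two comparison operators used above, and `get_path` and `choose_next` are illustrative helper names.

```python
def get_path(doc, path):
    """Resolve a simple reference path such as "$" or "$.validation.isValid"."""
    node = doc
    if path != "$":
        for key in path[2:].split("."):
            node = node[key]  # a missing key raises KeyError in this sketch
    return node

# Only the operators used in the CheckValidation example are modeled
COMPARATORS = {
    "BooleanEquals": lambda actual, expected: isinstance(actual, bool) and actual == expected,
    "StringEquals": lambda actual, expected: isinstance(actual, str) and actual == expected,
}

def choose_next(choice_state, state_input):
    """Return the name of the next state: first matching rule, else Default."""
    for rule in choice_state["Choices"]:
        actual = get_path(state_input, rule["Variable"])
        for operator, compare in COMPARATORS.items():
            if operator in rule and compare(actual, rule[operator]):
                return rule["Next"]
    return choice_state["Default"]

check_validation = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.validation.isValid", "BooleanEquals": True, "Next": "ProcessPayment"},
        {"Variable": "$.validation.errorCode", "StringEquals": "INSUFFICIENT_STOCK", "Next": "NotifyOutOfStock"},
    ],
    "Default": "RejectOrder",
}

valid = choose_next(check_validation, {"validation": {"isValid": True}})
no_stock = choose_next(check_validation, {"validation": {"isValid": False, "errorCode": "INSUFFICIENT_STOCK"}})
other = choose_next(check_validation, {"validation": {"isValid": False, "errorCode": "BAD_ADDRESS"}})
```

Note that the valid order never reaches the second rule, so it does not need an `errorCode` field. An invalid order without `errorCode` would raise in this sketch; in a real state machine, guard optional fields with an `IsPresent` check.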

### Parallel State

Execute multiple branches simultaneously.

```json theme={null}
{
  "ProcessInParallel": {
    "Type": "Parallel",
    "Branches": [
      {
        "StartAt": "SendEmail",
        "States": {
          "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:send-email",
            "End": true
          }
        }
      },
      {
        "StartAt": "UpdateInventory",
        "States": {
          "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:update-inventory",
            "End": true
          }
        }
      }
    ],
    "Next": "FinalizeOrder"
  }
}
```
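The output shape of a Parallel state is easy to get wrong, so here is a minimal Python sketch of it: every branch receives a copy of the same input, and the state's output is an array of branch results in the order the branches are declared. `run_parallel` and the two branch functions are illustrative stand-ins, not part of any AWS SDK.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

def run_parallel(branches, state_input):
    """Run each branch on its own copy of the input; return results in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, copy.deepcopy(state_input)) for branch in branches]
        # result() re-raises a branch's exception, mirroring how one failed
        # branch fails the whole Parallel state
        return [future.result() for future in futures]

def send_email(order):
    return {"emailSent": True, "orderId": order["orderId"]}

def update_inventory(order):
    return {"inventoryUpdated": True, "orderId": order["orderId"]}

parallel_output = run_parallel([send_email, update_inventory], {"orderId": "123"})
```

In the example above, `FinalizeOrder` would therefore receive a two-element array rather than a merged object.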

### Map State

Iterate over an array and process each item.

```json theme={null}
{
  "ProcessLineItems": {
    "Type": "Map",
    "InputPath": "$.order.items",
    "ItemsPath": "$",
    "MaxConcurrency": 10,
    "ItemProcessor": {
      "StartAt": "ProcessItem",
      "States": {
        "ProcessItem": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-item",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedItems",
    "Next": "CalculateTotal"
  }
}
```
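As a rough mental model, a Map state behaves like a bounded worker pool: at most `MaxConcurrency` items are processed at once, and the output array keeps the order of the input array. The sketch below illustrates that with a thread pool; `run_map` and `process_item` are hypothetical names, and the real service also handles retries and per-item failures that are not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor

def run_map(items, process_item, max_concurrency=10):
    """Process items with bounded concurrency; results keep input order."""
    # Step Functions treats MaxConcurrency 0 as "no limit"; this sketch
    # requires a positive value because ThreadPoolExecutor does
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        return list(pool.map(process_item, items))

def process_item(item):
    return {"sku": item["sku"], "lineTotal": item["qty"] * item["price"]}

line_items = [
    {"sku": "A", "qty": 2, "price": 5},
    {"sku": "B", "qty": 1, "price": 20},
    {"sku": "C", "qty": 3, "price": 1},
]
processed_items = run_map(line_items, process_item, max_concurrency=2)
```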

### Wait State

Pause execution for a specified time.

```json theme={null}
{
  "WaitForDelivery": {
    "Type": "Wait",
    "Seconds": 3600,
    "Next": "CheckDeliveryStatus"
  },
  "WaitUntilShipDate": {
    "Type": "Wait",
    "TimestampPath": "$.order.shipDate",
    "Next": "StartShipping"
  }
}
```
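A `Timestamp` or `TimestampPath` value has to be an ISO 8601 string with an uppercase `T` separator and a `Z` suffix (or a numeric offset). A small sketch of producing such a value for `$.order.shipDate` from a timezone-aware Python `datetime`; `to_asl_timestamp` is an illustrative helper name:

```python
from datetime import datetime, timedelta, timezone

def to_asl_timestamp(moment):
    """Format a timezone-aware datetime as UTC, e.g. 2024-01-15T09:30:00Z."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

ship_date = to_asl_timestamp(datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc))
# The same instant expressed with a +01:00 offset converts to the same UTC string
ship_date_from_offset = to_asl_timestamp(
    datetime(2024, 1, 15, 10, 30, tzinfo=timezone(timedelta(hours=1)))
)
```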

### Other States

```json theme={null}
{
  "PassThrough": {
    "Type": "Pass",
    "Result": {"status": "processed"},
    "ResultPath": "$.result",
    "Next": "NextState"
  },
  "OrderFailed": {
    "Type": "Fail",
    "Cause": "Order processing failed",
    "Error": "OrderError"
  },
  "OrderComplete": {
    "Type": "Succeed"
  }
}
```

***

## Input/Output Processing

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Data Flow Through States                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   State Input (raw)                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},            │
│     "customer": {"id": "C001", "email": "..."}                         │
│   }                                                                     │
│         │                                                               │
│         │ InputPath: "$.order"                                         │
│         ▼                                                               │
│   Task Input                                                            │
│   {"id": "123", "items": [...], "total": 99.99}                        │
│         │                                                               │
│         │ Lambda executes                                               │
│         ▼                                                               │
│   Task Result                                                           │
│   {"validation": "success", "discount": 10.00}                         │
│         │                                                               │
│         │ ResultPath: "$.orderValidation"                              │
│         ▼                                                               │
│   State with Result                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},            │
│     "customer": {"id": "C001", "email": "..."},                        │
│     "orderValidation": {"validation": "success", "discount": 10.00}    │
│   }                                                                     │
│         │                                                               │
│         │ OutputPath: "$"                                              │
│         ▼                                                               │
│   State Output (passed to next state)                                   │
│   (same as above)                                                       │
│                                                                         │
│   Path Reference:                                                       │
│   InputPath:  What to send to task                                     │
│   ResultPath: Where to put task result (null = discard)                │
│   OutputPath: What to pass to next state                               │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
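The same flow can be traced in a few lines of Python. This is a simplified sketch that handles only dotted paths such as `$.order` (no array indexes, `Parameters`, or `ResultSelector`); `get_path`, `set_path`, and `run_state` are illustrative names rather than AWS APIs.

```python
import copy

def get_path(doc, path):
    """Resolve a simple reference path such as "$" or "$.order"."""
    node = doc
    if path != "$":
        for key in path[2:].split("."):
            node = node[key]
    return node

def set_path(doc, path, value):
    """Return a copy of doc with value written at path ("$" replaces doc)."""
    if path == "$":
        return value
    result = copy.deepcopy(doc)
    node = result
    keys = path[2:].split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result

def run_state(state_input, task, input_path="$", result_path="$", output_path="$"):
    task_input = get_path(state_input, input_path)   # InputPath selects the task input
    task_result = task(task_input)                   # the task runs
    if result_path is None:                          # null discards the result
        merged = state_input
    else:                                            # ResultPath merges into the raw input
        merged = set_path(state_input, result_path, task_result)
    return get_path(merged, output_path)             # OutputPath filters what moves on

state_input = {
    "order": {"id": "123", "total": 99.99},
    "customer": {"id": "C001"},
}

def validate(order):
    return {"validation": "success", "discount": 10.00}

output = run_state(
    state_input, validate,
    input_path="$.order",
    result_path="$.orderValidation",
)
```

With the default `result_path="$"`, the task result replaces the input entirely, which is the usual reason data "disappears" between states.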

### Intrinsic Functions

```json theme={null}
{
  "TransformData": {
    "Type": "Pass",
    "Parameters": {
      "orderId.$": "States.UUID()",
      "orderDate.$": "States.Format('Order placed at {}', $$.State.EnteredTime)",
      "itemCount.$": "States.ArrayLength($.items)",
      "fullName.$": "States.Format('{} {}', $.firstName, $.lastName)",
      "items.$": "States.ArrayPartition($.allItems, 10)",
      "jsonString.$": "States.JsonToString($.data)",
      "parsedJson.$": "States.StringToJson($.jsonString)"
    },
    "Next": "ProcessOrder"
  }
}
```
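Of these, `States.ArrayPartition` is probably the least self-explanatory: it splits an array into chunks of at most the given size, with a shorter final chunk if the length does not divide evenly. A rough Python equivalent, where `array_partition` is an illustrative name:

```python
def array_partition(items, chunk_size):
    """Split items into consecutive chunks of at most chunk_size elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

all_items = list(range(25))
batches = array_partition(all_items, 10)
batch_sizes = [len(batch) for batch in batches]
```

Chunking like this is a common way to prepare batches for a downstream Map state.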

***

## Error Handling

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Error Handling Pattern                               │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │ ProcessPayment                                                 │    │
│   │ ┌─────────────────────────────────────────────────────────┐   │    │
│   │ │ Lambda: process-payment                                  │   │    │
│   │ │                                                          │   │    │
│   │ │ Retry:                                                   │   │    │
│   │ │ • 3 attempts                                             │   │    │
│   │ │ • 1s → 2s → 4s (exponential backoff)                    │   │    │
│   │ │                                                          │   │    │
│   │ │ Catch:                                                   │   │    │
│   │ │ • PaymentDeclined → HandleDeclined                       │   │    │
│   │ │ • States.ALL → FallbackHandler                          │   │    │
│   │ └─────────────────────────────────────────────────────────┘   │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Error Types:                                                          │
│   ─────────────                                                        │
│   States.ALL          - Matches any error                              │
│   States.Timeout      - Task timed out                                 │
│   States.TaskFailed   - Lambda execution error                         │
│   States.Permissions  - Permission error                               │
│   Custom.PaymentFailed - Custom error from Lambda                      │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```

### Retry Configuration

```json theme={null}
{
  "ProcessPayment": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:process-payment",
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
        "MaxDelaySeconds": 30
      },
      {
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 5,
        "MaxAttempts": 2,
        "BackoffRate": 1.0
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["PaymentDeclined"],
        "ResultPath": "$.error",
        "Next": "HandleDeclinedPayment"
      },
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "NotifyFailure"
      }
    ],
    "Next": "ConfirmOrder"
  }
}
```
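To see what a retrier actually does with these fields, the wait before retry attempt `n` is `IntervalSeconds * BackoffRate^(n-1)`, capped at `MaxDelaySeconds` when it is set. The sketch below computes that schedule; `retry_delays` is an illustrative helper, and jitter is not modeled.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0, max_delay_seconds=None):
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays

# First retrier above: Lambda service errors
lambda_error_delays = retry_delays(1, 3, 2.0, 30)
# Second retrier above: States.Timeout with a flat 5-second interval
timeout_delays = retry_delays(5, 2, 1.0)
# With more attempts, the cap starts to matter
capped_delays = retry_delays(1, 7, 2.0, 30)
```

The first schedule works out to 1s, 2s, 4s, matching the diagram. Each retrier keeps its own attempt count, so the two retriers above do not share a budget.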

### Lambda Error Throwing

```python theme={null}
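# CardDeclinedError, InsufficientFundsError, and process_payment stand in for
# your payment client; they are placeholders, not part of any AWS SDK.
class CardDeclinedError(Exception):
    pass

class InsufficientFundsError(Exception):
    pass

# Step Functions matches ErrorEquals against the exception class name, so
# this name must equal the "PaymentDeclined" entry in the Catch block.
class PaymentDeclined(Exception):
    pass

class InsufficientFundsException(Exception):
    pass

def process_payment(event):
    # Placeholder: call the payment provider here
    raise NotImplementedError

def lambda_handler(event, context):
    try:
        return process_payment(event)

    except CardDeclinedError:
        # Surfaces in Step Functions as error "PaymentDeclined"
        raise PaymentDeclined("Card was declined")

    except InsufficientFundsError:
        # No dedicated Catch entry above, so the States.ALL catcher handles it
        raise InsufficientFundsException("Insufficient funds in account")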
```

***

## Standard vs Express Workflows

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Workflow Type Comparison                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Feature              │ Standard              │ Express               │
│   ─────────────────────┼──────────────────────┼────────────────────── │
│   Duration             │ Up to 1 year         │ Up to 5 minutes       │
│   Execution History    │ Stored (90 days)     │ CloudWatch Logs only  │
│   Start Rate           │ 2,000/sec            │ 100,000/sec           │
│   Pricing              │ Per state transition │ Per execution + dur.  │
│   Execution Semantics  │ Exactly-once         │ At-least-once         │
│   Execution Type       │ Async (default)      │ Sync or Async         │
│                                                                         │
│   When to Use Standard:                                                │
│   ─────────────────────                                                │
│   ✓ Long-running workflows (hours/days)                                │
│   ✓ Need execution history for audit                                   │
│   ✓ Require exactly-once semantics                                     │
│   ✓ Human approval workflows                                           │
│                                                                         │
│   When to Use Express:                                                  │
│   ────────────────────                                                 │
│   ✓ High-volume, short-duration workflows                              │
│   ✓ Event processing pipelines                                         │
│   ✓ API orchestration                                                  │
│   ✓ Cost-sensitive scenarios                                           │
│                                                                         │
│   Pricing Example (1M executions, 5 state transitions each):          │
│   Standard: 5M transitions x $0.000025 = $125                          │
│   Express:  1M x $1.00/M + duration charges = ~$10-20                  │
│                                                                         │
│   Cost mistake: Using Standard workflows for high-volume, short-lived  │
│   tasks (like API orchestration). A team processing 10M API requests/   │
│   month with 8 transitions each pays $2,000 on Standard vs ~$100 on    │
│   Express. Rule of thumb: if it finishes in under 5 minutes and you     │
│   can tolerate at-least-once semantics, use Express.                    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```
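The arithmetic behind these comparisons can be reproduced with a small calculator. The Standard and Express request rates are the ones quoted above. The Express duration rate (`0.00001667` USD per GB-second) and the 64 MB default memory are assumptions for illustration, so check the current pricing page before relying on the Express figure.

```python
STANDARD_PRICE_PER_TRANSITION = 0.000025    # USD, as quoted in the comparison
EXPRESS_PRICE_PER_MILLION_REQUESTS = 1.00   # USD, as quoted in the comparison
EXPRESS_PRICE_PER_GB_SECOND = 0.00001667    # USD, assumed rate for illustration

def standard_cost(executions, transitions_per_execution):
    """Standard workflows bill per state transition."""
    return executions * transitions_per_execution * STANDARD_PRICE_PER_TRANSITION

def express_cost(executions, avg_duration_seconds, memory_mb=64):
    """Express workflows bill per request plus duration weighted by memory."""
    request_cost = executions / 1_000_000 * EXPRESS_PRICE_PER_MILLION_REQUESTS
    gb_seconds = executions * avg_duration_seconds * (memory_mb / 1024)
    return request_cost + gb_seconds * EXPRESS_PRICE_PER_GB_SECOND

small_standard = standard_cost(1_000_000, 5)    # the 1M-execution example
large_standard = standard_cost(10_000_000, 8)   # the 10M requests/month example
express_requests_only = express_cost(1_000_000, avg_duration_seconds=0)
```

The Express total depends heavily on average duration and memory, which is why the comparison above gives it as a range rather than a single number.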

***

## Service Integrations

### Direct Service Integrations

Direct service integrations call an AWS service straight from the state machine, with no Lambda in between. The `PutItemInDynamoDB` state below calls DynamoDB directly from Step Functions, saving both the cost of a Lambda invocation (~$0.20 per million requests) and ~100ms of cold-start latency. Use direct integrations whenever the operation is a simple service call that doesn't need custom business logic.

```json theme={null}
{
  "Comment": "Direct integrations without Lambda",
  "StartAt": "PutItemInDynamoDB",
  "States": {
    "PutItemInDynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "order_id": {"S.$": "$.orderId"},
          "status": {"S": "CREATED"},
          "created_at": {"S.$": "$$.State.EnteredTime"}
        }
      },
      "ResultPath": null,
      "Next": "SendNotification"
    },

    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-notifications",
        "Message.$": "States.Format('Order {} created', $.orderId)"
      },
      "ResultPath": null,
      "Next": "AddToQueue"
    },

    "AddToQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        "MessageBody.$": "States.JsonToString($.order)"
      },
      "ResultPath": null,
      "Next": "InvokeLambda"
    },

    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "process-order",
        "Payload.$": "$"
      },
      "ResultPath": null,
      "Next": "StartECSTask"
    },

    "StartECSTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "my-cluster",
        "TaskDefinition": "process-batch",
        "LaunchType": "FARGATE"
      },
      "Next": "Done"
    },

    "Done": {
      "Type": "Succeed"
    }
  }
}
```

Each state sets `"ResultPath": null` so the service response is discarded and the original input (including `$.orderId` and `$.order`) is still available to the next state.

### Wait for Callback Pattern

This is one of the most powerful Step Functions patterns. The workflow pauses and waits for an external system (a human, a webhook, a third-party API) to call back with a result. The `.waitForTaskToken` suffix pauses the task until an external system calls `SendTaskSuccess` or `SendTaskFailure` with the token. The paused execution consumes no compute: you pay only for the state transition, not the wait time. Real-world use cases include manager approval for expenses, waiting for a payment processor webhook, or pausing until a manual QA review is complete.

Common mistake: not setting a timeout on callback tasks. Without `TimeoutSeconds`, the execution keeps waiting if the callback never arrives (for example, the approver ignores the email). It counts against your open executions quota the whole time and eventually fails when it reaches the maximum execution duration. Always set a reasonable timeout with a Catch block for `States.Timeout` that sends a reminder or auto-rejects.

```json theme={null}
{
  "WaitForHumanApproval": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "send-approval-request",
      "Payload": {
        "orderId.$": "$.orderId",
        "approver": "manager@company.com",
        "taskToken.$": "$$.Task.Token"
      }
    },
    "TimeoutSeconds": 86400,
    "Next": "ProcessApprovedOrder"
  }
}
```

```python theme={null}
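import json

import boto3

# Clients are created once per Lambda container and reused across invocations
approvals_table = boto3.resource('dynamodb').Table('PendingApprovals')
ses = boto3.client('ses')
stepfunctions = boto3.client('stepfunctions')

# Lambda that sends approval request
def send_approval_request(event, context):
    task_token = event['taskToken']
    order_id = event['orderId']

    # Store token for later callback
    approvals_table.put_item(
        Item={
            'order_id': order_id,
            'task_token': task_token,
            'status': 'PENDING'
        }
    )

    # Send email with approval link. The query parameter names must match
    # what handle_approval reads below.
    ses.send_email(
        Source='approvals@company.com',  # placeholder: use an SES-verified sender
        Destination={'ToAddresses': [event['approver']]},
        Message={
            'Subject': {'Data': f'Approval needed for order {order_id}'},
            'Body': {'Text': {'Data': (
                'Click to approve: https://api.../approve'
                f'?order_id={order_id}&action=approve'
            )}}
        }
    )

# API endpoint that handles approval
def handle_approval(event, context):
    params = event['queryStringParameters']
    order_id = params['order_id']
    action = params['action']  # approve/reject

    # Get stored token
    item = approvals_table.get_item(Key={'order_id': order_id})
    task_token = item['Item']['task_token']

    # Resume Step Function
    if action == 'approve':
        stepfunctions.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        stepfunctions.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    # Response for the API caller
    return {'statusCode': 200, 'body': json.dumps({'order_id': order_id, 'action': action})}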
```

***

## Common Workflow Patterns

### Saga Pattern (Distributed Transaction)

```
┌────────────────────────────────────────────────────────────────────────┐
│                    Saga Pattern                                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Each step has a compensating action for rollback:                    │
│                                                                         │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                      │
│   │  Reserve  │───►│  Charge   │───►│  Ship     │───► Success          │
│   │  Stock    │    │  Payment  │    │  Order    │                      │
│   └─────┬─────┘    └─────┬─────┘    └─────┬─────┘                      │
│         │                │                │                            │
│         │ Fail           │ Fail           │ Fail                       │
│         ▼                ▼                ▼                            │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                      │
│   │  Cancel   │◄───│  Refund   │◄───│  Cancel   │                      │
│   │  Reserve  │    │  Payment  │    │  Shipment │                      │
│   └───────────┘    └───────────┘    └───────────┘                      │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
```

```json theme={null}
{
  "StartAt": "ReserveStock",
  "States": {
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserve-stock",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "OrderFailed"
      }],
      "Next": "ChargePayment"
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:charge-payment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CancelReservation"
      }],
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ship-order",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "RefundPayment"
      }],
      "Next": "OrderComplete"
    },
    "CancelReservation": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:cancel-reservation",
      "Next": "OrderFailed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:refund-payment",
      "Next": "CancelReservation"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be completed"
    }
  }
}
```
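The state machine above hard-codes its rollback chain: a `ShipOrder` failure routes to `RefundPayment`, then `CancelReservation`. The general rule behind it is that completed steps are compensated in reverse order. A minimal Python sketch of that rule follows; `run_saga` and the stub step functions are illustrative only and do not call AWS.

```python
def run_saga(steps):
    """Run (name, action, compensate) steps; on failure, undo completed steps in reverse."""
    completed = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Only steps that finished successfully are compensated
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        completed.append((name, compensate))
        log.append(f"done:{name}")
    return True, log

def noop():
    pass

def fail_shipping():
    raise RuntimeError("carrier unavailable")

steps = [
    ("ReserveStock", noop, noop),
    ("ChargePayment", noop, noop),
    ("ShipOrder", fail_shipping, noop),
]
ok, log = run_saga(steps)
```

Compensating actions can fail too, so in a real workflow they usually get their own `Retry` configuration and must be idempotent.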

### Fan-Out/Fan-In Pattern

```json theme={null}
{
  "ProcessAllOrders": {
    "Type": "Map",
    "InputPath": "$.orders",
    "MaxConcurrency": 50,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "EXPRESS"
      },
      "StartAt": "ProcessOrder",
      "States": {
        "ProcessOrder": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-order",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedOrders",
    "Next": "AggregateResults"
  }
}
```

***

## Best Practices

<CardGroup cols={2}>
  <Card title="Design for Idempotency" icon="shield-check">
    Tasks may retry, so make sure operations are safe to repeat
  </Card>

  <Card title="Use ResultPath Wisely" icon="route">
    Preserve input data while adding task results
  </Card>

  <Card title="Set Timeouts" icon="clock">
    Always set TimeoutSeconds to prevent stuck executions
  </Card>

  <Card title="Use Express for High Volume" icon="bolt">
    Express workflows are much cheaper for short tasks
  </Card>
</CardGroup>

***

## 🎯 Interview Questions

<AccordionGroup>
  <Accordion title="Q1: When to use Step Functions vs SQS + Lambda?">
    **Step Functions:**

    * Complex orchestration with branching
    * Need visibility into workflow state
    * Error handling with retries and fallbacks
    * Long-running workflows

    **SQS + Lambda:**

    * Simple event processing
    * High-volume, independent tasks
    * Don't need orchestration
    * Cost-sensitive (cheaper at high scale)
  </Accordion>

  <Accordion title="Q2: How to handle long-running operations?">
    **Options:**

    1. **Wait for Task Token**: Pause execution, resume via callback
    2. **Activity Tasks**: Worker polls for tasks, reports completion
    3. **Async Lambda**: Start Lambda, use callback pattern

    **Example**: Human approval, external system integration
  </Accordion>

  <Accordion title="Q3: Standard vs Express - how to choose?">
    **Use Standard when:**

    * Execution longer than 5 minutes
    * Need execution history for audit
    * Require exactly-once semantics

    **Use Express when:**

    * High volume (over 1000/sec)
    * Short duration (under 5 min)
    * Cost-sensitive
    * At-least-once is acceptable
  </Accordion>
</AccordionGroup>

***

## Next Module

<Card title="AWS SAM" icon="box" href="/aws/sam">
  Build serverless applications with SAM
</Card>
