AWS Step Functions - Dev Weekends

Module Overview

Estimated Time: 4-5 hours | Difficulty: Intermediate | Prerequisites: Lambda

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows using visual state machines. This module covers workflow design patterns, error handling, and production best practices.

What You’ll Learn:

State machine concepts and design
State types (Task, Choice, Parallel, Map)
Error handling and retries
Standard vs Express workflows
Service integrations
Workflow patterns for common use cases

Why Step Functions?

Visual Workflows

Design and visualize complex business processes as state machines

Built-in Error Handling

Automatic retries, catch blocks, and compensation logic

200+ Integrations

Native integration with Lambda, DynamoDB, SQS, SNS, and more

Audit Trail

Complete execution history for debugging and compliance

State Machine Concepts

┌────────────────────────────────────────────────────────────────────────┐
│                    State Machine Components                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   STATE MACHINE                                                         │
│   ─────────────                                                        │
│   • Collection of states that define a workflow                         │
│   • Starts at StartAt; ends at an "End": true, Succeed, or Fail state   │
│   • Standard runs async; Express can also run sync                      │
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                                                                │    │
│   │    ┌──────────┐                                               │    │
│   │    │  START   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │ Validate │ ──── Task State (Lambda)                      │    │
│   │    │  Input   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐     ┌──────────┐                              │    │
│   │    │  Valid?  │────►│ Reject   │ ──── Choice State            │    │
│   │    │  (yes)   │ no  │          │                              │    │
│   │    └────┬─────┘     └──────────┘                              │    │
│   │         │ yes                                                  │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │ Process  │ ──── Task State (DynamoDB)                    │    │
│   │    │  Order   │                                               │    │
│   │    └────┬─────┘                                               │    │
│   │         │                                                      │    │
│   │         ▼                                                      │    │
│   │    ┌──────────┐                                               │    │
│   │    │   END    │                                               │    │
│   │    └──────────┘                                               │    │
│   │                                                                │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

State Types

Task State

Performs work by invoking an AWS service or activity.

{
  "ValidateOrder": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate-order",
    "InputPath": "$.order",
    "ResultPath": "$.validation",
    "OutputPath": "$",
    "TimeoutSeconds": 30,
    "Next": "CheckValidation"
  }
}
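Here is a minimal sketch of what the validate-order function might do, to make the Task state concrete. Because of "InputPath": "$.order", the handler receives only the order object. Its return value is written to $.validation, where the Choice state reads isValid and errorCode. The STOCK table and the item fields (sku, quantity) are hypothetical.

```python
# Hypothetical inventory snapshot; a real function would query a datastore.
STOCK = {"SKU-1": 5, "SKU-2": 0}


def lambda_handler(event, context):
    """Validate an order object (the state input after InputPath "$.order").

    The returned dict lands at $.validation because of ResultPath, so it
    always has isValid, plus an errorCode whenever isValid is False.
    """
    items = event.get("items", [])
    if not items:
        return {"isValid": False, "errorCode": "EMPTY_ORDER"}

    # Items whose requested quantity exceeds what is in stock
    short = [item["sku"] for item in items
             if item["quantity"] > STOCK.get(item["sku"], 0)]
    if short:
        return {"isValid": False, "errorCode": "INSUFFICIENT_STOCK", "skus": short}

    return {"isValid": True}
```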

Choice State

Branching logic based on input.

{
  "CheckValidation": {
    "Type": "Choice",
    "Choices": [
      {
        "Variable": "$.validation.isValid",
        "BooleanEquals": true,
        "Next": "ProcessPayment"
      },
      {
        "Variable": "$.validation.errorCode",
        "StringEquals": "INSUFFICIENT_STOCK",
        "Next": "NotifyOutOfStock"
      }
    ],
    "Default": "RejectOrder"
  }
}
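Rules are checked in order and the first match wins. If none matches, Default is taken; without a Default, the state fails with States.NoChoiceMatched. One common mistake: if isValid is false and the result has no errorCode field, the second rule references a missing path. The execution then fails with States.Runtime unless that rule is guarded with IsPresent. The sketch below is a simplified Python model of this evaluation and supports only the two comparison operators used here.

```python
class StatesRuntimeError(Exception):
    """Stands in for the States.Runtime error raised when a path is missing."""


def resolve(data, path):
    """Resolve a dotted JSONPath such as '$.validation.isValid'."""
    node = data
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            raise StatesRuntimeError(f"{path} not found in state input")
        node = node[key]
    return node


def evaluate_choice(state_input, choices, default):
    """Return the Next state of the first matching rule, else the default."""
    for rule in choices:
        value = resolve(state_input, rule["Variable"])
        if "BooleanEquals" in rule:
            if isinstance(value, bool) and value == rule["BooleanEquals"]:
                return rule["Next"]
        elif "StringEquals" in rule:
            if isinstance(value, str) and value == rule["StringEquals"]:
                return rule["Next"]
    return default


# The Choices array from the CheckValidation state above
CHOICES = [
    {"Variable": "$.validation.isValid", "BooleanEquals": True,
     "Next": "ProcessPayment"},
    {"Variable": "$.validation.errorCode", "StringEquals": "INSUFFICIENT_STOCK",
     "Next": "NotifyOutOfStock"},
]
```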

Parallel State

Execute multiple branches simultaneously.

{
  "ProcessInParallel": {
    "Type": "Parallel",
    "Branches": [
      {
        "StartAt": "SendEmail",
        "States": {
          "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:send-email",
            "End": true
          }
        }
      },
      {
        "StartAt": "UpdateInventory",
        "States": {
          "UpdateInventory": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:...:update-inventory",
            "End": true
          }
        }
      }
    ],
    "Next": "FinalizeOrder"
  }
}
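Every branch receives the same state input. The Parallel state's output is an array with one element per branch, in the order the branches are declared. Below is a rough Python model of that behaviour. The send_email and update_inventory functions are hypothetical stand-ins for the Lambda tasks.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_parallel(state_input, branches):
    """Run every branch on its own copy of the same input.

    The result lists one entry per branch in declaration order, no matter
    which branch finishes first. If a branch raises, result() re-raises it
    here. The real service also stops the other branches; this sketch
    lets them run to completion.
    """
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, copy.deepcopy(state_input))
                   for branch in branches]
        return [future.result() for future in futures]


# Hypothetical branch bodies standing in for the Lambda functions
def send_email(order):
    return {"emailSentFor": order["orderId"]}


def update_inventory(order):
    return {"inventoryUpdated": len(order["items"])}
```

FinalizeOrder receives this array rather than the order itself. If later states still need the original input, add a ResultPath to the Parallel state (for example the hypothetical "$.parallelResults").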

Map State

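Iterate over an array and run the same steps for each item. Iterator is the older name of the ItemProcessor field used in the Fan-Out/Fan-In example later in this module; inline Map states accept either.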

{
  "ProcessLineItems": {
    "Type": "Map",
    "InputPath": "$.order.items",
    "ItemsPath": "$",
    "MaxConcurrency": 10,
    "Iterator": {
      "StartAt": "ProcessItem",
      "States": {
        "ProcessItem": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-item",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedItems",
    "Next": "CalculateTotal"
  }
}
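The Map state runs its iterator once per array element. It keeps at most MaxConcurrency iterations in flight (0, the default, means no explicit limit) and returns results in input order. The sketch below is a small Python model of those two properties. It uses a thread pool in place of the service's own scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def run_map(items, process_item, max_concurrency=0):
    """Apply process_item to every item with bounded concurrency.

    max_concurrency=0 means "no explicit limit", as in the MaxConcurrency
    field; ThreadPoolExecutor needs at least one worker, hence max(..., 1).
    pool.map returns results in the same order as the input items.
    """
    workers = max_concurrency or max(len(items), 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_item, items))
```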

Wait State

Pause execution for a fixed number of seconds (Seconds or SecondsPath) or until a specific time (Timestamp or TimestampPath).

{
  "WaitForDelivery": {
    "Type": "Wait",
    "Seconds": 3600,
    "Next": "CheckDeliveryStatus"
  },
  "WaitUntilShipDate": {
    "Type": "Wait",
    "TimestampPath": "$.order.shipDate",
    "Next": "StartShipping"
  }
}
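TimestampPath reads a timestamp from the state input. That value must be an ISO 8601 (RFC 3339 profile) string with an uppercase T and, for UTC, an uppercase Z, such as "2024-01-03T09:30:00Z". Here is a minimal sketch of how an earlier, hypothetical step could produce $.order.shipDate in that shape.

```python
from datetime import datetime, timedelta, timezone


def ship_date_timestamp(order_time: datetime, days_until_ship: int) -> str:
    """Return a UTC timestamp string that a Wait state can read via TimestampPath.

    order_time must be timezone-aware; it is converted to UTC so the
    result can carry the 'Z' suffix.
    """
    if order_time.tzinfo is None:
        raise ValueError("order_time must be timezone-aware")
    ship = order_time.astimezone(timezone.utc) + timedelta(days=days_until_ship)
    return ship.strftime("%Y-%m-%dT%H:%M:%SZ")
```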

Other States

{
  "PassThrough": {
    "Type": "Pass",
    "Result": {"status": "processed"},
    "ResultPath": "$.result",
    "Next": "NextState"
  },
  "OrderFailed": {
    "Type": "Fail",
    "Cause": "Order processing failed",
    "Error": "OrderError"
  },
  "OrderComplete": {
    "Type": "Succeed"
  }
}

Input/Output Processing

┌────────────────────────────────────────────────────────────────────────┐
│                    Data Flow Through States                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   State Input (raw)                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},            │
│     "customer": {"id": "C001", "email": "..."}                         │
│   }                                                                     │
│         │                                                               │
│         │ InputPath: "$.order"                                         │
│         ▼                                                               │
│   Task Input                                                            │
│   {"id": "123", "items": [...], "total": 99.99}                        │
│         │                                                               │
│         │ Lambda executes                                               │
│         ▼                                                               │
│   Task Result                                                           │
│   {"validation": "success", "discount": 10.00}                         │
│         │                                                               │
│         │ ResultPath: "$.orderValidation"                              │
│         ▼                                                               │
│   State with Result                                                     │
│   {                                                                     │
│     "order": {"id": "123", "items": [...], "total": 99.99},            │
│     "customer": {"id": "C001", "email": "..."},                        │
│     "orderValidation": {"validation": "success", "discount": 10.00}    │
│   }                                                                     │
│         │                                                               │
│         │ OutputPath: "$"                                              │
│         ▼                                                               │
│   State Output (passed to next state)                                   │
│   (same as above)                                                       │
│                                                                         │
│   Path Reference:                                                       │
│   InputPath:  What to send to task                                     │
│   ResultPath: Where to put task result (null = discard)                │
│   OutputPath: What to pass to next state                               │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
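The sketch below models the same three filters in Python. It handles only "$" and dotted paths, with no array indexes and no Parameters or ResultSelector. It is meant for reasoning about data flow, not as a reimplementation of the service.

```python
import copy


def read_path(data, path):
    """Resolve '$' or a dotted path such as '$.order' (no array indexes)."""
    if path == "$":
        return data
    node = data
    for key in path[2:].split("."):
        node = node[key]
    return node


def write_path(data, path, value):
    """Return a copy of data with value placed at path (ResultPath rules)."""
    if path == "$":
        return value                      # default: the result replaces the input
    result = copy.deepcopy(data)
    node = result
    keys = path[2:].split(".")
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


def run_task_state(state_input, task, input_path="$", result_path="$",
                   output_path="$"):
    task_input = read_path(state_input, input_path)        # InputPath
    task_result = task(task_input)                         # the task runs
    if result_path is None:                                # ResultPath: null
        combined = state_input                             # result discarded
    else:
        combined = write_path(state_input, result_path, task_result)
    return read_path(combined, output_path)                # OutputPath
```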

Intrinsic Functions

{
  "TransformData": {
    "Type": "Pass",
    "Parameters": {
      "orderId.$": "States.UUID()",
      "orderDate.$": "States.Format('Order placed at {}', $$.State.EnteredTime)",
      "itemCount.$": "States.ArrayLength($.items)",
      "fullName.$": "States.Format('{} {}', $.firstName, $.lastName)",
      "items.$": "States.ArrayPartition($.allItems, 10)",
      "jsonString.$": "States.JsonToString($.data)",
      "parsedJson.$": "States.StringToJson($.jsonString)"
    },
    "Next": "ProcessOrder"
  }
}
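For readers who think in code, here are rough Python counterparts of a few of these functions (States.ArrayLength is simply len()). They approximate the documented behaviour for strings and numbers. They skip details such as the escape characters in States.Format.

```python
import uuid


def array_partition(items, size):
    """Like States.ArrayPartition: consecutive chunks of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def states_format(template, *args):
    """Like States.Format: fill each '{}' placeholder in order (no escaping)."""
    pieces = template.split("{}")
    if len(pieces) - 1 != len(args):
        raise ValueError("placeholder count must match argument count")
    out = pieces[0]
    for arg, piece in zip(args, pieces[1:]):
        out += str(arg) + piece
    return out


def states_uuid():
    """Like States.UUID: a random version 4 UUID as a string."""
    return str(uuid.uuid4())
```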

Error Handling

┌────────────────────────────────────────────────────────────────────────┐
│                    Error Handling Pattern                               │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │ ProcessPayment                                                 │    │
│   │ ┌─────────────────────────────────────────────────────────┐   │    │
│   │ │ Lambda: process-payment                                  │   │    │
│   │ │                                                          │   │    │
│   │ │ Retry:                                                   │   │    │
│   │ │ • 3 retries                                              │   │    │
│   │ │ • 1s → 2s → 4s (exponential backoff)                    │   │    │
│   │ │                                                          │   │    │
│   │ │ Catch:                                                   │   │    │
│   │ │ • PaymentDeclined → HandleDeclined                       │   │    │
│   │ │ • States.ALL → FallbackHandler                          │   │    │
│   │ └─────────────────────────────────────────────────────────┘   │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                         │
│   Error Types:                                                          │
│   ─────────────                                                        │
│   States.ALL          - Matches any error                              │
│   States.Timeout      - Task timed out                                 │
│   States.TaskFailed   - Any task failure except States.Timeout         │
│   States.Permissions  - Permission error                               │
│   PaymentDeclined     - Custom error (Lambda exception class name)     │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
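Catchers are evaluated in order, and only after the Retry rules are exhausted or none of them matches. If no catcher matches either, the execution fails with that error. States.ALL must appear alone in its ErrorEquals array and in the last catcher. The sketch below is a simplified model of the matching. It uses the Catch list from the ProcessPayment example in this module.

```python
def find_catcher(error_name, catchers):
    """Return the Next state of the first matching catcher, or None.

    Simplified model: catchers are tried in order, and any ErrorEquals
    entry in a catcher can match. States.ALL matches every error,
    States.TaskFailed matches everything except States.Timeout, and any
    other entry must equal the error name exactly.
    """
    for catcher in catchers:
        for pattern in catcher["ErrorEquals"]:
            if (pattern == "States.ALL"
                    or pattern == error_name
                    or (pattern == "States.TaskFailed"
                        and error_name != "States.Timeout")):
                return catcher["Next"]
    return None


# The Catch list from the ProcessPayment example
CATCHERS = [
    {"ErrorEquals": ["PaymentDeclined"], "Next": "HandleDeclinedPayment"},
    {"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"},
]
```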

Retry Configuration

{
  "ProcessPayment": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:...:process-payment",
    "TimeoutSeconds": 30,
    "Retry": [
      {
        "ErrorEquals": ["Lambda.ServiceException", "Lambda.TooManyRequestsException"],
        "IntervalSeconds": 1,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
        "MaxDelaySeconds": 30
      },
      {
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 5,
        "MaxAttempts": 2,
        "BackoffRate": 1.0
      }
    ],
    "Catch": [
      {
        "ErrorEquals": ["PaymentDeclined"],
        "ResultPath": "$.error",
        "Next": "HandleDeclinedPayment"
      },
      {
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "NotifyFailure"
      }
    ],
    "Next": "ConfirmOrder"
  }
}
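MaxAttempts counts retries after the first attempt, so the first retrier above can run the task up to four times in total. The wait before each retry is IntervalSeconds × BackoffRate^n, capped by MaxDelaySeconds when it is set. The helper below lists those waits. It does not model optional jitter.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0,
                 max_delay_seconds=None):
    """Wait before each retry: interval * backoff_rate ** n, optionally capped.

    Defaults follow the Retry field defaults (IntervalSeconds 1,
    MaxAttempts 3, BackoffRate 2.0).
    """
    delays = []
    for attempt in range(max_attempts):
        delay = interval_seconds * backoff_rate ** attempt
        if max_delay_seconds is not None:
            delay = min(delay, max_delay_seconds)
        delays.append(delay)
    return delays
```

For example, retry_delays(1, 3, 2.0, 30) matches the 1s, 2s, 4s sequence in the diagram. With more attempts, the cap keeps later waits at 30 seconds.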

Lambda Error Throwing

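Step Functions uses the Python exception class name as the error name. The class raised for a declined card must therefore be named PaymentDeclined to match the ErrorEquals ["PaymentDeclined"] catcher above.

class PaymentDeclined(Exception):
    """Error name seen by Step Functions: "PaymentDeclined"."""

class InsufficientFunds(Exception):
    """Error name seen by Step Functions: "InsufficientFunds"."""

# Hypothetical errors raised by the payment provider's client library
class CardDeclinedError(Exception):
    pass

class InsufficientFundsError(Exception):
    pass

def process_payment(event):
    # Placeholder for the real payment-provider call
    return {'status': 'CHARGED', 'orderId': event['orderId']}

def lambda_handler(event, context):
    try:
        return process_payment(event)
    except CardDeclinedError:
        # Matched by the Catch rule ErrorEquals ["PaymentDeclined"]
        raise PaymentDeclined("Card was declined")
    except InsufficientFundsError:
        # Not listed explicitly, so it falls through to States.ALL
        raise InsufficientFunds("Insufficient funds in account")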

Standard vs Express Workflows

┌────────────────────────────────────────────────────────────────────────┐
│                    Workflow Type Comparison                             │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Feature              │ Standard              │ Express               │
│   ─────────────────────┼──────────────────────┼────────────────────── │
│   Duration             │ Up to 1 year         │ Up to 5 minutes       │
│   Execution History    │ Stored (90 days)     │ CloudWatch Logs only  │
│   Start Rate           │ 2,000/sec            │ 100,000/sec           │
│   Pricing              │ Per state transition │ Per execution + dur.  │
│   Execution semantics  │ Exactly-once         │ At-least-once (async) │
│                        │                      │ At-most-once (sync)   │
│   Execution Type       │ Async only           │ Sync or Async         │
│                                                                         │
│   When to Use Standard:                                                │
│   ─────────────────────                                                │
│   ✓ Long-running workflows (hours/days)                                │
│   ✓ Need execution history for audit                                   │
│   ✓ Require exactly-once semantics                                     │
│   ✓ Human approval workflows                                           │
│                                                                         │
│   When to Use Express:                                                  │
│   ────────────────────                                                 │
│   ✓ High-volume, short-duration workflows                              │
│   ✓ Event processing pipelines                                         │
│   ✓ API orchestration                                                  │
│   ✓ Cost-sensitive scenarios                                           │
│                                                                         │
│   Pricing Example (1M executions, 5 state transitions each):          │
│   Standard: 5M transitions × $0.000025 = $125                          │
│   Express:  1M × $1.00/M + duration charges = ~$10-20                  │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘
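You can script the pricing arithmetic to compare both types for your own workload. This sketch takes the per-transition and per-request rates from the example above. The Express duration rate is left as a parameter. The billing granularity (100 ms, 64 MB) is an assumption to verify against the current AWS price list.

```python
import math

# Rates from the pricing example above; verify against current AWS pricing.
STANDARD_PRICE_PER_TRANSITION = 0.000025        # $0.025 per 1,000 transitions
EXPRESS_PRICE_PER_REQUEST = 1.00 / 1_000_000    # $1.00 per 1M executions


def standard_cost(executions, transitions_per_execution):
    """Standard: billed per state transition (free tier ignored)."""
    return executions * transitions_per_execution * STANDARD_PRICE_PER_TRANSITION


def express_cost(executions, avg_duration_s, memory_mb, price_per_gb_second):
    """Express: per-execution charge plus duration x memory.

    Assumes duration is billed in 100 ms increments and memory in 64 MB
    increments; price_per_gb_second must come from the current price list.
    round() guards against float noise such as 0.3 * 10 == 3.0000000000000004.
    """
    billed_seconds = math.ceil(round(avg_duration_s * 10, 6)) / 10
    billed_gb = math.ceil(memory_mb / 64) * 64 / 1024
    duration_cost = executions * billed_seconds * billed_gb * price_per_gb_second
    return executions * EXPRESS_PRICE_PER_REQUEST + duration_cost
```

Plug in your own average duration, memory size, and the published GB-second rate. The result shows how much the rough Express estimate above depends on those inputs.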

Service Integrations

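Direct Service Integrations

Task states can call many AWS APIs directly through arn:aws:states:::service:action resources, so no Lambda glue code is needed. The states below are independent examples rather than one complete state machine. Two details matter. The .sync suffix (as in ecs:runTask.sync) makes the state wait until the job finishes. The lambda:invoke integration returns the function's result inside a Payload field, along with invocation metadata.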

{
  "Comment": "Direct integrations without Lambda",
  "States": {
    "PutItemInDynamoDB": {
      "Type": "Task",
      "Resource": "arn:aws:states:::dynamodb:putItem",
      "Parameters": {
        "TableName": "Orders",
        "Item": {
          "order_id": {"S.$": "$.orderId"},
          "status": {"S": "CREATED"},
          "created_at": {"S.$": "$$.State.EnteredTime"}
        }
      },
      "ResultPath": null,
      "Next": "SendNotification"
    },
    
    "SendNotification": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:123456789012:order-notifications",
        "Message.$": "States.Format('Order {} created', $.orderId)"
      },
      "ResultPath": null,
      "Next": "AddToQueue"
    },
    
    "AddToQueue": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sqs:sendMessage",
      "Parameters": {
        "QueueUrl": "https://sqs.us-east-1.amazonaws.com/123456789012/orders",
        "MessageBody.$": "States.JsonToString($.order)"
      },
      "Next": "Done"
    },
    
    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "process-order",
        "Payload.$": "$"
      },
      "Next": "Done"
    },
    
    "StartECSTask": {
      "Type": "Task",
      "Resource": "arn:aws:states:::ecs:runTask.sync",
      "Parameters": {
        "Cluster": "my-cluster",
        "TaskDefinition": "process-batch",
        "LaunchType": "FARGATE",
        "NetworkConfiguration": {
          "AwsvpcConfiguration": {
            "Subnets": ["subnet-12345678"]
          }
        }
      },
      "Next": "Done"
    }
  }
}

Wait for Callback Pattern

{
  "WaitForHumanApproval": {
    "Type": "Task",
    "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
    "Parameters": {
      "FunctionName": "send-approval-request",
      "Payload": {
        "orderId.$": "$.orderId",
        "approver": "manager@company.com",
        "taskToken.$": "$$.Task.Token"
      }
    },
    "TimeoutSeconds": 86400,
    "Next": "ProcessApprovedOrder"
  }
}

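import json
import boto3

dynamodb = boto3.client('dynamodb')
ses = boto3.client('ses')
stepfunctions = boto3.client('stepfunctions')

SENDER = 'approvals@company.com'  # hypothetical SES-verified sender address

# Lambda that sends the approval request (invoked via .waitForTaskToken)
def send_approval_request(event, context):
    task_token = event['taskToken']
    order_id = event['orderId']

    # Store the token so the approval endpoint can resume the execution later
    dynamodb.put_item(
        TableName='PendingApprovals',
        Item={
            'order_id': {'S': order_id},
            'task_token': {'S': task_token},
            'status': {'S': 'PENDING'}
        }
    )

    # Email approve/reject links; the execution stays paused until a callback
    approve_url = f'https://api.../approve?order_id={order_id}&action=approve'
    reject_url = f'https://api.../approve?order_id={order_id}&action=reject'
    ses.send_email(
        Source=SENDER,
        Destination={'ToAddresses': [event['approver']]},
        Message={
            'Subject': {'Data': f'Approval needed for order {order_id}'},
            'Body': {'Text': {'Data': f'Approve: {approve_url}\nReject: {reject_url}'}}
        }
    )

# API endpoint (API Gateway proxy integration) that handles the approval click
def handle_approval(event, context):
    params = event['queryStringParameters']
    order_id = params['order_id']
    action = params['action']  # 'approve' or 'reject'

    # Look up the stored task token
    item = dynamodb.get_item(
        TableName='PendingApprovals',
        Key={'order_id': {'S': order_id}}
    )
    task_token = item['Item']['task_token']['S']

    # Resume the paused Step Functions execution
    if action == 'approve':
        stepfunctions.send_task_success(
            taskToken=task_token,
            output=json.dumps({'approved': True})
        )
    else:
        stepfunctions.send_task_failure(
            taskToken=task_token,
            error='Rejected',
            cause='Manager rejected the order'
        )

    return {'statusCode': 200, 'body': json.dumps({'orderId': order_id, 'action': action})}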

Common Workflow Patterns

Saga Pattern (Distributed Transaction)

┌────────────────────────────────────────────────────────────────────────┐
│                    Saga Pattern                                         │
├────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   Each step has a compensating action for rollback:                    │
│                                                                         │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                      │
│   │  Reserve  │───►│  Charge   │───►│  Ship     │───► Success          │
│   │  Stock    │    │  Payment  │    │  Order    │                      │
│   └─────┬─────┘    └─────┬─────┘    └─────┬─────┘                      │
│         │                │                │                            │
│         │ Fail           │ Fail           │ Fail                       │
│         ▼                ▼                ▼                            │
│   ┌───────────┐    ┌───────────┐    ┌───────────┐                      │
│   │  Cancel   │◄───│  Refund   │◄───│  Cancel   │                      │
│   │  Reserve  │    │  Payment  │    │  Shipment │                      │
│   └───────────┘    └───────────┘    └───────────┘                      │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

{
  "StartAt": "ReserveStock",
  "States": {
    "ReserveStock": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:reserve-stock",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "OrderFailed"
      }],
      "Next": "ChargePayment"
    },
    "ChargePayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:charge-payment",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "CancelReservation"
      }],
      "Next": "ShipOrder"
    },
    "ShipOrder": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:ship-order",
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "ResultPath": "$.error",
        "Next": "RefundPayment"
      }],
      "Next": "OrderComplete"
    },
    "CancelReservation": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:cancel-reservation",
      "Next": "OrderFailed"
    },
    "RefundPayment": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...:refund-payment",
      "Next": "CancelReservation"
    },
    "OrderComplete": {
      "Type": "Succeed"
    },
    "OrderFailed": {
      "Type": "Fail",
      "Error": "OrderProcessingFailed",
      "Cause": "Order could not be completed"
    }
  }
}
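The JSON above wires the compensations by hand. The sketch below shows the same control flow in Python: steps run in order, and when one fails, the compensations of the steps that already succeeded run in reverse. The step names and lambdas are hypothetical stand-ins for the Lambda functions.

```python
def run_saga(steps, context):
    """Run (name, action, compensation) steps; undo completed ones on failure.

    The failing step's own compensation is not run, matching the state
    machine above: a failed ShipOrder triggers RefundPayment and then
    CancelReservation, and nothing for ShipOrder itself.
    """
    completed = []
    for name, action, compensation in steps:
        try:
            action(context)
        except Exception as exc:
            for undo in reversed(completed):
                undo(context)  # real compensations should be idempotent and retried
            return {"status": "FAILED", "failedStep": name, "error": str(exc)}
        completed.append(compensation)
    return {"status": "SUCCEEDED"}
```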

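Fan-Out/Fan-In Pattern

A Map state fans out one iteration per item and fans back in by collecting every result into a single array. In DISTRIBUTED mode, each item runs as a separate child workflow execution (Express here), which suits very large item counts. MaxConcurrency caps how many child executions run at once.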

{
  "ProcessAllOrders": {
    "Type": "Map",
    "InputPath": "$.orders",
    "MaxConcurrency": 50,
    "ItemProcessor": {
      "ProcessorConfig": {
        "Mode": "DISTRIBUTED",
        "ExecutionType": "EXPRESS"
      },
      "StartAt": "ProcessOrder",
      "States": {
        "ProcessOrder": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:...:process-order",
          "End": true
        }
      }
    },
    "ResultPath": "$.processedOrders",
    "Next": "AggregateResults"
  }
}

Best Practices

Design for Idempotency

Tasks may be retried, so make sure every operation is safe to repeat.
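Here is a minimal sketch of the idea: store each result under an idempotency key, and on a retry return that result instead of repeating the side effect. The key could be a business identifier such as orderId. It could also be $$.Execution.Id combined with the state name, which stays the same when Step Functions retries a task. The dictionary stands in for a durable store, for example a DynamoDB table written with a conditional put. IdempotentTask is a hypothetical name.

```python
class IdempotentTask:
    """Run `operation` at most once per idempotency key.

    A dict stands in for a durable store. A real implementation needs an
    atomic conditional write so that two concurrent retries cannot both run
    the operation; this in-memory version is not safe across processes.
    """

    def __init__(self, operation):
        self._operation = operation
        self._results = {}

    def __call__(self, idempotency_key, payload):
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # retry: replay first result
        result = self._operation(payload)
        self._results[idempotency_key] = result
        return result
```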

Use ResultPath Wisely

Preserve input data while adding task results

Set Timeouts

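Always set TimeoutSeconds (plus HeartbeatSeconds on callback and activity tasks) so a stalled task fails with States.Timeout instead of waiting almost indefinitely.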

Use Express for High Volume

Express workflows are usually far cheaper for short, high-volume executions

🎯 Interview Questions

Q1: When to use Step Functions vs SQS + Lambda?

Step Functions:

Complex orchestration with branching
Need visibility into workflow state
Error handling with retries and fallbacks
Long-running workflows

SQS + Lambda:

Simple event processing
High-volume, independent tasks
Don’t need orchestration
Cost-sensitive (cheaper at high scale)

Q2: How to handle long-running operations?

Options:

Wait for Task Token: Pause execution, resume via callback
Activity Tasks: Worker polls for tasks, reports completion
Async Lambda: Start Lambda, use callback pattern

Example: Human approval, external system integration
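With the Activity Tasks option, a worker is a long-running process. It polls GetActivityTask and reports back with SendTaskSuccess or SendTaskFailure. Below is a minimal sketch. It assumes `sfn` is a boto3 Step Functions client (or any object with the same three methods) and `handler` is your own hypothetical function.

```python
import json


def run_activity_worker(sfn, activity_arn, handler, worker_name="worker-1",
                        max_polls=None):
    """Poll an activity and report each task's outcome to Step Functions.

    sfn: a Step Functions client such as boto3.client("stepfunctions").
    handler: hypothetical function that takes the decoded task input and
    returns a JSON-serialisable result, or raises to signal failure.
    max_polls: stop after this many polls (None means run forever).
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        # Long poll; returns an empty taskToken when no work arrived in time
        task = sfn.get_activity_task(activityArn=activity_arn,
                                     workerName=worker_name)
        token = task.get("taskToken")
        if not token:
            continue
        try:
            result = handler(json.loads(task["input"]))
        except Exception as exc:
            sfn.send_task_failure(taskToken=token, error=type(exc).__name__,
                                  cause=str(exc))
        else:
            sfn.send_task_success(taskToken=token, output=json.dumps(result))
```

If the activity state sets HeartbeatSeconds, a long-running handler should also call sfn.send_task_heartbeat(taskToken=token) periodically.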

Q3: Standard vs Express - how to choose?

Use Standard when:

Execution longer than 5 minutes
Need execution history for audit
Require exactly-once semantics

Use Express when:

High volume (start rates above Standard's ~2,000/sec)
Short duration (under 5 min)
Cost-sensitive
At-least-once is acceptable

Next Module

AWS SAM

Build serverless applications with SAM
