Conversational AI requires careful design of dialogue flows, state management, and context handling. This chapter covers proven patterns for building production chatbots.

Think of a chatbot like a waiter at a restaurant. A bad waiter asks you to repeat your order three times, forgets you asked for no onions, and brings the check before dessert. A great waiter remembers your preferences, anticipates what you need next, and gracefully handles “actually, can I change my order?” The patterns below are the training manual for building that great waiter.

Conversation State Machine

A state machine is to a chatbot what a flowchart is to a customer service script. Without one, your bot is just reacting to the last message with no memory of where it is in the conversation. With one, it knows “I’ve collected the departure city but still need the date” — and can behave accordingly.

Basic State Management

from openai import OpenAI
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
import json


class ConversationState(Enum):
    """States in the conversation flow."""
    GREETING = "greeting"
    GATHERING_INFO = "gathering_info"
    CONFIRMING = "confirming"
    PROCESSING = "processing"
    COMPLETED = "completed"
    ERROR = "error"


@dataclass
class ConversationContext:
    """Holds conversation state and collected data."""
    state: ConversationState = ConversationState.GREETING
    collected_data: dict = field(default_factory=dict)
    history: list = field(default_factory=list)
    retry_count: int = 0
    metadata: dict = field(default_factory=dict)


class StatefulChatbot:
    """Chatbot with explicit state management."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
        self.context = ConversationContext()
    
    def _get_system_prompt(self) -> str:
        """Get system prompt based on current state."""
        prompts = {
            ConversationState.GREETING: """You are a helpful assistant. 
                Greet the user warmly and ask how you can help them today.
                Keep it brief and friendly.""",
            
            ConversationState.GATHERING_INFO: """You are gathering information.
                Ask clarifying questions one at a time.
                Be conversational but focused.
                Acknowledge what the user tells you.""",
            
            ConversationState.CONFIRMING: """You are confirming details.
                Summarize what you've collected and ask for confirmation.
                Be clear and organized in your summary.""",
            
            ConversationState.PROCESSING: """You are processing the request.
                Acknowledge the request and explain next steps.
                Be reassuring and informative.""",
            
            ConversationState.COMPLETED: """The task is complete.
                Thank the user and offer further assistance.
                Be warm and professional.""",
        }
        
        return prompts.get(
            self.context.state,
            "You are a helpful assistant. Respond appropriately."
        )
    
    def _build_messages(self, user_input: str) -> list[dict]:
        """Build message list for API call."""
        messages = [{"role": "system", "content": self._get_system_prompt()}]
        
        # Add conversation history
        for msg in self.context.history[-10:]:  # Last 10 messages
            messages.append(msg)
        
        # Add current user input
        messages.append({"role": "user", "content": user_input})
        
        return messages
    
    def _determine_transition(self, user_input: str, response: str) -> ConversationState:
        """Determine next state based on conversation.
        
        Pitfall: Using an LLM for state transitions adds latency and cost
        to every turn. In production, prefer rule-based transitions for
        predictable flows (e.g., "did the user say yes?") and reserve
        LLM-based classification for ambiguous cases only.
        """
        analysis_prompt = f"""Analyze this conversation turn and determine the appropriate next state.

Current state: {self.context.state.value}
Collected data: {json.dumps(self.context.collected_data)}
User said: {user_input}
Assistant said: {response}

Available states:
- greeting: Initial greeting
- gathering_info: Collecting required information
- confirming: Verifying collected information
- processing: Executing the request
- completed: Task finished
- error: Something went wrong

Return JSON: {{"next_state": "state_name", "reason": "brief reason"}}"""
        
        result = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": analysis_prompt}],
            response_format={"type": "json_object"}
        )
        
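        # The model may return malformed JSON or an unknown state name;
        # stay in the current state rather than crashing the turn.
        try:
            data = json.loads(result.choices[0].message.content)
            return ConversationState(data.get("next_state", "gathering_info"))
        except (json.JSONDecodeError, TypeError, ValueError):
            return self.context.state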
    
    def process_message(self, user_input: str) -> str:
        """Process user message and return response."""
        messages = self._build_messages(user_input)
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages
        )
        
        assistant_message = response.choices[0].message.content
        
        # Update history
        self.context.history.append({"role": "user", "content": user_input})
        self.context.history.append({"role": "assistant", "content": assistant_message})
        
        # Determine and apply state transition
        new_state = self._determine_transition(user_input, assistant_message)
        self.context.state = new_state
        
        return assistant_message
    
    def reset(self):
        """Reset conversation to initial state."""
        self.context = ConversationContext()


# Usage
chatbot = StatefulChatbot()

# Simulate conversation
print(chatbot.process_message("Hello!"))
print(f"State: {chatbot.context.state}")

print(chatbot.process_message("I need help booking a flight"))
print(f"State: {chatbot.context.state}")
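The docstring of `_determine_transition` recommends rule-based transitions for predictable flows. A minimal sketch of that idea follows. The function name `rule_based_transition`, the keyword sets, and the `required` tuple of field names are all hypothetical and not part of the class above. The function returns `None` when no rule applies, so a caller could fall back to the LLM classifier only for ambiguous replies.

```python
from enum import Enum
from typing import Optional


class ConversationState(Enum):
    GREETING = "greeting"
    GATHERING_INFO = "gathering_info"
    CONFIRMING = "confirming"
    PROCESSING = "processing"
    COMPLETED = "completed"
    ERROR = "error"


# Hypothetical keyword sets; a real bot would tune these per language and domain.
YES_WORDS = {"yes", "yep", "yeah", "correct", "confirm", "ok", "okay"}
NO_WORDS = {"no", "nope", "wrong", "incorrect", "change"}


def rule_based_transition(
    state: ConversationState,
    user_input: str,
    collected: dict,
    required: tuple[str, ...],
) -> Optional[ConversationState]:
    """Return the next state, or None when no rule applies."""
    # Normalize the reply into a set of lowercase words without punctuation
    words = {w.strip(".,!?").lower() for w in user_input.split()}

    if state is ConversationState.GREETING:
        return ConversationState.GATHERING_INFO

    if state is ConversationState.GATHERING_INFO:
        # Move on to confirmation only once every required field is present
        missing = [name for name in required if name not in collected]
        if missing:
            return ConversationState.GATHERING_INFO
        return ConversationState.CONFIRMING

    if state is ConversationState.CONFIRMING:
        if words & NO_WORDS:
            return ConversationState.GATHERING_INFO
        if words & YES_WORDS:
            return ConversationState.PROCESSING
        return None  # Ambiguous reply: defer to the LLM classifier

    return None
```

Checking for a "no" before a "yes" is a deliberate choice in this sketch: sending a user back to correct details is cheaper than processing a request they did not confirm.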

Slot Filling Pattern

Slot filling is like filling out a form, except the user talks naturally instead of typing into labeled boxes. The bot’s job is to extract structured data (“New York”, “next Friday”, “2 passengers”) from unstructured sentences (“I want to fly from New York next Friday, just me and my wife”). The pattern below separates what you need to collect (the slots) from how you collect it (the conversation), making it easy to reuse across different domains.

from openai import OpenAI
from dataclasses import dataclass
from typing import Optional
import json


@dataclass
class Slot:
    """A piece of required information."""
    name: str
    description: str
    required: bool = True
    value: Optional[str] = None
    validation_prompt: Optional[str] = None
    
    @property
    def is_filled(self) -> bool:
        return self.value is not None


class SlotFillingBot:
    """Bot that collects required information through conversation."""
    
    def __init__(self, slots: list[Slot], model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
        self.slots = {s.name: s for s in slots}
        self.history: list[dict] = []
    
    def _get_unfilled_slots(self) -> list[Slot]:
        """Get list of unfilled required slots."""
        return [s for s in self.slots.values() if s.required and not s.is_filled]
    
    def _extract_slot_values(self, user_input: str) -> dict:
        """Extract slot values from user input.
        
        Tip: Users often provide multiple pieces of info at once ("fly from
        NYC to London next Friday for 2 people"). This method handles that
        by extracting all slots in a single LLM call rather than asking
        one question at a time -- which would feel robotic.
        """
        slot_descriptions = {
            name: slot.description 
            for name, slot in self.slots.items()
        }
        
        prompt = f"""Extract information from the user message.

Required slots:
{json.dumps(slot_descriptions, indent=2)}

User message: "{user_input}"

Return JSON with extracted values. Use null for slots not mentioned:
{{"slot_name": "extracted_value_or_null", ...}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def _validate_slot(self, slot: Slot, value: str) -> tuple[bool, str]:
        """Validate a slot value."""
        if not slot.validation_prompt:
            return True, value
        
        prompt = f"""{slot.validation_prompt}

Value to validate: "{value}"

Return JSON: {{"valid": true/false, "corrected_value": "value or null", "reason": "if invalid"}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        result = json.loads(response.choices[0].message.content)
        
        if result.get("valid"):
            # The model may return null for corrected_value; keep the original value then
            return True, result.get("corrected_value") or value
        return False, result.get("reason", "Invalid value")
    
    def _generate_question(self, slot: Slot) -> str:
        """Generate a natural question for a slot."""
        context = ""
        filled = [s for s in self.slots.values() if s.is_filled]
        if filled:
            context = "Already collected: " + ", ".join(
                f"{s.name}={s.value}" for s in filled
            )
        
        prompt = f"""Generate a natural, conversational question to ask for this information.
{context}

Slot to ask about:
- Name: {slot.name}
- Description: {slot.description}

Generate only the question, no preamble:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content.strip()
    
    def _generate_confirmation(self) -> str:
        """Generate confirmation message with all slots."""
        slots_summary = "\n".join(
            f"- {slot.name}: {slot.value}"
            for slot in self.slots.values()
            if slot.is_filled
        )
        
        prompt = f"""Generate a confirmation message summarizing this information:

{slots_summary}

Ask the user to confirm or correct anything. Be concise and clear:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
    def process_message(self, user_input: str) -> dict:
        """Process user message and return response with status."""
        # Extract slot values from input
        extracted = self._extract_slot_values(user_input)
        
        # Validate and fill slots
        for name, value in extracted.items():
            if value and name in self.slots:
                slot = self.slots[name]
                # JSON may yield numbers (e.g. passengers: 2); slots store strings
                is_valid, result = self._validate_slot(slot, str(value))
                if is_valid:
                    slot.value = result
        
        # Update history
        self.history.append({"role": "user", "content": user_input})
        
        # Check if all required slots are filled
        unfilled = self._get_unfilled_slots()
        
        if not unfilled:
            # All slots filled - confirm
            response = self._generate_confirmation()
            status = "confirming"
        else:
            # Ask for next unfilled slot
            response = self._generate_question(unfilled[0])
            status = "collecting"
        
        self.history.append({"role": "assistant", "content": response})
        
        return {
            "response": response,
            "status": status,
            "filled_slots": {
                name: slot.value 
                for name, slot in self.slots.items() 
                if slot.is_filled
            },
            "unfilled_slots": [s.name for s in unfilled]
        }


# Usage - Flight booking example
slots = [
    Slot(
        name="origin",
        description="Departure city or airport",
        validation_prompt="Validate this is a valid city or airport name"
    ),
    Slot(
        name="destination",
        description="Arrival city or airport",
        validation_prompt="Validate this is a valid city or airport name"
    ),
    Slot(
        name="date",
        description="Travel date",
        validation_prompt="Validate this is a valid date format (YYYY-MM-DD preferred)"
    ),
    Slot(
        name="passengers",
        description="Number of passengers",
        validation_prompt="Validate this is a positive integer"
    )
]

bot = SlotFillingBot(slots)

# Simulate conversation
result = bot.process_message("I want to fly from New York to London")
print(result["response"])
print(f"Filled: {result['filled_slots']}")
print(f"Still need: {result['unfilled_slots']}")

result = bot.process_message("Next Friday, just me")
print(result["response"])
print(f"Status: {result['status']}")
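The flight example validates every slot with an LLM call. For slots with a strict format, such as the passenger count and an ISO date, a local check may be cheaper and more predictable. The sketch below assumes two hypothetical helpers, `validate_passengers` and `validate_iso_date`, that return the same `(is_valid, value_or_reason)` tuple shape as `_validate_slot`. They are not wired into `SlotFillingBot` here, and relative dates such as "next Friday" would still need an LLM or a date parser to resolve first.

```python
from datetime import date


def validate_passengers(value: str) -> tuple[bool, str]:
    """Accept only a positive whole number, e.g. "2"."""
    try:
        count = int(str(value).strip())
    except ValueError:
        return False, "Number of passengers must be a positive integer"
    if count <= 0:
        return False, "Number of passengers must be a positive integer"
    return True, str(count)


def validate_iso_date(value: str) -> tuple[bool, str]:
    """Accept a real calendar date in ISO form and normalize it to YYYY-MM-DD."""
    try:
        parsed = date.fromisoformat(str(value).strip())
    except ValueError:
        return False, "Date must be a valid calendar date in YYYY-MM-DD format"
    return True, parsed.isoformat()
```

Because `date.fromisoformat` builds a real `date` object, impossible dates such as February 30 are rejected along with free-form text.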

Multi-Turn Context Management

from openai import OpenAI
from dataclasses import dataclass, field
from typing import Optional
import json


@dataclass
class ConversationTurn:
    """A single turn in conversation."""
    role: str
    content: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Topic:
    """A conversation topic or thread."""
    name: str
    summary: str
    turns: list[ConversationTurn] = field(default_factory=list)
    resolved: bool = False


class ContextManager:
    """Manages multi-turn conversation context."""
    
    def __init__(
        self,
        max_history: int = 20,
        model: str = "gpt-4o-mini"
    ):
        self.client = OpenAI()
        self.model = model
        self.max_history = max_history
        self.history: list[ConversationTurn] = []
        self.topics: list[Topic] = []
        self.current_topic: Optional[Topic] = None
        self.user_profile: dict = {}
    
    def _summarize_old_context(self, turns: list[ConversationTurn]) -> str:
        """Summarize older conversation turns."""
        if not turns:
            return ""
        
        history_text = "\n".join(
            f"{t.role}: {t.content}" for t in turns
        )
        
        prompt = f"""Summarize this conversation history concisely.
Preserve key facts, decisions, and context needed for future turns.

History:
{history_text}

Summary:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
    def _detect_topic_change(self, message: str) -> bool:
        """Detect if user is changing topics."""
        if not self.current_topic:
            return True
        
        prompt = f"""Is this message changing to a new topic?

Current topic: {self.current_topic.name}
Topic summary: {self.current_topic.summary}

New message: "{message}"

Return JSON: {{"topic_change": true/false, "reason": "brief reason"}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        result = json.loads(response.choices[0].message.content)
        return result.get("topic_change", False)
    
    def _identify_topic(self, message: str) -> str:
        """Identify the topic of a message."""
        prompt = f"""Identify the main topic of this message in 2-4 words.

Message: "{message}"

Topic:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content.strip()
    
    def _extract_user_info(self, message: str) -> dict:
        """Extract user profile information from message."""
        prompt = f"""Extract any personal information the user mentions about themselves.

Message: "{message}"

Return JSON with any of: name, preferences, location, occupation, interests, or other relevant info.
Return empty object if nothing is mentioned."""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def add_message(self, role: str, content: str) -> dict:
        """Add a message and manage context."""
        turn = ConversationTurn(role=role, content=content)
        self.history.append(turn)
        
        # Extract user info if user message
        if role == "user":
            user_info = self._extract_user_info(content)
            self.user_profile.update(user_info)
            
            # Handle topic management
            if self._detect_topic_change(content):
                # Archive current topic
                if self.current_topic:
                    self.current_topic.resolved = True
                    self.topics.append(self.current_topic)
                
                # Start new topic
                topic_name = self._identify_topic(content)
                self.current_topic = Topic(
                    name=topic_name,
                    summary=content[:100]
                )
            
            if self.current_topic:
                self.current_topic.turns.append(turn)
        
        # Compress history if needed
        context_summary = ""
        if len(self.history) > self.max_history:
            old_turns = self.history[:-self.max_history]
            context_summary = self._summarize_old_context(old_turns)
            self.history = self.history[-self.max_history:]
        
        return {
            "context_summary": context_summary,
            "current_topic": self.current_topic.name if self.current_topic else None,
            "user_profile": self.user_profile,
            "history_length": len(self.history)
        }
    
    def get_context_for_prompt(self) -> str:
        """Get formatted context for LLM prompt."""
        parts = []
        
        # User profile
        if self.user_profile:
            parts.append(f"User profile: {json.dumps(self.user_profile)}")
        
        # Current topic
        if self.current_topic:
            parts.append(f"Current topic: {self.current_topic.name}")
        
        # Recent topics
        recent_topics = [t.name for t in self.topics[-3:] if t.resolved]
        if recent_topics:
            parts.append(f"Previous topics discussed: {', '.join(recent_topics)}")
        
        return "\n".join(parts)
    
    def get_messages(self) -> list[dict]:
        """Get history as message list for API."""
        return [
            {"role": t.role, "content": t.content}
            for t in self.history
        ]


# Usage
context_mgr = ContextManager(max_history=10)

# User provides information over time
context_mgr.add_message("user", "Hi, I'm Alex and I work in software engineering")
context_mgr.add_message("assistant", "Hello Alex! Nice to meet you. How can I help today?")

context_mgr.add_message("user", "I'm looking for Python learning resources")
context_mgr.add_message("assistant", "I can help with that. What's your current Python level?")

print(f"User profile: {context_mgr.user_profile}")
print(f"Current topic: {context_mgr.current_topic.name}")
print(f"Context: {context_mgr.get_context_for_prompt()}")
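In `add_message`, the summary of trimmed turns is returned to the caller but not stored on the manager, so later prompts cannot see it unless the caller keeps it. One possible way to retain it is a rolling summary that absorbs each batch of dropped turns. The sketch below uses a hypothetical `RollingHistory` class and a stub summarizer in place of the LLM call; both names are assumptions made for illustration.

```python
from typing import Callable


def stub_summarize(previous: str, dropped: list[dict]) -> str:
    """Stand-in for an LLM summarizer: it just concatenates the dropped turns."""
    joined = "; ".join(f"{m['role']}: {m['content']}" for m in dropped)
    return f"{previous} | {joined}" if previous else joined


class RollingHistory:
    """Keeps the last max_history turns plus a running summary of older ones."""

    def __init__(
        self,
        max_history: int = 4,
        summarize: Callable[[str, list[dict]], str] = stub_summarize,
    ):
        self.max_history = max_history
        self.summarize = summarize
        self.summary = ""
        self.turns: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        overflow = len(self.turns) - self.max_history
        if overflow > 0:
            dropped = self.turns[:overflow]
            self.turns = self.turns[overflow:]
            # Fold dropped turns into the running summary instead of discarding them
            self.summary = self.summarize(self.summary, dropped)

    def as_messages(self, system_prompt: str) -> list[dict]:
        """Build an API message list with the summary appended to the system prompt."""
        system = system_prompt
        if self.summary:
            system += f"\n\nSummary of earlier conversation: {self.summary}"
        return [{"role": "system", "content": system}, *self.turns]
```

Passing the previous summary into the summarizer lets a real LLM-backed implementation merge old and new facts rather than summarizing only the most recent overflow.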

Intent Classification

Intent classification is the traffic cop at the front door of your chatbot. Before you can help a user, you need to know what they want: are they booking, canceling, or just asking a question? Getting this wrong means routing someone who wants to cancel to the booking flow, the chatbot equivalent of being transferred to the wrong department on a phone call.

from openai import OpenAI
from dataclasses import dataclass
from typing import Callable, Optional
import json


@dataclass
class Intent:
    """A user intent with handler."""
    name: str
    description: str
    examples: list[str]  # Few-shot examples improve classification accuracy
    handler: Optional[Callable[[str, dict], str]] = None  # Called as handler(message, entities)
    confidence_threshold: float = 0.8  # Below this, escalate or clarify


class IntentClassifier:
    """Classify user intents for routing."""
    
    def __init__(self, intents: list[Intent], model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
        self.intents = {i.name: i for i in intents}
    
    def _build_classification_prompt(self) -> str:
        """Build prompt for intent classification."""
        intent_descriptions = []
        for name, intent in self.intents.items():
            examples = ", ".join(f'"{e}"' for e in intent.examples[:3])
            intent_descriptions.append(
                f"- {name}: {intent.description}\n  Examples: {examples}"
            )
        
        return f"""Classify the user's intent into one of these categories:

{chr(10).join(intent_descriptions)}

Return JSON:
{{
    "intent": "intent_name",
    "confidence": 0.0-1.0,
    "entities": {{"extracted_entity": "value"}},
    "reasoning": "brief explanation"
}}"""
    
    def classify(self, message: str) -> dict:
        """Classify a user message."""
        system_prompt = self._build_classification_prompt()
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": message}
            ],
            response_format={"type": "json_object"}
        )
        
        result = json.loads(response.choices[0].message.content)
        
        intent_name = result.get("intent")
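        # Models sometimes return the confidence as a string; coerce defensively
        try:
            confidence = float(result.get("confidence", 0))
        except (TypeError, ValueError):
            confidence = 0.0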
        
        intent = self.intents.get(intent_name)
        if intent and confidence >= intent.confidence_threshold:
            result["matched"] = True
            result["intent_object"] = intent
        else:
            result["matched"] = False
        
        return result
    
    def route(self, message: str, fallback: callable = None) -> any:
        """Route message to appropriate handler."""
        classification = self.classify(message)
        
        if classification["matched"]:
            intent = classification["intent_object"]
            if intent.handler:
                return intent.handler(
                    message,
                    classification.get("entities", {})
                )
        
        if fallback:
            return fallback(message)
        
        return None


# Define handlers
def handle_booking(message: str, entities: dict) -> str:
    return f"Starting booking process. Extracted: {entities}"

def handle_status(message: str, entities: dict) -> str:
    return f"Checking status for: {entities}"

def handle_cancel(message: str, entities: dict) -> str:
    return f"Processing cancellation: {entities}"

def handle_help(message: str, entities: dict) -> str:
    return "Here are the things I can help with..."


# Create classifier
intents = [
    Intent(
        name="booking",
        description="User wants to make a new booking or reservation",
        examples=[
            "I want to book a flight",
            "Can you help me make a reservation?",
            "I need to schedule an appointment"
        ],
        handler=handle_booking
    ),
    Intent(
        name="status",
        description="User wants to check status of existing booking",
        examples=[
            "What's the status of my order?",
            "Where is my booking?",
            "Track my reservation"
        ],
        handler=handle_status
    ),
    Intent(
        name="cancel",
        description="User wants to cancel something",
        examples=[
            "Cancel my booking",
            "I need to cancel my order",
            "Remove my reservation"
        ],
        handler=handle_cancel
    ),
    Intent(
        name="help",
        description="User needs help or information",
        examples=[
            "Help",
            "What can you do?",
            "I need assistance"
        ],
        handler=handle_help
    )
]

classifier = IntentClassifier(intents)

# Classify messages
result = classifier.classify("I'd like to book a hotel for next week")
print(f"Intent: {result['intent']} (confidence: {result['confidence']})")
print(f"Entities: {result.get('entities', {})}")

# Route to handler
response = classifier.route("Check my order status please", fallback=lambda m: "I didn't understand that")
print(response)
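The comment on `confidence_threshold` mentions escalating or clarifying below the threshold, while `route` sends every unmatched message to the same fallback. A small sketch of a three-way decision is shown below. The function `decide_action` and the `clarify_floor` parameter are hypothetical; the input is assumed to be a dict with the same `intent` and `confidence` keys that `classify` returns.

```python
def decide_action(
    classification: dict,
    known_intents: set[str],
    threshold: float = 0.8,
    clarify_floor: float = 0.5,
) -> str:
    """Map a classification result to "handle", "clarify", or "fallback"."""
    intent = classification.get("intent")

    # Treat a missing or non-numeric confidence as zero
    try:
        confidence = float(classification.get("confidence", 0))
    except (TypeError, ValueError):
        confidence = 0.0

    if intent not in known_intents:
        return "fallback"   # Unknown label: never trust it, whatever the score
    if confidence >= threshold:
        return "handle"     # Confident enough to call the intent's handler
    if confidence >= clarify_floor:
        return "clarify"    # Plausible guess: ask the user to confirm it
    return "fallback"
```

The middle band lets the bot ask something like "Did you want to cancel a booking?" instead of either guessing or giving up. Self-reported model confidence is not calibrated, so both cutoffs are values to tune against real traffic.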

Conversation Flows

from openai import OpenAI
from dataclasses import dataclass, field
from typing import Callable, Optional
from enum import Enum


class FlowStep(Enum):
    """Standard flow steps."""
    START = "start"
    COLLECT = "collect"
    VALIDATE = "validate"
    CONFIRM = "confirm"
    EXECUTE = "execute"
    COMPLETE = "complete"
    ERROR = "error"


@dataclass
class FlowNode:
    """A node in the conversation flow."""
    name: str
    prompt_template: str
    next_steps: dict = field(default_factory=dict)  # condition -> next_node
    validator: Optional[Callable] = None
    processor: Optional[Callable] = None


class ConversationFlow:
    """Define and execute conversation flows."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
        self.nodes: dict[str, FlowNode] = {}
        self.current_node: Optional[str] = None
        self.context: dict = {}
        self.history: list = []
    
    def add_node(self, node: FlowNode):
        """Add a node to the flow."""
        self.nodes[node.name] = node
    
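    def start(self, start_node: str) -> str:
        """Start the flow at a specific node."""
        self.current_node = start_node
        response = self._execute_node()
        # Record the opening message so later turns see it in history
        self.history.append({"role": "assistant", "content": response})
        return response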
    
    def _execute_node(self) -> str:
        """Execute current node and return response."""
        node = self.nodes.get(self.current_node)
        if not node:
            return "Flow error: node not found"
        
        # Format prompt with context
        prompt = node.prompt_template.format(**self.context)
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": prompt}
            ] + self.history[-10:]
        )
        
        return response.choices[0].message.content
    
    def process_input(self, user_input: str) -> dict:
        """Process user input and advance flow."""
        self.history.append({"role": "user", "content": user_input})
        
        node = self.nodes.get(self.current_node)
        if not node:
            return {"error": "Invalid flow state"}
        
        # Run validator if present
        if node.validator:
            is_valid, result = node.validator(user_input, self.context)
            if not is_valid:
                response = f"Invalid input: {result}. Please try again."
                self.history.append({"role": "assistant", "content": response})
                return {
                    "response": response,
                    "node": self.current_node,
                    "valid": False
                }
        
        # Run processor if present
        if node.processor:
            self.context = node.processor(user_input, self.context)
        
        # Determine next node
        next_node = self._determine_next(node, user_input)
        if next_node:
            self.current_node = next_node
        
        # Execute new node
        response = self._execute_node()
        self.history.append({"role": "assistant", "content": response})
        
        return {
            "response": response,
            "node": self.current_node,
            "context": self.context,
            "valid": True
        }
    
    def _determine_next(self, node: FlowNode, user_input: str) -> Optional[str]:
        """Determine next node based on input and conditions."""
        # Check explicit conditions
        for condition, next_node in node.next_steps.items():
            if condition == "default":
                continue
            if condition.lower() in user_input.lower():
                return next_node
        
        # Use LLM for complex routing
        if len(node.next_steps) > 1:
            options = list(node.next_steps.keys())
            
            prompt = f"""Based on the user's response, which path should we take?

User said: "{user_input}"
Options: {options}

Return just the option name:"""
            
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            
            choice = response.choices[0].message.content.strip().lower()
            if choice in node.next_steps:
                return node.next_steps[choice]
        
        return node.next_steps.get("default")


# Example: Support ticket flow
def validate_email(user_input: str, context: dict) -> tuple[bool, str]:
    """Validate email format."""
    if "@" in user_input and "." in user_input:
        return True, user_input
    return False, "Please provide a valid email address"

def collect_email(user_input: str, context: dict) -> dict:
    """Store collected email."""
    context["email"] = user_input
    return context


# Build flow
flow = ConversationFlow()

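flow.add_node(FlowNode(
    name="welcome",
    # This node has no validator or processor, so it should not ask for data itself
    prompt_template="Welcome the user and ask whether they would like to open a support ticket.",
    next_steps={"default": "collect_email"}
))

flow.add_node(FlowNode(
    name="collect_email",
    prompt_template="Ask for the user's email address.",
    validator=validate_email,
    processor=collect_email,
    next_steps={"default": "collect_issue"}
))

flow.add_node(FlowNode(
    name="collect_issue",
    prompt_template="Email collected: {email}. Now ask them to describe their issue.",
    # billing_flow, technical_flow, and general_support must be added with
    # add_node before the flow can advance past this node
    next_steps={
        "billing": "billing_flow",
        "technical": "technical_flow",
        "default": "general_support"
    }
))

# Run flow
print(flow.start("welcome"))

# Reply to the welcome node; no validator runs, so the flow moves to collect_email
result = flow.process_input("Yes, I need to open a ticket")
print(result["response"])

# Reply to collect_email; the address is validated and stored, then the flow
# moves to collect_issue
result = flow.process_input("user@example.com")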
print(result["response"])
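The keyword check in `_determine_next` uses substring matching, so a condition such as `technical` would also match "nontechnical". A whole-word variant of just that keyword step is sketched below. The function name `match_next_step` is hypothetical, and the sketch leaves out the LLM routing step, going straight to the `default` entry when no keyword matches.

```python
import re
from typing import Optional


def match_next_step(next_steps: dict[str, str], user_input: str) -> Optional[str]:
    """Return the node for the first condition that appears as a whole word."""
    for condition, node in next_steps.items():
        if condition == "default":
            continue
        # \b anchors the match to word boundaries; re.escape guards special characters
        pattern = rf"\b{re.escape(condition)}\b"
        if re.search(pattern, user_input, flags=re.IGNORECASE):
            return node
    # No keyword matched: use the default branch if one is defined
    return next_steps.get("default")
```

Called with the `next_steps` of the `collect_issue` node, a message like "I have a billing question" would resolve to `billing_flow`, while a message with no keyword falls through to `general_support`.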

Error Handling and Recovery

The difference between a demo chatbot and a production one is what happens when things go wrong. Users will send gibberish, ask for things outside your scope, or get frustrated when the bot misunderstands. The pattern below implements three layers of defense: graceful clarification, frustration detection (before the user rage-quits), and human escalation as a safety valve.

from openai import OpenAI
from dataclasses import dataclass


@dataclass
class ErrorContext:
    """Context for error recovery."""
    error_type: str
    message: str
    retry_count: int
    recoverable: bool  # After max_retries, stop looping and escalate


class RobustChatbot:
    """Chatbot with error handling and recovery."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
        self.max_retries = 3
        self.error_count = 0
        self.last_error: "ErrorContext | None" = None  # most recent error, if any
    
    def _handle_unclear_input(self, message: str) -> str:
        """Handle unclear or ambiguous input."""
        prompt = f"""The user's input was unclear. Generate a helpful clarification request.

User said: "{message}"

Ask for clarification in a friendly way:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
    def _detect_frustration(self, messages: list[str]) -> bool:
        """Detect if user is frustrated."""
        if len(messages) < 2:
            return False
        
        recent = " ".join(messages[-3:])
        
        prompt = f"""Analyze if the user seems frustrated in these messages.
Consider: repeated questions, escalating tone, explicit frustration.

Messages: "{recent}"

Return just: yes or no"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return "yes" in response.choices[0].message.content.lower()
    
    def _offer_escalation(self) -> str:
        """Offer to escalate to human support."""
        return """I understand this has been frustrating. Would you like me to:
1. Connect you with a human support agent
2. Try a different approach to help
3. Start fresh with a new question

Please let me know how you'd like to proceed."""
    
    def _recover_from_error(self, error: ErrorContext) -> str:
        """Generate recovery message based on error."""
        prompts = {
            "unclear_input": "Ask the user to rephrase their question more specifically.",
            "missing_info": "Politely request the missing information.",
            "validation_failed": "Explain what was wrong and how to correct it.",
            "system_error": "Apologize for the technical issue and offer alternatives.",
        }
        
        prompt = prompts.get(
            error.error_type,
            "Acknowledge the issue and offer to help differently."
        )
        
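        try:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            return response.choices[0].message.content
        except Exception:
            # The model call can fail for the same reason as the original
            # error (e.g. an API outage), so fall back to a static message.
            return "Sorry, something went wrong on my end. Please try again in a moment."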
    
    def process_with_recovery(
        self,
        message: str,
        history: list[str]
    ) -> dict:
        """Process message with error recovery."""
        try:
            # Check for frustration
            if self._detect_frustration(history + [message]):
                return {
                    "response": self._offer_escalation(),
                    "escalation_offered": True,
                    "error": None
                }
            
            # Normal processing
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": message}
                ]
            )
            
            self.error_count = 0  # Reset on success
            
            return {
                "response": response.choices[0].message.content,
                "escalation_offered": False,
                "error": None
            }
            
        except Exception as e:
            self.error_count += 1
            
            error = ErrorContext(
                error_type="system_error",
                message=str(e),
                retry_count=self.error_count,
                recoverable=self.error_count < self.max_retries
            )
            self.last_error = error  # keep the latest error for inspection
            
            if error.recoverable:
                recovery = self._recover_from_error(error)
            else:
                recovery = self._offer_escalation()
            
            return {
                "response": recovery,
                "escalation_offered": not error.recoverable,
                "error": error
            }


# Usage
bot = RobustChatbot()

# Simulated conversation with potential issues
history = [
    "How do I reset my password?",
    "I already tried that, it didn't work",
    "This is ridiculous, I've been trying for 20 minutes"
]

result = bot.process_with_recovery(
    "This is so frustrating!!!",
    history
)

print(result["response"])
if result["escalation_offered"]:
    print("(Escalation offered to user)")
Chatbot Design Principles
  • Design flows on paper first — Draw the state machine before writing code. Most bugs are design bugs, not code bugs.
  • Always provide a way out — If the user says “never mind” or “talk to a human,” honor it immediately. Trapping users in a flow destroys trust.
  • Extract and remember user info — Asking someone their name twice is the fastest way to signal “I’m not really listening.”
  • Handle errors as normal paths — Unclear input isn’t an error; it’s the most common case. Budget 40% of your effort on the unhappy path.
  • Use explicit state management — Implicit state (guessing from history) works until it doesn’t. Be explicit about where you are in the flow.

Pattern Selection Framework

Not every chatbot needs every pattern. Choosing the wrong architecture wastes engineering time and adds latency. Use this decision framework.
| Question Your Bot Must Answer | Pattern | When to Skip It |
| --- | --- | --- |
| “What does the user want?” | Intent Classification | Single-purpose bots (e.g., a FAQ bot with one intent) |
| “What information do I still need?” | Slot Filling | Open-ended conversations with no required fields |
| “Where are we in the conversation?” | State Machine | Stateless Q&A with no multi-step flows |
| “What did we talk about before?” | Context Management | Single-turn interactions (classification, extraction) |
| “The user is confused or angry” | Error Recovery / Escalation | Internal tools where users tolerate rough edges |
Decision tree for new projects:
  1. Is this a single-turn interaction (user asks, bot answers, done)? Use a simple prompt with no state management.
  2. Does the bot need to collect structured data (dates, names, IDs)? Add Slot Filling.
  3. Are there multiple conversation paths (booking vs. canceling vs. checking status)? Add Intent Classification to route, then State Machine per path.
  4. Will conversations span more than 5-10 turns? Add Context Management with summarization.
  5. Will real users (not just your team) interact with this? Add Error Recovery and escalation.
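
The decision tree above can be read as a small lookup function. The sketch below is only an illustration: the function name `select_patterns` and its boolean parameters are hypothetical, and the "more than 5-10 turns" threshold is collapsed into a single `long_conversations` flag.

```python
def select_patterns(
    single_turn: bool,
    collects_structured_data: bool = False,
    multiple_paths: bool = False,
    long_conversations: bool = False,
    external_users: bool = False,
) -> list[str]:
    """Map the five decision-tree answers to a list of patterns to build."""
    if single_turn:
        return ["Simple prompt"]  # step 1: no state management needed
    patterns = []
    if collects_structured_data:
        patterns.append("Slot Filling")  # step 2
    if multiple_paths:
        # step 3: classify to route, then one state machine per path
        patterns += ["Intent Classification", "State Machine"]
    if long_conversations:
        patterns.append("Context Management")  # step 4
    if external_users:
        patterns.append("Error Recovery / Escalation")  # step 5
    return patterns


# A public booking bot with several paths and required fields
print(select_patterns(False, collects_structured_data=True,
                      multiple_paths=True, external_users=True))
```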

Edge Cases That Break Chatbots

These are the failure modes that demos never show but production always encounters.

Mid-flow topic switches. User is halfway through booking a flight, then asks “wait, what’s your cancellation policy?” A rigid state machine either ignores the question or resets the flow. The fix: detect out-of-scope intents within a flow, answer the side question, then offer to resume where you left off. This requires a conversation stack, not a flat state machine.

Slot correction after confirmation. User confirms all details, then says “actually, change the date to Thursday.” If your flow has already moved past the CONFIRMING state, there is no path back. Build explicit “edit slot” transitions from the CONFIRMING and PROCESSING states back to GATHERING_INFO.

Ambiguous multi-intent messages. “I want to book a flight and also check my existing reservation” contains two intents. Single-intent classifiers pick one and ignore the other. Either decompose the message into sub-intents before routing, or acknowledge both and handle them sequentially: “I’ll help with both. Let’s start with your booking, then we’ll check your reservation.”

Copy-pasted text blobs. Users paste error messages, email threads, or entire documents into chat. Your slot extractor tries to parse a 2000-word wall of text and either hallucinates values or times out. Add input length guards and a fallback: “That’s a lot of text. Could you summarize what you need help with?”

Language switching. Users who start in English and switch to Spanish mid-conversation break intent classifiers trained on monolingual data. If you serve multilingual users, either detect language per message and route to language-appropriate prompts, or use models with strong multilingual capabilities.
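
To make the conversation stack idea for mid-flow topic switches concrete, here is a minimal sketch. The names `Frame`, `ConversationStack`, and `resume_previous` are hypothetical and do not come from the chapter's earlier code. The idea is that a side question is pushed on top of the paused flow, and popping it brings the paused flow back with its slots intact.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    """One active or paused flow: its name, current step, and collected slots."""
    flow: str
    step: str
    slots: dict = field(default_factory=dict)


class ConversationStack:
    """Stack of flows; the top frame is the one currently being handled."""

    def __init__(self) -> None:
        self._frames: list = []

    def push(self, frame: Frame) -> None:
        self._frames.append(frame)

    def pop(self) -> Optional[Frame]:
        return self._frames.pop() if self._frames else None

    @property
    def current(self) -> Optional[Frame]:
        return self._frames[-1] if self._frames else None


def resume_previous(stack: ConversationStack) -> str:
    """Pop the finished side conversation and offer to resume the flow beneath it."""
    stack.pop()
    paused = stack.current
    if paused is None:
        return "What else can I help you with?"
    return f"Shall we get back to your {paused.flow}? We were at the {paused.step} step."


# The user is booking a flight, then asks about the cancellation policy.
stack = ConversationStack()
stack.push(Frame("flight booking", "collect_date", {"origin": "NYC"}))
stack.push(Frame("cancellation policy", "answer"))  # side question on top
print(resume_previous(stack))  # the booking frame, slots included, is current again
```

A flat state variable would have lost `collect_date` and the collected origin when the side question arrived. The stack keeps them until the user is ready to continue.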

Practice Exercise

Build a customer service chatbot that:
  1. Uses state machines for conversation flow
  2. Implements slot filling for order inquiries
  3. Maintains multi-turn context
  4. Classifies intents for routing
  5. Handles errors with graceful recovery
Focus on:
  • Natural conversation flow
  • Complete information gathering
  • Appropriate escalation triggers
  • Consistent user experience

Interview Deep-Dive

Strong Answer:
  • The key design decision is extracting all slots in a single LLM call rather than asking one question at a time. When a user packs multiple pieces of information into one sentence, you need an extraction prompt that identifies every slot simultaneously — origin, destination, date, and passenger count. Asking “where are you flying from?” after they already told you is the fastest way to lose user trust.
  • For the correction scenario, you maintain a mutable slot store and re-run extraction on every user message. If the user says “actually, make that Paris instead of London,” the extraction call should detect that “destination” is being updated and overwrite the existing value. The critical implementation detail is that you never treat slots as immutable once filled — you always allow overwriting.
  • Validation runs after extraction. “Next Friday” needs to be resolved to an actual date (relative date parsing is a whole sub-problem). “Just me and my wife” needs to be interpreted as 2 passengers, not stored as a string. Each slot should have a validation function that normalizes the raw extraction into a canonical format.
  • In production, the gotcha is ambiguity. If the user says “I want to go home,” is “home” the origin or the destination? You need a disambiguation step that uses conversation context — if they already provided an origin, “home” is likely the destination.
Follow-up: What if the LLM extraction hallucinates a slot value that the user never mentioned (say, it fills in “economy class” when the user said nothing about cabin class)?
This is a real production problem. The defense is a confidence-gated extraction pipeline. You ask the LLM to return both the extracted value and a confidence score, and you only fill slots above a threshold (typically 0.8). Below that, you leave the slot empty and ask the user explicitly. Another approach is to use null as the default extraction and require the LLM to justify why a non-null value is warranted. In practice, teams lose hours debugging hallucinated slot values that silently corrupt downstream booking logic. The fix is always to validate extraction against what the user actually said, not just trust the LLM output.
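
A minimal sketch of the confidence gate described above, assuming the extraction call has already returned a dict of `{"value": ..., "confidence": ...}` per slot. That output shape, and the helper names `is_grounded` and `gate_slots`, are assumptions for illustration. The grounding check is a deliberately naive word match against the raw user text, so it should run before any normalization (for example, before “me and my wife” becomes 2).

```python
import re

CONFIDENCE_THRESHOLD = 0.8  # the "typical" threshold from the text; tune on real data


def is_grounded(value, user_message: str) -> bool:
    """Check that every word of the extracted value appears in the user's message."""
    message_words = set(re.findall(r"\w+", user_message.lower()))
    value_words = re.findall(r"\w+", str(value).lower())
    return bool(value_words) and all(w in message_words for w in value_words)


def gate_slots(extracted: dict, user_message: str,
               threshold: float = CONFIDENCE_THRESHOLD):
    """Split raw extraction into accepted slots and slots to ask the user about."""
    accepted, ask_user = {}, []
    for slot, item in extracted.items():
        value = item.get("value")
        confidence = item.get("confidence", 0.0)
        if value is None:
            continue  # null default: the model found nothing for this slot
        if confidence >= threshold and is_grounded(value, user_message):
            accepted[slot] = value
        else:
            ask_user.append(slot)  # low confidence or not in the user's words
    return accepted, ask_user


message = "I want to fly to Paris next Friday"
raw = {
    "destination": {"value": "Paris", "confidence": 0.95},
    "cabin_class": {"value": "economy", "confidence": 0.9},  # never mentioned
    "date": {"value": "Friday", "confidence": 0.5},
}
accepted, ask_user = gate_slots(raw, message)
print(accepted, ask_user)
```

The hallucinated `cabin_class` is rejected despite its high reported confidence, because the user never said it. That is the reason for checking against the message rather than trusting the score alone.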
Strong Answer:
  • The first move is a hybrid approach: rule-based transitions for the 80% of cases that are predictable, LLM-based classification only for the ambiguous 20%. If the current state is “collecting email” and the user message contains an “@” sign, you do not need an LLM to tell you the next state. A simple regex check transitions to the next step in microseconds instead of 500ms.
  • For the ambiguous cases, you can use a distilled classifier. Take 10,000 historical conversation turns with their LLM-determined transitions, train a small BERT or logistic regression model on them, and deploy it locally. This gives you sub-10ms inference with 90%+ accuracy on state transitions. Reserve the LLM call for the 5-10% of cases where the local model’s confidence is below threshold.
  • The cost optimization is dramatic. Rule-based transitions cost zero. A local classifier costs fractions of a cent per thousand calls. You are replacing $200/month in LLM calls with maybe $5/month in compute, plus a one-time training effort.
  • The architectural insight is that state transitions are a classification problem, not a generation problem. You do not need GPT-4’s generative capabilities to decide if the user is confirming or correcting — you need a fast classifier with a finite set of output labels.
Follow-up: How do you handle the case where a user says something completely off-script, like asking about the weather in the middle of a booking flow?
This is where you need a “topic detection” layer before the state machine. You classify every incoming message as either “on-flow” or “off-topic” before running the state transition logic. For off-topic messages, you have two choices: acknowledge and redirect (“Great question about the weather, but let me finish helping you with your booking first”) or fork into a sub-conversation and return. The redirect approach is simpler and works for most production chatbots. The sub-conversation approach is more natural but significantly more complex because you need a conversation stack, not just a single state variable. Most teams start with redirect and graduate to sub-conversations when user research shows it matters.
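
The tiered transition logic from the answer above (rules first, then a local classifier, then the LLM) might be wired together as in this sketch. The regexes, the `min_confidence` default, and the `classifier` and `llm_fallback` callables are all placeholders. In a real system the classifier would be the distilled model and the fallback would wrap a chat completion call.

```python
import re
from typing import Callable, Optional

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
YES_RE = re.compile(r"\b(yes|yep|correct|confirm)\b", re.IGNORECASE)
NO_RE = re.compile(r"\b(no|nope|wrong|change)\b", re.IGNORECASE)


def rule_transition(state: str, message: str) -> Optional[str]:
    """Return the next state if a cheap rule decides it, otherwise None."""
    if state == "collect_email" and EMAIL_RE.search(message):
        return "collect_issue"
    if state == "confirming":
        said_yes = bool(YES_RE.search(message))
        said_no = bool(NO_RE.search(message))
        if said_yes and not said_no:
            return "processing"
        if said_no and not said_yes:
            return "gathering_info"
    return None  # ambiguous: let a model decide


def next_state(
    state: str,
    message: str,
    classifier: Callable[[str, str], tuple],   # returns (label, confidence)
    llm_fallback: Callable[[str, str], str],   # returns a label
    min_confidence: float = 0.9,
) -> tuple:
    """Return (next_state, source), where source is 'rule', 'classifier', or 'llm'."""
    decided = rule_transition(state, message)
    if decided is not None:
        return decided, "rule"
    label, confidence = classifier(state, message)
    if confidence >= min_confidence:
        return label, "classifier"
    return llm_fallback(state, message), "llm"


# Stubs standing in for the distilled model and the LLM call
def demo_classifier(state, message):
    return ("gathering_info", 0.95) if "date" in message else ("confirming", 0.4)


def demo_llm(state, message):
    return "error"


print(next_state("collect_email", "it's user@example.com", demo_classifier, demo_llm))
```

Returning the `source` alongside the label makes it easy to log how often each tier fires, which is the number you need to check the cost claim against your own traffic.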
Strong Answer:
  • Frustration detection is a multi-signal problem. The naive approach is sentiment analysis on the latest message, but that misses the bigger picture. The real signals are: (1) repeated questions: the user asking the same thing in different words means you failed to answer it, (2) shrinking message length: short, terse responses often indicate frustration better than long rants, (3) explicit frustration markers (“this is ridiculous,” “let me talk to a human”), and (4) conversation velocity: rapidly fired messages suggest impatience.
  • I would implement a sliding-window frustration score that combines these signals. Each signal contributes a weighted score, and you track the score over the last 3-5 messages. A single frustrated message is normal — a pattern of frustration over 3+ turns means you are failing.
  • The escalation decision is not binary. I use three tiers: (1) soft escalation — the bot acknowledges difficulty and offers to try a different approach, (2) medium escalation — the bot proactively offers human transfer, and (3) hard escalation — the bot immediately transfers without asking, triggered by explicit requests or extreme frustration scores.
  • The production gotcha is false positives. Sarcasm, humor, and cultural differences all affect frustration detection. A user saying “LOL this is so broken” might be frustrated or might be joking. Using the LLM for frustration detection costs more but handles nuance better than keyword matching. The compromise is keyword matching for obvious cases and LLM analysis for ambiguous ones.
Follow-up: You have data showing that 15% of escalated conversations could have been resolved by the bot if it had tried one more approach. How do you reduce unnecessary escalations without trapping frustrated users?
The key insight is adding a “recovery attempt” step before escalation. When the frustration score crosses the soft threshold, instead of immediately offering a human, the bot says “I realize I have not been answering your question well. Let me try a different approach.” Then you reformulate the prompt: maybe simplifying it, maybe adding retrieved context, maybe using a more capable model. You get exactly one recovery attempt. If the user’s frustration score does not decrease after that attempt, you escalate immediately. Track the success rate of recovery attempts as a metric, and if it drops below 30-40%, reduce the recovery window and escalate faster. The principle is: never make a frustrated user repeat themselves, but one genuine attempt at improvement is worth trying.
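
A sliding-window score with tiered escalation, as described in this answer, could look like the sketch below. The marker lists, signal weights, and tier thresholds are illustrative assumptions, not tuned values. Only three of the four signals are covered (explicit markers, verbatim repeats, terse replies), because conversation velocity would need message timestamps.

```python
from collections import deque

MARKERS = ("ridiculous", "frustrat", "useless", "waste of time")
HUMAN_REQUESTS = ("talk to a human", "human agent", "real person")


def message_score(message: str, previous: list) -> float:
    """Combine per-message signals into a score between 0 and 1."""
    text = message.lower().strip()
    score = 0.0
    if any(marker in text for marker in MARKERS):
        score += 0.6  # explicit frustration marker
    if text in (p.lower().strip() for p in previous):
        score += 0.3  # verbatim repeat of an earlier message
    if len(text.split()) <= 3:
        score += 0.1  # terse reply
    return min(score, 1.0)


class FrustrationTracker:
    """Tracks frustration over the last `window` user messages."""

    def __init__(self, window: int = 4, soft: float = 0.25,
                 medium: float = 0.5, hard: float = 0.75):
        self.window = window
        self.scores = deque(maxlen=window)
        self.seen = []
        self.soft, self.medium, self.hard = soft, medium, hard

    def update(self, message: str) -> str:
        """Record a message and return the tier: none, soft, medium, or hard."""
        if any(phrase in message.lower() for phrase in HUMAN_REQUESTS):
            return "hard"  # explicit request: transfer without asking
        self.scores.append(message_score(message, self.seen))
        self.seen.append(message)
        # Divide by the full window size so a single frustrated
        # message cannot trigger escalation on its own.
        level = sum(self.scores) / self.window
        if level >= self.hard:
            return "hard"
        if level >= self.medium:
            return "medium"
        if level >= self.soft:
            return "soft"
        return "none"


tracker = FrustrationTracker()
for text in ["How do I reset my password?",
             "I already tried that, it didn't work",
             "This is ridiculous, I've been trying for 20 minutes",
             "This is so frustrating!!!"]:
    print(tracker.update(text), "<-", text)
```

The "soft" tier is where the single recovery attempt described above would run. "medium" offers a human, and "hard" transfers immediately.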
Strong Answer:
  • A pure sliding window is a lossy compression strategy. The moment you drop message 11, any information it contained is gone forever. The production solution is a hybrid approach: keep recent messages verbatim (they carry nuance and exact wording) and maintain a running summary of older messages that preserves key facts, decisions, and commitments.
  • For the specific “$50K budget” scenario, the critical design pattern is a structured fact store alongside the conversation summary. When the user mentions a specific number, name, date, or commitment, you extract it into a key-value store (e.g., `{"budget": "$50K", "mentioned_at": "turn_3"}`). This structured data persists independently of the sliding window and is injected into the system prompt as context. Summaries are lossy by nature: they might compress “$50K” into “discussed budget constraints” and lose the exact number.
  • The architecture is four layers: (1) a system prompt that never gets dropped, (2) a structured facts dictionary that persists the entire conversation, (3) a running summary of old turns, and (4) the recent message window. Each layer serves a different purpose and has different persistence characteristics.
  • The trade-off is token budget. The structured facts store and running summary consume tokens from your context window. In practice, budget about 500 tokens for facts and 500 for the summary, leaving the rest for the system prompt and recent messages.
Follow-up: How do you decide which facts are worth extracting into the structured store versus letting them live in the summary?
The extraction heuristic is: anything that could be referenced later with an exact value gets extracted. Numbers, dates, names, decisions, and commitments all qualify. Opinions, preferences, and general discussion points can live in the summary because the user is unlikely to say “as I said earlier, I prefer blue” with the expectation of exact recall. You can also use the LLM itself to decide: after each user turn, run a cheap extraction call that asks “what specific facts, numbers, or commitments did the user mention?” and store the results. The cost is one extra mini-model call per turn, but it prevents the much more expensive failure mode of losing a critical detail and having the user repeat themselves.
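
The layered memory from this answer (system prompt, fact store, running summary, recent window) can be sketched as follows. The class name `LayeredMemory` is hypothetical. The dollar-amount regex and `placeholder_summarize` are stand-ins for the model-based extraction and summarization calls the text describes. The placeholder summary is deliberately lossy, to show that the exact figure survives only because the fact store holds it.

```python
import json
import re
from dataclasses import dataclass, field

# Stand-in extractor: matches amounts such as "$50K" or "$1,200.50"
AMOUNT_RE = re.compile(r"\$\d+(?:[.,]\d+)*[KkMm]?")


def placeholder_summarize(previous: str, dropped_text: str) -> str:
    """Stand-in for a model-written running summary; deliberately lossy."""
    first_words = " ".join(dropped_text.split()[:3])
    return f"{previous} [{first_words}...]".strip()


@dataclass
class LayeredMemory:
    system_prompt: str                           # layer 1: never dropped
    window: int = 4                              # size of the recent layer
    facts: list = field(default_factory=list)    # layer 2: exact values
    summary: str = ""                            # layer 3: lossy summary
    recent: list = field(default_factory=list)   # layer 4: verbatim turns

    def add_user_turn(self, text: str, turn: int) -> None:
        """Extract exact facts, store the turn, and fold old turns into the summary."""
        for amount in AMOUNT_RE.findall(text):
            self.facts.append({"value": amount, "mentioned_at": f"turn_{turn}"})
        self.recent.append({"role": "user", "content": text})
        while len(self.recent) > self.window:
            dropped = self.recent.pop(0)
            self.summary = placeholder_summarize(self.summary, dropped["content"])

    def build_messages(self) -> list:
        """Assemble the four layers into a chat-style message list."""
        context = (
            f"{self.system_prompt}\n\n"
            f"Known facts: {json.dumps(self.facts)}\n\n"
            f"Summary of earlier turns: {self.summary}"
        )
        return [{"role": "system", "content": context}] + self.recent


memory = LayeredMemory(system_prompt="You are a helpful assistant.", window=2)
memory.add_user_turn("Our budget is $50K for this project.", turn=1)
memory.add_user_turn("We need it delivered by March.", turn=2)
memory.add_user_turn("Can you suggest some vendors?", turn=3)
print(memory.build_messages()[0]["content"])
```

After the third turn the budget message has left the window and the summary no longer contains the figure, yet the system message still carries it through the facts layer.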