Conversational AI requires careful design of dialogue flows, state management, and context handling. This chapter covers proven patterns for building production chatbots. Think of a chatbot like a waiter at a restaurant. A bad waiter asks you to repeat your order three times, forgets you asked for no onions, and brings the check before dessert. A great waiter remembers your preferences, anticipates what you need next, and gracefully handles “actually, can I change my order?” The patterns below are the training manual for building that great waiter.Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Conversation State Machine
A state machine is to a chatbot what a flowchart is to a customer service script. Without one, your bot is just reacting to the last message with no memory of where it is in the conversation. With one, it knows “I’ve collected the departure city but still need the date” — and can behave accordingly.Basic State Management
Slot Filling Pattern
Slot filling is like filling out a form, except the user talks naturally instead of typing into labeled boxes. The bot’s job is to extract structured data (“New York”, “next Friday”, “2 passengers”) from unstructured sentences (“I want to fly from New York next Friday, just me and my wife”). The pattern below separates what you need to collect (the slots) from how you collect it (the conversation), making it easy to reuse across different domains.Multi-Turn Context Management
Intent Classification
Intent classification is the traffic cop at the front door of your chatbot. Before you can help a user, you need to know what they want: are they booking, canceling, or just asking a question? Getting this wrong means routing someone who wants to cancel to the booking flow — the chatbot equivalent of being transferred to the wrong department on a phone call.Conversation Flows
Error Handling and Recovery
The difference between a demo chatbot and a production one is what happens when things go wrong. Users will send gibberish, ask for things outside your scope, or get frustrated when the bot misunderstands. The pattern below implements three layers of defense: graceful clarification, frustration detection (before the user rage-quits), and human escalation as a safety valve.- Design flows on paper first — Draw the state machine before writing code. Most bugs are design bugs, not code bugs.
- Always provide a way out — If the user says “never mind” or “talk to a human,” honor it immediately. Trapping users in a flow destroys trust.
- Extract and remember user info — Asking someone their name twice is the fastest way to signal “I’m not really listening.”
- Handle errors as normal paths — Unclear input isn’t an error; it’s the most common case. Budget 40% of your effort on the unhappy path.
- Use explicit state management — Implicit state (guessing from history) works until it doesn’t. Be explicit about where you are in the flow.
Pattern Selection Framework
Not every chatbot needs every pattern. Choosing the wrong architecture wastes engineering time and adds latency. Use this decision framework.| Question Your Bot Must Answer | Pattern | When to Skip It |
|---|---|---|
| ”What does the user want?” | Intent Classification | Single-purpose bots (e.g., a FAQ bot with one intent) |
| “What information do I still need?” | Slot Filling | Open-ended conversations with no required fields |
| ”Where are we in the conversation?” | State Machine | Stateless Q&A with no multi-step flows |
| ”What did we talk about before?” | Context Management | Single-turn interactions (classification, extraction) |
| “The user is confused or angry” | Error Recovery / Escalation | Internal tools where users tolerate rough edges |
- Is this a single-turn interaction (user asks, bot answers, done)? Use a simple prompt with no state management.
- Does the bot need to collect structured data (dates, names, IDs)? Add Slot Filling.
- Are there multiple conversation paths (booking vs. canceling vs. checking status)? Add Intent Classification to route, then State Machine per path.
- Will conversations span more than 5-10 turns? Add Context Management with summarization.
- Will real users (not just your team) interact with this? Add Error Recovery and escalation.
Edge Cases That Break Chatbots
These are the failure modes that demos never show but production always encounters. Mid-flow topic switches. User is halfway through booking a flight, then asks “wait, what’s your cancellation policy?” A rigid state machine either ignores the question or resets the flow. The fix: detect out-of-scope intents within a flow, answer the side question, then offer to resume where you left off. This requires a conversation stack, not a flat state machine. Slot correction after confirmation. User confirms all details, then says “actually, change the date to Thursday.” If your flow has already moved past the CONFIRMING state, there is no path back. Build explicit “edit slot” transitions from the CONFIRMING and PROCESSING states back to GATHERING_INFO. Ambiguous multi-intent messages. “I want to book a flight and also check my existing reservation” contains two intents. Single-intent classifiers pick one and ignore the other. Either decompose the message into sub-intents before routing, or acknowledge both and handle them sequentially: “I’ll help with both. Let’s start with your booking — then we’ll check your reservation.” Copy-pasted text blobs. Users paste error messages, email threads, or entire documents into chat. Your slot extractor tries to parse a 2000-word wall of text and either hallucinates values or times out. Add input length guards and a fallback: “That’s a lot of text. Could you summarize what you need help with?” Language switching. Users who start in English and switch to Spanish mid-conversation break intent classifiers trained on monolingual data. If you serve multilingual users, either detect language per message and route to language-appropriate prompts, or use models with strong multilingual capabilities.Practice Exercise
Build a customer service chatbot that:- Uses state machines for conversation flow
- Implements slot filling for order inquiries
- Maintains multi-turn context
- Classifies intents for routing
- Handles errors with graceful recovery
- Natural conversation flow
- Complete information gathering
- Appropriate escalation triggers
- Consistent user experience
Interview Deep-Dive
You are building a customer service chatbot that handles flight bookings. A user says 'I want to fly from New York to London next Friday, just me and my wife.' How do you design the slot-filling system to extract all relevant information from a single utterance, and what happens when the user corrects a slot mid-conversation?
You are building a customer service chatbot that handles flight bookings. A user says 'I want to fly from New York to London next Friday, just me and my wife.' How do you design the slot-filling system to extract all relevant information from a single utterance, and what happens when the user corrects a slot mid-conversation?
- The key design decision is extracting all slots in a single LLM call rather than asking one question at a time. When a user packs multiple pieces of information into one sentence, you need an extraction prompt that identifies every slot simultaneously — origin, destination, date, and passenger count. Asking “where are you flying from?” after they already told you is the fastest way to lose user trust.
- For the correction scenario, you maintain a mutable slot store and re-run extraction on every user message. If the user says “actually, make that Paris instead of London,” the extraction call should detect that “destination” is being updated and overwrite the existing value. The critical implementation detail is that you never treat slots as immutable once filled — you always allow overwriting.
- Validation runs after extraction. “Next Friday” needs to be resolved to an actual date (relative date parsing is a whole sub-problem). “Just me and my wife” needs to be interpreted as 2 passengers, not stored as a string. Each slot should have a validation function that normalizes the raw extraction into a canonical format.
- In production, the gotcha is ambiguity. If the user says “I want to go home,” is “home” the origin or the destination? You need a disambiguation step that uses conversation context — if they already provided an origin, “home” is likely the destination.
Your chatbot uses an LLM to determine state transitions after every user message. In production, this adds 400-600ms of latency per turn and costs $200/month in classification calls alone. Walk me through how you would redesign this to be faster and cheaper while maintaining accuracy.
Your chatbot uses an LLM to determine state transitions after every user message. In production, this adds 400-600ms of latency per turn and costs $200/month in classification calls alone. Walk me through how you would redesign this to be faster and cheaper while maintaining accuracy.
- The first move is a hybrid approach: rule-based transitions for the 80% of cases that are predictable, LLM-based classification only for the ambiguous 20%. If the current state is “collecting email” and the user message contains an ”@” sign, you do not need an LLM to tell you the next state. A simple regex check transitions to the next step in microseconds instead of 500ms.
- For the ambiguous cases, you can use a distilled classifier. Take 10,000 historical conversation turns with their LLM-determined transitions, train a small BERT or logistic regression model on them, and deploy it locally. This gives you sub-10ms inference with 90%+ accuracy on state transitions. Reserve the LLM call for the 5-10% of cases where the local model’s confidence is below threshold.
- The cost optimization is dramatic. Rule-based transitions cost zero. A local classifier costs fractions of a cent per thousand calls. You are replacing 5/month in compute, plus a one-time training effort.
- The architectural insight is that state transitions are a classification problem, not a generation problem. You do not need GPT-4’s generative capabilities to decide if the user is confirming or correcting — you need a fast classifier with a finite set of output labels.
Walk me through how you would implement frustration detection in a production chatbot. What signals would you use, and at what point do you escalate to a human agent?
Walk me through how you would implement frustration detection in a production chatbot. What signals would you use, and at what point do you escalate to a human agent?
- Frustration detection is a multi-signal problem. The naive approach is sentiment analysis on the latest message, but that misses the bigger picture. The real signals are: (1) repeated questions — the user asking the same thing in different words means you failed to answer it, (2) escalating message length — short, terse responses often indicate frustration better than long rants, (3) explicit frustration markers (“this is ridiculous,” “let me talk to a human”), and (4) conversation velocity — rapidly fired messages suggest impatience.
- I would implement a sliding-window frustration score that combines these signals. Each signal contributes a weighted score, and you track the score over the last 3-5 messages. A single frustrated message is normal — a pattern of frustration over 3+ turns means you are failing.
- The escalation decision is not binary. I use three tiers: (1) soft escalation — the bot acknowledges difficulty and offers to try a different approach, (2) medium escalation — the bot proactively offers human transfer, and (3) hard escalation — the bot immediately transfers without asking, triggered by explicit requests or extreme frustration scores.
- The production gotcha is false positives. Sarcasm, humor, and cultural differences all affect frustration detection. A user saying “LOL this is so broken” might be frustrated or might be joking. Using the LLM for frustration detection costs more but handles nuance better than keyword matching. The compromise is keyword matching for obvious cases and LLM analysis for ambiguous ones.
Your chatbot maintains conversation history by keeping the last 10 messages. A user references something from 20 messages ago: 'like I said earlier about the budget being $50K.' How do you handle long-range context dependencies in production?
Your chatbot maintains conversation history by keeping the last 10 messages. A user references something from 20 messages ago: 'like I said earlier about the budget being $50K.' How do you handle long-range context dependencies in production?
- A pure sliding window is a lossy compression strategy. The moment you drop message 11, any information it contained is gone forever. The production solution is a hybrid approach: keep recent messages verbatim (they carry nuance and exact wording) and maintain a running summary of older messages that preserves key facts, decisions, and commitments.
- For the specific “50K budget" scenario, the critical design pattern is a structured fact store alongside the conversation summary. When the user mentions a specific number, name, date, or commitment, you extract it into a key-value store (e.g., `{'"budget": "50K”, “mentioned_at”: “turn_3”’}`). This structured data persists independently of the sliding window and is injected into the system prompt as context. Summaries are lossy by nature — they might compress “$50K” into “discussed budget constraints” and lose the exact number.
- The architecture is three layers: (1) a system prompt that never gets dropped, (2) a structured facts dictionary that persists the entire conversation, (3) a running summary of old turns, and (4) the recent message window. Each layer serves a different purpose and has different persistence characteristics.
- The trade-off is token budget. The structured facts store and running summary consume tokens from your context window. In practice, budget about 500 tokens for facts and 500 for the summary, leaving the rest for the system prompt and recent messages.