Advanced prompting techniques can substantially improve LLM reasoning and output quality. This chapter covers research-backed methods that go beyond basic prompting.

Think of basic prompting like asking someone a question and expecting them to blurt out an answer. Advanced prompting is like asking that same person to show their work, debate themselves, explore multiple paths, and double-check their reasoning before committing to a final answer. The difference in output quality can be dramatic: Google Research (Wei et al., 2022) reported that Chain-of-Thought prompting alone raised accuracy on the GSM8K math benchmark from about 18% to 57% with PaLM 540B.
Encourage step-by-step reasoning for complex problems. This is the single highest-ROI prompting technique you can learn. Just as a math teacher requires students to show their work (not because the answer matters less, but because the reasoning process catches errors), CoT forces the model to externalize its reasoning chain, which dramatically reduces logical mistakes.
from openai import OpenAI


def zero_shot_cot(problem: str) -> str:
    """Apply zero-shot chain-of-thought prompting.

    The magic phrase 'Let's think step by step' was discovered by
    Kojima et al. (2022) to be surprisingly effective at triggering
    reasoning behavior -- even without providing any examples.
    This is the lowest-effort, highest-impact prompting upgrade you can make.
    """
    client = OpenAI()

    # The trailing instruction is the key -- it tells the model to decompose
    # the problem rather than jumping straight to an answer
    prompt = f"""{problem}

Let's think step by step."""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content


# Usage
problem = """If a train travels at 60 mph for 2 hours, then at 80 mph for 3 hours,
what is the average speed for the entire journey?"""

solution = zero_shot_cot(problem)
print(solution)
from openai import OpenAI


def few_shot_cot(problem: str, examples: list[dict]) -> str:
    """Apply few-shot chain-of-thought with examples.

    Few-shot CoT is like training a new employee by walking through
    solved examples before handing them a new problem. The examples
    teach the model the STYLE of reasoning you want, not just the
    format of the answer.

    Production tip: The quality of your examples matters more than
    quantity. Two well-chosen examples that demonstrate edge cases
    will outperform five trivial ones.
    """
    client = OpenAI()

    # Build prompt with examples -- each example shows the reasoning chain,
    # teaching the model HOW to think, not just what to output
    prompt_parts = []
    for example in examples:
        prompt_parts.append(f"Problem: {example['problem']}")
        prompt_parts.append(f"Reasoning: {example['reasoning']}")
        prompt_parts.append(f"Answer: {example['answer']}\n")

    prompt_parts.append(f"Problem: {problem}")
    prompt_parts.append("Reasoning:")

    prompt = "\n".join(prompt_parts)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content


# Usage
examples = [
    {
        "problem": "Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?",
        "reasoning": "Roger started with 5 balls. 2 cans of 3 balls = 6 balls. 5 + 6 = 11 balls.",
        "answer": "11"
    },
    {
        "problem": "A store has 15 apples. If they sell 6 and receive 10 more, how many do they have?",
        "reasoning": "Started with 15 apples. Sold 6, so 15 - 6 = 9. Received 10 more, so 9 + 10 = 19.",
        "answer": "19"
    }
]

problem = "A farmer has 23 cows. He sells 7 and buys 12 more. How many cows does he have?"
solution = few_shot_cot(problem, examples)
print(solution)
from openai import OpenAI


def self_ask_cot(question: str) -> str:
    """Use self-ask prompting for multi-hop reasoning."""
    client = OpenAI()

    system = """You are a reasoning assistant that breaks down complex questions.

When given a question, identify if follow-up questions are needed to answer it.

Format your response as:
- Follow-up: <question> (if needed)
- Intermediate answer: <answer>
- Repeat until you can answer the original question
- Final answer: <answer>"""

    prompt = f"""Question: {question}

Are follow-up questions needed here?"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ]
    )

    return response.choices[0].message.content


# Usage
question = """Who was president of the United States when the first iPhone was released,
and what was that president's age at the time?"""

answer = self_ask_cot(question)
print(answer)
Explore multiple reasoning paths for complex problems. If Chain-of-Thought is like solving a maze by walking one path, Tree-of-Thought is like sending scouts down every fork simultaneously and having them report back which routes look most promising. ToT excels at problems where the first approach might be a dead end: creative writing, puzzle-solving, code architecture decisions, and strategic planning.

The trade-off is real: ToT makes multiple LLM calls per step (generating thoughts + evaluating each one), so latency and cost scale with breadth and depth. Use it when answer quality matters more than speed.
from openai import OpenAI
from dataclasses import dataclass, field
import json


@dataclass
class ThoughtNode:
    """A node in the thought tree."""
    thought: str
    evaluation: float
    # default_factory gives every node its own list of children
    children: list = field(default_factory=list)


class TreeOfThought:
    """Implement Tree-of-Thought reasoning."""

    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model

    def generate_thoughts(
        self,
        problem: str,
        context: str = "",
        num_thoughts: int = 3
    ) -> list[str]:
        """Generate multiple distinct thoughts/approaches."""
        prompt = f"""Problem: {problem}

{f"Current progress: {context}" if context else ""}

Generate {num_thoughts} distinct approaches or next steps to solve this problem.
Each approach should be different and explore a unique angle.

Return as JSON: {{"thoughts": ["thought1", "thought2", ...]}}"""

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        data = json.loads(response.choices[0].message.content)
        return data.get("thoughts", [])

    def evaluate_thought(
        self,
        problem: str,
        thought: str
    ) -> float:
        """Evaluate how promising a thought is (0-1 scale)."""
        prompt = f"""Problem: {problem}

Proposed approach: {thought}

Evaluate this approach on a scale of 0.0 to 1.0:
- 1.0 = Definitely leads to the solution
- 0.5 = Might help, uncertain
- 0.0 = Dead end, won't help

Return JSON: {{"score": <float>, "reasoning": "<brief explanation>"}}"""

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )

        data = json.loads(response.choices[0].message.content)
        # The model may return the score as a string; coerce it and fall
        # back to "uncertain" if it is missing or not numeric
        try:
            return float(data.get("score", 0.5))
        except (TypeError, ValueError):
            return 0.5

    def solve(
        self,
        problem: str,
        depth: int = 3,
        breadth: int = 3,
        threshold: float = 0.3
    ) -> str:
        """Solve using tree-of-thought search."""
        # Generate initial thoughts
        initial_thoughts = self.generate_thoughts(problem, num_thoughts=breadth)

        # Create root nodes, dropping thoughts that score below the threshold
        candidates = []
        for thought in initial_thoughts:
            score = self.evaluate_thought(problem, thought)
            if score >= threshold:
                candidates.append(ThoughtNode(thought=thought, evaluation=score))

        # Explore best candidates
        for _ in range(depth - 1):
            if not candidates:
                break

            # Sort by evaluation and take top candidates
            candidates.sort(key=lambda x: x.evaluation, reverse=True)
            best = candidates[:breadth]

            new_candidates = []
            for node in best:
                # Generate follow-up thoughts
                follow_ups = self.generate_thoughts(
                    problem,
                    context=node.thought,
                    num_thoughts=breadth
                )

                for thought in follow_ups:
                    combined = f"{node.thought}\nThen: {thought}"
                    score = self.evaluate_thought(problem, combined)

                    if score >= threshold:
                        child = ThoughtNode(thought=combined, evaluation=score)
                        node.children.append(child)
                        new_candidates.append(child)

            candidates = new_candidates

        # Get best final thought
        if candidates:
            best = max(candidates, key=lambda x: x.evaluation)
            return self._generate_final_answer(problem, best.thought)

        return "Could not find a solution path."

    def _generate_final_answer(self, problem: str, reasoning: str) -> str:
        """Generate final answer based on best reasoning path."""
        prompt = f"""Problem: {problem}

Reasoning path:
{reasoning}

Based on this reasoning, provide the final answer."""

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.choices[0].message.content


# Usage
tot = TreeOfThought()

problem = """You have 4 numbers: 2, 3, 4, 5
Using each number exactly once and any arithmetic operations (+, -, *, /),
create an expression that equals 24."""

solution = tot.solve(problem, depth=3, breadth=3)
print(solution)
Combine reasoning with tool use for complex tasks.
from openai import OpenAI
import re


class ReActAgent:
    """Agent using ReAct pattern for reasoning and action."""

    def __init__(self, tools: dict, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.tools = tools  # name -> function mapping
        self.tool_descriptions = self._format_tools()

    def _format_tools(self) -> str:
        """Format tool descriptions for the prompt."""
        lines = []
        for name, func in self.tools.items():
            doc = func.__doc__ or "No description"
            lines.append(f"- {name}: {doc}")
        return "\n".join(lines)

    def run(self, task: str, max_steps: int = 10) -> str:
        """Run the ReAct loop."""
        system = f"""You are an agent that solves tasks through reasoning and actions.

Available tools:
{self.tool_descriptions}

For each step, respond with exactly one of:
1. Thought: <your reasoning about what to do next>
2. Action: <tool_name>(<args>)
3. Answer: <final answer when task is complete>

Always start with a Thought, then take an Action if needed, observe the result,
and continue until you can provide the Answer."""

        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": f"Task: {task}"}
        ]

        trajectory = []

        for step in range(max_steps):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages
            )

            output = response.choices[0].message.content.strip()
            trajectory.append(output)

            # Check for final answer
            if output.startswith("Answer:"):
                return output[7:].strip()

            # Parse and execute action
            if "Action:" in output:
                # Greedy match up to the last ")" on the line, so nested
                # parentheses such as calculate(2 * (3 + 4)) are kept whole
                action_match = re.search(r'Action:\s*(\w+)\((.*)\)', output)

                if action_match:
                    tool_name = action_match.group(1)
                    args_str = action_match.group(2)

                    # Execute tool
                    if tool_name in self.tools:
                        try:
                            # Parse arguments (naive comma split; arguments
                            # that themselves contain commas are not supported)
                            if args_str:
                                args = [a.strip().strip('"\'') for a in args_str.split(",")]
                                result = self.tools[tool_name](*args)
                            else:
                                result = self.tools[tool_name]()
                            observation = f"Observation: {result}"
                        except Exception as e:
                            observation = f"Observation: Error - {str(e)}"
                    else:
                        observation = f"Observation: Unknown tool '{tool_name}'"

                    trajectory.append(observation)
                    messages.append({"role": "assistant", "content": output})
                    messages.append({"role": "user", "content": observation})
                else:
                    messages.append({"role": "assistant", "content": output})
                    messages.append({
                        "role": "user",
                        "content": "Please format your action as: Action: tool_name(args)"
                    })
            else:
                messages.append({"role": "assistant", "content": output})
                messages.append({
                    "role": "user",
                    "content": "Continue with your next thought or action."
                })

        return "Max steps reached. Trajectory:\n" + "\n".join(trajectory)


# Example tools
def search(query: str) -> str:
    """Search for information on the web."""
    # Simulated search results
    return f"Search results for '{query}': Found relevant information about the topic."


def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    # WARNING: eval() executes arbitrary Python. It is acceptable for a local
    # demo, but never pass model-generated text to eval() in production.
    try:
        result = eval(expression)
        return str(result)
    except Exception as e:
        return f"Error: {e}"


def lookup(topic: str) -> str:
    """Look up a specific fact or definition."""
    facts = {
        "python": "Python is a high-level programming language created by Guido van Rossum.",
        "earth radius": "Earth's radius is approximately 6,371 kilometers.",
    }
    return facts.get(topic.lower(), f"No information found for '{topic}'")


# Usage
agent = ReActAgent(tools={
    "search": search,
    "calculate": calculate,
    "lookup": lookup
})

result = agent.run(
    "What is the circumference of Earth? Calculate it using the radius."
)
print(result)
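The calculate tool above relies on eval(), which will run whatever Python the model emits. One possible replacement, sketched here under the assumption that the agent only needs basic arithmetic, walks the parsed expression with the standard ast module and rejects everything except numbers and + - * /. The name safe_calculate is hypothetical and not part of the chapter's code.

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST):
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it)
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Hypothetical drop-in for calculate(): arithmetic only, no eval()."""
    try:
        tree = ast.parse(expression, mode="eval")
        return str(_eval_node(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as e:
        return f"Error: {e}"


print(safe_calculate("2 * (3 + 4)"))            # 14
print(safe_calculate("__import__('os')"))       # Error: Unsupported expression element: Call
```

Because it returns a string and reports failures as "Error: ...", it could be registered under the "calculate" key in the tools dict without changing the agent loop.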
Generate multiple reasoning paths and select the most consistent answer.
from openai import OpenAI
from collections import Counter
from typing import Optional


class SelfConsistency:
    """Apply self-consistency decoding."""

    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model

    def solve(
        self,
        problem: str,
        num_samples: int = 5,
        temperature: float = 0.7
    ) -> dict:
        """Solve using self-consistency."""
        # Generate multiple reasoning paths
        responses = []

        prompt = f"""{problem}

Think through this step by step and provide your answer.
At the end, clearly state your final answer after "Final Answer:".
"""

        for _ in range(num_samples):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=temperature
            )

            text = response.choices[0].message.content
            responses.append(text)

        # Extract final answers
        answers = []
        for response in responses:
            answer = self._extract_answer(response)
            if answer:
                answers.append(answer)

        # Find most common answer
        if answers:
            counter = Counter(answers)
            most_common = counter.most_common(1)[0]

            return {
                "answer": most_common[0],
                "confidence": most_common[1] / len(answers),
                "num_samples": num_samples,
                "all_answers": dict(counter),
                "sample_reasoning": responses[0]
            }

        return {
            "answer": None,
            "confidence": 0,
            "num_samples": num_samples,
            "error": "Could not extract consistent answer"
        }

    def _extract_answer(self, response: str) -> Optional[str]:
        """Extract the final answer from a response."""
        # Look for "Final Answer:" pattern
        if "Final Answer:" in response:
            answer = response.split("Final Answer:")[-1].strip()
            # Clean up the answer: keep only its first line
            answer = answer.split("\n")[0].strip()
            return answer

        # Fallback: last non-empty line
        lines = [l.strip() for l in response.split("\n") if l.strip()]
        if lines:
            return lines[-1]

        return None


# Usage
sc = SelfConsistency()

problem = """A farmer has 17 sheep. All but 9 run away.
How many sheep are left?"""

result = sc.solve(problem, num_samples=5)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%}")
print(f"Vote distribution: {result['all_answers']}")
from openai import OpenAI
from dataclasses import dataclass
from typing import Callable, Any, Optional


@dataclass
class ChainStep:
    """A step in a prompt chain."""
    name: str
    prompt_template: str
    parser: Optional[Callable[[str], Any]] = None


class PromptChain:
    """Chain multiple prompts together."""

    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.steps: list[ChainStep] = []

    def add_step(
        self,
        name: str,
        prompt_template: str,
        parser: Optional[Callable[[str], Any]] = None
    ):
        """Add a step to the chain."""
        self.steps.append(ChainStep(
            name=name,
            prompt_template=prompt_template,
            parser=parser
        ))
        return self

    def run(self, initial_input: dict) -> dict:
        """Run the prompt chain."""
        context = initial_input.copy()
        results = {}

        for step in self.steps:
            # Format prompt with current context
            prompt = step.prompt_template.format(**context)

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )

            output = response.choices[0].message.content

            # Parse if parser provided
            if step.parser:
                output = step.parser(output)

            # Make this step's output available to later templates by name
            results[step.name] = output
            context[step.name] = output

        return results


# Usage: Research and summarization chain
chain = PromptChain()

chain.add_step(
    name="outline",
    prompt_template="""Create an outline for an article about: {topic}

Include 3-5 main sections with brief descriptions.
Format as a numbered list."""
)

chain.add_step(
    name="draft",
    prompt_template="""Based on this outline:
{outline}

Write a detailed first draft of the article about {topic}.
Expand each section with relevant information and examples."""
)

chain.add_step(
    name="review",
    prompt_template="""Review this draft and identify improvements:
{draft}

List:
1. Factual issues or gaps
2. Clarity improvements
3. Missing information
4. Structural suggestions"""
)

chain.add_step(
    name="final",
    prompt_template="""Improve this draft based on the review:

Original draft:
{draft}

Review feedback:
{review}

Write the improved final version."""
)

results = chain.run({"topic": "the future of renewable energy"})
print("Final article:")
print(results["final"])
Decompose complex problems into simpler subproblems.
from openai import OpenAI


class LeastToMost:
    """Implement least-to-most prompting."""

    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model

    def decompose(self, problem: str) -> list[str]:
        """Decompose problem into subproblems."""
        prompt = f"""Problem: {problem}

Break this problem down into simpler subproblems that can be solved sequentially.
List them from simplest to most complex, where each builds on previous answers.

Format as a numbered list of subproblems."""

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )

        text = response.choices[0].message.content

        # Parse numbered list
        subproblems = []
        for line in text.split("\n"):
            line = line.strip()
            if line and line[0].isdigit():
                # Remove number and punctuation
                subproblem = line.lstrip("0123456789.):- ").strip()
                if subproblem:
                    subproblems.append(subproblem)

        return subproblems

    def solve(self, problem: str) -> dict:
        """Solve using least-to-most approach."""
        # Decompose
        subproblems = self.decompose(problem)

        # Solve sequentially, feeding earlier solutions into later prompts
        solutions = []
        context = f"Original problem: {problem}\n\n"

        for subproblem in subproblems:
            prompt = f"""{context}
Previous solutions:
{chr(10).join(f"- {s['subproblem']}: {s['solution']}" for s in solutions)}

Now solve: {subproblem}

Provide a clear, concise solution."""

            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )

            solution = response.choices[0].message.content
            solutions.append({
                "subproblem": subproblem,
                "solution": solution
            })

        # Final synthesis
        synthesis_prompt = f"""Original problem: {problem}

Solutions to subproblems:
{chr(10).join(f"{i+1}. {s['subproblem']}: {s['solution']}" for i, s in enumerate(solutions))}

Now provide the complete final answer to the original problem."""

        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": synthesis_prompt}]
        )

        return {
            "subproblems": subproblems,
            "solutions": solutions,
            "final_answer": response.choices[0].message.content
        }


# Usage
l2m = LeastToMost()

problem = """Design a system to recommend movies to users based on their watching history
and preferences. The system should handle millions of users and movies."""

result = l2m.solve(problem)

print("Subproblems:")
for i, sp in enumerate(result["subproblems"], 1):
    print(f"  {i}. {sp}")

print("\nFinal Answer:")
print(result["final_answer"])
Prompting Best Practices
Choose the technique based on problem complexity
Chain-of-Thought works well for math and logical reasoning
Tree-of-Thought excels at creative problem-solving
ReAct is ideal when external tools are needed
Self-Consistency improves reliability for high-stakes answers
When would you choose Tree-of-Thought over Chain-of-Thought in a production system, and what are the cost implications?
Strong Answer:
The key decision factor is whether the problem has a single reasoning path or multiple viable approaches. Chain-of-Thought works well for linear reasoning tasks like math, logical deduction, or step-by-step analysis where there is one clear path forward. Tree-of-Thought excels at problems where the first approach might be a dead end, such as creative generation, code architecture decisions, puzzle-solving, or strategic planning.
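In production, the cost trade-off is significant. ToT makes multiple LLM calls per step: you generate N candidate thoughts, then evaluate each one, and repeat this at every depth level. For a full, unpruned tree with breadth 3 and depth 3, where every thought is generated and scored in its own call, you are looking at 3 + 9 + 27 = 39 generation calls plus the same number of evaluation calls. That is 78 LLM calls for a single user query versus 1 call for CoT. Pruning to the top few candidates at each level and generating sibling thoughts in a single call, as the TreeOfThought class in this chapter does, cuts this substantially but still leaves tens of calls per query.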
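The arithmetic can be checked with a small sketch. Both function names are hypothetical. The second function is an upper bound for a search shaped like this chapter's TreeOfThought.solve, assuming every thought passes the threshold, each generation call returns exactly `breadth` thoughts, and one final call writes the answer.

```python
def full_tree_calls(breadth: int, depth: int) -> int:
    """Calls for an unpruned tree: one generation and one evaluation per thought."""
    thoughts = sum(breadth ** level for level in range(1, depth + 1))
    return 2 * thoughts


def beam_search_calls(breadth: int, depth: int) -> int:
    """Upper bound for a pruned search that keeps `breadth` nodes per level."""
    calls = 1 + breadth                 # first level: 1 generation + breadth evaluations
    for _ in range(depth - 1):
        calls += breadth                # one generation call per kept node
        calls += breadth * breadth      # one evaluation per follow-up thought
    return calls + 1                    # final answer call


print(full_tree_calls(3, 3))    # 78
print(beam_search_calls(3, 3))  # 29
```

Even with pruning, the search costs roughly 29 times a single CoT call at breadth 3 and depth 3, which is why routing only the hard cases to ToT matters.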
The practical heuristic I use: if the task has a verifiable correct answer and the first-attempt success rate with CoT is above 80%, stick with CoT. If you need the answer to be creative or exploratory, or if the cost of a wrong answer is very high (say, generating a legal contract), the additional cost of ToT is justified.
One pattern that works well in production is a “CoT-first, ToT-fallback” approach. Run CoT first, evaluate the confidence of the result, and only escalate to ToT if the confidence is below a threshold. This keeps average cost low while catching the hard cases.
Follow-up: How would you implement confidence-based routing between CoT and ToT without adding too much latency?
You can use a lightweight classifier or even the model’s own logprobs as a confidence signal. After the CoT pass, check whether the model hedged in its answer (phrases like “I think” or “it might be”) or whether the logprobs for the final answer token are below a threshold. If confidence is low, trigger the ToT pass. The first CoT response takes maybe 1-2 seconds, and the confidence check is essentially free since it operates on the existing response. You only pay the ToT latency tax on the subset of queries that actually need it, which in practice is around 10-20% of requests.
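A minimal sketch of the hedge-phrase half of this idea follows. The phrase list and function names are assumptions for illustration, and the logprob check is left out. The two solvers are passed in as plain callables so the routing logic can be tested without calling a model.

```python
from typing import Callable

# Hypothetical hedge phrases; tune this list for your own domain.
HEDGE_PHRASES = ("i think", "it might be", "not sure", "possibly", "probably")


def looks_hedged(answer: str) -> bool:
    """Cheap confidence signal: does the answer contain a hedging phrase?"""
    text = answer.lower()
    return any(phrase in text for phrase in HEDGE_PHRASES)


def answer_with_fallback(
    problem: str,
    cheap_solver: Callable[[str], str],
    expensive_solver: Callable[[str], str],
) -> tuple[str, str]:
    """Run the cheap solver first; escalate only if its answer looks hedged.

    Returns (answer, route) where route is "cot" or "tot".
    """
    first = cheap_solver(problem)
    if looks_hedged(first):
        return expensive_solver(problem), "tot"
    return first, "cot"
```

In this chapter's terms, cheap_solver could be zero_shot_cot and expensive_solver could be TreeOfThought().solve. Logging the returned route gives you the escalation rate directly.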
Self-Consistency samples multiple reasoning paths and takes a majority vote. What failure modes have you seen with this approach, and how do you mitigate them?
Strong Answer:
The biggest failure mode is when the model is confidently wrong in a consistent way. If 4 out of 5 samples all arrive at the same incorrect answer through similar flawed reasoning, majority voting amplifies the error instead of catching it. This happens most often with questions that trigger a common misconception the model learned during training.
A second failure mode is answer extraction brittleness. Self-Consistency depends on being able to compare final answers across samples, but if the model phrases the same answer differently each time (“42”, “forty-two”, “the answer is 42 units”), your extraction logic might treat them as different answers and report artificially low confidence.
Temperature selection matters more than most people realize. Too low (under 0.3) and all samples follow nearly identical reasoning paths, defeating the purpose. Too high (above 1.0) and you get incoherent samples that pollute the vote. The sweet spot is usually 0.5 to 0.8 depending on the task.
To mitigate these issues, I normalize extracted answers before comparison (strip units, convert to canonical forms), use semantic similarity rather than exact match when answers are free-text, and combine Self-Consistency with a verification step where I ask the model to critique the majority answer. If the critique identifies a flaw, I fall back to the second-most-common answer or flag for human review.
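As a sketch of the normalization step, the helper below reduces numeric answers to a canonical string before voting. It is an illustration under simple assumptions: it takes the first number it finds, drops thousands separators and units, and does not convert spelled-out numbers such as "forty-two". The function names are hypothetical.

```python
import re
from collections import Counter


def normalize_answer(raw: str) -> str:
    """Map differently phrased answers onto one canonical form."""
    text = raw.strip().lower().replace(",", "")
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match:
        value = float(match.group())
        # Render 42.0 and 42 identically
        return str(int(value)) if value.is_integer() else str(value)
    # Non-numeric answers: compare on lowercase text without punctuation
    return re.sub(r"[^\w\s]", "", text).strip()


def majority_vote(raw_answers: list[str]) -> tuple[str, float]:
    """Return the most common normalized answer and its vote share."""
    counts = Counter(normalize_answer(a) for a in raw_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(raw_answers)


print(majority_vote(["42", "The answer is 42 units", "42.0", "41"]))  # ('42', 0.75)
```

Without normalization, the same four samples would split into four distinct strings and report 25% confidence.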
Follow-up: At what sample count does Self-Consistency hit diminishing returns, and how do you balance cost versus reliability?
In practice, 5 to 7 samples capture most of the benefit. Research from Google and others shows that going from 1 to 5 samples gives you the biggest accuracy jump, 5 to 10 gives a modest improvement, and beyond 10 you are paying significantly more for marginal gains. For production, I typically use 5 samples for standard queries and increase to 9 or 11 for high-stakes outputs like financial calculations or medical information. An odd count avoids ties when the vote is between two candidate answers; with three or more distinct answers a tie is still possible, so you need a tie-breaking rule. Cost-wise, Self-Consistency at 5 samples with gpt-4o-mini is still cheaper than a single gpt-4o call, so you can often get better reliability at lower cost by sampling a cheaper model multiple times.
You are building a customer support agent that uses ReAct. The agent sometimes enters infinite loops between Thought and Action steps. How do you diagnose and fix this?
Strong Answer:
Infinite loops in ReAct agents typically happen for one of three reasons. First, the observation from a tool is ambiguous or unhelpful, so the agent keeps retrying the same tool hoping for a better result. Second, the agent oscillates between two tools, each of which provides partial information that triggers calling the other. Third, the agent’s reasoning gets stuck in a circular pattern where it keeps re-deriving the same thought without making progress toward the answer.
The diagnostic approach starts with logging. You need to record every Thought, Action, and Observation in the trajectory. Then look for patterns: repeated identical tool calls, oscillation between two actions, or thoughts that repeat nearly verbatim. In the code from this chapter, the trajectory list captures this, but in production you want structured logging with deduplication detection.
To fix it, I implement multiple safeguards. A hard cap on iterations (the max_steps parameter) is the safety net, but you also want a “stale detection” mechanism that compares the current thought to the last N thoughts using simple string similarity. If similarity exceeds 80%, inject a meta-prompt like “You seem to be repeating yourself. Summarize what you know so far and either provide a final answer or try a completely different approach.”
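One way to sketch the stale-detection check is with the standard library's difflib. The function name, the window of 3, and the use of SequenceMatcher as the "simple string similarity" are assumptions; the 0.8 threshold and the meta-prompt come from the description above.

```python
from difflib import SequenceMatcher

STALE_NUDGE = (
    "You seem to be repeating yourself. Summarize what you know so far and "
    "either provide a final answer or try a completely different approach."
)


def is_stale(current: str, previous: list[str],
             window: int = 3, threshold: float = 0.8) -> bool:
    """True if `current` is more than `threshold` similar to any of the last `window` thoughts."""
    for old in previous[-window:]:
        ratio = SequenceMatcher(None, current.lower(), old.lower()).ratio()
        if ratio > threshold:
            return True
    return False
```

Inside a ReAct loop, you would call is_stale on each new model output against the earlier ones and, when it returns True, append STALE_NUDGE as the next user message instead of the usual continuation prompt.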
Another effective technique is tool call deduplication: if the agent tries to call the same tool with the same arguments twice in a row, intercept it and return the cached result along with a nudge like “You already searched for this. Here is what you found. Use this information to proceed.”
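A possible shape for this, sketched as a thin wrapper around the tools dict: the class name and message wording are illustrative, and it only catches back-to-back repeats, as described above.

```python
from typing import Callable, Optional


class DedupToolRunner:
    """Wrap a name -> function tool mapping and intercept repeated calls."""

    def __init__(self, tools: dict[str, Callable[..., str]]):
        self.tools = tools
        self._last_call: Optional[tuple] = None   # (tool_name, args) of the previous call
        self._last_result: Optional[str] = None

    def call(self, tool_name: str, *args: str) -> str:
        key = (tool_name, args)
        if key == self._last_call:
            # Same tool, same arguments, twice in a row: reuse the cached
            # result and nudge the agent instead of calling the tool again
            return (
                f"You already called {tool_name} with these arguments. "
                f"Here is what you found: {self._last_result} "
                "Use this information to proceed."
            )
        result = self.tools[tool_name](*args)
        self._last_call, self._last_result = key, result
        return result
```

An unknown tool name raises KeyError here, so the caller should keep the existing "Unknown tool" check before delegating to the wrapper.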
At a system design level, I also add a “give up gracefully” escape hatch. If the agent hits 70% of max_steps without converging, force it to synthesize a partial answer from what it has gathered so far, rather than continuing to flail.
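The budget check itself is small. In this sketch the helper name and prompt wording are hypothetical, `step` is the 0-indexed loop counter, and integer arithmetic is used so the 70% boundary does not depend on floating-point rounding.

```python
WRAP_UP_PROMPT = (
    "You are running out of steps. Using only what you have gathered so far, "
    "give your best partial answer and state what is still unknown."
)


def should_wrap_up(step: int, max_steps: int, percent: int = 70) -> bool:
    """True once `step` (0-indexed) has used `percent` of the step budget."""
    return step * 100 >= max_steps * percent
```

With max_steps=10, the check first fires at step 7, leaving a few steps for the model to produce the partial answer after WRAP_UP_PROMPT is sent.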
Follow-up: How would you monitor ReAct agent behavior in production to catch degradation before users notice?
I track three key metrics: average steps-to-completion (a rising trend means the agent is struggling more), the percentage of runs hitting the max_steps cap (your “failure rate”), and the distribution of tool calls per run. Set alerts when the median steps-to-completion increases by more than 20% over a rolling window, or when the max_steps hit rate exceeds 5%. You also want to sample and review trajectories from the tail end of the steps distribution, because those are the runs closest to failure. A weekly review of the top 10 longest trajectories often reveals systematic prompt or tool issues before they become widespread.
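The two alert rules can be sketched over a window of per-run step counts. This is an illustration, not a monitoring stack: the function names are hypothetical, and it assumes a run that hit the cap is recorded with steps equal to max_steps.

```python
from statistics import median


def run_metrics(step_counts: list[int], max_steps: int) -> dict[str, float]:
    """Summarize one window of agent runs (one entry per run: steps used)."""
    capped = sum(1 for steps in step_counts if steps >= max_steps)
    return {
        "median_steps": float(median(step_counts)),
        "cap_hit_rate": capped / len(step_counts),
    }


def alerts(baseline: dict[str, float], current: dict[str, float]) -> list[str]:
    """Apply the 20% median-increase rule and the 5% cap-hit rule."""
    triggered = []
    if current["median_steps"] > baseline["median_steps"] * 1.20:
        triggered.append("median steps-to-completion up more than 20%")
    if current["cap_hit_rate"] > 0.05:
        triggered.append("max_steps hit rate above 5%")
    return triggered


last_week = run_metrics([3, 4, 4, 5, 3], max_steps=10)
this_week = run_metrics([5, 5, 6, 10, 4], max_steps=10)
print(alerts(last_week, this_week))
```

Here the median rises from 4 to 5 steps (25%) and one run in five hits the cap, so both alerts fire.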
Prompt chaining versus a single long prompt: walk me through how you decide which approach to use for a complex task.
Strong Answer:
The way I think about this is in terms of task decomposability and error propagation. If a task can be cleanly split into independent subtasks where each step has a well-defined input and output, chaining wins. If the subtasks are deeply interdependent and the model needs to hold the full context simultaneously to make good decisions, a single prompt is better.
Chaining has several concrete advantages. Each step can use a different model, so you can use gpt-4o-mini for classification and extraction steps but gpt-4o for the final synthesis. Each step can be independently tested, cached, and retried. You can add validation between steps to catch errors early. And each prompt is shorter, which means fewer hallucinations since the model has less context to get confused by.
The downside of chaining is error compounding. If step 1 has 90% accuracy and step 2 has 90% accuracy, your end-to-end accuracy is 81%. With 5 steps at 90% each, you are down to 59%. So you need each step to be highly reliable, or you need error correction mechanisms between steps.
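The compounding figures follow from multiplying the per-step accuracies, assuming errors at each step are independent. The sketch below reproduces them and also inverts the formula to show how reliable each step must be to hit an end-to-end target; both function names are illustrative.

```python
import math


def chain_accuracy(step_accuracies: list[float]) -> float:
    """End-to-end accuracy of a chain, assuming independent step errors."""
    return math.prod(step_accuracies)


def required_step_accuracy(target: float, num_steps: int) -> float:
    """Per-step accuracy needed for `num_steps` equal steps to reach `target`."""
    return target ** (1 / num_steps)


print(round(chain_accuracy([0.9] * 2), 2))        # 0.81
print(round(chain_accuracy([0.9] * 5), 2))        # 0.59
print(round(required_step_accuracy(0.9, 5), 3))   # 0.979
```

To keep a 5-step chain at 90% end to end, each step has to be right about 97.9% of the time, which is why validation between steps pays off.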
A single long prompt avoids the error compounding problem and lets the model see all context at once, but it hits practical limits. As a rough rule of thumb (the exact point varies by model), once instructions run past about 4,000 tokens, models start ignoring parts of the prompt. And you lose the ability to use different models or add intermediate validation.
My decision framework: if the task has more than 3 clearly separable stages, chain. If the task requires tight integration between steps (like writing code where the function signature in step 1 affects the implementation in step 3), use a single prompt with clear section markers. For the middle ground, I use a hybrid: chain the major stages but keep tightly coupled substeps within a single prompt.
Follow-up: How do you handle the case where a later step in a chain reveals that an earlier step made an error?
This is where the Corrective RAG pattern from the advanced RAG chapter is relevant even outside retrieval. I implement a lightweight “review and revise” step at the end of the chain that takes the final output and the outputs of all intermediate steps, then checks for consistency. If it finds a contradiction (say, the outline mentions 5 sections but the draft only covers 4), it identifies which step went wrong and reruns just that step with additional guidance. This is cheaper than rerunning the entire chain and catches most cross-step errors. For truly critical pipelines, I add assertion checks between steps, like verifying that the output of the extraction step actually contains the required fields before passing it to analysis.
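An assertion check of this kind might look like the sketch below, assuming the extraction step is prompted to return a JSON object. StepValidationError and require_fields are hypothetical names, and the example field names are made up.

```python
import json


class StepValidationError(ValueError):
    """Raised when a chain step's output fails its assertion check."""


def require_fields(raw_output: str, required: set[str]) -> dict:
    """Parse a step's JSON output and verify the required fields are present and non-empty."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as e:
        raise StepValidationError(f"Output is not valid JSON: {e}") from e

    if not isinstance(data, dict):
        raise StepValidationError("Output must be a JSON object")

    missing = sorted(f for f in required if data.get(f) in (None, "", []))
    if missing:
        raise StepValidationError(f"Missing required fields: {missing}")
    return data


record = require_fields('{"customer": "Ada", "order_id": "A-17"}', {"customer", "order_id"})
print(record["order_id"])  # A-17
```

Since it takes the raw string and returns parsed data, it fits the parser slot of this chapter's PromptChain.add_step, for example parser=lambda out: require_fields(out, {"customer", "order_id"}). The chain then stops at the failing step instead of passing bad data downstream.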