Advanced prompting techniques dramatically improve LLM reasoning and output quality. This chapter covers research-backed methods that go beyond basic prompting. Think of basic prompting like asking someone a question and expecting them to blurt out an answer. Advanced prompting is like asking that same person to show their work, debate themselves, explore multiple paths, and double-check their reasoning before committing to a final answer. The difference in output quality can be dramatic: Wei et al. (2022) at Google Research reported that Chain-of-Thought prompting alone raised PaLM 540B's accuracy on the GSM8K math benchmark from about 18% to about 57%.

Chain-of-Thought (CoT) Prompting

Encourage step-by-step reasoning for complex problems. This is the single highest-ROI prompting technique you can learn. Just as a math teacher requires students to show their work (not because the answer matters less, but because the reasoning process catches errors), CoT forces the model to externalize its reasoning chain, which dramatically reduces logical mistakes.

Zero-Shot CoT

from openai import OpenAI


def zero_shot_cot(problem: str) -> str:
    """Apply zero-shot chain-of-thought prompting.
    
    The magic phrase 'Let's think step by step' was discovered by Kojima et al.
    (2022) to be surprisingly effective at triggering reasoning behavior -- even
    without providing any examples. This is the lowest-effort, highest-impact
    prompting upgrade you can make.
    """
    client = OpenAI()
    
    # The trailing instruction is the key -- it tells the model to decompose
    # the problem rather than jumping straight to an answer
    prompt = f"""{problem}

Let's think step by step."""
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content


# Usage
problem = """
If a train travels at 60 mph for 2 hours, then at 80 mph for 3 hours,
what is the average speed for the entire journey?
"""

solution = zero_shot_cot(problem)
print(solution)

Few-Shot CoT

from openai import OpenAI


def few_shot_cot(problem: str, examples: list[dict]) -> str:
    """Apply few-shot chain-of-thought with examples.
    
    Few-shot CoT is like training a new employee by walking through solved
    examples before handing them a new problem. The examples teach the model
    the STYLE of reasoning you want, not just the format of the answer.
    
    Production tip: The quality of your examples matters more than quantity.
    Two well-chosen examples that demonstrate edge cases will outperform
    five trivial ones.
    """
    client = OpenAI()
    
    # Build prompt with examples -- each example shows the reasoning chain,
    # teaching the model HOW to think, not just what to output
    prompt_parts = []
    
    for example in examples:
        prompt_parts.append(f"Problem: {example['problem']}")
        prompt_parts.append(f"Reasoning: {example['reasoning']}")
        prompt_parts.append(f"Answer: {example['answer']}\n")
    
    prompt_parts.append(f"Problem: {problem}")
    prompt_parts.append("Reasoning:")
    
    prompt = "\n".join(prompt_parts)
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content


# Usage
examples = [
    {
        "problem": "Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?",
        "reasoning": "Roger started with 5 balls. 2 cans of 3 balls = 6 balls. 5 + 6 = 11 balls.",
        "answer": "11"
    },
    {
        "problem": "A store has 15 apples. If they sell 6 and receive 10 more, how many do they have?",
        "reasoning": "Started with 15 apples. Sold 6, so 15 - 6 = 9. Received 10 more, so 9 + 10 = 19.",
        "answer": "19"
    }
]

problem = "A farmer has 23 cows. He sells 7 and buys 12 more. How many cows does he have?"
solution = few_shot_cot(problem, examples)
print(solution)

Self-Ask with CoT

from openai import OpenAI


def self_ask_cot(question: str) -> str:
    """Use self-ask prompting for multi-hop reasoning."""
    client = OpenAI()
    
    system = """You are a reasoning assistant that breaks down complex questions.
When given a question, identify if follow-up questions are needed to answer it.
Format your response as:
- Follow-up: <question> (if needed)
- Intermediate answer: <answer>
- Repeat until you can answer the original question
- Final answer: <answer>"""
    
    prompt = f"""Question: {question}

Are follow-up questions needed here?"""
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ]
    )
    
    return response.choices[0].message.content


# Usage
question = """
Who was president of the United States when the first iPhone was released,
and what was that president's age at the time?
"""

answer = self_ask_cot(question)
print(answer)

Tree-of-Thought (ToT) Prompting

Explore multiple reasoning paths for complex problems. If Chain-of-Thought is like solving a maze by walking one path, Tree-of-Thought is like sending scouts down every fork simultaneously and having them report back which routes look most promising. ToT excels at problems where the first approach might be a dead end, such as creative writing, puzzle-solving, code architecture decisions, and strategic planning. The trade-off is real: ToT makes multiple LLM calls per step (generating thoughts, then evaluating each one), so latency and cost scale with breadth and depth. Use it when answer quality matters more than speed.

from openai import OpenAI
from dataclasses import dataclass, field
import json


@dataclass
class ThoughtNode:
    """A node in the thought tree."""
    thought: str
    evaluation: float
    # default_factory gives every node its own fresh list of children
    children: list["ThoughtNode"] = field(default_factory=list)


class TreeOfThought:
    """Implement Tree-of-Thought reasoning."""
    
    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
    
    def generate_thoughts(
        self,
        problem: str,
        context: str = "",
        num_thoughts: int = 3
    ) -> list[str]:
        """Generate multiple distinct thoughts/approaches."""
        prompt = f"""Problem: {problem}

{f"Current progress: {context}" if context else ""}

Generate {num_thoughts} distinct approaches or next steps to solve this problem.
Each approach should be different and explore a unique angle.

Return as JSON: {{"thoughts": ["thought1", "thought2", ...]}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        data = json.loads(response.choices[0].message.content)
        return data.get("thoughts", [])
    
    def evaluate_thought(
        self,
        problem: str,
        thought: str
    ) -> float:
        """Evaluate how promising a thought is (0-1 scale)."""
        prompt = f"""Problem: {problem}

Proposed approach: {thought}

Evaluate this approach on a scale of 0.0 to 1.0:
- 1.0 = Definitely leads to the solution
- 0.5 = Might help, uncertain
- 0.0 = Dead end, won't help

Return JSON: {{"score": <float>, "reasoning": "<brief explanation>"}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
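        data = json.loads(response.choices[0].message.content)
        # The model may return the score as a string or outside 0-1;
        # coerce to float and clamp so sorting and thresholding stay valid.
        try:
            score = float(data.get("score", 0.5))
        except (TypeError, ValueError):
            score = 0.5
        return max(0.0, min(1.0, score))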
    
    def solve(
        self,
        problem: str,
        depth: int = 3,
        breadth: int = 3,
        threshold: float = 0.3
    ) -> str:
        """Solve using tree-of-thought search."""
        # Generate initial thoughts
        initial_thoughts = self.generate_thoughts(problem, num_thoughts=breadth)
        
        # Create root nodes
        candidates = []
        for thought in initial_thoughts:
            score = self.evaluate_thought(problem, thought)
            if score >= threshold:
                candidates.append(ThoughtNode(thought=thought, evaluation=score))
        
        # Explore best candidates
        for _ in range(depth - 1):
            if not candidates:
                break
            
            # Sort by evaluation and take top candidates
            candidates.sort(key=lambda x: x.evaluation, reverse=True)
            best = candidates[:breadth]
            
            new_candidates = []
            for node in best:
                # Generate follow-up thoughts
                follow_ups = self.generate_thoughts(
                    problem,
                    context=node.thought,
                    num_thoughts=breadth
                )
                
                for thought in follow_ups:
                    combined = f"{node.thought}\nThen: {thought}"
                    score = self.evaluate_thought(problem, combined)
                    
                    if score >= threshold:
                        child = ThoughtNode(thought=combined, evaluation=score)
                        node.children.append(child)
                        new_candidates.append(child)
            
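            # If no follow-up passed the threshold, stop and keep the current
            # level so a viable partial path is not thrown away.
            if not new_candidates:
                break
            candidates = new_candidates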
        
        # Get best final thought
        if candidates:
            best = max(candidates, key=lambda x: x.evaluation)
            return self._generate_final_answer(problem, best.thought)
        
        return "Could not find a solution path."
    
    def _generate_final_answer(self, problem: str, reasoning: str) -> str:
        """Generate final answer based on best reasoning path."""
        prompt = f"""Problem: {problem}

Reasoning path:
{reasoning}

Based on this reasoning, provide the final answer."""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content


# Usage
tot = TreeOfThought()

problem = """
You have 4 numbers: 2, 3, 4, 5
Using each number exactly once and any arithmetic operations (+, -, *, /),
create an expression that equals 24.
"""

solution = tot.solve(problem, depth=3, breadth=3)
print(solution)
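
To make the cost trade-off concrete, the sketch below counts the LLM calls made by the `TreeOfThought.solve` search above. It assumes the worst case: every `generate_thoughts` call returns exactly `breadth` thoughts and every thought passes the threshold. `estimate_tot_calls` is a hypothetical helper for illustration, not part of the class.

```python
def estimate_tot_calls(depth: int, breadth: int) -> dict:
    """Worst-case LLM call count for TreeOfThought.solve as written above."""
    # Level 1: one generation call yields `breadth` thoughts, each evaluated once.
    generation = 1
    evaluation = breadth
    # Each later level expands at most `breadth` surviving nodes. Every
    # expansion is one generation call plus `breadth` evaluation calls.
    for _ in range(depth - 1):
        generation += breadth
        evaluation += breadth * breadth
    final = 1  # the single _generate_final_answer call
    return {
        "generation": generation,
        "evaluation": evaluation,
        "final": final,
        "total": generation + evaluation + final,
    }


print(estimate_tot_calls(depth=3, breadth=3))
```

Under these assumptions the default `depth=3, breadth=3` run costs 29 calls, so doubling either parameter is worth estimating before you ship it.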

ReAct (Reasoning + Acting)

Combine reasoning with tool use for complex tasks.
from openai import OpenAI
import json
import re


class ReActAgent:
    """Agent using ReAct pattern for reasoning and action."""
    
    def __init__(self, tools: dict, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.tools = tools  # name -> function mapping
        
        self.tool_descriptions = self._format_tools()
    
    def _format_tools(self) -> str:
        """Format tool descriptions for the prompt."""
        lines = []
        for name, func in self.tools.items():
            doc = func.__doc__ or "No description"
            lines.append(f"- {name}: {doc}")
        return "\n".join(lines)
    
    def run(self, task: str, max_steps: int = 10) -> str:
        """Run the ReAct loop."""
        system = f"""You are an agent that solves tasks through reasoning and actions.

Available tools:
{self.tool_descriptions}

For each step, respond with exactly one of:
1. Thought: <your reasoning about what to do next>
2. Action: <tool_name>(<args>)
3. Answer: <final answer when task is complete>

Always start with a Thought, then take an Action if needed, observe the result,
and continue until you can provide the Answer."""
        
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": f"Task: {task}"}
        ]
        
        trajectory = []
        
        for step in range(max_steps):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=messages
            )
            
            output = response.choices[0].message.content.strip()
            trajectory.append(output)
            
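            # Check for final answer. The model often emits a Thought and the
            # Answer in one message, so search line starts instead of testing
            # only the message prefix.
            answer_match = re.search(r'^Answer:\s*(.*)', output, re.MULTILINE | re.DOTALL)
            if answer_match and "Action:" not in output:
                return answer_match.group(1).strip()
            
            # Parse and execute action
            if "Action:" in output:
                # Greedy match up to the last ")" on the line, so nested
                # parentheses such as calculate(2 * (3 + 4)) are captured whole.
                action_match = re.search(r'Action:\s*(\w+)\((.*)\)', output)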
                if action_match:
                    tool_name = action_match.group(1)
                    args_str = action_match.group(2)
                    
                    # Execute tool
                    if tool_name in self.tools:
                        try:
                            # Parse arguments
                            if args_str:
                                args = [a.strip().strip('"\'') for a in args_str.split(",")]
                                result = self.tools[tool_name](*args)
                            else:
                                result = self.tools[tool_name]()
                            
                            observation = f"Observation: {result}"
                        except Exception as e:
                            observation = f"Observation: Error - {str(e)}"
                    else:
                        observation = f"Observation: Unknown tool '{tool_name}'"
                    
                    trajectory.append(observation)
                    
                    messages.append({"role": "assistant", "content": output})
                    messages.append({"role": "user", "content": observation})
                else:
                    messages.append({"role": "assistant", "content": output})
                    messages.append({
                        "role": "user",
                        "content": "Please format your action as: Action: tool_name(args)"
                    })
            else:
                messages.append({"role": "assistant", "content": output})
                messages.append({
                    "role": "user",
                    "content": "Continue with your next thought or action."
                })
        
        return "Max steps reached. Trajectory:\n" + "\n".join(trajectory)


# Example tools
def search(query: str) -> str:
    """Search for information on the web."""
    # Simulated search results
    return f"Search results for '{query}': Found relevant information about the topic."

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
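    # WARNING: eval() executes arbitrary Python, and this expression comes
    # from the model. Acceptable for a local demo only; never expose it to
    # untrusted input.
    try:
        result = eval(expression)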
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def lookup(topic: str) -> str:
    """Look up a specific fact or definition."""
    facts = {
        "python": "Python is a high-level programming language created by Guido van Rossum.",
        "earth radius": "Earth's radius is approximately 6,371 kilometers.",
    }
    return facts.get(topic.lower(), f"No information found for '{topic}'")


# Usage
agent = ReActAgent(tools={
    "search": search,
    "calculate": calculate,
    "lookup": lookup
})

result = agent.run(
    "What is the circumference of Earth? Calculate it using the radius."
)
print(result)
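
Because `calculate` hands model-generated text to `eval`, it can run arbitrary Python. One possible replacement, offered as an assumption rather than the chapter's own method, parses the expression with the standard `ast` module and allows only numeric literals and the four basic operators. `safe_calculate` is a hypothetical name.

```python
import ast
import operator

# Whitelisted operators; every other node type is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node: ast.AST):
    """Recursively evaluate a whitelisted arithmetic AST node."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, (int, float))
        and not isinstance(node.value, bool)
    ):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Evaluate basic arithmetic without calling eval()."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_eval_node(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as e:
        return f"Error: {e}"


print(safe_calculate("2 * 3.14159 * 6371"))
print(safe_calculate("__import__('os').getcwd()"))
```

The second call returns an error string instead of executing the import, which is the behavior you want from a tool the model can call freely. Registering it as `"calculate": safe_calculate` in the `tools` dict would swap it in without touching the agent.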

Self-Consistency

Generate multiple reasoning paths and select the most consistent answer.
from openai import OpenAI
from collections import Counter
import json


class SelfConsistency:
    """Apply self-consistency decoding."""
    
    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
    
    def solve(
        self,
        problem: str,
        num_samples: int = 5,
        temperature: float = 0.7
    ) -> dict:
        """Solve using self-consistency."""
        # Generate multiple reasoning paths
        responses = []
        
        prompt = f"""{problem}

Think through this step by step and provide your answer.
At the end, clearly state your final answer after "Final Answer:".
"""
        
        for _ in range(num_samples):
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}],
                temperature=temperature
            )
            
            text = response.choices[0].message.content
            responses.append(text)
        
        # Extract final answers
        answers = []
        for response in responses:
            answer = self._extract_answer(response)
            if answer:
                answers.append(answer)
        
        # Find most common answer
        if answers:
            counter = Counter(answers)
            most_common = counter.most_common(1)[0]
            
            return {
                "answer": most_common[0],
                "confidence": most_common[1] / len(answers),
                "num_samples": num_samples,
                "all_answers": dict(counter),
                "sample_reasoning": responses[0]
            }
        
        return {
            "answer": None,
            "confidence": 0,
            "num_samples": num_samples,
            "error": "Could not extract consistent answer"
        }
    
    def _extract_answer(self, response: str) -> str:
        """Extract the final answer from a response."""
        # Look for "Final Answer:" pattern
        if "Final Answer:" in response:
            answer = response.split("Final Answer:")[-1].strip()
            # Clean up the answer
            answer = answer.split("\n")[0].strip()
            return answer
        
        # Fallback: last line
        lines = [l.strip() for l in response.split("\n") if l.strip()]
        if lines:
            return lines[-1]
        
        return None


# Usage
sc = SelfConsistency()

problem = """
A farmer has 17 sheep. All but 9 run away. How many sheep are left?
"""

result = sc.solve(problem, num_samples=5)
print(f"Answer: {result['answer']}")
print(f"Confidence: {result['confidence']:.0%}")
print(f"Vote distribution: {result['all_answers']}")

Prompt Chaining

Break complex tasks into sequential steps.
from openai import OpenAI
from dataclasses import dataclass
from typing import Callable, Any


@dataclass
class ChainStep:
    """A step in a prompt chain."""
    name: str
    prompt_template: str
    parser: Callable[[str], Any] = None


class PromptChain:
    """Chain multiple prompts together."""
    
    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
        self.steps: list[ChainStep] = []
    
    def add_step(
        self,
        name: str,
        prompt_template: str,
        parser: Callable[[str], Any] = None
    ):
        """Add a step to the chain."""
        self.steps.append(ChainStep(
            name=name,
            prompt_template=prompt_template,
            parser=parser
        ))
        return self
    
    def run(self, initial_input: dict) -> dict:
        """Run the prompt chain."""
        context = initial_input.copy()
        results = {}
        
        for step in self.steps:
            # Format prompt with current context
            prompt = step.prompt_template.format(**context)
            
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            
            output = response.choices[0].message.content
            
            # Parse if parser provided
            if step.parser:
                output = step.parser(output)
            
            results[step.name] = output
            context[step.name] = output
        
        return results


# Usage: Research and summarization chain
chain = PromptChain()

chain.add_step(
    name="outline",
    prompt_template="""Create an outline for an article about: {topic}

Include 3-5 main sections with brief descriptions.
Format as a numbered list."""
)

chain.add_step(
    name="draft",
    prompt_template="""Based on this outline:
{outline}

Write a detailed first draft of the article about {topic}.
Expand each section with relevant information and examples."""
)

chain.add_step(
    name="review",
    prompt_template="""Review this draft and identify improvements:
{draft}

List:
1. Factual issues or gaps
2. Clarity improvements
3. Missing information
4. Structural suggestions"""
)

chain.add_step(
    name="final",
    prompt_template="""Improve this draft based on the review:

Original draft:
{draft}

Review feedback:
{review}

Write the improved final version."""
)

results = chain.run({"topic": "the future of renewable energy"})
print("Final article:")
print(results["final"])
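
Because each step's output becomes the next step's input, a cheap check between steps can stop a bad intermediate result from propagating. The sketch below shows the idea with a stubbed model call so that it runs offline. `run_chain_with_checks`, `fake_llm`, and the `(name, template, validator)` step shape are hypothetical and not part of `PromptChain`.

```python
from typing import Callable


def run_chain_with_checks(
    steps: list[tuple[str, str, Callable[[str], bool]]],
    initial_input: dict,
    llm: Callable[[str], str],
    max_retries: int = 1,
) -> dict:
    """Run (name, template, validator) steps, retrying a step that fails validation."""
    context = dict(initial_input)
    for name, template, is_valid in steps:
        prompt = template.format(**context)
        for _ in range(max_retries + 1):
            output = llm(prompt)
            if is_valid(output):
                break
        else:
            # Every attempt failed: stop here rather than feed bad output forward.
            raise ValueError(f"Step '{name}' failed validation")
        context[name] = output
    return context


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call."""
    if prompt.startswith("Outline"):
        return "1. Solar\n2. Wind\n3. Storage"
    return "Draft covering: " + prompt.splitlines()[-1]


steps = [
    # The outline must contain at least three lines.
    ("outline", "Outline an article about {topic}", lambda out: out.count("\n") >= 2),
    # The draft must be non-trivial in length.
    ("draft", "Write a draft from this outline:\n{outline}", lambda out: len(out) > 20),
]
result = run_chain_with_checks(steps, {"topic": "renewable energy"}, fake_llm)
print(result["draft"])
```

Validators here are plain predicates on the raw output. In a real chain they might check for required JSON fields or a minimum section count.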

Least-to-Most Prompting

Decompose complex problems into simpler subproblems.
from openai import OpenAI


class LeastToMost:
    """Implement least-to-most prompting."""
    
    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
    
    def decompose(self, problem: str) -> list[str]:
        """Decompose problem into subproblems."""
        prompt = f"""Problem: {problem}

Break this problem down into simpler subproblems that can be solved sequentially.
List them from simplest to most complex, where each builds on previous answers.

Format as a numbered list of subproblems."""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        text = response.choices[0].message.content
        
        # Parse numbered list
        subproblems = []
        for line in text.split("\n"):
            line = line.strip()
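            if line and line[0].isdigit():
                # Remove only the list marker ("1.", "2)", "3:"). lstrip() on a
                # character set would also eat digits that start the text itself.
                marker, _, rest = line.partition(" ")
                if marker[-1] in ".):" and marker[:-1].isdigit():
                    subproblem = rest.strip()
                    if subproblem:
                        subproblems.append(subproblem)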
        
        return subproblems
    
    def solve(self, problem: str) -> dict:
        """Solve using least-to-most approach."""
        # Decompose
        subproblems = self.decompose(problem)
        
        # Solve sequentially
        solutions = []
        context = f"Original problem: {problem}\n\n"
        
        for subproblem in subproblems:
            prompt = f"""{context}
Previous solutions:
{chr(10).join(f"- {s['subproblem']}: {s['solution']}" for s in solutions)}

Now solve: {subproblem}

Provide a clear, concise solution."""
            
            response = self.client.chat.completions.create(
                model=self.model,
                messages=[{"role": "user", "content": prompt}]
            )
            
            solution = response.choices[0].message.content
            solutions.append({
                "subproblem": subproblem,
                "solution": solution
            })
        
        # Final synthesis
        synthesis_prompt = f"""Original problem: {problem}

Solutions to subproblems:
{chr(10).join(f"{i+1}. {s['subproblem']}: {s['solution']}" for i, s in enumerate(solutions))}

Now provide the complete final answer to the original problem."""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": synthesis_prompt}]
        )
        
        return {
            "subproblems": subproblems,
            "solutions": solutions,
            "final_answer": response.choices[0].message.content
        }


# Usage
l2m = LeastToMost()

problem = """
Design a system to recommend movies to users based on their watching history
and preferences. The system should handle millions of users and movies.
"""

result = l2m.solve(problem)

print("Subproblems:")
for i, sp in enumerate(result["subproblems"], 1):
    print(f"  {i}. {sp}")

print("\nFinal Answer:")
print(result["final_answer"])
Prompting Best Practices
  • Choose the technique based on problem complexity
  • Chain-of-Thought works well for math and logical reasoning
  • Tree-of-Thought excels at creative problem-solving
  • ReAct is ideal when external tools are needed
  • Self-Consistency improves reliability for high-stakes answers

Practice Exercise

Build a reasoning system that:
  1. Automatically selects the best prompting technique
  2. Implements at least 3 different techniques
  3. Evaluates and compares results across techniques
  4. Provides confidence scores for answers
  5. Supports custom tool integration
Focus on:
  • Problem classification for technique selection
  • Robust answer extraction
  • Performance measurement
  • Fallback strategies

Interview Deep-Dive

Strong Answer:
  • The key decision factor is whether the problem has a single reasoning path or multiple viable approaches. Chain-of-Thought works well for linear reasoning tasks like math, logical deduction, or step-by-step analysis where there is one clear path forward. Tree-of-Thought excels at problems where the first approach might be a dead end, such as creative generation, code architecture decisions, puzzle-solving, or strategic planning.
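  • In production, the cost trade-off is significant. ToT makes multiple LLM calls per step: you generate N candidate thoughts, then evaluate each one, and repeat this at every depth level. For an unpruned tree with breadth 3 and depth 3, that is 3 + 9 + 27 = 39 thoughts. At one generation call and one evaluation call per thought, you are looking at roughly 78 LLM calls for a single user query versus 1 call for CoT. Pruning helps: the `solve` method in this chapter keeps only the top `breadth` nodes per level and generates all of a node's follow-ups in one call, which brings the worst case down to about 29 calls.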
  • The practical heuristic I use: if the task has a verifiable correct answer and the first-attempt success rate with CoT is above 80%, stick with CoT. If you need the answer to be creative or exploratory, or if the cost of a wrong answer is very high (say, generating a legal contract), the additional cost of ToT is justified.
  • One pattern that works well in production is a “CoT-first, ToT-fallback” approach. Run CoT first, evaluate the confidence of the result, and only escalate to ToT if the confidence is below a threshold. This keeps average cost low while catching the hard cases.
Follow-up: How would you implement confidence-based routing between CoT and ToT without adding too much latency?
You can use a lightweight classifier or even the model’s own logprobs as a confidence signal. After the CoT pass, check whether the model hedged in its answer (phrases like “I think” or “it might be”) or whether the logprobs for the final answer token are below a threshold. If confidence is low, trigger the ToT pass. The first CoT response takes maybe 1-2 seconds, and the confidence check is essentially free since it operates on the existing response. You only pay the ToT latency tax on the subset of queries that actually need it, which in practice is around 10-20% of requests.
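
A minimal sketch of the routing check described in this answer. The hedging-phrase list, the `min_avg_logprob` threshold, and the function name are illustrative assumptions. The optional `avg_logprob` is assumed to be computed by the caller from token logprobs returned with the CoT response.

```python
import re
from typing import Optional

# Hypothetical hedging phrases; tune this list on your own traffic.
HEDGES = ("i think", "it might be", "i'm not sure", "possibly", "probably")


def needs_escalation(
    cot_answer: str,
    avg_logprob: Optional[float] = None,
    min_avg_logprob: float = -1.0,
) -> bool:
    """Return True when the CoT answer looks uncertain and should go to ToT."""
    text = cot_answer.lower()
    # Whole-phrase match so "possibly" does not fire inside another word.
    hedged = any(re.search(rf"\b{re.escape(h)}\b", text) for h in HEDGES)
    low_logprob = avg_logprob is not None and avg_logprob < min_avg_logprob
    return hedged or low_logprob


print(needs_escalation("The average speed is 72 mph."))
print(needs_escalation("I think the average speed is 72 mph."))
```

Both signals are computed from the response you already have, so the check adds no extra model call. Only answers that return `True` pay for the ToT pass.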
Strong Answer:
  • The biggest failure mode is when the model is confidently wrong in a consistent way. If 4 out of 5 samples all arrive at the same incorrect answer through similar flawed reasoning, majority voting amplifies the error instead of catching it. This happens most often with questions that trigger a common misconception the model learned during training.
  • A second failure mode is answer extraction brittleness. Self-Consistency depends on being able to compare final answers across samples, but if the model phrases the same answer differently each time (“42”, “forty-two”, “the answer is 42 units”), your extraction logic might treat them as different answers and report artificially low confidence.
  • Temperature selection matters more than most people realize. Too low (under 0.3) and all samples follow nearly identical reasoning paths, defeating the purpose. Too high (above 1.0) and you get incoherent samples that pollute the vote. The sweet spot is usually 0.5 to 0.8 depending on the task.
  • To mitigate these issues, I normalize extracted answers before comparison (strip units, convert to canonical forms), use semantic similarity rather than exact match when answers are free-text, and combine Self-Consistency with a verification step where I ask the model to critique the majority answer. If the critique identifies a flaw, I fall back to the second-most-common answer or flag for human review.
Follow-up: At what sample count does Self-Consistency hit diminishing returns, and how do you balance cost versus reliability?
In practice, 5 to 7 samples capture most of the benefit. Research from Google and others shows that going from 1 to 5 samples gives you the biggest accuracy jump, 5 to 10 gives a modest improvement, and beyond 10 you are paying significantly more for marginal gains. For production, I typically use 5 samples for standard queries and increase to 9 or 11 for high-stakes outputs like financial calculations or medical information. An odd count avoids ties when the vote splits between two answers; with three or more distinct answers a tie is still possible, so keep a tie-break rule. Cost-wise, as long as gpt-4o-mini is more than 5x cheaper per token than gpt-4o, Self-Consistency at 5 samples with gpt-4o-mini is still cheaper than a single gpt-4o call, so you can often get better reliability at lower cost by sampling a cheaper model multiple times.
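
The answer-normalization step mentioned above can be sketched as follows. This is one possible approach under stated assumptions: numeric answers are reduced to the first number found in the text, spelled-out numbers such as “forty-two” are not handled, and `normalize_answer` and `majority_vote` are hypothetical names.

```python
import re
from collections import Counter
from typing import Optional


def normalize_answer(raw: str) -> str:
    """Map surface variants of an answer to one canonical string."""
    text = raw.strip().lower().replace(",", "")
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match:
        number = float(match.group())
        # Render 42.0 as "42" so integer and float spellings vote together.
        return str(int(number)) if number.is_integer() else str(number)
    # Free-text fallback: collapse whitespace and drop trailing punctuation.
    return re.sub(r"\s+", " ", text).rstrip(".!")


def majority_vote(raw_answers: list[str]) -> tuple[Optional[str], float]:
    """Return (winning answer, share of votes) after normalization."""
    votes = Counter(normalize_answer(a) for a in raw_answers if a and a.strip())
    if not votes:
        return None, 0.0
    answer, count = votes.most_common(1)[0]
    return answer, count / sum(votes.values())


samples = ["42", "42.0", "The answer is 42 units", "41"]
print(majority_vote(samples))
```

Without normalization these four samples would count as four different answers. With it, three of them agree. Taking the first number is a blunt rule and will misfire on answers that mention several numbers, so treat it as a starting point.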
Strong Answer:
  • Infinite loops in ReAct agents typically happen for one of three reasons. First, the observation from a tool is ambiguous or unhelpful, so the agent keeps retrying the same tool hoping for a better result. Second, the agent oscillates between two tools, each of which provides partial information that triggers calling the other. Third, the agent’s reasoning gets stuck in a circular pattern where it keeps re-deriving the same thought without making progress toward the answer.
  • The diagnostic approach starts with logging. You need to record every Thought, Action, and Observation in the trajectory. Then look for patterns: repeated identical tool calls, oscillation between two actions, or thoughts that repeat nearly verbatim. In the code from this chapter, the trajectory list captures this, but in production you want structured logging with deduplication detection.
  • To fix it, I implement multiple safeguards. A hard cap on iterations (the max_steps parameter) is the safety net, but you also want a “stale detection” mechanism that compares the current thought to the last N thoughts using simple string similarity. If similarity exceeds 80%, inject a meta-prompt like “You seem to be repeating yourself. Summarize what you know so far and either provide a final answer or try a completely different approach.”
  • Another effective technique is tool call deduplication: if the agent tries to call the same tool with the same arguments twice in a row, intercept it and return the cached result along with a nudge like “You already searched for this. Here is what you found. Use this information to proceed.”
  • At a system design level, I also add a “give up gracefully” escape hatch. If the agent hits 70% of max_steps without converging, force it to synthesize a partial answer from what it has gathered so far, rather than continuing to flail.
Follow-up: How would you monitor ReAct agent behavior in production to catch degradation before users notice?
I track three key metrics: average steps-to-completion (a rising trend means the agent is struggling more), the percentage of runs hitting the max_steps cap (your “failure rate”), and the distribution of tool calls per run. Set alerts when the median steps-to-completion increases by more than 20% over a rolling window, or when the max_steps hit rate exceeds 5%. You also want to sample and review trajectories from the tail end of the steps distribution, because those are the runs closest to failure. A weekly review of the top 10 longest trajectories often reveals systematic prompt or tool issues before they become widespread.
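
The stale-detection and tool-call deduplication safeguards from this answer can be sketched with the standard library alone. The names `is_stale` and `ToolCallCache` are hypothetical, `difflib.SequenceMatcher` stands in for “simple string similarity”, and the cache intercepts any repeated call within a run, not only back-to-back repeats.

```python
from difflib import SequenceMatcher


def is_stale(current: str, history: list[str], window: int = 3,
             threshold: float = 0.8) -> bool:
    """True if `current` is near-identical to any of the last `window` thoughts."""
    return any(
        SequenceMatcher(None, current, past).ratio() > threshold
        for past in history[-window:]
    )


class ToolCallCache:
    """Remember tool results so an identical repeat call can be intercepted."""

    def __init__(self):
        self._results: dict[tuple[str, tuple], str] = {}

    def call(self, tools: dict, name: str, args: tuple) -> tuple[str, bool]:
        """Return (result, was_repeat); the tool runs only on the first call."""
        key = (name, args)
        if key in self._results:
            return self._results[key], True
        result = str(tools[name](*args))
        self._results[key] = result
        return result, False


history = ["I should search for the Earth's radius."]
print(is_stale("I should search for the Earth's radius again.", history))
print(is_stale("Compute 2 * pi * r.", history))
```

Inside the agent loop, a `True` from `is_stale` would trigger the “you seem to be repeating yourself” meta-prompt, and a `was_repeat` of `True` would prepend the “you already searched for this” nudge to the observation.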
Strong Answer:
  • The way I think about this is in terms of task decomposability and error propagation. If a task can be cleanly split into independent subtasks where each step has a well-defined input and output, chaining wins. If the subtasks are deeply interdependent and the model needs to hold the full context simultaneously to make good decisions, a single prompt is better.
  • Chaining has several concrete advantages. Each step can use a different model, so you can use gpt-4o-mini for classification and extraction steps but gpt-4o for the final synthesis. Each step can be independently tested, cached, and retried. You can add validation between steps to catch errors early. And each prompt is shorter, which means fewer hallucinations since the model has less context to get confused by.
  • The downside of chaining is error compounding. If step 1 has 90% accuracy and step 2 has 90% accuracy, your end-to-end accuracy is 81%. With 5 steps at 90% each, you are down to 59%. So you need each step to be highly reliable, or you need error correction mechanisms between steps.
  • A single long prompt avoids the error compounding problem and lets the model see all context at once, but it hits practical limits. As a rough rule of thumb, once the instructions run past about 4,000 tokens, models start ignoring parts of the prompt; the exact point varies by model and task. You also lose the ability to use different models or add intermediate validation.
  • My decision framework: if the task has more than 3 clearly separable stages, chain. If the task requires tight integration between steps (like writing code where the function signature in step 1 affects the implementation in step 3), use a single prompt with clear section markers. For the middle ground, I use a hybrid: chain the major stages but keep tightly coupled substeps within a single prompt.
Follow-up: How do you handle the case where a later step in a chain reveals that an earlier step made an error?
This is where the Corrective RAG pattern from the advanced RAG chapter is relevant even outside retrieval. I implement a lightweight “review and revise” step at the end of the chain that takes the final output and the outputs of all intermediate steps, then checks for consistency. If it finds a contradiction (say, the outline mentions 5 sections but the draft only covers 4), it identifies which step went wrong and reruns just that step with additional guidance. This is cheaper than rerunning the entire chain and catches most cross-step errors. For truly critical pipelines, I add assertion checks between steps, like verifying that the output of the extraction step actually contains the required fields before passing it to analysis.
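
The error-compounding arithmetic from the chaining answer above (90% per step giving 81% over two steps and 59% over five) can be checked directly. The sketch assumes step errors are independent, which real chains only approximate; both function names are hypothetical.

```python
import math


def end_to_end_accuracy(step_accuracies: list[float]) -> float:
    """Probability that every step succeeds, assuming independent errors."""
    return math.prod(step_accuracies)


def required_step_accuracy(target: float, num_steps: int) -> float:
    """Per-step accuracy needed to reach `target` with equal, independent steps."""
    return target ** (1 / num_steps)


for n in (1, 2, 5):
    print(n, round(end_to_end_accuracy([0.9] * n), 2))

print(round(required_step_accuracy(0.9, 5), 3))
```

Read in reverse, the second function says a five-step chain needs roughly 98% accuracy per step to deliver 90% end to end, which is why validation between steps matters more as chains get longer.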