What is LangGraph?
LangGraph is a framework for building stateful, multi-step agent workflows. It models agents as graphs where:
- Nodes = Processing steps (LLM calls, tool use, logic)
- Edges = Transitions between steps
- State = Data passed between nodes
Why LangGraph? Simple agent loops break down with complex logic. LangGraph gives you explicit control over agent flow, branching, and state management.
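To make the three terms concrete, here is a minimal sketch in plain Python. It does not import LangGraph; the `NODES` and `EDGES` dictionaries, the `END` sentinel, and the `run` helper are hypothetical stand-ins for what a graph framework manages for you, and the node bodies are placeholders for real LLM or tool calls.

```python
from typing import Callable, Dict, TypedDict


class State(TypedDict):
    question: str
    answer: str
    steps: int


END = "__end__"  # hypothetical sentinel marking the terminal transition


def draft(state: State) -> dict:
    # Node: one processing step. A real node would call an LLM here.
    return {"answer": f"Draft answer to: {state['question']}", "steps": state["steps"] + 1}


def polish(state: State) -> dict:
    # Node: reads what the previous node wrote and returns only its own update.
    return {"answer": state["answer"].upper(), "steps": state["steps"] + 1}


# Nodes are named functions; edges map each node to the next one.
NODES: Dict[str, Callable[[State], dict]] = {"draft": draft, "polish": polish}
EDGES: Dict[str, str] = {"draft": "polish", "polish": END}


def run(state: State, entry: str = "draft") -> State:
    current = entry
    while current != END:
        update = NODES[current](state)   # run the node
        state = {**state, **update}      # merge its partial update into the state
        current = EDGES[current]         # follow the edge
    return state


result = run({"question": "what is a graph?", "answer": "", "steps": 0})
```

Each node returns a partial update rather than mutating shared data, which is the property that keeps state changes traceable as the graph grows.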
Core Concepts
Installation
Basic Agent Graph
Multi-Step Workflow
Human-in-the-Loop
Parallel Execution
Subgraphs
Visualization
Common Patterns
Agent Executor
LLM decides actions, tools execute, loop until done
Plan-Execute
Create plan first, then execute each step
Reflection
Execute, evaluate, improve, repeat
Multi-Agent
Multiple specialized agents in a workflow
Best Practices
Keep State Minimal
Only store what’s needed between nodes. Large state slows execution and inflates every checkpoint.
Use Checkpointing
Enable persistence for long-running workflows and human-in-the-loop.
Handle Errors in Nodes
Each node should handle its own errors gracefully.
Test Individual Nodes
Unit test nodes before assembling the graph.
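The last two practices can be sketched together. The example below is plain Python with no LangGraph import. The `make_search_node` factory, the `route_after_search` router, and the fake search functions are hypothetical names chosen for illustration. The node catches its own exception and records it in state, and because the search dependency is injected, the node can be exercised in isolation before it is wired into a graph.

```python
from typing import Callable, Optional, TypedDict


class State(TypedDict, total=False):
    query: str
    result: Optional[str]
    error: Optional[str]


def make_search_node(search: Callable[[str], str]) -> Callable[[State], dict]:
    """Build a node around an injected search function so tests can pass a fake."""

    def search_node(state: State) -> dict:
        try:
            return {"result": search(state["query"]), "error": None}
        except Exception as exc:
            # Degrade gracefully: record the failure in state instead of crashing the graph.
            return {"result": None, "error": f"{type(exc).__name__}: {exc}"}

    return search_node


def route_after_search(state: State) -> str:
    # A router placed after the node can send failures down a recovery path.
    return "recover" if state.get("error") else "continue"


def fake_ok(query: str) -> str:
    return f"results for {query}"


def fake_timeout(query: str) -> str:
    raise TimeoutError("upstream timed out")


# Unit-test style usage: call the node directly with a hand-built state.
ok_update = make_search_node(fake_ok)({"query": "langgraph"})
failed_update = make_search_node(fake_timeout)({"query": "langgraph"})
```

Catching a broad `Exception` is a simplification here; a production node would likely catch the specific errors its tool can raise.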
Next Steps
MCP Protocol
Learn the Model Context Protocol for tool integration
Interview Deep-Dive
Why would you use LangGraph instead of a simple while-loop agent? What problem does the graph abstraction solve?
Strong Answer:
- A simple while-loop agent works well for straightforward tool-calling patterns: loop until the LLM says “done.” But production agent workflows quickly outgrow this. The moment you need conditional branching (if the user wants a refund, go to the refund flow; if they have a product question, go to the knowledge base flow), parallel execution (search the web and query the database simultaneously), human-in-the-loop approval gates, or stateful multi-step workflows with persistence, the while loop becomes spaghetti code with nested ifs and global state mutations.
- LangGraph solves this by modeling the workflow as a directed graph where nodes are processing steps and edges are transitions. The graph gives you three things you cannot easily get from a loop. First, explicit control flow: conditional edges make branching logic visible and testable. I can look at the graph definition and immediately understand every possible path the agent can take, which is impossible with deeply nested conditional loops. Second, state management: the TypedDict state schema is passed between nodes, and each node only modifies its slice of the state. This prevents the “one node accidentally overwrites another node’s data” bugs that plague global-state agent loops. Third, persistence via checkpointing: the graph can be paused at any node, serialized to a database, and resumed later. This is essential for human-in-the-loop workflows where the agent pauses for approval and the user might not respond for hours.
- In practice, I use a simple loop for agents with 1-2 tools and no branching. I reach for LangGraph when the workflow has 3+ distinct phases, needs any form of human approval, requires parallel execution, or needs to survive process restarts.
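The point about visible, testable branching can be illustrated with a routing function of the kind a conditional edge would call. This is a sketch in plain Python; the state fields and the route names (`refund_flow`, `human_approval`, `knowledge_base`, `fallback`) are hypothetical, and in LangGraph a function like this would typically be registered through `add_conditional_edges`.

```python
from typing import TypedDict


class SupportState(TypedDict):
    intent: str      # assumed to be set by an earlier classification node
    approved: bool   # assumed to be set by a human approval step


def route_request(state: SupportState) -> str:
    """Return the name of the next node; every possible path is listed here."""
    if state["intent"] == "refund":
        # Refunds pass through an approval gate before the refund flow runs.
        return "refund_flow" if state["approved"] else "human_approval"
    if state["intent"] == "product_question":
        return "knowledge_base"
    return "fallback"


# The full set of destinations can be checked against the graph definition.
ALL_ROUTES = {"refund_flow", "human_approval", "knowledge_base", "fallback"}
```

Because the router is a pure function of state, each branch can be asserted in a unit test without running an LLM.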
interrupt_before to pause execution before a sensitive node (like approving a payment), persist the state, and resume when the human approves. Without checkpointing, the entire agent context would be lost between the request and the approval. Second, crash recovery: if the server restarts mid-workflow, the agent resumes from the last checkpoint instead of restarting from scratch. For a workflow that has already made 5 API calls and is on step 6, this avoids redundant work and cost. Third, debugging and audit trails: the checkpoint history is a complete execution trace showing every state transition, which is invaluable for debugging why an agent made a particular decision three steps ago.
Design a multi-agent system using LangGraph where a research agent and a writing agent collaborate to produce a report. How do you handle the handoff between agents?
Strong Answer:
- I would model this as a graph with four nodes: a planner, a research agent, a writer agent, and a reviewer. The state schema carries the shared context: the original task, the research plan, collected sources, drafted sections, and review feedback.
- The planner node takes the user’s request and produces a structured research plan: a list of topics to investigate and an outline for the final report. The research agent node iterates through the plan, using tools (web search, database queries, document retrieval) to gather information for each topic. It writes its findings into the state as structured research notes with source citations. The writer node takes the research notes and the outline and drafts each section of the report. The reviewer node evaluates the draft against the original request and the research notes, checking for accuracy, completeness, and coherence.
- The handoff design is the critical part. I do not pass raw state between agents; I use a structured interface. The research agent writes to a research_notes field in state with a specific schema (topic, findings, sources, confidence). The writer reads from this field, not from the research agent’s internal tool call history. This decoupling means I can swap out the research agent implementation (replace web search with a domain-specific database) without changing the writer at all.
- For the review loop, I add a conditional edge from reviewer back to either writer or research: if the reviewer finds factual gaps, it routes back to research with specific questions. If it finds writing quality issues, it routes back to writer with feedback. I cap this loop at 2 iterations to prevent infinite revision cycles.
- I implement each agent as a subgraph so they can be developed and tested independently. The parent graph orchestrates the handoff and manages the shared state.
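The handoff schema and the capped review loop described above might look like the following sketch. It is plain Python under stated assumptions: the `review_verdict` values, the `revision_count` field, and the route names are hypothetical, and only the four `research_notes` fields come from the answer itself.

```python
from typing import List, TypedDict


class ResearchNote(TypedDict):
    topic: str
    findings: str
    sources: List[str]
    confidence: float


class ReportState(TypedDict):
    task: str
    research_notes: List[ResearchNote]  # the only channel from research to writer
    draft: str
    review_verdict: str   # hypothetical values: "approved", "factual_gap", "writing_issue"
    revision_count: int   # incremented by the reviewer node on each pass


MAX_REVISIONS = 2  # cap on review loops, as in the answer above


def route_after_review(state: ReportState) -> str:
    """Decide where the reviewer hands off next."""
    if state["review_verdict"] == "approved":
        return "end"
    if state["revision_count"] >= MAX_REVISIONS:
        return "end"  # stop revising and ship the current draft
    if state["review_verdict"] == "factual_gap":
        return "research"
    return "writer"


base: ReportState = {
    "task": "Summarize the topic",
    "research_notes": [
        {"topic": "background", "findings": "placeholder", "sources": [], "confidence": 0.5}
    ],
    "draft": "placeholder draft",
    "review_verdict": "factual_gap",
    "revision_count": 0,
}
```

Keeping the loop counter in state means the cap survives a pause and resume, since it is checkpointed along with everything else.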
What are the failure modes of LangGraph agents in production, and how do you build resilience against them?
Strong Answer:
- I have seen five main failure modes in production LangGraph deployments.
- First, infinite loops. The agent gets stuck cycling between two nodes because the conditional edge never resolves to a terminal state. For example, the agent calls a tool, gets an unhelpful result, calls it again with a slightly different query, gets another unhelpful result, and repeats forever. The fix is a hard iteration limit on every cycle in the graph. I set recursion_limit in the config passed when invoking the compiled graph (the default is 25 in LangGraph) and add per-cycle counters in the state that conditional edges check.
- Second, state bloat. Each tool call result gets appended to the messages list in state, and after 20+ tool calls the state can exceed the LLM’s context window. The fix is implementing a state summarization node that compresses old messages into a summary when the state exceeds a token threshold. I keep the last 5 messages verbatim and summarize everything older.
- Third, node failures. An external API call in a tool node times out or returns an error. Each node should have its own try-except with graceful degradation: write an error message to state rather than crashing the entire graph. The conditional edge after the node should check for errors and route to a recovery path (retry with different parameters, skip to next step, or exit gracefully with a partial result).
- Fourth, checkpoint corruption. If the serialized state becomes invalid (schema change between deployments, or a non-serializable object in state), resuming from a checkpoint fails. I version my state schema and include a migration path. When loading a checkpoint, I check the schema version and migrate if needed.
- Fifth, LLM decision quality degradation as state grows. The more context the LLM processes, the more likely it is to make poor routing or tool selection decisions. I keep the decision-relevant context small and focused by summarizing irrelevant prior steps and only surfacing the information the current node needs.
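Two of the mitigations above, message compaction and checkpoint schema migration, can be sketched in plain Python. Several details are assumptions made for illustration: the budget is measured in characters rather than tokens, the `summarize` callable stands in for an LLM summarization call, and the `schema_version` and `errors` fields are hypothetical.

```python
from typing import Callable, List

KEEP_LAST = 5  # keep the most recent messages verbatim, as described above


def compact_messages(
    messages: List[str],
    max_chars: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace all but the last KEEP_LAST messages with one summary once over budget."""
    over_budget = sum(len(m) for m in messages) > max_chars
    if not over_budget or len(messages) <= KEEP_LAST:
        return messages
    old, recent = messages[:-KEEP_LAST], messages[-KEEP_LAST:]
    return [summarize(old)] + recent


CURRENT_VERSION = 2


def migrate_checkpoint(state: dict) -> dict:
    """Upgrade a loaded checkpoint to the current schema before resuming."""
    version = state.get("schema_version", 1)  # assume unversioned checkpoints are v1
    if version > CURRENT_VERSION:
        raise ValueError(f"checkpoint schema {version} is newer than {CURRENT_VERSION}")
    migrated = dict(state)
    if version < 2:
        # Hypothetical v1 -> v2 change: an 'errors' list was added to the state.
        migrated.setdefault("errors", [])
    migrated["schema_version"] = CURRENT_VERSION
    return migrated


history = [f"msg {i}" for i in range(8)]
compacted = compact_messages(history, max_chars=20, summarize=lambda old: f"summary of {len(old)} messages")
upgraded = migrate_checkpoint({"messages": []})
```

In a real graph the compaction step would run as its own node, triggered by a conditional edge that checks the size of the message list.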