December 2025 Update: Covers LangChain 0.3+ with LCEL (LangChain Expression Language), async support, and production patterns.

Introduction

In recent years, language models have become far more capable, allowing us to tackle complex tasks and extract information from large documents. However, these models have a limit on the amount of context they can consider at once, which becomes a problem when working with large volumes of information. LLM chains emerged to overcome this challenge: they simplify the process of chaining multiple LLM calls together, making it easier to handle large amounts of data. LLM chains combine different language model components to process information and generate responses in a unified way. In this article, we will discuss the main components and conventions in LangChain.

What is LangChain?

LangChain provides AI developers with tools to connect language models to external data sources. LLMs are large deep-learning models pre-trained on vast amounts of data that can respond to user queries, for example by answering questions or generating text from prompts. LangChain is a framework for building applications powered by LLMs. It provides:
  • Chains: Composable sequences of LLM calls
  • Prompts: Template management and optimization
  • Memory: Conversation and context management
  • Tools: Integration with external APIs and functions
  • Agents: Autonomous decision-making workflows
Why LangChain? While you can build AI apps with raw APIs, LangChain provides abstractions that make production systems easier to build, test, and maintain.

Installation

pip install langchain langchain-openai langchain-core langchain-community

Core Concepts

1. LCEL (LangChain Expression Language)

LCEL is LangChain’s declarative way to compose chains:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define components
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("Translate to {language}: {text}")
output_parser = StrOutputParser()

# Compose chain
chain = prompt | llm | output_parser

# Invoke
result = chain.invoke({"language": "French", "text": "Hello, world!"})
print(result)  # "Bonjour, le monde!"
Key Benefits:
  • Declarative syntax with pipe operator (|)
  • Automatic async support
  • Built-in streaming
  • Easy debugging and observability
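To illustrate the async and streaming points above, the same chain can be awaited or streamed with no extra wiring (a minimal sketch reusing the translation chain defined above):
import asyncio

async def demo():
    # Every LCEL chain exposes ainvoke/astream automatically
    translated = await chain.ainvoke({"language": "German", "text": "Good morning"})
    print(translated)

    # astream yields the parsed output incrementally
    async for chunk in chain.astream({"language": "Spanish", "text": "Good night"}):
        print(chunk, end="", flush=True)

asyncio.run(demo())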

2. Prompts

Prompt Templates:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# System + User prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer in {style}."),
    ("user", "{question}")
])

# With conversation history
conversation_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("user", "{input}")
])
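To sanity-check what a template renders, you can invoke it directly and inspect the resulting messages (using the prompt defined above):
messages = prompt.invoke({"style": "one short sentence", "question": "What is LCEL?"})
print(messages.to_messages())
# [SystemMessage(content='You are a helpful assistant. Answer in one short sentence.'),
#  HumanMessage(content='What is LCEL?')]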
Few-Shot Examples:
from langchain_core.prompts import FewShotChatMessagePromptTemplate

examples = [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
]

example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant that gives antonyms."),
    few_shot_prompt,
    ("human", "{input}"),
])
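The few-shot prompt then slots into a chain like any other template; with the examples above, the model should continue the antonym pattern (a small usage sketch):
antonym_chain = final_prompt | llm | StrOutputParser()
antonym_chain.invoke({"input": "fast"})  # expected: something like "slow"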
Prompt Management:
from langchain_core.prompts import PromptTemplate, load_prompt

# Saving is supported for plain prompt templates (chat templates may not support it)
translation_prompt = PromptTemplate.from_template("Translate to {language}: {text}")
translation_prompt.save("prompts/translation.yaml")

# Load from file
loaded_prompt = load_prompt("prompts/translation.yaml")

3. Chains

Simple Chain:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("What is {topic}?")

chain = prompt | llm

response = chain.invoke({"topic": "quantum computing"})
Sequential Chain:
from langchain_core.output_parsers import StrOutputParser

# Chain 1: Generate question
question_chain = (
    ChatPromptTemplate.from_template("Generate a question about {topic}")
    | llm
    | StrOutputParser()
)

# Chain 2: Answer question
answer_chain = (
    ChatPromptTemplate.from_template("Answer this question: {question}")
    | llm
    | StrOutputParser()
)

# Combine
def qa_chain(topic: str):
    question = question_chain.invoke({"topic": topic})
    answer = answer_chain.invoke({"question": question})
    return {"question": question, "answer": answer}
RunnableParallel (Parallel Execution):
from langchain_core.runnables import RunnableParallel

parallel_chain = RunnableParallel({
    "summary": ChatPromptTemplate.from_template("Summarize: {text}") | llm,
    "sentiment": ChatPromptTemplate.from_template("Sentiment of: {text}") | llm,
    "keywords": ChatPromptTemplate.from_template("Extract keywords from: {text}") | llm,
})

results = parallel_chain.invoke({"text": "LangChain is a framework for LLM applications."})
Conditional Chains:
from langchain_core.runnables import RunnableLambda

# technical_chain and general_chain are chains assumed to be defined elsewhere
def route_chain(input_data: dict):
    if input_data["type"] == "technical":
        return technical_chain
    else:
        return general_chain

# A RunnableLambda that returns a Runnable will invoke it with the same input
routed_chain = RunnableLambda(route_chain)
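A declarative alternative is RunnableBranch, which runs the first branch whose condition matches and falls back to a default (a sketch reusing the same placeholder chains):
from langchain_core.runnables import RunnableBranch

branched_chain = RunnableBranch(
    (lambda x: x["type"] == "technical", technical_chain),
    general_chain,  # default branch when no condition matches
)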

4. Memory

Conversation Buffer:
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain

memory = ConversationBufferMemory()
chain = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

chain.predict(input="Hi, I'm Alice")
chain.predict(input="What's my name?")  # Remembers "Alice"
Conversation Summary Memory:
from langchain.memory import ConversationSummaryMemory

summary_memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=summary_memory)

# Long conversation gets summarized automatically
Conversation Buffer Window:
from langchain.memory import ConversationBufferWindowMemory

window_memory = ConversationBufferWindowMemory(k=5)  # Last 5 exchanges
Vector Store Memory:
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts([""], embeddings)

vector_memory = VectorStoreRetrieverMemory(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3})
)
Custom Memory:
from langchain_core.memory import BaseMemory
from typing import Any, Dict, List

class CustomMemory(BaseMemory):
    # BaseMemory is a Pydantic model, so state is declared as a field
    memories: List[Dict] = []

    @property
    def memory_variables(self) -> List[str]:
        return ["history"]

    def save_context(self, inputs: Dict[str, Any], outputs: Dict[str, str]) -> None:
        self.memories.append({"inputs": inputs, "outputs": outputs})

    def load_memory_variables(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        return {"history": self.memories}

    def clear(self) -> None:
        self.memories = []
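Note that the memory classes above come from the older chain interface; in LangChain 0.3 the LCEL-native way to add conversation memory is RunnableWithMessageHistory, which wraps any chain and stores messages per session (a minimal sketch reusing conversation_prompt from the Prompts section):
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

session_store = {}

def get_session_history(session_id: str):
    # One in-memory history object per session id
    if session_id not in session_store:
        session_store[session_id] = InMemoryChatMessageHistory()
    return session_store[session_id]

chat_chain = RunnableWithMessageHistory(
    conversation_prompt | llm,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

chat_chain.invoke(
    {"input": "Hi, I'm Alice"},
    config={"configurable": {"session_id": "demo"}},
)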

5. Tools

Creating Tools:
from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    # Implementation here
    return f"Results for {query}"

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        # NOTE: eval is unsafe for untrusted input; sandbox or replace it in production
        result = eval(expression)
        return str(result)
    except Exception:
        return "Invalid expression"

tools = [search_web, calculator]
Using Tools with LLM:
from langchain_core.messages import HumanMessage

llm_with_tools = llm.bind_tools(tools)

response = llm_with_tools.invoke([
    HumanMessage(content="What's 15 * 23? Then search for Python tutorials.")
])

# Check for tool calls
if response.tool_calls:
    for tool_call in response.tool_calls:
        print(f"Tool: {tool_call['name']}")
        print(f"Args: {tool_call['args']}")
Structured Tools:
from langchain_core.tools import StructuredTool
from pydantic import BaseModel, Field

class SearchInput(BaseModel):
    query: str = Field(description="Search query")
    max_results: int = Field(default=5, description="Maximum results")

def search_function(query: str, max_results: int = 5) -> str:
    # Implementation
    return f"Found {max_results} results for {query}"

structured_tool = StructuredTool.from_function(
    func=search_function,
    args_schema=SearchInput,
    name="web_search",
    description="Search the web for information"
)
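A structured tool is itself a Runnable, so it can be tested directly before handing it to a model:
structured_tool.invoke({"query": "LangChain tutorials", "max_results": 3})
# "Found 3 results for LangChain tutorials"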

6. RAG with LangChain

Complete RAG Chain:
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_core.runnables import RunnablePassthrough

# Setup vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_texts(
    texts=["LangChain is a framework for building LLM applications..."],
    embedding=embeddings
)

retriever = vectorstore.as_retriever()

# RAG chain
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context:

Context: {context}

Question: {question}
""")

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough()
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)

answer = rag_chain.invoke("What is LangChain?")
RAG with Sources:
from langchain_core.runnables import RunnableLambda

def format_docs_with_sources(docs):
    formatted = []
    for i, doc in enumerate(docs):
        formatted.append(f"[Source {i+1}]: {doc.page_content}")
    return "\n\n".join(formatted)

rag_with_sources = (
    {
        "context": retriever | format_docs_with_sources,
        "question": RunnablePassthrough()
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)
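The chain above only labels sources inside the prompt context; to actually return the retrieved documents alongside the answer, a common pattern is to keep them in a parallel branch and assign the answer as an extra key (a sketch built from the pieces above):
from langchain_core.runnables import RunnableParallel

answer_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | rag_prompt
    | llm
    | StrOutputParser()
)

rag_chain_with_docs = RunnableParallel(
    {"context": retriever, "question": RunnablePassthrough()}
).assign(answer=answer_from_docs)

result = rag_chain_with_docs.invoke("What is LangChain?")
# result["answer"] -> generated text, result["context"] -> list of source Documents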

7. Streaming

Streaming Responses:
from langchain_core.callbacks import StreamingStdOutCallbackHandler

llm = ChatOpenAI(
    model="gpt-4o",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

# Rebuild the chain so it uses the streaming-enabled model
chain = ChatPromptTemplate.from_template("What is {topic}?") | llm
chain.invoke({"topic": "AI"})  # tokens are printed to stdout as they arrive
Custom Streaming Handler:
from langchain_core.callbacks import BaseCallbackHandler

class CustomStreamHandler(BaseCallbackHandler):
    def __init__(self, on_token):
        self.on_token = on_token
        self.tokens = []
    
    def on_llm_new_token(self, token: str, **kwargs):
        self.tokens.append(token)
        self.on_token(token)
    
    def get_full_response(self) -> str:
        return "".join(self.tokens)

# Usage
tokens_received = []
handler = CustomStreamHandler(lambda t: tokens_received.append(t))

llm = ChatOpenAI(model="gpt-4o", streaming=True, callbacks=[handler])
response = llm.invoke([HumanMessage(content="Explain streaming")])
Async Streaming:
async def stream_chain():
    async for chunk in chain.astream({"topic": "AI"}):
        print(chunk.content, end="", flush=True)

import asyncio
asyncio.run(stream_chain())

8. Observability with LangSmith

import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-project"

# All chains automatically traced
chain.invoke({"input": "test"})
# View at smith.langchain.com
Custom Tracing:
from langchain_core.tracers import LangChainTracer

tracer = LangChainTracer(project_name="my-project")
chain.invoke({"input": "test"}, config={"callbacks": [tracer]})

Production Patterns

Error Handling

from langchain_core.runnables import RunnableLambda

def safe_invoke(chain, input_data):
    try:
        return chain.invoke(input_data)
    except Exception as e:
        return {"error": str(e), "input": input_data}

safe_chain = RunnableLambda(lambda x: safe_invoke(chain, x))
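Fallbacks: besides catching exceptions manually, any Runnable can declare fallbacks that are tried with the same input when the primary chain raises (a sketch reusing the prompt and llm from earlier; the cheaper backup model here is just an illustrative choice):
# gpt-4o-mini as a hypothetical backup model
fallback_llm = ChatOpenAI(model="gpt-4o-mini")

resilient_chain = (prompt | llm | StrOutputParser()).with_fallbacks(
    [prompt | fallback_llm | StrOutputParser()]
)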
Retry Logic:
# In LCEL, retries are added with .with_retry() on any Runnable
retry_chain = chain.with_retry(
    retry_if_exception_type=(Exception,),
    stop_after_attempt=3,
)

Caching

from langchain_core.caches import InMemoryCache
from langchain_core.globals import set_llm_cache

set_llm_cache(InMemoryCache())

# First call - hits API
chain.invoke({"input": "test"})

# Second call - uses cache
chain.invoke({"input": "test"})
Redis Cache:
from langchain_community.cache import RedisCache
import redis

redis_client = redis.Redis()
set_llm_cache(RedisCache(redis_client))

Batch Processing

# Process multiple inputs in parallel
inputs = [{"topic": "AI"}, {"topic": "ML"}, {"topic": "NLP"}]
results = chain.batch(inputs)
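Batch calls run concurrently under the hood; the standard config dict can cap that concurrency when rate limits are a concern:
results = chain.batch(inputs, config={"max_concurrency": 5})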
Async Batch:
import asyncio

async def batch_process():
    inputs = [{"topic": "AI"}, {"topic": "ML"}]
    results = await chain.abatch(inputs)
    return results

asyncio.run(batch_process())

Output Parsing

Structured Output:
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

class Answer(BaseModel):
    answer: str = Field(description="The answer")
    confidence: float = Field(description="Confidence score")

parser = PydanticOutputParser(pydantic_object=Answer)

# Inject the parser's format instructions so the model replies with matching JSON
qa_prompt = ChatPromptTemplate.from_template(
    "Answer the question.\n{format_instructions}\nQuestion: {question}"
).partial(format_instructions=parser.get_format_instructions())

chain = qa_prompt | llm | parser

result = chain.invoke({"question": "What is AI?"})
print(result.answer)
print(result.confidence)
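With chat models that support it (including ChatOpenAI), an alternative in LangChain 0.3 is with_structured_output, which binds the schema to the model and returns the parsed object directly (a minimal sketch):
structured_llm = llm.with_structured_output(Answer)
result = structured_llm.invoke("What is AI? Include a confidence score.")
print(result.answer, result.confidence)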
JSON Output:
from langchain_core.output_parsers import JsonOutputParser

# The prompt should explicitly ask the model to respond in JSON
json_parser = JsonOutputParser()

chain = prompt | llm | json_parser
result = chain.invoke({"input": "Extract key info"})

When to Use LangChain

Use LangChain when:
  • Building complex multi-step workflows
  • Need prompt management and versioning
  • Require memory/conversation management
  • Integrating multiple tools and APIs
  • Want built-in observability (LangSmith)
  • Building production systems that need maintainability
  • Need to compose different LLM providers
Consider raw APIs when:
  • Simple one-off LLM calls
  • Maximum performance is critical
  • Want minimal dependencies
  • Building lightweight prototypes
  • Need fine-grained control over every API call

Limitations

  • Additional abstraction layer adds overhead
  • Learning curve for LCEL syntax
  • Dependency on LangChain ecosystem
  • Can be overkill for simple use cases
  • Version changes can break code

Performance Tips

  • Use async methods (ainvoke, astream) for better concurrency
  • Enable caching for repeated queries
  • Batch process when possible
  • Use streaming for better UX
  • Monitor with LangSmith to identify bottlenecks
  • Cache embeddings and prompts when possible

Key Takeaways

LCEL is Powerful

Use the pipe operator (|) to compose chains declaratively.

Prompts as Templates

Manage prompts separately from code for easier iteration.

Memory Built-in

LangChain provides multiple memory types for conversations.

Production Ready

Built-in observability, caching, and error handling.

What’s Next

LangGraph

Learn how to build complex agent workflows with state machines