Anthropic’s Claude models offer unique capabilities and API patterns. This chapter covers the Claude API in depth, from basic usage to advanced features like extended thinking.

Getting Started with Claude

Installation and Setup

import anthropic

# Initialize the client; it reads the ANTHROPIC_API_KEY environment variable
client = anthropic.Anthropic()

# Or pass the key explicitly (avoid hard-coding real keys in source control)
client = anthropic.Anthropic(api_key="your-api-key")

Basic Message API

Claude uses a messages-based API structure:
from anthropic import Anthropic


def chat_with_claude(
    prompt: str,
    model: str = "claude-sonnet-4-20250514",
    max_tokens: int = 1024
) -> str:
    """Send a message to Claude and get a response."""
    client = Anthropic()
    
    message = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    
    return message.content[0].text


# Usage
response = chat_with_claude("Explain quantum entanglement in simple terms.")
print(response)

Multi-Turn Conversations

from anthropic import Anthropic
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Manage multi-turn conversations with Claude."""
    
    client: Anthropic = field(default_factory=Anthropic)
    model: str = "claude-sonnet-4-20250514"
    system: str = ""
    messages: list = field(default_factory=list)
    max_tokens: int = 1024
    
    def add_user_message(self, content: str):
        """Add a user message to the conversation."""
        self.messages.append({"role": "user", "content": content})
    
    def add_assistant_message(self, content: str):
        """Add an assistant message to the conversation."""
        self.messages.append({"role": "assistant", "content": content})
    
    def send(self, user_message: str) -> str:
        """Send a message and get a response."""
        self.add_user_message(user_message)
        
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            system=self.system,
            messages=self.messages
        )
        
        assistant_message = response.content[0].text
        self.add_assistant_message(assistant_message)
        
        return assistant_message
    
    def reset(self):
        """Clear conversation history."""
        self.messages = []


# Usage
conversation = Conversation(
    system="You are a helpful Python tutor. Explain concepts with examples."
)

response1 = conversation.send("What are decorators?")
print(f"Claude: {response1}\n")

response2 = conversation.send("Can you show a practical example?")
print(f"Claude: {response2}")

System Prompts

System prompts in Claude are powerful for shaping behavior.

Effective System Prompt Patterns

from anthropic import Anthropic


class ClaudeAssistant:
    """Claude assistant with configurable personas."""
    
    PERSONAS = {
        "code_reviewer": """You are an expert code reviewer. Your responsibilities:
- Identify bugs, security issues, and performance problems
- Suggest improvements following best practices
- Explain your reasoning clearly
- Rate code quality on a 1-10 scale
- Be constructive and educational in feedback

Format reviews with sections: Summary, Issues, Suggestions, Rating""",

        "technical_writer": """You are a technical documentation expert. Your style:
- Write clear, concise documentation
- Include practical code examples
- Structure content with headers and lists
- Anticipate common questions
- Define technical terms on first use

Always target an audience of intermediate developers.""",

        "sql_expert": """You are a SQL and database optimization expert. You:
- Write efficient, readable SQL queries
- Explain query plans and optimization strategies
- Follow SQL style best practices
- Consider indexing and performance implications
- Support PostgreSQL, MySQL, and SQLite syntax

Always explain your queries and suggest indexes when relevant.""",
    }
    
    def __init__(self, persona: str = None, custom_system: str = None):
        self.client = Anthropic()
        self.system = custom_system or self.PERSONAS.get(persona, "")
    
    def ask(self, prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
        """Send a prompt with the configured system context."""
        response = self.client.messages.create(
            model=model,
            max_tokens=2048,
            system=self.system,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.content[0].text


# Usage
reviewer = ClaudeAssistant(persona="code_reviewer")

code = """
def get_user(id):
    query = f"SELECT * FROM users WHERE id = {id}"
    return db.execute(query)
"""

review = reviewer.ask(f"Review this code:\n\n```python\n{code}\n```")
print(review)

Dynamic System Prompts

from datetime import datetime
from anthropic import Anthropic


def create_dynamic_system_prompt(
    user_name: str,
    user_role: str,
    context: dict = None
) -> str:
    """Generate a dynamic system prompt based on context."""
    context = context or {}
    
    base_prompt = f"""You are an AI assistant helping {user_name}, who is a {user_role}.

Current date and time: {datetime.now().strftime("%Y-%m-%d %H:%M")}
"""
    
    # Add context-specific instructions
    if context.get("project_type"):
        base_prompt += f"\nProject context: {context['project_type']}"
    
    if context.get("tech_stack"):
        stack = ", ".join(context["tech_stack"])
        base_prompt += f"\nTech stack in use: {stack}"
    
    if context.get("constraints"):
        base_prompt += f"\nConstraints to consider: {context['constraints']}"
    
    base_prompt += """

Adapt your responses to the user's expertise level. Be direct and actionable.
When suggesting code, match the project's existing patterns and stack."""
    
    return base_prompt


# Usage
client = Anthropic()

system = create_dynamic_system_prompt(
    user_name="Alex",
    user_role="Senior Backend Engineer",
    context={
        "project_type": "E-commerce API",
        "tech_stack": ["Python", "FastAPI", "PostgreSQL", "Redis"],
        "constraints": "Must support 10k requests/second"
    }
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=system,
    messages=[{"role": "user", "content": "How should I implement rate limiting?"}]
)

print(response.content[0].text)

Streaming Responses

Stream Claude responses for better UX:
from anthropic import Anthropic


def stream_response(prompt: str, system: str = "") -> str:
    """Stream a response from Claude."""
    client = Anthropic()
    
    full_response = ""
    
    with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text
    
    print()  # Newline at end
    return full_response


# Usage
response = stream_response(
    "Write a haiku about programming",
    system="You are a creative poet."
)

Async Streaming

import asyncio
from anthropic import AsyncAnthropic


async def async_stream_response(prompt: str) -> str:
    """Async streaming from Claude."""
    client = AsyncAnthropic()
    
    full_response = ""
    
    async with client.messages.stream(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    ) as stream:
        async for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text
    
    print()
    return full_response


# Usage
asyncio.run(async_stream_response("Explain async programming."))

Tool Use with Claude

Claude has powerful tool/function calling capabilities:
from anthropic import Anthropic
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Simulated weather API."""
    return {
        "location": location,
        "temperature": 22 if unit == "celsius" else 72,
        "unit": unit,
        "conditions": "sunny"
    }


def search_database(query: str, limit: int = 10) -> list:
    """Simulated database search."""
    return [
        {"id": 1, "title": f"Result for: {query}", "score": 0.95},
        {"id": 2, "title": f"Another result for: {query}", "score": 0.87},
    ]


# Define tools for Claude
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g., 'London, UK'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    },
    {
        "name": "search_database",
        "description": "Search the knowledge database",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                },
                "limit": {
                    "type": "integer",
                    "description": "Maximum results to return"
                }
            },
            "required": ["query"]
        }
    }
]

# Tool execution mapping
tool_functions = {
    "get_weather": get_weather,
    "search_database": search_database
}


def run_with_tools(user_message: str) -> str:
    """Run a conversation with tool use."""
    client = Anthropic()
    messages = [{"role": "user", "content": user_message}]
    
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        
        # Check if Claude wants to use tools
        if response.stop_reason == "tool_use":
            # Process tool calls
            tool_results = []
            
            for block in response.content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input
                    
                    # Execute the tool
                    if tool_name in tool_functions:
                        result = tool_functions[tool_name](**tool_input)
                    else:
                        result = {"error": f"Unknown tool: {tool_name}"}
                    
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })
            
            # Add assistant message and tool results
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        
        else:
            # No more tool calls, return final response
            for block in response.content:
                if hasattr(block, "text"):
                    return block.text
            
            return ""


# Usage
response = run_with_tools("What's the weather in Tokyo and Paris?")
print(response)
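
In the loop above, an exception raised inside a tool function aborts the whole conversation. The following is a minimal sketch of a more forgiving executor, assuming you would rather report failures back to Claude than raise them. `execute_tool` is a hypothetical helper, not part of the SDK; `is_error` is the field a `tool_result` block can carry to flag a failed call.

```python
import json


def execute_tool(tool_use_id: str, name: str, tool_input: dict, registry: dict) -> dict:
    """Run one tool call and wrap the outcome in a tool_result block.

    `registry` maps tool names to callables, like `tool_functions` above.
    """
    result_block = {"type": "tool_result", "tool_use_id": tool_use_id}
    try:
        if name not in registry:
            raise LookupError(f"Unknown tool: {name}")
        result = registry[name](**tool_input)
        result_block["content"] = json.dumps(result)
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop
        result_block["content"] = f"{type(exc).__name__}: {exc}"
        result_block["is_error"] = True
    return result_block
```

Inside `run_with_tools`, the body of the `tool_use` branch could then append `execute_tool(block.id, block.name, block.input, tool_functions)` to `tool_results`, which gives the model a chance to retry with corrected arguments.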

Vision Capabilities

Claude can analyze images:
import base64
import httpx
from anthropic import Anthropic
from pathlib import Path


def encode_image_to_base64(image_path: str) -> str:
    """Encode a local image to base64."""
    with open(image_path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")


def get_image_media_type(image_path: str) -> str:
    """Get media type from file extension."""
    suffix = Path(image_path).suffix.lower()
    media_types = {
        ".jpg": "image/jpeg",
        ".jpeg": "image/jpeg",
        ".png": "image/png",
        ".gif": "image/gif",
        ".webp": "image/webp"
    }
    return media_types.get(suffix, "image/jpeg")


def analyze_image(image_source: str, prompt: str) -> str:
    """Analyze an image with Claude Vision."""
    client = Anthropic()
    
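    # Determine if URL or file path
    if image_source.startswith(("http://", "https://")):
        # Fetch the image, fail loudly on HTTP errors, then base64-encode it
        http_response = httpx.get(image_source, follow_redirects=True)
        http_response.raise_for_status()
        image_data = base64.standard_b64encode(
            http_response.content
        ).decode("utf-8")
        # Prefer the server-reported type; fall back to JPEG if the header is missing
        content_type = http_response.headers.get("content-type", "image/jpeg")
        media_type = content_type.split(";")[0].strip()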
    else:
        # Local file
        image_data = encode_image_to_base64(image_source)
        media_type = get_image_media_type(image_source)
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data
                        }
                    },
                    {
                        "type": "text",
                        "text": prompt
                    }
                ]
            }
        ]
    )
    
    return response.content[0].text


def compare_images(image_paths: list[str], prompt: str) -> str:
    """Compare multiple images with Claude."""
    client = Anthropic()
    
    content = []
    
    for path in image_paths:
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": get_image_media_type(path),
                "data": encode_image_to_base64(path)
            }
        })
    
    content.append({"type": "text", "text": prompt})
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": content}]
    )
    
    return response.content[0].text


# Usage
analysis = analyze_image(
    "screenshot.png",
    "Describe this UI and suggest improvements."
)
print(analysis)

Extended Thinking

Claude’s extended thinking mode lets the model reason through a complex problem before it writes the final answer:
from anthropic import Anthropic


def solve_with_thinking(problem: str, budget_tokens: int = 10000) -> dict:
    """Use Claude's extended thinking for complex problems."""
    client = Anthropic()
    
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=16000,
        thinking={
            "type": "enabled",
            "budget_tokens": budget_tokens
        },
        messages=[{"role": "user", "content": problem}]
    )
    
    result = {
        "thinking": "",
        "answer": ""
    }
    
    for block in response.content:
        if block.type == "thinking":
            result["thinking"] = block.thinking
        elif block.type == "text":
            result["answer"] = block.text
    
    return result


# Usage
problem = """
A farmer has 100 meters of fencing and wants to enclose a rectangular 
area next to a river (no fencing needed on the river side). What 
dimensions maximize the enclosed area?
"""

solution = solve_with_thinking(problem)
print("Thinking process:")
print(solution["thinking"][:500] + "...")
print("\nFinal answer:")
print(solution["answer"])

Token Counting and Cost Management

from anthropic import Anthropic


def count_tokens(text: str, model: str = "claude-sonnet-4-20250514") -> int:
    """Count tokens in text using Anthropic's tokenizer."""
    client = Anthropic()
    
    # Use the count_tokens endpoint
    result = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": text}]
    )
    
    return result.input_tokens


class CostTracker:
    """Track Claude API usage and costs."""
    
    # Illustrative pricing in USD per million tokens; check Anthropic's pricing page for current rates
    PRICING = {
        "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
        "claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
        "claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
        "claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
    }
    
    def __init__(self):
        self.total_input_tokens = 0
        self.total_output_tokens = 0
        self.requests = []
    
    def record_usage(self, model: str, input_tokens: int, output_tokens: int):
        """Record token usage from a request."""
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        
        pricing = self.PRICING.get(model, {"input": 0, "output": 0})
        cost = (
            (input_tokens / 1_000_000) * pricing["input"] +
            (output_tokens / 1_000_000) * pricing["output"]
        )
        
        self.requests.append({
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost": cost
        })
    
    def get_total_cost(self) -> float:
        """Get total cost across all requests."""
        return sum(r["cost"] for r in self.requests)
    
    def get_summary(self) -> dict:
        """Get usage summary."""
        return {
            "total_requests": len(self.requests),
            "total_input_tokens": self.total_input_tokens,
            "total_output_tokens": self.total_output_tokens,
            "total_cost": self.get_total_cost()
        }


# Usage with tracking
tracker = CostTracker()
client = Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)

tracker.record_usage(
    "claude-sonnet-4-20250514",
    response.usage.input_tokens,
    response.usage.output_tokens
)

print(tracker.get_summary())

Error Handling and Retries

import time
from anthropic import Anthropic, APIError, RateLimitError, APIConnectionError


def robust_claude_call(
    messages: list,
    model: str = "claude-sonnet-4-20250514",
    max_retries: int = 3,
    base_delay: float = 1.0
) -> str:
    """Make a robust Claude API call with retries."""
    client = Anthropic()
    
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model=model,
                max_tokens=1024,
                messages=messages
            )
            return response.content[0].text
            
        except RateLimitError as e:
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)
                print(f"Rate limited. Retrying in {delay}s...")
                time.sleep(delay)
            else:
                raise
                
        except APIConnectionError as e:
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)
                print(f"Connection error. Retrying in {delay}s...")
                time.sleep(delay)
            else:
                raise
                
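        except APIError as e:
            # Don't retry on client errors (4xx). Not every APIError subclass
            # carries a status code, so read it defensively.
            status_code = getattr(e, "status_code", None)
            if status_code is not None and 400 <= status_code < 500:
                raise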
            
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)
                print(f"API error. Retrying in {delay}s...")
                time.sleep(delay)
            else:
                raise
    
    raise RuntimeError("Max retries exceeded")


# Usage
response = robust_claude_call([
    {"role": "user", "content": "Hello!"}
])

Claude API Best Practices

  • Use system prompts to establish consistent behavior
  • Stream responses for long outputs to improve UX
  • Implement proper error handling with exponential backoff
  • Track token usage for cost management
  • Use the appropriate model tier for your task complexity

Practice Exercise

Build a Claude-powered assistant with these features:
  1. Multi-turn conversation with memory
  2. Tool use for external data access
  3. Image analysis capabilities
  4. Cost tracking per session
  5. Graceful error handling
Focus on:
  • Effective system prompt design
  • Proper conversation state management
  • Robust error recovery
  • Usage monitoring and limits

Interview Deep-Dive

Strong Answer:
  • The API structures are similar but have important differences in how they handle conversations. OpenAI uses a flat messages array where the system prompt is a message with role “system.” Anthropic separates the system prompt into its own top-level parameter, which is a cleaner abstraction because the system prompt is architecturally different from conversation messages. This separation matters when you are managing conversation state: with Anthropic, you never accidentally trim the system prompt when truncating message history.
  • Tool use (function calling) works differently between the two. OpenAI returns tool calls as part of the assistant message and expects tool results as separate messages with role “tool.” Anthropic uses content blocks: the assistant response contains both text blocks and tool_use blocks, and tool results are sent as tool_result content blocks in the next user message. Anthropic’s block-based approach is more flexible for multi-tool calls but requires slightly different message construction logic.
  • For streaming, both support SSE, but Anthropic’s streaming API provides more structured events (message_start, content_block_start, content_block_delta, etc.) versus OpenAI’s simpler chunk-based stream. Because each event identifies the content block it belongs to, Anthropic’s approach gives you finer control when rendering mixed responses (text plus tool calls) in real time.
  • Extended thinking is a differentiator for Claude. When you need the model to work through complex reasoning before responding, the thinking parameter lets you allocate a token budget specifically for internal reasoning. The thinking content is returned separately from the answer, so you can log it for debugging without exposing it to users. OpenAI does not have a direct equivalent; o1 models have internal reasoning but do not expose it.
  • Cost and rate limits differ significantly by tier. For high-volume applications, Anthropic offers prompt caching for system prompts and frequently reused context, which can reduce the cost of cached input tokens by up to 90% when they are read back from the cache. This is a major advantage when your system prompt is long or when you are using few-shot examples that repeat across requests.
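As a sketch of the prompt caching point above: the Messages API accepts the system prompt as a list of text blocks, and a block can carry a `cache_control` marker. The helper name `build_cached_request` and its defaults are illustrative assumptions. Nothing is sent here; the function only builds the keyword arguments you would pass to `client.messages.create(**kwargs)`.

```python
def build_cached_request(
    system_prompt: str,
    user_message: str,
    model: str = "claude-sonnet-4-20250514",
    max_tokens: int = 1024,
) -> dict:
    """Build request kwargs with the system prompt marked as cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # A list of blocks (instead of a plain string) allows per-block cache_control
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

To confirm caching is taking effect, inspect `response.usage.cache_creation_input_tokens` and `response.usage.cache_read_input_tokens` on successive calls. Short prompts may fall below the minimum cacheable length, so check the current documentation for the model you use.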
Follow-up: How do you architect an application to support multiple LLM providers so you can switch between them?
I define a common interface (an abstract class or protocol) with methods like complete, stream, and count_tokens. Each provider gets a concrete implementation that translates between the common interface and the provider-specific API. The key design decisions are: how to handle provider-specific features (like extended thinking) without leaking abstractions, and how to normalize the response format. I use a common response model with fields like text, tool_calls, usage, and stop_reason, and each provider adapter maps their native response into this format. For features unique to one provider, I use an “extras” dict that callers can optionally access. This lets you switch providers in config without changing application code, and it enables fallback chains where you try the primary provider and fall back to the secondary on errors.
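The interface described in this answer could look roughly like the sketch below. All names (`LLMResponse`, `LLMProvider`, `EchoProvider`, `complete_with_fallback`) are hypothetical, and `EchoProvider` is a stand-in that makes no network calls; a real adapter would wrap the Anthropic or OpenAI client and map its native response into `LLMResponse`.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class LLMResponse:
    """Provider-neutral response model."""
    text: str
    stop_reason: str
    usage: dict = field(default_factory=dict)
    tool_calls: list = field(default_factory=list)
    extras: dict = field(default_factory=dict)  # provider-specific data, e.g. thinking


class LLMProvider(Protocol):
    def complete(self, system: str, messages: list[dict]) -> LLMResponse: ...


class EchoProvider:
    """Illustrative adapter that echoes the last message instead of calling an API."""

    def __init__(self, name: str, fail: bool = False):
        self.name = name
        self.fail = fail

    def complete(self, system: str, messages: list[dict]) -> LLMResponse:
        if self.fail:
            raise ConnectionError(f"{self.name} is unavailable")
        return LLMResponse(
            text=f"[{self.name}] {messages[-1]['content']}",
            stop_reason="end_turn",
        )


def complete_with_fallback(
    providers: list[LLMProvider], system: str, messages: list[dict]
) -> LLMResponse:
    """Try each provider in order and return the first successful response."""
    last_error = None
    for provider in providers:
        try:
            return provider.complete(system, messages)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("All providers failed") from last_error
```

Because callers only see `LLMResponse`, reordering or replacing entries in the `providers` list is a configuration change rather than a code change.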
Strong Answer:
  • Extended thinking is worth the cost for tasks where the reasoning process is complex and the quality of the final answer depends heavily on working through intermediate steps. Math problems, multi-constraint planning, complex code generation, and multi-hop reasoning questions all benefit significantly. For simple factual questions, summaries, or creative writing, extended thinking adds cost and latency without improving quality.
  • The budget_tokens parameter controls how many tokens the model can use for internal reasoning. Setting it too low (under 2000) can cut off the model’s reasoning mid-thought, producing worse results than no thinking at all. Setting it too high (over 20000) wastes tokens on tasks that do not need that much deliberation. My approach is to start with a moderate budget (5000-10000) and measure answer quality on a benchmark set, then adjust.
  • A practical pattern I use is adaptive thinking budgets. For a coding assistant, I classify the request difficulty using a lightweight model call: “Is this a simple syntax question, a moderate implementation task, or a complex architecture problem?” Simple questions get no extended thinking, moderate tasks get 5000 tokens, and complex problems get 15000. This keeps average cost low while giving hard problems the reasoning space they need.
  • One important nuance: thinking tokens are billed as output tokens, and they count toward max_tokens, which is why budget_tokens must be smaller than max_tokens. So if you set budget_tokens=10000 and max_tokens=16000, the model can use up to 10000 tokens for thinking, and whatever it spends on thinking comes out of the same 16000-token ceiling as the response. You are paying for at most 16000 output tokens in total, so size max_tokens to cover both the thinking budget and the longest answer you expect.
  • The thinking content itself is valuable for debugging and transparency. I log it (with PII scrubbing) and use it to diagnose cases where the model gets the wrong answer. Often the thinking reveals exactly where the reasoning went wrong, which helps you improve the system prompt or add relevant context.
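The adaptive-budget pattern from this answer can be sketched as a small lookup. The difficulty labels and budgets mirror the numbers quoted above; `thinking_kwargs` and the `response_tokens` allowance are hypothetical, and the classification step itself is assumed to happen elsewhere. `max_tokens` is set to the budget plus a response allowance because `budget_tokens` has to be smaller than `max_tokens`.

```python
# Thinking budgets per difficulty class (0 means extended thinking is disabled)
THINKING_BUDGETS = {"simple": 0, "moderate": 5000, "complex": 15000}


def thinking_kwargs(difficulty: str, response_tokens: int = 4000) -> dict:
    """Return the max_tokens/thinking kwargs for a request of the given difficulty."""
    budget = THINKING_BUDGETS[difficulty]
    if budget == 0:
        # Simple requests skip extended thinking entirely
        return {"max_tokens": response_tokens}
    return {
        "max_tokens": budget + response_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }
```

The result can be splatted into a call such as `client.messages.create(model=..., messages=..., **thinking_kwargs("complex"))`.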
Follow-up: Can you use extended thinking with streaming, and how does it affect the user experience?
Yes, and this is where the implementation gets interesting. With streaming enabled, Anthropic first streams the thinking blocks, then streams the text blocks. In a user-facing application, you typically do not want to show the raw thinking to the user, but you do want to show a loading indicator that something is happening. I display a “Thinking…” state while thinking blocks stream, then switch to streaming the actual response text when the first text content block starts. This way the user sees activity during the thinking phase and gets immediate streaming when the answer starts. The trade-off is that the thinking phase can take 5-15 seconds for complex problems, so you need to set user expectations. Some teams show a brief summary of the thinking process (extracted by looking at the first and last sentences of the thinking block) to give users confidence that the model is working on their problem.
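A minimal sketch of that UI logic follows. To keep it runnable without network access, events are modeled as plain dicts that mirror the streaming event types (`content_block_start`, `content_block_delta`) and delta types (`thinking_delta`, `text_delta`); the SDK delivers objects with the same fields as attributes, so a real integration would read `event.type` and so on. `render_stream` is a hypothetical name.

```python
from typing import Iterable, Iterator


def render_stream(events: Iterable[dict]) -> Iterator[str]:
    """Yield what the user should see: one placeholder while the model
    thinks, then the answer text as it arrives."""
    showed_placeholder = False
    for event in events:
        if event["type"] == "content_block_start":
            block_type = event["content_block"]["type"]
            if block_type == "thinking" and not showed_placeholder:
                showed_placeholder = True
                yield "Thinking...\n"
        elif event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta["type"] == "text_delta":
                yield delta["text"]
            # thinking_delta events are deliberately not shown to the user


# Usage with a hand-written event sequence
sample_events = [
    {"type": "content_block_start", "content_block": {"type": "thinking"}},
    {"type": "content_block_delta", "delta": {"type": "thinking_delta", "thinking": "Let me see"}},
    {"type": "content_block_start", "content_block": {"type": "text"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " world"}},
]
rendered = "".join(render_stream(sample_events))
```

Logging the skipped `thinking_delta` content server-side, rather than discarding it, keeps the debugging benefit described earlier.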
Strong Answer:
  • The core challenge is that Claude’s context window has a hard limit, and every turn in the conversation consumes tokens from that budget. The system prompt, conversation history, and the current user message all compete for the same space. As conversations grow, you must choose what to keep and what to drop.
  • The simplest strategy is sliding window: keep the system prompt, the first user message (for task context), and the last N message pairs. The Conversation class in this chapter stores all messages and sends every one of them to the API on each call, which will fail once the context window is exceeded. In production, I implement a token counter that measures the total tokens before each API call and trims from the middle of the conversation (keeping the beginning and end) when approaching the limit.
  • A more sophisticated approach is summarization. When the conversation exceeds a threshold (say, 70% of the context window), I take the oldest messages that are about to be trimmed and generate a summary using a cheap, fast model (claude-3-haiku or gpt-4o-mini). This summary replaces the old messages, compressing the history while preserving key information. The trade-off is that summaries lose detail, so if the user references something specific from an earlier message, the model might not have the exact wording.
  • For applications where conversation state matters (like a coding assistant that has seen multiple file edits), I use a structured memory approach. Instead of keeping raw message history, I maintain a running state object that tracks key decisions, code snippets, and user preferences. This state object is injected into the system prompt and stays compact regardless of conversation length.
  • Token counting with Claude is straightforward using the count_tokens endpoint, which gives you a close estimate of the input token count before making the actual API call. I call this before every message and implement the trimming logic if the count exceeds 80% of the model’s limit, leaving 20% headroom for the response and for any estimation error.
  • One pitfall teams hit: they forget that the system prompt contributes to the token count. A 2000-token system prompt means you effectively have 2000 fewer tokens for conversation history. Keep system prompts concise and consider moving dynamic content (like few-shot examples) into the message history where it can be trimmed.
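The sliding-window strategy from this answer can be sketched as follows, assuming message content is a plain string. `estimate_tokens` is a rough placeholder (about four characters per token) that you would replace with a real count, for example from the count_tokens endpoint; both function names are hypothetical.

```python
def estimate_tokens(message: dict) -> int:
    """Rough placeholder estimate: about 4 characters per token."""
    return max(1, len(message["content"]) // 4)


def trim_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Keep the first message plus as many of the most recent messages as fit.

    Older messages in the middle of the conversation are dropped first.
    """
    if not messages:
        return []
    first, rest = messages[0], messages[1:]
    budget = max_tokens - estimate_tokens(first)
    kept = []
    # Walk backwards from the newest message until the budget runs out
    for message in reversed(rest):
        cost = estimate_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return [first] + kept
```

In the `Conversation` class above, `send` could pass `trim_history(self.messages, limit)` to the API while keeping the untrimmed list in `self.messages`, so the full history remains available for summarization or later recall.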
Follow-up: How do you handle the case where a user explicitly references something from early in the conversation that has been trimmed?
This is the hard problem in conversation management. I handle it in two ways. First, proactive detection: when summarizing old messages, I extract and preserve any commitments, decisions, or specific facts that the user is likely to reference later. These go into the structured memory rather than the summary. Second, reactive recovery: if the model’s response indicates confusion or states it does not have context about something the user mentioned, I have a recovery mechanism that searches the full conversation history (stored separately from what is sent to the API) for relevant messages and injects them into the next turn. This is similar to RAG but applied to conversation history rather than a document corpus.
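The reactive recovery step might be sketched like this. Keyword overlap is used purely as a stand-in for whatever retrieval method you prefer (embedding search would be the usual production choice), and `recall_from_history` is a hypothetical helper that assumes plain-string message content.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def recall_from_history(full_history: list[dict], query: str, top_k: int = 2) -> list[dict]:
    """Return up to top_k archived messages sharing the most words with the query."""
    query_words = _words(query)
    scored = []
    for index, message in enumerate(full_history):
        overlap = len(query_words & _words(message["content"]))
        if overlap:
            scored.append((overlap, index, message))
    # Highest overlap first; earlier messages win ties
    scored.sort(key=lambda item: (-item[0], item[1]))
    # Present the selected messages in chronological order
    best = sorted(scored[:top_k], key=lambda item: item[1])
    return [message for _, _, message in best]
```

The recovered messages can be quoted inside the next user turn (for example under a "Relevant earlier context" heading) so the model sees the original wording rather than a summary.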