Function calling enables LLMs to interact with external systems by generating structured function calls that your application executes. Think of it like a voice assistant that can press buttons on your behalf: you say “check the weather in Tokyo,” the LLM decides to call get_weather(location="Tokyo"), your code actually fetches the weather, and the LLM weaves the result into a natural response. The model never actually executes code — it just decides which function to call and with what arguments. Your application is always in control of execution.
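
The division of labor described above can be sketched without any API call. This is a minimal illustration, assuming the model's decision has already been reduced to a name plus a JSON string of arguments; `AVAILABLE_FUNCTIONS`, `dispatch`, and the hand-written `model_tool_call` are hypothetical names, not part of any SDK.

```python
import json

def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real one would call a weather API."""
    return {"location": location, "temperature": 22, "unit": unit}

# Only functions listed here can ever run, whatever the model asks for.
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Run the function named in a tool call and return its result as JSON."""
    func = AVAILABLE_FUNCTIONS.get(tool_call["name"])
    if func is None:
        return json.dumps({"error": f"Unknown function: {tool_call['name']}"})
    # Arguments arrive as a JSON string, not as a Python dict.
    arguments = json.loads(tool_call["arguments"])
    return json.dumps(func(**arguments))

# Hand-written stand-in for what the model returns (not real API output).
model_tool_call = {"name": "get_weather", "arguments": '{"location": "Tokyo"}'}
result = dispatch(model_tool_call)
print(result)
```

The allow-list is the point of the sketch: a name the application never registered produces an error payload instead of an execution.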

Function Schema Design

The schema is the menu you hand to the model. A clear, well-documented schema is the single biggest factor in reliable function calling. If your schema descriptions are vague, the model will guess wrong about which function to use and what arguments to pass. Practical tip: write function descriptions as if you are explaining to a new team member when to use each tool.

OpenAI Function Schema

from openai import OpenAI
from typing import Literal

client = OpenAI()

# Define function schemas
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location. Use this when the user asks about weather conditions.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country, e.g., 'London, UK'"
                    },
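                    "unit": {
                        "type": ["string", "null"],
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit, or null to use the default"
                    }
                },
                # Strict mode requires every property to be listed here;
                # optional values are expressed by also allowing null.
                "required": ["location", "unit"],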
                "additionalProperties": False
            },
            "strict": True  # Enables structured outputs
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Search the product catalog by keyword. Use this when the user asks about specific products, pricing, or availability.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query"
                    },
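                    "category": {
                        "type": ["string", "null"],
                        "enum": ["electronics", "clothing", "books", "home"],
                        "description": "Category filter, or null for all categories"
                    },
                    "max_price": {
                        "type": ["number", "null"],
                        "description": "Maximum price filter, or null for no limit"
                    },
                    "in_stock": {
                        "type": ["boolean", "null"],
                        "description": "Filter for in-stock items only, or null to include all"
                    }
                },
                # Strict mode: every property is required; optional ones accept null
                "required": ["query", "category", "max_price", "in_stock"],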
                "additionalProperties": False
            },
            "strict": True
        }
    }
]
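
Two strict-mode constraints are easy to miss when writing schemas by hand: every property has to appear in `required`, and `additionalProperties` has to be `False`. The helper below is a small sketch of a pre-flight check for those two rules only. `strict_mode_problems` is a hypothetical name, it inspects just the top level of the parameters object, and it is not a substitute for the API's own validation.

```python
def strict_mode_problems(tool: dict) -> list[str]:
    """Return reasons why a tool schema would likely fail the two strict-mode rules."""
    params = tool["function"]["parameters"]
    problems = []
    if params.get("additionalProperties") is not False:
        problems.append("additionalProperties must be false")
    properties = params.get("properties", {})
    missing = sorted(set(properties) - set(params.get("required", [])))
    if missing:
        problems.append(f"not listed in required: {missing}")
    return problems

example_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": ["string", "null"], "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],  # "unit" is missing on purpose
            "additionalProperties": False,
        },
        "strict": True,
    },
}

before = strict_mode_problems(example_tool)
example_tool["function"]["parameters"]["required"].append("unit")
after = strict_mode_problems(example_tool)
print(before, after)
```

Running a check like this at startup surfaces schema mistakes before the first request does.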

Pydantic Schema Generation

Manually writing JSON schemas is tedious and error-prone. A better approach is to define your parameters as Pydantic models and auto-generate the OpenAI schema. This gives you validation on both sides: the model is constrained to the schema when generating arguments, and Pydantic validates the result before your function runs.
from pydantic import BaseModel, Field
from typing import Optional, List
import json

class WeatherParams(BaseModel):
    """Parameters for weather lookup"""
    location: str = Field(description="City and country, e.g., 'London, UK'")
    unit: Literal["celsius", "fahrenheit"] = Field(
        default="celsius",
        description="Temperature unit"
    )

class ProductSearchParams(BaseModel):
    """Parameters for product search"""
    query: str = Field(description="Search query")
    category: Optional[str] = Field(
        default=None,
        description="Product category filter"
    )
    max_price: Optional[float] = Field(
        default=None,
        description="Maximum price"
    )
    limit: int = Field(
        default=10,
        ge=1,
        le=100,
        description="Number of results"
    )

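def pydantic_to_openai_function(model: type[BaseModel], name: Optional[str] = None) -> dict:
    """Convert Pydantic model to OpenAI function schema"""
    schema = model.model_json_schema()
    properties = schema.get("properties", {})
    
    # Strict mode may reject the "default" keyword, so drop it from the schema.
    # Optional[...] fields stay nullable, so the model can still pass null.
    for prop in properties.values():
        prop.pop("default", None)
    
    return {
        "type": "function",
        "function": {
            "name": name or model.__name__.lower(),
            "description": model.__doc__ or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                # Strict mode requires every property to be listed as required
                "required": list(properties),
                "additionalProperties": False
            },
            "strict": True
        }
    }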

# Generate tools from Pydantic models
tools = [
    pydantic_to_openai_function(WeatherParams, "get_weather"),
    pydantic_to_openai_function(ProductSearchParams, "search_products")
]

Function Execution Engine

The registry pattern decouples “what functions exist” from “how they get called.” This matters because the LLM returns a function name and arguments as strings — you need a clean way to look up the actual Python function, validate the arguments, execute it, and handle errors. Think of the registry as a switchboard operator connecting calls to the right department.
from dataclasses import dataclass
from typing import Callable, Any, Dict
import json
import inspect

@dataclass
class FunctionResult:
    name: str
    arguments: dict
    result: Any
    success: bool
    error: str = None
    execution_time_ms: float = 0  # Track this -- slow functions hurt UX

class FunctionRegistry:
    """Registry for callable functions.
    
    Register functions once at startup, then execute them by name
    when the LLM requests a tool call. The registry handles both
    sync and async functions transparently.
    """
    
    def __init__(self):
        self.functions: Dict[str, Callable] = {}
        self.schemas: Dict[str, dict] = {}
    
    def register(
        self,
        name: str = None,
        description: str = None,
        param_model: type[BaseModel] = None
    ):
        """Decorator to register a function"""
        def decorator(func: Callable):
            func_name = name or func.__name__
            
            # Generate schema from type hints or param_model
            if param_model:
                schema = pydantic_to_openai_function(param_model, func_name)
            else:
                schema = self._generate_schema_from_hints(func, func_name, description)
            
            self.functions[func_name] = func
            self.schemas[func_name] = schema
            
            return func
        return decorator
    
    def _generate_schema_from_hints(
        self,
        func: Callable,
        name: str,
        description: str = None
    ) -> dict:
        """Generate schema from function type hints"""
        sig = inspect.signature(func)
        hints = func.__annotations__
        
        properties = {}
        required = []
        
        for param_name, param in sig.parameters.items():
            if param_name == "self":
                continue
            
            param_type = hints.get(param_name, str)
            
            # Map Python types to JSON Schema
            type_map = {
                str: "string",
                int: "integer",
                float: "number",
                bool: "boolean",
                list: "array",
                dict: "object"
            }
            
            properties[param_name] = {
                "type": type_map.get(param_type, "string")
            }
            
            # Identity check: a default value could define its own __eq__
            if param.default is inspect.Parameter.empty:
                required.append(param_name)
        
        return {
            "type": "function",
            "function": {
                "name": name,
                "description": description or func.__doc__ or "",
                "parameters": {
                    "type": "object",
                    "properties": properties,
                    "required": required
                }
            }
        }
    
    def get_tools(self) -> list:
        """Get all registered tools for OpenAI"""
        return list(self.schemas.values())
    
    async def execute(
        self,
        name: str,
        arguments: dict
    ) -> FunctionResult:
        """Execute a registered function"""
        import time
        
        if name not in self.functions:
            return FunctionResult(
                name=name,
                arguments=arguments,
                result=None,
                success=False,
                error=f"Function '{name}' not found"
            )
        
        start = time.time()
        func = self.functions[name]
        
        try:
            # Check if async
            if inspect.iscoroutinefunction(func):
                result = await func(**arguments)
            else:
                result = func(**arguments)
            
            return FunctionResult(
                name=name,
                arguments=arguments,
                result=result,
                success=True,
                execution_time_ms=(time.time() - start) * 1000
            )
        except Exception as e:
            return FunctionResult(
                name=name,
                arguments=arguments,
                result=None,
                success=False,
                error=str(e),
                execution_time_ms=(time.time() - start) * 1000
            )

# Usage
registry = FunctionRegistry()

@registry.register(description="Get current weather for a location")
async def get_weather(location: str, unit: str = "celsius") -> dict:
    # Actual implementation
    return {
        "location": location,
        "temperature": 22,
        "unit": unit,
        "conditions": "sunny"
    }

@registry.register(description="Search for products")
async def search_products(
    query: str,
    category: str = None,
    max_price: float = None
) -> list:
    # Actual implementation
    return [
        {"name": "Product 1", "price": 29.99},
        {"name": "Product 2", "price": 49.99}
    ]
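
One limitation of the type-hint mapping above is that only bare types are recognized, so a hint such as `Optional[float]` or `list[str]` falls through to `"string"`. The sketch below shows one way to unwrap those hints with `typing.get_origin` and `typing.get_args`. `json_schema_for` is a hypothetical helper, not part of the registry shown earlier, and it deliberately ignores unions of several types.

```python
import types
from typing import Optional, Union, get_args, get_origin

TYPE_MAP = {
    str: "string", int: "integer", float: "number",
    bool: "boolean", list: "array", dict: "object",
}

# Optional[X] reports typing.Union as its origin; "X | None" (Python 3.10+)
# reports types.UnionType.
_UNION_ORIGINS = {Union}
if hasattr(types, "UnionType"):
    _UNION_ORIGINS.add(types.UnionType)

def json_schema_for(hint) -> dict:
    """Map a Python type hint to a small JSON Schema fragment."""
    origin = get_origin(hint)
    if origin in _UNION_ORIGINS:
        members = [arg for arg in get_args(hint) if arg is not type(None)]
        if len(members) == 1:
            # Optional[X]: describe X, then also allow null
            schema = json_schema_for(members[0])
            schema["type"] = [schema["type"], "null"]
            return schema
        return {}  # unions of several types are out of scope; {} accepts anything
    if origin is list:
        (item_hint,) = get_args(hint)
        return {"type": "array", "items": json_schema_for(item_hint)}
    # Unknown hints fall back to "string", as in the registry above
    return {"type": TYPE_MAP.get(origin or hint, "string")}

print(json_schema_for(Optional[float]))
print(json_schema_for(list[int]))
```

For anything richer than this (nested models, enums, constraints), passing a `param_model` to the registry is likely the simpler route.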

Parallel Function Execution

import asyncio
from typing import List

class FunctionExecutor:
    """Execute function calls from LLM responses"""
    
    def __init__(self, registry: FunctionRegistry):
        self.registry = registry
    
    async def execute_tool_calls(
        self,
        tool_calls: list,
        parallel: bool = True
    ) -> List[FunctionResult]:
        """Execute multiple tool calls"""
        
        if parallel:
            # Execute all calls in parallel
            tasks = [
                self.registry.execute(
                    tc.function.name,
                    json.loads(tc.function.arguments)
                )
                for tc in tool_calls
            ]
            return await asyncio.gather(*tasks)
        else:
            # Execute sequentially
            results = []
            for tc in tool_calls:
                result = await self.registry.execute(
                    tc.function.name,
                    json.loads(tc.function.arguments)
                )
                results.append(result)
            return results
    
    def format_results_for_llm(
        self,
        tool_calls: list,
        results: List[FunctionResult]
    ) -> list:
        """Format results as tool messages for LLM"""
        messages = []
        
        for tc, result in zip(tool_calls, results):
            if result.success:
                content = json.dumps(result.result)
            else:
                content = json.dumps({"error": result.error})
            
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": content
            })
        
        return messages

# Complete conversation loop
async def function_calling_loop(
    client: OpenAI,
    messages: list,
    registry: FunctionRegistry,
    max_iterations: int = 10
) -> str:
    """Run function calling conversation loop"""
    
    executor = FunctionExecutor(registry)
    
    for _ in range(max_iterations):
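        # The OpenAI client here is synchronous, so run the request in a worker
        # thread to avoid blocking the event loop while waiting on the network.
        response = await asyncio.to_thread(
            client.chat.completions.create,
            model="gpt-4o",
            messages=messages,
            tools=registry.get_tools(),
            tool_choice="auto"
        )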
        
        message = response.choices[0].message
        messages.append(message)
        
        # Check if done
        if not message.tool_calls:
            return message.content
        
        # Execute tool calls
        results = await executor.execute_tool_calls(message.tool_calls)
        
        # Add results to conversation
        tool_messages = executor.format_results_for_llm(
            message.tool_calls,
            results
        )
        messages.extend(tool_messages)
    
    # Reaching this point means the model kept calling tools without converging
    raise RuntimeError("Max iterations exceeded")

Argument Validation

from pydantic import BaseModel, ValidationError, field_validator
from typing import Dict, Literal, Optional

class ValidatedWeatherParams(BaseModel):
    location: str
    unit: Literal["celsius", "fahrenheit"] = "celsius"
    
    # Pydantic v2 validator API, matching model_validate/model_dump used below
    @field_validator("location")
    @classmethod
    def validate_location(cls, v: str) -> str:
        v = v.strip()  # Strip first so padding does not count toward the length
        if len(v) < 2:
            raise ValueError("Location must be at least 2 characters")
        if not any(c.isalpha() for c in v):
            raise ValueError("Location must contain letters")
        return v

class ArgumentValidator:
    """Validate function arguments before execution"""
    
    def __init__(self, schemas: Dict[str, type[BaseModel]]):
        self.schemas = schemas
    
    def validate(
        self,
        function_name: str,
        arguments: dict
    ) -> tuple[bool, dict, str]:
        """
        Validate arguments against schema.
        Returns: (is_valid, validated_args, error_message)
        """
        schema = self.schemas.get(function_name)
        
        if not schema:
            return True, arguments, None
        
        try:
            validated = schema.model_validate(arguments)
            return True, validated.model_dump(), None
        except ValidationError as e:
            error_messages = []
            for error in e.errors():
                field = ".".join(str(x) for x in error["loc"])
                error_messages.append(f"{field}: {error['msg']}")
            return False, arguments, "; ".join(error_messages)

# Usage with auto-correction
async def validated_function_call(
    client: OpenAI,
    registry: FunctionRegistry,
    validator: ArgumentValidator,
    messages: list,
    max_validation_retries: int = 2
) -> FunctionResult:
    """Execute function call with validation and retry"""
    
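    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=registry.get_tools(),
        parallel_tool_calls=False  # This helper handles one tool call per turn
    )
    
    message = response.choices[0].message
    if not message.tool_calls:
        # The model answered in plain text instead of calling a tool
        return FunctionResult(
            name="",
            arguments={},
            result=message.content,
            success=False,
            error="Model did not request a tool call"
        )
    
    tool_call = message.tool_calls[0]
    func_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    
    # Validate
    is_valid, validated_args, error = validator.validate(func_name, arguments)
    
    if not is_valid:
        if max_validation_retries <= 0:
            # Retry budget exhausted: stop instead of recursing forever
            return FunctionResult(
                name=func_name,
                arguments=arguments,
                result=None,
                success=False,
                error=f"Validation failed: {error}"
            )
        
        # Ask LLM to fix the arguments
        messages.append(message)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps({
                "error": f"Validation failed: {error}",
                "hint": "Please correct the arguments and try again"
            })
        })
        
        # Retry with one fewer attempt remaining
        return await validated_function_call(
            client, registry, validator, messages,
            max_validation_retries - 1
        )
    
    # Execute with validated arguments
    return await registry.execute(func_name, validated_args)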

Error Handling Patterns

from enum import Enum
from dataclasses import dataclass

class FunctionErrorType(str, Enum):
    NOT_FOUND = "function_not_found"
    VALIDATION = "validation_error"
    EXECUTION = "execution_error"
    TIMEOUT = "timeout"
    PERMISSION = "permission_denied"

@dataclass
class FunctionError:
    type: FunctionErrorType
    message: str
    function_name: str
    recoverable: bool = True
    suggestion: str = None

class RobustFunctionExecutor:
    """Execute functions with comprehensive error handling"""
    
    def __init__(
        self,
        registry: FunctionRegistry,
        timeout: float = 30.0,
        max_retries: int = 2
    ):
        self.registry = registry
        self.timeout = timeout
        self.max_retries = max_retries
    
    async def execute_with_retry(
        self,
        name: str,
        arguments: dict
    ) -> tuple[Any, FunctionError]:
        """Execute with retry on transient errors"""
        
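        # An unknown function can never succeed, so fail fast instead of retrying
        if name not in self.registry.functions:
            return None, FunctionError(
                type=FunctionErrorType.NOT_FOUND,
                message=f"Function '{name}' not found",
                function_name=name,
                recoverable=False
            )
        
        last_error = None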
        
        for attempt in range(self.max_retries + 1):
            try:
                result = await asyncio.wait_for(
                    self.registry.execute(name, arguments),
                    timeout=self.timeout
                )
                
                if result.success:
                    return result.result, None
                else:
                    last_error = FunctionError(
                        type=FunctionErrorType.EXECUTION,
                        message=result.error,
                        function_name=name
                    )
            
            except asyncio.TimeoutError:
                last_error = FunctionError(
                    type=FunctionErrorType.TIMEOUT,
                    message=f"Function timed out after {self.timeout}s",
                    function_name=name,
                    recoverable=True,
                    suggestion="Try with simpler parameters or split the request"
                )
            
            except Exception as e:
                last_error = FunctionError(
                    type=FunctionErrorType.EXECUTION,
                    message=str(e),
                    function_name=name,
                    recoverable=False
                )
                break
            
            # Exponential backoff for retries
            if attempt < self.max_retries:
                await asyncio.sleep(2 ** attempt)
        
        return None, last_error
    
    def format_error_for_llm(self, error: FunctionError) -> str:
        """Format error message for LLM understanding"""
        response = {
            "error": True,
            "type": error.type.value,
            "message": error.message
        }
        
        if error.suggestion:
            response["suggestion"] = error.suggestion
        
        if error.recoverable:
            response["hint"] = "You may retry with different parameters"
        
        return json.dumps(response)

Tool Choice Control

class ToolChoiceController:
    """Control when and how tools are used"""
    
    @staticmethod
    def force_tool(tool_name: str) -> dict:
        """Force the model to use a specific tool"""
        return {
            "type": "function",
            "function": {"name": tool_name}
        }
    
    @staticmethod
    def no_tools() -> str:
        """Prevent tool usage"""
        return "none"
    
    @staticmethod
    def auto() -> str:
        """Let model decide"""
        return "auto"
    
    @staticmethod
    def required() -> str:
        """Model must use at least one tool"""
        return "required"

# Usage patterns
def query_with_tool_control(
    client: OpenAI,
    messages: list,
    tools: list,
    intent: str
):
    """Query with tool control chosen from the intent; returns the full ChatCompletion."""
    
    # Determine tool choice based on intent
    if intent == "lookup":
        # Force tool usage for lookups
        tool_choice = ToolChoiceController.required()
    elif intent == "conversation":
        # Allow but don't require tools
        tool_choice = ToolChoiceController.auto()
    elif intent == "specific_action":
        # Force specific tool
        tool_choice = ToolChoiceController.force_tool("execute_action")
    else:
        tool_choice = ToolChoiceController.auto()
    
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice=tool_choice
    )
    
    return response

Streaming with Function Calls

async def stream_with_functions(
    client: OpenAI,
    messages: list,
    tools: list
):
    """Handle streaming responses with function calls"""
    
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        stream=True
    )
    
    # Accumulate function call data
    tool_calls = {}
    current_content = ""
    
    for chunk in stream:
        if not chunk.choices:
            continue  # Some chunks (for example usage-only ones) carry no choices
        delta = chunk.choices[0].delta
        
        # Handle content
        if delta.content:
            current_content += delta.content
            yield {"type": "content", "data": delta.content}
        
        # Handle tool calls
        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                
                if idx not in tool_calls:
                    tool_calls[idx] = {
                        "id": tc.id,
                        "function": {"name": "", "arguments": ""}
                    }
                
                if tc.function.name:
                    tool_calls[idx]["function"]["name"] += tc.function.name
                
                if tc.function.arguments:
                    tool_calls[idx]["function"]["arguments"] += tc.function.arguments
        
        # Check finish reason
        if chunk.choices[0].finish_reason == "tool_calls":
            yield {
                "type": "tool_calls",
                "data": list(tool_calls.values())
            }
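
The accumulation step is the part of streaming that most often goes wrong, so it can help to test it in isolation. The sketch below replays hand-written fragments shaped like streamed tool-call deltas. The fragments are illustrative stand-ins built with `SimpleNamespace`, not captured API output, and `accumulate_tool_calls` is a hypothetical helper.

```python
import json
from types import SimpleNamespace as NS

def accumulate_tool_calls(deltas) -> list[dict]:
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls: dict[int, dict] = {}
    for tc in deltas:
        call = calls.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
        if tc.id:  # the id is only present on the first fragment of a call
            call["id"] = tc.id
        if tc.function.name:
            call["name"] += tc.function.name
        if tc.function.arguments:
            call["arguments"] += tc.function.arguments
    return [calls[i] for i in sorted(calls)]

# Two interleaved calls; the arguments of the first arrive in two pieces.
fragments = [
    NS(index=0, id="call_1", function=NS(name="get_weather", arguments="")),
    NS(index=0, id=None, function=NS(name=None, arguments='{"loc')),
    NS(index=1, id="call_2", function=NS(name="search_products", arguments="")),
    NS(index=0, id=None, function=NS(name=None, arguments='ation": "Tokyo"}')),
    NS(index=1, id=None, function=NS(name=None, arguments='{"query": "umbrella"}')),
]

complete = accumulate_tool_calls(fragments)
for call in complete:
    # Arguments are only valid JSON once every fragment has been appended
    print(call["id"], call["name"], json.loads(call["arguments"]))
```

Parsing the arguments only after the stream signals completion avoids `json.loads` failures on partial strings.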

Key Patterns

| Pattern | Use Case | Implementation |
| --- | --- | --- |
| Pydantic Schemas | Type-safe function definitions | Convert models to JSON Schema |
| Parallel Execution | Multiple independent calls | asyncio.gather |
| Validation | Argument correctness | Pydantic validators |
| Retry Logic | Transient failures | Exponential backoff |
| Tool Choice Control | Directing model behavior | force/auto/none/required |

What is Next

LLM Orchestration

Learn to orchestrate multiple LLM providers with unified APIs

Interview Deep-Dive

Strong Answer:
  • The function calling loop is a multi-turn conversation between your application and the LLM. Here is the exact flow. Step one: you send the user’s message plus a list of tool schemas (JSON Schema definitions of your available functions) to the model. Step two: instead of returning a text response, the model returns one or more tool call objects, each containing a function name and JSON arguments. Critically, the model does not execute anything — it just generates structured data describing what it wants to call. Step three: your application parses the tool calls, validates the arguments, executes the actual functions against your APIs or databases, and collects the results. Step four: you append the tool call results as tool-role messages back into the conversation and send it to the model again. Step five: the model either generates another tool call (if it needs more information) or produces a final text response to the user.
  • The loop continues until the model decides it has enough information to answer, or until you hit a max-iterations safety limit. In production I always set a max of 5-10 iterations to prevent runaway loops where the model keeps calling tools without converging on an answer. I have seen cases where a model gets stuck in a cycle calling the same search function with slightly different queries because none of the results satisfy it.
  • The key architectural insight is that the model never directly touches your systems. It is always your code executing the functions and deciding what happens with the results. This is what makes it safe — you control validation, permissions, rate limiting, and error handling at the execution layer, not the LLM layer.
Red Flags: Candidate thinks the LLM executes the functions directly, does not mention the loop structure (thinks it is a single request-response), or does not mention the need for a max-iterations safeguard.
Follow-up: The model generates a tool call with invalid arguments: maybe it hallucinates a parameter name or puts a string where a number should go. How do you handle this?
I validate every tool call’s arguments against a Pydantic model before execution. If validation fails, I do not crash or silently drop the call. Instead, I send a tool-role message back to the model with a structured error explaining what went wrong: which parameter failed, what was expected, and a hint about how to fix it. Then the model gets another chance to generate correct arguments. I allow up to 2 validation retries before giving up and returning a user-friendly error. In practice, GPT-4o with strict: true in the function schema rarely generates invalid arguments because strict mode uses constrained decoding to guarantee schema conformance. But with non-strict mode or weaker models, I see validation failures on about 2-3% of calls, so the retry mechanism is essential. The Pydantic validation layer also acts as a security boundary: it prevents the model from injecting unexpected parameters that your function was not designed to handle.
Strong Answer:
  • Tool selection problems almost always trace back to one of three root causes: poor tool descriptions, overlapping tool purposes, or missing routing signals.
  • First, I audit every tool’s name and description. The description is the single most important factor in tool selection — it is the model’s only guide for when to use each tool. Vague descriptions like “search for data” cause confusion. I rewrite descriptions to be specific about when to use the tool and when not to: “Search the product catalog by keyword. Use this when the user asks about specific products, pricing, or availability. Do NOT use this for general knowledge questions.” Including explicit negative guidance (“do not use when…”) reduces false selections significantly.
  • Second, I look for overlapping tools. If I have both search_products and get_product_details, the model might call search_products when the user asks about a specific product ID, because the descriptions are not clear about the boundary. I either consolidate overlapping tools or add explicit disambiguation: “Use search_products for keyword searches across the catalog. Use get_product_details only when you have a specific product ID.”
  • Third, I reduce the tool set. With 15 tools, the model spends significant context on schema parsing and the probability of mis-selection increases. I segment tools by intent: a routing step first determines the user’s intent category, then only the 3-5 relevant tools for that category are passed to the model. This two-stage approach cut our tool mis-selection rate from 12% to under 2% at a company I worked at.
  • Finally, I use tool_choice strategically. For lookup intents where I know a tool must be called, I use tool_choice: "required" or even force a specific tool. For conversational turns where tools are optional, I use auto.
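
The two-stage routing idea from the answer above can be sketched as follows. This is a toy illustration, assuming a keyword-based `classify_intent`; a real router might be a small classifier or a cheap LLM call. `TOOLS_BY_INTENT`, `classify_intent`, and `select_tools` are hypothetical names.

```python
import re

# Which tools each intent category is allowed to see
TOOLS_BY_INTENT = {
    "weather": ["get_weather"],
    "shopping": ["search_products", "get_product_details"],
}

WEATHER_WORDS = {"weather", "rain", "temperature", "forecast"}
SHOPPING_WORDS = {"buy", "price", "product", "stock"}

def classify_intent(user_message: str) -> str:
    """Placeholder router: whole-word keyword matching on the user message."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    if words & WEATHER_WORDS:
        return "weather"
    if words & SHOPPING_WORDS:
        return "shopping"
    return "general"

def select_tools(user_message: str, all_tools: dict[str, dict]) -> list[dict]:
    """Return only the tool schemas relevant to the detected intent."""
    names = TOOLS_BY_INTENT.get(classify_intent(user_message), [])
    return [all_tools[name] for name in names if name in all_tools]

# Stub schemas standing in for full tool definitions
ALL_TOOLS = {
    name: {"type": "function", "function": {"name": name}}
    for name in ("get_weather", "search_products", "get_product_details")
}

selected = select_tools("What is the weather in Tokyo?", ALL_TOOLS)
print([tool["function"]["name"] for tool in selected])
```

When no intent matches, the sketch returns an empty list; in that case the caller would presumably omit the tools parameter and let the model answer directly.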
Red Flags: Candidate suggests adding more tools to solve the problem, does not mention tool descriptions as the primary lever, or does not consider reducing the tool set per request.
Follow-up: How does parallel function calling work, and what gotchas should you watch out for?
When the model generates multiple tool calls in a single response (parallel tool calling), your application receives an array of tool call objects. You should execute them concurrently using asyncio.gather rather than sequentially, because they are independent by definition: the model generated them in parallel specifically because they do not depend on each other. The main gotchas are as follows. First, error isolation: if one tool call fails, you need to return the error for that specific call while still returning results for the successful ones. Do not let one failure abort the entire batch. Second, rate limiting: five parallel API calls might hit your external service’s rate limit. I use a semaphore to cap concurrency at 3-5 simultaneous outbound calls. Third, response assembly: each tool result must be sent back with the correct tool_call_id matching the original call. If you mix these up, the model gets confused about which result corresponds to which request. I have debugged this exact issue where swapped IDs caused the model to synthesize nonsensical answers because it attributed a weather API response to a database query.
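
The three gotchas above (error isolation, a concurrency cap, and keeping each result tied to its tool_call_id) fit into one small helper. This is a sketch under stated assumptions: `run_batch` is a hypothetical name, and the two coroutines are stand-ins for real tool implementations.

```python
import asyncio

async def run_batch(calls, max_concurrency: int = 3) -> list[dict]:
    """Run (tool_call_id, coroutine_function, kwargs) triples concurrently.

    Results come back in input order, each tagged with its tool_call_id.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(call_id, func, kwargs):
        async with semaphore:  # at most max_concurrency calls in flight
            try:
                return {"tool_call_id": call_id, "result": await func(**kwargs)}
            except Exception as exc:
                # A failure becomes an error payload for this call only
                return {"tool_call_id": call_id, "error": str(exc)}

    return await asyncio.gather(*(run_one(*call) for call in calls))

async def fake_weather(city: str) -> dict:
    await asyncio.sleep(0)  # stand-in for network latency
    return {"city": city, "temperature": 22}

async def fake_search(query: str) -> list:
    raise RuntimeError("upstream unavailable")

results = asyncio.run(run_batch([
    ("call_1", fake_weather, {"city": "Tokyo"}),
    ("call_2", fake_search, {"query": "umbrella"}),
]))
print(results)
```

Because each outcome carries its own ID, building the tool-role messages afterwards does not depend on list position.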
Strong Answer:
  • Both APIs follow the same conceptual pattern (model suggests tool calls, you execute, you return results), but the implementation details differ in important ways.
  • OpenAI’s function calling uses a tools array with JSON Schema definitions and returns tool calls as a structured tool_calls array on the assistant message. The standout feature is strict: true mode, which uses constrained decoding to guarantee the generated arguments conform exactly to your JSON Schema. This eliminates argument validation errors at the cost of slightly higher latency. OpenAI also supports tool_choice with fine-grained control: auto, required, none, or force-a-specific-tool.
  • Anthropic’s tool use follows a similar pattern but with different ergonomics. The tool definitions go in a top-level tools parameter, and tool calls come back as tool_use content blocks within the response. One key difference is how system messages work: Anthropic separates the system prompt from message history, which affects how you structure tool instructions. Anthropic does not have an equivalent to OpenAI’s strict mode as of my last check, so argument validation on your side is more important.
  • The practical difference that matters most in production is how each handles multi-turn tool conversations and streaming. OpenAI streams tool calls as deltas that you need to accumulate (function name and arguments arrive in chunks), which requires careful buffer management. Anthropic also streams tool input incrementally as partial JSON, but each tool_use block has explicit start and stop events, which makes the boundaries easier to track.
  • For choosing between them: if argument schema compliance is critical (financial calculations, database queries), OpenAI’s strict mode is a significant advantage. If you need the model to handle ambiguous, open-ended tool selection with good reasoning about when not to use tools, Claude tends to be more conservative and less trigger-happy with tool calls, which can be either a pro or a con depending on your use case.
Red Flags: Candidate has only used one provider and cannot discuss tradeoffs, confuses function calling with assistants API or agent frameworks, or does not know about strict mode and its implications.
Follow-up: How do you design a function calling system that works across multiple providers?
I abstract the tool definition layer using Pydantic models as the source of truth. Each tool is defined as a Pydantic BaseModel with typed fields and descriptions, then I have converter functions that generate the provider-specific schema format: pydantic_to_openai_function() and pydantic_to_anthropic_tool(). The execution layer is provider-agnostic: it receives a function name and a dictionary of arguments regardless of which provider generated them. The tricky part is handling provider-specific behaviors: OpenAI might generate null for optional fields while Anthropic omits them entirely, and the tool response format differs between providers. I normalize these differences in a thin adapter layer so the rest of my application code never knows or cares which LLM generated the tool call. This abstraction paid off when we switched from OpenAI to Anthropic for our primary agent: the tool definitions and execution logic stayed identical, and we only changed the adapter layer.
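
A thin adapter layer of the kind described above might look like the sketch below. It starts from a plain JSON Schema dict (which could come from a Pydantic model's `model_json_schema()`) to stay dependency-free. `to_openai_tool`, `to_anthropic_tool`, and `normalize_arguments` are hypothetical names; the output shapes follow each provider's documented tool format as best understood here.

```python
def to_openai_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON Schema in OpenAI's chat-completions tool format."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": parameters},
    }

def to_anthropic_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap the same JSON Schema in Anthropic's tool format."""
    return {"name": name, "description": description, "input_schema": parameters}

def normalize_arguments(arguments: dict) -> dict:
    """Drop explicit nulls so 'null' and 'omitted' look identical downstream."""
    return {key: value for key, value in arguments.items() if value is not None}

# One provider-neutral definition, converted twice
weather_schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}, "unit": {"type": ["string", "null"]}},
    "required": ["location", "unit"],
}
description = "Get current weather for a location"
openai_tool = to_openai_tool("get_weather", description, weather_schema)
anthropic_tool = to_anthropic_tool("get_weather", description, weather_schema)

# One provider sends null for an unused field, another leaves it out
print(normalize_arguments({"location": "Tokyo", "unit": None}))
print(normalize_arguments({"location": "Tokyo"}))
```

Dropping nulls assumes the target functions have defaults for optional parameters; if a function treats an explicit null as meaningful, that normalization step would need to change.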