LLM outputs can be unpredictable. The model might return perfectly formatted JSON one time and free-form prose the next. It might include extra fields, omit required ones, or wrap the JSON in markdown code blocks. In production, this unreliability is not just annoying — it is a source of runtime crashes, silent data corruption, and late-night pager alerts. This chapter covers techniques to validate, parse, and ensure structured outputs from language models, ranging from “just use a library” to “build your own validation pipeline.”

Validation Strategy Decision Framework

| Scenario | Recommended Approach | Why |
|---|---|---|
| OpenAI with Pydantic schemas | response_format with strict: true | Constrained decoding guarantees JSON that matches the schema, so no structural retries are needed; still handle refusals and truncated responses |
| OpenAI with business logic validation | Instructor + Pydantic validators | Auto-retry feeds validation errors back to the model |
| Open-source models (Ollama, vLLM) | Instructor or custom JSON extraction + Pydantic | Constrained decoding support varies by server and version (vLLM and recent Ollama releases both offer JSON-schema-guided output), so validate client-side either way |
| Multi-provider setup | Instructor (provider-agnostic) | Works with OpenAI, Anthropic, Ollama, etc. through the same interface |
| Legacy system / no library | Fallback parser chain | Try direct JSON, code block, embedded JSON, key-value, then LLM repair |
| Safety-critical outputs | LLM-as-validator + deterministic checks | Belt-and-suspenders: programmatic validation catches structure, the LLM catches semantics |
| High-volume batch processing | strict: true + schema (no retries) | Retries are too expensive at scale; fail fast and log for review |
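For the rows that rely on strict: true, the request carries a JSON Schema wrapped in a response_format payload. The sketch below builds that payload from a flat Pydantic model without calling the API; build_strict_response_format is a hypothetical helper name. It assumes OpenAI's strict-mode requirements that every property is required and additionalProperties is false, which is why the model has no optional fields. Check OpenAI's structured outputs documentation for the schema keywords strict mode supports.

```python
from pydantic import BaseModel


class UserInfo(BaseModel):
    # Strict mode expects every property to be required, so no defaults here
    name: str
    age: int
    occupation: str


def build_strict_response_format(model: type[BaseModel], name: str) -> dict:
    """Hypothetical helper: wrap a flat Pydantic model's JSON Schema for strict mode."""
    schema = model.model_json_schema()
    # Strict mode rejects schemas that allow extra keys
    schema["additionalProperties"] = False
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


response_format = build_strict_response_format(UserInfo, "user_info")
# Would be passed as: client.chat.completions.create(..., response_format=response_format)
```

Nested models add $defs references that also need additionalProperties set, so for anything beyond a flat model, recent versions of the openai SDK offer a parse helper that accepts a Pydantic model directly.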

Instructor for Validated Outputs

Instructor is one of the most widely used libraries for getting structured outputs from LLMs, and for good reason: it wraps the OpenAI client (and clients for several other providers) with Pydantic validation and automatic retries. When the model returns data that fails validation and retries are enabled through the max_retries argument, Instructor sends the validation error back to the model and asks it to fix the output. Think of it as a patient teacher who keeps handing back the assignment until the student gets it right or the retry budget runs out.

Basic Usage

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional


# Patch the OpenAI client -- this wraps the API with Pydantic validation.
# After this line, you can pass response_model to any completion call.
client = instructor.from_openai(OpenAI())


class UserInfo(BaseModel):
    """Structured user information."""
    name: str = Field(description="Full name of the user")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: Optional[str] = Field(None, description="Email address")
    occupation: str = Field(description="Current occupation")


def extract_user_info(text: str) -> UserInfo:
    """Extract structured user info from text."""
    return client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=UserInfo,
        messages=[
            {
                "role": "user",
                "content": f"Extract user information from: {text}"
            }
        ]
    )


# Usage
text = """
Hi, I'm Sarah Johnson. I'm 28 years old and work as a software 
engineer at a tech startup. You can reach me at sarah.j@email.com
"""

user = extract_user_info(text)
print(f"Name: {user.name}")
print(f"Age: {user.age}")
print(f"Email: {user.email}")
print(f"Occupation: {user.occupation}")
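Because the validation Instructor performs is ordinary Pydantic validation, you can exercise a schema offline before spending any tokens. A minimal sketch, with UserInfo redefined from above so the snippet runs on its own, feeding in JSON that a model might plausibly return:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class UserInfo(BaseModel):
    name: str = Field(description="Full name of the user")
    age: int = Field(ge=0, le=150, description="Age in years")
    email: Optional[str] = Field(None, description="Email address")
    occupation: str = Field(description="Current occupation")


# A well-formed reply: email is optional, so omitting it is fine
ok = UserInfo.model_validate_json(
    '{"name": "Sarah Johnson", "age": 28, "occupation": "software engineer"}'
)

# An out-of-range age is rejected; errors like this are what Instructor
# feeds back to the model when retries are enabled
try:
    UserInfo.model_validate_json('{"name": "X", "age": 200, "occupation": "y"}')
except ValidationError as e:
    print(e.errors()[0]["loc"], e.errors()[0]["msg"])
```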

Complex Nested Structures

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field
from typing import Optional
from enum import Enum


client = instructor.from_openai(OpenAI())


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class SubTask(BaseModel):
    """A subtask within a task."""
    title: str
    estimated_hours: float = Field(ge=0)
    completed: bool = False


class Task(BaseModel):
    """A structured task with subtasks."""
    title: str
    description: str
    priority: Priority
    assignee: Optional[str] = None
    subtasks: list[SubTask] = Field(default_factory=list)
    tags: list[str] = Field(default_factory=list)


class ProjectPlan(BaseModel):
    """Complete project plan."""
    project_name: str
    objective: str
    tasks: list[Task]
    total_estimated_hours: float = Field(ge=0)


def create_project_plan(description: str) -> ProjectPlan:
    """Generate a structured project plan."""
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=ProjectPlan,
        messages=[
            {
                "role": "system",
                "content": "You are a project planning assistant. Create detailed, actionable project plans."
            },
            {
                "role": "user",
                "content": f"Create a project plan for: {description}"
            }
        ]
    )


# Usage
plan = create_project_plan(
    "Build a REST API for a todo application with user authentication"
)

print(f"Project: {plan.project_name}")
print(f"Objective: {plan.objective}")
print(f"Total Hours: {plan.total_estimated_hours}")

for task in plan.tasks:
    print(f"\nTask: {task.title} [{task.priority.value}]")
    for subtask in task.subtasks:
        print(f"  - {subtask.title} ({subtask.estimated_hours}h)")
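Field constraints check values one at a time, but nothing above ties total_estimated_hours to the subtasks. A Pydantic model_validator can enforce that kind of cross-field rule, and Instructor treats its error like any other validation failure. The sketch uses trimmed-down models, and the rule itself (the total must be within 0.5 hours of the subtask sum) is a hypothetical business rule, not part of the plan format above.

```python
from pydantic import BaseModel, Field, ValidationError, model_validator


class SubTask(BaseModel):
    title: str
    estimated_hours: float = Field(ge=0)


class Task(BaseModel):
    title: str
    subtasks: list[SubTask] = Field(default_factory=list)


class ProjectPlan(BaseModel):
    project_name: str
    tasks: list[Task]
    total_estimated_hours: float = Field(ge=0)

    @model_validator(mode="after")
    def total_matches_subtasks(self) -> "ProjectPlan":
        # Hypothetical rule: the stated total must agree with the subtask hours
        computed = sum(s.estimated_hours for t in self.tasks for s in t.subtasks)
        if abs(computed - self.total_estimated_hours) > 0.5:
            raise ValueError(
                f"total_estimated_hours is {self.total_estimated_hours}, "
                f"but subtasks sum to {computed}"
            )
        return self


try:
    ProjectPlan(
        project_name="Todo API",
        tasks=[Task(title="Auth", subtasks=[SubTask(title="Login", estimated_hours=3)])],
        total_estimated_hours=40,
    )
except ValidationError as e:
    print(e)
```

Note that this rule rejects plans whose tasks carry hours but no subtasks; adjust it to match how your plans are actually structured.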

Retry Logic with Validation

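This is where Instructor really shines. You define Pydantic validators that encode your business rules, and when the model violates them, Instructor sends the validation error back as context so the model can self-correct. Combining Instructor's max_retries argument (validation retries) with a tenacity @retry decorator (network retries) gives you a two-layer retry strategy. Scope the tenacity layer to transient API errors such as connection failures and rate limits: an unrestricted @retry also fires when Instructor gives up after exhausting its validation retries, multiplying the number of API calls.
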
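import instructor
from openai import OpenAI, APIConnectionError, APITimeoutError, RateLimitError
from pydantic import BaseModel, Field, field_validator
from tenacity import (
    retry,
    retry_if_exception_type,
    stop_after_attempt,
    wait_exponential,
)


client = instructor.from_openai(OpenAI())


class CodeReview(BaseModel):
    """Structured code review result."""
    summary: str = Field(min_length=10, max_length=500)
    # min_length=1 already rejects an empty list of issues
    issues: list[str] = Field(min_length=1)
    # ge/le already enforce the 1-10 range, so no extra score validator is needed
    score: int = Field(ge=1, le=10)
    suggested_improvements: list[str]
    
    @field_validator("issues")
    @classmethod
    def validate_issues(cls, v: list[str]) -> list[str]:
        # A business rule the field constraints cannot express: no blank entries.
        # On failure, Instructor sends this message back to the model.
        if any(not issue.strip() for issue in v):
            raise ValueError("Every issue must be a non-empty description")
        return v


# Outer layer: retry only transient API failures, with exponential backoff.
# Validation failures are handled by Instructor's max_retries below.
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=1, max=10),
    retry=retry_if_exception_type(
        (APIConnectionError, APITimeoutError, RateLimitError)
    ),
)
def review_code(code: str) -> CodeReview:
    """Review code: Instructor retries validation failures, tenacity retries API errors."""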
    return client.chat.completions.create(
        model="gpt-4o",
        response_model=CodeReview,
        max_retries=3,
        messages=[
            {
                "role": "system",
                "content": "You are a senior code reviewer. Provide detailed, constructive feedback."
            },
            {
                "role": "user",
                "content": f"Review this code:\n\n```python\n{code}\n```"
            }
        ]
    )


# Usage
code = """
def get_user(id):
    query = f"SELECT * FROM users WHERE id = {id}"
    return db.execute(query)
"""

review = review_code(code)
print(f"Score: {review.score}/10")
print(f"Summary: {review.summary}")
print(f"Issues: {review.issues}")
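Instructor hides the validate-and-reask loop, but it helps to see its shape. The sketch below is not Instructor's implementation: call_model is a hypothetical stand-in for an LLM call that returns raw text, and the wording of the reask message is an assumption. It shows why validator messages matter: they become part of the next prompt.

```python
from typing import Callable, TypeVar

from pydantic import BaseModel, Field, ValidationError

T = TypeVar("T", bound=BaseModel)


def extract_with_reask(
    call_model: Callable[[list[dict]], str],  # hypothetical: messages -> raw reply text
    messages: list[dict],
    response_model: type[T],
    max_retries: int = 3,
) -> T:
    """Validate the reply; on failure, show the model its error and ask again."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    messages = list(messages)  # avoid mutating the caller's list
    for attempt in range(max_retries + 1):
        raw = call_model(messages)
        try:
            # Malformed JSON and schema violations both raise ValidationError
            return response_model.model_validate_json(raw)
        except ValidationError as e:
            if attempt == max_retries:
                raise  # retry budget exhausted
            # Reask: keep the bad reply and add the validation error as feedback
            messages.append({"role": "assistant", "content": raw})
            messages.append({
                "role": "user",
                "content": f"Your JSON failed validation:\n{e}\nReturn corrected JSON only.",
            })


class Rating(BaseModel):
    score: int = Field(ge=1, le=10)


# Stubbed model: the first reply is out of range, the second is valid
replies = iter(['{"score": 42}', '{"score": 7}'])
rating = extract_with_reask(
    lambda msgs: next(replies),
    [{"role": "user", "content": "Rate this code from 1 to 10."}],
    Rating,
)
print(rating.score)  # prints 7
```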

Custom Validation Strategies

Sometimes you do not need the model to produce structured output — you need to extract structured data from whatever the model produces. This is common when you are working with models that do not support structured outputs natively (open-source models via Ollama, for example) or when you are post-processing outputs from a pipeline you do not fully control.

Regex-Based Extraction

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedData:
    """Data extracted using regex patterns."""
    emails: list[str]
    phone_numbers: list[str]
    urls: list[str]
    dates: list[str]


class RegexExtractor:
    """Extract structured data using regex patterns."""
    
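    PATTERNS = {
        # [A-Za-z] for the TLD: a "|" inside [...] would match a literal pipe
        "email": r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',
        # (?<!\w) rather than \b, so a leading "+" as in "+1 (800) ..." is captured
        "phone": r'(?<!\w)(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b',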
        "url": r'https?://(?:www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b(?:[-a-zA-Z0-9()@:%_\+.~#?&//=]*)',
        "date": r'\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2},? \d{4}\b',
    }
    
    def extract(self, text: str) -> ExtractedData:
        """Extract all structured data from text."""
        return ExtractedData(
            emails=re.findall(self.PATTERNS["email"], text),
            phone_numbers=re.findall(self.PATTERNS["phone"], text),
            urls=re.findall(self.PATTERNS["url"], text),
            dates=re.findall(self.PATTERNS["date"], text, re.IGNORECASE)
        )
    
    def extract_pattern(self, text: str, pattern_name: str) -> list[str]:
        """Extract specific pattern from text."""
        pattern = self.PATTERNS.get(pattern_name)
        if not pattern:
            raise ValueError(f"Unknown pattern: {pattern_name}")
        return re.findall(pattern, text)


# Usage
extractor = RegexExtractor()

llm_output = """
Contact us at support@example.com or sales@company.org.
Call 555-123-4567 or +1 (800) 555-0199.
Visit https://www.example.com for more info.
Meeting scheduled for 12/25/2024.
"""

data = extractor.extract(llm_output)
print(f"Emails: {data.emails}")
print(f"Phones: {data.phone_numbers}")
print(f"URLs: {data.urls}")
print(f"Dates: {data.dates}")

JSON Extraction and Validation

This is the Swiss Army knife of LLM output parsing. Models love to wrap JSON in markdown code blocks, prefix it with “Here’s the data:”, or add trailing commentary. The extractor below handles the common cases by trying multiple parsing strategies in order. One limitation: the embedded-JSON strategy uses a greedy regex, so trailing prose that itself contains braces or brackets will make that strategy fail.

import json
import re
from typing import Any, Optional, Type, TypeVar
from pydantic import BaseModel, ValidationError


T = TypeVar("T", bound=BaseModel)


class JSONExtractor:
    """Extract and validate JSON from LLM outputs."""
    
    @staticmethod
    def extract_json(text: str) -> Optional[Any]:
        """Extract JSON (usually an object or array) from text, handling various formats.
        
        Strategy: try the simplest approach first, then progressively
        more aggressive extraction. This ordering matters: direct parsing
        is cheapest, so it runs first. An LLM-based repair step, if you
        add one, belongs last because it is the most expensive.
        """
        # Strategy 1: Maybe the model returned clean JSON
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass
        
        # Strategy 2: JSON wrapped in markdown code blocks (very common)
        code_block = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
        if code_block:
            try:
                return json.loads(code_block.group(1).strip())
            except json.JSONDecodeError:
                pass
        
        # Strategy 3: JSON embedded in prose ("Here's the result: {...}")
        json_match = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', text)
        if json_match:
            try:
                return json.loads(json_match.group(1))
            except json.JSONDecodeError:
                pass
        
        return None
    
    @classmethod
    def extract_and_validate(
        cls,
        text: str,
        model: Type[T]
    ) -> tuple[Optional[T], list[str]]:
        """Extract JSON and validate against Pydantic model."""
        errors = []
        
        data = cls.extract_json(text)
        if data is None:
            errors.append("Could not extract JSON from response")
            return None, errors
        
        try:
            validated = model.model_validate(data)
            return validated, []
        except ValidationError as e:
            for error in e.errors():
                field = ".".join(str(loc) for loc in error["loc"])
                errors.append(f"{field}: {error['msg']}")
            return None, errors


# Usage
class ProductInfo(BaseModel):
    name: str
    price: float
    in_stock: bool
    categories: list[str]


llm_output = """
Here's the product information:

```json
{
    "name": "Wireless Headphones",
    "price": 99.99,
    "in_stock": true,
    "categories": ["electronics", "audio"]
}
```
"""

product, errors = JSONExtractor.extract_and_validate(llm_output, ProductInfo)
if product:
    print(f"Product: {product.name} - ${product.price}")
else:
    print(f"Errors: {errors}")
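The greedy regex in strategy 3 fails when trailing prose contains braces. One alternative, sketched here as a hypothetical helper, uses the standard library's json.JSONDecoder.raw_decode, which parses a single JSON value starting at a given index and reports where it ended, ignoring whatever follows. Trying each opening bracket in turn returns the first value that decodes.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one value starting at index and returns
            # (value, end_index), so trailing commentary is simply ignored
            value, _end = decoder.raw_decode(text, index)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON at this bracket; try the next one
    return None


reply = 'Result: {"name": "Widget", "price": 9.5} (prices in {USD})'
print(extract_first_json(reply))
```

Each failed attempt rescans from a new bracket, so the worst case is quadratic in the reply length; for typical LLM replies that cost is negligible.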

Multi-Step Validation Pipeline

For production systems, you often need multiple validation steps chained together: first check that the response is not empty, then extract JSON, then validate required fields, then check field types. This pipeline pattern makes each step composable and independently testable, and when validation fails, the error message tells you exactly which step caught the problem.

from dataclasses import dataclass
from typing import Any, Callable

# validate_json_format below reuses JSONExtractor from the previous section


@dataclass
class ValidationResult:
    """Result of a validation step."""
    valid: bool
    data: Any
    errors: list[str]


class ValidationPipeline:
    """Multi-step validation pipeline for LLM outputs."""
    
    def __init__(self):
        self.steps: list[tuple[str, Callable]] = []
    
    def add_step(
        self,
        name: str,
        validator: Callable[[Any], ValidationResult]
    ) -> "ValidationPipeline":
        """Add a validation step."""
        self.steps.append((name, validator))
        return self
    
    def validate(self, data: Any) -> ValidationResult:
        """Run all validation steps."""
        current_data = data
        all_errors = []
        
        for name, validator in self.steps:
            result = validator(current_data)
            
            if not result.valid:
                all_errors.extend([f"[{name}] {e}" for e in result.errors])
                return ValidationResult(
                    valid=False,
                    data=current_data,
                    errors=all_errors
                )
            
            current_data = result.data
        
        return ValidationResult(
            valid=True,
            data=current_data,
            errors=[]
        )


# Validation functions
def validate_not_empty(data: str) -> ValidationResult:
    if not data or not data.strip():
        return ValidationResult(False, data, ["Response is empty"])
    return ValidationResult(True, data.strip(), [])


def validate_json_format(data: str) -> ValidationResult:
    extracted = JSONExtractor.extract_json(data)
    if extracted is None:
        return ValidationResult(False, data, ["Invalid JSON format"])
    return ValidationResult(True, extracted, [])


def validate_required_fields(required: list[str]):
    def validator(data: dict) -> ValidationResult:
        missing = [f for f in required if f not in data]
        if missing:
            return ValidationResult(
                False, data, [f"Missing required field: {f}" for f in missing]
            )
        return ValidationResult(True, data, [])
    return validator


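def validate_field_types(type_map: dict):
    def validator(data: dict) -> ValidationResult:
        errors = []
        for field, expected_type in type_map.items():
            if field in data and not isinstance(data[field], expected_type):
                # expected_type may be a tuple such as (int, float), which has
                # no __name__, so build a readable name for either case
                if isinstance(expected_type, tuple):
                    type_name = " or ".join(t.__name__ for t in expected_type)
                else:
                    type_name = expected_type.__name__
                errors.append(f"Field '{field}' should be {type_name}")
        if errors:
            return ValidationResult(False, data, errors)
        return ValidationResult(True, data, [])
    return validator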


# Usage
pipeline = ValidationPipeline()
pipeline.add_step("not_empty", validate_not_empty)
pipeline.add_step("json_format", validate_json_format)
pipeline.add_step("required_fields", validate_required_fields(["name", "value"]))
pipeline.add_step("field_types", validate_field_types({"name": str, "value": (int, float)}))

# Test with LLM output
llm_output = '{"name": "temperature", "value": 72.5}'

result = pipeline.validate(llm_output)
if result.valid:
    print(f"Valid data: {result.data}")
else:
    print(f"Validation errors: {result.errors}")
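The hand-written required-field and type checks can also be collapsed into a single Pydantic step. A sketch, assuming the same ValidationResult shape as above (repeated so the snippet runs standalone); validate_with_model and the Reading schema are hypothetical names. One design difference: Pydantic's default lax mode coerces compatible values, so the string "72.5" is accepted as a float, whereas the isinstance check above would reject it.

```python
from dataclasses import dataclass
from typing import Any, Callable

from pydantic import BaseModel, ValidationError


@dataclass
class ValidationResult:
    """Same shape as the pipeline's ValidationResult."""
    valid: bool
    data: Any
    errors: list[str]


def validate_with_model(model: type[BaseModel]) -> Callable[[Any], ValidationResult]:
    """Build a pipeline step that validates a dict against a Pydantic model."""
    def validator(data: Any) -> ValidationResult:
        try:
            # On success, pass the typed model instance on to the next step
            return ValidationResult(True, model.model_validate(data), [])
        except ValidationError as e:
            errors = [
                f"{'.'.join(str(loc) for loc in err['loc'])}: {err['msg']}"
                for err in e.errors()
            ]
            return ValidationResult(False, data, errors)
    return validator


class Reading(BaseModel):
    name: str
    value: float


check = validate_with_model(Reading)
print(check({"name": "temperature", "value": "72.5"}))
```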

LLM-Based Validation

This is the “quis custodiet ipsos custodes” pattern: using an LLM to watch another LLM. It sounds circular, but it works because the validator operates on a simpler, more constrained task (largely a binary judgment) than the generator (open-ended creation). The key insight: validation is usually easier than generation. A model that struggles to write a factually accurate summary can often judge whether a given summary is accurate more reliably, especially when it is given the source material to check against. Without that context, the validator falls back on its own knowledge, which shares the generator's blind spots.

from openai import OpenAI
import json


class LLMValidator:
    """Use LLM to validate outputs."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
    
    def validate_factual(
        self,
        claim: str,
        context: str = ""
    ) -> dict:
        """Check if a claim is factually accurate."""
        prompt = f"""Evaluate the factual accuracy of this claim:

Claim: {claim}

{f"Context: {context}" if context else ""}

Respond with JSON:
{{
    "is_accurate": true/false,
    "confidence": 0.0-1.0,
    "reasoning": "explanation",
    "corrections": ["list of corrections if inaccurate"]
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def validate_consistency(
        self,
        statements: list[str]
    ) -> dict:
        """Check if statements are consistent with each other."""
        prompt = f"""Check these statements for logical consistency:

Statements:
{chr(10).join(f"{i+1}. {s}" for i, s in enumerate(statements))}

Respond with JSON:
{{
    "is_consistent": true/false,
    "contradictions": [
        {{"statement_1": index, "statement_2": index, "explanation": "..."}}
    ],
    "overall_assessment": "summary"
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def validate_format(
        self,
        output: str,
        expected_format: str
    ) -> dict:
        """Validate output matches expected format."""
        prompt = f"""Check if this output matches the expected format:

Output:
{output}

Expected format:
{expected_format}

Respond with JSON:
{{
    "matches_format": true/false,
    "issues": ["list of format issues"],
    "suggested_fix": "corrected version if needed"
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def validate_safety(
        self,
        content: str
    ) -> dict:
        """Check content for safety issues."""
        prompt = f"""Analyze this content for safety issues:

Content:
{content}

Check for:
1. Harmful or dangerous instructions
2. Personal information exposure
3. Inappropriate content
4. Potential misuse

Respond with JSON:
{{
    "is_safe": true/false,
    "issues": [
        {{"type": "category", "severity": "low/medium/high", "description": "..."}}
    ],
    "recommendation": "action to take"
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)


# Usage
validator = LLMValidator()

# Check factual accuracy
result = validator.validate_factual(
    "Python was created by Guido van Rossum in 1989."
)
print(f"Accurate: {result['is_accurate']}")
print(f"Confidence: {result['confidence']}")

# Check consistency
statements = [
    "The meeting is at 3 PM.",
    "Everyone should arrive by 2:30 PM.",
    "The meeting was rescheduled to 4 PM."
]
result = validator.validate_consistency(statements)
print(f"Consistent: {result['is_consistent']}")

Fallback Parsing Strategies

Real-world LLM outputs are messy. Sometimes the model returns clean JSON. Sometimes it wraps it in a code block. Sometimes it returns key-value pairs. And sometimes it returns something entirely unexpected. The fallback parser below tries each parsing strategy in order from cheapest to most expensive, and stops at the first success. The final fallback — calling another LLM to fix the output — is expensive but remarkably effective as a last resort.
from typing import Any, Optional, Callable
from dataclasses import dataclass
import json


@dataclass
class ParseResult:
    """Result of parsing attempt."""
    success: bool
    data: Any
    method: str
    error: Optional[str] = None


class FallbackParser:
    """Try multiple parsing strategies with fallbacks."""
    
    def __init__(self):
        self.parsers: list[tuple[str, Callable]] = []
    
    def add_parser(
        self,
        name: str,
        parser: Callable[[str], Any]
    ) -> "FallbackParser":
        """Add a parser to the fallback chain."""
        self.parsers.append((name, parser))
        return self
    
    def parse(self, text: str) -> ParseResult:
        """Try parsers in order until one succeeds."""
        failures = []
        for name, parser in self.parsers:
            try:
                result = parser(text)
                return ParseResult(
                    success=True,
                    data=result,
                    method=name
                )
            except Exception as e:
                # Record why each parser failed instead of silently discarding it
                failures.append(f"{name}: {e}")
        
        return ParseResult(
            success=False,
            data=None,
            method="none",
            error="All parsers failed: " + "; ".join(failures)
        )


# Parser functions
def parse_direct_json(text: str) -> dict:
    return json.loads(text)


def parse_code_block_json(text: str) -> dict:
    import re
    match = re.search(r'```(?:json)?\s*([\s\S]*?)```', text)
    if match:
        return json.loads(match.group(1))
    raise ValueError("No code block found")


def parse_embedded_json(text: str) -> dict:
    import re
    match = re.search(r'(\{[\s\S]*\})', text)
    if match:
        return json.loads(match.group(1))
    raise ValueError("No JSON object found")


def parse_key_value(text: str) -> dict:
    """Parse key: value format."""
    import re
    result = {}
    for line in text.split("\n"):
        match = re.match(r'^\s*["\']?(\w+)["\']?\s*[:=]\s*(.+)$', line)
        if match:
            key = match.group(1)
            value = match.group(2).strip().strip('"\'')
            # Try to convert to appropriate type
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                pass
            result[key] = value
    if not result:
        raise ValueError("No key-value pairs found")
    return result


def parse_with_llm(client, model: str = "gpt-4o-mini"):
    """Create an LLM-based parser as last resort."""
    def parser(text: str) -> dict:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": "Extract structured data as JSON from the given text."
                },
                {"role": "user", "content": text}
            ],
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)
    return parser


# Usage
from openai import OpenAI

client = OpenAI()

parser = FallbackParser()
parser.add_parser("direct_json", parse_direct_json)
parser.add_parser("code_block", parse_code_block_json)
parser.add_parser("embedded", parse_embedded_json)
parser.add_parser("key_value", parse_key_value)
parser.add_parser("llm", parse_with_llm(client))

# Test with various formats
outputs = [
    '{"name": "test", "value": 42}',
    'Here is the data:\n```json\n{"name": "test"}\n```',
    'name: test\nvalue: 42',
    'The result is name=test and value=42',
]

for output in outputs:
    result = parser.parse(output)
    print(f"Method: {result.method}, Data: {result.data}")
Validation Best Practices
  • Always validate LLM outputs before using them
  • Use Pydantic for type-safe structured extraction
  • Implement fallback strategies for robustness
  • Consider LLM-based validation for complex checks
  • Log validation failures for debugging and improvement

Validation Edge Cases

Edge case: partial JSON from streaming. When streaming responses, the model sends tokens incrementally, so validating mid-stream will fail on incomplete JSON. Buffer the full response before validating, or use an incremental JSON parser such as ijson that processes JSON as it arrives.

Edge case: Unicode and smart quotes. Unicode escape sequences such as \u00e9 are valid JSON, and json.loads decodes them to the actual character, so they need no special handling. Curly double quotes (“ and ”) used in place of straight quotes do break strict JSON parsers. If a strict parse fails, replacing curly double quotes with straight ones and retrying is a reasonable repair step; to compare extracted strings reliably, normalize them with unicodedata.normalize("NFC", text).

Edge case: the model refuses or gets cut off. Content policy violations can produce a refusal message instead of your expected JSON. Check finish_reason before parsing: "content_filter" means the content was filtered, and "length" means the response hit the token limit, so the JSON is probably truncated. With OpenAI structured outputs, refusals are reported in the message's refusal field. Handle these cases explicitly rather than passing a refusal message to your JSON parser.

Edge case: null vs. missing fields. There is a meaningful difference between {"email": null} and {} (no email key). A field declared as Optional[str] = None ends up None in both cases, but your business logic might care. Pydantic's model_fields_set records which fields were actually present in the input, which lets you distinguish None (user has no email) from missing (model forgot to extract it). Use model_config = ConfigDict(extra="forbid") to catch unexpected fields.

Edge case: numeric precision. A price like 19.99 parsed into a Python float is stored as a binary approximation, and values can also arrive with artifacts such as 19.990000000000002. Float arithmetic compounds these errors (0.1 + 0.2 evaluates to 0.30000000000000004). If you are doing financial calculations on extracted values, declare the field as Decimal and round explicitly in your Pydantic validator rather than trusting float precision.
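The last two edge cases can both be handled in the schema. A sketch with a hypothetical ExtractedOrder model: model_fields_set separates an explicit null from a missing key, extra="forbid" rejects unexpected keys, and a Decimal field quantized to cents avoids float arithmetic on money. Rounding to two places with ROUND_HALF_UP is an assumption; use whatever your currency and accounting rules require.

```python
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class ExtractedOrder(BaseModel):
    """Hypothetical extraction target."""
    model_config = ConfigDict(extra="forbid")  # unexpected keys are rejected

    customer_email: Optional[str] = None
    price: Decimal

    @field_validator("price")
    @classmethod
    def round_price(cls, v: Decimal) -> Decimal:
        # Round explicitly to cents instead of trusting the incoming precision
        return v.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def email_status(order: ExtractedOrder) -> str:
    """Tell an explicit null apart from a key the model never produced."""
    if "customer_email" not in order.model_fields_set:
        return "missing"
    if order.customer_email is None:
        return "null"
    return "present"


order = ExtractedOrder.model_validate(
    {"customer_email": None, "price": 19.990000000000002}
)
print(email_status(order), order.price)

try:
    ExtractedOrder.model_validate({"price": 5, "discount": 0.1})
except ValidationError as e:
    print(e.errors()[0]["type"])  # the unexpected "discount" key is rejected
```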

Practice Exercise

Build a validation system that:
  1. Extracts structured data from free-form LLM responses
  2. Validates against Pydantic schemas with custom validators
  3. Implements multiple fallback parsing strategies
  4. Uses LLM-based validation for complex checks
  5. Provides detailed error messages for failures
Focus on:
  • Handling edge cases and malformed outputs
  • Performance optimization for high-volume validation
  • Comprehensive logging of validation failures
  • Automatic retry and correction strategies