Summarization is a core AI capability used across applications from meeting notes to content curation. It sounds simple — “make this shorter” — but production summarization requires solving hard trade-offs: how much detail to preserve, how to handle documents longer than the context window, and how to avoid the model quietly inventing facts that weren’t in the source. The two fundamental approaches are extractive (pull out the most important sentences verbatim) and abstractive (generate new text that captures the key ideas). Extractive is safer for legal and compliance contexts; abstractive produces more readable summaries. Most production systems combine both.

Summarization Strategy Comparison

| Strategy | Faithfulness | Readability | Speed | Best For |
| --- | --- | --- | --- | --- |
| Extractive | Highest (verbatim) | Lower (choppy) | Fast | Legal, compliance, audit trails |
| Abstractive | Medium (hallucination risk) | Highest (natural prose) | Medium | Newsletters, executive briefs |
| Hybrid (extract then rephrase) | High | High | Slower | Most production use cases |
| Map-Reduce | Medium (cross-chunk loss) | High | Slow (multiple LLM calls) | Documents exceeding context window |
| Hierarchical | High (preserves structure) | High | Slowest | Structured reports, books, legal filings |

Decision framework:
  • Document fits in context window (under 100K tokens)? Use a single abstractive call — simpler, faster, and the model sees everything.
  • Document exceeds context window? Use map-reduce with 10-15% chunk overlap to prevent info loss at boundaries.
  • Accuracy is critical (legal, medical)? Use extractive or hybrid. Traceable quotes are worth the readability trade-off.
  • Multiple document types? Use content-specific prompts (news, research, meeting, code). Generic “summarize this” produces generic results.
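
The decision framework above can be written down as a small routing function. This is a minimal sketch, not a definitive policy: the function name `choose_strategy`, its parameters, and the returned labels are hypothetical, and the 100K default simply mirrors the threshold quoted in the list.

```python
def choose_strategy(
    token_count: int,
    context_window: int = 100_000,
    accuracy_critical: bool = False,
) -> str:
    """Pick a summarization strategy following the decision framework.

    The 100K default mirrors the threshold in the text; adjust it to the
    model you actually use. The returned labels are illustrative names.
    """
    if token_count > context_window:
        # Too long for one call: chunk with overlap, then combine.
        return "map_reduce"
    if accuracy_critical:
        # Legal or medical content: prefer traceable quotes over smooth prose.
        return "extractive_or_hybrid"
    # Fits in one call and readability matters most.
    return "single_abstractive"


print(choose_strategy(5_000))
print(choose_strategy(250_000))
print(choose_strategy(5_000, accuracy_critical=True))
```

Length is checked first because a document that does not fit in the window needs chunking regardless of how strict the accuracy requirement is.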

Basic Summarization

Simple Text Summarization

from openai import OpenAI


def summarize_text(
    text: str,
    max_length: str = "medium",
    style: str = "informative"
) -> str:
    """Summarize text with configurable length and style."""
    client = OpenAI()
    
    length_instructions = {
        "brief": "in 1-2 sentences",
        "medium": "in a short paragraph (3-5 sentences)",
        "detailed": "in 2-3 paragraphs with key details"
    }
    
    style_instructions = {
        "informative": "Focus on key facts and information",
        "executive": "Focus on decisions, actions, and business implications",
        "casual": "Use a conversational, easy-to-read tone",
        "technical": "Preserve technical details and terminology"
    }
    
    prompt = f"""Summarize the following text {length_instructions.get(max_length, "in a paragraph")}.

{style_instructions.get(style, "")}

Text:
{text}

Summary:"""
    
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.choices[0].message.content


# Usage
article = """
Artificial intelligence has made remarkable strides in recent years, 
particularly in the field of natural language processing. Large language 
models can now write code, analyze documents, and engage in nuanced 
conversations. These advances have led to widespread adoption in enterprise 
settings, with companies using AI for customer service, content creation, 
and data analysis. However, concerns about accuracy, bias, and job 
displacement continue to spark debate among policymakers and technologists.
"""

summary = summarize_text(article, max_length="brief", style="executive")
print(summary)

Extractive vs Abstractive Summarization

from openai import OpenAI
import json


class Summarizer:
    """Multi-mode summarization system."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
    
    def extractive_summary(
        self,
        text: str,
        num_sentences: int = 3
    ) -> list[str]:
        """Extract the most important sentences verbatim from text.
        
        Why extractive? When accuracy matters more than readability (legal 
        documents, compliance reports, quotes), you want the original words, 
        not the model's paraphrase. Extractive summaries can be verified 
        against the source with a simple string search.
        """
        prompt = f"""Identify the {num_sentences} most important sentences from this text.
Return them exactly as they appear, preserving the original wording.

Text:
{text}

Return as JSON: {{"sentences": ["sentence 1", "sentence 2", ...]}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        data = json.loads(response.choices[0].message.content)
        return data.get("sentences", [])
    
    def abstractive_summary(
        self,
        text: str,
        target_length: int = 100
    ) -> str:
        """Generate a new summary capturing key ideas."""
        prompt = f"""Write a summary of approximately {target_length} words.
Capture the main ideas in your own words. Be concise and clear.

Text:
{text}

Summary:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
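    def hybrid_summary(
        self,
        text: str,
        extract_count: int = 3,
        abstract_length: int = 100
    ) -> dict:
        """Combine extractive and abstractive summarization.

        Both passes run independently over the full text. For a strict
        extract-then-rephrase pipeline, pass the joined key sentences to
        abstractive_summary instead of the original text.
        """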
        # Get key sentences
        key_sentences = self.extractive_summary(text, extract_count)
        
        # Generate abstract summary
        abstract = self.abstractive_summary(text, abstract_length)
        
        return {
            "key_sentences": key_sentences,
            "summary": abstract,
            "method": "hybrid"
        }


# Usage
summarizer = Summarizer()

text = """
The global transition to renewable energy is accelerating faster than many 
analysts predicted. Solar and wind power now account for a significant 
portion of new electricity generation capacity worldwide. Investment in 
clean energy reached record levels last year, surpassing fossil fuel 
investments for the first time. Major automakers have announced ambitious 
plans to phase out internal combustion engines within the next decade. 
Despite this progress, challenges remain, including grid infrastructure 
upgrades, energy storage solutions, and ensuring a just transition for 
workers in traditional energy sectors.
"""

# Extractive
sentences = summarizer.extractive_summary(text, num_sentences=2)
print("Key sentences:")
for s in sentences:
    print(f"  - {s}")

# Abstractive
summary = summarizer.abstractive_summary(text, target_length=50)
print(f"\nAbstract summary:\n{summary}")
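
The `extractive_summary` docstring says extractive output can be verified against the source with a simple string search. A minimal sketch of that check follows. The helper names `_normalize` and `verify_extracted` are hypothetical, and whitespace is collapsed on the assumption that the model may return a sentence without the source's line wrapping.

```python
import re


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def verify_extracted(sentences: list[str], source: str) -> dict[str, bool]:
    """Map each extracted sentence to whether it appears verbatim in source."""
    normalized_source = _normalize(source)
    return {s: _normalize(s) in normalized_source for s in sentences}


source = """Investment in
clean energy reached record levels last year. Challenges remain."""

checks = verify_extracted(
    [
        "Investment in clean energy reached record levels last year.",
        "Clean energy is now cheaper than coal.",  # not in the source
    ],
    source,
)
for sentence, found in checks.items():
    print(f"{'OK     ' if found else 'MISSING'} {sentence}")
```

Any sentence flagged as missing was paraphrased or invented by the model and should not be presented as a quote.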

Long Document Summarization

When a document exceeds the model’s context window (or is large enough that stuffing it all in produces poor results), you need a chunking strategy. The map-reduce pattern is borrowed from distributed computing: split the problem into independent pieces (map), solve each piece, then combine the results (reduce). It’s the same idea as dividing a 500-page book among 10 interns, having each write a chapter summary, then writing an executive summary from those summaries.

Map-Reduce Pattern

from openai import OpenAI
from dataclasses import dataclass


@dataclass
class DocumentChunk:
    """A chunk of a document."""
    index: int
    text: str
    summary: str = None


class LongDocumentSummarizer:
    """Summarize long documents using map-reduce."""
    
    def __init__(
        self,
        chunk_size: int = 3000,
        model: str = "gpt-4o-mini"
    ):
        self.client = OpenAI()
        self.chunk_size = chunk_size
        self.model = model
    
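    def _chunk_text(self, text: str) -> list[DocumentChunk]:
        """Split text into chunks of roughly chunk_size characters.

        Splits on whitespace so words stay intact. Chunks do not overlap.
        """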
        words = text.split()
        chunks = []
        current_chunk = []
        current_length = 0
        chunk_index = 0
        
        for word in words:
            current_chunk.append(word)
            current_length += len(word) + 1
            
            if current_length >= self.chunk_size:
                chunks.append(DocumentChunk(
                    index=chunk_index,
                    text=" ".join(current_chunk)
                ))
                current_chunk = []
                current_length = 0
                chunk_index += 1
        
        if current_chunk:
            chunks.append(DocumentChunk(
                index=chunk_index,
                text=" ".join(current_chunk)
            ))
        
        return chunks
    
    def _summarize_chunk(self, chunk: DocumentChunk) -> str:
        """Summarize a single chunk (the 'map' step).
        
        Pitfall: Each chunk is summarized in isolation, so context that spans
        chunk boundaries can be lost. Mitigation: use overlapping chunks or
        include a brief description of the document topic in the prompt.
        """
        prompt = f"""Summarize this section of a document.
Capture all key points and important details.

Section {chunk.index + 1}:
{chunk.text}

Summary:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
    def _combine_summaries(
        self,
        summaries: list[str],
        final_length: str = "medium"
    ) -> str:
        """Combine chunk summaries into final summary."""
        combined = "\n\n".join([
            f"Section {i+1}: {s}" for i, s in enumerate(summaries)
        ])
        
        length_guide = {
            "brief": "2-3 sentences",
            "medium": "1-2 paragraphs",
            "detailed": "3-4 paragraphs with key details"
        }
        
        prompt = f"""Create a cohesive summary from these section summaries.
Target length: {length_guide.get(final_length, "1-2 paragraphs")}

Section summaries:
{combined}

Final summary:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
    def summarize(
        self,
        document: str,
        final_length: str = "medium"
    ) -> dict:
        """Summarize a long document."""
        # Map: chunk and summarize each
        chunks = self._chunk_text(document)
        
        for chunk in chunks:
            chunk.summary = self._summarize_chunk(chunk)
        
        # Reduce: combine summaries
        chunk_summaries = [c.summary for c in chunks]
        final_summary = self._combine_summaries(chunk_summaries, final_length)
        
        return {
            "final_summary": final_summary,
            "chunk_count": len(chunks),
            "chunk_summaries": chunk_summaries
        }


# Usage
summarizer = LongDocumentSummarizer(chunk_size=2000)

# Long document (imagine this is much longer)
document = """[Long document text here...]"""

result = summarizer.summarize(document, final_length="medium")
print(f"Processed {result['chunk_count']} chunks")
print(f"\nFinal Summary:\n{result['final_summary']}")
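
The `_summarize_chunk` docstring suggests overlapping chunks as a mitigation, but `_chunk_text` above produces non-overlapping chunks. The following is a minimal sketch of an overlapping splitter. The function `chunk_with_overlap` and its parameters are hypothetical, and it counts words rather than characters or tokens to keep the example dependency-free.

```python
def chunk_with_overlap(
    text: str,
    chunk_words: int = 500,
    overlap_ratio: float = 0.1,
) -> list[str]:
    """Split text into word-based chunks that share words at the boundaries."""
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    words = text.split()
    overlap = int(chunk_words * overlap_ratio)
    step = chunk_words - overlap  # how far the window advances each time

    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(25))
for i, chunk in enumerate(chunk_with_overlap(sample, chunk_words=10, overlap_ratio=0.2)):
    print(i, chunk)
```

With a 10-word window and 20% overlap, each chunk starts with the last two words of the previous one, so a sentence that straddles a boundary is seen whole by at least one chunk summarizer.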

Hierarchical Summarization

from openai import OpenAI
from dataclasses import dataclass, field


@dataclass
class Section:
    """A document section with hierarchy."""
    title: str
    content: str
    level: int
    children: list = field(default_factory=list)
    summary: str = None


class HierarchicalSummarizer:
    """Summarize documents with hierarchical structure."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
    
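    def parse_sections(self, document: str) -> list[Section]:
        """Parse document into sections using LLM.

        Note: only the first 4,000 characters are sent to the model (see the
        slice in the prompt), so sections beyond that point are not detected.
        """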
        prompt = f"""Analyze this document and identify its sections.
For each section, identify the title, content, and hierarchy level.

Document:
{document[:4000]}...

Return as JSON:
{{
    "sections": [
        {{"title": "Section Title", "content": "section content...", "level": 1}}
    ]
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        import json
        data = json.loads(response.choices[0].message.content)
        
        return [
            Section(
                title=s["title"],
                content=s["content"],
                level=s["level"]
            )
            for s in data.get("sections", [])
        ]
    
    def summarize_section(
        self,
        section: Section,
        child_summaries: list[str] = None
    ) -> str:
        """Summarize a section, incorporating child summaries if present."""
        context = ""
        if child_summaries:
            context = "\n\nSubsection summaries:\n" + "\n".join(child_summaries)
        
        prompt = f"""Summarize this section of a document.
{context}

Section: {section.title}
Content: {section.content}

Summary:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}]
        )
        
        return response.choices[0].message.content
    
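    def summarize_document(self, document: str) -> dict:
        """Create hierarchical summary of document."""
        sections = self.parse_sections(document)
        
        # parse_sections returns a flat list, so attach each section to the
        # nearest preceding section with a smaller level number (its parent).
        stack: list[Section] = []
        for section in sections:
            while stack and stack[-1].level >= section.level:
                stack.pop()
            if stack:
                stack[-1].children.append(section)
            stack.append(section)
        
        # Summarize from leaves up: children follow their parent in document
        # order, so iterating in reverse summarizes children first.
        for section in reversed(sections):
            child_summaries = [c.summary for c in section.children if c.summary]
            section.summary = self.summarize_section(section, child_summaries)
        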
        # Create executive summary from top-level sections
        top_level = [s for s in sections if s.level == 1]
        
        exec_prompt = f"""Create an executive summary from these section summaries:

{chr(10).join(f"- {s.title}: {s.summary}" for s in top_level)}

Executive Summary:"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": exec_prompt}]
        )
        
        return {
            "executive_summary": response.choices[0].message.content,
            "sections": [
                {"title": s.title, "summary": s.summary}
                for s in sections
            ]
        }

Meeting Summarization

Meeting summarization is one of the highest-ROI applications of LLMs. A 30-minute meeting produces a transcript that nobody reads. A structured summary with decisions, action items, and owners turns that dead text into something actionable. The key insight: structure the output as a JSON schema so downstream systems can automatically create tickets, calendar events, and follow-up emails.

from openai import OpenAI
from dataclasses import dataclass
import json


@dataclass
class MeetingSummary:
    """Structured meeting summary -- designed to be actionable, not just informative."""
    title: str
    date: str
    attendees: list[str]
    summary: str
    key_points: list[str]
    decisions: list[str]            # What was decided (the outcomes)
    action_items: list[dict]        # Who does what by when (the commitments)
    next_steps: list[str]           # What happens after this meeting


class MeetingSummarizer:
    """Summarize meeting transcripts."""
    
    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
    
    def summarize(self, transcript: str, meeting_info: dict = None) -> MeetingSummary:
        """Create a comprehensive meeting summary."""
        meeting_info = meeting_info or {}
        
        prompt = f"""Analyze this meeting transcript and create a comprehensive summary.

Meeting Info:
- Title: {meeting_info.get('title', 'Unknown')}
- Date: {meeting_info.get('date', 'Unknown')}
- Attendees: {', '.join(meeting_info.get('attendees', ['Unknown']))}

Transcript:
{transcript}

Provide a detailed analysis as JSON:
{{
    "summary": "2-3 paragraph summary of the meeting",
    "key_points": ["main points discussed"],
    "decisions": ["decisions that were made"],
    "action_items": [
        {{"task": "description", "assignee": "person", "due_date": "if mentioned"}}
    ],
    "next_steps": ["follow-up items and next steps"],
    "topics_discussed": ["list of topics covered"]
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        data = json.loads(response.choices[0].message.content)
        
        return MeetingSummary(
            title=meeting_info.get("title", "Untitled Meeting"),
            date=meeting_info.get("date", "Unknown"),
            attendees=meeting_info.get("attendees", []),
            summary=data.get("summary", ""),
            key_points=data.get("key_points", []),
            decisions=data.get("decisions", []),
            action_items=data.get("action_items", []),
            next_steps=data.get("next_steps", [])
        )
    
    def format_as_markdown(self, summary: MeetingSummary) -> str:
        """Format meeting summary as markdown."""
        lines = [
            f"# {summary.title}",
            f"**Date:** {summary.date}",
            f"**Attendees:** {', '.join(summary.attendees)}",
            "",
            "## Summary",
            summary.summary,
            "",
            "## Key Points",
        ]
        
        for point in summary.key_points:
            lines.append(f"- {point}")
        
        lines.extend(["", "## Decisions"])
        for decision in summary.decisions:
            lines.append(f"- {decision}")
        
        lines.extend(["", "## Action Items"])
        for item in summary.action_items:
            assignee = item.get("assignee", "Unassigned")
            due = f" (Due: {item['due_date']})" if item.get("due_date") else ""
            # Use .get so an item without a "task" key does not raise KeyError.
            lines.append(f"- [ ] **{assignee}**: {item.get('task', '')}{due}")
        
        lines.extend(["", "## Next Steps"])
        for step in summary.next_steps:
            lines.append(f"- {step}")
        
        return "\n".join(lines)


# Usage
summarizer = MeetingSummarizer()

transcript = """
Alice: Good morning everyone. Let's start with the Q3 planning meeting.
Bob: Thanks Alice. I wanted to discuss the new feature rollout timeline.
Alice: Sure. We need to finalize the launch date for the mobile app update.
Bob: I propose we aim for October 15th. That gives us three weeks for testing.
Carol: I agree with that timeline. My team can handle the QA by then.
Alice: Great, so we're agreed on October 15th. Bob, can you update the roadmap?
Bob: Will do. I'll have that ready by end of day tomorrow.
Carol: I'll also need design specs from David's team by next Monday.
Alice: David, can you make that happen?
David: Yes, I'll prioritize it. We'll have the designs ready by Monday morning.
Alice: Perfect. Any other items before we wrap up?
Bob: Just a reminder about the stakeholder meeting next Thursday.
Alice: Thanks Bob. Let's adjourn and reconvene next week for a progress check.
"""

summary = summarizer.summarize(
    transcript,
    {
        "title": "Q3 Planning Meeting",
        "date": "September 25, 2024",
        "attendees": ["Alice", "Bob", "Carol", "David"]
    }
)

markdown = summarizer.format_as_markdown(summary)
print(markdown)

Streaming Summarization

from openai import OpenAI


def stream_summary(
    text: str,
    max_length: int = 200
) -> str:
    """Stream a summary as it's generated."""
    client = OpenAI()
    
    prompt = f"""Summarize this text in approximately {max_length} words:

{text}

Summary:"""
    
    stream = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        stream=True
    )
    
    full_response = ""
    
    for chunk in stream:
        # Skip chunks that carry no choices or an empty delta.
        if chunk.choices and chunk.choices[0].delta.content:
            content = chunk.choices[0].delta.content
            print(content, end="", flush=True)
            full_response += content
    
    print()  # Newline at end
    return full_response


class ProgressiveSummarizer:
    """Generate summaries at multiple detail levels progressively."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
    
    def summarize_progressive(
        self,
        text: str,
        levels: list[str] = None
    ):
        """Generate summaries at increasing detail levels."""
        levels = levels or ["one-sentence", "paragraph", "detailed"]
        
        level_prompts = {
            "one-sentence": "Summarize in exactly one sentence.",
            "paragraph": "Summarize in one paragraph (3-5 sentences).",
            "detailed": "Provide a detailed summary with key points (2-3 paragraphs)."
        }
        
        for level in levels:
            instruction = level_prompts.get(level, level)
            
            print(f"\n--- {level.upper()} ---")
            
            stream = self.client.chat.completions.create(
                model=self.model,
                messages=[{
                    "role": "user",
                    "content": f"{instruction}\n\nText:\n{text}"
                }],
                stream=True
            )
            
            for chunk in stream:
                # Skip chunks that carry no choices or an empty delta.
                if chunk.choices and chunk.choices[0].delta.content:
                    print(chunk.choices[0].delta.content, end="", flush=True)
            
            print("\n")
            yield level


# Usage
summarizer = ProgressiveSummarizer()

text = """[Your long text here]"""

# Generate progressive summaries
for level in summarizer.summarize_progressive(text):
    pass  # Summaries are printed as they stream

Content-Specific Summarization

from openai import OpenAI
import json


class ContentSummarizer:
    """Summarize different types of content with specialized prompts."""
    
    def __init__(self, model: str = "gpt-4o-mini"):
        self.client = OpenAI()
        self.model = model
    
    def summarize_news(self, article: str) -> dict:
        """Summarize a news article."""
        prompt = f"""Summarize this news article:

{article}

Provide as JSON:
{{
    "headline": "suggested headline",
    "summary": "2-3 sentence summary",
    "key_facts": ["list of key facts"],
    "who": "key people/organizations involved",
    "what": "what happened",
    "when": "when it happened",
    "where": "where it happened",
    "why": "why it matters"
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def summarize_research(self, paper: str) -> dict:
        """Summarize a research paper or technical document."""
        prompt = f"""Summarize this research document:

{paper}

Provide as JSON:
{{
    "title": "paper title if identifiable",
    "abstract": "brief abstract/overview",
    "problem": "problem being addressed",
    "methodology": "approach/methods used",
    "findings": ["key findings"],
    "contributions": ["main contributions"],
    "limitations": ["noted limitations"],
    "future_work": ["suggested future directions"],
    "practical_implications": "real-world applications"
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def summarize_email_thread(self, emails: list[dict]) -> dict:
        """Summarize an email thread."""
        thread_text = "\n\n".join([
            f"From: {e['from']}\nTo: {e['to']}\nDate: {e['date']}\n\n{e['body']}"
            for e in emails
        ])
        
        prompt = f"""Summarize this email thread:

{thread_text}

Provide as JSON:
{{
    "subject": "thread subject",
    "summary": "brief summary of the discussion",
    "participants": ["list of participants"],
    "key_points": ["main points discussed"],
    "requests": ["any requests made"],
    "decisions": ["any decisions reached"],
    "pending_items": ["items still pending"],
    "sentiment": "overall tone of the thread"
}}"""
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)
    
    def summarize_code(self, code: str, language: str = "python") -> dict:
        """Summarize code functionality."""
        prompt = f"""Summarize this {language} code:

```{language}
{code}
```

Provide as JSON:
{{
    "purpose": "what the code does",
    "summary": "brief functional summary",
    "main_components": ["key functions/classes"],
    "inputs": ["expected inputs"],
    "outputs": ["expected outputs"],
    "dependencies": ["external dependencies used"],
    "complexity": "simple/moderate/complex",
    "potential_issues": ["any noted concerns"]
}}""" 
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"}
        )
        
        return json.loads(response.choices[0].message.content)


# Usage
summarizer = ContentSummarizer()

# News article
news_summary = summarizer.summarize_news("...")
print(f"Headline: {news_summary['headline']}")
print(f"Summary: {news_summary['summary']}")

# Research paper
research_summary = summarizer.summarize_research("...")
print(f"Problem: {research_summary['problem']}")
print(f"Findings: {research_summary['findings']}")

Summarization Best Practices
  • Choose extractive when accuracy and verifiability matter (legal, medical, compliance). The exact words from the source can be traced back.
  • Use abstractive when readability matters more (newsletters, executive briefs). The model synthesizes ideas into flowing prose.
  • Apply map-reduce for documents exceeding context limits. But beware: information that spans chunk boundaries can be lost. Use overlapping chunks to mitigate.
  • Tailor prompts to content type — a news summary needs who/what/when/where; a research summary needs methodology/findings/limitations. Generic “summarize this” prompts produce generic results.
  • Validate summaries against source — LLMs can hallucinate facts during summarization. For high-stakes use cases, programmatically check that key claims appear in the original text.
  • Pitfall: Models tend to over-represent information at the beginning and end of long documents (primacy/recency bias). Map-reduce mitigates this by giving each section equal treatment.
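
One cheap way to start on the "validate summaries against source" practice is to check that every number in the summary also occurs in the source, since invented figures are easy to detect mechanically. This is a rough sketch under that assumption. The helper `unsupported_numbers` is hypothetical, and it only catches numeric mismatches, not paraphrased or fabricated claims in general.

```python
import re

# Digits with optional decimal or thousands separators, e.g. 12, 4.5, 1,200.
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(summary: str, source: str) -> list[str]:
    """Return numbers that occur in the summary but nowhere in the source."""
    source_numbers = set(_NUMBER.findall(source))
    return [n for n in _NUMBER.findall(summary) if n not in source_numbers]


source = "Revenue grew 12% to 4.5 million in 2023."
summary = "Revenue grew 15% to 4.5 million in 2023."

flagged = unsupported_numbers(summary, source)
if flagged:
    print("Numbers not found in source:", flagged)
```

A non-empty result is a signal to reject the summary or route it for review. An empty result does not prove the summary is faithful.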

Practice Exercise

Build a summarization service that:
  1. Handles multiple content types (articles, meetings, code)
  2. Supports variable length outputs
  3. Preserves key information and accuracy
  4. Provides structured output with metadata
  5. Scales to long documents using chunking
Focus on:
  • Accuracy preservation in summaries
  • Appropriate level of detail
  • Coherent multi-chunk summarization
  • Specialized handling for different content types

Interview Deep-Dive

Strong Answer:
  • Map-reduce for summarization works like this: split the document into chunks that fit the context window, summarize each chunk independently (the “map” step), then combine all chunk summaries into a final coherent summary (the “reduce” step). It is the same distributed computing concept applied to text processing — break the problem into independent subproblems, solve each, then merge results.
  • The fundamental information loss happens at chunk boundaries. If a paragraph that starts on chunk 3 and concludes on chunk 4 contains a key argument, neither chunk summarizer sees the complete argument. Chunk 3’s summary captures the premise, chunk 4’s summary captures the conclusion, but the logical connection between them is lost. The reduce step cannot reconstruct this because it only sees the chunk summaries, not the original text.
  • Three mitigation strategies I have used in production. First, overlapping chunks: include 100-200 tokens of overlap between adjacent chunks so boundary content appears in both chunks. The trade-off is increased token cost (the overlap is sent to the model and summarized twice) and potential duplication in the chunk summaries. Second, semantic chunking: instead of splitting at fixed character counts, split at natural semantic boundaries such as paragraph breaks, section headers, and topic transitions. This way, complete ideas are more likely to be contained within a single chunk. Third, context injection: before summarizing each chunk, prepend a brief description of the document topic and what the previous chunk covered. This gives the chunk summarizer context about where it sits in the document, helping it prioritize boundary-spanning information.
  • There is also a hierarchical alternative to flat map-reduce. Instead of one map step and one reduce step, do multiple levels: summarize chunks into section summaries, section summaries into part summaries, part summaries into the final summary. This works better for very long documents (100+ pages) where a single reduce step has to compress too much information from too many chunk summaries.
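
The semantic chunking idea from the list above can be approximated by packing whole paragraphs into chunks instead of cutting at a fixed character count. This is a minimal sketch: the function `chunk_by_paragraphs` is hypothetical, and it treats blank lines as the only boundary signal, which is a simplification of real topic detection.

```python
def chunk_by_paragraphs(text: str, max_chars: int = 3000) -> list[str]:
    """Group whole paragraphs into chunks of at most max_chars where possible.

    A single paragraph longer than max_chars becomes its own oversized chunk
    rather than being split mid-idea.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0

    for para in paragraphs:
        # +2 accounts for the blank line that joins paragraphs.
        added = len(para) + (2 if current else 0)
        if current and current_len + added > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(para)
        current.append(para)
        current_len += added

    if current:
        chunks.append("\n\n".join(current))
    return chunks


doc = "First idea.\n\nSecond idea, closely related.\n\nA new topic starts here."
for chunk in chunk_by_paragraphs(doc, max_chars=45):
    print(repr(chunk))
```

Because a chunk only closes between paragraphs, a complete argument is less likely to be split across two chunk summarizers.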
Follow-up: For a 200-page document, your map-reduce pipeline makes 40 LLM calls (30 chunk summaries + 10 for the hierarchical reduce). This costs $2 per document and takes 3 minutes. The product team wants it under $0.50 and 30 seconds. How do you optimize?

Three levers to pull. First, use gpt-4o-mini instead of gpt-4o for the chunk summarization step. The map step is mechanical (summarize this chunk) and does not require the reasoning power of a large model. This alone cuts cost by 10-20x. Reserve gpt-4o for only the final reduce step where synthesis quality matters. Second, parallelize the map step: all 30 chunk summaries are independent and can run concurrently. With 30 parallel API calls, the map step takes the time of one call (~2 seconds) instead of 30 sequential calls (60 seconds). Third, reduce the number of chunks by increasing chunk size. If the model’s context window supports 128K tokens, you can fit 30-40 pages per chunk instead of 5. This reduces 30 chunks to 6-8, cutting cost and latency proportionally. With these three optimizations combined: 8 parallel chunk summaries with gpt-4o-mini (~$0.02 total) + 1 final reduce with gpt-4o (~$0.10) = roughly $0.12 total, completing in under 10 seconds. That is well under both targets.
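
The second lever, running the independent map calls concurrently, can be sketched with a thread pool. This is an illustration under assumptions: `parallel_map` is a hypothetical helper, and `summarize_fn` stands in for whatever function wraps the chunk-summary API call (for example `_summarize_chunk` adapted to take a string). A stub is used here so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_map(
    chunks: list[str],
    summarize_fn: Callable[[str], str],
    max_workers: int = 8,
) -> list[str]:
    """Summarize chunks concurrently; results keep the input order."""
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(chunks))) as pool:
        # Executor.map yields results in the order of the inputs, not the
        # order in which the calls finish.
        return list(pool.map(summarize_fn, chunks))


def fake_summarize(chunk: str) -> str:
    """Stub standing in for an LLM call: keeps the first three words."""
    return " ".join(chunk.split()[:3])


summaries = parallel_map(
    ["alpha beta gamma delta", "one two three four", "red green blue cyan"],
    fake_summarize,
)
print(summaries)
```

Threads suit this workload because each call spends its time waiting on the network. In practice `max_workers` should be set with the provider's rate limits in mind.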
Strong Answer:
  • Raw transcripts are the hardest input for summarization because they violate every assumption about well-structured text. Filler words (“um,” “like,” “you know”) add noise, crosstalk creates garbled overlapping text, missing speaker labels make it impossible to attribute statements, and conversational style means key decisions are buried in casual phrasing: “yeah I think we should just go with option B” is a critical decision that looks like throwaway commentary.
  • My pipeline would have three stages. Stage one: transcript cleaning. Use an LLM to remove filler words, fix obvious transcription errors, and add paragraph breaks at topic transitions. This is a cheap preprocessing step using gpt-4o-mini that dramatically improves downstream extraction quality. Do not attempt speaker diarization at this stage — it is a separate, harder problem.
  • Stage two: structured extraction using a Pydantic schema. Define a model with fields for key_topics: List[str], decisions: List[Decision] (where Decision has description, made_by, context), action_items: List[ActionItem] (with task, assignee, due_date, context), and open_questions: List[str]. Use the cleaned transcript as input. The key prompt engineering insight: instruct the model to distinguish between “things that were discussed” and “things that were decided.” Meetings discuss many options but only decide on a few, and most summarizers conflate the two.
  • Stage three: validation and formatting. For each action item, verify that the assignee name actually appears in the transcript (prevents hallucinated assignees). For each decision, verify that the decision text is supported by the transcript content. Format the output as structured markdown with clear sections.
  • The speaker attribution problem: if speaker labels are unavailable, the model can often infer speakers from context (“Thanks Alice” reveals the previous speaker was Alice, “I’ll handle the design work” combined with context about who does design reveals the speaker). Include a prompt instruction: “Infer speaker identity where possible from contextual clues. Mark as ‘Unknown Speaker’ when identity cannot be determined.”
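
The stage-three assignee check described above can be sketched as a plain substring test. The function `validate_action_items` and the `assignee_verified` key are hypothetical names, and the item shape (`task`, `assignee`) is assumed to match the JSON requested by `MeetingSummarizer`. A case-insensitive substring match is a rough heuristic that can produce false positives for short names.

```python
def validate_action_items(
    action_items: list[dict],
    transcript: str,
) -> list[dict]:
    """Flag action items whose assignee never appears in the transcript."""
    transcript_lower = transcript.lower()
    checked = []
    for item in action_items:
        assignee = (item.get("assignee") or "").strip()
        found = bool(assignee) and assignee.lower() in transcript_lower
        # Copy the item so the caller's data is not mutated.
        checked.append({**item, "assignee_verified": found})
    return checked


transcript = "Alice: Bob, can you update the roadmap?\nBob: Will do."
items = [
    {"task": "Update the roadmap", "assignee": "Bob"},
    {"task": "Send design specs", "assignee": "Eve"},  # Eve is never mentioned
]

for item in validate_action_items(items, transcript):
    status = "ok" if item["assignee_verified"] else "CHECK"
    print(f"[{status}] {item['assignee']}: {item['task']}")
```

Items that fail the check are better surfaced for human review than silently dropped, since the task itself may still be real.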
Follow-up: The meeting is 90 minutes long and the transcript is 25,000 tokens: too long for a single context window in some models but within range for others. Do you still use map-reduce, or stuff it all in one call?

If the transcript fits within the context window with room for the system prompt and output, I would stuff it in one call rather than map-reduce. The reason: meeting transcripts have high information density at topic transitions, and chunking risks splitting a decision discussion across chunks. A single-call approach preserves the full conversational flow, making it easier for the model to correctly attribute action items to people and connect decisions to the discussions that led to them. However, I would add one guard based on how much of the window the transcript fills. If it uses less than about a third of the context window (say, 25K tokens in a 128K window), use one call. If it fills more than 60% of the window, the model’s attention quality degrades on middle content, and map-reduce with overlapping chunks becomes the safer choice. The test is simple: run both approaches on 20 representative meetings and compare the extraction quality of action items and decisions. In my experience, single-call wins up to about 40K tokens of transcript, and map-reduce wins above that.