Summarization is a core AI capability used across applications from meeting notes to content curation. It sounds simple — “make this shorter” — but production summarization requires solving hard trade-offs: how much detail to preserve, how to handle documents longer than the context window, and how to avoid the model quietly inventing facts that weren’t in the source. The two fundamental approaches are extractive (pull out the most important sentences verbatim) and abstractive (generate new text that captures the key ideas). Extractive is safer for legal and compliance contexts; abstractive produces more readable summaries. Most production systems combine both.Documentation Index
Fetch the complete documentation index at: https://resources.devweekends.com/llms.txt
Use this file to discover all available pages before exploring further.
Summarization Strategy Comparison
| Strategy | Faithfulness | Readability | Speed | Best For |
|---|---|---|---|---|
| Extractive | Highest (verbatim) | Lower (choppy) | Fast | Legal, compliance, audit trails |
| Abstractive | Medium (hallucination risk) | Highest (natural prose) | Medium | Newsletters, executive briefs |
| Hybrid (extract then rephrase) | High | High | Slower | Most production use cases |
| Map-Reduce | Medium (cross-chunk loss) | High | Slow (multiple LLM calls) | Documents exceeding context window |
| Hierarchical | High (preserves structure) | High | Slowest | Structured reports, books, legal filings |
- Document fits in context window (under 100K tokens)? Use a single abstractive call — simpler, faster, and the model sees everything.
- Document exceeds context window? Use map-reduce with 10-15% chunk overlap to prevent info loss at boundaries.
- Accuracy is critical (legal, medical)? Use extractive or hybrid. Traceable quotes are worth the readability trade-off.
- Multiple document types? Use content-specific prompts (news, research, meeting, code). Generic “summarize this” produces generic results.
Basic Summarization
Simple Text Summarization
Extractive vs Abstractive Summarization
Long Document Summarization
When a document exceeds the model’s context window (or is large enough that stuffing it all in produces poor results), you need a chunking strategy. The map-reduce pattern is borrowed from distributed computing: split the problem into independent pieces (map), solve each piece, then combine the results (reduce). It’s the same idea as dividing a 500-page book among 10 interns, having each write a chapter summary, then writing an executive summary from those summaries.Map-Reduce Pattern
Hierarchical Summarization
Meeting Summarization
Meeting summarization is one of the highest-ROI applications of LLMs. A 30-minute meeting produces a transcript that nobody reads. A structured summary with decisions, action items, and owners turns that dead text into something actionable. The key insight: structure the output as a JSON schema so downstream systems can automatically create tickets, calendar events, and follow-up emails.Streaming Summarization
Content-Specific Summarization
- Choose extractive when accuracy and verifiability matter (legal, medical, compliance). The exact words from the source can be traced back.
- Use abstractive when readability matters more (newsletters, executive briefs). The model synthesizes ideas into flowing prose.
- Apply map-reduce for documents exceeding context limits. But beware: information that spans chunk boundaries can be lost. Use overlapping chunks to mitigate.
- Tailor prompts to content type — a news summary needs who/what/when/where; a research summary needs methodology/findings/limitations. Generic “summarize this” prompts produce generic results.
- Validate summaries against source — LLMs can hallucinate facts during summarization. For high-stakes use cases, programmatically check that key claims appear in the original text.
- Pitfall: Models tend to over-represent information at the beginning and end of long documents (primacy/recency bias). Map-reduce mitigates this by giving each section equal treatment.
Practice Exercise
Build a summarization service that:- Handles multiple content types (articles, meetings, code)
- Supports variable length outputs
- Preserves key information and accuracy
- Provides structured output with metadata
- Scales to long documents using chunking
- Accuracy preservation in summaries
- Appropriate level of detail
- Coherent multi-chunk summarization
- Specialized handling for different content types
Interview Deep-Dive
You are building a summarization system for legal contracts. A lawyer tells you that your summaries occasionally include clauses that do not exist in the original document. How do you diagnose and fix hallucination in summarization?
You are building a summarization system for legal contracts. A lawyer tells you that your summaries occasionally include clauses that do not exist in the original document. How do you diagnose and fix hallucination in summarization?
- Hallucination in summarization is particularly dangerous because it looks correct — the invented clause is written in the same legal style as the real ones, so a human reviewer might not catch it unless they cross-reference every sentence against the source. This is different from general LLM hallucination because the source document is right there in the context window, yet the model still invents content.
- Diagnosis first: implement automated hallucination detection. For each sentence in the summary, use either a natural language inference (NLI) model or a secondary LLM call to check: “Is this claim supported by the source document?” Score each sentence as supported, partially supported, or unsupported. Run this on your last 1,000 summaries. You will typically find that 3-8% of summary sentences contain some fabrication, and the rate increases with document length and complexity.
- Root cause: hallucination in summarization happens for three main reasons. First, the model is trained to produce fluent, complete-sounding text, so when it encounters a gap in its understanding, it fills it rather than acknowledging uncertainty. Second, long documents exceed the model’s effective attention span — even within the context window, the model attends less to content in the middle of very long inputs (the “lost in the middle” problem). Third, abstractive summarization by definition asks the model to generate new text, which gives it license to paraphrase loosely enough that meaning shifts.
- Fixes, in order of effectiveness: Switch to extractive summarization for legal use cases. Pull exact sentences from the source and present them as the summary. You lose readability but gain verifiability — every sentence can be traced back to the source with a simple string search. If extractive is too rigid, use a hybrid approach: extractive for key clauses and obligations, abstractive only for the high-level overview paragraph. Add a citation requirement: instruct the model to include section references for every claim (e.g., “Per Section 3.2, the termination clause requires 30 days notice”). Claims the model cannot cite are flagged for review. Finally, reduce context length per summarization call by using map-reduce: summarize each section independently where the full section fits easily in the attention window, then combine the section summaries.
Explain the map-reduce pattern for long document summarization. What information is lost at the chunk boundaries, and how do you mitigate that?
Explain the map-reduce pattern for long document summarization. What information is lost at the chunk boundaries, and how do you mitigate that?
- Map-reduce for summarization works like this: split the document into chunks that fit the context window, summarize each chunk independently (the “map” step), then combine all chunk summaries into a final coherent summary (the “reduce” step). It is the same distributed computing concept applied to text processing — break the problem into independent subproblems, solve each, then merge results.
- The fundamental information loss happens at chunk boundaries. If a paragraph that starts on chunk 3 and concludes on chunk 4 contains a key argument, neither chunk summarizer sees the complete argument. Chunk 3’s summary captures the premise, chunk 4’s summary captures the conclusion, but the logical connection between them is lost. The reduce step cannot reconstruct this because it only sees the chunk summaries, not the original text.
- Three mitigation strategies I have used in production. First, overlapping chunks: include 100-200 tokens of overlap between adjacent chunks so boundary content appears in both chunks. The trade-off is increased token cost (you are embedding and summarizing the overlap twice) and potential duplication in the chunk summaries. Second, semantic chunking: instead of splitting at fixed character counts, split at natural semantic boundaries — paragraph breaks, section headers, topic transitions. This way, complete ideas are more likely to be contained within a single chunk. Third, context injection: before summarizing each chunk, prepend a brief description of the document topic and what the previous chunk covered. This gives the chunk summarizer context about where it sits in the document, helping it prioritize boundary-spanning information.
- There is also a hierarchical alternative to flat map-reduce. Instead of one map step and one reduce step, do multiple levels: summarize chunks into section summaries, section summaries into part summaries, part summaries into the final summary. This works better for very long documents (100+ pages) where a single reduce step has to compress too much information from too many chunk summaries.
gpt-4o-mini instead of gpt-4o for the chunk summarization step. The map step is mechanical — summarize this chunk — and does not require the reasoning power of a large model. This alone cuts cost by 10-20x. Reserve gpt-4o for only the final reduce step where synthesis quality matters. Second, parallelize the map step: all 30 chunk summaries are independent and can run concurrently. With 30 parallel API calls, the map step takes the time of one call (~2 seconds) instead of 30 sequential calls (gpt-4o-mini (You are tasked with building a meeting summarization feature. The product team wants action items, decisions, and key topics -- but the input is a raw transcript with filler words, crosstalk, and no speaker labels. What is your approach?
You are tasked with building a meeting summarization feature. The product team wants action items, decisions, and key topics -- but the input is a raw transcript with filler words, crosstalk, and no speaker labels. What is your approach?
- Raw transcripts are the hardest input for summarization because they violate every assumption about well-structured text. Filler words (“um,” “like,” “you know”) add noise, crosstalk creates garbled overlapping text, missing speaker labels make it impossible to attribute statements, and conversational style means key decisions are buried in casual phrasing: “yeah I think we should just go with option B” is a critical decision that looks like throwaway commentary.
- My pipeline would have three stages. Stage one: transcript cleaning. Use an LLM to remove filler words, fix obvious transcription errors, and add paragraph breaks at topic transitions. This is a cheap preprocessing step using
gpt-4o-minithat dramatically improves downstream extraction quality. Do not attempt speaker diarization at this stage — it is a separate, harder problem. - Stage two: structured extraction using a Pydantic schema. Define a model with fields for
key_topics: List[str],decisions: List[Decision](where Decision hasdescription,made_by,context),action_items: List[ActionItem](withtask,assignee,due_date,context), andopen_questions: List[str]. Use the cleaned transcript as input. The key prompt engineering insight: instruct the model to distinguish between “things that were discussed” and “things that were decided.” Meetings discuss many options but only decide on a few, and most summarizers conflate the two. - Stage three: validation and formatting. For each action item, verify that the assignee name actually appears in the transcript (prevents hallucinated assignees). For each decision, verify that the decision text is supported by the transcript content. Format the output as structured markdown with clear sections.
- The speaker attribution problem: if speaker labels are unavailable, the model can often infer speakers from context (“Thanks Alice” reveals the previous speaker was Alice, “I’ll handle the design work” combined with context about who does design reveals the speaker). Include a prompt instruction: “Infer speaker identity where possible from contextual clues. Mark as ‘Unknown Speaker’ when identity cannot be determined.”