Triggers & Watermarks
Module Duration: 3-4 hours
Focus: Advanced timing control and late data handling
Prerequisites: Core Programming Model, Windowing & Time
Overview
Triggers determine when results are materialized for a window. They provide fine-grained control over when computations complete and emit results, enabling you to balance latency, completeness, and cost.Key Concepts
- Trigger: Determines when to emit aggregated results
- Watermark: System’s notion of event time progress
- Early Firing: Emit speculative results before window closes
- On-Time Firing: Emit results when watermark passes window end
- Late Firing: Handle data arriving after watermark
- Accumulation Mode: How to combine multiple firings
Understanding Watermarks
Watermarks are the foundation of Beam’s event-time processing. They represent a heuristic about event time completeness.Watermark Semantics
Definition: A watermark with value T indicates that no more elements with timestamps less than T should arrive. Properties:- Monotonically increasing
- Per-source tracking
- Heuristic-based (not a guarantee)
Watermark Behavior
Watermark Lag
Watermark lag represents how far behind the watermark is from real time.Default Trigger
The default trigger fires when the watermark passes the end of the window.Trigger Types
Beam provides several trigger types that can be combined to create sophisticated timing strategies.AfterWatermark
Fires when the watermark passes the end of the window.AfterProcessingTime
Fires after a certain amount of processing time has elapsed.AfterCount
Fires after a certain number of elements arrive in a pane.Repeatedly
Fires a trigger repeatedly.Early and Late Firings
Combine early, on-time, and late firings for flexible result emission.Early Firing Pattern
Emit speculative results before the watermark passes the window end.Late Firing Pattern
Handle data arriving after the watermark has passed.Complete Early/Late Pattern
Accumulation Modes
Accumulation modes determine how multiple firings of the same window are combined.Discarding Mode
Each firing contains only new data since the last firing.Accumulating Mode
Each firing contains all data seen so far for the window.Accumulating and Retracting
Similar to accumulating, but also emits retractions for previous firings.Composite Triggers
Combine multiple triggers with logical operators.AfterEach (Sequential)
Fire triggers in sequence.AfterAll (Conjunction)
Fire when all triggers have fired.AfterAny (Disjunction)
Fire when any trigger fires.Allowed Lateness
Control how long to keep window state after the watermark passes.Real-World Use Cases
Low-Latency Dashboard
Provide quick updates with early firings, refine with late data.Session Analysis with Late Data
Handle user sessions with realistic late-arrival handling.Trading Analytics
High-frequency updates with precise control.Best Practices
Choosing Trigger Strategies
Use Default Trigger When:- Latency is not critical
- You can wait for watermark
- Data completeness is more important than speed
- Low latency is required
- Users expect real-time updates
- Approximate results are acceptable
- Data frequently arrives late
- You need to update results
- Completeness is critical
Balancing Latency and Completeness
Resource Management
Summary
In this module, you learned:- Watermark semantics and behavior
- Trigger types: AfterWatermark, AfterProcessingTime, AfterCount
- Early, on-time, and late firings
- Accumulation modes: discarding vs accumulating
- Composite triggers for complex timing logic
- Allowed lateness for state management
- Real-world trigger strategies
Key Takeaways
- Triggers control when to emit results for a window
- Watermarks track event time progress
- Early firings reduce latency, late firings improve completeness
- Accumulation mode determines how multiple firings combine
- Allowed lateness balances completeness and resource usage
- Choose trigger strategy based on latency/completeness requirements