Kafka Streams
Build real-time stream processing applications with the Kafka Streams API and ksqlDB.
What is Kafka Streams?
Kafka Streams is a client library for building applications and microservices whose input and output data are stored in Kafka clusters.
- Library, not Cluster: Runs inside your application (Java/Scala), not on the Kafka brokers.
- Scalable: Scales elastically; parallelism is bounded by the number of input topic partitions.
- Stateful: Built-in state management, backed by RocksDB.
- Exactly-Once: Supports exactly-once processing semantics.
Core Concepts
Streams vs Tables
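- KStream: An unbounded stream of records in which every record is an independent, insert-only event.
  - Example: Credit card transactions.
- KTable: A changelog stream in which each record is an upsert for its key, so the table represents the current value per key.
  - Example: User account balances.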
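The difference between the two views can be modelled with ordinary collections. The sketch below is plain Java, not the Kafka Streams API; the `StreamVsTable` class and the user/balance records are hypothetical and exist only for illustration. It reads the same three records once as a stream (all events kept) and once as a table (latest value per key).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of KStream vs KTable semantics; this is not the Kafka Streams API.
class StreamVsTable {

    // Stream view: every record is kept as an independent fact (insert-only).
    static List<String> asStream(String[][] records) {
        List<String> events = new ArrayList<>();
        for (String[] r : records) {
            events.add(r[0] + "=" + r[1]);
        }
        return events;
    }

    // Table view: a later record with the same key replaces the earlier one (upsert).
    static Map<String, String> asTable(String[][] records) {
        Map<String, String> state = new LinkedHashMap<>();
        for (String[] r : records) {
            state.put(r[0], r[1]);
        }
        return state;
    }

    public static void main(String[] args) {
        // Hypothetical key/value records: user -> balance.
        String[][] records = {
            {"alice", "100"}, {"bob", "50"}, {"alice", "70"}
        };
        System.out.println(asStream(records)); // three independent events
        System.out.println(asTable(records));  // one current value per key
    }
}
```

Read as a stream, both `alice` records survive; read as a table, only her latest balance does.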
Topology
A graph of processing nodes (sources, processors, sinks).
Kafka Streams API (Java)
Stateless Transformations
Operations that don’t require memory of previous events (e.g., filter, map).
Stateful Transformations
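Stateless and stateful operations differ in whether a result depends on earlier records. The following is a minimal plain-Java model of that difference, not Kafka Streams code; the `StatelessVsStateful` class and its method names are hypothetical. In the Kafka Streams DSL the corresponding calls would be along the lines of `KStream#filter` and `KStream#mapValues` for the stateless part and `KGroupedStream#count` for the stateful part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of stateless vs stateful processing; not the Kafka Streams API.
class StatelessVsStateful {

    // Stateless: each amount is judged and transformed on its own (filter, then map).
    static List<String> labelLargeAmounts(List<Integer> amounts, int threshold) {
        List<String> out = new ArrayList<>();
        for (int amount : amounts) {
            if (amount > threshold) {        // filter: needs only the current record
                out.add("LARGE:" + amount);  // map: needs only the current record
            }
        }
        return out;
    }

    // Stateful: the count for a key depends on every earlier record with that key.
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(labelLargeAmounts(Arrays.asList(5, 500, 20, 900), 100));
        System.out.println(countByKey(Arrays.asList("alice", "bob", "alice")));
    }
}
```

The `counts` map is the state: without it, the second `alice` record could not be told apart from the first.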
Operations that require state (e.g., count, aggregate, join).
ksqlDB: Streaming SQL
ksqlDB allows you to write stream processing applications using SQL syntax.
Create a Stream
Create a Table (State)
Stream-Table Join
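As a rough model of what a stream-table join computes, each stream event is enriched with the table's current value for the same key. The sketch below uses plain Java collections rather than ksqlDB or Kafka; the `StreamTableJoin` class, the click events, and the profile table are hypothetical. It assumes inner-join behaviour, so events with no matching table entry are dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a stream-table (inner) join; not ksqlDB or the Kafka Streams API.
class StreamTableJoin {

    // Each event is {key, value}; the table maps a key to its current value.
    // Events whose key has no table entry are dropped (inner-join behaviour).
    static List<String> join(List<String[]> events, Map<String, String> table) {
        List<String> enriched = new ArrayList<>();
        for (String[] event : events) {
            String tableValue = table.get(event[0]);
            if (tableValue != null) {
                enriched.add(event[0] + ":" + event[1] + ":" + tableValue);
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        // Hypothetical table: user id -> membership tier.
        Map<String, String> profiles = new HashMap<>();
        profiles.put("u1", "gold");

        // Hypothetical stream: user id -> page viewed.
        List<String[]> clicks = Arrays.asList(
            new String[] {"u1", "home"},
            new String[] {"u2", "cart"});

        System.out.println(join(clicks, profiles)); // only u1 has a profile to join with
    }
}
```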
Windowed Aggregation
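A windowed aggregation groups events by time before aggregating them. The sketch below models the simplest case, a count over fixed, non-overlapping (tumbling) windows aligned to the epoch, in plain Java rather than ksqlDB. The `TumblingWindowCount` class and the sample timestamps are hypothetical, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a tumbling-window count; not ksqlDB or the Kafka Streams API.
class TumblingWindowCount {

    // Returns window start (ms) -> number of events in [start, start + windowSizeMs).
    // Assumes non-negative timestamps and a positive window size.
    static Map<Long, Integer> countPerWindow(long[] timestampsMs, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            long windowStart = ts - (ts % windowSizeMs); // round down to the window boundary
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical event times, bucketed into one-minute windows.
        long[] timestamps = {1_000L, 59_000L, 60_000L, 125_000L};
        System.out.println(countPerWindow(timestamps, 60_000L));
    }
}
```

An event at exactly 60,000 ms belongs to the second window, because each window includes its start and excludes its end.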
Architecture
Partitioning & Scaling
Kafka Streams automatically handles load balancing. If you run multiple instances of your application, they will share the partitions of the input topics.
State Stores
Stateful operations (like count()) use local RocksDB instances to store state on disk. This state is also backed up to a Kafka “changelog topic” for fault tolerance.
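The interplay between the local store and its changelog can be sketched in plain Java. This is a simplified model under stated assumptions, not how Kafka Streams is implemented: a `HashMap` stands in for RocksDB, a shared `List` stands in for the changelog topic, and the `ChangelogBackedStore` class is hypothetical. It shows the core idea that every update is also appended to the changelog, so a replacement instance can rebuild its state by replaying it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a changelog-backed state store; not the Kafka Streams API.
class ChangelogBackedStore {
    private final Map<String, Long> local = new HashMap<>(); // stands in for RocksDB
    private final List<String[]> changelog;                  // stands in for the changelog topic

    ChangelogBackedStore(List<String[]> changelog) {
        this.changelog = changelog;
        // Restore: replay the changelog so a fresh instance recovers earlier state.
        for (String[] entry : changelog) {
            local.put(entry[0], Long.parseLong(entry[1]));
        }
    }

    // Updates local state and appends the new value to the changelog.
    long increment(String key) {
        long next = local.getOrDefault(key, 0L) + 1;
        local.put(key, next);
        changelog.add(new String[] {key, Long.toString(next)});
        return next;
    }

    long get(String key) {
        return local.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        List<String[]> changelog = new ArrayList<>();

        ChangelogBackedStore first = new ChangelogBackedStore(changelog);
        first.increment("page-a");
        first.increment("page-a");

        // Simulate losing the first instance: a new one restores from the changelog.
        ChangelogBackedStore replacement = new ChangelogBackedStore(changelog);
        System.out.println(replacement.get("page-a")); // count survives the restart
    }
}
```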
Use Cases
- Real-time Fraud Detection: Filter and analyze transaction streams.
- Enrichment: Join data streams with static data (e.g., user profiles).
- Monitoring: Aggregate logs and metrics in real time.
- ETL: Transform data before loading into a data warehouse.