Kafka Fundamentals
Master the core concepts of Apache Kafka event streaming and understand its distributed architecture.
What is Kafka?
Apache Kafka is a distributed event streaming platform. Unlike traditional message queues (such as RabbitMQ), Kafka is designed to handle massive streams of events, store them durably, and process them in real time.
Event Streaming
Continuous flow of data (events) as they happen
Distributed
Runs as a cluster of servers (brokers)
Durable
Stores events on disk for a configurable retention period
Scalable
Handles trillions of events per day
Core Architecture
1. Events (Messages)
An event records the fact that “something happened”.
- Key: Optional; used for partitioning (e.g., user_123)
- Value: The data payload (e.g., JSON {"action": "login"})
- Timestamp: When it happened
- Headers: Optional metadata
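To make the anatomy concrete, here is a minimal sketch using the Python confluent-kafka client (the topic name, broker address, and payload are illustrative assumptions):

```python
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# One event with all four parts: key, value, timestamp, and headers
producer.produce(
    topic="user-events",
    key="user_123",                    # optional; drives partitioning
    value='{"action": "login"}',       # the data payload
    timestamp=1700000000000,           # ms since epoch; broker can set it instead
    headers=[("source", b"web-app")],  # optional metadata
)
producer.flush()  # block until the broker acknowledges delivery
```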
2. Topics
A Topic is a logical category or feed name to which records are published.
- Analogous to a table in a database or a folder in a filesystem.
- Multi-subscriber: Can have zero, one, or many consumers.
- Append-only: New events are always added to the end.
3. Partitions
Topics are split into Partitions.
- Scalability: Partitions allow a topic to be spread across multiple servers.
- Ordering: Order is guaranteed only within a partition, not across the entire topic.
- Offset: Each message in a partition has a unique ID called an offset.
4. Brokers
A Kafka server is called a Broker.
- Receives messages from producers.
- Assigns offsets.
- Commits messages to disk storage.
- Serves fetch requests from consumers.
- A Cluster consists of multiple brokers working together.
5. Replication
Kafka replicates partitions across multiple brokers for fault tolerance.
- Replication Factor: Number of copies (usually 3).
- Leader: One broker is the leader for a partition; handles all reads/writes.
- Followers: Replicate data from the leader. If the leader fails, a follower becomes the new leader.
Producers & Consumers
Producers
Applications that publish (write) events to Kafka topics.
- Partitioning Strategy: Decides which partition a message goes to (sketched below).
  - Round-robin: If no key is provided (load balancing).
  - Hash-based: If a key is provided (the same key always goes to the same partition).
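Conceptually, the default partitioner behaves like the sketch below. One hedge: real Kafka hashes keys with murmur2 and uses a “sticky” batching strategy for keyless messages; Python’s built-in hash is only a stand-in here.

```python
def choose_partition(key: bytes | None, num_partitions: int, counter: int) -> int:
    """Illustrative stand-in for Kafka's default partitioner."""
    if key is None:
        # No key: spread messages across partitions for load balancing
        return counter % num_partitions
    # Key present: the same key always maps to the same partition
    return hash(key) % num_partitions
```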
Consumers
Applications that subscribe to (read) events from Kafka topics.
- Consumer Groups: A set of consumers working together to consume a topic.
- Each partition is consumed by only one consumer in the group.
- Allows parallel processing of a topic.
- Offsets: Consumers track their progress by committing offsets.
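A minimal consumer-group loop with the Python confluent-kafka client (group id, topic, and broker address are assumptions):

```python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processors",   # consumers sharing this id split the partitions
    "auto.offset.reset": "earliest",  # where to start when no offset is committed
    "enable.auto.commit": False,      # we commit explicitly below
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)  # pull-based: the consumer asks for data
        if msg is None or msg.error():
            continue
        print(f"partition={msg.partition()} offset={msg.offset()} value={msg.value()}")
        consumer.commit(msg)  # record progress (commits msg.offset() + 1)
finally:
    consumer.close()
```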
Kafka vs RabbitMQ (Deep Dive)
| Feature | Apache Kafka | RabbitMQ |
|---|---|---|
| Design | Distributed Commit Log | Traditional Message Broker |
| Message Retention | Policy-based (e.g., 7 days), durable | Deleted after consumption (usually) |
| Throughput | Extremely High (Millions/sec) | High (Thousands/sec) |
| Ordering | Guaranteed per partition | Guaranteed per queue |
| Consumption | Pull-based (Consumer polls) | Push-based (Broker pushes) |
| Use Case | Event streaming, Log aggregation, Analytics | Complex routing, Task queues |
The Power of Pull-Based Consumption
Kafka’s pull-based model is a key architectural decision that distinguishes it from traditional push-based messaging systems.
Consumer Control
Consumers fetch messages at their own pace. Fast consumers aren’t held back by slow ones, and slow consumers aren’t overwhelmed by a flood of messages (backpressure is inherent).
Batching Efficiency
Consumers can pull large batches of messages in a single request, significantly reducing network round trips and per-message overhead, which improves throughput.
Rewind & Replay
Since the broker doesn’t track “who read what” (consumers track their own offsets), consumers can easily rewind to an old offset and re-process past events. This is crucial for:
- Recovering from errors
- Testing new processing logic on old data
- Training ML models
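Replay is just a matter of pointing the consumer at an older offset. A sketch (topic, partition, and offset are illustrative):

```python
from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "replay-job",
})

# Pin the consumer to partition 0 of "orders", starting back at offset 0
consumer.assign([TopicPartition("orders", 0, 0)])

# poll() now re-reads history from offset 0 onward
msg = consumer.poll(timeout=5.0)
```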
ZooKeeper vs KRaft (Critical for Interviews!)
Kafka has traditionally relied on ZooKeeper for cluster coordination. KRaft (Kafka Raft) is the new consensus protocol that removes this dependency.
ZooKeeper Mode (Legacy)
ZooKeeper responsibilities:
- Broker registration and discovery
- Controller election
- Topic configuration storage
- ACLs and quotas
KRaft Mode (New Standard - Kafka 3.3+)
| Aspect | ZooKeeper | KRaft |
|---|---|---|
| Architecture | Separate cluster | Integrated with Kafka |
| Latency | Higher (two systems) | Lower (single system) |
| Scalability | Limited by ZK | Millions of partitions |
| Operational Complexity | Two systems to manage | Single system |
| Production Ready | Yes | Yes (Kafka 3.6+) |
Partition Internals Deep Dive
Understanding partition internals is crucial for performance tuning and interviews.
Log Segments
Each partition is stored on disk as a series of log segments, sketched below.
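A hypothetical layout for one partition directory (the file-naming convention is real; the offsets and topic name are made up):

```
orders-0/
├── 00000000000000000000.log        # segment data: the messages themselves
├── 00000000000000000000.index      # maps offsets to positions in the .log file
├── 00000000000000000000.timeindex  # maps timestamps to offsets
├── 00000000000368745231.log        # active segment: new writes are appended here
├── 00000000000368745231.index
└── 00000000000368745231.timeindex
```

Each segment’s file name encodes the base offset, i.e. the offset of the first message it contains.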
How Writes Work
- Producer sends the message to the partition leader
- Leader appends to active segment file (sequential I/O - very fast!)
- Followers fetch and replicate the message (with acks=all, the leader waits for the in-sync replicas before acknowledging)
- Leader responds to producer with offset
How Reads Work
- Consumer requests offset range
- Broker uses the .index file to locate the right segment and file position
- Broker does a sequential read from the segment
- Returns batch of messages
In-Sync Replicas (ISR) Deep Dive
ISR is one of the most important concepts for durability.
What is ISR?
The set of replicas that are “caught up” with the leader. A replica is removed from the ISR if:
- It falls behind by more than replica.lag.time.max.ms (default 30s)
- It loses its connection to ZooKeeper (or to the controller, in KRaft mode)
ISR Configuration
| Config | Description | Default |
|---|---|---|
| min.insync.replicas | Minimum ISRs for a write to succeed | 1 |
| replica.lag.time.max.ms | Max lag before removal from ISR | 30000 |
Data Loss Scenarios
acks=1, leader fails
Scenario: Producer gets ack after leader writes, but leader crashes before replicating.
Result: Data loss when new leader is elected.
Fix: Use acks=all
All ISRs fail simultaneously
Scenario: All in-sync replicas fail at once.
Result: Either wait for ISR to recover, or allow non-ISR leader (data loss).
Config: unclean.leader.election.enable=false (default)
min.insync.replicas=1 with RF=2
Scenario: Two replicas, one fails, producer still writes with acks=all.
Result: Only one copy exists. If it fails, data is lost.
Fix: RF=3, min.insync.replicas=2
The Golden Rule
For no data loss on a single broker failure: replication.factor=3, min.insync.replicas=2, and acks=all.
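On the producer side, the rule translates into settings like this sketch (Python confluent-kafka client; the broker address is an assumption):

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "all",               # wait for min.insync.replicas acknowledgements
    "enable.idempotence": True,  # safe retries without duplicates
})
```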
Interview Questions & Answers
How does Kafka achieve high throughput?
- Sequential I/O: Append-only writes, no random disk seeks
- OS Page Cache: Serves reads from the filesystem cache and uses zero-copy (sendfile) transfers
- Batching: Groups messages to reduce network/disk overhead
- Compression: Reduces network bandwidth
- Partitioning: Parallelism across brokers
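To illustrate the batching and compression points above, a throughput-tuned producer might look like this sketch (the values are illustrative assumptions, not recommended defaults):

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "linger.ms": 50,            # wait up to 50 ms so batches can fill up
    "batch.size": 131072,       # allow batches up to 128 KiB per partition
    "compression.type": "lz4",  # compress whole batches on the wire
})
```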
What happens when a broker fails?
- Controller detects broker failure (via heartbeats)
- For each partition where the failed broker was the leader:
  - Controller elects a new leader from the ISR
  - Controller updates metadata on all brokers
- Producers/consumers get metadata update and reconnect
- Data is safe if replica was in ISR
How do you choose the number of partitions?
Formula: partitions = max(throughput / producer_throughput, throughput / consumer_throughput)
Considerations:
- More partitions = more parallelism
- More partitions = more memory/file handles
- Can’t decrease partitions (only increase)
- Each partition = at most one consumer in a group
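Worked example (illustrative numbers): to sustain 100 MB/s when one producer can write ~10 MB/s to a partition and one consumer can process ~20 MB/s, you need max(100/10, 100/20) = max(10, 5) = 10 partitions.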
What is the difference between a Topic and a Partition?
- Topic: Logical category/feed (like a table)
- Partition: Physical subdivision of a topic (like a shard). Partitioning provides:
  - Scalability (spread across brokers)
  - Parallelism (multiple consumers)
  - Ordering (guaranteed within a partition)
How does Kafka handle consumer failures?
- Consumer stops sending heartbeats
- After session.timeout.ms, the consumer is considered dead
- A rebalance is triggered in the consumer group
- Partitions are reassigned to remaining consumers
- New consumer starts from last committed offset
What is the Controller in Kafka?
One broker is elected as the Controller:
- Monitors broker liveness
- Elects partition leaders when brokers fail
- Updates cluster metadata
- Manages partition reassignments
Common Pitfalls
- Using acks=1 in production: risks data loss when a leader fails before replicating.
- min.insync.replicas=1 with RF=2: one failure leaves a single copy; a second failure loses data.
- Assuming ordering across a whole topic: ordering is guaranteed only within a partition.
- Over-partitioning: each partition costs memory and file handles, and the count can only be increased, never decreased.
Installation & Quick Start
- Docker (Recommended)
- Local Install
With Docker, use the Confluent Platform images (which include ZooKeeper/KRaft and the CLI tools):
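A minimal single-broker compose file might look like this sketch (image versions and settings are assumptions, suitable for local development only):

```yaml
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on: [zookeeper]
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
```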
Run `docker-compose up -d`.
CLI Power User Commands
Topic Management
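A few common topic commands, as a sketch: the script names follow the Apache distribution (Confluent images drop the .sh suffix), and the broker address and topic name are illustrative:

```bash
# Create a topic with 3 partitions and replication factor 3
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic orders --partitions 3 --replication-factor 3

# List all topics
kafka-topics.sh --bootstrap-server localhost:9092 --list

# Describe a topic: shows partition leaders, replicas, and ISR
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders
```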
Producing & Consuming
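Console producer and consumer sketches (same naming assumptions as above):

```bash
# Produce messages from stdin as key:value pairs
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic orders \
  --property parse.key=true --property key.separator=:

# Consume the topic from the beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic orders \
  --from-beginning
```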
Consumer Groups
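Consumer-group inspection and offset management, sketched with an assumed group name my-group:

```bash
# List all consumer groups
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe a group: per-partition committed offsets and lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group my-group

# Rewind the group to the earliest offsets (group must be inactive)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group my-group --topic orders --reset-offsets --to-earliest --execute
```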
Key Takeaways
- Topics are logs of events, divided into Partitions.
- Partitions allow Kafka to scale and guarantee ordering.
- Brokers form a cluster to provide durability and availability.
- Consumer Groups allow parallel processing of topics.
- Kafka is pull-based and stores data for a set retention period.
Next: Kafka Producers & Consumers →