Kafka Fundamentals
What is Kafka?
Core Architecture
1. Events (Messages)
2. Topics
3. Partitions
4. Brokers
5. Replication
Producers & Consumers
Producers
Consumers
Kafka vs RabbitMQ (Deep Dive)
The Power of Pull-Based Consumption
ZooKeeper vs KRaft (Critical for Interviews!)
ZooKeeper Mode (Legacy)
KRaft Mode (New Standard - Kafka 3.3+)
Partition Internals Deep Dive
Log Segments
How Writes Work
How Reads Work
In-Sync Replicas (ISR) Deep Dive
What is ISR?
ISR Configuration
Data Loss Scenarios
The Golden Rule
Interview Questions & Answers
Common Pitfalls
Installation & Quick Start
CLI Power User Commands
Topic Management
Producing & Consuming
Consumer Groups
Key Takeaways

Kafka Fundamentals

Master the core concepts of Apache Kafka event streaming and understand its distributed architecture.

What is Kafka?

Apache Kafka is a distributed event streaming platform. Unlike traditional message queues (like RabbitMQ), Kafka is designed to handle massive streams of events, store them durably, and process them in real-time.

Event Streaming

Continuous flow of data (events) as they happen

Distributed

Runs as a cluster of servers (brokers)

Durable

Stores events on disk for a configurable retention period

Scalable

Scales horizontally by adding brokers and partitions; the largest deployments handle trillions of events per day

Core Architecture

1. Events (Messages)

An event records the fact that “something happened”.

Key: Optional, used for partitioning (e.g., user_123)
Value: The data payload (e.g., JSON {"action": "login"})
Timestamp: When it happened
Headers: Optional metadata
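Here is a minimal sketch of the four parts of an event, modeled as a plain Python dataclass. The `Event` class and its field names are hypothetical and chosen only for illustration; Kafka client libraries define their own record types. Keys and values are shown as bytes because Kafka stores them as raw bytes and leaves serialization to the client.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical in-memory model of a Kafka event (not a client API)."""
    value: bytes                                  # payload, e.g. UTF-8 encoded JSON
    key: Optional[bytes] = None                   # optional; used to pick a partition
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    headers: Tuple[Tuple[str, bytes], ...] = ()   # optional (name, value) metadata pairs


login = Event(
    key=b"user_123",
    value=json.dumps({"action": "login"}).encode("utf-8"),
    headers=(("source", b"web"),),
)
```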

2. Topics

A Topic is a logical category or feed name to which records are published.

Analogous to a table in a database or a folder in a filesystem.
Multi-subscriber: Can have zero, one, or many consumers.
Append-only: New events are always added to the end.

3. Partitions

Topics are split into Partitions.

Scalability: Partitions allow a topic to be spread across multiple servers.
Ordering: Order is guaranteed only within a partition, not across the entire topic.
Offset: Each message in a partition has a unique ID called an offset.
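The following toy in-memory model makes per-partition offsets and ordering concrete. `ToyTopic` is hypothetical and simplifies one detail: real offsets keep increasing even after retention deletes old records, whereas here an offset is just a list index.

```python
class ToyTopic:
    """Toy model (not Kafka's implementation): a topic as a list of append-only partition logs."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        """Append to the end of one partition and return the record's offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1  # offsets are counted per partition, starting at 0

    def read(self, partition: int, offset: int) -> str:
        return self.partitions[partition][offset]


topic = ToyTopic("user-events", num_partitions=3)
first = topic.append(0, "login")    # offset 0 in partition 0
other = topic.append(1, "click")    # also offset 0, but in partition 1
second = topic.append(0, "logout")  # offset 1 in partition 0
```

Ordering holds inside partition 0 ("login" before "logout"), but nothing relates the order of partition 0 to that of partition 1.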

4. Brokers

A Kafka server is called a Broker.

Receives messages from producers.
Assigns offsets.
Commits messages to disk storage.
Serves fetch requests from consumers.
A Cluster consists of multiple brokers working together.

5. Replication

Kafka replicates partitions across multiple brokers for fault tolerance.

Replication Factor: Number of copies (usually 3).
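Leader: One broker is the leader for a partition; it handles all writes and, by default, all reads (since Kafka 2.4, consumers can be configured to fetch from a nearby follower).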
Followers: Replicate data from the leader. If the leader fails, a follower becomes the new leader.
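Below is a simplified sketch of how a partition's replicas can be spread across brokers. `assign_replicas` is hypothetical and only approximates Kafka's placement logic. Kafka's real algorithm also randomizes the starting broker and staggers followers. The sketch does show the two invariants that matter: copies of one partition land on different brokers, and leadership is spread across the cluster.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Simplified replica placement: each partition's copies go to distinct
    brokers, and the starting broker rotates so leaders are balanced."""
    if replication_factor > len(brokers):
        # Kafka likewise rejects a replication factor larger than the broker count.
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # The first broker in each list acts as the (preferred) leader.
        assignment[p] = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
    return assignment


placement = assign_replicas(num_partitions=3, brokers=[1, 2, 3], replication_factor=3)
# {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
```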

Producers & Consumers

Producers

Applications that publish (write) events to Kafka topics.

Partitioning Strategy: Decides which partition a message goes to.
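- No key: records are spread across partitions for load balancing (round-robin in older clients; since Kafka 2.4 the default is a “sticky” strategy that fills a batch for one partition before switching).
- Hash-based: If a key is provided, the partition is hash(key) mod the partition count (murmur2 in the Java client), so the same key always goes to the same partition as long as the partition count does not change.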
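The following illustrative partitioner shows both strategies. It is not the Kafka client's code. The Java client hashes key bytes with murmur2; this sketch uses `zlib.crc32` as a stand-in stable hash so it needs only the standard library. For keyless records it uses plain round-robin, not the newer sticky strategy.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Illustrative partitioner (crc32 stands in for Kafka's murmur2)."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread records for load balancing (plain round-robin here).
            return next(self._round_robin)
        # Key present: same key -> same hash -> same partition.
        return zlib.crc32(key) % self.num_partitions


p = ToyPartitioner(num_partitions=3)
keyless = [p.partition(None) for _ in range(4)]  # [0, 1, 2, 0]
```

Because the result depends on `num_partitions`, adding partitions to an existing topic changes where a given key lands. Per-key ordering therefore does not hold across that change.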

Consumers

Applications that subscribe to (read) events from Kafka topics.

Consumer Groups: A set of consumers working together to consume a topic.
- Each partition is consumed by only one consumer in the group.
- Allows parallel processing of a topic.
Offsets: Consumers track their progress by committing offsets, which Kafka stores in the internal __consumer_offsets topic.
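This round-robin assignment sketch shows the "one consumer per partition within a group" rule. It is similar in spirit to Kafka's RoundRobinAssignor, which is not the default assignor. The function name `assign_round_robin` is hypothetical.

```python
def assign_round_robin(partitions, consumers):
    """Every partition goes to exactly one consumer in the group."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for i, partition in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(partition)
    return assignment


two_consumers = assign_round_robin([0, 1, 2, 3], ["c2", "c1"])  # {'c1': [0, 2], 'c2': [1, 3]}
too_many = assign_round_robin([0, 1], ["c1", "c2", "c3"])       # {'c1': [0], 'c2': [1], 'c3': []}
```

In the second call, `c3` sits idle. This is why adding more consumers than partitions adds no parallelism.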

Kafka vs RabbitMQ (Deep Dive)

Feature	Apache Kafka	RabbitMQ
Design	Distributed Commit Log	Traditional Message Broker
Message Retention	Policy-based (e.g., 7 days), durable	Deleted after consumption (usually)
Throughput	Extremely High (Millions/sec)	High (Thousands/sec)
Ordering	Guaranteed per partition	Guaranteed per queue
Consumption	Pull-based (Consumer polls)	Push-based (Broker pushes)
Use Case	Event streaming, Log aggregation, Analytics	Complex routing, Task queues

The Power of Pull-Based Consumption

Kafka’s pull-based model is a key architectural decision that distinguishes it from traditional push-based messaging systems.

Consumer Control

Consumers fetch messages at their own pace. Fast consumers aren’t held back by slow ones, and slow consumers aren’t overwhelmed by a flood of messages (backpressure is inherent).

Batching Efficiency

Consumers can pull large batches of messages in a single request, reducing per-request network overhead and improving throughput.

Rewind & Replay

Since reading does not delete messages, and a consumer’s progress is just an offset it controls (brokers don’t track “who read what” per message), consumers can easily rewind to an old offset and re-process past events. This is crucial for:

Recovering from errors
Testing new processing logic on old data
Training ML models
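The following pull-based consumer sketch covers a single partition and shows consumer-controlled pacing, batching, and rewind. `ToyConsumer` is hypothetical. The `max_records` default of 500 mirrors the Java consumer's `max.poll.records` default.

```python
class ToyConsumer:
    """Pull-based consumer sketch: the consumer, not the broker, decides
    when to fetch, how much to fetch, and from which offset."""

    def __init__(self, log):
        self.log = log       # records of one partition; list index == offset
        self.position = 0    # next offset to fetch

    def poll(self, max_records=500):
        """Fetch up to max_records starting at the current position (one request)."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Rewind or skip ahead; the log itself is untouched, so replay is free."""
        self.position = offset


consumer = ToyConsumer(["e0", "e1", "e2", "e3", "e4"])
first_batch = consumer.poll(max_records=3)  # ['e0', 'e1', 'e2']
rest = consumer.poll()                      # ['e3', 'e4']
consumer.seek(1)                            # rewind to offset 1
replayed = consumer.poll(max_records=2)     # ['e1', 'e2']
```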

ZooKeeper vs KRaft (Critical for Interviews!)

Kafka has traditionally relied on ZooKeeper for cluster coordination. KRaft (Kafka Raft) is the new consensus protocol that removes this dependency.

ZooKeeper Mode (Legacy)

ZooKeeper responsibilities:

Broker registration and discovery
Controller election
Topic configuration storage
ACLs and quotas

KRaft Mode (New Standard - Kafka 3.3+)

Aspect	ZooKeeper	KRaft
Architecture	Separate cluster	Integrated with Kafka
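Controller failover	Slower (new controller must load metadata from ZooKeeper)	Faster (standby controllers already hold the metadata log)
Scalability	Limited by ZK	Millions of partitions
Operational Complexity	Two systems to manage	Single system
Production Ready	Yes (deprecated in 3.5, removed in 4.0)	Yes for new clusters since 3.3; ZooKeeper-to-KRaft migration since 3.6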

Interview Tip: Know that KRaft is the future and understand why Kafka moved away from ZooKeeper (simpler operations, better scalability, faster controller failover and metadata propagation).

Partition Internals Deep Dive

Understanding partition internals is crucial for performance tuning and interviews.

Log Segments

Each partition is stored as a series of log segments:

partition-0/
├── 00000000000000000000.log    # Segment files
├── 00000000000000000000.index  # Offset index
├── 00000000000000000000.timeindex  # Time index
├── 00000000000005000000.log    # Next segment (starts at offset 5000000)
└── ...
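Each file name is the segment's base offset, so finding the segment for a given offset is a binary search over the sorted base offsets. The sketch below assumes those base offsets are available as a sorted list; the function names are hypothetical.

```python
import bisect


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Segment files are named after their first (base) offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets, offset):
    """Return the base offset of the segment holding `offset`: the last
    segment whose base offset is <= offset. `base_offsets` must be sorted."""
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError("offset is older than the oldest retained segment")
    return base_offsets[i]


segments = [0, 5_000_000]
name = segment_file_name(5_000_000)      # '00000000000005000000.log'
seg = find_segment(segments, 4_999_999)  # 0
```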

How Writes Work

Producer sends message to partition leader
Leader appends to active segment file (sequential I/O - very fast!)
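Followers fetch the new records from the leader (replication happens regardless of acks; with acks=all the leader waits until all in-sync replicas have the records before responding)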
Leader responds to producer with offset

How Reads Work

Consumer requests offset range
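Broker locates the segment with the largest base offset at or below the requested offset, then uses that segment’s sparse .index file to find the nearest byte position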
Broker does sequential read from segment
Returns batch of messages
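The .index file is sparse. Kafka adds an entry roughly every `index.interval.bytes` of log data (4096 bytes by default). Each entry stores an offset relative to the segment's base offset and a byte position. Here is a sketch of the lookup under that model; the function and the sample index values are hypothetical.

```python
import bisect


def start_position(index_entries, base_offset, target_offset):
    """Given a segment's sparse index as sorted (relative_offset, byte_position)
    pairs, return the byte position of the closest indexed record at or before
    target_offset. The broker then scans forward from there."""
    relative = target_offset - base_offset
    keys = [rel for rel, _ in index_entries]
    i = bisect.bisect_right(keys, relative) - 1
    return 0 if i < 0 else index_entries[i][1]  # no earlier entry: start of segment


# Hypothetical index for the segment whose base offset is 5,000,000.
index = [(0, 0), (40, 4096), (85, 8200)]
pos = start_position(index, 5_000_000, 5_000_050)  # 4096, then scan forward to offset 50
```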

Interview Insight: Kafka achieves high throughput through:

Sequential I/O (no random seeks)
OS page cache and zero-copy transfer (sendfile) from disk to the network
Batching and compression

In-Sync Replicas (ISR) Deep Dive

ISR is one of the most important concepts for durability.

What is ISR?

The set of replicas that are “caught up” with the leader. A replica is removed from ISR if:

It falls behind by more than replica.lag.time.max.ms (default 30s)
Its broker loses its ZooKeeper session (ZooKeeper mode) or is fenced by the controller quorum (KRaft mode)
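The following simplified model illustrates the lag-time rule. A follower stays in the ISR as long as it has been fully caught up with the leader at some point within the last `replica.lag.time.max.ms`. The function `compute_isr` and its inputs are hypothetical; a real broker tracks this state per follower internally.

```python
def compute_isr(leader, last_caught_up_ms, now_ms, lag_time_max_ms=30_000):
    """Return the in-sync replica set under the lag-time rule (simplified)."""
    isr = {leader}  # the leader is always part of its own ISR
    for replica, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= lag_time_max_ms:
            isr.add(replica)
    return isr


isr = compute_isr(leader=1, last_caught_up_ms={2: 100_000, 3: 60_000}, now_ms=120_000)
# {1, 2}: replica 3 last caught up 60 s ago, beyond the 30 s default
```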

ISR Configuration

Config	Description	Default
`min.insync.replicas`	Minimum number of in-sync replicas (leader included) for an acks=all write to succeed	1
`replica.lag.time.max.ms`	Max lag before removal from ISR	30000

Data Loss Scenarios

acks=1, leader fails

Scenario: Producer gets ack after leader writes, but leader crashes before replicating. Result: Data loss when new leader is elected. Fix: Use acks=all

All ISRs fail simultaneously

Scenario: All in-sync replicas fail at once. Result: Either wait for an ISR member to recover (the partition is unavailable meanwhile) or elect an out-of-sync replica as leader (possible data loss). Config: unclean.leader.election.enable=false (the default) chooses to wait.

min.insync.replicas=1 with RF=2

Scenario: Two replicas, one fails, producer still writes with acks=all. Result: Only one copy exists. If it fails, data is lost. Fix: RF=3, min.insync.replicas=2

The Golden Rule

Replication Factor = 3
min.insync.replicas = 2
acks = all

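This ensures that every acknowledged write exists on at least 2 replicas, so the cluster can lose 1 broker without losing acknowledged data and still accept writes. If a second replica fails, acks=all writes are rejected rather than silently accepted with a single copy.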
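The following simplified model shows how acks and min.insync.replicas combine. It assumes every failed broker held a replica of the partition and dropped out of the ISR. `acknowledged_copies` and `NotEnoughReplicas` are hypothetical names; the exception only stands in for Kafka's NOT_ENOUGH_REPLICAS error.

```python
class NotEnoughReplicas(Exception):
    """Illustrative stand-in for Kafka's NOT_ENOUGH_REPLICAS error."""


def acknowledged_copies(acks, replication_factor, min_insync_replicas, failed_brokers):
    """How many copies an acknowledged write is guaranteed to have (simplified)."""
    isr_size = replication_factor - failed_brokers
    if isr_size <= 0:
        raise NotEnoughReplicas("no live replica: partition is offline")
    if acks == "all":
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                f"ISR={isr_size} < min.insync.replicas={min_insync_replicas}")
        return isr_size  # leader waits until every in-sync replica has the record
    if acks == 1:
        return 1         # only the leader is guaranteed to have it
    return 0             # acks=0: the producer waits for no acknowledgment


golden = [acknowledged_copies("all", 3, 2, failed) for failed in (0, 1)]  # [3, 2]
risky = acknowledged_copies("all", 2, 1, failed_brokers=1)                # 1
```

With the golden rule settings, losing a second broker makes writes fail loudly. With RF=2 and min.insync.replicas=1, the same failure leaves writes "succeeding" on a single copy.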

Interview Questions & Answers

How does Kafka achieve high throughput?

Sequential I/O: Append-only writes, no random disk seeks
OS Page Cache & Zero-Copy: Serves reads from the filesystem cache and sends them to the network without copying through user space (sendfile)
Batching: Groups messages to reduce network/disk overhead
Compression: Reduces network bandwidth
Partitioning: Parallelism across brokers

What happens when a broker fails?

Controller detects broker failure (via heartbeats)
For each partition where failed broker was leader:
- Controller elects new leader from ISR
- Updates metadata in all brokers
Producers/consumers get metadata update and reconnect
Data is safe if replica was in ISR
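The following simplified model shows the controller's choice of a new leader for one partition, including the trade-off behind unclean.leader.election.enable from the data-loss scenarios. The function `elect_leader` is hypothetical; Kafka's real election logic handles more cases.

```python
def elect_leader(replicas, isr, live_brokers, unclean=False):
    """Prefer the first live replica that is in the ISR. Only if unclean
    election is enabled may an out-of-sync replica be chosen."""
    for r in replicas:
        if r in isr and r in live_brokers:
            return r
    if unclean:
        for r in replicas:
            if r in live_brokers:
                return r  # may lose writes that only the old ISR had
    return None           # partition stays offline until an ISR member returns


new_leader = elect_leader([1, 2, 3], isr={1, 2}, live_brokers={2, 3})  # 2
```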

How do you choose the number of partitions?

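Formula: partitions = max(T / P, T / C), where T is the target throughput for the topic, P is the throughput a producer achieves on one partition, and C is the throughput a consumer achieves on one partition.

Considerations: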

More partitions = more parallelism
More partitions = more memory/file handles
Can’t decrease partitions (only increase)
Each partition = at most one consumer in a group

Rule of thumb: Start with 3-6 partitions per topic, scale as needed.
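Here is the formula as a worked calculation, rounding up to whole partitions. The throughput figures in the example are made up for illustration; measure P and C on your own hardware and workload.

```python
import math


def partitions_needed(target_throughput, producer_per_partition, consumer_per_partition):
    """partitions = max(T / P, T / C), rounded up. All arguments use one unit (e.g. MB/s)."""
    return max(math.ceil(target_throughput / producer_per_partition),
               math.ceil(target_throughput / consumer_per_partition))


# Hypothetical numbers: 100 MB/s target, 10 MB/s per partition when producing,
# 20 MB/s per partition when consuming.
n = partitions_needed(100, 10, 20)  # max(10, 5) = 10
```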

What is the difference between a Topic and a Partition?

Topic: Logical category/feed (like a table)
Partition: Physical subdivision of a topic (like a shard)

A topic is split into partitions for:

Scalability (spread across brokers)
Parallelism (multiple consumers)
Ordering (guaranteed within partition)

How does Kafka handle consumer failures?

Consumer stops sending heartbeats
After session.timeout.ms, consumer is considered dead
Rebalance is triggered in the consumer group
Partitions are reassigned to remaining consumers
Each reassigned partition resumes from its last committed offset
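The following simplified rebalance sketch shows failure detection and recovery. Consumers silent for longer than `session.timeout.ms` are dropped (45 s is the default since Kafka 3.0). Partitions are then reassigned round-robin, and each one resumes from its committed offset. `rebalance` is hypothetical. It starts uncommitted partitions at offset 0, whereas real consumers apply `auto.offset.reset` instead.

```python
def rebalance(partitions, last_heartbeat_ms, committed, now_ms, session_timeout_ms=45_000):
    """Return {consumer: {partition: start_offset}} for the surviving consumers."""
    alive = sorted(c for c, hb in last_heartbeat_ms.items()
                   if now_ms - hb <= session_timeout_ms)
    if not alive:
        return {}
    plan = {c: {} for c in alive}
    for i, p in enumerate(sorted(partitions)):
        plan[alive[i % len(alive)]][p] = committed.get(p, 0)  # partition -> start offset
    return plan


plan = rebalance(
    partitions=[0, 1, 2],
    last_heartbeat_ms={"c1": 100_000, "c2": 40_000},  # c2 has been silent for 60 s
    committed={0: 42, 1: 7},
    now_ms=100_000,
)
# {'c1': {0: 42, 1: 7, 2: 0}}
```

Records after a committed offset that c2 had already processed are processed again by c1. This is why committing after processing yields at-least-once delivery.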

What is the Controller in Kafka?

One broker is elected as the Controller:

Monitors broker liveness
Elects partition leaders when brokers fail
Updates cluster metadata
Manages partition reassignments

In KRaft mode, there’s a quorum of controllers using Raft consensus.

Common Pitfalls

1. Too Few Partitions: Caps consumer parallelism, since each partition is consumed by at most one consumer per group. Start with enough partitions.
2. acks=1 in Production: Risk of data loss. Use acks=all for critical data.
3. Single Consumer for High Volume: One consumer must process every partition by itself, so throughput is capped by a single process. Use a consumer group with several consumers (up to one per partition).
4. Ignoring Consumer Lag: Growing lag means consumers are falling behind producers. Monitor and alert on lag.
5. Auto-Create Topics in Production: Set auto.create.topics.enable=false; otherwise a typo in a topic name silently creates an unwanted topic.

Installation & Quick Start

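Docker (Recommended)

Using Confluent Platform images (this setup runs Kafka in ZooKeeper mode; the images also bundle the Kafka CLI tools):

# docker-compose.yml
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.3.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  broker:
    image: confluentinc/cp-kafka:7.3.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: 'zookeeper:2181'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      # localhost:9092 for clients on the host, broker:29092 for other containers
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://broker:29092
      # Single broker: internal topics cannot have more than one replica
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1

Run: docker-compose up -d

The bin/*.sh commands below assume a local install. Inside the Confluent containers the same tools are on the PATH without the .sh suffix (e.g., docker-compose exec broker kafka-topics --list --bootstrap-server localhost:9092).

Local Install

Download Kafka from apache.org.
Extract the archive.
Start ZooKeeper (ZooKeeper mode, Kafka 3.x only): bin/zookeeper-server-start.sh config/zookeeper.properties
Start the Kafka broker: bin/kafka-server-start.sh config/server.properties

Kafka 4.0 and later run only in KRaft mode. There, the storage directory must first be formatted with bin/kafka-storage.sh, as described in the quickstart for your version.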

CLI Power User Commands

Topic Management

# Create a topic with 3 partitions and replication factor of 1
bin/kafka-topics.sh --create \
    --topic user-events \
    --bootstrap-server localhost:9092 \
    --partitions 3 \
    --replication-factor 1

# List topics
bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Describe topic details (leader, replicas, ISR)
bin/kafka-topics.sh --describe \
    --topic user-events \
    --bootstrap-server localhost:9092

# Delete topic
bin/kafka-topics.sh --delete \
    --topic user-events \
    --bootstrap-server localhost:9092

Producing & Consuming

# Console Producer (Type messages and hit Enter)
bin/kafka-console-producer.sh \
    --topic user-events \
    --bootstrap-server localhost:9092 \
    --property "parse.key=true" \
    --property "key.separator=:"

# Example input:
# user1:login
# user2:logout

# Console Consumer (Read from beginning)
bin/kafka-console-consumer.sh \
    --topic user-events \
    --from-beginning \
    --bootstrap-server localhost:9092 \
    --property "print.key=true"

Consumer Groups

# List consumer groups
bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092

# Describe group (See lag, current offset)
bin/kafka-consumer-groups.sh --describe \
    --group my-group \
    --bootstrap-server localhost:9092

Key Takeaways

Topics are logs of events, divided into Partitions.
Partitions allow Kafka to scale and guarantee ordering within each partition.
Brokers form a cluster to provide durability and availability.
Consumer Groups allow parallel processing of topics.
Kafka is pull-based and stores data for a set retention period.
