Kafka Crash Course
“Kafka is the central nervous system for real-time data.” - Jay Kreps, Co-creator of Kafka
When LinkedIn needed to handle billions of events per day, traditional message queues collapsed under the load. So they built Kafka: a distributed commit log designed from the ground up for high-throughput, fault-tolerant event streaming. Today it powers the real-time pipelines at Netflix, Uber, LinkedIn, and thousands of other companies. This course takes you from event-curious to production-ready, covering everything from the fundamentals of topics and partitions to the operational reality of running Kafka clusters at scale.
Why Kafka Matters
High Throughput
Millions of messages per second
Scalability
Horizontal scaling, distributed by design
Durability
Messages are persisted to disk and replicated across brokers
Real-Time
Process events as they arrive, with low latency
The Story Behind Kafka
2011: LinkedIn open-sourced Kafka, which it had built to handle its massive data pipeline needs.
The Problem:
- Traditional messaging couldn’t handle LinkedIn’s scale
- Needed to process billions of events daily
- Real-time analytics requirements
- Data integration across systems
The Solution:
- Distributed, partitioned, replicated log
- High throughput (millions of messages/sec)
- Horizontal scalability
- Fault-tolerant and durable
The Impact:
- LinkedIn: 7+ trillion messages/day
- Netflix: Real-time recommendations
- Uber: Trip data and analytics
- Airbnb: Payment processing
- Twitter: Real-time analytics
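To make “distributed, partitioned, replicated log” a little more concrete, here is a minimal in-memory sketch of the partitioned part only (no brokers, no replication, no networking). It illustrates why Kafka's ordering guarantee is per partition: records with the same key always map to the same partition, where they receive increasing offsets. `ToyTopic` and its methods are hypothetical names, not Kafka's client API, and `zlib.crc32` is an assumed stand-in for the murmur2 hash that Kafka's Java producer applies to record keys by default.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Hypothetical in-memory topic: a fixed number of append-only partition logs."""

    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        # Each partition is a list; a record's index in that list is its offset.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Deterministic key -> partition mapping (crc32 stands in for murmur2).
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def append(self, key: str, value: str) -> tuple:
        # Append to the end of the key's partition; return (partition, offset).
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, offset: int) -> list:
        # Reading does not remove records, so any offset can be read again.
        return self.partitions[partition][offset:]


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.append("user-42", f"event-{i}")
topic.append("user-7", "login")

p = topic.partition_for("user-42")
print([v for k, v in topic.read(p, 0) if k == "user-42"])  # ['event-0', 'event-1', 'event-2']
```

Records with different keys may land in different partitions, so there is no ordering guarantee across keys. That trade-off is what allows partitions to be spread across brokers and consumed in parallel.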
What You’ll Learn
Fundamentals
Topics, partitions, brokers, producers, consumers. The distributed log abstraction that makes Kafka special.
Internals Deep Dive
Log segments, ISR mechanics, leader election, consumer rebalancing. If you love understanding how things actually work, this one is for you.
Producers and Consumers
Publishing messages, consuming streams, serialization, idempotency. The APIs you will use every day.
Stream Processing
Kafka Streams API, transformations, aggregations, joins. Real-time processing as a library inside your application, with no separate Spark or Flink cluster to run.
Operations
Clustering, replication tuning, monitoring, capacity planning. Running Kafka in production.
Ecosystem
Kafka Connect, Schema Registry, ksqlDB. The tools that make Kafka a complete platform.
Kafka vs RabbitMQ
The most common question in system design interviews: “When would you choose Kafka over RabbitMQ?” The short answer: Kafka is for events (things that happened), RabbitMQ is for commands (things you want to happen). If you need to replay last week’s events for a new consumer, choose Kafka. If you need to distribute tasks across workers so that each task is handled by exactly one worker and removed once it is acknowledged, choose RabbitMQ.
| Feature | Kafka | RabbitMQ |
|---|---|---|
| Use Case | Event streaming, logs, CDC | Task queues, RPC, complex routing |
| Throughput | Very high (millions/sec) | High (tens of thousands/sec) |
| Message Retention | Configurable (days/weeks/forever) | Until consumed and acknowledged |
| Ordering | Per partition | Per queue |
| Consumers | Pull model (consumer controls pace) | Push model (broker delivers, limited by consumer prefetch) |
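The replay-versus-consume-once distinction can be sketched with two toy classes. `ToyLog` keeps every record and lets each consumer group pull batches from its own offset; `ToyQueue` hands each message to a single worker and deletes it on acknowledgement. Both are hypothetical illustrations, not Kafka or RabbitMQ client APIs. For simplicity, a new group here starts at offset 0 (similar to setting `auto.offset.reset=earliest`; Kafka's default is `latest`), and offsets are committed as soon as a batch is returned.

```python
from collections import deque
from typing import Optional


class ToyLog:
    """Hypothetical Kafka-like log: records are retained, each group tracks an offset."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self.committed: dict[str, int] = {}  # consumer group -> next offset to read

    def append(self, record: str) -> None:
        self.records.append(record)

    def poll(self, group: str, max_records: int = 10) -> list[str]:
        # Pull model: the group asks for the next batch from its own position.
        start = self.committed.get(group, 0)  # simplification: new groups start at 0
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)  # simplification: commit immediately
        return batch


class ToyQueue:
    """Hypothetical RabbitMQ-like queue: a message is deleted once a worker acks it."""

    def __init__(self) -> None:
        self.messages: deque[str] = deque()

    def publish(self, message: str) -> None:
        self.messages.append(message)

    def take_and_ack(self) -> Optional[str]:
        # Each message goes to exactly one worker and is then gone.
        return self.messages.popleft() if self.messages else None


log, queue = ToyLog(), ToyQueue()
for order in ["order-1", "order-2", "order-3"]:
    log.append(order)
    queue.publish(order)

print(log.poll("billing"))    # ['order-1', 'order-2', 'order-3']
print(log.poll("billing"))    # [] (billing is caught up)
print(log.poll("analytics"))  # ['order-1', 'order-2', 'order-3'] (new group replays)
print([queue.take_and_ack() for _ in range(3)])  # ['order-1', 'order-2', 'order-3']
print(queue.take_and_ack())   # None (nothing left for a late worker)
```

The toy log never deletes anything. Real Kafka topics discard old log segments according to their retention settings, which is what the “Configurable” retention row refers to.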
Course Structure
Module 1: Fundamentals (2-3 hours)
The distributed commit log, topics, partitions, brokers, offsets. Understanding why Kafka is different from traditional message queues.
Module 2: Internals Deep Dive (2-3 hours)
Log segments and indexes, ISR and replication mechanics, leader election, consumer group coordination, ZooKeeper vs KRaft. If you love internals, continue. If not, skip to Module 3.
Module 3: Producers and Consumers (2-3 hours)
Producer batching and compression, consumer groups and rebalancing, exactly-once semantics, offset management.
Module 4: Stream Processing (2 hours)
Kafka Streams API, stateless transformations, stateful processing, windowing, joins. Stream processing without the complexity of Spark.
Module 5: Operations (2 hours)
Cluster sizing, replication factor tuning, monitoring with JMX, capacity planning, performance tuning.
Module 6: Ecosystem (1-2 hours)
Kafka Connect for data integration, Schema Registry for schema evolution, ksqlDB for SQL-based stream processing.
Ready to master Kafka? Start with Kafka Fundamentals, or jump to Internals Deep Dive if you want to understand the distributed log that powers trillions of events per day.