Kafka Architecture Overview

Kafka Crash Course

“Kafka is the central nervous system for real-time data.” - Jay Kreps, Co-creator of Kafka
When LinkedIn needed to handle billions of events per day, traditional message queues collapsed under the load. So its engineers built Kafka, a distributed commit log designed from the ground up for high-throughput, fault-tolerant event streaming. Today, it powers real-time pipelines at Netflix, Uber, LinkedIn, and thousands of other companies. This course takes you from event-curious to production-ready, covering everything from the fundamentals of topics and partitions to the operational reality of running Kafka clusters at scale.

Why Kafka Matters

High Throughput

Millions of messages per second

Scalability

Horizontal scaling, distributed by design

Durability

Messages persist on disk and are replicated across brokers

Real-Time

Process streams in real time

The Story Behind Kafka

2011: LinkedIn created Kafka to handle its massive data pipeline needs.
The Problem:
  • Traditional messaging couldn’t handle LinkedIn’s scale
  • Needed to process billions of events daily
  • Real-time analytics requirements
  • Data integration across systems
The Solution: Apache Kafka
  • Distributed, partitioned, replicated log
  • High throughput (millions of messages/sec)
  • Horizontal scalability
  • Fault-tolerant and durable
Today: Kafka powers:
  • LinkedIn: 7+ trillion messages/day
  • Netflix: Real-time recommendations
  • Uber: Trip data and analytics
  • Airbnb: Payment processing
  • Twitter: Real-time analytics
Open Sourced: 2011; entered the Apache Incubator that year and became a top-level Apache project in 2012

What You’ll Learn

1. Fundamentals

Topics, partitions, brokers, producers, consumers. The distributed log abstraction that makes Kafka special.

2. Internals Deep Dive

Log segments, ISR mechanics, leader election, consumer rebalancing. If you love understanding how things actually work, this one is for you.

3. Producers and Consumers

Publishing messages, consuming streams, serialization, idempotency. The APIs you will use every day.

4. Stream Processing

Kafka Streams API, transformations, aggregations, joins. Real-time processing without Spark or Flink.

5. Operations

Clustering, replication tuning, monitoring, capacity planning. Running Kafka in production.

6. Ecosystem

Kafka Connect, Schema Registry, ksqlDB. The tools that make Kafka a complete platform.

Kafka vs RabbitMQ

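A common question in system design interviews: “When would you choose Kafka over RabbitMQ?” The short answer: Kafka is for events (things that happened), RabbitMQ is for commands (things you want to happen). If you need to replay last week’s events for a new consumer, choose Kafka. If you need to distribute tasks to workers so that each task goes to one worker and is redelivered if that worker fails before acknowledging it, choose RabbitMQ. Neither system gives you exactly-once task execution for free: RabbitMQ acknowledgements provide at-least-once delivery, so task handlers should be idempotent.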
| Feature | Kafka | RabbitMQ |
| --- | --- | --- |
| Use Case | Event streaming, logs, CDC | Task queues, RPC, complex routing |
| Throughput | Very high (millions/sec) | High (thousands/sec) |
| Message Retention | Configurable (days/weeks/forever) | Until consumed and acknowledged |
| Ordering | Per partition | Per queue |
| Consumers | Pull model (consumer controls pace) | Push model (broker controls pace) |
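
The retention and pull-model rows are easier to see in a toy model. The following is a minimal in-memory sketch, not either system's real API: `RetainedLog` and `AckQueue` are hypothetical classes invented here to contrast "records stay, consumers track offsets" with "messages disappear once acknowledged".

```python
from collections import deque


class RetainedLog:
    """Kafka-style: records stay in the log; each consumer tracks its own offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        self._records.append(value)
        return len(self._records) - 1  # offset of the record just written

    def poll(self, offset, max_records=100):
        # Pull model: the consumer decides where to read from and how much.
        return self._records[offset:offset + max_records]


class AckQueue:
    """Queue-style: a message is removed once a consumer acknowledges it."""

    def __init__(self):
        self._pending = deque()

    def publish(self, value):
        self._pending.append(value)

    def consume_and_ack(self):
        # Popping models "delivered and acknowledged": the message is gone afterwards.
        return self._pending.popleft() if self._pending else None


log = RetainedLog()
queue = AckQueue()
for event in ["signup", "login", "purchase"]:
    log.append(event)
    queue.publish(event)

first_reader = log.poll(0)
late_reader = log.poll(0)               # a consumer added later replays from offset 0
drained = [queue.consume_and_ack() for _ in range(3)]
after_drain = queue.consume_and_ack()   # nothing is left for a late consumer
```

Both readers of the log see the same three events, while the queue has nothing to offer once its messages have been acknowledged. That difference is what makes replay possible on one side and simple work distribution natural on the other.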

Course Structure

Module 1: Fundamentals (2-3 hours)

The distributed commit log, topics, partitions, brokers, offsets. Understanding why Kafka is different from traditional message queues.
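
As a rough illustration of how topics, partitions, and offsets fit together, here is a minimal sketch in plain Python. The `Topic` class and the `page-views` name are hypothetical, and `zlib.crc32` is only a stand-in hash chosen for this example; real Kafka clients ship their own partitioners.

```python
import zlib


class Topic:
    """A toy topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash (assumption): the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # Offsets are positions within one partition, not across the topic.
        return self.partitions[partition][offset:]


topic = Topic("page-views", num_partitions=3)
placements = [
    topic.send("user-1", "/home"),
    topic.send("user-2", "/pricing"),
    topic.send("user-1", "/cart"),
    topic.send("user-1", "/checkout"),
]

user1_partition = topic.partition_for("user-1")
user1_values = [v for k, v in topic.read(user1_partition, 0) if k == "user-1"]
```

Because every record for `user-1` lands in the same partition, a reader of that partition sees that user's events in the order they were sent. No ordering is implied between different partitions.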

Module 2: Internals Deep Dive (2-3 hours)

Log segments and indexes, ISR and replication mechanics, leader election, consumer group coordination, ZooKeeper vs KRaft. If you love internals, continue. If not, skip to Module 3.
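
To make "log segments and indexes" a little more concrete, the sketch below models one partition as a list of segments, each named by its base offset and carrying a sparse offset index. It is a simplified illustration under stated assumptions: the class names, `segment_size`, and `index_interval` are invented for this example, and in-memory lists stand in for the on-disk files.

```python
from bisect import bisect_right


class Segment:
    def __init__(self, base_offset, index_interval=4):
        self.base_offset = base_offset
        self.records = []          # stand-in for the segment's data file
        self.index_offsets = []    # sparse index: only every Nth offset is recorded
        self.index_positions = []  # position in `records` for each indexed offset
        self.index_interval = index_interval

    def append(self, value):
        position = len(self.records)
        offset = self.base_offset + position
        if position % self.index_interval == 0:
            self.index_offsets.append(offset)
            self.index_positions.append(position)
        self.records.append((offset, value))
        return offset

    def read(self, offset):
        # Jump to the nearest indexed entry at or before the target, then scan forward.
        i = bisect_right(self.index_offsets, offset) - 1
        if i < 0:
            return None
        for record_offset, value in self.records[self.index_positions[i]:]:
            if record_offset == offset:
                return value
        return None


class PartitionLog:
    def __init__(self, segment_size=10):
        self.segment_size = segment_size
        self.segments = [Segment(0)]

    def append(self, value):
        active = self.segments[-1]
        if len(active.records) >= self.segment_size:
            # Roll: start a new segment whose base offset is the next offset.
            active = Segment(active.base_offset + len(active.records))
            self.segments.append(active)
        return active.append(value)

    def read(self, offset):
        # Segments are sorted by base offset, so a binary search finds the right one.
        bases = [s.base_offset for s in self.segments]
        i = bisect_right(bases, offset) - 1
        return self.segments[i].read(offset) if i >= 0 else None


log = PartitionLog(segment_size=10)
for n in range(25):
    log.append(f"m{n}")

base_offsets = [s.base_offset for s in log.segments]
found = log.read(17)
missing = log.read(25)
```

The two-step lookup (binary search for the segment, then a sparse index plus a short scan) is the idea to take away. Sizes here are counted in records only to keep the sketch short.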

Module 3: Producers and Consumers (2-3 hours)

Producer batching and compression, consumer groups and rebalancing, exactly-once semantics, offset management.
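
Consumer groups, rebalancing, and offset management can be previewed with a small sketch. This is a simplified round-robin style assignment written for illustration, not the client library's assignor code; the function names, consumer names, and committed offsets are all made up.

```python
def assign_round_robin(partitions, members):
    """Deal partitions out to group members one at a time, in sorted order."""
    ordered = sorted(members)
    assignment = {member: [] for member in ordered}
    for i, partition in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(partition)
    return assignment


def resume_positions(assignment, member, committed):
    """A member starts each assigned partition at the group's committed offset."""
    return {p: committed.get(p, 0) for p in assignment[member]}


partitions = [0, 1, 2, 3]
committed = {0: 5, 1: 3, 2: 0, 3: 7}   # next offset to read, per partition

# Two consumers share the partitions; then one leaves and the group rebalances.
before = assign_round_robin(partitions, ["consumer-a", "consumer-b"])
after = assign_round_robin(partitions, ["consumer-a"])
positions = resume_positions(after, "consumer-a", committed)
```

After the rebalance, `consumer-a` picks up partitions 1 and 3 from their committed offsets. If `consumer-b` had processed a record but crashed before committing, that record would be read again, which is why committing after processing gives at-least-once behavior.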

Module 4: Stream Processing (2 hours)

Kafka Streams API, stateless transformations, stateful processing, windowing, joins. Stream processing without the complexity of Spark.
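
Windowing is the least intuitive item in that list, so here is a minimal sketch of a tumbling-window count in plain Python. It does not use the Kafka Streams API (which is a Java library); the function name, the event tuples, and the 60-second window are assumptions chosen for illustration.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_ms):
    """Count events per (key, window), where windows are fixed and non-overlapping."""
    counts = defaultdict(int)
    for key, timestamp_ms in events:
        # Each timestamp belongs to exactly one window, identified by its start time.
        window_start = (timestamp_ms // window_ms) * window_ms
        counts[(key, window_start)] += 1
    return dict(counts)


events = [
    ("home", 1_000),
    ("home", 4_000),
    ("cart", 4_500),
    ("home", 61_000),   # falls into the second one-minute window
]
counts = tumbling_window_counts(events, window_ms=60_000)
```

A real stream processor also has to keep this state durable and decide what to do with late events. The sketch only shows how timestamps are bucketed.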

Module 5: Operations (2 hours)

Cluster sizing, replication factor tuning, monitoring with JMX, capacity planning, performance tuning.
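
A first-pass disk estimate for capacity planning is plain arithmetic: ingest rate times retention times replication factor. The sketch below works through one hypothetical workload. All numbers, the function names, and the 30% headroom figure are assumptions for illustration, and the estimate ignores compression and index overhead.

```python
SECONDS_PER_DAY = 86_400


def required_storage_bytes(messages_per_sec, avg_message_bytes,
                           retention_days, replication_factor):
    """Total bytes retained across the cluster, counting every replica."""
    ingest_bytes_per_sec = messages_per_sec * avg_message_bytes
    retained_bytes = ingest_bytes_per_sec * retention_days * SECONDS_PER_DAY
    return retained_bytes * replication_factor


def per_broker_bytes(total_bytes, broker_count, headroom=0.3):
    """Disk per broker if `headroom` (a fraction) of each disk is kept free."""
    return total_bytes / broker_count / (1 - headroom)


# Hypothetical workload: 50,000 msgs/sec of 1 KB each, 7 days retention, RF 3.
total = required_storage_bytes(50_000, 1_000, 7, 3)
per_broker = per_broker_bytes(total, broker_count=6)

print(f"cluster total: {total / 1e12:.2f} TB")
print(f"per broker (6 brokers, 30% headroom): {per_broker / 1e12:.2f} TB")
```

For this workload the estimate comes to about 90.72 TB across the cluster, or roughly 21.6 TB of disk per broker once headroom is included. Treat it as a starting point to be checked against measured traffic.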

Module 6: Ecosystem (1-2 hours)

Kafka Connect for data integration, Schema Registry for schema evolution, ksqlDB for SQL-based stream processing.
Ready to master Kafka? Start with Kafka Fundamentals or jump to Internals Deep Dive if you want to understand the distributed log that powers trillions of events per day.