# Kafka Crash Course

> Master event streaming - the backbone of real-time data pipelines

<Frame>
  <img src="https://mintcdn.com/devweeekends/emzPt-9B_R8UKdqm/images/courses/devops-tools/kafka-architecture.svg?fit=max&auto=format&n=emzPt-9B_R8UKdqm&q=85&s=e568470c2dda7345bb0706983895fa43" alt="Kafka Architecture Overview" width="1080" height="1080" data-path="images/courses/devops-tools/kafka-architecture.svg" />
</Frame>

> **"Kafka is the central nervous system for real-time data."** - Jay Kreps, Co-creator of Kafka

When LinkedIn needed to handle billions of events per day, traditional message queues collapsed under the load. So they built Kafka -- a distributed commit log designed from the ground up for high-throughput, fault-tolerant event streaming. Today, it powers the real-time pipelines at Netflix, Uber, LinkedIn, and thousands of other companies. This course takes you from event-curious to production-ready, covering everything from the fundamentals of topics and partitions to the operational reality of running Kafka clusters at scale.

***

## Why Kafka Matters

<CardGroup cols={2}>
  <Card title="High Throughput" icon="gauge-high">
    Millions of messages per second
  </Card>

  <Card title="Scalability" icon="arrows-maximize">
    Horizontal scaling, distributed by design
  </Card>

  <Card title="Durability" icon="hard-drive">
    Messages persist on disk, replicated
  </Card>

  <Card title="Real-Time" icon="bolt">
    Process streams in real-time
  </Card>
</CardGroup>

***

## The Story Behind Kafka

**2011**: LinkedIn created Kafka to handle their massive data pipeline needs.

**The Problem**:

* Traditional messaging couldn't handle LinkedIn's scale
* Needed to process billions of events daily
* Real-time analytics requirements
* Data integration across systems

**The Solution**: Apache Kafka

* Distributed, partitioned, replicated log
* High throughput (millions of messages/sec)
* Horizontal scalability
* Fault-tolerant and durable

**Today**: Kafka powers:

* **LinkedIn**: 7+ trillion messages/day
* **Netflix**: Real-time recommendations
* **Uber**: Trip data and analytics
* **Airbnb**: Payment processing
* **Twitter**: Real-time analytics

**Open Sourced**: 2011; entered the Apache Incubator the same year and became a top-level Apache project in 2012

***

## What You'll Learn

<Steps>
  <Step title="Fundamentals">
    Topics, partitions, brokers, producers, consumers. The distributed log abstraction that makes Kafka special.
    [Start Here](/courses/devops-tools/kafka-fundamentals)
  </Step>

  <Step title="Internals Deep Dive">
    Log segments, ISR mechanics, leader election, consumer rebalancing. If you love understanding how things actually work, this one is for you.
    [Explore Internals](/courses/devops-tools/kafka-internals)
  </Step>

  <Step title="Producers and Consumers">
    Publishing messages, consuming streams, serialization, idempotency. The APIs you will use every day.
    [Learn APIs](/courses/devops-tools/kafka-producers-consumers)
  </Step>

  <Step title="Stream Processing">
    Kafka Streams API, transformations, aggregations, joins. Real-time processing without Spark or Flink.
    [Process Streams](/courses/devops-tools/kafka-streams)
  </Step>

  <Step title="Operations">
    Clustering, replication tuning, monitoring, capacity planning. Running Kafka in production.
    [Run in Production](/courses/devops-tools/kafka-operations)
  </Step>

  <Step title="Ecosystem">
    Kafka Connect, Schema Registry, ksqlDB. The tools that make Kafka a complete platform.
    [Explore Ecosystem](/courses/devops-tools/kafka-ecosystem)
  </Step>
</Steps>

***

## Kafka vs RabbitMQ

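A common question in system design interviews is "When would you choose Kafka over RabbitMQ?" The short answer: Kafka is for events (things that happened), RabbitMQ is for commands (things you want to happen). If you need to replay last week's events for a new consumer, choose Kafka. If you need to distribute tasks to workers so that each task goes to one worker and is removed once acknowledged, choose RabbitMQ. RabbitMQ acknowledgements give at-least-once delivery rather than exactly-once, so a task can be redelivered after a failure and handlers should be idempotent.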

| Feature               | Kafka                               | RabbitMQ                                       |
| --------------------- | ----------------------------------- | ---------------------------------------------- |
| **Use Case**          | Event streaming, logs, CDC          | Task queues, RPC, complex routing              |
| **Throughput**        | Very high (millions/sec)            | High (thousands/sec)                           |
| **Message Retention** | Configurable (days/weeks/forever)   | Until consumed and acknowledged                |
| **Ordering**          | Per partition                       | Per queue                                      |
| **Consumers**         | Pull model (consumer controls pace) | Push model (broker delivers, prefetch-limited) |
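
The retention and consumer rows are easier to see in code. The sketch below is a toy in-memory model, not a Kafka or RabbitMQ client: `RetainedLog` and `AckQueue` are hypothetical names invented for illustration. It assumes a single partition and a single queue, and it shows only the difference between reading by offset (records stay, so they can be replayed) and deliver-then-acknowledge (messages are removed).

```python
from collections import deque


class RetainedLog:
    """Toy stand-in for one Kafka partition: records stay after being read."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset, max_records=10):
        """Pull model: the consumer picks the offset and the batch size."""
        return self._records[offset:offset + max_records]


class AckQueue:
    """Toy stand-in for a work queue: a message is gone once acknowledged."""

    def __init__(self):
        self._pending = deque()

    def publish(self, message):
        self._pending.append(message)

    def deliver_and_ack(self):
        # Hypothetical helper: delivery and acknowledgement collapsed into one step.
        return self._pending.popleft() if self._pending else None


log = RetainedLog()
queue = AckQueue()
for event in ["signup", "login", "purchase"]:
    log.append(event)
    queue.publish(event)

# Two independent readers see the same records; a late joiner replays from offset 0.
analytics_view = log.read_from(0)
late_joiner_view = log.read_from(0)

# Queue messages are handed out once; afterwards nothing is left to replay.
worker_messages = [queue.deliver_and_ack() for _ in range(3)]
leftover = queue.deliver_and_ack()
```

In a real deployment the log would also expire records according to its retention settings, and a queue would redeliver unacknowledged messages; both are left out here to keep the contrast small.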

***

## Course Structure

### Module 1: Fundamentals (2-3 hours)

The distributed commit log, topics, partitions, brokers, offsets. Understanding why Kafka is different from traditional message queues.
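
As a preview of how topics, partitions, and offsets fit together, here is a minimal in-memory sketch. It is not the Kafka client API: the `Topic` class and its methods are hypothetical, and CRC32 is an assumed stand-in for whatever hash a real partitioner uses. The point it illustrates is that records with the same key land in the same partition, so their relative order is preserved there.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32 stands in for the real partitioner's hash function.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        """Append to the partition chosen by the key; return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic("orders", num_partitions=3)
placements = [
    topic.send("user-1", "created"),
    topic.send("user-2", "created"),
    topic.send("user-1", "paid"),
    topic.send("user-1", "shipped"),
]

# All "user-1" records share one partition, in the order they were sent.
user1_partition = topic.partition_for("user-1")
user1_events = [v for k, v in topic.partitions[user1_partition] if k == "user-1"]
```

Ordering is only guaranteed within a partition, which is why the choice of key matters: events that must be seen in sequence should share a key.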

### Module 2: Internals Deep Dive (2-3 hours)

Log segments and indexes, ISR and replication mechanics, leader election, consumer group coordination, ZooKeeper vs KRaft. **If you love internals, continue. If not, skip to Module 3.**

### Module 3: Producers and Consumers (2-3 hours)

Producer batching and compression, consumer groups and rebalancing, exactly-once semantics, offset management.

### Module 4: Stream Processing (2 hours)

Kafka Streams API, stateless transformations, stateful processing, windowing, joins. Stream processing without the complexity of Spark.

### Module 5: Operations (2 hours)

Cluster sizing, replication factor tuning, monitoring with JMX, capacity planning, performance tuning.

### Module 6: Ecosystem (1-2 hours)

Kafka Connect for data integration, Schema Registry for schema evolution, ksqlDB for SQL-based stream processing.

***

Ready to master Kafka? Start with [Kafka Fundamentals](/courses/devops-tools/kafka-fundamentals) or jump to [Internals Deep Dive](/courses/devops-tools/kafka-internals) if you want to understand the distributed log that powers trillions of events per day.
