If you love understanding how things actually work, this chapter is for you. If you just want to send and receive messages, feel free to skip ahead. No judgment.
This chapter takes you inside RabbitMQ. We will explore how Erlang enables RabbitMQ’s reliability, understand the complete message flow, and demystify clustering and high availability. This knowledge is what allows you to build truly resilient messaging systems.
If a queue process crashes, its supervisor restarts it. If crashes keep recurring, the failure escalates up the supervision tree and the whole subtree may be restarted. This “let it crash” philosophy is a large part of why RabbitMQ is remarkably stable.
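The restart-and-escalate behaviour can be sketched in a few lines of Python. This is a toy model, not OTP or RabbitMQ code: the `Supervisor` class, its `max_restarts` and `window_seconds` parameters, and the `"restarted"` / `"escalate"` return values are all names invented for illustration. It assumes a one-for-one strategy with a restart-intensity limit, which is how OTP supervisors decide when to stop restarting a child and give up.

```python
class Supervisor:
    """Toy one-for-one supervisor with a restart-intensity limit (illustrative only)."""

    def __init__(self, max_restarts: int, window_seconds: float):
        self.max_restarts = max_restarts
        self.window = window_seconds
        self.restart_times = []   # timestamps of recent restarts
        self.children = {}        # name -> (factory, current instance)
        self.alive = True

    def start_child(self, name, factory):
        self.children[name] = (factory, factory())

    def child_crashed(self, name, now: float) -> str:
        # Keep only the restarts that fall inside the sliding window.
        self.restart_times = [t for t in self.restart_times if now - t <= self.window]
        self.restart_times.append(now)
        if len(self.restart_times) > self.max_restarts:
            # Too many restarts in the window: give up and let the parent decide.
            self.alive = False
            self.children.clear()
            return "escalate"
        # Otherwise replace only the crashed child with a fresh instance.
        factory, _ = self.children[name]
        self.children[name] = (factory, factory())
        return "restarted"


sup = Supervisor(max_restarts=2, window_seconds=5.0)
sup.start_child("queue", dict)  # dict stands in for a queue process's state
print(sup.child_crashed("queue", now=0.0))
print(sup.child_crashed("queue", now=1.0))
print(sup.child_crashed("queue", now=2.0))
```

An occasional crash is absorbed locally, while a crash loop is treated as a sign that the problem lies higher up and is handed to the parent supervisor.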
Routing Key: "orders.us.new"
Binding: "orders.#"     -> MATCH (# = zero or more words)
Binding: "orders.*.new" -> MATCH (* = exactly one word)
Binding: "orders.eu.*"  -> NO MATCH (eu != us)
Binding: "*.us.*"       -> MATCH
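To make the wildcard rules concrete, here is a minimal sketch of topic matching in Python. The function name `topic_matches` is hypothetical, and the naive recursion is chosen for readability; it is not how the broker implements matching internally.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key.

    '*' matches exactly one word, '#' matches zero or more words.
    """
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            # Pattern exhausted: match only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            # Key exhausted but the pattern still needs a word.
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


key = "orders.us.new"
for binding in ["orders.#", "orders.*.new", "orders.eu.*", "*.us.*"]:
    print(binding, "->", "MATCH" if topic_matches(binding, key) else "NO MATCH")
```

Because `#` can match zero words, `orders.#` also matches the bare key `orders`, whereas `orders.*` does not.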
RabbitMQ's Erlang processes apply credit-based flow control to one another:
Connection Process ──[credits]──▶ Channel Process ──[credits]──▶ Queue Process

When credits run out:
- Upstream process blocks
- Wait for credits to replenish
- This propagates backpressure to publishers
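A single hop of this scheme can be modelled roughly as follows. The `CreditedLink` class and the numbers used (4 initial credits, replenished in batches of 2) are assumptions picked for illustration; they are not RabbitMQ's actual data structures or defaults.

```python
from collections import deque


class CreditedLink:
    """One sender -> receiver hop guarded by credits (illustrative only)."""

    def __init__(self, initial_credit: int, grant_after: int):
        self.credit = initial_credit     # messages the sender may still send
        self.grant_after = grant_after   # receiver grants credit in batches of this size
        self.inbox = deque()             # messages waiting at the receiver
        self._processed_since_grant = 0

    @property
    def blocked(self) -> bool:
        return self.credit == 0

    def send(self, msg) -> bool:
        """Sender side: returns False (and sends nothing) when out of credit."""
        if self.blocked:
            return False
        self.credit -= 1
        self.inbox.append(msg)
        return True

    def process_one(self) -> None:
        """Receiver side: handle one message; grant credit after each full batch."""
        if not self.inbox:
            return
        self.inbox.popleft()
        self._processed_since_grant += 1
        if self._processed_since_grant == self.grant_after:
            self.credit += self.grant_after
            self._processed_since_grant = 0


link = CreditedLink(initial_credit=4, grant_after=2)
sent = [link.send(n) for n in range(5)]  # the fifth send is refused
print(sent, "blocked:", link.blocked)
link.process_one()
link.process_one()                        # a full batch is done, credit is granted
print("blocked:", link.blocked)
```

Chaining several such links gives the backpressure effect: when the queue hop stops granting credit, the channel hop runs dry next, then the connection, and finally the publisher's socket stops being read.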
Answer: Three things must be in place: 1) the queue is declared with durable=true, so it survives a broker restart, 2) messages are published as persistent (delivery_mode=2), so they are written to disk, 3) publisher confirms are enabled, so the publisher knows when the broker has taken responsibility for the message. For HA, use quorum queues (Raft consensus); classic mirrored queues are the older, deprecated alternative. Even with all this, a message can still be lost if the consumer acks it before it has finished processing it.
What is the difference between quorum and mirrored queues?
Answer: Mirrored queues use synchronous replication (all mirrors must sync before ack), which is slow and complex. Quorum queues use Raft consensus (a majority must agree), which is safer and handles partitions better. Quorum queues are the recommended approach for HA in RabbitMQ 3.8+. Mirrored queues are deprecated and were removed in RabbitMQ 4.0.
Explain prefetch and why it matters
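Answer: Prefetch (QoS) limits the number of unacknowledged messages outstanding per consumer, and only applies when manual acknowledgements are used. The default is unlimited, which is dangerous: one consumer can be handed the entire backlog. With prefetch=10, the broker delivers at most 10 messages to a consumer and sends more only as earlier ones are acked. Benefits: 1) Load balancing - slow consumers get fewer messages, 2) Memory control - limits messages in flight, 3) Fairness - no consumer hogs the queue. Set via basic.qos(prefetch_count=N).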
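The load-balancing effect is easier to see in a small simulation. The sketch below is a simplified model, not broker code: `Consumer` and `dispatch` are hypothetical names, delivery is plain round-robin, and `prefetch` is assumed to be at least 1 (the protocol's "0 means unlimited" case is not modelled).

```python
from collections import deque


class Consumer:
    def __init__(self, name: str, prefetch: int):
        self.name = name
        self.prefetch = prefetch  # max unacked messages this consumer may hold
        self.unacked = []

    def can_take(self) -> bool:
        return len(self.unacked) < self.prefetch

    def ack(self):
        """Acknowledge the oldest outstanding message and return it."""
        return self.unacked.pop(0)


def dispatch(queue: deque, consumers: list) -> None:
    """Deliver ready messages round-robin, skipping consumers at their prefetch limit."""
    progress = True
    while queue and progress:
        progress = False
        for c in consumers:
            if queue and c.can_take():
                c.unacked.append(queue.popleft())
                progress = True


ready = deque(range(10))
fast, slow = Consumer("fast", prefetch=3), Consumer("slow", prefetch=1)
dispatch(ready, [fast, slow])
print(fast.unacked, slow.unacked, "still ready:", len(ready))

fast.ack()                      # acking frees one slot...
dispatch(ready, [fast, slow])   # ...so exactly one more message is delivered
print(fast.unacked, slow.unacked, "still ready:", len(ready))
```

Messages that no consumer has room for stay in the queue as ready messages, where any consumer that frees a slot can pick them up.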
What happens when a RabbitMQ node fails?
Answer: Classic queues: messages on that node are unavailable until the node recovers (unless the queue is mirrored). Quorum queues: if the leader fails, Raft elects a new leader from the followers and the queue keeps serving, as long as a majority of its members is still up. Cluster: the other nodes detect the failure, client connections to the dead node drop, and clients should reconnect to surviving nodes.
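The "with majority" condition comes down to simple arithmetic, sketched below. The function names are made up for this example; the majority rule itself is the standard Raft quorum rule.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be down while a majority remains."""
    return members - quorum_size(members)


def is_available(members: int, failed: int) -> bool:
    """A quorum queue can make progress only while a majority is up."""
    return members - failed >= quorum_size(members)


for n in (3, 4, 5):
    print(f"{n} members: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Four members tolerate no more failures than three, which is why odd member counts are the usual choice.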
How does RabbitMQ handle message ordering?
Answer: Messages are ordered within a queue (FIFO). But: 1) with multiple consumers, messages are distributed among them, so there is no ordering guarantee across consumers, 2) requeued messages (nack with requeue) can be redelivered out of order, since where they re-enter the queue depends on the queue type and configuration, 3) dead-lettering changes the order. For strict ordering: use a single consumer, partition by key (consistent hashing exchange), or use streams.
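The "partition by key" option can be illustrated with a short sketch. Note the substitution: this uses plain hash-modulo partitioning rather than the hash ring a consistent hashing exchange would use, because it is the shortest way to show the ordering property. `partition_for` and `route` are hypothetical helper names, and the order IDs are sample data.

```python
import hashlib


def partition_for(key: str, partitions: int) -> int:
    """Map a key to a partition with a stable hash (same key, same partition)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def route(messages, partitions: int):
    """Append each (key, payload) to the queue chosen by its key."""
    queues = [[] for _ in range(partitions)]
    for key, payload in messages:
        queues[partition_for(key, partitions)].append((key, payload))
    return queues


events = [
    ("order-1", "created"), ("order-2", "created"),
    ("order-1", "paid"), ("order-2", "paid"),
    ("order-1", "shipped"),
]
for index, queue in enumerate(route(events, partitions=3)):
    print(index, queue)
```

With one consumer per partition queue, all events for a given order are processed in publish order, while different orders can still be handled in parallel.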
When would you use RabbitMQ vs Kafka?
Answer: RabbitMQ: complex routing (topic, headers), request-reply (RPC), task queues where messages are deleted after processing, lower latency for small messages. Kafka: high-throughput event streaming, replay capability, longer retention, log aggregation, and cases where consumers need to read the same messages multiple times. RabbitMQ streams blur this line.
# List queues with message counts -- the first command to run when "messages are stuck"
# messages_ready = waiting to be delivered, messages_unacknowledged = delivered but not acked
# If messages_ready is growing: consumers are too slow or disconnected
# If messages_unacknowledged is growing: consumers are receiving but not acking (bug or stall)
rabbitmqctl list_queues name messages messages_ready messages_unacknowledged

# List connections -- check for stale or blocked connections
# "blocking" state means the connection is paused due to a memory or disk alarm
rabbitmqctl list_connections name state channels

# List consumers -- verify your consumers are actually connected
# An empty list when you expect consumers means they crashed or lost connection
rabbitmqctl list_consumers

# Check for alarms -- memory and disk alarms block publishers
# If publishers are mysteriously stuck, this is often the cause
rabbitmqctl status | grep -A5 "Alarms"

# Trace messages (DEVELOPMENT ONLY -- this generates enormous output in production)
# Shows every message publish and deliver in real time
rabbitmqctl trace_on
Production gotcha: Running rabbitmqctl trace_on on a production broker can generate gigabytes of log data per minute and significantly impact performance. Use it only in development or on a test broker. For production debugging, use the management UI’s message rates and queue depths, or enable per-queue tracing on a single low-traffic queue.