RabbitMQ explained
Your checkout service needs to email a receipt, update inventory, and push a warehouse pick list — but you cannot block the HTTP response on three downstream APIs that might time out. RabbitMQ is an open-source AMQP message broker that decouples producers from consumers with durable queues, flexible routing via exchanges, and delivery guarantees you control through acknowledgments and publisher confirms. Unlike a distributed log like Kafka, RabbitMQ excels at task queues and competing consumers: one message goes to exactly one worker, work is deleted after ack, and routing rules can fan out or filter by pattern. This guide covers the AMQP model (connections, channels, exchanges, queues, bindings), exchange types, durability and persistence, consumer ack modes and prefetch, publisher confirms, dead-letter exchanges, clustering and quorum queues, a Harbor Fleet order-notification worked example, a broker decision table, common pitfalls, and a practitioner checklist alongside our message queues overview and dead letter queue guide.
The AMQP model: producers, exchanges, queues, consumers
RabbitMQ implements the Advanced Message Queuing Protocol (AMQP); the model described here is AMQP 0-9-1. Producers never publish directly to a queue. They publish to an exchange, which routes copies to zero or more queues according to bindings (rules linking an exchange to a queue, often with a routing key). Consumers subscribe to queues and process messages at their own pace.
That indirection is RabbitMQ's superpower. One order.created event can
fan out to an email queue, an analytics queue, and a fulfillment queue without the
producer knowing how many subscribers exist. Add a fraud-scoring consumer next month
by binding a new queue — no deploy to the checkout service.
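The fan-out described above can be sketched with the channel calls exposed by the Python pika client (exchange_declare, queue_declare, queue_bind). This is a minimal sketch, not a reference topology: the exchange name orders and the three queue names are assumptions made for illustration, and the channel is a stub so the snippet runs without a broker.

```python
from unittest.mock import MagicMock

EXCHANGE = "orders"  # hypothetical exchange name
SUBSCRIBER_QUEUES = ("email", "analytics", "fulfillment")  # hypothetical queues


def declare_fanout_topology(channel, queues=SUBSCRIBER_QUEUES) -> None:
    """Declare a durable topic exchange and bind one durable queue per subscriber."""
    channel.exchange_declare(exchange=EXCHANGE, exchange_type="topic", durable=True)
    for queue in queues:
        channel.queue_declare(queue=queue, durable=True)
        # Each queue bound with this key receives its own copy of the event.
        channel.queue_bind(queue=queue, exchange=EXCHANGE, routing_key="order.created")


# Dry run against a stub; with a broker, pass a real pika channel instead.
channel = MagicMock()
declare_fanout_topology(channel)
bound = [call.kwargs["queue"] for call in channel.queue_bind.call_args_list]
print(bound)
```

Adding the fraud-scoring consumer is one more queue_declare and queue_bind on the consumer side; nothing in the publisher changes.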
Connections and channels
Clients open a TCP connection (optionally over TLS) and multiplex
lightweight channels on top. Each thread or goroutine should use its
own channel: channels are not thread-safe, but creating one is cheap. Heartbeats
detect dead peers; tune heartbeat and connection_timeout so traffic flows more often
than the idle timeout of cloud NAT gateways, which can drop idle TCP sessions after
350 seconds (the AWS NAT Gateway idle timeout).
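One way to pin the heartbeat is in the connection URI. The sketch below only builds the string; it assumes a client that reads a heartbeat query parameter from the URI (pika's URLParameters does), and the host, user, and password are placeholders. A 60-second heartbeat stays well under a 350-second idle timeout.

```python
from urllib.parse import quote


def amqp_url(host: str, vhost: str, user: str, password: str,
             heartbeat: int = 60, tls: bool = True) -> str:
    """Build an AMQP URI with an explicit heartbeat interval in seconds."""
    scheme, port = ("amqps", 5671) if tls else ("amqp", 5672)
    # The vhost and credentials are percent-encoded; the default vhost "/" becomes %2F.
    return (
        f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(vhost, safe='')}?heartbeat={heartbeat}"
    )


url = amqp_url("rabbit.internal", "/", "checkout", "s3cret", heartbeat=60)
print(url)
```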
Virtual hosts
A vhost is a logical namespace (like a database schema). A common setup
runs prod, staging, and dev vhosts on
the same cluster with separate users and permissions, so a staging misconfiguration cannot
drain production queues.
Exchange types and routing patterns
| Exchange type | Routing behavior | Typical use |
|---|---|---|
| direct | Routing key must match binding key exactly | Point-to-point task queues, RPC reply queues |
| topic | Pattern match: * one word, # zero or more | Domain events (orders.us.created), multi-tenant routing |
| fanout | Ignores routing key; copies to every bound queue | Broadcast cache invalidation, fan-out notifications |
| headers | Matches message header key-value pairs | Complex routing without string keys (rare in practice) |
| default | Routing key equals queue name | Simple single-queue apps, tutorials |
Topic exchanges are the workhorse for event-driven microservices.
Binding orders.*.created catches US and EU order events; binding
orders.# catches everything under the orders namespace. Producers publish
with a routing key like orders.us.created and optional headers for
trace IDs.
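To make the wildcard rules concrete, here is a small illustrative matcher for topic patterns. It is a sketch of the matching semantics only (words separated by dots, * for exactly one word, # for zero or more), not the broker's actual implementation; the function name topic_matches is made up for this example.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic binding pattern matches a routing key."""

    def match(p: list, k: list) -> bool:
        if not p:
            return not k  # pattern exhausted: match only if the key is too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words; try every split point.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        # '*' stands for exactly one word; anything else must match literally.
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


for pattern, key in [
    ("orders.*.created", "orders.us.created"),
    ("orders.*.created", "orders.us.west.created"),
    ("orders.#", "orders.eu.refunded"),
    ("orders.#", "orders"),
]:
    print(pattern, key, topic_matches(pattern, key))
```

The second pair does not match because * covers one word, not two; the last pair matches because # also covers zero words.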
Alternate exchanges catch unroutable messages instead of silently dropping them — essential for debugging misconfigured bindings in production.
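An alternate exchange is attached through the alternate-exchange argument when the main exchange is declared (it can also be applied with a broker policy). The sketch below assumes pika-style channel methods; the names orders.events, orders.unrouted, and orders.unrouted.inspect are hypothetical, and the channel is a stub so the snippet runs without a broker.

```python
from unittest.mock import MagicMock


def declare_with_alternate(channel, exchange: str, alternate: str, catch_all_queue: str) -> None:
    """Declare an exchange whose unroutable messages fall through to a catch-all queue."""
    # Fanout, so unroutable messages reach the catch-all queue whatever their key.
    channel.exchange_declare(exchange=alternate, exchange_type="fanout", durable=True)
    channel.queue_declare(queue=catch_all_queue, durable=True)
    channel.queue_bind(queue=catch_all_queue, exchange=alternate)
    channel.exchange_declare(
        exchange=exchange,
        exchange_type="topic",
        durable=True,
        arguments={"alternate-exchange": alternate},
    )


channel = MagicMock()  # stand-in for a real channel
declare_with_alternate(channel, "orders.events", "orders.unrouted", "orders.unrouted.inspect")
main_arguments = channel.exchange_declare.call_args_list[-1].kwargs["arguments"]
print(main_arguments)
```

Alerting on the depth of the catch-all queue turns a silent routing mistake into a visible one.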
Queues: durability, TTL, priority, and limits
Declare queues with flags that match your reliability needs:
- Durable: queue metadata survives broker restart (messages still need persistent delivery mode).
- Exclusive: only the declaring connection can consume; dies when connection closes (RPC reply queues).
- Auto-delete: queue removed when last consumer unsubscribes.
Message persistence (delivery_mode=2) writes body to disk
on durable queues. Throughput drops versus transient messages — use persistence for
work that cannot be regenerated (payment intents), transient for idempotent
recomputation (thumbnail generation from S3).
TTL and dead-lettering
Per-message or per-queue TTL expires stale jobs (abandoned cart emails after 24 hours). Pair with a dead-letter exchange (DLX): expired or rejected messages route to an inspection queue instead of vanishing. This is RabbitMQ's native DLQ pattern — see our dead letter queues guide for redrive runbooks.
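Per-queue TTL and dead-lettering are both configured through queue arguments. The helper below is a minimal sketch that builds those arguments for the abandoned-cart case; the queue and exchange names (cart.reminders, carts.dlx) and the function name are assumptions for illustration.

```python
from typing import Optional


def ttl_dlx_arguments(ttl_ms: int, dlx: str, dead_letter_key: Optional[str] = None) -> dict:
    """Queue arguments for a per-queue TTL with dead-lettering on expiry or reject."""
    arguments = {
        "x-message-ttl": ttl_ms,        # milliseconds a message may wait in the queue
        "x-dead-letter-exchange": dlx,  # where expired or rejected messages are routed
    }
    if dead_letter_key is not None:
        # Optional: replace the original routing key when dead-lettering.
        arguments["x-dead-letter-routing-key"] = dead_letter_key
    return arguments


DAY_MS = 24 * 60 * 60 * 1000
args = ttl_dlx_arguments(DAY_MS, dlx="carts.dlx", dead_letter_key="cart.expired")
print(args)
# With pika, these would be passed as:
# channel.queue_declare(queue="cart.reminders", durable=True, arguments=args)
```

Per-message TTL is set on the message instead, through the expiration property, as a string of milliseconds.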
Priority queues (max 255 levels) let urgent messages jump ahead within a single queue — useful sparingly; most teams prefer separate queues per SLA tier to avoid starvation analysis headaches.
Quorum queues vs classic mirrored queues
Modern RabbitMQ (3.8+) recommends quorum queues (Raft-based replication) over legacy mirrored classic queues. Quorum queues give up some classic-queue features (in the 3.x series: no priorities, limited TTL semantics) in exchange for predictable failover. New deployments should default to quorum for any queue that must survive node loss.
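A queue becomes a quorum queue through the x-queue-type argument at declaration time, and it must be durable. The sketch below assumes a pika-style channel; the queue name payments.intents and the delivery limit of 5 are illustrative values, and the channel is a stub so the snippet runs without a broker.

```python
from typing import Optional
from unittest.mock import MagicMock


def declare_quorum_queue(channel, name: str, delivery_limit: Optional[int] = None) -> dict:
    """Declare a durable quorum queue, optionally capping redeliveries."""
    arguments = {"x-queue-type": "quorum"}
    if delivery_limit is not None:
        # Messages redelivered more often than this are dropped, or dead-lettered
        # if the queue has a dead-letter exchange configured.
        arguments["x-delivery-limit"] = delivery_limit
    channel.queue_declare(queue=name, durable=True, arguments=arguments)
    return arguments


channel = MagicMock()  # stand-in for a real channel
args = declare_quorum_queue(channel, "payments.intents", delivery_limit=5)
print(args)
```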
Consumer acknowledgments and prefetch
RabbitMQ tracks whether each message was processed. In automatic ack
mode, the broker treats a message as acknowledged the moment it is sent to the consumer,
so a crash mid-processing loses the work. Production consumers use
manual ack: call basic.ack after successful processing,
basic.nack or basic.reject with requeue=true for
transient failures, and requeue=false to dead-letter once retries are exhausted.
Prefetch (basic.qos prefetch_count) limits unacked
messages per consumer. Without prefetch, one slow consumer hoards thousands of messages
while peers sit idle. Set prefetch to match realistic concurrency — often 10–50 for
fast I/O-bound tasks, 1–5 for heavy CPU jobs.
Delivery is at-least-once when you ack after processing: broker redelivers if the consumer dies before ack. Consumers must be idempotent — use deduplication keys, upserts, or outbox patterns so duplicate delivery does not double-charge a customer.
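The ack rules and the idempotency requirement can be combined in one consumer callback. This is a sketch under assumptions: the callback uses pika's (channel, method, properties, body) signature, TransientError and make_handler are hypothetical names, and the in-memory set stands in for a durable deduplication store such as a database upsert. The dry run uses stub objects instead of a broker.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s)."""


def make_handler(process, seen_ids: set):
    """Build a callback that acks on success, requeues transient failures,
    and rejects everything else so a dead-letter exchange can take it."""

    def on_message(channel, method, properties, body):
        tag = method.delivery_tag
        if properties.message_id in seen_ids:
            channel.basic_ack(delivery_tag=tag)  # duplicate: ack without redoing work
            return
        try:
            process(body)
        except TransientError:
            channel.basic_nack(delivery_tag=tag, requeue=True)
        except Exception:
            channel.basic_nack(delivery_tag=tag, requeue=False)
        else:
            seen_ids.add(properties.message_id)
            channel.basic_ack(delivery_tag=tag)

    return on_message


# Dry run: the same message delivered twice is processed once and acked twice.
channel = MagicMock()
processed = []
handler = make_handler(processed.append, seen_ids=set())
props = SimpleNamespace(message_id="order-123")
handler(channel, SimpleNamespace(delivery_tag=1), props, b"{}")
handler(channel, SimpleNamespace(delivery_tag=2), props, b"{}")  # redelivery
print(len(processed), channel.basic_ack.call_count)
```

With a real pika channel, the handler would be registered after channel.basic_qos(prefetch_count=20) via channel.basic_consume(queue=..., on_message_callback=handler, auto_ack=False).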
Publisher confirms and transactions
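Producers need confidence the broker accepted the message. Publisher
confirms (async ack/nack per message or batch) are the modern standard:
enable confirm.select, publish, and wait for broker confirmation before
telling the user "order submitted." A nack means the broker could not take
responsibility for the message. An unroutable message published with
mandatory=true is handed back through basic.return and then confirmed with an ack,
so handle returns separately from nacks.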
AMQP transactions (tx.select) are synchronous and slow —
avoid them. Confirms plus persistent messages on quorum queues give you durable,
verifiable handoff without blocking the entire channel per publish.
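With asynchronous confirms, the publisher has to remember what is still unconfirmed. The broker numbers publishes on a confirm-mode channel starting at 1, and an ack or nack with the multiple flag set covers every sequence number up to and including its delivery tag. The ConfirmTracker class below is a hypothetical helper that sketches that bookkeeping; it is not part of any client library.

```python
class ConfirmTracker:
    """Track unconfirmed publishes by sequence number (illustrative helper)."""

    def __init__(self):
        self._next_seq = 1   # sequence numbers start at 1 after confirm.select
        self._pending = {}   # sequence number -> message awaiting confirmation

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def published(self, message) -> int:
        """Record a publish and return the sequence number the broker will use."""
        seq = self._next_seq
        self._pending[seq] = message
        self._next_seq += 1
        return seq

    def settle(self, delivery_tag: int, multiple: bool = False) -> list:
        """Remove and return the messages covered by one ack or nack frame."""
        if multiple:
            tags = [t for t in sorted(self._pending) if t <= delivery_tag]
        else:
            tags = [delivery_tag] if delivery_tag in self._pending else []
        return [self._pending.pop(t) for t in tags]


tracker = ConfirmTracker()
for order_id in ("A", "B", "C"):
    tracker.published(order_id)
confirmed = tracker.settle(2, multiple=True)  # broker acked 1 and 2 in one frame
rejected = tracker.settle(3)                  # broker nacked 3: republish or alert
print(confirmed, rejected, tracker.pending_count)
```

Blocking clients hide this bookkeeping: in pika, for example, calling confirm_delivery() on a blocking channel makes basic_publish raise when a message is nacked.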
For request-reply over RabbitMQ, use a dedicated reply-to queue
(often exclusive, auto-delete) and correlation_id in message properties
to match responses — a pattern Celery and many RPC wrappers abstract away.
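The correlation step can be sketched without a broker: the caller stores context under a fresh correlation_id, sends that ID in the request properties, and looks it up when a reply arrives on the reply-to queue. ReplyCorrelator is a hypothetical name for this illustration.

```python
import uuid


class ReplyCorrelator:
    """Match replies to outstanding requests by correlation_id."""

    def __init__(self):
        self._pending = {}

    def new_request(self, context) -> str:
        """Register a request and return the correlation_id to send with it."""
        correlation_id = str(uuid.uuid4())
        self._pending[correlation_id] = context
        return correlation_id

    def on_reply(self, correlation_id: str, body: bytes):
        """Return (context, body) for a known reply, or None for a stale or unknown one."""
        context = self._pending.pop(correlation_id, None)
        if context is None:
            return None
        return context, body


correlator = ReplyCorrelator()
cid = correlator.new_request("quote for order 42")
matched = correlator.on_reply(cid, b"9.99")
stale = correlator.on_reply(cid, b"9.99")  # duplicate reply is ignored
print(matched, stale)
```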
Worked example: Harbor Fleet order notifications
Harbor Fleet's checkout service publishes order events after payment succeeds. Topology
on vhost harbor-prod:
- Exchange orders.topic (topic, durable).
- Queues bound: email.receipts binding orders.*.paid; warehouse.picks binding orders.*.paid; analytics.events binding orders.#; fraud.review binding orders.*.paid with header x-match=all, risk=high on a headers exchange fed by a shovel (edge case).
- Publish: routing key orders.us.paid, JSON body, delivery_mode=2, message_id = order UUID, publisher confirms enabled.
- Email worker: prefetch 20, manual ack after SendGrid 202; after 3 failed attempts, nack with requeue=false to dead-letter to DLX orders.dlx.
- Warehouse worker: prefetch 5 (heavier WMS API), idempotent on order_id via Postgres upsert.
Cluster: three-node RabbitMQ 3.13 on Kubernetes (Helm chart), all business queues as
quorum queues, management plugin for queue depth alerts. Peak Black Friday: email queue
depth hits 40k; autoscaled consumers from 4 to 20 pods based on
rabbitmq_queue_messages_ready Prometheus metric — see our
Prometheus monitoring guide
for scrape patterns.
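The scaling rule itself is not spelled out above. One common shape is a target number of ready messages per pod, clamped to the pod range. The sketch below assumes a target of 2,000 ready messages per pod, chosen only because it maps a depth of 40k to 20 pods; the function name and the target are illustrative, not Harbor Fleet's actual configuration.

```python
import math


def desired_consumers(messages_ready: int, target_per_pod: int,
                      min_pods: int = 4, max_pods: int = 20) -> int:
    """Pods needed to keep roughly target_per_pod ready messages per consumer."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(messages_ready / target_per_pod)
    # Clamp to the autoscaler's configured floor and ceiling.
    return max(min_pods, min(max_pods, wanted))


for depth in (0, 12_000, 40_000):
    print(depth, desired_consumers(depth, target_per_pod=2_000))
```

Under these assumptions an empty queue stays at the 4-pod floor, 12,000 ready messages ask for 6 pods, and 40,000 reach the 20-pod ceiling.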
RabbitMQ vs Kafka vs SQS decision table
| Need | Best fit | Why |
|---|---|---|
| Task queue, competing consumers, complex routing | RabbitMQ | Flexible exchanges, message deleted after ack, low-latency push to consumers |
| High-throughput event log, replay, stream processing | Apache Kafka | Partitioned commit log, consumer groups with offset rewind, retention by time/size |
| Managed queue, no cluster ops, AWS-native | Amazon SQS (+ SNS) | Serverless scaling, visibility timeout, DLQ via redrive policy, pay per request |
| RPC-style work distribution with priorities | RabbitMQ | Per-message priority, reply-to queues, mature client libraries |
| Analytics pipeline ingesting billions/day | Kafka | Disk-backed sequential writes, Flink/Spark connectors, compaction topics |
| Exactly-once end-to-end (hard everywhere) | Idempotent consumers + outbox | Broker choice matters less than dedup keys and transactional outbox |
Common pitfalls
- Auto-ack consumers: silent message loss on crash; always manual ack in production.
- Unbounded prefetch: one greedy consumer starves the pool; set prefetch_count explicitly.
- Non-idempotent handlers: at-least-once delivery doubles side effects; dedupe on message_id or business key.
- Durable queue, transient messages: queue survives restart but messages vanish; match persistence flags to intent.
- Classic mirrored queues on new clusters: use quorum queues unless a legacy feature forces classic.
- No alternate exchange: misrouted publishes disappear; add alternate-exchange and alert on depth.
- Giant messages: the default 128 MB limit in RabbitMQ 3.x is a ceiling, not a target, and large bodies still exhaust broker memory; pass S3 URLs in body, not multi-MB PDFs.
- Shared channels across threads: protocol errors and subtle corruption; one channel per consumer thread.
- Ignoring memory and disk alarms: broker blocks publishers when vm_memory_high_watermark trips; monitor and scale consumers before alarms.
Practitioner checklist
- Separate vhosts per environment; least-privilege users per service.
- Use topic or direct exchanges; avoid publishing to the default exchange in microservices.
- Declare durable quorum queues for business-critical work.
- Set delivery_mode=2 for messages that cannot be replayed from source.
- Enable publisher confirms; handle nacks and mandatory unroutable messages.
- Consume with manual ack; tune prefetch to worker capacity.
- Implement idempotent handlers with deduplication keys.
- Configure DLX and max-retry policy for poison messages.
- Monitor queue depth, consumer utilization, and memory/disk alarms.
- Load-test failover: kill a broker node and verify quorum election recovers within SLA.
Key takeaways
- RabbitMQ routes messages through exchanges and bindings to queues — producers stay decoupled from consumer topology.
- Exchange type determines routing: direct for tasks, topic for events, fanout for broadcast.
- Reliability comes from durable quorum queues, persistent messages, publisher confirms, and manual consumer acks.
- At-least-once is the practical guarantee — design idempotent consumers.
- Choose RabbitMQ for flexible task queues; reach for Kafka when you need a replayable event log at massive scale.
Related reading
- Apache Kafka explained — partitioned event logs and stream processing at scale
- Message queues explained — broker comparison and delivery guarantee fundamentals
- Dead letter queues explained — poison messages, DLX routing, and safe replay
- Idempotency explained — deduplication patterns for at-least-once consumers