Real-World RabbitMQ Deployments

Years helping & learning from enterprise customers & open-source users

RabbitMQ broker works in every case

  • mature & flexible - sometimes too flexible
  • getting better with every patch release - monthly
  • big improvements every minor release - yearly
  • limits are better understood
  • but...

Applications, Runtimes & Clouds

  • some are engineered better than others
  • no opportunity to improve once built
  • diverse runtimes (Java, Golang, etc.)
  • containers in VMs in servers
  • CapEx vs OpEx
  • and...

Messages must always flow

RabbitMQ is a river, not a lake

May your queues be always empty

Low-Latency RabbitMQ

Financial Trading

A real-world low-latency RabbitMQ

  • 4 financial markets
  • 7,900 financial instruments
  • 30,000 messages/s
  • 1KB message payload

I've lost money if a message takes over 1ms

Publisher → broker raw network latency

IaaS  AVG     MAX     STD
A     0.49ms  1.62ms  0.12ms
B     0.25ms  0.69ms  0.07ms
C     0.12ms  0.15ms  0.02ms

publisher:~$ ping -s 1000 -c 100 broker

Broker → consumer raw network latency

IaaS  AVG     MAX     STD
A     0.38ms  1.52ms  0.12ms
B     0.17ms  1.00ms  0.10ms
C     0.08ms  0.09ms  0.01ms

broker:~$ ping -s 1000 -c 100 consumer

Publisher → Broker → Consumer

RANK  IaaS  LATENCY
3     A     0.87ms
2     B     0.42ms
1     C     0.24ms

1KB Message, 1 Publisher → 1 Queue → 1 Consumer

PUBLISHER CONFIRMS  MESSAGES/s  MAX 99th
disabled            68,000      3,000ms
every 1 message     8,100       0.19ms
every 5 messages    22,000      0.35ms
every 10 messages   28,000      0.75ms
every 13 messages   30,000      0.98ms
every 20 messages   34,000      1.50ms
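
One way to read the table above is as a trade-off: pick the confirm interval with the highest throughput that still fits a latency budget. The sketch below does that for the 1ms budget from the trading workload. The `best_batch` helper and the row layout are illustrative names, not part of RabbitMQ or any client library.

```python
# Rows from the publisher-confirms table:
# (confirm interval in messages, messages/s, max 99th percentile latency in ms).
# An interval of None means publisher confirms are disabled.
RESULTS = [
    (None, 68_000, 3000.0),
    (1, 8_100, 0.19),
    (5, 22_000, 0.35),
    (10, 28_000, 0.75),
    (13, 30_000, 0.98),
    (20, 34_000, 1.50),
]


def best_batch(results, latency_budget_ms):
    """Return the highest-throughput row whose latency fits the budget, or None."""
    within_budget = [row for row in results if row[2] <= latency_budget_ms]
    if not within_budget:
        return None
    return max(within_budget, key=lambda row: row[1])


choice = best_batch(RESULTS, latency_budget_ms=1.0)
print(choice)  # (13, 30000, 0.98)
```

With these measurements, confirming every 13 messages is the only setting that reaches 30,000 messages/s while staying under 1ms.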

1 Queue, confirm every message, 1KB message

PUBLISHERS & CONSUMERS  MESSAGES/s  MAX 99th
1                       8,100       0.19ms
5                       30,000      0.27ms
10                      50,000      0.34ms
20                      65,000      0.59ms
50                      58,000      1.50ms

30,000 messages/s with queue mirroring

QUEUE MIRRORS  MAX 99th
Master + 1     19.92ms
Master + 2     39.84ms

RabbitMQ workloads in a customer environment

Bless a RabbitMQ

Wayne Lund

High-Throughput RabbitMQ

Vehicle telemetry & events

A real-world high-throughput RabbitMQ

  • 1,000,000 vehicles during peak rush hours
  • 10,000 messages/s @ 64KiB each
  • 3,000 messages/s logging service
  • 3,000 messages/s analysis service
  • 300,000,000 message buffer
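
A rough sizing of that buffer, as a sketch: it assumes every buffered message is a full 64KiB and ignores per-message overhead and indexes, so real numbers will differ.

```python
KIB = 1024

messages_per_s = 10_000         # peak publish rate from the workload above
message_bytes = 64 * KIB        # assumption: every message is a full 64KiB
buffer_messages = 300_000_000   # required message buffer

# How long the buffer lasts at peak rate if nothing is consumed.
buffer_hours = buffer_messages / messages_per_s / 3600

# Raw payload volume of a full buffer, in decimal terabytes.
buffer_tb = buffer_messages * message_bytes / 1e12

print(f"{buffer_hours:.2f} hours of peak traffic")  # 8.33 hours of peak traffic
print(f"{buffer_tb:.2f} TB of payload")             # 19.66 TB of payload
```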

Does RabbitMQ have enough network capacity?

  • Count ingress as well as egress
  • 5Gbps + 3Gbps = 8Gbps
  • Remember load balancer(s)

Do you need load balancers in front of RabbitMQ?
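
The ingress and egress figures can be derived from the workload numbers. A minimal sketch, assuming the logging and analysis services each consume full-size 64KiB messages and ignoring protocol overhead:

```python
def gbps(messages_per_s, message_bytes):
    """Payload bandwidth in gigabits per second (decimal)."""
    return messages_per_s * message_bytes * 8 / 1e9


MESSAGE_BYTES = 64 * 1024

ingress = gbps(10_000, MESSAGE_BYTES)        # vehicles publishing
egress = gbps(3_000 + 3_000, MESSAGE_BYTES)  # logging + analysis consuming
total = ingress + egress

print(f"ingress {ingress:.2f} Gbps, egress {egress:.2f} Gbps, total {total:.2f} Gbps")
```

Any load balancer in the path carries the same traffic, so it needs the same headroom.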

Send messages straight to fast disks

Persistent Message Store

  • 1 Erlang process per node in v3.6.x
  • 1 Erlang process per vhost since v3.7.0
  • Queues in flow state = backpressure

Metrics when many connections, channels & queues

  • Distributed metrics since RabbitMQ v3.6.7
  • Generate metrics less often & expire sooner
  • Extract metrics into purpose-built systems

Use sharded queues for high throughput

  • Logical queue with shards on every node
  • Queue is available as long as at least 1 node is running
  • Publishers & consumers use local shards

rabbitmq-sharding, built-in plugin
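
The idea of one logical queue spread over shards can be illustrated without a broker. This is a conceptual sketch only: the node names, shard names and the CRC32-modulo hash are assumptions made for the example, not the plugin's actual naming or hashing.

```python
import zlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster nodes
SHARDS_PER_NODE = 2                     # hypothetical setting

# One logical queue, backed by shards on every node.
shards = [f"telemetry-{node}-{i}" for node in NODES for i in range(SHARDS_PER_NODE)]


def route(routing_key):
    """Pick a shard by hashing the routing key modulo the shard count."""
    index = zlib.crc32(routing_key.encode("utf-8")) % len(shards)
    return shards[index]


# The same key always lands on the same shard; different keys spread out.
for key in ("vehicle-1", "vehicle-2", "vehicle-3"):
    print(key, "->", route(key))
```

Because the load is split across many queues, no single queue process becomes the bottleneck. Ordering is only preserved within a shard.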

High-Scale RabbitMQ

Medical equipment

A real-world high-scale RabbitMQ

  • 100,000 medical devices, 1 queue per device
  • 20,000 long-lived connections
  • 300 messages/s @ 4KiB each

Fewer nodes is best

  • Every node in a cluster communicates with every other node
  • When a node goes away, all remaining nodes clean up
  • RabbitMQ metadata needs to synchronise across all nodes

Connections are not free

  • Linux TCP sockets start with ~100KB buffers
  • Free up many GBs of memory by tuning
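
A back-of-the-envelope estimate for the 20,000 connections in this workload. The sketch assumes one send and one receive buffer per socket at ~100KB each, and a hypothetical tuned size of 32KiB. In RabbitMQ the buffer sizes can be set with `tcp_listen_options.sndbuf` and `tcp_listen_options.recbuf`; the right values depend on message size and throughput.

```python
connections = 20_000
default_buffer_bytes = 100 * 1000  # ~100KB starting buffer, per the slide
tuned_buffer_bytes = 32 * 1024     # hypothetical tuned size, not a recommendation
buffers_per_socket = 2             # assumption: one send + one receive buffer


def total_gb(buffer_bytes):
    """Memory used by socket buffers across all connections, in decimal GB."""
    return connections * buffers_per_socket * buffer_bytes / 1e9


default_gb = total_gb(default_buffer_bytes)
tuned_gb = total_gb(tuned_buffer_bytes)
saved_gb = default_gb - tuned_gb

print(f"default {default_gb:.2f} GB, tuned {tuned_gb:.2f} GB, saved {saved_gb:.2f} GB")
```

Smaller buffers trade throughput per connection for memory, which suits many mostly idle connections.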

Are default metrics right for you?

  • Disable metrics which are not required
  • Generate metrics less often & expire sooner
  • Extract metrics into purpose-built systems

Exchange & queue type differences

EXCHANGE  QUEUE        BINDS/s  TIME FOR 100,000 BINDINGS
Topic     Durable      22       75 min
Topic     Non-Durable  35       47 min
Direct    Durable      120      14 min
Direct    Non-Durable  266      6 min
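
The last column follows from the bind rate, assuming it is the time in minutes to create 100,000 bindings (one per device queue). A quick check of that reading:

```python
BINDINGS = 100_000

# (exchange type, queue type) -> measured binds per second, from the table above
BIND_RATES = {
    ("Topic", "Durable"): 22,
    ("Topic", "Non-Durable"): 35,
    ("Direct", "Durable"): 120,
    ("Direct", "Non-Durable"): 266,
}


def minutes_to_bind(bindings, binds_per_s):
    """Minutes needed to create the given number of bindings at a steady rate."""
    return bindings / binds_per_s / 60


durations = {kind: minutes_to_bind(BINDINGS, rate) for kind, rate in BIND_RATES.items()}

for (exchange, queue), minutes in durations.items():
    print(f"{exchange:6} {queue:11} {minutes:5.1f} min")
```

The computed values land within a minute of the table, which suggests the original figures were rounded.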

Ask me for screenshots

In summary

  • Know what you are trying to achieve
  • Don't mix workloads in a production cluster
  • You can't improve what you can't measure
  • 90% of RabbitMQ issues are down to applications

How can you help?

Contribute your observations

Tell us about your workload

What metrics are important?

real-world.rabbitmq.pivotal.io