RabbitMQ monitoring

Deprecation notice

This extension documentation is now deprecated and will no longer be updated. We recommend using the new RabbitMQ extension for improved functionality and support.

RabbitMQ server monitoring provides a high-level overview of all RabbitMQ components within your cluster.

With RabbitMQ message-related metrics, you’ll immediately know when something is wrong. And when problems occur, it’s easy to see which nodes are affected. It’s then simple to drill down into the metrics of individual nodes to find the root cause of problems and potential bottlenecks.

Prerequisites

  • Dynatrace OneAgent version 1.100+
  • OneAgent must be installed on a node that has a statistics database. We recommend that you install OneAgent on all RabbitMQ nodes.
  • The rabbitmq_management plugin installed and enabled on all nodes you want to monitor.
  • A RabbitMQ management plugin user with monitoring privileges and access to all virtual hosts that you want to monitor.
  • Linux OS or Windows
  • RabbitMQ version 3.4.0+
  • A single RabbitMQ cluster
  • Statistics available on the localhost interface via HTTP
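
The last prerequisite can be checked by hand before you enable monitoring. The following is a minimal sketch, not part of the extension, that assumes the management plugin listens on the default port 15672 and uses its `/api/overview` endpoint with HTTP basic authentication. The function names and the credentials are hypothetical.

```python
import base64
import json
import urllib.request


def build_overview_request(user, password, port=15672, host="localhost"):
    """Build (but do not send) a GET request for the management API overview.

    Hypothetical helper: host and port mirror the prerequisites above
    (localhost interface, plain HTTP, default port 15672).
    """
    url = f"http://{host}:{port}/api/overview"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_overview(user, password, port=15672, timeout=5):
    """Send the request and return the decoded JSON document.

    Raises urllib.error.URLError (or HTTPError for a 401/403 response)
    if the statistics are not reachable with these credentials.
    """
    request = build_overview_request(user, password, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_overview("monitoring_user", "password")` on a RabbitMQ node should return a JSON document if the plugin is enabled and the user has sufficient privileges. A connection error suggests that the statistics are not available on the localhost interface.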

Enable RabbitMQ monitoring globally

With RabbitMQ monitoring enabled globally, Dynatrace automatically collects RabbitMQ metrics whenever a new host running RabbitMQ is detected in your environment.

  1. Go to Settings.
  2. Select Monitoring > Monitored technologies.
  3. On the Supported technologies tab, find the RabbitMQ entry and click in the Edit column to expand the row.
  4. Set the User and Password.
    All RabbitMQ instances must have the same username and password.
  5. Set the Port. The default port is 15672.
  6. Define the Queues to be reported.
    Provide the queue names separated by commas. If you leave this field empty, all queues will be reported.
    The wildcard character * replaces any combination of characters. For example, queue* will report any name starting with queue.
  7. Select Save.
  8. Turn on the Global monitoring switch for RabbitMQ.
    RabbitMQ monitoring is enabled globally.
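
To illustrate how the Queues filter from step 6 behaves, here is a minimal sketch of the matching rules as described above: comma-separated names, an empty field meaning all queues, and * standing for any combination of characters. It is not the extension's actual implementation. The function names are hypothetical, and case-sensitive matching is an assumption.

```python
import re


def parse_queue_filter(raw):
    """Split the comma-separated field into patterns, ignoring blanks."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def compile_pattern(pattern):
    """Translate a pattern into a regex in which only * is special."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))


def is_reported(queue_name, patterns):
    """Return True if the queue would be reported under the given filter.

    An empty pattern list means that every queue is reported.
    Assumption: names are compared case-sensitively.
    """
    if not patterns:
        return True
    return any(compile_pattern(p).fullmatch(queue_name) for p in patterns)


patterns = parse_queue_filter("queue*, orders")
selected = [name for name in ["queue.payments", "orders", "my-queue"]
            if is_reported(name, patterns)]
```

With the filter `queue*, orders`, the names `queue.payments` and `orders` are selected, while `my-queue` is not, because the pattern has no leading wildcard.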

Enable RabbitMQ monitoring for individual hosts

Dynatrace provides the option of enabling RabbitMQ monitoring for specific hosts rather than globally.

  1. If RabbitMQ monitoring is currently switched on, switch it off: go to Settings > Monitoring > Monitored technologies and turn off the RabbitMQ switch.
  2. Go to Hosts or Hosts Classic (latest Dynatrace).
  3. Find the host you want to configure.
    Use the filter at the top of the list to help you locate the host.
  4. Select the host to open the host page.
  5. Select More > Settings to open the Host settings page.
  6. In the Monitored technologies list, find the RabbitMQ row and turn on the Monitoring switch.
    RabbitMQ monitoring is enabled for the selected host.

To view RabbitMQ monitoring insights

  1. Go to Technologies & Processes or Technologies & Processes Classic (latest Dynatrace).
  2. Select the RabbitMQ tile.
  3. To view cluster metrics, expand the Details section of the RabbitMQ process group.
  4. Select Process group details.
  5. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics.

RabbitMQ cluster ("process group") overview pages provide an overview of RabbitMQ cluster health. From here, it’s easy to identify problematic nodes. Just select a relevant time interval for the timeline, select a node metric from the metric drop-down list, and compare the values of all nodes in the sortable table.

Further down the page, you’ll find a number of other cluster-specific charts.

RabbitMQ cluster charts

Metric
Description

Queued messages

RabbitMQ’s queues are most efficient when they’re empty, so the lower the Queued messages count, the better.

Message rates

The Message rates chart is the best indicator of RabbitMQ performance.

Nodes health

Shows the number of nodes in a given state. Note that this chart isn’t available for every RabbitMQ version.

Queues health

The Queues health chart shows more than just queue health. RabbitMQ can handle a high volume of queues, but each queue requires additional resources, so watch these queue numbers carefully. If the queues begin to pile up, you may have a queue leak. If you can’t find the source of the leak, consider adding a queue-ttl policy.

Cluster summary

The Cluster summary chart provides an overview of all RabbitMQ cluster elements.

For more RabbitMQ performance tips, have a look at this article about avoiding high CPU and memory usage.
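
As an illustration of the queue-ttl suggestion in the Queues health row, the sketch below builds the request path and JSON body for a policy that could be created through the management plugin's `PUT /api/policies/{vhost}/{name}` endpoint. It assumes that "queue-ttl" refers to RabbitMQ's `expires` policy key, which deletes queues that have been unused for the given number of milliseconds. The policy name, pattern, and timeout are hypothetical values.

```python
import json
from urllib.parse import quote


def build_expiry_policy(vhost, name, pattern, idle_ms):
    """Return (path, body) for a queue-expiry policy request.

    Assumption: "queue-ttl" maps to the `expires` policy key.
    The body is meant to be sent with PUT and Content-Type: application/json.
    """
    # The default vhost "/" must be percent-encoded as %2F in the path.
    path = f"/api/policies/{quote(vhost, safe='')}/{quote(name, safe='')}"
    body = {
        "pattern": pattern,                  # regex matched against queue names
        "definition": {"expires": idle_ms},  # delete queues idle this long
        "apply-to": "queues",
    }
    return path, json.dumps(body)


# Hypothetical example: expire queues named tmp.* after 30 idle minutes.
path, body = build_expiry_policy("/", "expire-idle-queues", r"^tmp\.", 30 * 60 * 1000)
```

Restricting the pattern to queues that are known to be temporary avoids expiring long-lived queues by accident.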

RabbitMQ cluster monitoring metrics

Metric
Description

Messages ready

The number of messages that are ready to be delivered. This is the sum of messages in the messages_ready status.

Messages unacknowledged

The number of messages delivered to clients, but not yet acknowledged. This is the sum of messages in the messages_unacknowledged status.

Acknowledged

The rate at which messages are acknowledged by the client/consumer.

Deliver and Get

The rate per second of the sum of messages: (1) delivered in acknowledgment mode to consumers, (2) delivered in no-acknowledgment mode to consumers, (3) delivered in acknowledgment mode in response to basic.get, (4) delivered in no-acknowledgment mode in response to basic.get.

Publish

The rate at which messages are incoming to the RabbitMQ cluster.

Failed

The number of unhealthy nodes. Please be aware that not every RabbitMQ version provides this metric.

Ok

The number of healthy nodes. Please be aware that not every RabbitMQ version provides this metric.

Queues health chart

The number of queues in a given state.

Channels

The number of channels (virtual connections). If the number of channels is high, you may have a memory leak in your client code.

Connections

The number of TCP connections to the message broker. Frequently opened and closed connections can result in high CPU usage. Connections should be long-lived. Channels can be opened and closed more frequently.

Consumers

The number of consumers.

Exchanges

The number of exchanges.
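
Most of the cluster metrics above have a counterpart in the JSON returned by the management plugin's `/api/overview` endpoint. The following sketch shows one way to read them from such a document. Whether the extension reads exactly these fields is an assumption, and the sample values are made up for illustration.

```python
def summarize_overview(overview):
    """Extract cluster-level figures from an /api/overview document.

    Missing sections default to zero, because a broker that has not yet
    handled any messages may omit some of the statistics.
    """
    queue_totals = overview.get("queue_totals", {})
    message_stats = overview.get("message_stats", {})
    object_totals = overview.get("object_totals", {})

    def rate(name):
        # Rates are nested, for example message_stats.publish_details.rate
        return message_stats.get(f"{name}_details", {}).get("rate", 0.0)

    return {
        "messages_ready": queue_totals.get("messages_ready", 0),
        "messages_unacknowledged": queue_totals.get("messages_unacknowledged", 0),
        "acknowledged_per_s": rate("ack"),
        "deliver_get_per_s": rate("deliver_get"),
        "publish_per_s": rate("publish"),
        "channels": object_totals.get("channels", 0),
        "connections": object_totals.get("connections", 0),
        "consumers": object_totals.get("consumers", 0),
        "exchanges": object_totals.get("exchanges", 0),
    }


# Made-up sample shaped like an /api/overview response.
SAMPLE_OVERVIEW = {
    "queue_totals": {"messages_ready": 120, "messages_unacknowledged": 8},
    "message_stats": {
        "ack_details": {"rate": 45.0},
        "deliver_get_details": {"rate": 47.5},
        "publish_details": {"rate": 50.0},
    },
    "object_totals": {"channels": 12, "connections": 4, "consumers": 6, "exchanges": 9},
}

summary = summarize_overview(SAMPLE_OVERVIEW)
```

In this sample, the publish rate is higher than the acknowledgment rate, which is the situation in which Messages ready tends to grow over time.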

RabbitMQ node monitoring

To access valuable RabbitMQ node metrics:

  1. Select Hosts from the menu.
  2. On the Hosts page, select your RabbitMQ host.
  3. In the Processes section of the Hosts page, select the RabbitMQ process.
  4. Expand the Properties pane and select the RabbitMQ process group link.
  5. Select a node from the Process list on the Process group details page.
  6. Click the RabbitMQ metrics tab.

Valuable RabbitMQ node metrics are displayed on each RabbitMQ process page on the RabbitMQ metrics tab.

  • The Messages chart indicates how many messages are queued (the fewer the better).
  • The next two charts present the number of RabbitMQ elements that work on the current node.
  • On the process/node page, all metrics are per node. The following metrics are available: Messages ready, Messages unacknowledged, number of Consumers, Queues, Channels, and Connections.

To return to the cluster level, expand the Properties section of the RabbitMQ Processes page and select the cluster.

Additional RabbitMQ node monitoring metrics

More RabbitMQ monitoring metrics are available from individual Process pages. Select the Further details tab for more monitoring insights.

On the Further details tab you’ll find the following additional charts.

Chart
Description

Memory usage

The percentage of the RabbitMQ memory limit that is in use. 100% means that the RabbitMQ memory limit vm_memory_high_watermark has been reached (by default, vm_memory_high_watermark is set to 40% of installed RAM). Once the RabbitMQ server reaches this limit, connections that publish messages are blocked. Note that this doesn’t prevent the RabbitMQ server from using more than its limit; it’s merely the point at which publishers are throttled.

Available disk space

The percentage of available RabbitMQ disk space. Indicates how much available disk space remains before the disk_free_limit is reached. Once all available disk space is used up, RabbitMQ blocks producers and prevents memory-based messages from being paged to disk. This reduces, but doesn’t eliminate, the likelihood of a crash due to the exhaustion of disk space.

File descriptors usage

The percentage of available file descriptors. RabbitMQ installations running production workloads may require system limits and kernel-parameter tuning to handle a realistic number of concurrent connections and queues. RabbitMQ recommends allowing for at least 65,536 file descriptors when using RabbitMQ in production environments. 4,096 file descriptors is sufficient for most development workloads. RabbitMQ documentation suggests that you set your file descriptor limit to 1.5 times the maximum number of connections you expect.

Erlang processes usage

The percentage of available Erlang processes. The maximum number of processes can be changed via the RABBITMQ_SERVER_ERL_ARGS environment variable.

Sockets usage

The percentage of available Erlang sockets. The required number of sockets is correlated with the required number of file descriptors. For more details, see the Controlling System Limits on Linux section at www.rabbitmq.com.
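
The usage percentages in this table can be approximated from the per-node fields exposed by the management plugin's `/api/nodes` endpoint (`mem_used`, `mem_limit`, `fd_used`, `fd_total`, `proc_used`, `proc_total`, `sockets_used`, `sockets_total`). The sketch below is an illustration under that assumption rather than the extension's own calculation. The function names and sample values are hypothetical, and some RabbitMQ versions may not report every field.

```python
import math


def percent(used, total):
    """Return used/total as a percentage, or None if no limit is reported."""
    if not total:
        return None
    return round(100.0 * used / total, 1)


def node_usage(node):
    """Compute usage percentages for one entry of an /api/nodes response."""
    return {
        "memory": percent(node.get("mem_used", 0), node.get("mem_limit", 0)),
        "file_descriptors": percent(node.get("fd_used", 0), node.get("fd_total", 0)),
        "erlang_processes": percent(node.get("proc_used", 0), node.get("proc_total", 0)),
        "sockets": percent(node.get("sockets_used", 0), node.get("sockets_total", 0)),
    }


def suggested_fd_limit(expected_connections):
    """Apply the 1.5x rule of thumb quoted in the File descriptors usage row."""
    return math.ceil(1.5 * expected_connections)


# Made-up sample node: 2 GiB used out of a 4 GiB memory limit, and so on.
SAMPLE_NODE = {
    "mem_used": 2 * 1024**3, "mem_limit": 4 * 1024**3,
    "fd_used": 16384, "fd_total": 65536,
    "proc_used": 262144, "proc_total": 1048576,
    "sockets_used": 100, "sockets_total": 1000,
}

usage = node_usage(SAMPLE_NODE)
```

A memory value of 100 in this sketch corresponds to the point at which vm_memory_high_watermark is reached and publishers are throttled.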

For more information about RabbitMQ statistics, see www.rabbitmq.com.