Cassandra monitoring

Deprecation notice

This extension documentation is now deprecated and will no longer be updated. We recommend using the new Apache Cassandra extension for improved functionality and support.

Apache Cassandra server monitoring in Dynatrace provides information about database exceptions, failed requests, performance, and more. If Cassandra is underperforming or a problem occurs, Dynatrace lets you know immediately and shows you which nodes are affected.

This is a JMX (Java Management Extension) Dynatrace extension. JMX is ideal for monitoring applications built using Java.

Prerequisites

  • Cassandra 2.xx
  • Linux or Windows

Enabling Cassandra monitoring globally

With Cassandra monitoring enabled globally, Dynatrace automatically collects Cassandra metrics whenever a new host running Cassandra is detected in your environment.

  1. Go to Settings.
  2. Select Monitoring > Monitored technologies.
  3. In the Supported technologies list, find the Cassandra JMX row.
  4. Turn on the Cassandra JMX switch.

Monitoring Cassandra in Dynatrace

  1. Go to Technologies & Processes or Technologies & Processes Classic (latest Dynatrace).
  2. Select the Apache Cassandra tile.
  3. To view Cassandra cluster metrics, select the cluster in the Process group table under the tiles.
    The chart displays the selected process group (cluster) metric over time. You can select a different metric from the list.
  4. In the expanded row, select Process group details to see details on the selected Cassandra cluster.
  5. On the Process group details page, select the Technology-specific metrics tab to identify any problematic nodes.
  6. To display node-specific metrics, select a node from the Process list under the chart.
  7. Select the Cassandra metrics tab to see valuable node-specific Cassandra metrics.
    • The Exceptions and Failed requests charts show you if there’s a problem with the node. Pay particular attention to the Unavailable - Read, Unavailable - Write, and Unavailable - RangeSlice counts in Failed requests.
    • The Operation count and Latency 95th percentile charts can help you monitor performance. Increased latency while the number of operations remains stable typically indicates a performance issue.
  8. Select the Further details tab to see charts on a variety of additional Cassandra metrics.
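The rule of thumb from step 7 (latency rising while the operation count stays stable) can be sketched in a few lines of Python. This is only an illustration: the function name, the half-and-half window split, and both thresholds are assumptions, not anything Dynatrace itself applies.

```python
from statistics import mean


def latency_regression(op_counts, latencies_p95, ops_tolerance=0.10, latency_factor=1.5):
    """Return True if latency rose while the operation count stayed roughly stable.

    The first half of each series is treated as the baseline and the second
    half as the recent window. Both thresholds are illustrative assumptions.
    """
    if len(op_counts) != len(latencies_p95) or len(op_counts) < 4:
        raise ValueError("need two equally long series with at least 4 samples")

    half = len(op_counts) // 2
    base_ops, recent_ops = mean(op_counts[:half]), mean(op_counts[half:])
    base_lat, recent_lat = mean(latencies_p95[:half]), mean(latencies_p95[half:])

    if base_ops == 0 or base_lat == 0:
        return False  # no baseline to compare against

    # "Stable" means the recent operation count is within the tolerance of the baseline.
    ops_stable = abs(recent_ops - base_ops) / base_ops <= ops_tolerance
    # "Increased" means the recent latency is at least latency_factor times the baseline.
    latency_up = recent_lat / base_lat >= latency_factor
    return ops_stable and latency_up
```

Feeding it samples copied from the Operation count and Latency 95th percentile charts for the same time range gives a quick yes/no signal. The thresholds would need tuning per cluster.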

Cassandra cluster metrics

Select the Technology-specific metrics tab on the Process group details page to display aggregated Cassandra cluster metrics. Use the Show chart for list to select a different chart to display. All metrics are plotted against the number of process group instances. Hover your pointer over the chart to see the instance count and the minimum, maximum, and average for the selected metric at that time.
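To make the hover values concrete, the following minimal Python sketch computes an instance count, minimum, maximum, and average from per-node values at a single point in time. The function name and the input shape (a dict keyed by node name) are hypothetical, and the sketch does not claim to reproduce how Dynatrace aggregates internally.

```python
def summarize_instances(values_by_instance):
    """Aggregate one metric across process group instances at a single timestamp."""
    values = list(values_by_instance.values())
    if not values:
        # No instances reported a value for this timestamp.
        return {"instances": 0, "min": None, "max": None, "avg": None}
    return {
        "instances": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
    }


# Example: read latency (in ms) reported by three hypothetical nodes.
summary = summarize_instances({"node-1": 2.0, "node-2": 4.0, "node-3": 6.0})
```

A large gap between the minimum and maximum in such a summary is a hint that one node behaves differently from the rest of the cluster.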

  • Suspension
  • JVM threads
  • Java memory pool commits
  • Java memory pool used
  • GC time (garbage collection time)
  • Exception count
  • Files open
  • RangeSlice latency
  • RangeSlices
  • Read latency
  • Reads
  • Storage load
  • Write latency
  • Writes

Cassandra node metrics

Cassandra metrics tab

The Cassandra metrics tab shows key metrics for Cassandra on the node level.

Chart | Metric | Description
--- | --- | ---
Exceptions | Exception count | Number of internal Cassandra exceptions detected. Under normal conditions, this metric should be zero.
Failed requests | Unavailable – Read | Number of Unavailable – Read exceptions encountered.
Failed requests | Unavailable – Write | Number of Unavailable – Write exceptions encountered.
Failed requests | Unavailable – RangeSlice | Number of Unavailable – RangeSlice exceptions encountered.
Failed requests | Timeout – Read | Number of Timeout – Read exceptions encountered.
Failed requests | Timeout – Write | Number of Timeout – Write exceptions encountered.
Failed requests | Timeout – RangeSlice | Number of Timeout – RangeSlice exceptions encountered.
Failed requests | Failure – Read | Number of Failure – Read exceptions encountered.
Failed requests | Failure – Write | Number of Failure – Write exceptions encountered.
Failed requests | Failure – RangeSlice | Number of Failure – RangeSlice exceptions encountered.
Operation count | Read | Average number of reads per second.
Operation count | Write | Average number of writes per second.
Operation count | RangeSlice | Average number of RangeSlices per second.
Latency 95th percentile | Read | Average 95th percentile of transaction read latency.
Latency 95th percentile | Write | Average 95th percentile of transaction write latency.
Latency 95th percentile | RangeSlice | Average 95th percentile of transaction RangeSlice latency.
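As a rough illustration of how these node-level counts might be screened, the Python sketch below flags nodes that report a non-zero Exception count or any non-zero Unavailable count. The function name and the input structure (a dict of metric names to values per node) are assumptions made for this example and do not correspond to a Dynatrace API.

```python
# Metric names as they appear in the Failed requests chart.
UNAVAILABLE_METRICS = (
    "Unavailable – Read",
    "Unavailable – Write",
    "Unavailable – RangeSlice",
)


def flag_nodes(metrics_by_node):
    """Return {node: [metric names]} for nodes with non-zero exception or Unavailable counts."""
    flagged = {}
    for node, metrics in metrics_by_node.items():
        reasons = []
        # Under normal conditions the exception count should be zero.
        if metrics.get("Exception count", 0) > 0:
            reasons.append("Exception count")
        # Any Unavailable count above zero deserves a closer look.
        reasons.extend(name for name in UNAVAILABLE_METRICS if metrics.get(name, 0) > 0)
        if reasons:
            flagged[node] = reasons
    return flagged


# Hypothetical sample values for two nodes.
sample = {
    "node-1": {"Exception count": 0, "Unavailable – Read": 0},
    "node-2": {"Exception count": 3, "Unavailable – Write": 2},
}
flagged = flag_nodes(sample)
```

Timeout and Failure counts could be added to the tuple in the same way if they matter for a given cluster.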

Further details tab

The Further details tab shows additional metrics for Cassandra on the node level: Cache, Disk usage, Hints, Java managed memory, Load, and Pending tasks.

Chart | Metric | Description
--- | --- | ---
Cache: Hit rate | Row cache hit rate | 2m row cache hit rate.
Cache: Hit rate | Key cache hit rate | 2m key cache hit rate.
Disk usage: Storage load | Load | Size, in bytes, of the on-disk data the node manages.
Disk usage: Bytes compacted | Bytes compacted | Total number of bytes compacted since server start.
Disk usage: Compaction tasks pending | Pending tasks | Estimated number of compactions remaining to perform.
Disk usage: Compaction tasks completed | Completed tasks | Number of completed compactions since server start.
Disk usage: SSTable count | SSTable count | Number of SSTables on disk for this table.
Hints | Hints | Number of hint messages written to this node since start. Includes one entry for each host to be hinted per hint.
Java managed memory: poolname | Used memory | Java used memory.
Java managed memory: poolname | Committed memory | Java committed memory.
Java managed memory: poolname | Maximum memory | Java maximum memory.
Java managed memory: poolname | Garbage collection count | Java garbage collection count.
Java managed memory: poolname | Garbage collection time | Java garbage collection time.
Load: Read latency | Average | Average 95th percentile of transaction read latency.
Load: Read latency | Maximum | Maximum 95th percentile of transaction read latency.
Load: Write latency | Average | Average 95th percentile of transaction write latency.
Load: Write latency | Maximum | Maximum 95th percentile of transaction write latency.
Load: RangeSlice latency | Average | Average 95th percentile of transaction RangeSlice latency.
Load: RangeSlice latency | Maximum | Maximum 95th percentile of transaction RangeSlice latency.
Load: Read throughput | Average | Average number of reads per second.
Load: Read throughput | Maximum | Maximum number of reads per second.
Load: Write throughput | Average | Average number of writes per second.
Load: Write throughput | Maximum | Maximum number of writes per second.
Load: RangeSlice throughput | Average | Average number of RangeSlices per second.
Load: RangeSlice throughput | Maximum | Maximum number of RangeSlices per second.
Pending tasks: Read pending tasks | Read pending tasks | Number of read mutation tasks.
Pending tasks: ReadRepair pending tasks | ReadRepair pending tasks | Number of ReadRepair mutation tasks.
Pending tasks: Mutation pending tasks | Mutation pending tasks | Number of queued mutation tasks.
Pending tasks: Compaction pending tasks | Compaction tasks pending | Estimated number of compactions remaining to perform.