Cassandra monitoring

Deprecation notice

This extension documentation is now deprecated and will no longer be updated. We recommend using the new Apache Cassandra extension for improved functionality and support.

Apache Cassandra server monitoring in Dynatrace provides information about database exceptions, failed requests, performance, and more. If Cassandra is underperforming or a problem occurs, Dynatrace lets you know immediately and shows you which nodes are affected.

This is a JMX (Java Management Extensions) Dynatrace extension. JMX is ideal for monitoring applications built using Java.

Prerequisites

Cassandra 2.xx
Linux or Windows

Enabling Cassandra monitoring globally

With Cassandra monitoring enabled globally, Dynatrace automatically collects Cassandra metrics whenever a new host running Cassandra is detected in your environment.

Go to Settings.
Select Monitoring > Monitored technologies.
In the Supported technologies list, find the Cassandra JMX row.
Turn on the Cassandra JMX switch.

Monitoring Cassandra in Dynatrace

Go to Technologies & Processes or Technologies & Processes Classic (latest Dynatrace).
Select the Apache Cassandra tile.
To view Cassandra cluster metrics, select the cluster in the Process group table under the tiles.
The chart displays the selected process group (cluster) metric over time. You can select a different metric from the list.
In the expanded row, select Process group details to see details on the selected Cassandra cluster.
On the Process group details page, select the Technology-specific metrics tab to identify any problematic nodes.
To display node-specific metrics, select a node from the Process list under the chart.
Select the Cassandra metrics tab to see valuable node-specific Cassandra metrics.
- The Exceptions and Failed requests charts show you if there’s a problem with the node. Pay particular attention to the Unavailable - Read, Unavailable - Write, and Unavailable - RangeSlice counts in Failed requests.
- The Operation count and Latency 95th percentile charts can help you monitor performance. Increased latency while the number of operations remains stable typically indicates a performance issue.
Select the Further details tab to see charts on a variety of additional Cassandra metrics.
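
The rule of thumb above (latency rising while the operation count stays stable) can be sketched as a simple check. This is only an illustrative heuristic, not how Dynatrace detects problems: the function name, the split into two halves, and both thresholds (`ops_tolerance`, `latency_factor`) are assumptions chosen for the example.

```python
from statistics import mean


def latency_rose_with_stable_load(ops, latency_ms,
                                  ops_tolerance=0.10, latency_factor=1.5):
    """Compare the first and second half of two equally long series.

    Returns True when the average operation count changed by no more than
    ops_tolerance (as a fraction) while the average latency grew by
    latency_factor or more. Both thresholds are hypothetical defaults.
    """
    if len(ops) != len(latency_ms) or len(ops) < 2:
        raise ValueError("need two series of equal length with at least 2 points")

    half = len(ops) // 2
    ops_before, ops_after = mean(ops[:half]), mean(ops[half:])
    lat_before, lat_after = mean(latency_ms[:half]), mean(latency_ms[half:])

    if ops_before == 0 or lat_before == 0:
        return False  # no baseline to compare against

    ops_stable = abs(ops_after - ops_before) / ops_before <= ops_tolerance
    latency_up = lat_after / lat_before >= latency_factor
    return ops_stable and latency_up
```

For example, passing the values read from the Operation count and Latency 95th percentile charts for the same time range returns `True` only when the load stayed roughly flat and the latency clearly increased.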

Cassandra cluster metrics

Select the Technology-specific metrics tab on the Process group details page to display aggregated Cassandra cluster metrics. Use the Show chart for list to select a different chart to display. All metrics are plotted against the number of process group instances. Hover your pointer over the chart to see the instance count and the minimum, maximum, and average for the selected metric at that time.

Suspension
JVM threads
Java memory pool commits
Java memory pool used
GC time (garbage collection time)
Exception count
Files open
RangeSlice latency
RangeSlices
Read latency
Reads
Storage load
Write latency
Writes
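
The per-timestamp summary shown on hover (instance count, minimum, maximum, and average) can be reproduced for exported values with a few lines. This is a minimal sketch under stated assumptions: `summarize_instances` and the node names in the usage note are hypothetical, and the input is assumed to be one metric value per process group instance at a single point in time.

```python
from statistics import mean


def summarize_instances(values_by_instance):
    """Summarize one metric at one point in time across instances.

    values_by_instance maps a (hypothetical) instance name to its value.
    """
    values = list(values_by_instance.values())
    if not values:
        # No instances reported a value at this timestamp.
        return {"instances": 0, "min": None, "max": None, "avg": None}
    return {
        "instances": len(values),
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
    }
```

Calling it with `{"node-1": 2.0, "node-2": 4.0, "node-3": 9.0}` yields an instance count of 3, a minimum of 2.0, a maximum of 9.0, and an average of 5.0.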

Cassandra node metrics

Cassandra metrics tab

The Cassandra metrics tab shows key metrics for Cassandra on the node level.

Chart	Metric	Description
Exceptions	Exception count	Number of internal Cassandra exceptions detected. Under normal conditions, this metric should be zero.
Failed requests	Unavailable – Read	Number of `Unavailable – Read` exceptions encountered.
	Unavailable – Write	Number of `Unavailable – Write` exceptions encountered.
	Unavailable – RangeSlice	Number of `Unavailable – RangeSlice` exceptions encountered.
	Timeout – Read	Number of `Timeout – Read` exceptions encountered.
	Timeout – Write	Number of `Timeout – Write` exceptions encountered.
	Timeout – RangeSlice	Number of `Timeout – RangeSlice` exceptions encountered.
	Failure – Read	Number of `Failure – Read` exceptions encountered.
	Failure – Write	Number of `Failure – Write` exceptions encountered.
	Failure – RangeSlice	Number of `Failure – RangeSlice` exceptions encountered.
Operation count	Read	Average number of reads per second.
	Write	Average number of writes per second.
	RangeSlice	Average number of RangeSlices per second.
Latency 95th percentile	Read	Average 95th percentile of transaction read latency.
	Write	Average 95th percentile of transaction write latency.
	RangeSlice	Average 95th percentile of transaction RangeSlice latency.
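
To make the term "95th percentile" concrete, the sketch below computes a percentile from raw latency samples with the nearest-rank definition. It is an assumption for illustration only: Cassandra and Dynatrace may derive their percentiles differently (for example, from histograms), and `percentile_nearest_rank` is a hypothetical helper, not part of either product.

```python
import math


def percentile_nearest_rank(samples, pct=95):
    """Return the smallest sample such that at least pct percent of all
    samples are less than or equal to it (nearest-rank definition)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")

    ordered = sorted(samples)
    # 1-based rank of the percentile value in the sorted samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

With 20 samples, the 95th percentile is the 19th smallest value, so a single slow outlier does not dominate the result the way it would dominate a maximum.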

Further details tab

The Further details tab shows additional metrics for Cassandra on the node level: Cache, Disk usage, Hints, Java managed memory, Load, and Pending tasks.

Chart	Metric	Description
Cache: Hit rate	Row cache hit rate	Row cache hit rate over the last 2 minutes.
	Key cache hit rate	Key cache hit rate over the last 2 minutes.
Disk usage: Storage load	Load	Size, in bytes, of the on-disk data the node manages.
Disk usage: Bytes compacted	Bytes compacted	Total number of bytes compacted since server start.
Disk usage: Compaction tasks pending	Pending tasks	Estimated number of compactions remaining to perform.
Disk usage: Compaction tasks completed	Completed tasks	Number of completed compactions since server start.
Disk usage: SSTable count	SSTable count	Number of SSTables on disk for this table.
Hints	Hints	Number of hint messages written to this node since start. Includes one entry for each host to be hinted per hint.
Java managed memory: poolname	Used memory	Java used memory.
	Committed memory	Java committed memory.
	Maximum memory	Java maximum memory.
	Garbage collection count	Java garbage collection count.
	Garbage collection time	Java garbage collection time.
Load: Read latency	Average	Average 95th percentile of transaction read latency.
	Maximum	Maximum 95th percentile of transaction read latency.
Load: Write latency	Average	Average 95th percentile of transaction write latency.
	Maximum	Maximum 95th percentile of transaction write latency.
Load: RangeSlice latency	Average	Average 95th percentile of transaction RangeSlice latency.
	Maximum	Maximum 95th percentile of transaction RangeSlice latency.
Load: Read throughput	Average	Average number of reads per second.
	Maximum	Maximum number of reads per second.
Load: Write throughput	Average	Average number of writes per second.
	Maximum	Maximum number of writes per second.
Load: RangeSlice throughput	Average	Average number of RangeSlices per second.
	Maximum	Maximum number of RangeSlices per second.
Pending tasks: Read pending tasks	Read pending tasks	Number of queued read tasks.
Pending tasks: ReadRepair pending tasks	ReadRepair pending tasks	Number of queued ReadRepair tasks.
Pending tasks: Mutation pending tasks	Mutation pending tasks	Number of queued mutation tasks.
Pending tasks: Compaction pending tasks	Compaction tasks pending	Estimated number of compactions remaining to perform.
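
As a reading aid for the cache hit rate metrics in the table above, a hit rate is the share of cache requests that were served from the cache. The sketch below assumes you have the number of hits and the number of requests for the same time window. The function name and the choice to return 0.0 for an idle cache are assumptions made for this example.

```python
def cache_hit_rate(hits, requests):
    """Return the fraction of requests served from the cache (0.0 to 1.0)."""
    if hits < 0 or requests < 0 or hits > requests:
        raise ValueError("expected 0 <= hits <= requests")
    if requests == 0:
        return 0.0  # idle cache: no requests in the window (assumed convention)
    return hits / requests
```

For instance, 90 hits out of 120 requests in a window gives a hit rate of 0.75.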