Improve Apache Cassandra observability
The Apache Cassandra JMX server monitoring extension in Dynatrace provides information about database exceptions, failed requests, performance, and more.
Dynatrace automatically detects all applications and microservices deployed in your system and lets you track how they use your database. It provides automatic end-to-end tracing down to individual database statements, along with database server metrics and log insights. Dynatrace visualizes application-to-database dependencies for SQL and NoSQL databases, for both cloud and self-hosted databases. It also uses AI to diagnose anomalies in real time and pinpoints the root cause down to slow-performing or erroneous statements. Combining deep code-level insights with cloud-native database server monitoring helps you maintain a robust production environment.
If Cassandra is underperforming or a problem occurs, Dynatrace lets you know immediately and shows you which nodes are affected.
This is a JMX (Java Management Extensions) extension for Dynatrace. JMX is well suited to monitoring applications built with Java, such as Cassandra. Make sure you monitor the Cassandra server process itself, because the extension does not support collecting metrics on the client side.
For Apache Cassandra database clients:
If your client application runs on a virtual machine or bare-metal, install OneAgent on it to get started.
If your client application runs as a workload in Kubernetes or OpenShift, set up Dynatrace on Kubernetes or OpenShift.
Activate the following OneAgent features to get tracing insight:
Activate log monitoring to get log insight.
For Apache Cassandra database servers:
If your database server runs on a virtual machine or bare-metal, install OneAgent on it to get started.
Activate the Cassandra JMX extension to get insight into the database server's health and performance, including metrics and events. Select Add to environment.
Activate log monitoring to get full log insight.
The extension includes the Cassandra JMX Overview dashboard, which gives you an overview of your Cassandra JMX metrics.
Additionally, all metrics captured by the extension are added to the Unified Analysis screen of the process group instance in three new sections: Exceptions, Usage, and Thread Pool. Make sure you are using the new process group instance screen to see them.
All metrics can also be viewed with the data explorer.
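Outside the data explorer, the same metrics can be retrieved through the Dynatrace Metrics API v2 (`GET /api/v2/metrics/query`). The sketch below only builds the request and does not send it. The environment URL and token are hypothetical placeholders, and the token is assumed to have the `metrics.read` scope. The selector suffix `:splitBy():max` merges all dimensions and keeps the maximum value.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_metrics_query(environment_url: str, api_token: str, metric_selector: str,
                        time_from: str = "now-2h") -> Request:
    """Build (but do not send) a Metrics API v2 query request."""
    query = urlencode({"metricSelector": metric_selector, "from": time_from})
    url = f"{environment_url.rstrip('/')}/api/v2/metrics/query?{query}"
    # Dynatrace API tokens are passed in the Authorization header with the Api-Token scheme.
    return Request(url, headers={"Authorization": f"Api-Token {api_token}"})


# Hypothetical environment URL and token; replace them with your own values.
request = build_metrics_query(
    "https://abc12345.live.dynatrace.com",
    "dt0c01.EXAMPLE",
    "cassandra.clientRequest.read.latency.95thPercentile:splitBy():max",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it and return the JSON response.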
The extension consumes Davis data units (DDUs). However, these are eligible for the free tier included with every host. The number of DDUs consumed depends on the number of instances monitored.
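For a rough estimate of consumption before any free-tier allowance is applied, the sketch below assumes a rate of 0.001 DDU per metric data point and one data point per metric series per minute. Both figures are assumptions to verify against your licensing documentation, and the instance and series counts are hypothetical. Metrics with extra dimensions (for example, one series per column family) produce more series than this simple count.

```python
# Assumed rate: 0.001 DDU per ingested metric data point (verify against current licensing docs).
DDU_PER_DATA_POINT = 0.001
MINUTES_PER_YEAR = 60 * 24 * 365


def estimate_yearly_ddus(instances: int, series_per_instance: int,
                         data_points_per_minute: float = 1.0) -> float:
    """Estimate yearly DDU consumption before any free-tier allowance is deducted."""
    series = instances * series_per_instance
    return series * data_points_per_minute * MINUTES_PER_YEAR * DDU_PER_DATA_POINT


# Hypothetical: a single series on one instance, then 3 Cassandra nodes with 36 series each.
single = estimate_yearly_ddus(1, 1)
cluster = estimate_yearly_ddus(3, 36)
print(f"{single:.1f} DDUs per series per year")
print(f"{cluster:.1f} DDUs per year for the cluster")
```

Because the estimate scales linearly with the number of series, doubling the monitored instances doubles the DDUs.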
When activating your extension using a monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension must collect at least one metric after activation.
In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.
All metrics that aren't categorized into any feature set are considered to be the default and are always reported.
A metric inherits the feature set of its subgroup, which in turn inherits the feature set of its group. Conversely, a feature set defined at the metric level overrides one defined at the subgroup level, which in turn overrides one defined at the group level.
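The inheritance and override rules described above amount to taking the most specific feature set that is defined. The sketch below illustrates those rules together with the "uncategorized metrics are always reported" behavior; it is not the extension framework's actual code, and the feature set names are hypothetical.

```python
from typing import Iterable, Optional

# Label used here for metrics with no feature set anywhere in the hierarchy.
DEFAULT = "default"


def resolve_feature_set(group_fs: Optional[str], subgroup_fs: Optional[str],
                        metric_fs: Optional[str]) -> str:
    """Return the effective feature set: metric overrides subgroup, which overrides group."""
    for fs in (metric_fs, subgroup_fs, group_fs):
        if fs is not None:
            return fs
    return DEFAULT


def is_reported(effective_fs: str, selected: Iterable[str]) -> bool:
    """Default (uncategorized) metrics are always reported; others only if their set is selected."""
    return effective_fs == DEFAULT or effective_fs in set(selected)


# Hypothetical feature set names.
inherited = resolve_feature_set("storage", None, None)            # from the group
overridden = resolve_feature_set("storage", "hints", "exceptions")  # metric level wins
uncategorized = resolve_feature_set(None, None, None)
print(inherited, overridden, uncategorized)
```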
| Metric name | Metric key | Description |
|---|---|---|
| Live SSTable count | cassandra.columnFamily.liveSSTableCount | The number of live SSTables in the column family. |
| Metric name | Metric key | Description |
|---|---|---|
| Exception count | cassandra.storage.exceptions | The number of storage exceptions. |
| Storage load | cassandra.storage.load | The current load on the storage system. |
| Storage total hints | cassandra.storage.totalHints | The total number of hints in the storage system. |
| Metric name | Metric key | Description |
|---|---|---|
| Write failure rate | cassandra.clientRequest.write.failures | The rate of write failures per second. A write failure occurs when a write request cannot be completed successfully. This can be due to various reasons such as timeouts, unavailable replicas, or other issues in the Cassandra cluster. |
| Write timeout rate | cassandra.clientRequest.write.timeout | The rate of write timeouts per second. A write timeout occurs when a write request cannot be completed within the specified time. |
| Write unavailable rate | cassandra.clientRequest.write.unavailables | The rate of write unavailables per second. A write unavailable occurs when a write request cannot be completed due to unavailable replicas. |
| Write latency rate | cassandra.clientRequest.write.latency.rate | The rate of write latency measurements per second. One measurement is recorded per write request, so this approximates write throughput. |
| Write latency 95th percentile | cassandra.clientRequest.write.latency.95thPercentile | The 95th percentile of write latencies. |
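To illustrate what a 95th percentile latency means, the sketch below applies the nearest-rank method to a list of samples: at least 95% of the samples are at or below the returned value. Cassandra derives its percentiles from an internal latency histogram rather than from a raw list of samples, so this shows the idea, not Cassandra's exact calculation. The sample values are hypothetical.

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Return the nearest-rank p-th percentile of the samples (0 < p <= 100)."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to avoid float rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical write latencies in milliseconds: 1.0, 2.0, ..., 100.0
latencies_ms = [float(x) for x in range(1, 101)]
p95 = percentile_nearest_rank(latencies_ms, 95)
print(p95)
```

A single very slow request barely moves the average, but a tail of slow requests shows up directly in the 95th percentile, which is why it is reported alongside the rates.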
| Metric name | Metric key | Description |
|---|---|---|
| Compaction pending tasks | cassandra.compaction.pendingTasks | The number of compaction tasks that are pending. |
| Compaction completed tasks | cassandra.compaction.completedTasks | The number of compaction tasks that have been completed. |
| Compaction rate | cassandra.compaction.bytesCompacted | The rate of bytes compacted per second. |
| Metric name | Metric key | Description |
|---|---|---|
| KeyCache hit rate | cassandra.cache.keyCache.hits | The rate of key cache hits per second. |
| RowCache hit rate | cassandra.cache.rowCache.hits | The rate of row cache hits per second. |
| Metric name | Metric key | Description |
|---|---|---|
| Read stage pending tasks | cassandra.threadPool.request.read.pending.fix | The number of pending tasks in the read stage. |
| Read stage active tasks | cassandra.threadPool.request.read.active.fix | The number of active tasks in the read stage. |
| Read stage blocked tasks total | cassandra.threadPool.request.read.totalBlocked.fix | The total number of blocked tasks in the read stage. |
| Read stage currently blocked tasks | cassandra.threadPool.request.read.currentlyBlocked.fix | The number of currently blocked tasks in the read stage. |
| ReadRepair stage pending tasks | cassandra.threadPool.request.readRepair.pending.fix | The number of pending tasks in the ReadRepair stage. |
| ReadRepair stage active tasks | cassandra.threadPool.request.readRepair.active.fix | The number of active tasks in the ReadRepair stage. |
| ReadRepair stage blocked tasks total | cassandra.threadPool.request.readRepair.totalBlocked.fix | The total number of blocked tasks in the ReadRepair stage. |
| ReadRepair stage currently blocked tasks | cassandra.threadPool.request.readRepair.currentlyBlocked.fix | The number of currently blocked tasks in the ReadRepair stage. |
| Mutation stage pending tasks | cassandra.threadPool.request.mutation.pending.fix | The number of pending tasks in the Mutation stage. |
| Mutation stage active tasks | cassandra.threadPool.request.mutation.active.fix | The number of active tasks in the Mutation stage. |
| Mutation stage blocked tasks total | cassandra.threadPool.request.mutation.totalBlocked.fix | The total number of blocked tasks in the Mutation stage. |
| Mutation stage currently blocked tasks | cassandra.threadPool.request.mutation.currentlyBlocked.fix | The number of currently blocked tasks in the Mutation stage. |
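The thread pool metric keys above follow one pattern, `cassandra.threadPool.request.<stage>.<measure>.fix`, so they can be generated rather than typed. The sketch below builds all twelve keys and, given a hypothetical mapping of latest values (for example, parsed from a Metrics API response), lists the stages that currently report blocked tasks. The helper names are illustrative only.

```python
STAGES = ("read", "readRepair", "mutation")
MEASURES = ("pending", "active", "totalBlocked", "currentlyBlocked")


def thread_pool_key(stage: str, measure: str) -> str:
    """Build a thread pool metric key following the naming pattern in the table."""
    return f"cassandra.threadPool.request.{stage}.{measure}.fix"


def all_thread_pool_keys() -> list[str]:
    return [thread_pool_key(s, m) for s in STAGES for m in MEASURES]


def stages_with_blocked_tasks(latest: dict[str, float]) -> list[str]:
    """Return the stages whose currently blocked task count is above zero."""
    return [s for s in STAGES if latest.get(thread_pool_key(s, "currentlyBlocked"), 0.0) > 0]


keys = all_thread_pool_keys()
# Hypothetical latest values keyed by metric key.
latest = {
    thread_pool_key("read", "currentlyBlocked"): 0.0,
    thread_pool_key("mutation", "currentlyBlocked"): 4.0,
}
print(len(keys), stages_with_blocked_tasks(latest))
```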
| Metric name | Metric key | Description |
|---|---|---|
| RangeSlice timeout rate | cassandra.clientRequest.rangeSlice.timeout | The rate of RangeSlice timeouts per second. A RangeSlice timeout occurs when a RangeSlice request cannot be completed within the specified time. |
| RangeSlice failure rate | cassandra.clientRequest.rangeSlice.failures | The rate of RangeSlice failures per second. A RangeSlice failure occurs when a RangeSlice request cannot be completed successfully. |
| RangeSlice unavailable rate | cassandra.clientRequest.rangeSlice.unavailables | The rate of RangeSlice unavailables per second. A RangeSlice unavailable occurs when a RangeSlice request cannot be completed due to unavailable replicas. |
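| RangeSlice latency rate | cassandra.clientRequest.rangeSlice.latency.rate | The rate of RangeSlice latency measurements per second. One measurement is recorded per RangeSlice request, so this approximates RangeSlice throughput. |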
| RangeSlice latency 95th percentile | cassandra.clientRequest.rangeSlice.latency.95thPercentile | The 95th percentile of RangeSlice latencies. |
| Metric name | Metric key | Description |
|---|---|---|
| Read failure rate | cassandra.clientRequest.read.failures | The rate of read failures per second. A read failure occurs when a read request cannot be completed successfully. This can be due to various reasons such as timeouts, unavailable replicas, or other issues in the Cassandra cluster. |
| Read timeout rate | cassandra.clientRequest.read.timeout | The rate of read timeouts per second. A read timeout occurs when a read request cannot be completed within the specified time. |
| Read unavailable rate | cassandra.clientRequest.read.unavailables | The rate of read unavailables per second. A read unavailable occurs when a read request cannot be completed due to unavailable replicas. |
| Read latency rate | cassandra.clientRequest.read.latency.rate | The rate of read latency measurements per second. One measurement is recorded per read request, so this approximates read throughput. |
| Read latency 95th percentile | cassandra.clientRequest.read.latency.95thPercentile | The 95th percentile of read latencies. |