Monitor Prometheus metrics exposed by the Redpanda platform.
Monitor the health and performance of your Redpanda clusters through their exposed Prometheus endpoints.
To capture consumer group metrics, enable them in your Redpanda configuration.
Activate the extension in your environment using the in-product Hub, provide the necessary device configuration, and you're all set up. For more details, see Manage Prometheus extensions.
The extension package contains:
Calculations are based on the assumption that you monitor all metrics for every feature set every minute. The inclusion of histogram-type metrics may cause differences in the actual result:
DDUs: (20 + (4 * Nº Storage shards) + (4 * Nº Servers) + (5 * Nº Consumer groups) + Nº Namespaces * (3 * Nº Topics/namespace) * (2 * Nº Partitions/topic)) * 525.6 DDUs/year, per Redpanda instance
DPS (metric data points): (20 + (4 * Nº Storage shards) + (4 * Nº Servers) + (5 * Nº Consumer groups) + Nº Namespaces * (3 * Nº Topics/namespace) * (2 * Nº Partitions/topic)) * 525,600 metric data points/year, per Redpanda instance
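The two formulas above can be evaluated with a short script. This is a minimal sketch, not an official calculator: the function and parameter names are made up for illustration, and the 0.001 DDU per data point ratio is an assumption inferred from the two yearly constants (525.6 and 525,600). As noted above, histogram-type metrics may make the real figure differ.

```python
MINUTES_PER_YEAR = 525_600  # 365 days * 24 h * 60 min, one collection per minute


def metrics_per_minute(storage_shards, servers, consumer_groups,
                       namespaces, topics_per_namespace, partitions_per_topic):
    """Data points collected per minute, following the formula term by term."""
    return (
        20
        + 4 * storage_shards
        + 4 * servers
        + 5 * consumer_groups
        + namespaces * (3 * topics_per_namespace) * (2 * partitions_per_topic)
    )


def yearly_estimate(**sizes):
    """Return (data points per year, DDUs per year) for one Redpanda instance."""
    dps = metrics_per_minute(**sizes) * MINUTES_PER_YEAR
    # Assumption: 0.001 DDU per data point, implied by 525.6 vs 525,600.
    ddus = dps / 1000
    return dps, ddus


# Hypothetical cluster sizes, chosen only to exercise the formula.
dps, ddus = yearly_estimate(storage_shards=2, servers=3, consumer_groups=4,
                            namespaces=1, topics_per_namespace=5,
                            partitions_per_topic=3)
print(f"{dps:,} data points/year, {ddus:,.1f} DDUs/year")
```

With these example sizes the per-minute count is 20 + 8 + 12 + 20 + 90 = 150, so the topic and partition term is the one to watch as a cluster grows.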
When activating your extension using monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension has to collect at least one metric after activation.
In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.
All metrics that aren't categorized into any feature set are considered to be the default and are always reported.
A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
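The inheritance and override rules above amount to "the most specific level that defines a feature set wins". The sketch below illustrates that precedence only; the function, its parameters, and the feature set names are hypothetical and do not reflect the extension's actual configuration schema.

```python
from typing import Optional

DEFAULT = "default"  # label used here for metrics that belong to no feature set


def resolve_feature_set(metric: Optional[str] = None,
                        subgroup: Optional[str] = None,
                        group: Optional[str] = None) -> str:
    """Pick the effective feature set: metric level, then subgroup, then group."""
    for candidate in (metric, subgroup, group):
        if candidate is not None:
            return candidate
    # Nothing defined at any level: the metric is always reported.
    return DEFAULT


# Hypothetical feature set names, for illustration only.
print(resolve_feature_set(group="cluster"))                      # inherited from group
print(resolve_feature_set(subgroup="storage", group="cluster"))  # subgroup overrides group
print(resolve_feature_set(metric="consumer-groups",
                          subgroup="storage", group="cluster"))  # metric overrides both
```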
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda rest_proxy request errors | redpanda.rest_proxy.request_errors_total.count | Total number of REST proxy server errors |
| Redpanda REST proxy request latency | redpanda.rest_proxy.request_latency_seconds | Internal latency of request for REST proxy (Histogram of observed events) |
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda storage free bytes | redpanda.storage.free | Disk storage bytes free |
| Redpanda storage total bytes | redpanda.storage.total | Total size of attached storage, in bytes |
| Redpanda storage space alert | redpanda.storage.alert | Status of the low storage space alert: 0 = OK, 1 = Low Space, 2 = Degraded |
| Redpanda total CPU busy time in seconds | redpanda.cpu.busy.count | Total time (in seconds) the CPU has been actively processing tasks |
| Redpanda allocated memory size | redpanda.memory.allocated | Allocated memory size in bytes |
| Redpanda available memory size | redpanda.memory.available | Total shard memory potentially available in bytes (free_memory plus reclaimable) |
| Redpanda RPC active connections | redpanda.rpc.active_connections | Count of currently active connections |
| Redpanda RPC latency | redpanda.rpc.request_latency_seconds | Latency (in seconds) for RPC requests (Histogram of observed events) |
| Redpanda I/O queue read operations | redpanda.io_queue.read_ops.count | Count of read operations processed by the I/O queue |
| Redpanda I/O queue write operations | redpanda.io_queue.write_ops.count | Count of write operations processed by the I/O queue. |
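As a rough illustration of how the storage metrics in this table fit together, the following sketch derives a used-space percentage from the free and total byte values and decodes the alert status. The helper name and sample values are hypothetical; only the three status codes come from the table.

```python
# Status codes as listed for redpanda.storage.alert.
STORAGE_ALERT = {0: "OK", 1: "Low Space", 2: "Degraded"}


def storage_summary(free_bytes, total_bytes, alert):
    """Return (used space in percent, alert label) for one set of readings."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
    return round(used_pct, 1), STORAGE_ALERT.get(int(alert), "Unknown")


# Hypothetical readings: 100 GiB attached, 25 GiB free, alert raised.
GIB = 1024 ** 3
print(storage_summary(free_bytes=25 * GIB, total_bytes=100 * GIB, alert=1))
```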
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda Kafka consumer group committed offset | redpanda.kafka_consumer_group.committed_offset | Consumer group committed offset. |
| Redpanda Kafka consumer group max lag | redpanda.kafka_consumer_group.lag.max | Maximum lag observed among all partitions for a consumer group |
| Redpanda Kafka consumer group aggregated lag | redpanda.kafka_consumer_group.lag.sum | Aggregated lag across all partitions for a consumer group |
| Redpanda Kafka consumer group consumers | redpanda.kafka_consumer_group.consumers | Number of consumers per consumer group |
| Redpanda Kafka consumer group topics | redpanda.kafka_consumer_group.topics | Number of topics per consumer group |
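To make the difference between the two lag metrics concrete, this sketch computes both aggregates from per-partition lag values. Redpanda reports these metrics itself; the input mapping and the helper below are hypothetical and only show how the maximum and the sum relate.

```python
def lag_summary(partition_lags):
    """Aggregate per-partition lag for one consumer group.

    partition_lags maps (topic, partition) to the lag on that partition.
    """
    lags = list(partition_lags.values())
    return {
        "max": max(lags, default=0),  # worst single partition
        "sum": sum(lags),             # total backlog across all partitions
    }


# Hypothetical lags for one consumer group reading two topics.
example = {("orders", 0): 5, ("orders", 1): 12, ("payments", 0): 0}
print(lag_summary(example))
```

A high maximum with a low sum points at one slow partition, while a high sum with a moderate maximum suggests the whole group is falling behind.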
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda Kafka request latency | redpanda.kafka_broker.request_latency_seconds | Latency of produce/consume requests per broker (Histogram of observed events) |
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda cluster brokers | redpanda.cluster.brokers | Number of configured brokers in the cluster (that is, the size of the cluster) |
| Redpanda cluster partitions | redpanda.cluster.partitions | Number of partitions managed by the cluster. This includes partitions of the controller topic, but not replicas. |
| Redpanda cluster topics | redpanda.cluster.topics | Number of topics in the cluster |
| Redpanda cluster unavailable partitions | redpanda.cluster.unavailable_partitions | Number of partitions that lack quorum among replicas |
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda Kafka max offset | redpanda.kafka_partition.max_offset | Latest committed offset for the partition (that is, the offset of the last message safely persisted on most replicas). |
| Redpanda Kafka under replicated replicas | redpanda.kafka_partition.under_replicated_replicas | Number of under-replicated replicas for the partition (that is, replicas that are live but are not at the latest offset). |
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda Kafka replicas | redpanda.kafka_topic.replicas | Number of configured replicas per topic |
| Redpanda Kafka request bytes total | redpanda.kafka_topic.request_bytes_total.count | Total number of bytes produced or consumed per topic. |
| Redpanda Kafka leadership transfers | redpanda.kafka_topic.raft_leadership_changes.count | Number of leader elections won across all partitions in a given topic |
| Metric name | Metric key | Description |
|---|---|---|
| Redpanda application build | redpanda.application.build | Redpanda build information |
| Redpanda application uptime | redpanda.application.uptime | Redpanda uptime in seconds |