NetApp OnTap (Remote) extension

Collect NetApp OnTap metrics via OnTap 9.6+ API to monitor your clusters.

Get started

Overview

Collect NetApp OnTap metrics via OnTap 9.6+ API and Dynatrace Intelligence to monitor your clusters.

The NetApp OnTap (Remote) extension allows you to collect, view, and analyze metrics from your NetApp OnTap clusters both on the cluster level and for each of your nodes and storage virtual machines (SVMs).

Use cases

Collect and analyze metrics from your NetApp OnTap clusters in the context of your hosts, applications, and services.
Gain additional insight by using charting and dashboarding capabilities.
Use Dynatrace Intelligence to generate baselines and alert you when anomalies are detected in designated metrics.

Requirements

NetApp OnTap version 9.6+ with a reachable REST API
The following metric event configurations must be enabled:
- OnTap Cluster monitoring unavailable
- OnTap FRU in error state
- High Temperature on OnTap Node
Active VMware extension (version 3.5.1+) in your environment
Active NetApp OnTap (Remote) extension in your environment
An OnTap user with the http application access that is assigned a rest-role with at least read-only access to the following API paths:
- /api/cluster
- /api/cluster/nodes
- /api/snapmirror/relationships
- /api/storage/aggregates
- /api/storage/cluster
- /api/storage/disks
- /api/storage/luns
- /api/storage/pools
- /api/storage/volumes
- /api/svm/svms
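
Before activating the extension, it can help to confirm that the OnTap user can read each of the paths above. The following is a minimal sketch, not part of the extension: it assumes HTTP basic authentication against the OnTap REST API, and the names build_urls and check_access are hypothetical helpers introduced here for illustration.

```python
import base64
import ssl
import urllib.error
import urllib.request

# API paths listed in the requirements above.
REQUIRED_PATHS = [
    "/api/cluster",
    "/api/cluster/nodes",
    "/api/snapmirror/relationships",
    "/api/storage/aggregates",
    "/api/storage/cluster",
    "/api/storage/disks",
    "/api/storage/luns",
    "/api/storage/pools",
    "/api/storage/volumes",
    "/api/svm/svms",
]


def build_urls(base_url):
    """Join the base URL with each required path, avoiding double slashes."""
    root = base_url.rstrip("/")
    return [root + path for path in REQUIRED_PATHS]


def check_access(base_url, username, password, verify_ssl=True, timeout=10):
    """Send a GET to each path and return {url: HTTP status or error text}."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    context = ssl.create_default_context()
    if not verify_ssl:
        # Assumption: only appropriate for test clusters with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    results = {}
    for url in build_urls(base_url):
        request = urllib.request.Request(
            url, headers={"Authorization": f"Basic {token}"}
        )
        try:
            with urllib.request.urlopen(
                request, timeout=timeout, context=context
            ) as response:
                results[url] = response.status
        except urllib.error.HTTPError as err:
            # The server answered, but with an error status such as 401 or 403.
            results[url] = err.code
        except urllib.error.URLError as err:
            # The server could not be reached at all.
            results[url] = str(err.reason)
    return results
```

Calling check_access("https://ontap-prod/", "monitor_user", "secret") returns one entry per path; any value other than 200 suggests that the rest-role is missing read access to that path or that the cluster is not reachable. The host name and credentials here are placeholders.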

Compatibility information

This extension requires a connection to the NetApp OnTap REST API and therefore supports only OnTap version 9.6 and later.
A NetApp OnTap Overview dashboard is part of the extension. It includes links to the various entities detected by the extension.

Activation and setup

Select the desired ActiveGate group that will run the monitoring configuration. Each monitoring configuration can have one or more OnTap clusters configured.
Configure a NetApp OnTap Extension Endpoint for each of the chosen clusters:
- OnTap REST API URL: enter the URL (including protocol) of your OnTap API address, for example <https://ontap-prod/>.
- Cluster name: enter the name of your cluster entity (by default, cluster name uses the detected hostname).
- Username: enter the username used for the API access.
- Password: enter the password used for the API access (check #requirements for the list of required permissions).
- Proxy
  - Address: enter the address of your proxy, for example <http://proxy.example.com:8080>.
  - Proxy username: enter the username used for your proxy.
  - Proxy password: enter the password used for your proxy.
- Verify SSL certificate
- Frequency: how often metrics are collected, by default once per minute. Increase the interval for large clusters where collecting all requested data takes longer than one minute.
- Log level: set at the monitoring configuration level and will apply to all endpoints. By default, set to INFO. We recommend using DEBUG logging only when investigating issues with support.
Enable the desired feature sets (refer to the Details tab for what metrics are associated with which feature sets).
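
A quick sanity check of the endpoint values before entering them can catch common mistakes such as a URL without a protocol. The sketch below is illustrative only: the Endpoint class and validate function are hypothetical and do not reflect the extension's actual configuration schema, and the one-minute lower bound on the frequency is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Endpoint:
    """Hypothetical container mirroring the endpoint fields described above."""
    url: str
    username: str
    password: str
    cluster_name: Optional[str] = None
    proxy_address: Optional[str] = None
    verify_ssl: bool = True
    frequency_minutes: int = 1


def validate(endpoint):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    parsed = urlparse(endpoint.url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        problems.append("OnTap REST API URL must include the protocol")
    if not endpoint.username or not endpoint.password:
        problems.append("Username and password are both required")
    if endpoint.proxy_address:
        proxy = urlparse(endpoint.proxy_address)
        if not proxy.scheme or not proxy.hostname:
            problems.append("Proxy address must include the protocol and host")
    # Assumption: intervals shorter than the one-minute default are not allowed.
    if endpoint.frequency_minutes < 1:
        problems.append("Frequency must be at least one minute")
    return problems
```

For example, validate(Endpoint(url="ontap-prod", username="monitor_user", password="secret")) reports the missing protocol, while the same call with "https://ontap-prod/" returns an empty list.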

Details

Metrics are associated with different feature sets that can be enabled or disabled as needed. By default, the extension collects metrics once per minute.
Starting with version 2.3.2 of the NetApp OnTap (Remote) extension, rules are included that link this extension's netapp_ontap:volume entity to the vmware:datastore entity from the VMware remote monitoring extension. The link uses the "same as" relationship and is based on the volume's name matching the datastore's NAS remote path property.

Licensing and cost

The extension ingests metrics and events. The details of license consumption will depend on which licensing model you are using. For more information about licensing costs, see Dynatrace classic licensing or the Dynatrace Platform Subscription (DPS) depending on your license model.

License consumption is based on the number of metric data points ingested. You can calculate the approximate annual data points ingested by using the following formula:

(16 + (4 x nodes) + (1 x frus) + (1 x svms) + (2 x disks) + (5 x aggregates) + (20 x volumes) + (4 x volume of svm with qos policy) + (3 x snapmirror relationships)) x 60 min x 24 h x 365 days = data points/year

The above formula assumes that all feature sets are enabled. You'll need to adjust the formula if you reconfigured the frequency of metric collection.

In the classic licensing model, metric ingestion consumes Davis Data Units (DDUs) at the rate of 0.001 DDUs per metric data point.

Multiply the result of the above formula for annual data points by 0.001 to estimate annual DDU usage.
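
The estimate can also be scripted. The following is a minimal sketch of the formula above with all feature sets enabled. The function and parameter names are hypothetical, qos_items stands for the count the formula calls "volume of svm with qos policy", and dividing by the collection interval in minutes is an assumption about how to adjust for a reconfigured frequency.

```python
def annual_data_points(nodes, frus, svms, disks, aggregates, volumes,
                       qos_items, snapmirror_relationships,
                       frequency_minutes=1):
    """Approximate metric data points per year, all feature sets enabled."""
    # Data points produced by one collection cycle, term by term as in the formula.
    per_cycle = (
        16
        + 4 * nodes
        + 1 * frus
        + 1 * svms
        + 2 * disks
        + 5 * aggregates
        + 20 * volumes
        + 4 * qos_items
        + 3 * snapmirror_relationships
    )
    # 60 min x 24 h x 365 days, scaled by the collection interval (assumption).
    cycles_per_year = 60 * 24 * 365 / frequency_minutes
    return per_cycle * cycles_per_year


def annual_ddus(data_points):
    """Classic licensing: 0.001 DDUs per metric data point."""
    return data_points * 0.001


if __name__ == "__main__":
    points = annual_data_points(
        nodes=2, frus=10, svms=3, disks=24, aggregates=4,
        volumes=50, qos_items=5, snapmirror_relationships=6,
    )
    print(f"{points:,.0f} data points/year, about {annual_ddus(points):,.1f} DDUs/year")
```

With the example counts shown, one cycle produces 1,143 data points, which comes to 600,760,800 data points per year at the default one-minute frequency, or roughly 600,761 DDUs.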

This extension additionally reports log events in two situations:

When a cluster node restart is detected.
When the extension cannot connect to the configured cluster API endpoint.
- Each minute will have another event reported until the issue is resolved and a successful connection occurs.

For these log events, license consumption is based on the size (in bytes) of data ingested and processed, retained, and queried. To learn more about the dimensions affecting license consumption, see Log Analytics (DPS).

For information about log record ingestion in the classic licensing model, see Davis Data Units (DDUs).

Feature sets

When activating your extension using a monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension has to collect at least one metric after the activation.

In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.

All metrics that aren't categorized into any feature set are considered to be the default and are always reported.

A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
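
The precedence described above can be expressed as a small lookup. This is a sketch of the rule as stated, not the extension framework's own code; effective_feature_set is a hypothetical helper, and None stands for a level where no feature set is defined.

```python
def effective_feature_set(metric=None, subgroup=None, group=None):
    """Resolve which feature set applies to a metric.

    The most specific definition wins: metric level first, then subgroup,
    then group. A metric with no feature set at any level is "default".
    """
    for candidate in (metric, subgroup, group):
        if candidate is not None:
            return candidate
    return "default"
```

For example, effective_feature_set(subgroup="volumes", group="clusters") returns "volumes", because the subgroup definition overrides the group definition.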

default

Metric name	Metric key	Description
Cluster availability	netapp.ontap.cluster.availability	Connectivity to the configured OnTap cluster URL as detected by the extension

frus

Metric name	Metric key	Description
FRU state	netapp.ontap.node.fru.state	State of the field replaceable unit (100% for OK, 0% for ERROR)

nodes

Metric name	Metric key	Description
Node uptime	netapp.ontap.node.uptime	How long the node reports it has been running
Over temperature	netapp.ontap.node.over_temperature	Specifies whether the hardware is currently operating outside of its recommended temperature range (0 = "normal", 1 = "over").
Node membership	netapp.ontap.node.membership	Membership status of the cluster node
Node processor utilization	netapp.ontap.node.processor_utilization	Average CPU Utilization for the node

luns

Metric name	Metric key	Description
LUN state	netapp.ontap.lun.state	The state of the LUN. Normal states for a LUN are online and offline. Other states indicate errors
LUN container state	netapp.ontap.lun.container_state	The state of the volume and aggregate that contain the LUN. LUNs are only available when their containers are available
LUN enabled state	netapp.ontap.lun.enabled	The enabled state of the LUN. LUNs can be disabled to prevent access to the LUN. 1 = enabled, 0 = disabled
LUN space used	netapp.ontap.lun.used	The amount of space consumed by the main data stream of the LUN
LUN size	netapp.ontap.lun.size	The total provisioned size of the LUN
LUN space used percentage	netapp.ontap.lun.used_percentage	Space used in the LUN as a percentage

storage-pools

Metric name	Metric key	Description
Storage pool total capacity	netapp.ontap.pool.total_capacity	Total size of the flash pool, in bytes.
Storage pool usable capacity	netapp.ontap.pool.usable_capacity	Remaining usable capacity in the flash pool, in bytes.
Storage pool used capacity	netapp.ontap.pool.used_capacity	Used capacity in the flash pool, in bytes.
Storage pool used percentage	netapp.ontap.pool.used_percentage	Percentage of capacity used in the flash pool.

volumes

Metric name	Metric key	Description
Volume state	netapp.ontap.volume.state	Volume state: error, mixed, offline, or online
Volume throughput (other)	netapp.ontap.volume.throughput.other.count	The volume's rate of throughput bytes observed at the storage object (other)
Volume throughput (read)	netapp.ontap.volume.throughput.read.count	The volume's rate of throughput bytes observed at the storage object (read)
Volume throughput (write)	netapp.ontap.volume.throughput.write.count	The volume's rate of throughput bytes observed at the storage object (write)
Volume throughput (total)	netapp.ontap.volume.throughput.total.count	The volume's rate of throughput bytes observed at the storage object (total)
Volume IOPS (other)	netapp.ontap.volume.iops.other.count	The volume's number of I/O operations observed at the storage object (other)
Volume IOPS (read)	netapp.ontap.volume.iops.read.count	The volume's number of I/O operations observed at the storage object (read)
Volume IOPS (write)	netapp.ontap.volume.iops.write.count	The volume's number of I/O operations observed at the storage object (write)
Volume IOPS (total)	netapp.ontap.volume.iops.total.count	The volume's number of I/O operations observed at the storage object (total)
Volume latency (total)	netapp.ontap.volume.latency.total	The volume's raw latency in microseconds observed at the storage object (total)
Volume latency (read)	netapp.ontap.volume.latency.read	The volume's raw latency in microseconds observed at the storage object (read)
Volume latency (write)	netapp.ontap.volume.latency.write	The volume's raw latency in microseconds observed at the storage object (write)
Volume latency (other)	netapp.ontap.volume.latency.other	The volume's raw latency in microseconds observed at the storage object (other)
Volume size	netapp.ontap.volume.size	Total provisioned size
Volume space available	netapp.ontap.volume.available	The available space
Volume space used	netapp.ontap.volume.used	Volume space used (including data and metadata)
Volume space used percentage	netapp.ontap.volume.used_percent	Percentage of volume space used (including data and metadata)
—	netapp.ontap.volume.files.maxiumum	—
Files (inodes)	netapp.ontap.volume.files.used	Number of files (inodes) used for user-visible data permitted on the volume.
Files (inodes) used percentage	netapp.ontap.volume.files.used_percentage	Percentage of the maximum number of files used on the volume.

svms

Metric name	Metric key	Description
SVM state	netapp.ontap.svm.state	Current SVM state: starting, running, stopping, stopped, or deleting

snapmirror-relationships

Metric name	Metric key	Description
Lag time	netapp.ontap.snapmirror.relationship.lag_time	The time since the exported snapshot was created
Relationship state	netapp.ontap.snapmirror.relationship.state	The state of the relationship
Relationship health	netapp.ontap.snapmirror.relationship.health	Is the relationship healthy?

qos

Metric name	Metric key	Description
Volume QOS minimum throughput (IOPS)	netapp.ontap.volume.qos.min_throughput_iops	The minimum throughput in IOPS (volumes)
Volume QOS maximum throughput (IOPS)	netapp.ontap.volume.qos.max_throughput_iops	The maximum throughput in IOPS (volumes)
Volume QOS maximum throughput (Mbps)	netapp.ontap.volume.qos.max_throughput_mbps	The maximum throughput in Mbps (volumes)
Volume QOS minimum throughput (Mbps)	netapp.ontap.volume.qos.min_throughput_mbps	The minimum throughput in Mbps (volumes)
SVM QOS minimum throughput (IOPS)	netapp.ontap.svm.qos.min_throughput_iops	The minimum throughput in IOPS (svms)
SVM QOS maximum throughput (IOPS)	netapp.ontap.svm.qos.max_throughput_iops	The maximum throughput in IOPS (svms)
SVM QOS maximum throughput (Mbps)	netapp.ontap.svm.qos.max_throughput_mbps	The maximum throughput in Mbps (svms)
SVM QOS minimum throughput (Mbps)	netapp.ontap.svm.qos.min_throughput_mbps	The minimum throughput in Mbps (svms)

aggregates

Metric name	Metric key	Description
Aggregate state	netapp.ontap.aggregate.state	Current aggregate state: online, onlining, offline, offlining, relocating, unmounted, restricted, inconsistent, failed, or unknown
Aggregate block storage used	netapp.ontap.aggregate.block_storage_used	Space used or reserved in bytes. Includes volume guarantees and aggregate metadata.
Aggregate block storage available	netapp.ontap.aggregate.block_storage_available	Space available in bytes
Aggregate block storage size	netapp.ontap.aggregate.block_storage_size	Total usable space in bytes, not including WAFL reserve and aggregate Snapshot copy reserve.
Aggregate block storage used percentage	netapp.ontap.aggregate.block_storage_used_percent	Percentage of block storage used

disks

Metric name	Metric key	Description
Rated life used	netapp.ontap.disk.rated_life_used_percentage	Percentage of rated life used
Disk state	netapp.ontap.disk.state	Current disk state: broken, copy, maintenance, partner, pending, present, reconstructing, removed, spare, unfail, or zeroing

clusters

Metric name	Metric key	Description
Cluster IOPS (other)	netapp.ontap.cluster.iops_other.count	The cluster's number of I/O operations observed at the storage object (other)
Cluster IOPS (read)	netapp.ontap.cluster.iops_read.count	The cluster's number of I/O operations observed at the storage object (read)
Cluster IOPS (total)	netapp.ontap.cluster.iops_total.count	The cluster's number of I/O operations observed at the storage object (total)
Cluster IOPS (write)	netapp.ontap.cluster.iops_write.count	The cluster's number of I/O operations observed at the storage object (write)
Cluster throughput (other)	netapp.ontap.cluster.throughput_other.count	The cluster's rate of throughput bytes observed at the storage object (other)
Cluster throughput (read)	netapp.ontap.cluster.throughput_read.count	The cluster's rate of throughput bytes observed at the storage object (read)
Cluster throughput (total)	netapp.ontap.cluster.throughput_total.count	The cluster's rate of throughput bytes observed at the storage object (total)
Cluster throughput (write)	netapp.ontap.cluster.throughput_write.count	The cluster's rate of throughput bytes observed at the storage object (write)
Cluster latency (other)	netapp.ontap.cluster.latency_other.count	The cluster's raw latency in microseconds observed at the storage object (other)
Cluster latency (read)	netapp.ontap.cluster.latency_read.count	The cluster's raw latency in microseconds observed at the storage object (read)
Cluster latency (total)	netapp.ontap.cluster.latency_total.count	The cluster's raw latency in microseconds observed at the storage object (total)
Cluster latency (write)	netapp.ontap.cluster.latency_write.count	The cluster's raw latency in microseconds observed at the storage object (write)
Cluster block storage size	netapp.ontap.cluster.block_storage_size	The size of the cluster's block storage
Cluster block storage used	netapp.ontap.cluster.block_storage_used	Amount of block storage on the cluster in use
Cluster block storage used percentage	netapp.ontap.cluster.block_storage_used_percentage	The percentage of the cluster's block storage that is currently in use
