Host anomaly detection

Dynatrace automatically detects infrastructure-related performance anomalies such as high CPU saturation and memory outages.

You can configure host anomaly detection, including problem and event thresholds, to meet your specific needs. For example, you might have a system with:

- Low resource consumption, where you want to be alerted as soon as the system exceeds the thresholds.
- High resource consumption, where you don't want to be alerted constantly.

Access anomaly detection

In Dynatrace, you can configure anomaly detection at multiple levels: environment, host group, host, or specific disks.

Host

Go to Hosts (previous Dynatrace) or Hosts Classic.
Find and select your host to display the host overview page.
In the upper-right corner of the host overview page, select More (…) > Settings.
In the host settings, select Anomaly detection > Infrastructure.

Host group

Go to Deployment Status and then select OneAgents.
On the OneAgent deployment page, turn off Show new OneAgent deployments.
Filter the table by Host group and select the host group you want to configure.
The Host group property is not displayed when the selected host doesn't belong to any host group.
This displays the OneAgent deployment page filtered by the selected host group. Each listed host has a Host group: <group name> link, where <group name> is the name of the host group that you want to configure.
Select the host group name in any row.
Because you have filtered by host group, all displayed hosts belong to the same host group.
In the host group settings, select Anomaly detection > Infrastructure.

Environment

Go to Settings > Anomaly detection > Infrastructure > Hosts.

Set problem-creation thresholds

You can configure the detection sensitivity and enable alerting for your hosts and networks.

Hosts

Go to Settings > Anomaly detection > Infrastructure > Hosts.
Turn on or off the available options for each setting on the page or select Use defaults in the upper-right corner of the page.
- If you choose Automatic from the drop-down menu, the built-in settings will be used.
- If you choose Based on custom settings, configure Alerting event thresholds and Dealerting event thresholds. For details, refer to Set event thresholds.
Select Save changes.

Networks

Go to Settings > Anomaly detection > Infrastructure > Networks.
Turn on or off the available options for each setting on the page or select Use defaults in the upper-right corner of the page.
- If you choose Automatic from the drop-down menu, the built-in settings will be used.
- If you choose Based on custom settings, configure Alerting event thresholds and Dealerting event thresholds. For details, refer to Set event thresholds.
Select Save changes.

Set event thresholds

You can set the alerting/dealerting event thresholds for the anomaly detection settings.

Alerting event thresholds

Violating samples: The number of violating 10-second samples within the evaluation window that raise an alert. The value can't be higher than the evaluation window size.
Evaluation window size for violating samples: The number of 10-second samples that form the sliding evaluation window used to detect violating samples.

Examples

1-minute window
- Violating samples: 3
- Evaluation window size for violating samples: 6
5-minute window
- Violating samples: 15
- Evaluation window size for violating samples: 30
10-minute window
- Violating samples: 30
- Evaluation window size for violating samples: 60
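
The relationship between the two settings can be sketched in Python. This is a minimal illustration, not Dynatrace's implementation: the names `window_size_for` and `should_alert` are hypothetical, and the rule that an alert is raised once at least `violating_samples` of the most recent `window_size` samples exceed the threshold is an assumption inferred from the descriptions above.

```python
from collections import deque

SAMPLE_INTERVAL_S = 10  # each sample covers 10 seconds, as described above


def window_size_for(minutes):
    """Number of 10-second samples in a window of the given length."""
    return minutes * 60 // SAMPLE_INTERVAL_S


def should_alert(samples, threshold, violating_samples, window_size):
    """Return True as soon as any sliding window of `window_size` samples
    contains at least `violating_samples` values above `threshold`."""
    if violating_samples > window_size:
        raise ValueError("violating_samples cannot exceed window_size")
    window = deque(maxlen=window_size)  # oldest sample drops out automatically
    for value in samples:
        window.append(value > threshold)
        if sum(window) >= violating_samples:
            return True
    return False


# 1-minute window from the examples: 3 violating samples out of 6
cpu = [50, 96, 60, 97, 55, 98]  # made-up CPU usage values in percent
print(window_size_for(1))           # 6
print(should_alert(cpu, 95, 3, 6))  # True
print(should_alert(cpu, 95, 3, 4))  # False
```

With the same violating-sample count, a shorter window makes the alert less sensitive to spikes that are spread out over time, which is why the last call returns `False`.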

Dealerting event thresholds

Dealerting samples: The number of non-violating 10-second samples within the evaluation window that deactivate the alert. The value must be lower than the number of samples in the evaluation window.
Evaluation window size for dealerting samples: The number of 10-second samples that form the sliding evaluation window used to detect dealerting samples.

The event thresholds are not available for the Detect host or monitoring connection lost problems and Detect high retransmission rate settings.

Examples

1-minute window
- Dealerting samples: 3
- Evaluation window size for dealerting samples: 6
5-minute window
- Dealerting samples: 15
- Evaluation window size for dealerting samples: 30
10-minute window
- Dealerting samples: 30
- Evaluation window size for dealerting samples: 60
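
How alerting and dealerting thresholds might interact can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the actual Dynatrace logic: the class `ThresholdEvent` is hypothetical, and resetting the opposite window whenever the state changes is a design choice made here to avoid immediate flapping.

```python
from collections import deque


class ThresholdEvent:
    """Hypothetical open/close logic for one threshold-based event."""

    def __init__(self, threshold, violating, alert_window,
                 dealerting, dealert_window):
        self.threshold = threshold
        self.violating = violating
        self.dealerting = dealerting
        self.alert_win = deque(maxlen=alert_window)
        self.dealert_win = deque(maxlen=dealert_window)
        self.active = False

    def observe(self, value):
        """Feed one 10-second sample; return whether the event is active."""
        violated = value > self.threshold
        self.alert_win.append(violated)
        self.dealert_win.append(not violated)
        if not self.active and sum(self.alert_win) >= self.violating:
            self.active = True
            self.dealert_win.clear()  # count calm samples from now on
        elif self.active and sum(self.dealert_win) >= self.dealerting:
            self.active = False
            self.alert_win.clear()    # count violations from now on
        return self.active


# 1-minute windows from the examples: 3 of 6 to alert, 3 of 6 to dealert
event = ThresholdEvent(95, violating=3, alert_window=6,
                       dealerting=3, dealert_window=6)
states = [event.observe(v) for v in [96, 97, 98, 50, 96, 50, 50, 50]]
print(states)
# [False, False, True, True, True, True, False, False]
```

The event opens on the third violating sample and closes only after three non-violating samples have been seen since it opened.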

Set thresholds for specific disks

Server-side disk alerting for new tenants

Starting with SaaS version 1.308, server-side disk alerting is disabled for new tenants by default. We recommend using Disk Edge alerting instead. Disk Edge alerting allows you to create more complex and specific rules using:

Metrics to alert on (available disk space, is read-only file system, read time, write time, and available inodes)
Operating system to which the policy should be applied
Disk name filters
Host custom metadata conditions
Custom-defined properties attached to the triggered event

Keep in mind that Disk Edge alerting requires OneAgent version 1.293+.

Davis automatically detects disk anomalies such as low available disk space or slow disks. There are different kinds of disks on a host, such as a boot disk, a disk holding all the logs, or a disk for storing business data. While alerting on low disk space would not make any sense for a fixed-sized boot disk image, it makes perfect sense for a disk containing critical business data.

With custom disk detection rules, you can provide fine-tuned rules for individual groups of disks, where groups are based on disk name patterns and/or host tags. Disk-level thresholds override global thresholds for matching disks, while global settings still apply to other disks.

To change threshold settings for a group of disks

Go to Settings > Anomaly detection.
In the Infrastructure section, select Custom disk-detection rules.
Select Add item.
Select the metric to be monitored and provide a meaningful name for the rule.
Specify the threshold for the metric and the number of samples that must violate the threshold to trigger an alert.
Optional: Specify the name pattern of the disk.

The individual rules aren't logically bound and are applied separately. For example, if one rule matches all disks whose names don't contain A and another matches all disks whose names don't contain B, a disk can be matched by the first rule, the second rule, or both at the same time. Note that you can't add multiple values, wildcards, or regular expressions within a single rule filter.
Optional: To further narrow down the disks that the rule applies to, list the tags that the host must have.
Select Save changes.
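
The note about rules being applied separately can be illustrated with a short Python sketch. Everything here is hypothetical: `DiskRule`, its `must_not_contain` filter, and the case-sensitive substring comparison are assumptions chosen only to mirror the "does not contain A / does not contain B" example.

```python
from dataclasses import dataclass


@dataclass
class DiskRule:
    name: str
    must_not_contain: str  # hypothetical "does not contain" name filter


def matching_rules(disk_name, rules):
    """Evaluate every rule on its own; rules are not combined with AND/OR."""
    return [r.name for r in rules if r.must_not_contain not in disk_name]


rules = [DiskRule("no-A", "A"), DiskRule("no-B", "B")]
for disk in ["/data", "/A-logs", "/B-backup"]:
    print(disk, matching_rules(disk, rules))
# /data ['no-A', 'no-B']
# /A-logs ['no-B']
# /B-backup ['no-A']
```

Because each rule is checked independently, a disk such as `/data` triggers both rules rather than being excluded by their combination.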

Disk Edge alerting

OneAgent version 1.293+

Disk Edge provides automatic detection of performance anomalies related to disk infrastructure. Use these settings to tailor detection sensitivity to specific disk names and/or host custom metadata. Defining custom properties can help with post-processing of the event.

You can define policies on the host, host group, and environment levels.

Go to Settings > Anomaly detection > Infrastructure > Disk Edge.
Select Add policy.
Define the policy.
- Policy name: The name under which your policy will be listed.
- Operating system: Detection rules can be specified for selected operating systems. If none is selected, the policy doesn't match any host.
- Alerts: One rule can have up to seven alerts, one for each of the event types. See the table below.
- Disk name filters: Rules can be filtered by disk name.
- Host custom metadata conditions: Rules can be filtered by host custom metadata.
- Properties: Properties can be added to the sent event using the supported placeholders:
  - disk.all_mountpoints
  - disk.device_name
  - disk.mountpoint
  - dt.entity_host
  - dt.host_group.id
  - host.name
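
The policy fields above can be modeled as a simple data structure. The following Python sketch is illustrative only: the class `DiskEdgePolicy`, the `{placeholder}` token syntax, the `"LINUX"` value, and the sample property text are assumptions, not the Dynatrace schema. Only the placeholder names are taken from the list above.

```python
import re
from dataclasses import dataclass, field

SUPPORTED_PLACEHOLDERS = {
    "disk.all_mountpoints", "disk.device_name", "disk.mountpoint",
    "dt.entity_host", "dt.host_group.id", "host.name",
}


@dataclass
class DiskEdgePolicy:
    name: str
    operating_systems: set
    alerts: dict = field(default_factory=dict)      # event type -> threshold
    properties: dict = field(default_factory=dict)  # key -> template text

    def applies_to(self, os_name):
        # With no operating system selected, the policy matches nothing.
        return os_name in self.operating_systems

    def render_properties(self, context):
        """Replace {placeholder} tokens (assumed syntax) with context values."""
        def substitute(match):
            key = match.group(1)
            if key not in SUPPORTED_PLACEHOLDERS:
                raise KeyError(f"unsupported placeholder: {key}")
            return str(context[key])
        return {k: re.sub(r"\{([\w.]+)\}", substitute, v)
                for k, v in self.properties.items()}


policy = DiskEdgePolicy(
    name="data-disks",
    operating_systems={"LINUX"},
    alerts={"Available disk space (%) below": 10},
    properties={"summary": "Low space on {disk.mountpoint} of {host.name}"},
)
context = {"disk.mountpoint": "/data", "host.name": "web-01"}
print(policy.applies_to("LINUX"))         # True
print(policy.render_properties(context))
# {'summary': 'Low space on /data of web-01'}
```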

| Event | Severity | Related OneAgent metric |
| --- | --- | --- |
| Available disk space (%) below | Smaller value is more severe | DiskStats object field: availPercentage; Mintv2 metric: dt.host.disk.free; Timeseries: builtin:host.disk.free |
| Available disk space (MiB) below | Smaller value is more severe | DiskStats object field: avail; Mintv2 metric: dt.host.disk.avail; Timeseries: builtin:host.disk.avail |
| Available inodes (%) below | Smaller value is more severe | DiskStats object field: availINodesPercentage; Mintv2 metric: dt.host.disk.inodes_avail; Timeseries: builtin:host.disk.inodesAvail |
| Available inodes (number) below | Smaller value is more severe | Calculated from DiskStats: totalINodes * availINodesPercentage; Mintv2 metric: dt.host.disk.inodes_avail * dt.host.disk.inodes_total; Timeseries: builtin:host.disk.inodesTotal * builtin:host.disk.inodesAvail |
| Is read-only file system | N/A | Disk object field: readOnly; Mintv2 metric: N/A; Timeseries: N/A |
| Read time (ms) exceeding | Larger value is more severe | Disk object field: readTime; Mintv2 metric: dt.host.disk.read_time; Timeseries: builtin:host.disk.readTime |
| Write time (ms) exceeding | Larger value is more severe | Disk object field: writeTime; Mintv2 metric: dt.host.disk.write_time; Timeseries: builtin:host.disk.writeTime |
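
The calculated inode count and the two severity directions can be shown with a short worked example. This is a sketch under assumptions: the function names are hypothetical, and the inode percentage is assumed to be expressed on a 0 to 100 scale, so the product is divided by 100.

```python
def available_inodes(total_inodes, avail_inodes_percent):
    """Approximate the absolute number of free inodes.

    Assumes the percentage is on a 0-100 scale (an assumption, not
    something the event list states)."""
    return int(total_inodes * avail_inodes_percent / 100)


def violates(direction, value, threshold):
    """'below' events fire on smaller values, 'exceeding' on larger ones."""
    if direction == "below":
        return value < threshold
    if direction == "exceeding":
        return value > threshold
    raise ValueError(f"unknown direction: {direction}")


free = available_inodes(1_000_000, 2.5)
print(free)                                # 25000
print(violates("below", free, 50_000))     # True: fewer inodes than allowed
print(violates("exceeding", 120, 200))     # False: read time under the limit
```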

Anomaly detection configuration hierarchy

You can configure anomaly detection rules and policies on multiple levels—host, host group, environment.

When you have multiple rules affecting the same entity, the most specific rule prevails over more generic rules.

Disk rules

Custom rules are evaluated from top to bottom, and the first matching rule applies, so be sure to place your rule in the correct position on the list.

A disk is assigned to the first policy it matches (based on disk name and/or metadata), following the policy hierarchy.

Disk Edge policies

For Disk Edge, policies are also evaluated from the most specific scope (highest priority) to the most generic one (lowest priority).
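
The precedence described in this section can be sketched as a lookup that walks scopes from most to least specific. This is a simplified illustration, not the product's resolution logic: the scope keys, the function names, and the use of a plain substring as the disk name filter are all assumptions.

```python
SCOPE_PRIORITY = ["host", "host_group", "environment"]  # most to least specific


def effective_setting(settings_by_scope):
    """Return (scope, value) from the most specific scope that defines a value."""
    for scope in SCOPE_PRIORITY:
        if scope in settings_by_scope:
            return scope, settings_by_scope[scope]
    return None


def first_matching_policy(disk_name, policies_by_scope):
    """Walk scopes by priority; within a scope, keep the listed order."""
    for scope in SCOPE_PRIORITY:
        for policy_name, name_filter in policies_by_scope.get(scope, []):
            if name_filter in disk_name:  # hypothetical substring filter
                return scope, policy_name
    return None


policies = {
    "environment": [("all-disks", "/")],
    "host_group": [("data-disks", "/data")],
}
print(effective_setting({"environment": 90, "host": 80}))  # ('host', 80)
print(first_matching_policy("/data/db", policies))  # ('host_group', 'data-disks')
print(first_matching_policy("/var", policies))      # ('environment', 'all-disks')
```

The environment-level policy acts as a fallback: it applies only when no host or host group policy matches the disk.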
