Host anomaly detection

Dynatrace automatically detects infrastructure-related performance anomalies such as high CPU saturation and memory outages.

You can configure host anomaly detection, including problem and event thresholds, to meet your specific needs. For example, you might adjust the thresholds for a system with:

  • Low resource consumption, where you want to be alerted as soon as the system exceeds the thresholds.
  • High resource consumption, where you don't want to be alerted constantly.

Access anomaly detection

In Dynatrace, you can configure anomaly detection at multiple levels—environment, host, host group, or for specific disks.

Set problem-creation thresholds

You can configure the detection sensitivity and enable alerting for your hosts and networks.

Hosts

  1. Go to Settings > Anomaly detection > Infrastructure > Hosts.

  2. Turn on or off the available options for each setting on the page or select Use defaults in the upper-right corner of the page.

    • If you choose Automatic from the drop-down menu, the built-in settings will be used.

    • If you choose Based on custom settings, configure Alerting event thresholds and Dealerting event thresholds. For details, refer to Set event thresholds.

  3. Select Save changes.

Networks

  1. Go to Settings > Anomaly detection > Infrastructure > Networks.

  2. Turn on or off the available options for each setting on the page or select Use defaults in the upper-right corner of the page.

    • If you choose Automatic from the drop-down menu, the built-in settings will be used.

    • If you choose Based on custom settings, configure Alerting event thresholds and Dealerting event thresholds. For details, refer to Set event thresholds.

  3. Select Save changes.

Set event thresholds

You can set the alerting/dealerting event thresholds for the anomaly detection settings.

Alerting event thresholds

  • Violating samples: The number of violating 10-second samples within the evaluation window that raise an alert. The value must be lower than the number of samples in the evaluation window.
  • Evaluation window size for violating samples: The number of 10-second samples that form the sliding evaluation window used to detect violating samples.
  • 1-minute window

    • Violating samples: 3
    • Evaluation window size for violating samples: 6
  • 5-minute window

    • Violating samples: 15
    • Evaluation window size for violating samples: 30
  • 10-minute window

    • Violating samples: 30
    • Evaluation window size for violating samples: 60
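
The window sizes listed above follow from the 10-second sample interval: a window of N minutes holds N × 6 samples. A minimal sketch of that arithmetic (the function name is illustrative and not part of any Dynatrace API):

```python
SAMPLE_INTERVAL_SECONDS = 10  # each sample covers 10 seconds, as described above


def window_size_in_samples(window_minutes):
    """Return how many 10-second samples fit into a window of the given length."""
    return window_minutes * 60 // SAMPLE_INTERVAL_SECONDS


for minutes in (1, 5, 10):
    print(minutes, "minute window ->", window_size_in_samples(minutes), "samples")
```

In each of the listed configurations, the number of violating samples is half the evaluation window size.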

Dealerting event thresholds

  • Dealerting samples: The number of non-violating 10-second samples within the evaluation window that deactivate the alert. The value must be lower than the number of samples in the evaluation window.
  • Evaluation window size for dealerting samples: The number of 10-second samples that form the sliding evaluation window used to detect dealerting samples.

The event thresholds are not available for the Detect host or monitoring connection lost problems and Detect high retransmission rate settings.

  • 1-minute window

    • Dealerting samples: 3
    • Evaluation window size for dealerting samples: 6
  • 5-minute window

    • Dealerting samples: 15
    • Evaluation window size for dealerting samples: 30
  • 10-minute window

    • Dealerting samples: 30
    • Evaluation window size for dealerting samples: 60
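
To illustrate how the alerting and dealerting thresholds interact, the following is a rough sketch of a sliding-window evaluation. It is not Dynatrace's implementation: the class name, the "value above threshold" comparison, and the choice to start counting recovery samples only after the alert is raised are all assumptions made for the example.

```python
from collections import deque


class ThresholdEvent:
    """Toy model of alerting/dealerting over 10-second samples (hypothetical)."""

    def __init__(self, threshold, violating=3, alert_window=6,
                 dealerting=3, dealert_window=6):
        self.threshold = threshold
        self.violating = violating
        self.dealerting = dealerting
        # Each deque keeps only the most recent samples of its window.
        self.alert_samples = deque(maxlen=alert_window)
        self.dealert_samples = deque(maxlen=dealert_window)
        self.active = False

    def observe(self, value):
        """Feed one 10-second sample and return whether the event is active."""
        is_violation = value > self.threshold
        self.alert_samples.append(is_violation)
        self.dealert_samples.append(not is_violation)
        if not self.active and sum(self.alert_samples) >= self.violating:
            self.active = True
            # Assumption: recovery is counted from the moment the alert is raised.
            self.dealert_samples.clear()
        elif self.active and sum(self.dealert_samples) >= self.dealerting:
            self.active = False
            self.alert_samples.clear()
        return self.active


event = ThresholdEvent(threshold=90)
states = [event.observe(v) for v in [50, 95, 96, 50, 97, 50, 50, 50]]
print(states)  # [False, False, False, False, True, True, True, False]
```

With the 1-minute values (3 of 6), the event becomes active on the third violating sample and is deactivated after three non-violating samples.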

Set thresholds for specific disks

Davis automatically detects disk anomalies such as low available disk space or slow disks. There are different kinds of disks on a host, such as a boot disk, a disk holding all the logs, or a disk for storing business data. While alerting on low disk space would not make any sense for a fixed-sized boot disk image, it makes perfect sense for a disk containing critical business data.

With custom disk detection rules, you can provide fine-tuned rules for individual groups (groups are based on disk name patterns and/or host tags) of disks. Disk-level thresholds override global thresholds for matching disks, while global settings still apply to other disks.

To change threshold settings for a group of disks

  1. Go to Settings > Anomaly detection.

  2. Go to the Infrastructure > Custom disk-detection rules section and select Add item.

  3. Select the metric to be monitored and provide a meaningful name for the rule.

  4. Specify the threshold for the metric and the number of samples that must violate the threshold to trigger an alert.

  5. Optional: Specify the name pattern of the disk.

    Individual rules aren't logically combined; each rule is applied separately. For example, if one rule matches all disks not containing A and another matches all disks not containing B, then a disk containing neither is matched by both rules simultaneously, and a disk containing only A is still matched by the second rule. Note that you also can't add multiple values, wildcards, or regular expressions within a single rule filter.

  6. Optional: To further narrow down the disks that the rule applies to, list the tags that the host must have.

  7. Select Save changes.
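
The note about rules being applied separately can be sketched as follows. This is only an illustration: the `DiskRule` structure, the operator names `contains` and `does_not_contain`, and the example disk names are hypothetical and do not reflect the actual settings schema.

```python
from dataclasses import dataclass


@dataclass
class DiskRule:
    name: str
    name_operator: str = ""   # hypothetical: "contains" or "does_not_contain"; empty means no name filter
    name_value: str = ""      # a single plain value (no wildcards or regular expressions)
    required_host_tags: frozenset = frozenset()

    def matches(self, disk_name, host_tags):
        """Check the disk name filter and the host tags of this rule only."""
        if self.name_operator == "contains" and self.name_value not in disk_name:
            return False
        if self.name_operator == "does_not_contain" and self.name_value in disk_name:
            return False
        # Every tag listed in the rule must be present on the host.
        return self.required_host_tags <= set(host_tags)


def matching_rules(rules, disk_name, host_tags):
    """Evaluate each rule on its own; filters are never combined across rules."""
    return [rule.name for rule in rules if rule.matches(disk_name, host_tags)]


rules = [
    DiskRule("no-boot", "does_not_contain", "boot"),
    DiskRule("no-log", "does_not_contain", "log"),
]
print(matching_rules(rules, "/data", []))     # ['no-boot', 'no-log']
print(matching_rules(rules, "/boot", []))     # ['no-log']
print(matching_rules(rules, "/var/log", []))  # ['no-boot']
```

Because the two "does not contain" rules are evaluated independently, a disk such as `/boot` is excluded by the first rule but still picked up by the second.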

Disk Edge alerting

OneAgent version 1.293+

Disk Edge provides automatic detection of performance anomalies related to disk infrastructure, and you can use it to set up alerts for them. Use these settings to tailor detection sensitivity to a specific disk's name and/or custom metadata. Defining custom properties can help with the post-processing of the event.

You can define policies on the host, host group, and environment levels.

  1. Go to Settings > Anomaly detection > Infrastructure > Disk Edge.
  2. Select Add policy.
  3. Define the policy.
    • Policy name: The name under which your policy will be listed.
    • Operating system: Detection rules can be specified for selected operating systems (if none are selected, the policy will not be matched).
    • Alerts: One rule can have up to seven alerts, one for each of the event types. See the table below.
    • Disk name filters: Rules can be filtered by disk name.
    • Host custom metadata conditions: Rules can be filtered by host custom metadata.
    • Properties: Properties can be added to the sent event using the supported placeholders:
      • disk.all_mountpoints
      • disk.device_name
      • disk.mountpoint
      • dt.entity_host
      • dt.host_group.id
      • host.name

| Event | Severity | Related OneAgent metric |
|---|---|---|
| Available disk space (%) below | Smaller value is more severe | DiskStats object field: availPercentage; Mintv2 metric: dt.host.disk.free; Timeseries: builtin:host.disk.free |
| Available disk space (MiB) below | Smaller value is more severe | DiskStats object field: avail; Mintv2 metric: dt.host.disk.avail; Timeseries: builtin:host.disk.avail |
| Available inodes (%) below | Smaller value is more severe | DiskStats object field: availINodesPercentage; Mintv2 metric: dt.host.disk.inodes_avail; Timeseries: builtin:host.disk.inodesAvail |
| Available inodes (number) below | Smaller value is more severe | Calculated from DiskStats: totalINodes * availINodesPercentage; Mintv2 metric: dt.host.disk.inodes_avail * dt.host.disk.inodes_total; Timeseries: builtin:host.disk.inodesTotal * builtin:host.disk.inodesAvail |
| Is read only file system | N/A | Disk object field: readOnly; Mintv2 metric: N/A; Timeseries: N/A |
| Read time (ms) exceeding | Larger value is more severe | Disk object field: readTime; Mintv2 metric: dt.host.disk.read_time; Timeseries: builtin:host.disk.readTime |
| Write time (ms) exceeding | Larger value is more severe | Disk object field: writeTime; Mintv2 metric: dt.host.disk.write_time; Timeseries: builtin:host.disk.writeTime |
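
The "Available inodes (number) below" event is described as the product of the total inode count and the available-inodes percentage. A small worked sketch of that calculation follows; it assumes the percentage is expressed on a 0 to 100 scale (hence the division by 100), and the function names and sample numbers are illustrative only.

```python
def available_inodes(total_inodes, avail_inodes_percentage):
    """Estimate the absolute number of free inodes.

    Assumption: avail_inodes_percentage is on a 0-100 scale.
    """
    return total_inodes * avail_inodes_percentage / 100


def is_below(total_inodes, avail_inodes_percentage, minimum_inodes):
    """Return True when the computed free inode count is under the alert threshold."""
    return available_inodes(total_inodes, avail_inodes_percentage) < minimum_inodes


# A file system with 1,000,000 inodes, 12.5% of which are free.
print(available_inodes(1_000_000, 12.5))   # 125000.0
print(is_below(1_000_000, 12.5, 200_000))  # True
```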

Anomaly detection configuration hierarchy

You can configure anomaly detection rules and policies on multiple levels: host, host group, and environment.

When you have multiple rules affecting the same entity, the most specific rule prevails over more generic rules.

Disk rules

Custom rules are evaluated from top to bottom, and the first matching rule applies, so be sure to place your rule in the correct position on the list.

A disk is assigned to the first policy it matches (based on disk name and/or metadata), following the policy hierarchy.

Disk Edge policies

For Disk Edge, policies are also evaluated in order from the most specific scope (highest priority) to the most generic one (lowest priority).
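
One way to picture the hierarchy is as a two-level lookup: scopes are checked from most specific to most generic, and within a scope the first matching entry wins. The sketch below combines the statements in this section under that reading; the scope keys, the dictionary layout, and the "disk name contains" filter are assumptions made for illustration and not the product's data model.

```python
# Most specific scope first, as described in this section.
SCOPE_ORDER = ["host", "host_group", "environment"]


def resolve_policy(policies_by_scope, disk_name):
    """Return (scope, policy name) of the first policy matching the disk, or None."""
    for scope in SCOPE_ORDER:
        # Within one scope, policies are checked from top to bottom.
        for policy in policies_by_scope.get(scope, []):
            if policy["disk_name_contains"] in disk_name:
                return scope, policy["name"]
    return None


policies = {
    "environment": [{"name": "env-default", "disk_name_contains": ""}],
    "host": [{"name": "host-data", "disk_name_contains": "data"}],
}
print(resolve_policy(policies, "/data"))  # ('host', 'host-data')
print(resolve_policy(policies, "/boot"))  # ('environment', 'env-default')
```

Here the host-level policy takes precedence for `/data`, while `/boot` falls through to the environment-level policy, whose empty filter matches any disk name.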