Host anomaly detection

Dynatrace automatically detects infrastructure-related performance anomalies such as high CPU saturation and memory outages. This page describes how to configure host anomaly detection, including problem and event thresholds. Adjusting these settings is useful if:

  • You have a system with low resource consumption and you want to be alerted immediately in case your system exceeds the thresholds.
  • You have a system with high resource consumption and you don't want to be constantly alerted.

Requirements

  • OneAgent version 1.253+

Access anomaly detection

In Dynatrace, go to Anomaly detection for the level you are configuring.

Host

  1. Go to Hosts or Hosts Classic (latest Dynatrace).
  2. Find and select your host to display the host overview page.
  3. In the upper-right corner of the host overview page, select More > Settings.
  4. In the host settings, select Anomaly detection > Infrastructure.

Host group

  1. Go to Deployment Status and then select OneAgents.
  2. On the OneAgent deployment page, turn off Show new OneAgent deployments.
  3. Filter the table by Host group and select the host group you want to configure.

    The Host group property is not displayed when the selected host doesn't belong to any host group.

    This displays the OneAgent deployment page filtered by the selected host group. Each listed host has a Host group: <group name> link, where <group name> is the name of the host group that you want to configure.
  4. Select the host group name in any row.
    As you have filtered by host group, all displayed hosts belong to the same host group.
  5. In the host group settings, select Anomaly detection > Infrastructure.

Environment

Go to Settings > Anomaly detection > Infrastructure > Hosts.
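If you manage these settings programmatically rather than through the UI, the host, host group, and environment levels described above would typically map to different configuration scopes. The following is a minimal sketch that only builds a query URL; it does not send a request. The schema ID, the endpoint path, and the scope formats are assumptions to verify against your environment's Settings API documentation, and the base URL and entity IDs are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumed schema ID for host anomaly detection settings; confirm it in your environment.
SCHEMA_ID = "builtin:anomaly-detection.infrastructure-hosts"


def settings_query_url(base_url: str, scope: str) -> str:
    """Build a URL that would list settings objects for one scope.

    The scope is assumed to be "environment", a host entity ID,
    or a host group entity ID.
    """
    query = urlencode({"schemaIds": SCHEMA_ID, "scopes": scope})
    return f"{base_url.rstrip('/')}/api/v2/settings/objects?{query}"


# Hypothetical placeholders, not real entities.
BASE_URL = "https://example.invalid/e/abc12345"
for scope in ("HOST-0000000000000001", "HOST_GROUP-0000000000000001", "environment"):
    print(settings_query_url(BASE_URL, scope))
```

Keeping the scope as a single parameter makes it easy to compare what is configured at each level before changing anything.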

Set problem-creation thresholds

Use these settings to configure detection sensitivity and enable alerting for your hosts.

Hosts

  • Detect host or monitoring connection lost problems—Turn this on to detect host or monitoring connection lost problems.

    • Graceful host shutdowns—Whether to alert on graceful host shutdowns.
  • Detect CPU saturation on host—Turn this on to detect CPU saturation on the host.

    • Detection mode for CPU saturation—Whether to detect host CPU saturation based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      • Alert if the CPU usage is higher than this threshold for the defined amount of samples—The alerting threshold percentage.
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect High System Load on host (OneAgent version 1.261+)—Turn this on to detect high system load on the host. This alert is only available for AIX hosts.

    • Detection mode for High System Load—Whether to detect host high system load based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      • Alert if the System Load divided by the number of logical CPU cores is higher than this threshold for the defined amount of samples
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect high memory usage on host—Turn this on to detect high memory usage on the host.

    • Detection mode for high memory usage—Whether to detect host high memory usage based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Configure the detection mode for high memory usage and alerting:

      • Alert if the memory usage on Windows is higher than this threshold
      • Alert if the memory usage on Unix systems is higher than this threshold
      • Alert if the memory page fault rate on Windows is higher than this threshold for the defined amount of samples
      • Alert if the memory page fault rate on Unix systems is higher than this threshold for the defined amount of samples
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect high GC activity—Turn this on to detect high garbage collector activity.

    • Detection mode for high GC activity—Whether to detect high garbage collector activity based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Configure the detection mode for garbage collector activity and alerting:

      • Alert if GC time is higher than this threshold
      • Alert if the GC suspension is higher than this threshold
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect Java out of memory problem—Turn this on to detect Java out-of-memory exceptions.

    • Detection mode for Java out of memory problem—Whether to detect Java out-of-memory exceptions based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Configure the detection mode for Java out-of-memory problems and alerting:

      • Alert if the number of Java out-of-memory exceptions is at least this value
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect Java out of threads problem—Turn this on to detect Java out-of-threads exceptions.

    • Detection mode for Java out of threads problem—Whether to detect Java out-of-threads exceptions based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      • Alert if the number of Java out-of-threads exceptions is at least this value
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
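The custom High System Load threshold above is compared against the system load divided by the number of logical CPU cores, not against the raw load. The arithmetic can be sketched as follows; the function names and the example numbers are hypothetical and chosen only for illustration.

```python
def normalized_system_load(system_load: float, logical_cpu_cores: int) -> float:
    """Return the system load per logical CPU core."""
    if logical_cpu_cores <= 0:
        raise ValueError("logical_cpu_cores must be positive")
    return system_load / logical_cpu_cores


def exceeds_load_threshold(system_load: float, logical_cpu_cores: int, threshold: float) -> bool:
    """True if the per-core load is higher than the configured threshold."""
    return normalized_system_load(system_load, logical_cpu_cores) > threshold


# Illustrative values: a load of 12.0 on 8 logical cores is 1.5 per core.
print(normalized_system_load(12.0, 8))
print(exceeds_load_threshold(12.0, 8, threshold=1.0))
```

Normalizing by core count means that the same threshold can be applied to hosts of different sizes.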

Networks

  • Detect high number of dropped packets—Turn this on to detect a high number of dropped packets.

    • Detection mode for high number of dropped packets—Whether to detect a high number of dropped packets based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Configure the detection mode for a high number of dropped packets and alerting:

      Alert if the dropped packet percentage with total packets rate is higher than the defined thresholds.

      • Receive/transmit dropped packet percentage threshold
      • Total packets rate threshold
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect high number of network errors—Turn this on to detect a high number of network errors.

    • Detection mode for high number of network errors—Whether to detect a high number of network errors based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Alert if the error packet percentage with total packets rate is higher than the defined thresholds.

      • Receive/transmit error packet percentage threshold
      • Total packets rate threshold
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect high network utilization—Turn this on to detect high network utilization.

    • Detection mode for high network utilization—Whether to detect high network utilization based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      • Alert if sent/received traffic utilization is higher than this threshold for the defined amount of samples
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect TCP connectivity problems for process—Turn this on to detect TCP connectivity problems.

    • Detection mode for TCP connectivity problems—Whether to detect TCP connectivity problems based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Alert if the new connection failure percentage and the number of failed connections are higher than the defined thresholds.

      • New connection failure threshold
      • Number of failed connections threshold
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
  • Detect high retransmission rate—Turn this on to detect a high retransmission rate. The setting is turned off by default.

    • Detection mode for high retransmission rate—Whether to detect a high retransmission rate based on automatic or custom settings.

      If you select Based on custom settings, additional related settings are displayed:

      Alert if the retransmission rate is higher than the specified threshold and the number of retransmitted packets is higher than the defined threshold for the defined amount of samples

      • Retransmission rate threshold
      • Number of retransmitted packets threshold
      • Alerting event thresholds and Dealerting event thresholds—For details, see Set event thresholds below.
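Several of the network settings above combine two thresholds, for example a dropped packet percentage together with a total packets rate. The sketch below reads this as "both thresholds must be exceeded", which is an interpretation of the wording rather than a confirmed rule. The class, the function, and the per-second units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketSample:
    total_packets_per_sec: float    # assumed unit: packets per second
    dropped_packets_per_sec: float  # assumed unit: packets per second


def dropped_packet_violation(
    sample: PacketSample,
    dropped_pct_threshold: float,
    total_rate_threshold: float,
) -> bool:
    """True if both the dropped percentage and the total rate exceed their thresholds."""
    if sample.total_packets_per_sec <= 0:
        return False  # no traffic, so there is nothing to evaluate
    dropped_pct = 100.0 * sample.dropped_packets_per_sec / sample.total_packets_per_sec
    return (
        dropped_pct > dropped_pct_threshold
        and sample.total_packets_per_sec > total_rate_threshold
    )


# Illustrative values only.
busy = PacketSample(total_packets_per_sec=1000, dropped_packets_per_sec=50)
quiet = PacketSample(total_packets_per_sec=20, dropped_packets_per_sec=5)
print(dropped_packet_violation(busy, dropped_pct_threshold=2.0, total_rate_threshold=100))
print(dropped_packet_violation(quiet, dropped_pct_threshold=2.0, total_rate_threshold=100))
```

Under this reading, the total packets rate threshold keeps low-traffic interfaces from violating on a handful of dropped packets, even when the percentage is high.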

Set event thresholds

You can set the alerting/dealerting event thresholds for the anomaly detection settings.

Alerting event thresholds

  • Violating samples—The number of violating 10-second samples within the evaluation window that raise an alert. The value can't be higher than the evaluation window size.
  • Evaluation window size for violating samples—The number of 10-second samples that form the sliding evaluation window to detect violating samples.

Example values:

  • 1-minute window

    • Violating samples: 3
    • Evaluation window size for violating samples: 6
  • 5-minute window

    • Violating samples: 15
    • Evaluation window size for violating samples: 30
  • 10-minute window

    • Violating samples: 30
    • Evaluation window size for violating samples: 60
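To illustrate how a violating-samples count and a sliding evaluation window interact, here is a minimal sketch. It assumes that the violating samples do not need to be consecutive and that an alert is raised as soon as the window contains enough of them; the function name is hypothetical and the actual evaluation may differ.

```python
from collections import deque
from typing import Iterable, Optional


def first_alert_index(
    violations: Iterable[bool],
    violating_samples: int,
    window_size: int,
) -> Optional[int]:
    """Return the index of the first sample at which an alert would be raised.

    Each item in violations stands for one 10-second sample (True = violating).
    Returns None if the condition is never met.
    """
    if not 0 < violating_samples <= window_size:
        raise ValueError("violating_samples must be between 1 and window_size")
    window = deque(maxlen=window_size)  # keeps only the most recent samples
    for index, violated in enumerate(violations):
        window.append(violated)
        if sum(window) >= violating_samples:
            return index
    return None


# 1-minute window: 3 violating samples out of 6.
samples = [False, True, False, True, False, True, False]
print(first_alert_index(samples, violating_samples=3, window_size=6))
```

With these inputs, the third violating sample arrives at index 5, while all three still fall inside the six-sample window.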

Dealerting event thresholds

  • Dealerting samples—The number of non-violating 10-second samples within the evaluation window that deactivate the alert. The value can't be higher than the evaluation window size.
  • Evaluation window size for dealerting samples—The number of 10-second samples that form the sliding evaluation window to detect non-violating samples.

The event thresholds are not available for the Detect host or monitoring connection lost problems and Detect high retransmission rate settings.

Example values:

  • 1-minute window

    • Dealerting samples: 3
    • Evaluation window size for dealerting samples: 6
  • 5-minute window

    • Dealerting samples: 15
    • Evaluation window size for dealerting samples: 30
  • 10-minute window

    • Dealerting samples: 30
    • Evaluation window size for dealerting samples: 60
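Alerting and dealerting thresholds together form a simple two-state model: the alert opens when enough samples violate the threshold and closes when enough samples stop violating it. The sketch below shows one way to model that behavior. The class name is hypothetical, and clearing the opposite window on each state change is a simplifying assumption, not documented behavior.

```python
from collections import deque


class AlertState:
    """Tracks whether an alert is active, one 10-second sample at a time."""

    def __init__(self, violating_samples: int, alert_window: int,
                 dealerting_samples: int, dealert_window: int) -> None:
        self.violating_samples = violating_samples
        self.dealerting_samples = dealerting_samples
        self.active = False
        self._violating = deque(maxlen=alert_window)
        self._non_violating = deque(maxlen=dealert_window)

    def observe(self, violated: bool) -> bool:
        """Record one sample and return whether the alert is active afterwards."""
        self._violating.append(violated)
        self._non_violating.append(not violated)
        if not self.active and sum(self._violating) >= self.violating_samples:
            self.active = True
            self._non_violating.clear()  # assumption: start dealerting from scratch
        elif self.active and sum(self._non_violating) >= self.dealerting_samples:
            self.active = False
            self._violating.clear()  # assumption: start alerting from scratch
        return self.active


# 1-minute windows for both alerting and dealerting: 3 out of 6.
state = AlertState(3, 6, 3, 6)
history = [state.observe(v) for v in (True, True, True, False, False, False)]
print(history)
```

Raising the dealerting sample count relative to the alerting count keeps an alert open longer and reduces flapping on noisy metrics.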