An anomaly detection configuration relies on several components:
Once configured and activated, the configuration observes the data and triggers an event when conditions are met. To ensure the configuration works as expected and alerts you about the right events, you can preview its results:
The data source provides a time series that Davis evaluates:
If your data arrives with latency, offset it in your configuration via the Query offset parameter. Specify the value in minutes.
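To illustrate what a query offset does, here is a minimal sketch in Python. It assumes the offset simply shifts the evaluated time range back by the given number of minutes; the function name `evaluation_window` and its parameters are hypothetical and not part of any Dynatrace API.

```python
from datetime import datetime, timedelta


def evaluation_window(now, window_minutes, query_offset_minutes):
    """Return (start, end) of the evaluated range, shifted back by the offset.

    Assumption: the offset moves the whole window into the past so that
    late-arriving data has landed before the window is evaluated.
    """
    end = now - timedelta(minutes=query_offset_minutes)
    start = end - timedelta(minutes=window_minutes)
    return start, end


now = datetime(2024, 1, 1, 12, 0)
start, end = evaluation_window(now, window_minutes=5, query_offset_minutes=2)
print(start.time(), end.time())  # 11:53:00 11:58:00
```

With a 2-minute offset, an evaluation at 12:00 looks at 11:53 to 11:58 instead of 11:55 to 12:00, so the most recent, possibly incomplete minutes are skipped.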
Analyzer parameters define how Davis evaluates the data provided by the data source. The exact set of parameters depends on the type of the analysis:
Dynatrace lets you set an alert on missing data in a metric or a DQL query. If the alert is active, Dynatrace regularly checks whether the sliding window of the anomaly detection configuration contains any measurements. For example, if the sliding window is set to 3 minutes during any 5 minutes, Dynatrace triggers an alert if there's no data within a 3-minute period.
The missing data condition and the threshold condition are combined with OR logic: an alert is raised if either condition is met.
We recommend disabling missing data alerts for sparse data streams, where measurements are not expected at regular intervals, because such alerts would result in alert storms.
For expected late-incoming data (for example, cloud integration metrics with a 5-minute delay), use long sliding windows that cover delays. For a 5-minute delay, use a sliding window of at least 10 minutes.
The {missing_data_samples} event description placeholder resolves to the number of minutes without data received.
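The missing data check and its OR combination with the threshold condition can be sketched as follows. This is a simplified model, not Dynatrace's implementation: it assumes one sample per minute, represents a minute without data as `None`, and counts missing minutes anywhere in the window rather than requiring them to be consecutive. All function names are hypothetical.

```python
from typing import Optional, Sequence


def missing_data_alert(samples: Sequence[Optional[float]],
                       violating_samples: int = 3,
                       window_size: int = 5) -> bool:
    """True if enough one-minute slots in the window carry no data."""
    recent = samples[-window_size:]
    missing = sum(1 for value in recent if value is None)
    return missing >= violating_samples


def threshold_alert(samples: Sequence[Optional[float]],
                    threshold: float,
                    violating_samples: int = 3,
                    window_size: int = 5) -> bool:
    """True if enough samples in the window exceed the threshold."""
    recent = samples[-window_size:]
    violations = sum(1 for v in recent if v is not None and v > threshold)
    return violations >= violating_samples


def should_alert(samples: Sequence[Optional[float]], threshold: float) -> bool:
    """Missing data and threshold conditions are combined with OR."""
    return missing_data_alert(samples) or threshold_alert(samples, threshold)


print(should_alert([42.0, 40.0, None, None, None], threshold=90))  # True
print(should_alert([None, 41.0, None, 43.0, None], threshold=90))  # True
print(should_alert([42.0, 40.0, 41.0, 43.0, 44.0], threshold=90))  # False
```

The second call shows why sparse streams are a poor fit: a metric that reports only every other minute would satisfy the missing data condition under this model even though nothing is wrong.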
The sliding window of an anomaly detection configuration defines how many one-minute samples must violate the threshold during a specific period. When the specified number of violations is reached, Dynatrace raises an event. The goal is to avoid overly aggressive alerting, in which every single measurement that violates the threshold triggers an event.
The event remains open until the metric stays within the threshold for a certain number of one-minute samples within the same sliding window, at which point Dynatrace closes the event. Keeping the event open helps to avoid over-alerting by adding new threshold violations to an existing problem instead of raising a new one.
You can find settings for the sliding window in the Advanced properties section of the configuration. By default:
You can set a sliding window of up to 60 minutes.
Let's consider a case of a static threshold of 90% CPU usage.
The event analysis starts with the first violating sample in the sliding window. Once the number of violating samples reaches the configured number, the event analysis stops and a problem is raised. Even though the event analysis has stopped, the event itself remains open until the de-alerting criteria are met:
Both criteria must be met to close the event.
The default numbers (3 violating samples in the sliding window of 5 samples to trigger a problem, 5 de-alerting samples to close the event) are a good fit for most configurations. However, you might need to update them (for example, due to noise in measurements).
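The alerting and de-alerting behavior with the default numbers can be sketched as a small state machine. This is a simplified interpretation under stated assumptions, not Dynatrace's actual algorithm: it assumes one sample per minute, a static upper threshold, and that the event closes once the sliding window contains the required number of non-violating samples. The class name `SlidingWindowDetector` and its parameters are hypothetical.

```python
from collections import deque


class SlidingWindowDetector:
    def __init__(self, threshold, violating_samples=3,
                 window_size=5, dealerting_samples=5):
        self.threshold = threshold
        self.violating_samples = violating_samples
        self.dealerting_samples = dealerting_samples
        # Keeps only the most recent `window_size` violation flags.
        self.window = deque(maxlen=window_size)
        self.event_open = False

    def observe(self, value):
        """Feed one one-minute sample and return the resulting state."""
        self.window.append(value > self.threshold)
        violations = sum(self.window)
        if not self.event_open:
            if violations >= self.violating_samples:
                self.event_open = True
                return "raised"
            return "ok"
        # Event is open: close it only after enough calm samples.
        calm = len(self.window) - violations
        if calm >= self.dealerting_samples:
            self.event_open = False
            return "closed"
        return "open"


detector = SlidingWindowDetector(threshold=90)  # 90% CPU usage
cpu = [85, 92, 95, 88, 93, 80, 70, 75, 60, 65, 50]
print([detector.observe(v) for v in cpu])
# ['ok', 'ok', 'ok', 'ok', 'raised', 'open', 'open', 'open', 'open', 'closed', 'ok']
```

In this run the third violation (93) raises the event, and the event stays open until five samples in a row are below 90%. Noisy measurements that dip below the threshold only briefly do not close it, which is the over-alerting protection described above.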
The event template defines the characteristics of an event triggered by a threshold violation. You need to provide at least the name and the type of the event.
For example, an event name could be High network activity or CPU saturation. The template can include placeholders, such as {threshold} or {alert_condition}. Placeholders are replaced with real values in the actual event. To see available placeholders, type { in the input field.
You can provide additional parameters as key-value pairs. For a list of possible event properties, see Semantic dictionary.
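As an illustration of how placeholders in an event template resolve, here is a minimal Python sketch. It is not how Dynatrace implements templating; the `render_template` helper, the property keys, and the placeholder values are assumptions chosen for the example.

```python
import re

# Matches {name} style placeholders such as {threshold}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_template(text, values):
    """Replace each {placeholder} with its value; leave unknown ones as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, text)


# Hypothetical template: a name, a description, and extra key-value pairs.
template = {
    "name": "CPU saturation",
    "description": "CPU usage is {alert_condition} the threshold of {threshold}",
    "team": "platform",  # additional parameter (illustrative)
}

# Assumed values available at the time the event is raised.
values = {"threshold": "90", "alert_condition": "above"}

event = {key: render_template(text, values) for key, text in template.items()}
print(event["description"])  # CPU usage is above the threshold of 90
```

Unknown placeholders are left untouched in this sketch so that a typo in a template stays visible in the resulting event instead of silently disappearing.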