Dynatrace uses specific formulas to calculate metric values and generate problems based on these metrics.
With the new Synthetic on Grail, availability metrics are calculated by dividing the number of successful ("up") executions by the total number of executions.
1. Determine the number of up executions within the timeframe.
   This is the total number of executions within the timeframe, minus the number of failed ("down") executions within the timeframe.
2. Determine availability.
   Divide the number of up executions by the total number of executions, and then multiply by 100 to get a percentage.
Example
Suppose we have 5 down executions within a 35-minute timeframe.
Total executions: 35
Down executions: 5
Up executions: 35 - 5 = 30
Availability: (30 / 35) * 100 = 0.8571 * 100 = 85.71 percent
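The two steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and its parameters are hypothetical and not part of any Dynatrace API.

```python
def availability_percent(total_executions: int, down_executions: int) -> float:
    """Availability as a percentage: up executions / total executions * 100."""
    if total_executions <= 0:
        raise ValueError("total_executions must be positive")
    if not 0 <= down_executions <= total_executions:
        raise ValueError("down_executions must be between 0 and total_executions")
    up_executions = total_executions - down_executions  # step 1
    return up_executions / total_executions * 100       # step 2


# The worked example: 35 executions, 5 of them down.
print(round(availability_percent(35, 5), 2))  # 85.71
```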
You can set up a maintenance window so test executions are not suppressed during maintenance periods. Then, if outages happen during the maintenance period, down executions are included in the metrics calculations.
In the example below, the availability is less than 100% because the down executions are included in the metric calculation by default.
timeseries avg(dt.synthetic.http.availability), by: {dt.entity.http_check, dt.maintenance_window_ids, interpolated}
| filter dt.entity.http_check == "HTTP_CHECK-2F280898D4FCB1A8"
To exclude the maintenance period executions from the metric calculation:
1. Add the dt.maintenance_window_ids dimension to the query.
2. Add the isNull(dt.maintenance_window_ids) filter condition.
In the example below, down executions were detected during the maintenance period, but they are not included in the metric calculation because the query uses the dt.maintenance_window_ids dimension and the isNull(dt.maintenance_window_ids) filter condition.
timeseries av = avg(dt.synthetic.http.availability), by: {dt.entity.http_check, dt.maintenance_window_ids}
| filter dt.entity.http_check == "HTTP_CHECK-2F280898D4FCB1A8"
| filter isNull(dt.maintenance_window_ids)
| fields avgAV = arrayAvg(av)
The approach described above requires all executions to happen at the same rate. However, in a real-world environment this is not always the case as the monitor execution frequency may change or additional on-demand executions may be triggered.
To make the calculation more accurate, the interpolation mechanism is introduced: the total monitoring time is divided into minute-level data points. All data points count as executions, although they don't necessarily coincide with actual executions.
In the image below, the blue data points coincide with actual executions, and the white data points don't.
Let's imagine a monitor is set to execute tests every five minutes. So, every fifth data point (blue) coincides with an actual execution. All data points following the first down execution and preceding the first up execution count as "down". Thus, when calculating the availability metrics, the number of down executions is calculated as follows:
Down executions = the first actual down execution + all following "down" data points.
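The interpolation described above can be approximated with a simple forward fill. The sketch below assumes that each minute-level data point takes the result of the most recent actual execution; that is one reading of the description, not a confirmed account of the internal implementation. The function name and the input shape are hypothetical.

```python
def interpolate_states(executions: dict[int, bool], total_minutes: int) -> list[bool]:
    """Expand actual executions into one data point per minute.

    executions maps a minute offset to True (up) or False (down).
    Each data point carries forward the result of the most recent
    actual execution, so minutes after a down execution stay "down"
    until the next up execution.
    """
    if 0 not in executions:
        raise ValueError("this sketch needs an execution at minute 0")
    points = []
    state = executions[0]
    for minute in range(total_minutes):
        state = executions.get(minute, state)  # actual execution, else carry forward
        points.append(state)
    return points


# A monitor runs every 5 minutes for 20 minutes; the execution at minute 10 fails.
executions = {0: True, 5: True, 10: False, 15: True}
points = interpolate_states(executions, 20)
down = points.count(False)  # minute 10 plus the data points at minutes 11-14
availability = (len(points) - down) / len(points) * 100
print(down, availability)  # 5 75.0
```

Counting only actual executions would give 3 of 4 up, which is also 75% here. The two approaches diverge as soon as the execution frequency changes or on-demand executions are added.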
Availability calculation for a synthetic monitor isn't based on the number of successful executions but on the length of time (duration) that a monitor is considered to be UP. Dynatrace stores timestamps of state changes: UP, DOWN, and UNMONITORED.
The timespan covered by successive successful monitor executions is considered to be uptime (UP). The time between the last successful execution and the first failed execution is also considered to be uptime (UP). Likewise, the timespan covered by successive failed executions is considered to be downtime (DOWN). The time between the last failed execution and the first successful execution is considered to be downtime (DOWN). This is illustrated in the image below.
The time that a monitor spends in the UNMONITORED state isn't considered in availability calculations.
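Uptime is the sum of all periods during which the monitor is UP.
Downtime is the sum of all periods during which the monitor is DOWN.
Availability per location is Uptime / (Uptime + Downtime) × 100.
Availability across locations is the sum of availability % per location / number of locations.
Availability is stored as a percentage with two decimal places.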
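As a rough illustration of the duration-based calculation, the sketch below sums the time spent in each state from a list of state-change timestamps, leaves out UNMONITORED time, and averages the per-location percentages. The function names, the input format, and the example date are assumptions made for this example; they do not reflect how Dynatrace stores or exposes this data.

```python
from datetime import datetime, timedelta


def location_availability(changes, end):
    """Availability % for one location from its state-change timestamps.

    changes: list of (timestamp, state) tuples in chronological order, with
             state being "UP", "DOWN", or "UNMONITORED".
    end:     end of the evaluated timeframe.
    Returns None if the monitor was never UP or DOWN in the timeframe.
    """
    totals = {"UP": timedelta(), "DOWN": timedelta(), "UNMONITORED": timedelta()}
    # Each state lasts until the next state change, or until the timeframe ends.
    boundaries = [timestamp for timestamp, _ in changes[1:]] + [end]
    for (start, state), stop in zip(changes, boundaries):
        totals[state] += stop - start
    monitored = totals["UP"] + totals["DOWN"]  # UNMONITORED time is left out
    if not monitored:
        return None
    return round(totals["UP"] / monitored * 100, 2)


def monitor_availability(per_location):
    """Mean of the per-location availability percentages."""
    return round(sum(per_location) / len(per_location), 2)


start = datetime(2024, 1, 1, 0, 0)  # arbitrary example date
changes = [
    (start, "UP"),
    (start + timedelta(minutes=40), "DOWN"),
    (start + timedelta(minutes=50), "UNMONITORED"),
]
end = start + timedelta(minutes=60)

first = location_availability(changes, end)  # 40 min up, 10 min down
print(first)                                 # 80.0
print(monitor_availability([first, 100.0]))  # 90.0
```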
Maintenance windows can be excluded from availability calculation for synthetic monitors, displayed, for example, in Synthetic Classic, synthetic monitor details pages, and reports. A global setting enables you to always exclude maintenance windows from availability calculations—go to Settings > Web and mobile monitoring > Synthetic availability to access it.
Outages that occur within such excluded maintenance windows are shown, for example, in graphs and data points on the Multidimensional analysis page. Any failing resources are highlighted in waterfall graphs. However, failed executions are not included in availability calculations for the maintenance periods.
This setting also applies to retroactive maintenance windows. That is, you can exclude a retroactive maintenance window from synthetic availability calculations for the same period. Note, however, that maintenance windows are not retroactively excluded from any reports that were generated before the maintenance windows were created.
Data Explorer charts and the Metrics API provide availability metrics with the option of including or excluding maintenance windows.
Retry on error for single-URL browser monitors and browser clickpaths is configurable via monitor settings and is enabled by default. Discarded executions are ignored in availability calculations.
Total duration is calculated as a summation of the User action duration of the load and XHR actions in a monitor. Other key performance metric values are averages, calculated separately for load actions and XHR actions.
Dynatrace generates a performance problem if a monitor at a given location violates any of the defined performance thresholds in 3 of the 5 most recent executions, unless there is an open maintenance window for the monitor. That is, the violations must occur at the same location. Multiple locations can have such violations and be included in a problem.
A problem is not created, for example, if your monitor runs from 3 locations, and each location has 1 violation.
Many locations, each with 3 violations in the 5 most recent executions, can be part of the same problem if the violations occur around the same time. If the violations are further apart in time, separate problems are generated for each location.
The problem is closed if the performance thresholds are not violated in the 5 most recent executions at each of the previously affected locations.
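The opening and closing rules above can be sketched as follows. This is a simplified model: it ignores maintenance windows and the timing logic that groups locations into one problem, and the names and data layout are hypothetical.

```python
WINDOW = 5              # most recent executions considered per location
VIOLATIONS_TO_OPEN = 3  # violations within the window that flag a location


def violating_locations(history: dict[str, list[bool]]) -> list[str]:
    """Return locations with at least 3 violations in their 5 most recent executions.

    history maps a location name to its execution results in chronological
    order, where True means a performance threshold was violated.
    """
    return [
        location
        for location, results in history.items()
        if sum(results[-WINDOW:]) >= VIOLATIONS_TO_OPEN
    ]


def can_close(history: dict[str, list[bool]], affected: list[str]) -> bool:
    """True if no affected location has a violation in its 5 most recent executions."""
    return all(
        len(history[loc]) >= WINDOW and not any(history[loc][-WINDOW:])
        for loc in affected
    )


# One violation at each of three locations: no location reaches 3 of 5.
spread_out = {
    "loc-a": [False, True, False, False, False],
    "loc-b": [False, False, True, False, False],
    "loc-c": [True, False, False, False, False],
}
print(violating_locations(spread_out))  # []

# Three violations at a single location are enough.
concentrated = {
    "loc-a": [True, False, True, False, True],
    "loc-b": [False, False, False, False, False],
}
print(violating_locations(concentrated))  # ['loc-a']
```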
Performance problem resolution occurs when a monitor is enabled/active. If a monitor is disabled/inactive, open performance problems are closed, or time out, after 10 minutes.
Performance thresholds for browser monitors are defined as the Total duration of the monitor or of individual events, which, in turn, can comprise multiple load or XHR actions. Total duration is the sum of the User action durations of the constituent actions. Where an event has just one action, Total duration is the same as the User action duration.
Note that Total duration is not available as a metric for individual load or XHR actions when viewing browser monitor Multidimensional analysis or a waterfall graph.
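To make the relationship between Total duration and User action duration concrete, here is a minimal sketch. It assumes that a threshold counts as violated when Total duration exceeds it; the function names and the millisecond values are made up for illustration.

```python
def total_duration(action_durations_ms: list[int]) -> int:
    """Total duration of an event: the sum of its User action durations."""
    return sum(action_durations_ms)


def violates_threshold(action_durations_ms: list[int], threshold_ms: int) -> bool:
    """Assumption: a threshold is violated when Total duration exceeds it."""
    return total_duration(action_durations_ms) > threshold_ms


# An event made up of one load action and two XHR actions.
event_actions = [1200, 300, 450]
print(total_duration(event_actions))            # 1950
print(violates_threshold(event_actions, 1500))  # True

# With a single action, Total duration equals the User action duration.
print(total_duration([800]))                    # 800
```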
Performance thresholds for HTTP monitors are defined as the Response time of the monitor or of individual requests.
Synthetic monitors can experience global outage or local outage availability problems. A global outage occurs when all locations experience 1–5 consecutive failures simultaneously. A local outage occurs when the specified number of locations experiences the specified number of consecutive failures, for example, when 3 of 4 total locations experience two consecutive failures.
An outage problem is resolved when there are as many consecutive successful executions as the configured number of failed executions for generating the problem. The successful executions must occur at a number of locations equal to the total number of locations minus the number of locations required for the problem, plus 1.
For example, for your monitor running on 4 locations configured to generate a problem when 3 locations have 2 consecutive failures, an outage problem is resolved when there are 2 consecutive successful executions on 2 (=4–3+1) locations.
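The detection and resolution rules for outage problems can be modeled roughly as below. The sketch treats a global outage as the special case where the required number of locations equals the total. All names and the data layout are hypothetical, and the model only looks at the run of results at the end of each location's history.

```python
def trailing_run(results: list[bool], value: bool) -> int:
    """Length of the run of `value` at the end of results (True = success)."""
    count = 0
    for result in reversed(results):
        if result != value:
            break
        count += 1
    return count


def outage_detected(results_by_location, locations_required, consecutive_failures):
    """True if enough locations end with the configured run of failures."""
    failing = sum(
        1 for results in results_by_location.values()
        if trailing_run(results, False) >= consecutive_failures
    )
    return failing >= locations_required


def outage_resolved(results_by_location, locations_required, consecutive_failures):
    """True if (total - required + 1) locations end with a run of successes
    as long as the configured number of failures."""
    needed = len(results_by_location) - locations_required + 1
    recovered = sum(
        1 for results in results_by_location.values()
        if trailing_run(results, True) >= consecutive_failures
    )
    return recovered >= needed


# 4 locations; a problem needs 3 locations with 2 consecutive failures.
during = {
    "A": [True, False, False],
    "B": [True, False, False],
    "C": [False, False],
    "D": [True, True],
}
print(outage_detected(during, 3, 2))  # True
print(outage_resolved(during, 3, 2))  # False (only D has recovered)

# Later, A has 2 consecutive successes as well: 2 (= 4 - 3 + 1) locations suffice.
later = {
    "A": [False, False, True, True],
    "B": [False, False, True],
    "C": [False, False, False],
    "D": [True, True, True],
}
print(outage_resolved(later, 3, 2))  # True
```

In the last case, locations B and C are still not healthy even though the problem is resolved.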
Note that when a global outage problem is resolved, you might still have one or more locations experiencing monitor failure. Set up local outage rules to be alerted on these.
Outage problem resolution occurs when a monitor is enabled/active. If a monitor is disabled/inactive, open outage problems are closed, or time out, if there are no new executions for 2x the monitor frequency. For example, if a monitor is scheduled to run every hour and has an open outage problem when it is disabled at 7:00 AM, the problem times out at 9:00 AM (2 × 1 hour).
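The timeout arithmetic from the example can be written out as a short sketch. It assumes the waiting period is counted from the moment the monitor is disabled, as in the example above; the function name and the date are illustrative only.

```python
from datetime import datetime, timedelta


def outage_problem_timeout(disabled_at: datetime, frequency: timedelta) -> datetime:
    """Time at which an open outage problem times out, assuming there are
    no executions after disabled_at."""
    return disabled_at + 2 * frequency


disabled_at = datetime(2024, 1, 1, 7, 0)  # arbitrary date, 7:00 AM
timeout = outage_problem_timeout(disabled_at, timedelta(hours=1))
print(timeout.time())  # 09:00:00
```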