Configure and monitor service-level objectives with Dynatrace
SLO overview
The Service-level objectives page lists the service-level objectives (SLOs) defined in a Dynatrace monitoring environment. For each SLO, it shows the current status, error budget and burn rate, target, warning, the number of open problems out of the total number of problems for the SLO entity selector, and the timeframe over which the SLO is evaluated.
Analyze problems
If there are any open problems associated with an SLO, the value in the Open/total problems column for the SLO is marked with a red warning symbol. Select the value to display the Problems page filtered by the respective entity selector. For more information on how to analyze problems, see Davis® AI.
SLO details
Expand the Details of an SLO for more information, such as:
-
The metric and entity selectors of the SLO
-
A graph representing the SLO evaluation over time
-
A table view of the latest 10 evaluated SLOs belonging to a certain entity type. Switch to the table view to find out, for example, the exact value that is negatively affecting the result of the aggregated SLO evaluation, and the entity associated with it. Additionally, you can:
Sort the table view by status in ascending or descending order
Select any of the entities for further details on the respective entity page
By default, each SLO is evaluated according to its defined timeframe, but for what-if analyses with different timeframes, or for a retrospective view, you can temporarily switch to the global timeframe.
Configure a service-level objective
To configure a new service-level objective, use the SLO wizard to select from a set of Dynatrace preconfigured templates for common use cases. Alternatively, you can create your own SLO definitions.
Add an SLO using the wizard
In the Dynatrace menu, go to Service-level objectives, select Add new SLO, and step through the SLO wizard as described below.
Select your indicators
-
Select the desired SLO:
-
Service-level availability SLO, where service-level availability is measured by dividing the number of successful service calls by the total number of service calls.
-
Service-method availability SLO, where the service-method availability is measured by dividing the number of successful key request service calls by the total number of key request service calls.
-
Service performance SLO, where the service performance ratio is measured by dividing the number of good minutes by the number of total minutes, via a metric expression. Good minutes count the number of minutes during which the response latency is below the defined threshold.
-
User experience SLO, which is based on the Apdex measurement, representing the percentage of users who are SATISFIED out of the total number of users who are using a web or mobile application.
-
Mobile crash-free users SLO, which measures the percentage of crash-free users within your mobile applications.
-
Synthetic availability SLO, which represents the percentage of successful synthetic monitor executions related to the total number of executions.
For more information on the use cases, see Configuration examples of service-level objective definitions.
-
Enter an SLO name.
-
Enter a Metric name, which will be used to create two metric queries:
One custom query for the SLO status.
Another one for the SLO error budget burn rate.
You can chart these metric keys on all pages that allow using metrics, such as the Data explorer.
After creating the SLO:
The metric keys cannot be changed.
You can view the metric keys in the SLO details.
-
(Optional) Configure the SLI metrics you want to add to your SLO.
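The ratio-based indicators listed above all reduce to a "good over total" percentage. The following is a minimal sketch of that arithmetic in Python, not Dynatrace code: the function names and the sample numbers are hypothetical, and the "strictly below the threshold" comparison follows the description of good minutes above.

```python
from typing import Sequence


def ratio_percent(good: float, total: float) -> float:
    """Return good/total as a percentage; reject an empty denominator."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * good / total


def good_minutes(latencies_ms: Sequence[float], threshold_ms: float) -> int:
    """Count the minutes whose response latency is below the threshold."""
    return sum(1 for latency in latencies_ms if latency < threshold_ms)


# Service-level availability: successful service calls / total service calls
# (sample counts are made up for illustration).
availability = ratio_percent(good=9_950, total=10_000)

# Service performance: good minutes / total minutes
# (one hypothetical latency value per minute).
per_minute_latency_ms = [120.0, 180.0, 650.0, 90.0]
performance = ratio_percent(
    good_minutes(per_minute_latency_ms, threshold_ms=500.0),
    len(per_minute_latency_ms),
)
```

In the sample data, three of the four minutes stay below the 500 ms threshold, so only the 650 ms minute counts against the performance ratio.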
Define a filter
In the timeframe selector, scroll down to select a timeframe value for your SLO.
- To choose one of the existing values, select Presets.
- To create your own timeframe value, select Custom.
The entity selector is consistent with the Dynatrace REST API query syntax. You can use filters on management zone ID/name, tags, entity name/ID/type, health state, or a combination of these. For management zones, you can choose from the list of accessible management zones.
After entering the entity selector, select Preview next to the entity selector bar to check it for mistakes.
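As an illustration of combining several filters into one entity selector, here is a small Python sketch. The helper `build_entity_selector` is hypothetical, and the criteria names (`type`, `mzName`, `tag`, `healthState`) are written as they are commonly used in the Dynatrace entity selector syntax; treat them as assumptions and verify them against the REST API documentation for your environment. The sample values are made up.

```python
from typing import Iterable, Optional


def build_entity_selector(
    entity_type: str,
    mz_name: Optional[str] = None,
    tags: Iterable[str] = (),
    health_state: Optional[str] = None,
) -> str:
    """Join individual criteria into one comma-separated selector string.

    Note: this sketch does not escape quotes inside the values.
    """
    parts = [f'type("{entity_type}")']
    if mz_name is not None:
        parts.append(f'mzName("{mz_name}")')
    parts.extend(f'tag("{tag}")' for tag in tags)
    if health_state is not None:
        parts.append(f'healthState("{health_state}")')
    return ",".join(parts)


# Hypothetical example: healthy services of one team in one management zone.
selector = build_entity_selector(
    "SERVICE",
    mz_name="Production",
    tags=["team:payments"],
    health_state="HEALTHY",
)
```

The resulting string can be pasted into the entity selector bar and checked with Preview before you continue.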
Add success criteria
Set the target percentage (Failure) and the warning percentage (Warning).
The warning percentage has to be between 100% and your SLO target percentage in order to be effective. For example, if your SLO target percentage is 99.00%, you need to set your warning percentage between 99.00% and 100% to get an early warning (as indicated by a yellow state).
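The relationship between status, target, and warning can be sketched as follows. This is an illustrative Python function, not the Dynatrace evaluation logic: the state names and the exact handling of values that equal a threshold are assumptions.

```python
def slo_state(status: float, target: float, warning: float) -> str:
    """Classify an SLO status against its target and warning percentages."""
    if not target <= warning <= 100.0:
        raise ValueError("warning must lie between the target and 100%")
    if status < target:
        return "bad"      # target missed
    if status < warning:
        return "warning"  # target met, but inside the early-warning band
    return "good"


# With a 99.00% target and a 99.80% warning, 99.50% is an early warning.
example_state = slo_state(status=99.5, target=99.0, warning=99.8)
```

A warning value below the target is rejected here because it could never trigger before the target itself is missed.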
To see how fast a service consumes an error budget, relative to the SLO, make sure Error budget burn rate is turned on, and set the threshold values for the slow-burn and fast-burn rates.
Evaluate
After entering the success criteria values, select Evaluate to evaluate the SLO based on the entered values.
If everything is correct and you get no errors, you can select Create to save your configuration and add your new SLO.
After you complete the setup, the newly created service-level objective appears on the SLOs page.
Create your own SLO
To set up your own service-level objective, go to Settings, select Cloud Automation > Definition, and select Add new SLO.
Edit SLOs
To edit an SLO, in the Dynatrace menu, go to Service-level objectives, find your SLO, and select More (…) > SLO definition in the Actions column.
Normalize error budget
To see a normalized error budget for all SLOs, go to Settings > Cloud Automation > Setup, and enable Normalize error budget.
Take, for example, an SLO target of 95% with a current SLO status of 96%. If normalization is turned on, the remaining error budget is (status − target) ÷ (100 − target) × 100, which in this example is (96 − 95) ÷ (100 − 95) × 100 = 20%.
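The normalization formula above translates directly into code. The Python function below is a minimal sketch; its name is hypothetical, and it only restates the formula given in this section.

```python
def normalized_error_budget(status: float, target: float) -> float:
    """Remaining error budget as a percentage of the total budget.

    Implements (status - target) / (100 - target) * 100.
    """
    if not 0.0 <= target < 100.0:
        raise ValueError("target must be at least 0 and below 100")
    return (status - target) / (100.0 - target) * 100.0


# Target 95%, status 96%: one fifth of the budget remains.
remaining = normalized_error_budget(status=96.0, target=95.0)

# A status below the target yields a negative value (budget overspent).
overspent = normalized_error_budget(status=94.0, target=95.0)
```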
Error budget burn rate
The error budget burn rate shows how fast a service consumes an error budget, relative to the SLO. For example:
- A burn rate of 1 indicates that the service consumed 100% of the error budget during your SLO timeframe.
- A burn rate of 2 indicates that the service consumed double the error budget during your SLO timeframe.
Burn rate is calculated either for the past hour (if you select the SLO timeframe) or for the global timeframe value (if no SLO timeframe is selected).
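One common way to express a burn rate is the observed failure percentage divided by the failure percentage that the target allows. The Python sketch below uses that formulation as an assumption; it is consistent with the two examples above, but it is not confirmed here as the exact calculation Dynatrace performs.

```python
def burn_rate(status: float, target: float) -> float:
    """Observed failure share divided by the allowed failure share.

    `status` is the SLO status measured over the burn rate window.
    """
    if not 0.0 <= target < 100.0:
        raise ValueError("target must be at least 0 and below 100")
    return (100.0 - status) / (100.0 - target)


# Target 99%: a status of exactly 99% uses the budget at the planned pace,
# a status of 98% uses it twice as fast.
on_pace = burn_rate(status=99.0, target=99.0)
double_pace = burn_rate(status=98.0, target=99.0)
```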
Set up an error budget burn rate
For an indication of how fast a service consumes an error budget, you can enable the burn rate visualization either from the wizard or settings page while creating the SLO.
At any time, you can change the threshold value or disable the burn rate visualization from the SLO definition of your SLO.
Visualize the error budget burn rate
After you set up the error budget burn rate, there are several places in your environment where you can view it:
-
On the SLO overview page, in the error budget column:
- A slow-burn yellow icon is displayed when the burn rate value is between 1 and the fast-burn threshold you entered while creating the SLO.
- A fast-burn red icon is displayed when the burn rate is greater than or equal to the fast-burn threshold you entered while creating the SLO.
If burn rate visualization is enabled, but no icon is displayed, the burn rate is below 1.
-
In the details of an SLO.
-
In the Data explorer.
-
On your dashboard, if you pin your SLO to your dashboard.
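The icon rules for the SLO overview page can be summarized in a few lines. This Python sketch is illustrative only: the function name and return values are hypothetical, and treating a burn rate of exactly 1 as slow-burn is an assumption based on the statement that no icon means the burn rate is below 1.

```python
from typing import Optional


def burn_rate_icon(burn_rate: float, fast_burn_threshold: float) -> Optional[str]:
    """Return the icon color for a burn rate, or None if no icon is shown."""
    if burn_rate >= fast_burn_threshold:
        return "red"     # fast burn
    if burn_rate >= 1.0:
        return "yellow"  # slow burn
    return None          # below 1: no icon


# With a fast-burn threshold of 10:
icons = [burn_rate_icon(rate, 10.0) for rate in (0.5, 1.0, 4.2, 10.0)]
```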
Set up alerts
You can set up two types of alerts:
SLO alerts are sent when an SLO status goes below the target value.
Burn rate alerts are sent when the error budget of an SLO decreases at a specific rate.
Alerts can only be created based on metric events within the last hour. If you set the threshold to 10 for a burn rate alert, an alert will be generated when the burn rate exceeds 10 during the last hour.
To set up an SLO alert
- Go to Service-level objectives, find your SLO, and select More (…) > Create alert.
- For Select alert type, select Status. Name your alert and set a threshold value. If you don't set a value, the threshold is populated with the existing SLO target value.
- Select Create alert.
To set up a burn rate alert
- Go to Service-level objectives, find your SLO, and select More (…) > Create alert.
- For Select alert type, select Burn rate. Name your alert and set the burn rate threshold.
- Select Create alert.
Your newly created SLO or burn rate alert will appear on the Metric events page, where you can configure it further. For details, see Metric events.
Add SLOs to management zones
SLOs that don't belong to any management zone are visible to all users. If you add an SLO to a management zone, only users who have access to that management zone can see it on the Service-level objectives overview page.
To add an SLO to a management zone
- In the Dynatrace menu, go to Settings.
- Select Cloud Automation > Definition.
- Select Add new SLO.
- In the Entity selector field, add the management zone name or ID.
- After entering all SLO details, select Save changes to save your configuration.
- To add an existing SLO to a management zone, see Edit SLOs.
To view SLOs belonging to a specific management zone, select the management zone using the filter button in the menu bar.
- To view the global SLOs regardless of any other selected management zone filter, turn on Show global SLOs. Global SLOs are SLOs that are visible to all users, regardless of their management zone permissions.
For more information on how you can control access to the SLOs in your environment by setting permissions, see View and edit SLOs based on permission levels.
Pin SLOs to your dashboard
After you define your objectives, you can add the SLOs to your dashboard to visualize their current status along with the remaining error budgets.
- In the Dynatrace menu, go to Service-level objectives, find your SLO, and select More (…) > Pin to your dashboard in the Actions column.
- From the list, select an existing dashboard or Create new dashboard, and then select Pin.
- Select Open dashboard to open the dashboard in edit mode with the SLO tile selected.
- Adjust the tile configuration as needed.
- Select Done.
By default, the SLO tile evaluates the SLO timeframe instead of the selected global timeframe, which is shown by the small filter icon in the upper-right corner of the SLO tile. To compare between the global timeframe and the SLO timeframe, you can also override the timeframe used in the tile configuration.
For details, see View and add SLO dashboard tiles based on permission levels.
Visualize SLO status by color
After pinning an SLO to your dashboard, the SLO status is indicated in the tile by a combination of text and text color.
To optionally enable SLO background colorization to reflect the SLO status
- In the Dynatrace menu, go to Dashboards and open the dashboard.
- Select Edit.
- Select the SLO tile that you want to colorize.
- In the Service-level objective pane on the right, turn on Colorize based on status.
- Select Done.
The background color of the SLO tile (instead of the text) then changes automatically to reflect the SLO status:
Tile color | Status |
---|---|
Green | Good |
Yellow | Warning |
Red | Bad |
Clone an SLO
Cloning an SLO allows you to create a new SLO reusing the configuration of an existing SLO.
To clone an SLO
- In the Dynatrace menu, go to Service-level objectives.
- Select the SLO that you want to clone, and then select Action > Clone.
The Add new SLO page is prefilled with the cloned SLO's settings.
- Adjust the new SLO's settings where necessary, and then select Create.
Show metrics in Data explorer
To query and chart metrics, go to the service-level objective you want and select Actions > View in Data explorer. For more information on how to use the Data explorer, see Data explorer.
Limitations
The Data explorer shows metric keys; it doesn't show transformations or filters.
Davis alerting
Dynatrace Davis® provides quick notifications on anomalies detected, along with actionable root causes. If your SLO has turned red, this is most likely because Davis has already raised a problem for the underlying metrics, showing you the root cause.
Davis doesn't provide alerts for SLO target breaches, but for underlying metrics and SLO entities.
Troubleshoot
SLO calculation depends on real-time metric queries, so the filter used on an SLO is crucial for calculation performance. If your SLO list is very slow, check the entity filters on your defined SLOs.
You're probably missing :splitBy() in your metric expression.
You need to choose a timeframe that starts after the creation time of the metric.
Users without a global write permission are not allowed to create an SLO without a management zone.