Site Reliability Guardian
The Site Reliability Guardian is a Dynatrace app that automates change impact analysis to validate service availability, performance, and capacity objectives across various systems. It enables DevOps platform engineers to make the right release decisions and empowers SREs to apply Service-Level Objectives (SLOs) for their critical services.
Site Reliability Guardian concepts
Site Reliability Guardian is based on the following concepts:
A guardian is the grouping of objectives. It is built around a set of entities reflecting a service or application you want to safeguard.
Objectives are means for measuring the performance, availability, capacity, and security of your services. Objectives are measured by indicators. You can define an objective for your guardian that is validated on demand or automatically.
An indicator is a value against which the warning and failure thresholds are checked using a comparison operator. To retrieve an indicator value, use DQL or reference an existing SLO.
Warning and failure thresholds
The warning and failure thresholds determine whether the measured value of the indicator meets the objective, is close to violating the objective, or violates the objective.
Both the warning and failure thresholds are optional, so the possible validation results vary:
If both the warning and failure thresholds are set, the objective validation can return warning, failure, or pass.
If just the warning threshold is set, the objective validation can return warning or pass.
If no threshold is set, the objective validation does not return a status but is used for informational purposes.
The comparison operator defines whether the objective is met: the indicator must be less than or equal to the warning and failure thresholds (Lower than these numbers is good), or greater than or equal to them (Higher than these numbers is good).
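The threshold rules above can be sketched as a small classification function. This is an illustrative sketch, not the guardian's actual implementation: the names `Comparison` and `classify` are hypothetical, and returning `None` when no threshold is set is an assumption that mirrors the "no status, informational only" case.

```python
from enum import Enum
from typing import Optional


class Comparison(Enum):
    """Hypothetical names for the two comparison operators."""
    LESS_THAN_OR_EQUAL = "<="     # lower than these numbers is good
    GREATER_THAN_OR_EQUAL = ">="  # higher than these numbers is good


def classify(value: float,
             comparison: Comparison,
             warning: Optional[float] = None,
             failure: Optional[float] = None) -> Optional[str]:
    """Classify an indicator value against optional thresholds."""

    def meets(threshold: float) -> bool:
        # A value equal to the threshold counts as meeting it.
        if comparison is Comparison.LESS_THAN_OR_EQUAL:
            return value <= threshold
        return value >= threshold

    if warning is None and failure is None:
        return None  # assumption: no status, value is informational only
    if failure is not None and not meets(failure):
        return "failure"
    if warning is not None and not meets(warning):
        return "warning"
    return "pass"


# Response time in ms: lower is good, warning at 400, failure at 500.
print(classify(450, Comparison.LESS_THAN_OR_EQUAL, warning=400, failure=500))
```

With these example thresholds, a value of 450 meets the failure threshold but not the warning threshold, so the sketch reports a warning.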
To organize your guardians, you can assign tags to them. Tags use the key:value format, with the value being optional.
To assign a tag to your guardian, either specify it in the Add tags to your guardian section during guardian creation or add the tag later in edit mode.
To filter the list of all guardians by a tag, type the tag in the Search by name or tag field. The page automatically updates to show only guardians with matching tags.
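The key:value tag format, with its optional value, can be illustrated with a short sketch. The helper names `parse_tag` and `guardians_with_tag`, the sample guardians, and the exact-match rule are all assumptions made for illustration; the app's own search may match tags differently.

```python
from typing import Optional


def parse_tag(tag: str) -> tuple[str, Optional[str]]:
    """Split a 'key:value' tag; the value is optional."""
    key, _, value = tag.partition(":")  # split on the first colon only
    return key, (value if value else None)


def guardians_with_tag(guardians: list[dict], tag: str) -> list[str]:
    """Return the names of guardians carrying exactly this tag (assumed rule)."""
    wanted = parse_tag(tag)
    return [g["name"] for g in guardians
            if wanted in (parse_tag(t) for t in g["tags"])]


# Hypothetical sample data.
guardians = [
    {"name": "carts", "tags": ["service:carts", "stage:production"]},
    {"name": "checkout", "tags": ["stage:staging", "critical"]},
]

print(parse_tag("stage:production"))
print(parse_tag("critical"))
print(guardians_with_tag(guardians, "stage:production"))
```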
This DQL shows you the first guardian.validation.objective business event with a specific guardian ID and parses the guardian tags field to extract a specific tag value from the event JSON.
fetch bizevents
| filter event.type == "guardian.validation.objective" AND guardian.id == "vu9U3hXa3q0AAAABADFhcHA6ZHluYXRyYWNlLnNpdGUucmVsaWFiaWxpdHkuZ3VhcmRpYW46Z3VhcmRpYW5zAAZ0ZW5hbnQABnRlbmFudAAkMWNiZDVkYWYtZThhNi0zMDkxLWFkOGQtMmU5NDNmNWJmZWJmvu9U3hXa3q0"
| limit 1
| parse guardian.tags, "JSON:parsed_guardian_tags"
This DQL shows you all guardian.validation.finished business events from guardians tagged as my-tagged-guardian.
fetch bizevents
| filter event.type == "guardian.validation.finished"
| expand guardian.tags
| filter contains(guardian.tags, "my-tagged-guardian")
Guardian workflow action
You can automate the execution of a guardian via Workflows, tying guardian execution to an event.
To add a guardian action to an existing workflow
- Open Workflows and open the required workflow. Alternatively, select Automate on the guardian page. This option is available on the guardian's tile in the overview as well as in the validation details.
- In the Choose trigger panel, select the trigger best suited to your needs.
- On the trigger node, select to browse available actions.
- Select Site Reliability Guardian in the Choose action panel.
- On the Input tab, select the required guardian from the Guardian list.
- Configure the validation timeframe. You have two options:
  - Select the required timeframe.
  - Use the event() expression to extract the timeframe from the triggering event.
You can create a new workflow by selecting Automate on the top right of the guardian page. When you create a workflow this way, the following parameters are configured, but be sure to adapt them as needed.
- If the guardian has tags defined, they are used for the event filter of the trigger. Otherwise, the filter defaults to tag.service == "carts" AND tag.stage == "production".
- The first action of the workflow is the respective guardian.
The guardian action generates the following output and passes it to the subsequent actions of the workflow.
- The ID of the validated guardian
- The name of the validated guardian
- An array of tags assigned to the validated guardian
- The execution context property of the trigger, if it was set
- The IDs of all events generated by the validation
- The URL with the full validation details
- The status of the validation, indicating the overall result
- The number of objectives for each status
To learn more about workflows for a guardian, open the help menu in the upper-right corner of a guardian and select Get started with Automation.
Validate guardian and its objectives
If a workflow is created, your guardian is validated automatically. You can also perform the validation manually.
The event subscriptions in the workflow define when the validation of a guardian is triggered automatically.
You can perform a validation of a guardian by selecting the Validate button on the overview screen or within the validation details screen.
- Select the validation timeframe.
- Click the Validate button.
Individual objective result
For each objective, the validation returns the derived value and classification. The severity goes from the highest (1) to the lowest (5).
| Severity | Description |
| 1 | The objective could not be validated due to an error deriving the indicator. |
| 2 | The value violates the failure threshold; the objective is not met. |
| 3 | The value is in the warning range; the objective is met, but close to failure. |
| 4 | The value is within the target range; the objective is met. |
| 5 | No classification, but the objective's value can be used for informational purposes. |
Overall validation result
After each objective has been validated, the guardian uses the most severe of the individual results as the overall validation result. Examples of how this result can be used include:
- Making a release decision in your delivery pipeline.
- Reporting on the current status of your service.
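The "most severe individual result wins" rule, and a release decision built on it, can be sketched as follows. The status labels, the `SEVERITY` mapping, and the `release_allowed` policy are hypothetical names chosen for illustration. The ordering follows the severity ranking described above, from an error deriving the indicator (1) down to informational (5).

```python
# Hypothetical status labels mapped to severity (1 = most severe).
SEVERITY = {"error": 1, "failure": 2, "warning": 3, "pass": 4, "info": 5}


def overall_result(objective_results: list[str]) -> str:
    """Return the most severe of the individual objective results."""
    if not objective_results:
        raise ValueError("a guardian needs at least one objective result")
    return min(objective_results, key=lambda status: SEVERITY[status])


def release_allowed(objective_results: list[str],
                    allow_warning: bool = True) -> bool:
    """Example gate policy (an assumption): block on error or failure."""
    overall = overall_result(objective_results)
    if overall in ("error", "failure"):
        return False
    if overall == "warning":
        return allow_warning
    return True


results = ["pass", "warning", "info"]
print(overall_result(results))
print(release_allowed(results, allow_warning=False))
```

Whether a warning should block a release is a policy choice for your pipeline; the `allow_warning` flag only makes that choice explicit.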
Install, update, or uninstall
To install, update, or uninstall the Site Reliability Guardian, use the Dynatrace Hub. Go to Site Reliability Guardian and select Install, Update, or Uninstall to perform the corresponding action in your environment.
If you uninstall the Site Reliability Guardian:
- All guardians and their configurations are deleted.
- Workflows that reference the workflow action of the Site Reliability Guardian persist, but future executions will fail due to the missing guardians.