Site Reliability Guardian

Latest Dynatrace


Permissions

The following table describes the required permissions.

| Permission | Description |
|---|---|
| app-settings:objects:read | Read settings 2.0 for reliability guardian configuration |
| app-settings:objects:write | Write settings 2.0 for reliability guardian configuration |
| storage:buckets:read | Enables reading all system data stored in Grail |
| storage:logs:read | Read logs for reliability guardian validations |
| storage:metrics:read | Read metrics for reliability guardian validations |
| storage:bizevents:read | Read bizevents for reliability guardian validations |
| storage:events:write | Write bizevents for reliability guardian validations |
| storage:events:read | Read events for reliability guardian validations |
| storage:spans:read | Read spans from Grail |
| storage:entities:read | Read entities from Grail |

Installation

Make sure the app is installed in your environment.

The Site Reliability Guardian is a Dynatrace app that automates change impact analysis to validate service availability, performance, and capacity objectives across various systems. It enables DevOps platform engineers to make the right release decisions and empowers SREs to apply Service-Level Objectives (SLOs) for their critical services.

The details of a Guardian in a failure state

Get started quickly by using predefined templates to guard your critical services.


Learning modules

Go through the following modules to learn how to use Site Reliability Guardian:

01 Create a Site Reliability Guardian

How-to guide
Create a guardian manually or from a predefined template.

02 Add Site Reliability Guardian objective

How-to guide
Add a new Site Reliability Guardian objective.

03 Guardian execution context

Concept
Filter Site Reliability Guardian validation events triggered by an external tool using the context information provided by the tool.

04 Site Reliability Guardian as code

Concept
See configuration as code examples for a guardian and its workflow.

05 Site Reliability Guardian Role Permissions

Reference
Configure role permissions to use the Site Reliability Guardian.

Site Reliability Guardian concepts

Site Reliability Guardian is based on the following concepts:

Guardian

A guardian is a grouping of objectives. It is built around a set of entities reflecting a service or application you want to safeguard.

A guardian provides you with a default automation workflow that performs the objective validation. As a result, a guardian always represents the latest validation result derived from the objectives.

You can create a maximum of 1000 guardians.

Objective

Objectives are a means of measuring the performance, availability, capacity, and security of your services. Objectives are measured by indicators. You can define an objective for your guardian that is validated on demand or automatically.

You can create a maximum of 50 objectives for each guardian.

Indicator

An indicator is a value against which the warning and failure thresholds are checked using a comparison operator. To retrieve an indicator value, use DQL.

Static thresholds

The static warning and failure thresholds determine whether the measured value of the indicator meets the objective, is close to violating the objective, or violates the objective.

The warning and failure thresholds are optional, so the possible validation results vary:

If both the warning and failure thresholds are set, the objective validation can return warning, failure, or pass.
If just the warning threshold is set, the objective validation can return warning or pass.
If no threshold is set, the objective validation does not return a status but is used for informational purposes.

Auto-adaptive thresholds

Auto-adaptive thresholds are dynamic limits that adjust over time based on previous validations. If an objective changes its behavior, the threshold adapts automatically.

Auto-adaptive thresholds are only available for fetching data via DQL. The Davis AI threshold analyzer requires a minimum of 5 validations for auto-adaptive thresholds to take effect. During this learning phase, the objective validation returns an info state. All subsequent validations will then use the auto-adapted thresholds, impacting the overall validation.
Switching from static to auto-adaptive is supported.
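
The learning phase described above can be pictured with a small Python sketch. It only models the gate between the learning phase and the adaptive phase; how Davis AI actually derives the thresholds is not covered here. The function name and the reading that the first five validations form the learning phase are assumptions made for illustration.

```python
# Minimum number of validations stated in the text above.
MIN_VALIDATIONS = 5


def in_learning_phase(completed_validations: int) -> bool:
    """Return True while auto-adaptive thresholds are still being learned.

    Assumption: `completed_validations` counts validations finished before
    the current one, so the first five validations return an info state.
    """
    if completed_validations < 0:
        raise ValueError("completed_validations must not be negative")
    return completed_validations < MIN_VALIDATIONS


# The first validation (nothing completed yet) is still in the learning phase.
print(in_learning_phase(0))
# With five validations completed, later validations use adapted thresholds.
print(in_learning_phase(5))
```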

Operator

The comparison operator defines when the objective is met: either the indicator must be less than or equal to the warning and failure thresholds (A lower value is good for my result), or it must be greater than or equal to them (A higher value is good for my result).
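
As an illustration of how thresholds and the comparison operator interact, here is a minimal Python sketch. It is not the app's implementation: the names `Status`, `classify`, and `lower_is_better` are hypothetical, the `INFO` value stands in for "no status, informational only", and the handling of an objective with only a failure threshold is an assumption.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PASS = "pass"
    WARNING = "warning"
    FAILURE = "failure"
    INFO = "info"  # stand-in for "informational only, no status"


def violates(value: float, threshold: float, lower_is_better: bool) -> bool:
    """A threshold is met when value <= threshold (lower is better)
    or value >= threshold (higher is better); otherwise it is violated."""
    return value > threshold if lower_is_better else value < threshold


def classify(value: float,
             warning: Optional[float] = None,
             failure: Optional[float] = None,
             lower_is_better: bool = True) -> Status:
    """Classify an indicator value against optional static thresholds."""
    if warning is None and failure is None:
        return Status.INFO
    # Check the failure threshold first because it is the stricter outcome.
    if failure is not None and violates(value, failure, lower_is_better):
        return Status.FAILURE
    if warning is not None and violates(value, warning, lower_is_better):
        return Status.WARNING
    return Status.PASS


# Response time in ms, lower is better: warn above 100, fail above 200.
print(classify(120, warning=100, failure=200))
# Availability in percent, higher is better: warn below 99.9, fail below 99.0.
print(classify(99.95, warning=99.9, failure=99.0, lower_is_better=False))
```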

Tags

To organize your guardians, you can assign tags to them. Tags use the key:value format, with the value being optional.

To assign a tag to your guardian, either specify it in the Add tags to your guardian section during guardian creation or add the tag later in edit mode.

To filter the list of all guardians by a tag, type the tag in the Search by name or tag field—the page automatically updates to show only guardians with matching tags.
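
The key:value tag format can be handled with a few lines of Python, for example when preparing tags in a script before creating guardians. This is a sketch only: `parse_tag` is a hypothetical helper, and splitting on the first colon is an assumption, not documented behavior of the app.

```python
from typing import Optional, Tuple


def parse_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag into (key, value); the value is None when it is omitted."""
    key, separator, value = tag.partition(":")
    key = key.strip()
    if not key:
        raise ValueError(f"tag {tag!r} has no key")
    value = value.strip()
    # A missing separator or an empty value both mean "no value".
    return key, (value if separator and value else None)


print(parse_tag("stage:production"))
print(parse_tag("critical"))
```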

This DQL shows you the first guardian.validation.objective business event with a specific guardian ID and parses the guardian tags field to extract a specific tag value from the event JSON.

fetch bizevents
| filter event.type == "guardian.validation.objective" AND guardian.id == "vu9U3hXa3q0AAAABADFhcHA6ZHluYXRyYWNlLnNpdGUucmVsaWFiaWxpdHkuZ3VhcmRpYW46Z3VhcmRpYW5zAAZ0ZW5hbnQABnRlbmFudAAkMWNiZDVkYWYtZThhNi0zMDkxLWFkOGQtMmU5NDNmNWJmZWJmvu9U3hXa3q0"
| limit 1
| parse guardian.tags, "JSON:parsed_guardian_tags"

This DQL shows you all guardian.validation.finished business events from guardians tagged as tagkey:my-tagged-guardian.

fetch bizevents
| filter event.type == "guardian.validation.finished"
| expand guardian.tags
| filter contains(guardian.tags, "my-tagged-guardian")

Guardian workflow action

You can automate the execution of a guardian via Workflows, tying guardian execution to an event.

To add a guardian action to an existing workflow

Go to Workflows and open the required workflow.
Alternatively, select Automate on the guardian page. This option is available on the overview on the tile itself as well as in the validation details.
In the Choose trigger panel, select the trigger best suited to your needs.
On the trigger node, select to browse available actions.
Find Site Reliability Guardian in the Choose action panel.
On the Input tab, you have two options to select the required guardian:
- Select the guardian from the list.
- Use an expression to extract the guardian from the triggering event or a previous workflow action.
Configure the validation timeframe. You have two options:
- Select the required timeframe.
- Use the event() expression to extract the timeframe from the triggering event.

You can create a new workflow by selecting Automate on the top right of the guardian page. When you create a workflow this way, the following parameters are configured, but be sure to adapt them as needed.

If the guardian has tags defined, they are used for the event filter of the trigger. Otherwise, it defaults to tag.service == "carts" AND tag.stage == "production".
The first action of the workflow is the respective guardian.

The guardian action generates the following output and passes it to the subsequent actions of the workflow.

Parameter

Description

To learn more about workflows for a guardian, select > Get started with Automation.

Validation

If a workflow is created, your guardian is validated automatically. You can also perform the validation manually.

Validation overview

By default, the overview page shows validations for the last seven days.

You can view older results by opening a guardian and selecting a different timeframe.

Automated validation

The event subscriptions in the workflow define when the validation of a guardian is triggered automatically.

Manual validation

You can perform a validation of a guardian by selecting the Validate button on the overview screen or within the validation details screen.

Select the validation timeframe.
Select the Validate button.

Individual objective result

For each objective, the validation returns the derived value and classification. The severity goes from the highest (1) to the lowest (5).

Severity

Name

Description

Overall validation result

After each objective has been validated, the guardian uses the most severe of the individual validation results as the overall validation result. Examples of this result usage include:

Making a release decision in your delivery pipeline.
Reporting on the current status of your service.
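
The "most severe result wins" rule can be sketched in Python as follows. The result names and severity numbers in `SEVERITY` are placeholders chosen for illustration, because the severity table is not reproduced here; only the ordering rule (1 is the highest severity, 5 the lowest) comes from the text.

```python
from typing import Dict, Sequence

# Hypothetical mapping of result names to severities (1 = highest, 5 = lowest).
SEVERITY: Dict[str, int] = {
    "error": 1,
    "fail": 2,
    "warning": 3,
    "pass": 4,
    "info": 5,
}


def overall_result(objective_results: Sequence[str]) -> str:
    """Return the most severe individual result (lowest severity number)."""
    if not objective_results:
        raise ValueError("a validation needs at least one objective result")
    return min(objective_results, key=lambda name: SEVERITY[name])


# One warning among passing objectives makes the overall result a warning.
print(overall_result(["pass", "warning", "info", "pass"]))
```

A delivery pipeline could then gate a release on this single value, for example by stopping when the overall result is more severe than a warning.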

In this section, you’ll find answers about:

Supported pricing models
Estimation of yearly DPS costs

Pricing models

The Site Reliability Guardian (SRG) executes a set of DQL queries to validate the performance, capacity, or security objectives of a new software release. The validation results are stored and retained as events data.

Dynatrace Platform Subscription

Classic licensing

If you're not familiar with DPS and its capabilities, we recommend reading the Dynatrace Platform Subscription (DPS) documentation first.

If you had early access to DPS licensing (prior to April 2023), please review the required DPS capabilities with the listed DPS units in the earlier version of DPS documentation.

Required DPS capabilities

Required unit

Reason

If you’re missing DPS rate card capabilities, please contact your Account team or Dynatrace support.

Cost estimation

This section explains how to estimate the costs based on the main use cases:

Manual validations

A manual validation results from either manually validating all of a guardian's objectives or validating a single guardian objective. Objectives can execute DQL queries, which typically use metrics or log queries.

Example of one objective and its DQL query

When are costs charged?

Every validation can use:

Query: Each validation executes DQL queries. These are used to retrieve the value used in the validation.
- If auto-adaptive thresholds are enabled for the objective, two queries are executed.
- If auto-adaptive thresholds are not enabled, one query is executed.
Events - Ingest & Process: The validation result will be stored as event data (with an average of 1.5 KiB).
Events - Retain: The validation result will be retained for 35 days by default.
AppEngine Functions: Every validation uses an AppEngine function.
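
To make the query count concrete, the following Python sketch counts the DQL queries one guardian validation would execute, given which objectives use auto-adaptive thresholds. The function name is hypothetical, and the sketch assumes each objective contributes exactly one indicator query.

```python
from typing import Iterable


def queries_per_validation(auto_adaptive_flags: Iterable[bool]) -> int:
    """Count queries for one validation: two per objective with
    auto-adaptive thresholds, one per objective without them."""
    return sum(2 if adaptive else 1 for adaptive in auto_adaptive_flags)


# Four objectives, two of them with auto-adaptive thresholds enabled.
print(queries_per_validation([True, False, False, True]))
```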

Calculation example: “Four golden signals” guardian

Yearly costs: less than $0.01

The calculation example assumes one validation per day of a guardian with four objectives. The screenshot below shows total query usage of 0 GiB scanned because the guardian only contains Metrics – Query objectives; queries that use the timeseries command are included in Ingest & Process.

Formula: (Query Costs + Events Ingest + Events Retain + AppEngine functions)

The following calculation examples show the usage per DPS capability. Apply your DPS rate card prices to calculate the costs.

Query

365 * 0 = 0 GiB-scanned (Metrics – Query is included in Ingest & Process)

Formula: Number of validations per year * Total value of scanned bytes

When creating or editing a guardian's objective, the query preview returns the current value of Scanned bytes. Select Information to view Scanned bytes.

Events Ingest & Process

1.5 * 1 * (4+2) * 365 / 1024 / 1024 = 0.00313 GiB

Formula: AVG ingest size in KiB * Number of validations per year * (number of objectives + 2 events per guardian validation ) / 1024 / 1024

Every validation of a guardian results in 2 events (a start event and an end event) with an average size of 1.5 KiB per event. Additionally, every objective is validated and the result is ingested as an event with an average size of 1.5 KiB.

Events Retain

0.003 * 30 = 0.09 GiB-days

Formula: Number of GiB of processed data ingested per year * retention period in days

AppEngine Functions

365 * 4 = 1460 invocations

Formula: Number of validations per year * number of objectives
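
The four formulas above can be combined into one small Python sketch that reproduces the usage figures of the calculation example. The function and field names are hypothetical. The default of 30 retention days follows the worked example; pass `retention_days=35` to model the default retention period mentioned earlier. The result is usage per DPS capability, not a price, so your rate card still has to be applied.

```python
from dataclasses import dataclass

KIB_PER_GIB = 1024 * 1024


@dataclass
class YearlyUsage:
    query_gib_scanned: float
    events_ingest_gib: float
    events_retain_gib_days: float
    function_invocations: int


def estimate_manual_usage(validations_per_day: int = 1,
                          objectives: int = 4,
                          scanned_gib_per_validation: float = 0.0,
                          avg_event_kib: float = 1.5,
                          retention_days: int = 30,
                          days_per_year: int = 365) -> YearlyUsage:
    """Estimate yearly usage per DPS capability for manual validations."""
    validations_per_year = validations_per_day * days_per_year
    # One event per objective plus the start and end events of the validation.
    events_per_validation = objectives + 2
    ingest_gib = (avg_event_kib * validations_per_year
                  * events_per_validation / KIB_PER_GIB)
    return YearlyUsage(
        query_gib_scanned=validations_per_year * scanned_gib_per_validation,
        events_ingest_gib=ingest_gib,
        events_retain_gib_days=ingest_gib * retention_days,
        function_invocations=validations_per_year * objectives,
    )


usage = estimate_manual_usage()
print(f"Query: {usage.query_gib_scanned} GiB-scanned")
print(f"Events Ingest & Process: {usage.events_ingest_gib:.5f} GiB")
print(f"Events Retain: {usage.events_retain_gib_days:.2f} GiB-days")
print(f"AppEngine Functions: {usage.function_invocations} invocations")
```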

Automated validations

A validation can be automated with Workflows, which incur additional costs beyond the manual validation costs explained above.

When are costs charged?

Every automated validation uses:

Automation Workflow
AppEngine Functions

Because each guardian requires a workflow for automation, you might end up with multiple workflows performing the same task of triggering guardian validation. To optimize your Automation Workflow costs, you can use simple workflows or a workflow with a workflow action that handles multiple guardians. For more information, see Automation Workflow pricing.

Validation analysis

To view and analyze validation results, use the Site Reliability Guardian to query stored events data.

When are costs charged?

Viewing the validation results of a guardian uses:

Events - Query

Calculation example “Four golden signals”

Events - Query

X * Y = GiB-scanned

Formula: (X) Scanned bytes in GiB * (Y) number of validation analyses per year

The following query helps you derive the scanned bytes value (X) used above.

You can find the scanned records information in the icon following the query execution.

To extract the required guardian ID, select > Drilldown in Notebooks on the Guardian's details page. The ID is part of the presented DQL query in the notebook.

fetch bizevents
| filter event.provider == "dynatrace.site.reliability.guardian" AND guardian.id == "vu9U3hXa3q0AAAABADFhcHA6ZHluYXRyYWNlLnNpdGUucmVsaWFiaWxpdHkuZ3VhcmRpYW46Z3VhcmRpYW5zAAZ0ZW5hbnQABnRlbmFudAAkMmYxZmFjZWEtYzc1Ni0zYTdkLWI2NzAtZDA4YjEyZGExZmRhvu9U3hXa3q0"
| filter in(event.type, {"guardian.validation.started", "guardian.validation.finished", "guardian.validation.objective"})
| limit 1000

Disclaimer

Cost estimates are based on the main use cases and might not account for every action that incurs cost.

Explore in Dynatrace Hub

Automated change impact analysis for your deployment and release processes
