With the latest Dynatrace, you gain significant improvements in storing and analyzing observability, security, and business data.
To leverage these improvements, you need to adapt the data retention, data access, and segmentation configuration. This article walks you through the best practices for configuring data retention and data access control on Grail.
This guide is intended for Dynatrace SaaS admins with environments on AWS or Azure.
With this guide, a Dynatrace admin can configure Dynatrace and Grail and get their teams started with new functionalities like AppEngine, AutomationEngine, or Notebooks.
Please note that these adaptations only apply to the usage of new and improved functionality referred to as "Latest Dynatrace". Existing functionality is not affected by these adaptations.
Before you can get started, you need to familiarize yourself with concepts that provide the new value we refer to as the latest Dynatrace.
The latest Dynatrace is a combination of intertwined new features that provide improvements in storing and analyzing observability, security, and business data.
One of the major changes is the introduction of Grail as a new unified data lakehouse. Grail gives you significant improvements in data ingestion, retention, and analytics. Later on, you'll learn how to configure data retention and data access control on Grail.
For more information, see Grail.
Grail not only stores all the observability and security data but also enables access to that data with Dynatrace Query Language (DQL). Access to all data through the query language is new to Dynatrace and goes far beyond data access through APIs and UIs we have offered up to now.
For more information, see Dynatrace Query Language.
Together with the introduction of Grail, Dynatrace has also released a set of major improvements to user interfaces, APIs, and services.
The introduction of the Notebooks, Workflows, and new Dashboards enables you to access data in Grail in various ways. This ranges from consuming views sourced from Dynatrace or peers to constructing highly specialized views. Moreover, you can implement your own logic using fully customized apps running on the AppEngine. Every conceivable operation on Dynatrace data becomes possible.
Before non-admin users can use the latest Dynatrace functionalities, however, a few administrative steps are needed to manage data access. These are explained below.
Your users already have the option to switch to the new user interface, though most of its functionality is derived from existing views that still rely on the current management concepts: management zones and permissions.
| Management zones | Latest Dynatrace |
| --- | --- |
| Indirectly control access to monitoring data (distributed traces and metrics) based on entity IDs. | Control access to monitoring data based on permission-relevant fields and the newly introduced security context. |
While management zones are a very versatile tool for setting up complex permissions, this flexibility also turned out to be a bottleneck in large enterprise environments that store large amounts of data. To successfully use management zones, the ingested data needs to be related to a monitored entity. Data access management with Grail is based on data, data type, and data organization. Security policies can be defined to allow access only to specific data even without any entity relation.
Out-of-the-box permission-relevant fields can be used in most cases for access control. Additionally, we've added the dt.security_context field, which is reserved for environment-specific details. All Dynatrace components ensure that these permission-relevant fields are available on each record (event, log, span) or metric. This allows for a consistent permission concept for the entire environment.
| Field name | IAM condition | Supported IAM tables |
| --- | --- | --- |
| event.kind | storage:event.kind | events, bizevents, system |
| event.type | storage:event.type | events, bizevents, system |
| event.provider | storage:event.provider | events, bizevents, system |
| k8s.namespace.name | storage:k8s.namespace.name | events, bizevents, logs, metrics, spans |
| k8s.cluster.name | storage:k8s.cluster.name | events, bizevents, logs, metrics, spans |
| host.name | storage:host.name | events, bizevents, logs, metrics, spans |
| dt.host_group.id | storage:dt.host_group.id | events, bizevents, logs, metrics, spans |
| metric.key | storage:metric.key | metrics |
| log.source | storage:log.source | logs |
| dt.security_context | storage:dt.security_context | events, bizevents, system, logs, metrics, spans, entities |
For more information, see Identity and access management.
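As an illustration of how these fields appear in policies (a sketch; the namespace name payments is a placeholder, not from your environment), a statement that restricts log access to a single Kubernetes namespace could look like this:
ALLOW storage:logs:read WHERE storage:k8s.namespace.name = "payments";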
The Grail data model consists of Grail buckets, tables, and views.
A fresh Dynatrace environment comes with a couple of default buckets. These buckets are prefixed with default_ and cannot be modified. In the process of customizing the environment, users can define custom buckets (as explained later).
The ingestion of data into Grail starts with OneAgent or one of our ingest APIs.
Data stored in Grail can be accessed via DQL. The easiest way to take a first look at your data is by using Notebooks.
Logs, Events, and Business Events use their own tables with default buckets and 35 days of retention. You can access them by using these commands in the DQL sections of a notebook:
fetch logs
fetch events
fetch bizevents
Traces are defined by spans. Spans are stored in the default spans bucket, which keeps them for 10 days. Look at the notebook code snippets section for an easy start, or explore using:
fetch spans
Spans are in preview release and not immediately available in every environment.
Topology (monitored entities) is not yet stored in Grail. Instead, the classic entity storage can be accessed via DQL through predefined views that are all prefixed with dt.entity.
fetch dt.entity.process_group
fetch dt.entity.host
Many metrics are already stored in Grail by default, and the list will grow every month. Metrics are retained for 15 months. You can access them easily via the notebook metric explorer or using the timeseries command as in the examples below:
timeseries avg(dt.host.cpu.load)
timeseries sum(dt.host.disk.util_time)
Grail also stores Dynatrace system data that you can explore. For example, you can see which tables and buckets are defined, and check system events such as audit events.
- dt.system.data_objects lists all tables and views.
- dt.system.buckets displays all defined buckets.
- dt.system.events contains environment-generated events such as audit events.
- dt.system.query_executions is a view on dt.system.events for quick access to query executions. It allows admins to see which queries are executed and by whom.

Data access is controlled through security policies instead of management zones. Generally, there are two layers of access control: bucket access and table access. For a user to fetch a record, they need access to both the bucket that stores the record and the table to which the bucket is assigned.
For example, a user who wants to access log records from the default_logs bucket requires the following two permissions:
ALLOW storage:logs:read
ALLOW storage:buckets:read WHERE storage:bucket-name = "default_logs"
Administrators automatically receive assigned policies that allow them to use the new platform and query all data via Notebooks. For other users, an administrator first must assign policies accordingly (for example, the AppEngine User policy to access Notebooks and the Storage All Grail Data Read policy to read data). Depending on the data in your environment, you could use the default policies as a first step.
Dynatrace comes with a set of built-in policies for data access. They are all prefixed with Storage.
For example, have a look at the Storage Default Monitoring Read policy, which provides the following two permissions:
ALLOW storage:buckets:read WHERE storage:bucket-name STARTSWITH "default_";
ALLOW storage:events:read, storage:logs:read, storage:metrics:read, storage:entities:read, storage:bizevents:read, storage:spans:read;
This grants the user access to all tables and to all default buckets (which are prefixed with default_). Once custom buckets are created, users must be granted explicit access to them.
Note: Built-in policies all provide unconditional table access. Once you start adopting record-level permissions, you'll need to replace the default policies with your own policies.
Data access can be further restricted based on the permission fields. This can be achieved by adding a WHERE clause to the table permissions. For example, giving a user access only to logs of a certain host group can be achieved by providing the following permission:
ALLOW storage:logs:read WHERE storage:dt.host_group.id = "myhostgroup"
Be aware that all permissions assigned to a user are evaluated. If a user has already been granted access to all logs via an unconditional ALLOW storage:logs:read; permission, that permission takes precedence and the record-level permission has no effect.
Learn more about record-level permissions:
The dt.security_context field is not filled automatically by Dynatrace components. It can be used freely to add a custom dimension to record-level permissions that is not covered by any of the other permission fields. If you need to restrict access based on fields that are not in the supported fields table, you have to set dt.security_context at ingest.
Implementing the data access control requires these two sets of actions:
First, you need to identify the details that you'll use for access control. Examples could be product IDs or names, teams, or other organizational details. Typically these details are already modeled via, for example, Dynatrace host groups or Kubernetes namespaces. The following flow chart shows how to figure out the permission concept for your data.
If the permission fields already cover your required details, you can move on to defining record-level permissions.
| Field name | IAM condition |
| --- | --- |
| event.kind | storage:event.kind |
| event.type | storage:event.type |
| event.provider | storage:event.provider |
| k8s.namespace.name | storage:k8s.namespace.name |
| k8s.cluster.name | storage:k8s.cluster.name |
| host.name | storage:host.name |
| dt.host_group.id | storage:dt.host_group.id |
| metric.key | storage:metric.key |
| log.source | storage:log.source |
| dt.security_context | storage:dt.security_context |
Very often, the details are already available in the record, but not in one of the permission fields.
For example, log files might contain the details in their content. In these cases, you can leverage the ingest pipeline to set a security context based on any details of the incoming record. The following illustration shows how to use a DQL matching expression on the log content to determine a security context based on it.
For more information, see Log security context.
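As a minimal sketch of such a rule (assuming a DQL matcher on the log content and a fixed security context value; the term team-banking is purely hypothetical):
matchesPhrase(content, "team-banking") // matcher evaluated at ingest
// -> sets dt.security_context = "team-banking" on every matching log record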
In cases where the details are not yet part of the record, you will need to modify the data source.
You have two options:
To unlock the full power of Grail, you can also use custom buckets to manage your monitoring data. Custom buckets serve the following purposes:
See the following sections for details on these two purposes.
Start by thinking about the retention times per data type for your use cases. Your application troubleshooting use cases might only require 14 days of logs, whereas audit logs might have to be retained for multiple years.
We'll look at logs as an example data type, but these tips also apply to all other data types. Only metrics are different, and we will take a look at metrics later.
Most of your log data might be needed for troubleshooting, and you don't want to store this mass data for a very long time. Try to store these logs as briefly as possible. This reduces costs and makes troubleshooting faster and cheaper. Typically, 14 days might be enough. So how do you do it best?
Create a custom bucket.
You can create a custom bucket via the Grail – Storage Management API or use the Grail Storage Management App.
To create custom Grail buckets, your group needs to be bound to the policy with the following statements:
ALLOW storage:bucket-definitions:read,storage:bucket-definitions:write,storage:bucket-definitions:truncate,storage:bucket-definitions:delete;
We called our first custom bucket log_sec_dev_troubleshoot_14d.
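A request to the Storage Management API could look roughly like the sketch below. The endpoint path, payload field names, and token scope are assumptions for illustration; check the Storage Management API reference for the exact format:
# Sketch only: verify the path, payload fields, and required scope in the API reference.
curl -X POST "https://<environment>.apps.dynatrace.com/platform/storage/management/v1/bucket-definitions" \
  -H "Authorization: Bearer <token with storage:bucket-definitions:write>" \
  -H "Content-Type: application/json" \
  -d '{
        "bucketName": "log_sec_dev_troubleshoot_14d",
        "displayName": "Troubleshooting logs (14 days)",
        "table": "logs",
        "retentionDays": 14
      }'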
Send the data.
After the Grail bucket is created, it's time to bring data in. You need to create or adjust a log bucket assignment rule to store troubleshooting logs in that new bucket. You have two possibilities:
Let's create a new default rule to store all logs for only 14 days, with one exception, in which we store them in the default 35-day bucket.
The matcher for logs will always be true and therefore matches any log line. This will be improved soon by just adding "true" to the matcher. This rule must be the last one in the list. This ensures that every log line that didn't match an earlier rule is stored in the 14-day bucket.
This second rule stores logs from a booking app with the term b-app or http in the log source in our default log bucket for 35 days:
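A matcher for this rule could, for example, look like the following sketch (the log.source terms b-app and http are taken from the example above; adjust them to your environment):
matchesPhrase(log.source, "b-app") or matchesPhrase(log.source, "http")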
Note: The old default rule will no longer match. It is always evaluated last and, because we now have a catch-all rule, it is never used.
Create a longer retention bucket for analytic data.
Before we're done, we need another bucket for log information that we need to keep much longer for analytic use cases.
These logs contain information on product usage combined with technical details. We want to store them in a separate bucket with a longer retention time. This log analytics bucket can be used for all log lines needed for analytics use cases, as long as the retention requirements are the same.
We need to repeat the steps and create a new log bucket and a bucket assignment rule. The 1-year log bucket is called log_sec_dev_analytics_1y.
The analytics log lines can be easily identified, as there is a company-wide agreement that all analytics log lines have log level INFO and contain the term app_analytics.
We add a bucket assignment rule with the following matcher:
matchesPhrase(content, "app_analytics") and matchesValue(loglevel,"INFO")
The log files stored in our three buckets (two custom buckets plus the default log bucket) should be accessible to all developers, so we need to set the right permissions for these buckets.
This is the needed policy:
ALLOW storage:buckets:read WHERE storage:bucket-name STARTSWITH "log_sec_dev";
ALLOW storage:buckets:read WHERE storage:bucket-name = "default_logs";
ALLOW storage:logs:read;
If you look at the policy statement, you can see the STARTSWITH condition. The naming pattern we used for our buckets starts with the data type log, followed by the security level, where sec_dev stands for developers. In that way, you can reduce the number of different policies needed.
Suppose you also have event, span, and even business event data, and you allow all developers to access this data. By removing the data type log_ from the bucket name, you can write a simple policy allowing access to all buckets that start with sec_dev.
How would you deal with log lines to which only specific people should have access, and which you need to keep for an even longer period? A typical example would be access logs.
Once again, we create a bucket (this time with 3-year retention), a log matcher for storing these specific logs in the bucket, and a policy for data access:
- Bucket: log_sec_high_3y (sec_high is the naming pattern for high security)
- Matcher: matchesPhrase(log.source, "audit")
- Policy: ALLOW storage:buckets:read WHERE storage:bucket-name STARTSWITH "log_sec_high";
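Putting the two access layers together, a policy for this restricted group could, for example, combine the bucket permission above with table access (a sketch, not a complete IAM setup):
ALLOW storage:buckets:read WHERE storage:bucket-name STARTSWITH "log_sec_high";
ALLOW storage:logs:read;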
| Management zones | Segments |
| --- | --- |
| Indirectly filter monitoring data based on entity IDs of precalculated management zones. | Segment data through reusable filter conditions to analyze monitoring data in real time. |
After organizing monitoring data in Grail buckets and setting up IAM policies that give users access to the data they need, the final step is to understand how monitoring data can be filtered in the latest Dynatrace.
Segments mirror management zones in some ways, in that they allow you to abstract complex filter conditions and perform lookups to the monitored entity model. Segments, however, are no longer precalculated attributes, but query-time filter conditions instead. This solves the performance bottleneck of management zones, making it possible to both ingest and analyze several orders of magnitude higher volumes of monitoring data in the latest Dynatrace.
For more information, see Segments.
As described in the Grail Data Model reference, buckets are assigned to tables. Because all four of our log buckets (three custom buckets plus default_logs) are assigned to the logs table, you can access all data in this table via fetch logs. There is no need for a user to select a bucket (or, as in other products, an index). Because of our policy definitions, developers will of course only get data from the buckets log_sec_dev_analytics_1y, log_sec_dev_troubleshoot_14d, and default_logs, and not from log_sec_high_3y.
Yes, it is always good practice to narrow down the data you want to analyze or search for specific terms.
Adding a bucket filter is an effective way to speed up a query and reduce the scanned bytes. Depending on your data volume this might not always be perceptible.
To filter on our troubleshooting logs, we can add this filter criteria:
fetch logs
| filter dt.system.bucket == "log_sec_dev_troubleshoot_14d"
Grail is built for storing and querying data with hyper-scale data processing. Using bucket filters in your queries speeds them up, but keep in mind that the number of buckets per environment is limited to 80.
Data can't be moved between buckets. Keep this in mind if:
Example:
To ensure that data ingestion from unit A does not affect the query costs or performance of unit B, set up dedicated buckets for each business unit.
You should not use Grail buckets for data separation on dynamic dimensions, such as creating buckets for host groups or teams. You can filter the data on other dimensions.
If you need to restrict access within buckets, you can use record-level permissions.
Record permissions are the right way to implement fine-grained access control and restrict data access further. You can either use the defined dedicated fields, like event.kind or k8s.namespace.name, for creating policies, or you can add a field/dimension dt.security_context for access control on any values such as team names.
If you look back at our examples, we defined a policy that allows every developer to access logs stored in the default bucket and buckets starting with the naming pattern log_sec_dev.
Suppose a new external team will work for the next 18 months on a project and you only want to expose logs that are tagged with their team name: extDev. To do so, make sure that dt.security_context is set on their log lines, and create an additional statement:
ALLOW storage:buckets:read WHERE storage:bucket-name STARTSWITH "log_sec_dev";
ALLOW storage:buckets:read WHERE storage:bucket-name = "default_logs";
ALLOW storage:logs:read;
ALLOW storage:logs:read WHERE storage:dt.security_context = "extDev";
You can use OneAgentCtl to set dt.security_context as a host tag on OneAgent-monitored hosts:
./oneagentctl --set-host-tag=dt.security_context=my_security_context
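Once logs with this security context are ingested, a quick way to verify the field (a sketch using the placeholder value from the command above) is a notebook query such as:
fetch logs
| filter dt.security_context == "my_security_context"
| limit 10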
Custom buckets are currently not available for metrics, but they are planned. Today, metrics in Grail are always stored in the default bucket with a retention of 15 months. In the future, you will be able to extend the 15-month retention and use custom buckets for metrics.
Record-level permissions can also be used for metrics, either on the supported fields or by using dt.security_context.
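For example, a sketch of a metrics statement restricted by security context (reusing the extDev value from the earlier example) could look like this:
ALLOW storage:metrics:read WHERE storage:dt.security_context = "extDev";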
After your environment is set up and ready for your teams, start exploring the new user interface and apps, and learn how to upgrade some of the current capabilities to the new platform.
Check the following additional resources available across various Dynatrace self-service channels.
See some of the on-demand courses related to the latest Dynatrace (login required).
For more videos, see the Learning Dynatrace Query Language playlist.
For more videos, see the Dynatrace Observability Clinics playlist.
Go to the Dynatrace Playground environment and get a hands-on experience using the latest Dynatrace.