This page provides best practices for Log Management and Log Analytics powered by Grail. It also shows a use-case example, where some of these best practices are applied in a real-life scenario.
Once you've read this page, you'll have the knowledge to optimize how you retain and scan logs, and therefore reduce your costs, while still getting the results you expect.
Here are some things to think about before you start, so that you can make an effective plan to optimize logs in Dynatrace.
Different sources use different ways to send log data. By estimating your daily ingest volume, you can make better decisions about data partitioning and segmentation.
For more about collecting and ingesting data, see Log ingestion.
Classify your log data and think about compliance and privacy requirements.
Classification: Which log types will you use? Common log types include infrastructure, application, access, and audit logs.
Compliance: How long do you need to store logs?
Retention time is defined at the bucket level.
Privacy: Are there specific requirements that demand data masking?
You can redact data on ingest with OneAgent, or during ingest processing with OpenPipeline.
Grail organizes data in buckets. Buckets behave like folders in a file system and are designed for records that should be handled together, for example, records that share the same retention period or access requirements.
For more about buckets, see Bucket assignment.
Grail is capable of scanning petabytes of log data with high performance. However, the more you scan, the higher your log query consumption will be. This is true even if the query doesn't return any data, because each scanned log record contributes to the total scanned bytes volume.
Here are some best practices to help reduce the size of retained and scanned data, while still getting the expected results:
By using dedicated buckets to separate your data, you can reduce the amount of data that you need to scan to get the relevant results.
By default, a single query can scan up to 500 GB of data. But how many log records does this represent?
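The answer depends on the average size of your log records, which varies by source. As a rough, illustrative estimate only: at an average record size of 1 KB, a 500 GB scan covers roughly 500,000,000,000 ÷ 1,000 ≈ 500 million log records; at an average of 500 bytes per record, it's closer to one billion.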
Creating buckets can help to separate data, but too many buckets can make it cumbersome to access log data.
By default, all log records are sent to the default_logs bucket.
Once you start making other buckets, you can direct certain log records to those buckets.
Then, the only log records that end up in the default_logs bucket are those that you haven't specifically routed to another bucket. This usually, but not always, means that the default_logs bucket has log records that you don't need to preserve. At this point, you can treat the default_logs bucket as your playground.
If you intentionally use the default bucket for onboarding new data, a good practice is to always keep the bucket empty. Then, if you see new logs in that bucket, you know that you're ingesting logs that aren't assigned to a specific bucket.
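For example, a quick way to check whether anything new is landing in the default bucket is a query like the following (a simple sketch; the 24-hour timeframe is just an illustration):

fetch logs, from: now() - 24h
// count records that ended up in the default bucket
| filter dt.system.bucket == "default_logs"
| summarize count()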
For most use cases, try to keep the volume of daily retained data in a single bucket to around 2–3 TB. This is especially true for frequently queried buckets. (However, this is usually not possible for buckets used to address compliance use cases, where you'll likely retain petabytes' worth of log records in a single bucket.)
This will help to ensure the best user experience and performance, especially if users don't follow DQL best practices (such as applying specific filters for time spans or buckets, or increasing the query volume limit with the scanLimitGBytes parameter).
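For example, a frequently refreshed dashboard query can combine a narrow timeframe, a bucket filter, and an explicit scan limit. The query below is only a sketch; the bucket name, timeframe, and limit are illustrative:

fetch logs, from: now() - 2h, scanLimitGBytes: 100
// restrict the scan to a single, frequently queried bucket
| filter dt.system.bucket == "app_logs"
| summarize errors = countIf(loglevel == "ERROR")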
You can set different retention periods for each bucket. This allows you to optimize buckets for individual retention periods, compliance, and cost.
For example:
Log records can be stored from one day up to 10 years. The retention period is defined when you create a bucket and can be reconfigured at any time. For more information about retention periods, see Data retention periods: Log Management and Analytics.
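To review which buckets already exist and how they are configured, you can query Grail's bucket metadata. The query below is a sketch; it assumes that the dt.system.buckets metadata view and its table field are available in your environment:

fetch dt.system.buckets
// keep only buckets that store log records
| filter table == "logs"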
You can filter logs so that non-relevant logs are either sent to a different bucket or deleted outright. To filter logs on ingest, use either OneAgent (see Log ingest rules) or OpenPipeline (see OpenPipeline processing examples).
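Such rules typically rely on a DQL matching condition. The condition below is purely illustrative (the attribute names are assumptions, not taken from this page): it matches debug-level records from development namespaces, which you might route to a short-retention bucket or drop entirely.

matchesValue(k8s.namespace.name, "dev-*") and loglevel == "DEBUG"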
Bucket filters act like permissions at the query level. By adding a bucket filter to a query, you can restrict the DQL query to scan a single bucket, regardless of which buckets the user can access.
This reduces the amount of scanned data and the associated costs, especially with queries used in auto-refreshing dashboards.
Additionally, you can use segments to provide easy filtering by bucket. See Segment logs by bucket.
For more information about bucket filters, see Query and filter logs.
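For example, a query scoped to a single bucket might look like the following sketch (the bucket name matches the use case later on this page; the loglevel filter is illustrative):

fetch logs
// the bucket filter limits the scan to one bucket
| filter dt.system.bucket == "access_logs"
| filter loglevel == "ERROR"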
By default, a DQL query will scan all buckets that the user has access to. To limit the number and kind of buckets that a user has access to, you can use IAM policies to set access permissions and policy boundaries.
This way, you don't have to define bucket filters manually with every query.
Policy boundaries in Dynatrace are a modular and reusable way to define access conditions for resource- and record-level permissions. They act as an additional layer of control, refining the scope of permissions granted by IAM policies without the need to create additional specific policies.
By externalizing access conditions, policy boundaries simplify management, ensure consistent enforcement, and improve scalability across large environments.
For more information about access permissions, see IAM policies and boundaries.
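For example, a policy that lets a group read log data from a single bucket only could look like the following sketch. It assumes the storage:logs:read and storage:buckets:read permissions and reuses the storage:bucket-name condition shown in the use case below; adapt it to your environment:

ALLOW storage:logs:read, storage:buckets:read
  WHERE storage:bucket-name = "access_logs";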
You can create events and metrics from log records.
To convert log queries to log-based metrics, see Optimize performance and costs of dashboards running log queries. After you've extracted metrics, you can delete the log records. This is especially useful for aggregated information where access to the raw record isn't important.
You can use log-based events and metrics for alerting, instead of log queries. For more information, see Set up Davis alerts based on events and Set up Davis alerts based on metrics.
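For example, rather than re-running a raw log query on every dashboard refresh, you could aggregate errors into a time series like the one below (the loglevel filter is illustrative) and extract a metric from it for charting and alerting:

fetch logs
| filter loglevel == "ERROR"
// aggregate raw records into a time series suitable for metric extraction
| makeTimeseries error_count = count()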
Some apps, such as Kubernetes, let you see logs in context. This lets you scan only the logs that are relevant to a specific use case. For more information, see Use logs in context to troubleshoot issues.
Since you use DQL to access log records, follow DQL best practices to create optimized queries. For more information, see DQL Best practices.
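For example, an optimized troubleshooting query narrows the timeframe, targets one bucket, and limits the result set. All values below are illustrative:

fetch logs, from: now() - 30m
| filter dt.system.bucket == "infra_logs"
| filter matchesValue(content, "*timeout*")
| sort timestamp desc
| limit 100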
The best way to learn about usage and adoption is with Dynatrace ready-made dashboards.
You can find these in Dashboards > Ready-made.
Use the dashboards to learn more about consumption, ingested and retained volumes, query patterns, bucket utilization, and more.
This section presents an example situation that demonstrates how to apply some of the best practices.
Let's assume that your organization has already prepared a plan for data segmentation, but hasn't yet configured anything in Dynatrace.
Your organization has several main user groups, which appear in the table below.
Dynatrace ingests and retains the following types of log data, described in the table below.
| Log type | Source | Daily ingest size | Bucket name | Retention | Relevant user group |
|---|---|---|---|---|---|
| Infrastructure logs | Kubernetes system logs monitored with OneAgent (journald) | 2 TB | infra_logs | 90 days | Platform |
| Application logs | Kubernetes monitored with OneAgent | 2 TB | app_logs | 60 days | Developers and Platform |
| Application logs | Lambda monitored with Lambda Layer | 1 TB | app_logs | 60 days | Developers and CloudOps |
| Access logs | CloudFront logs sent via Kinesis | 3 TB | access_logs | 365 days | CloudOps |
| Audit logs | AWS Resource Audit Logs | 2 GB | audit_logs | 3650 days (10 years) | Security |
To start, set up log ingestion by following the steps described in Log ingestion.
By default, all log data is ingested into the default_logs bucket.
Ideally, after you have implemented all the best practices, only admins should have access to this bucket.
Bucket permissions should follow the principle of least privilege, in which individual users have access to just the buckets that they're required to query or visualize.
There are two ways that you can verify data is ingested and retained.
The Log ingest overview ready-made dashboard, available in Dashboards, lets you check ingested log volumes.
Use Logs or Notebooks to fetch logs from any bucket and validate that the ingested data arrives correctly and looks as expected.
Run the DQL query shown below.
fetch logs
| filter dt.system.bucket == "default_logs"
If you don't see any log data, see Troubleshooting Log Management and Analytics for troubleshooting tips.
This section shows how to apply some of the best practices to this example use case.
This step creates a dedicated bucket for certain data. For this example:
Name the new bucket access_logs.
Set the retention period to 365 days.
Assign the bucket to the logs table.
OpenPipeline handles log ingestion from all sources and allows processing, transformation, and bucket assignment before logs are stored in Grail.
For this example, let's use OpenPipeline to filter logs on ingest. We'll configure a pipeline that processes CloudFront logs and stores them in the access_logs bucket.
Go to OpenPipeline and select Logs.
In the Pipelines tab, select + Pipeline to create a new pipeline.
Name the pipeline AWS CloudFront logs.
Add technology bundle processors.
Select Save to save the configuration.
Get the pipeline ID, which you'll need to filter logs later.
In the Pipelines tab, locate the AWS CloudFront logs pipeline. Depending on how many pipelines are configured, you may need to select > to get to the right page.
Note the pipeline ID, for example pipeline_AWS_cloudfront_logs_5498.
Next, add a routing rule so that CloudFront logs are processed by the new pipeline.
Set the matching condition to matchesValue(aws.log_stream, "CloudFront_*").
Select the AWS CloudFront logs pipeline that you just created as the target.
To verify the configuration, go to Logs or Notebooks and run the following query. This checks if the pipeline is processing the most recently ingested logs.
fetch logs
| filter dt.openpipeline == "pipeline_AWS_cloudfront_logs_5498"
Next, assign the processed logs to the dedicated bucket. In the pipeline's storage stage, add a bucket assignment rule:
Name the rule AWS access logs.
Set the matching condition to true.
Select the access_logs bucket you already created.
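Once the bucket assignment is in place, you can optionally confirm that new CloudFront records are stored in the dedicated bucket with a quick query like this (not part of the official steps; the 1-hour timeframe is illustrative):

fetch logs, from: now() - 1h
| filter dt.system.bucket == "access_logs"
| summarize count()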
This step grants users access to only specific buckets.
Open Account Management > Identity & access management > Policy management and then select the Boundaries tab.
Select + Boundary to create a new boundary.
Name the boundary access_logs read.
Set the boundary query to the following: storage:bucket-name = "access_logs";
Select Save.
Open Account Management > Identity & access management > Group management.
Select + Group to create a new group.
For this example, name the new group CloudOps.
Select Create to create the group. The View group page appears.
Select + Permission to add a new permission.
Select the access_logs read boundary that you previously created.
Select Save.
Next to the CloudOps group, select > Edit.
Add the relevant users to the CloudOps group. You may need to search for the group using the Filter groups text field.