Quickly detecting and solving problems in your environment is crucial to maintaining stable revenue and keeping your customers' trust. However, manually analyzing older applications, or third-party applications whose source code you can't access, can be time-consuming.
Resolving problems with Dynatrace drastically reduces your mean time to identify (MTTI) critical issues. It also helps you fix them before they affect the customer experience, which minimizes the business impact of outages. Because a single observability platform holds all signals, you also reduce the risk of human error from manually correlating problem details.
Dynatrace saves you from searching through every existing record by showing only the log lines directly related to the detected problem. You can also quickly inspect error details such as the message, status, and line of code (LOC) where the error occurred.
This tutorial guides you through extracting relevant information from logs with OpenPipeline and accessing those logs through the Problems app. It also shows how to use DQL queries to find information relevant to your problem, and how to get a deeper, more contextual view of the issue with traces.
By the end, you'll know how to
The example used in this tutorial is taken from the Dynatrace Observability Lab: Problem Detection with Logs. For the full experience, you can follow this hands-on demo, which explains how the problem is created and how the test environment is set up.
This tutorial assumes that you're already monitoring your environment with Dynatrace.
Create a new pipeline for data extraction
Add a dynamic route for the pipeline
Access problems through the Problems app
View Logs through the problem records
View details in Distributed Traces
OpenPipeline is the Dynatrace data handling solution for data processing and ingestion. You can configure OpenPipeline to extract specific information relevant to your case and convert it into an event that can be alerted on. For more information on OpenPipeline processing capabilities, see Processing.
To create a new pipeline
In Dynatrace, press Ctrl+K to find and select the OpenPipeline app.
Go to Logs, select the Pipelines tab, and select Pipeline.
Change the pipeline name to Log Errors.
Save your changes.
On the Data extraction tab, select Processor and choose Davis event. This creates a new processor.
Fill in the required fields. The result should be similar to the example that follows this procedure.
Matching condition: true
Event name: [{priority}][{deployment.release_stage}][{deployment.release_product}][{dt.owner}] {alertmessage}
Event description: {supportInfo} - Log line: {content}
Set the following event properties:
Event property              Value
event.type                  ERROR_EVENT
dt.owner                    {dt.owner}
dt.cost.costcenter          {dt.cost.costcenter}
dt.cost.product             {dt.cost.product}
deployment.release_product  {deployment.release_product}
deployment.release_stage    {deployment.release_stage}
Select Save to save your pipeline.
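The event name and description templates above use placeholders in curly braces, such as {priority} or {content}. As an illustration only, the following Python sketch assumes that each placeholder is replaced by the value of the log attribute with the same name. It shows how the Log Errors templates would expand for a hypothetical log record. The sample values, and the choice to leave unknown placeholders untouched, are assumptions of this sketch and not documented OpenPipeline behavior.

```python
import re

# Matches placeholders such as {priority} or {deployment.release_stage}.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

EVENT_NAME = (
    "[{priority}][{deployment.release_stage}]"
    "[{deployment.release_product}][{dt.owner}] {alertmessage}"
)
EVENT_DESCRIPTION = "{supportInfo} - Log line: {content}"


def render(template: str, record: dict) -> str:
    """Replace each {field} with record[field].

    Assumption for this sketch: placeholders whose field is missing
    from the record are left unchanged.
    """
    def substitute(match: re.Match) -> str:
        field = match.group(1)
        return str(record[field]) if field in record else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical log record carrying the attributes the templates reference.
record = {
    "priority": "1",
    "deployment.release_stage": "production",
    "deployment.release_product": "checkout",
    "dt.owner": "team-payments",
    "alertmessage": "Payment failed",
    "supportInfo": "See runbook",
    "content": "ERROR payment declined",
}

print(render(EVENT_NAME, record))
print(render(EVENT_DESCRIPTION, record))
```

Dotted attribute names such as deployment.release_stage are the reason the sketch uses a regular expression rather than str.format. The str.format method would treat the dot as attribute access.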
Ingested data must be routed to a pipeline before it's processed. A route makes sure your data reaches the right pipeline, which matters especially when you have multiple pipelines. For more information, see Routing.
To add a new dynamic route
Matching condition: isNotNull(alertmessage) and isNotNull(priority) and priority == "1"
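The matching condition routes only records that have both an alertmessage and a priority, and whose priority is "1". The following Python sketch mimics that logic on hypothetical log records so you can reason about which records reach the pipeline. It is an illustration under the stated assumptions, not how OpenPipeline actually evaluates matchers.

```python
from typing import Any, Mapping


def matches_route(record: Mapping[str, Any]) -> bool:
    """Python mirror of the matching condition:

        isNotNull(alertmessage) and isNotNull(priority) and priority == "1"

    Assumption for this sketch: a missing attribute and a None value both
    count as null.
    """
    priority = record.get("priority")
    return (
        record.get("alertmessage") is not None
        and priority is not None
        and priority == "1"
    )


# Hypothetical log records, reduced to the attributes the condition checks.
logs = [
    {"alertmessage": "Payment failed", "priority": "1"},  # matches
    {"alertmessage": "Slow response", "priority": "3"},   # wrong priority
    {"priority": "1"},                                    # no alertmessage
]

routed = [record for record in logs if matches_route(record)]
print(f"{len(routed)} of {len(logs)} records routed to the pipeline")
```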
Once the problem is detected and recorded in logs, you can check its status in the Problems app.
The Problems app is designed to help operations and site reliability teams reduce the mean time to repair (MTTR) by presenting every aspect of an incident. For more information, see Problems app.
To access the Problems app
In Dynatrace, press Ctrl+K to find and select the Problems app.
Active
A problem record shows you the number of events, SLOs, affected users, and affected entities. By default, the record shows the affected deployment and a chart illustrating the problem. You can switch between Chart and Properties, and display the Deployment, Events, or Logs connected to the problem.
Select Show x errors, where x is the number of errors recorded for your problem.
The error log lines include attributes such as:
host.name, corresponding to the container name
loglevel (for example, ERROR)
span_id and trace_id
dt.owner, the owner of the component
dt.cost.product and dt.cost.costcenter, corresponding to the cost information
trace_id.
Logs related to the error, such as info events, may contain additional information that helps you locate the root cause. Examples of additional information include:
statuscode, which contains the error status code, for example, FailedPrecondition.
detail, which contains the error message and the line of code where the error occurred.
Traces give you a deeper view and additional context for the information available in logs. To access traces through logs, you need to connect log data to traces via OpenTelemetry or OneAgent. To learn more about enriching logs with traces, see Understand and fix multiple problems via logs and traces.
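The link between an error log line and its surrounding context is the shared trace_id. The following Python sketch is a rough illustration of that correlation. It assumes log records are available as dictionaries, and the timestamp values and record contents are hypothetical. The sketch gathers every record that shares the error's trace_id, so that info events carrying statuscode and detail appear next to the error. In Dynatrace itself, this correlation is surfaced for you in the log and trace views.

```python
from typing import Any, Dict, List

LogRecord = Dict[str, Any]

# Hypothetical log records; only the attributes used below are included.
LOGS: List[LogRecord] = [
    {"timestamp": 3, "loglevel": "ERROR", "trace_id": "abc", "content": "Payment failed"},
    {"timestamp": 1, "loglevel": "INFO", "trace_id": "abc",
     "statuscode": "FailedPrecondition", "detail": "card expired (payment.py:42)"},
    {"timestamp": 2, "loglevel": "INFO", "trace_id": "xyz", "content": "Unrelated request"},
]


def related_logs(error: LogRecord, logs: List[LogRecord]) -> List[LogRecord]:
    """Return every record that shares the error's trace_id, oldest first."""
    trace_id = error.get("trace_id")
    if trace_id is None:
        return []  # without a trace_id there is nothing to correlate on
    same_trace = [r for r in logs if r.get("trace_id") == trace_id]
    return sorted(same_trace, key=lambda r: r["timestamp"])


error = next(r for r in LOGS if r["loglevel"] == "ERROR")
for record in related_logs(error, LOGS):
    # Show the extra context fields when a record carries them.
    print(record["loglevel"], record.get("statuscode", "-"), record.get("detail", "-"))
```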
To access traces through logs
Open the trace for trace_id while you're in the expanded log view.
The Distributed Traces app displays a chronological list of called functions, which you can use to analyze the problem step by step. The failing point is marked by a red line.
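The following Python sketch gives a rough idea of how a chronological span list points you to the failing step. It uses a hypothetical Span type and an invented trace. It also applies a heuristic of its own: because errors usually propagate up to parent spans, it treats failed spans with no failed child as the likely origin. This heuristic is an assumption of the sketch, not the rule the Distributed Traces app uses to draw its red line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str                 # called function or operation
    start_ms: int             # start offset within the trace
    failed: bool              # True if the span recorded an error


def failing_points(spans: List[Span]) -> List[Span]:
    """Return failed spans that have no failed child, in chronological order."""
    failed_parent_ids = {s.parent_id for s in spans if s.failed and s.parent_id is not None}
    candidates = [s for s in spans if s.failed and s.span_id not in failed_parent_ids]
    return sorted(candidates, key=lambda s: s.start_ms)


# Hypothetical trace: span names and timings are invented for illustration.
trace = [
    Span("1", None, "POST /checkout", 0, True),
    Span("2", "1", "get_cart", 5, False),
    Span("3", "1", "charge_card", 20, True),
    Span("4", "3", "validate_card", 22, True),
]

origin = {s.span_id for s in failing_points(trace)}
for span in sorted(trace, key=lambda s: s.start_ms):
    marker = "<-- failing point" if span.span_id in origin else ""
    print(f"{span.start_ms:>4} ms  {span.name} {marker}".rstrip())
```

Walking the list top to bottom mirrors the step-by-step analysis described above. The parent spans POST /checkout and charge_card also fail, but only because validate_card failed beneath them.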
If you followed all the steps, you have:
If your developers have provided a runbook for resolving errors, follow it as your next step.
Otherwise, your next step is to contact the team responsible for maintaining the service or feature that caused the error. If you're part of that team, you can start debugging the issue.