Modern distributed systems generate complex failure patterns that require fast, reliable analysis.
Failure Analysis in Dynatrace addresses this need by offering a focused experience for investigating service failures, helping you uncover patterns and contributing factors across your environments.
In Dynatrace, a failure refers to any request or transaction that does not complete successfully. Failures may result from HTTP errors, unhandled exceptions, or custom-defined conditions that indicate unexpected behavior.
Failure Analysis helps you detect, investigate, and resolve service failures faster and more effectively. It modernizes the Dynatrace experience by introducing a user-focused interface and new capabilities that support both reactive and proactive troubleshooting.
You can access Failure Analysis directly from the problems-specific drill-down options when a failure rate increase is detected, or explore it manually from the Failures tab in Services.
This feature is designed for SREs, developers, and DevOps teams who need fast, actionable insights into service health and failure patterns.
Failure Analysis provides:
Advanced filtering
Use Dynatrace filters to narrow down failures by attributes such as service, endpoint, failure type, and more.
Timeframe filters let you isolate failures within specific periods or compare them across time ranges.
Comparison mode
Visually compare failure rates across different timeframes to identify trends, assess the impact of changes, or validate fixes.
Interactive annotations (labels displayed above the graph) provide additional context for specific timeframes when you hover over them, highlighting key failure events.
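The comparison described above can be approximated outside the UI. The following is a minimal sketch, not Dynatrace's implementation: the `Request` record, `failure_rate`, and `compare_to_last_week` are hypothetical names, and the seven-day offset simply mirrors a "same timeframe last week" style comparison.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    """Hypothetical request record; not a Dynatrace data type."""
    timestamp: datetime
    failed: bool


def failure_rate(requests, start, end):
    """Share of failed requests with start <= timestamp < end (0.0 if none)."""
    in_window = [r for r in requests if start <= r.timestamp < end]
    if not in_window:
        return 0.0
    return sum(r.failed for r in in_window) / len(in_window)


def compare_to_last_week(requests, start, end):
    """Return (current rate, rate one week earlier, difference)."""
    week = timedelta(days=7)
    current = failure_rate(requests, start, end)
    baseline = failure_rate(requests, start - week, end - week)
    return current, baseline, current - baseline
```

A positive difference means the selected timeframe fails more often than the same timeframe a week earlier, which is the kind of shift comparison mode makes visible.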
Contextual log integration
View logs directly associated with failed service calls (via trace/span IDs).
This provides essential context for understanding the root cause.
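To illustrate what correlation via trace and span IDs means, here is a minimal sketch that groups log records under the failed span that emitted them. The dictionary keys (`trace_id`, `span_id`, `failed`, `message`) and the function name are assumptions chosen for illustration and do not reflect Dynatrace's internal data model.

```python
from collections import defaultdict


def logs_for_failed_spans(spans, logs):
    """Group log messages by the (trace_id, span_id) of failed spans."""
    # Collect the identifiers of every span marked as failed.
    failed_keys = {(s["trace_id"], s["span_id"]) for s in spans if s["failed"]}

    grouped = defaultdict(list)
    for record in logs:
        # Log records without trace context yield (None, None) and are skipped.
        key = (record.get("trace_id"), record.get("span_id"))
        if key in failed_keys:
            grouped[key].append(record["message"])
    return dict(grouped)
```

Logs that carry no trace context cannot be matched this way, which is why this kind of correlation depends on trace and span IDs being present in the log records.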
Outgoing call analysis
Investigate failures in downstream dependencies such as HTTP requests, gRPC calls, or third-party APIs.
Dynatrace detects failed states based on HTTP/gRPC response codes, span status, and the presence of exceptions within traces.
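The signals listed above can be combined into a simple classifier. This is a sketch under stated assumptions, not Dynatrace's detection logic: the `Span` fields are hypothetical, treating HTTP status 400 and above as a failure is an assumed threshold, gRPC code 0 is taken as OK, and `"ERROR"` follows the OpenTelemetry span status naming.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical outgoing-call span; field names are illustrative."""
    http_status: Optional[int] = None
    grpc_status: Optional[int] = None
    status: str = "UNSET"          # OpenTelemetry-style: UNSET, OK, ERROR
    exceptions: list = field(default_factory=list)


def failure_reasons(span):
    """Return the list of signals that mark this span as failed."""
    reasons = []
    # Assumed threshold: any 4xx or 5xx response counts as a failure.
    if span.http_status is not None and span.http_status >= 400:
        reasons.append(f"http {span.http_status}")
    # gRPC status code 0 means OK; any other code is treated as a failure.
    if span.grpc_status is not None and span.grpc_status != 0:
        reasons.append(f"grpc {span.grpc_status}")
    if span.status == "ERROR":
        reasons.append("span status ERROR")
    if span.exceptions:
        reasons.append("exception")
    return reasons


def is_failed(span):
    """A span is failed if at least one failure signal is present."""
    return bool(failure_reasons(span))
```

Returning the individual reasons rather than a single boolean keeps the contributing signals visible, which is useful when grouping failures by type.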
Database failure insights
Analyze failed database interactions, including query-level visibility for supported databases.
This helps identify backend issues contributing to service instability.
Exploratory and contextual access
Access the Failure Analysis page with or without predefined context. When accessed via the problems-specific drill-down options, filters are pre-applied.
You can also explore failures manually by adjusting the filters yourself.
The new Failure Analysis feature is available as a dedicated Failures tab in Dynatrace Services. It is designed to support both contextual and exploratory workflows.
Contextual Access via problems-specific drill-down options
When a failure rate increase is identified as the root cause of a problem, you are directed to the Failure Analysis page with pre-applied filters based on the affected service and endpoint.
Exploratory Access
You can also navigate to the Failures tab manually and adjust filters to explore failures across services, endpoints, and timeframes.
To get started troubleshooting a service failure, compare the current failure rate with a reference period (for example, Same timeframe last week) to detect anomalies.
While the new Failure Analysis experience introduces significant improvements in usability and functionality, you may occasionally experience slower load times in environments with high trace volumes or complex service architectures. Dynatrace is actively monitoring performance and continuously optimizing the experience.
To deepen your understanding of failure detection and troubleshooting in Dynatrace, explore the following resources:
We’d love to hear your feedback and questions about the new Failure Analysis experience.
Visit the Feedback Channel in the Dynatrace Community