Drill-down to service failure causes
Analyzing individual requests is a useful way to better understand detected errors. In this article, you will learn how to use distributed tracing to determine the error underlying an increasing service failure rate.
Scenario
In the image below, you can see that requests to Redis started to fail around the 10:45 mark on the timeline.
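To make the idea of a failure rate rising at a point on the timeline concrete, here is a minimal Node.js sketch that buckets requests into fixed intervals and reports the first interval whose failure rate exceeds a threshold. The record shape (`minute`, `failed`), the function names, and the sample values are all assumptions made up for illustration; they are not read from the chart and do not reflect how the product computes its metric.

```javascript
// Hypothetical request records: minutes past 10:00 and whether the request failed.
// The values are invented for illustration only.
const requests = [
  { minute: 30, failed: false },
  { minute: 35, failed: false },
  { minute: 40, failed: false },
  { minute: 45, failed: true },
  { minute: 46, failed: true },
  { minute: 47, failed: false },
  { minute: 50, failed: true },
  { minute: 52, failed: true },
];

// Group records into fixed-size intervals and compute the share of failed requests in each.
function failureRateByInterval(records, intervalMinutes) {
  const buckets = new Map();
  for (const { minute, failed } of records) {
    const start = Math.floor(minute / intervalMinutes) * intervalMinutes;
    const bucket = buckets.get(start) ?? { total: 0, failed: 0 };
    bucket.total += 1;
    if (failed) bucket.failed += 1;
    buckets.set(start, bucket);
  }
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([start, { total, failed }]) => ({ start, total, failed, rate: failed / total }));
}

// Return the start of the first interval whose failure rate exceeds the threshold, or null.
function firstFailingInterval(rates, threshold) {
  const hit = rates.find((r) => r.rate > threshold);
  return hit ? hit.start : null;
}
```

With the sample data and 15-minute intervals, the interval starting at minute 45 is the first one above a 50% threshold, which mirrors the kind of jump you would look for on the timeline.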
Steps
- To find the Failure rate tab, go to the service’s details page and select a View button (such as View requests, View dynamic requests, or View resource requests).
- Select Analyze backtrace to see where these requests came from.
  The requests originate from the weather-express service, and nearly all failed requests to Redis have the same exception: an AbortError caused by a closed connection.
- To drill down to the affected Node.js traces, select More (…) > Distributed traces.
  By looking at the Node.js trace and its code-level execution tree below, you can see that a Redis request leads to an error. You can see where this error occurs in the flow of the Node.js code.
- Select the Errors tab to analyze the exception.
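The reasoning behind the backtrace step, that the failed requests share one caller and one exception, can be sketched as a simple grouping over failed-request records. This is only an illustration in Node.js: the record shape, the helper names `countBy` and `dominant`, the `OtherError` placeholder, and the counts are hypothetical, and the sketch does not represent any product API or exported data format.

```javascript
// Hypothetical failed-request records. The service and AbortError names echo the
// scenario; the individual records, counts, and "OtherError" are invented.
const failedRequests = [
  { caller: "weather-express", exception: "AbortError" },
  { caller: "weather-express", exception: "AbortError" },
  { caller: "weather-express", exception: "AbortError" },
  { caller: "weather-express", exception: "OtherError" },
];

// Count records per key, where keyFn extracts the grouping key from a record.
function countBy(records, keyFn) {
  const counts = new Map();
  for (const record of records) {
    const key = keyFn(record);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Find the most frequent key and its share of all counted records, or null if empty.
function dominant(counts) {
  let best = null;
  let total = 0;
  for (const [key, count] of counts) {
    total += count;
    if (best === null || count > best.count) best = { key, count };
  }
  return best ? { ...best, share: best.count / total } : null;
}

const topCaller = dominant(countBy(failedRequests, (r) => r.caller));
const topException = dominant(countBy(failedRequests, (r) => r.exception));
console.log(topCaller, topException);
```

A single dominant caller and a single dominant exception, as in this sample, is the pattern that suggests one root cause worth inspecting in an individual trace.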
Conclusion
Each distributed trace on the Errors tab shows a unique set of parameters leading up to the error. Analyzed this way, the distributed traces view helps you understand why specific exceptions occurred.