Drill-down to service failure causes

Analyzing individual requests is a useful way to better understand detected errors. In this article, you will learn how to use distributed tracing to determine the error underlying an increasing service failure rate.

Scenario

In the image below, you can see that requests to Redis started to fail around the 10:45 mark on the timeline.
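To make the notion of a rising failure rate concrete, the following minimal sketch computes the share of failed requests per time bucket. The `RequestRecord` shape, the `failureRateByBucket` function, and the sample data are all hypothetical and chosen for illustration; they are not a Dynatrace API or data model.

```typescript
// Hypothetical request record; field names are illustrative only.
interface RequestRecord {
  timestamp: Date;
  failed: boolean;
}

// Returns the failure rate (0..1) per fixed-width time bucket,
// keyed by the bucket's start time in epoch milliseconds.
function failureRateByBucket(
  requests: RequestRecord[],
  bucketMinutes: number
): Map<number, number> {
  const bucketMs = bucketMinutes * 60 * 1000;
  const counts = new Map<number, { total: number; failed: number }>();

  requests.forEach((r) => {
    // Round the timestamp down to the start of its bucket.
    const start = Math.floor(r.timestamp.getTime() / bucketMs) * bucketMs;
    const c = counts.get(start) || { total: 0, failed: 0 };
    c.total += 1;
    if (r.failed) {
      c.failed += 1;
    }
    counts.set(start, c);
  });

  const rates = new Map<number, number>();
  counts.forEach((c, start) => rates.set(start, c.failed / c.total));
  return rates;
}

// Made-up sample: failures begin after 10:45.
const sampleRequests: RequestRecord[] = [
  { timestamp: new Date("2024-01-01T10:31:00Z"), failed: false },
  { timestamp: new Date("2024-01-01T10:40:00Z"), failed: false },
  { timestamp: new Date("2024-01-01T10:46:00Z"), failed: true },
  { timestamp: new Date("2024-01-01T10:50:00Z"), failed: true },
  { timestamp: new Date("2024-01-01T10:55:00Z"), failed: false },
];

const rates = failureRateByBucket(sampleRequests, 15);
rates.forEach((rate, start) =>
  console.log(new Date(start).toISOString(), rate.toFixed(2))
);
```

With 15-minute buckets, the sample yields no failures in the bucket starting at 10:30 and a non-zero rate in the bucket starting at 10:45, which is the kind of jump a failure rate chart makes visible.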

Steps

To find the Failure rate tab, go to the service’s details page and select a View button (such as View requests, View dynamic requests, or View resource requests).
Select Analyze backtrace to see where these requests came from.

The requests originate from the weather-express service, and nearly all failed requests to Redis have the same exception: an AbortError caused by a closed connection.
To drill down to the affected Node.js traces, select More (…) > Distributed traces.

The Node.js trace and its code-level execution tree below show that a Redis request leads to an error, and where in the flow of the Node.js code this error occurs.
Select the Errors tab to analyze the exception.
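The observation from the backtrace step, that most failures come from one calling service and share one exception, amounts to grouping failed requests by caller and exception type. The sketch below illustrates that grouping under stated assumptions: the `FailedRequest` shape and the `countFailures` function are hypothetical, and the sample data (including the `TimeoutError` entry) is invented for the example.

```typescript
// Hypothetical shape of a failed request; not a Dynatrace data model.
interface FailedRequest {
  callingService: string;
  exception: string;
}

interface FailureGroup {
  key: string;
  count: number;
}

// Counts failed requests per (calling service, exception) pair
// and returns the groups sorted by descending count.
function countFailures(failures: FailedRequest[]): FailureGroup[] {
  const counts = new Map<string, number>();
  failures.forEach((f) => {
    const key = f.callingService + " / " + f.exception;
    counts.set(key, (counts.get(key) || 0) + 1);
  });

  const groups: FailureGroup[] = [];
  counts.forEach((count, key) => groups.push({ key, count }));
  return groups.sort((a, b) => b.count - a.count);
}

// Made-up sample data for illustration.
const failedRequests: FailedRequest[] = [
  { callingService: "weather-express", exception: "AbortError" },
  { callingService: "weather-express", exception: "AbortError" },
  { callingService: "weather-express", exception: "TimeoutError" },
  { callingService: "weather-express", exception: "AbortError" },
];

const ranked = countFailures(failedRequests);
console.log(ranked);
```

When one group clearly dominates the ranking, as `weather-express / AbortError` does in this sample, that exception is the natural starting point for inspecting individual traces.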

Conclusion

Each distributed trace on the Errors tab shows a unique set of parameters leading up to the error. By comparing these traces, you can use the distributed traces view to understand why certain exceptions occurred.