Multithreaded architectures scale well because distributing work across threads keeps CPUs busy running code. However, when a system has to coordinate work across multiple threads, lock contention grows and the share of time spent actually executing code can drop.
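As a minimal illustration (a generic Java sketch, not code from any monitored process), the following example shows how a single shared lock can serialize otherwise parallel work: every thread stays busy, but most of its time goes into waiting for the lock rather than running code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockContentionDemo {
    private static final Object SHARED_LOCK = new Object();
    private static long counter = 0;

    public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                for (int j = 0; j < 1_000_000; j++) {
                    // Every thread funnels through one lock: the work is
                    // effectively serialized, so adding threads (or CPUs)
                    // barely increases throughput.
                    synchronized (SHARED_LOCK) {
                        counter++;
                    }
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("counter = " + counter);
    }
}
```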
In Dynatrace, OneAgent detects such behavior automatically and continuously. With continuous thread analysis, the data is always available, with historical context, and doesn't need to be triggered when a problem occurs. You can receive alerts on trends, analyze thread dumps, and start improving code directly, when and where it's needed, to prevent bottlenecks.
To start using continuous thread analysis
Go to Settings > Preferences > OneAgent features.
Find and turn on the following OneAgent features for the technology you want to monitor:
| Technology to monitor | OneAgent features to turn on |
|---|---|
| Java | Continuous thread analysis |
| Node.js | Continuous thread analysis |
Due to a known issue with Java 11, continuous thread analysis of JVM threads is only available for Java 8 and Java 17+ (based on the supported versions). Application and agent threads remain unaffected.
To start analyzing thread dumps
Go to Profiling & Optimization > Continuous CPU profiling.
For the process group that you want to analyze, select More (…) > Threads.
On the Thread analysis page, you can filter thread groups by thread state, chart thread metrics such as CPU time, and drill down from a thread group to individual threads and their method hotspots, as the following examples show.
Threads are a source of scalability: they enable your applications to carry out multiple tasks at once. In certain situations, however, they can also become a source of performance bottlenecks. For example:
In this example, the process group idea*.exe shows a spike in CPU consumption. We investigate the problem on its Thread analysis page, starting by filtering the thread group list by the Locking thread state.
We see that the system is not in bad shape, but there is still locking behavior affecting 23 thread groups. We select the first and most affected thread group, JobScheduler FJ pool, to continue our analysis.
For the thread group JobScheduler FJ pool, a considerable amount of time (almost half: 46.1%) is continuously spent in locking, and it's distributed across 15 threads. This indicates a general lock affecting all of those threads. We want to avoid this behavior because it limits both speed and the ability to increase throughput by adding resources. Ultimately, it can lead to a state where the system won't be able to process more data even if we add more hardware.
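As a rough rule of thumb (our own back-of-the-envelope estimate, not a figure reported by Dynatrace): if the 46.1% spent in locking behaves like a serialized fraction s of the work, Amdahl's law bounds the achievable speedup at speedup(N) = 1 / (s + (1 - s)/N) for N threads. Even with unlimited threads, that caps out at about 1/0.461 ≈ 2.2×, which is why adding hardware eventually stops helping.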
To get to the root cause, we select More (…) > Method hotspots.
Based on the retrieved information, we can locate the source of the locking and test different ways of writing the code to improve its performance.
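For instance (a hypothetical fix, not the actual code behind JobScheduler FJ pool), a hotspot that turns out to be a globally synchronized counter can often be rewritten with a contention-friendly primitive such as java.util.concurrent.atomic.LongAdder:

```java
import java.util.concurrent.atomic.LongAdder;

public class RequestStats {
    // Before: every increment acquires the same monitor, so all
    // threads serialize on one lock.
    private long totalSynchronized = 0;

    public synchronized void recordSynchronized() {
        totalSynchronized++;
    }

    // After: LongAdder stripes updates across internal cells, so
    // concurrent threads rarely contend; the total is summed on read.
    private final LongAdder total = new LongAdder();

    public void record() {
        total.increment();
    }

    public long current() {
        return total.sum();
    }
}
```

The trade-off is that LongAdder gives an eventually consistent sum rather than a point-in-time exact value, which is usually acceptable for statistics and counters.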
In this example, the process group doaks-prod1-eastus-cassandra spends 9.97% of its time in Code execution.
We select CPU time to understand how the behavior is affecting CPU consumption. From the chart, we can see that the thread group MutationStage is the biggest contributor. From the table below, we learn that it's spending 1.6 d of Estimated CPU time and accounts for 3.55% of the Overall execution breakdown in code execution.
We want to identify which code is executed and how it impacts CPU consumption. After selecting the thread group name (MutationStage), we see that the first thread in the list at the bottom of the page is the highest contributor (MutationStage-2).
To see a list of hotspots, we select More (…) > Method hotspots for MutationStage-2.
We now see that Unsafe.unpark contributes 30.1% to code execution and is a good candidate for optimization.
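Unsafe.unpark is typically reached through LockSupport.unpark, which executors and blocking queues use to wake parked worker threads, so a high share often means many small task handoffs. One common mitigation (a hypothetical sketch, not the fix applied to this Cassandra process) is to let each wakeup service a batch of tasks instead of a single one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingWorker implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    public void submit(Runnable task) {
        queue.add(task); // may unpark a parked worker via LockSupport
    }

    @Override
    public void run() {
        List<Runnable> batch = new ArrayList<>();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Block only for the first task; this is where the worker
                // parks and a producer's add() triggers the unpark.
                batch.add(queue.take());
                // Drain everything else that is already queued without
                // parking again, so one wakeup services many tasks.
                queue.drainTo(batch);
                for (Runnable task : batch) {
                    task.run();
                }
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Fewer park/unpark transitions per task means less time in Unsafe.unpark and more time running the code that actually matters.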