Crash analysis
Processes crash for many reasons, and it’s often difficult to understand the root causes behind such crashes. When a monitored process crashes, you’ll see a process crash entry in the Events section of each affected process and host page. The example process below has some availability problems (shown in red on the timeline). When you select the affected timeframe in the timeline, the Events section shows the number of process crashes that occurred during that timeframe (1 crash in this example).
Select Process crash details to view a detailed list of the crashes that occurred during the selected timeframe. Here you’ll find all details related to why each process crashed.
The provided crash details include the signal that killed the process (for example, Segmentation fault or Abort), the execution stack frame that crashed, and more. The crash type (such as a native core dump, Java core dump, or abnormal program exit due to an exception) determines which crash details are available.
This functionality works for all processes on each monitored host.
Analyze additional crash artifacts
Crash details often include a Download button that provides access to additional crash artifacts, such as hs_err_pid files for Java crashes, text files that provide analysis of Linux and Windows core dumps, or files containing the .NET, Java, or Node.js exceptions that were potentially responsible for the crashes. For example, the Segmentation fault crash report above resulted in a core dump. OneAgent analyzed the core dump automatically and then produced the following report as a log artifact:
dumpproc version 1.108.0.20161025-115919, installer version 1.108.0.20161025-121046
2016-11-09 18:00:44: Application 'CreditCardAutho', inner pid '15891', outer pid '0', signal: 'Segmentation fault' (11)
process group ID: 0x441b2cb89962033d
process group instance ID: 0xfe58bab23100f42c
process group Name: easytravel-*-x*
threadCount: 1
thread: 0 - stack range: 0x7ffeda572000-0x7ffeda594000, size: 136 kB
0x00007ffeda592be0 0x00007f4de477604d libpthread-2.15.so!<imagebase>+0xf04d
0x00007ffeda592bf0 0x00000000004038d8 CreditCardAuthorizationS64!main+0x1b8
0x00007ffeda592c60 0x00007f4de41c676d libc-2.15.so!__libc_start_main+0xed
0x00007ffeda592d20 0x000000000040329a CreditCardAuthorizationS64!<imagebase>+0x329a
mapped files:
0000000000400000-000000000041e000 0 /home/labuser/easytravel-2.0.0-x64/CreditCardAuthorizationS64 (MD5: da5992daf5ba3b76c633c853c7da5e87)
000000000051d000-000000000051e000 1d /home/labuser/easytravel-2.0.0-x64/CreditCardAuthorizationS64 (MD5: da5992daf5ba3b76c633c853c7da5e87)
00007f4de41a5000-00007f4de4359000 0 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
00007f4de4359000-00007f4de4558000 1b4 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
00007f4de4558000-00007f4de455c000 1b3 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
00007f4de455c000-00007f4de455e000 1b7 /lib/x86_64-linux-gnu/libc-2.15.so (GNU Build-Id: aa64a66ac46bff200848c0a0694011bd0140ab4e)
00007f4de4563000-00007f4de4565000 0 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
00007f4de4565000-00007f4de4765000 2 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
00007f4de4765000-00007f4de4766000 2 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
00007f4de4766000-00007f4de4767000 3 /lib/x86_64-linux-gnu/libdl-2.15.so (GNU Build-Id: d181af551dbbc43e9d55913d532635fde18e7c4e)
00007f4de4767000-00007f4de477f000 0 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
00007f4de477f000-00007f4de497e000 18 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
00007f4de497e000-00007f4de497f000 17 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
00007f4de497f000-00007f4de4980000 18 /lib/x86_64-linux-gnu/libpthread-2.15.so (GNU Build-Id: c340af9dee97c17c730f7d03693286c5194a46b8)
00007f4de4984000-00007f4de4a02000 0 /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
00007f4de4a02000-00007f4de4c01000 7e /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
00007f4de4c01000-00007f4de4c03000 7d /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
00007f4de4c03000-00007f4de4c05000 7f /lib/x86_64-linux-gnu/liboneagentproc.so (1.108.0.20161025-115919)
00007f4de4cc0000-00007f4de4ce2000 0 /lib/x86_64-linux-gnu/ld-2.15.so (GNU Build-Id: e25ad1a11ccf57e734116b8ec9c69f643dca9f18)
00007f4de4ee2000-00007f4de4ee3000 22 /lib/x86_64-linux-gnu/ld-2.15.so (GNU Build-Id: e25ad1a11ccf57e734116b8ec9c69f643dca9f18)
00007f4de4ee3000-00007f4de4ee5000 23 /lib/x86_64-linux-gnu/ld-2.15.so (GNU Build-Id: e25ad1a11ccf57e734116b8ec9c69f643dca9f18)
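If you download such a report and want to pull out the stack frames programmatically, a small parser can help. The following is a minimal Python sketch, not a Dynatrace tool. It assumes each frame line has the shape seen in the sample report (two 16-digit hexadecimal addresses followed by module!symbol+0xoffset). The interpretation of the two columns as stack address and instruction address, and the names Frame and parse_frame, are assumptions made for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape of a frame line, taken from the sample report:
#   <stack address> <instruction address> <module>!<symbol>+0x<offset>
FRAME_RE = re.compile(
    r"^(0x[0-9a-f]{16})\s+(0x[0-9a-f]{16})\s+([^!\s]+)!(.+)\+0x([0-9a-f]+)$"
)


class Frame(NamedTuple):
    stack_addr: int   # first column (assumed to be the stack address)
    instr_addr: int   # second column (assumed to be the instruction address)
    module: str       # binary or shared library name
    symbol: str       # function name, or <imagebase> when no symbol is known
    offset: int       # offset from the symbol, in bytes


def parse_frame(line: str) -> Optional[Frame]:
    """Return a Frame for a stack-frame line, or None for any other line."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    stack, instr, module, symbol, offset = match.groups()
    return Frame(int(stack, 16), int(instr, 16), module, symbol, int(offset, 16))


if __name__ == "__main__":
    sample = [
        "thread: 0 - stack range: 0x7ffeda572000-0x7ffeda594000, size: 136 kB",
        "0x00007ffeda592be0 0x00007f4de477604d libpthread-2.15.so!<imagebase>+0xf04d",
        "0x00007ffeda592bf0 0x00000000004038d8 CreditCardAuthorizationS64!main+0x1b8",
        "mapped files:",
    ]
    for frame in filter(None, map(parse_frame, sample)):
        print(f"{frame.module}: {frame.symbol} + {frame.offset:#x}")
```

Lines that are not stack frames (headers, mapped files) return None, so the whole report can be fed through parse_frame line by line.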
Protect sensitive user data
Crash reports might include sensitive personal information that should not be viewed by all users. For this reason, your Dynatrace administrator must enable the View logs account-security option and the View sensitive request data permissions in your user profile before you can view personal data. This option is disabled by default for all non-admin users and must be explicitly enabled before you can access log contents.
Crash handling on Windows
In order for a generic Windows process crash (core dump) to be visible to Dynatrace, the crash must be detected by Windows Error Reporting. For this reason, the Windows Error Reporting service must be enabled.
When a crash occurs on Windows, a dialog appears, asking if you want to debug or close the crashed application. This is not desirable for headless systems. You can disable this dialog by adding a value to the registry, as shown below:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting]
"DontShowUI"=dword:00000001
You can learn about other valuable settings related to Windows Error Reporting by visiting Microsoft documentation.
Linux core dump handling
In Linux, the way the kernel handles the core dump is set in /proc/sys/kernel/core_pattern. Beginning with kernel 2.6.19 (1), there are two methods of dealing with application crashes. The core dump can either be written to a file named by the /proc/sys/kernel/core_pattern entry or be pushed to an application, in which case the entry must be prefixed with a pipe character (|).
Because SUSE Linux uses the first method, the entry is similar to /proc/sys/kernel/core_pattern:core. This means that a file with the name core is written in the current working directory of the crashed process.
Ubuntu and Red Hat generally rely on their own tools to report crash dumps, so the lines appear as follows:
|/usr/share/apport/apport %p %s %c %P
or
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
In the last example, when a program crashes, the core dump output is pushed to stdin of the application given in the first parameter. Moreover, the kernel fills in the values of any parameters formatted as %[a-zA-Z]. The apport reporting service overwrites the file /proc/sys/kernel/core_pattern. If apport is enabled (in /etc/default/apport), then the /proc/sys/kernel/core_pattern configuration setting is set when the apport crash reporting service starts on system boot.
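To check which of the two methods a given core_pattern entry uses, the distinction described above can be sketched in a few lines of Python. This is an illustrative helper only. The function name describe_core_pattern and the shape of its return value are assumptions, not part of OneAgent or the kernel.

```python
import re
from typing import Dict, List, Union


def describe_core_pattern(pattern: str) -> Dict[str, Union[str, List[str]]]:
    """Classify a core_pattern entry as file-based or piped to a program."""
    pattern = pattern.strip()
    piped = pattern.startswith("|")
    body = pattern[1:] if piped else pattern

    # Collect %<letter> specifiers that the kernel substitutes.
    # "%%" is matched first so that a literal percent sign is skipped.
    tokens = re.findall(r"%[%a-zA-Z]", body)
    specifiers = [token for token in tokens if token != "%%"]

    if piped:
        # For a piped entry, the first word is the program that
        # receives the core dump on stdin.
        parts = body.split()
        target = parts[0] if parts else ""
    else:
        # For a file entry, the whole string is the file name template.
        target = body

    return {
        "mode": "pipe" if piped else "file",
        "target": target,
        "specifiers": specifiers,
    }


if __name__ == "__main__":
    for entry in ("core", "|/usr/share/apport/apport %p %s %c %P"):
        print(describe_core_pattern(entry))
```

On a real host, the input string would come from reading /proc/sys/kernel/core_pattern.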
Operating system changes
The OneAgent installer makes the following changes to your system to handle core dumps.
Disabling ABRT and Apport
ABRT (Red Hat) and Apport (Debian) services are stopped and disabled.
Both services are re-enabled during OneAgent uninstallation.
For more information, see OneAgent security on Linux.
Core pattern handling
The OneAgent installer overwrites the core pattern with its own command but preserves the original pattern.
- The content of the original /proc/sys/kernel/core_pattern file is copied to:
  - OneAgent version 1.301 and earlier: /opt/dynatrace/oneagent/agent/conf/original_core_pattern
  - OneAgent version 1.302+: /var/lib/dynatrace/oneagent/agent/backup/original_core_pattern
  When OneAgent is uninstalled, the original core pattern stored in this file is restored to /proc/sys/kernel/core_pattern.
- The content of the original kernel.core_pattern option of /etc/sysctl.conf is copied to:
  - OneAgent version 1.301 and earlier: /opt/dynatrace/oneagent/agent/conf/original.sysctl.corepattern
  - OneAgent version 1.302+: /var/lib/dynatrace/oneagent/agent/backup/original.sysctl.corepattern
  When OneAgent is uninstalled, the original core pattern stored in this file is restored to kernel.core_pattern in /etc/sysctl.conf. If kernel.core_pattern was not present in /etc/sysctl.conf before OneAgent installation, the backup file is not created.
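When scripting checks around an installation, it can be handy to compute where the backups should be for a given OneAgent version. The following is a minimal sketch based only on the paths and the version boundary stated above. The function names backup_dir and backup_files are hypothetical, and the sketch assumes the version string starts with major.minor (for example, 1.302.0).

```python
from pathlib import PurePosixPath
from typing import Dict

# Backup locations as listed above.
OLD_DIR = PurePosixPath("/opt/dynatrace/oneagent/agent/conf")        # 1.301 and earlier
NEW_DIR = PurePosixPath("/var/lib/dynatrace/oneagent/agent/backup")  # 1.302+


def backup_dir(version: str) -> PurePosixPath:
    """Return the backup directory for a version string such as '1.302.0'."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return NEW_DIR if (major, minor) >= (1, 302) else OLD_DIR


def backup_files(version: str) -> Dict[str, PurePosixPath]:
    """Return the expected backup file paths for the given OneAgent version."""
    directory = backup_dir(version)
    return {
        # Copy of /proc/sys/kernel/core_pattern
        "core_pattern": directory / "original_core_pattern",
        # Copy of kernel.core_pattern from /etc/sysctl.conf
        # (only created if the option existed before installation)
        "sysctl": directory / "original.sysctl.corepattern",
    }


if __name__ == "__main__":
    for version in ("1.301.5", "1.302.0"):
        print(version, backup_files(version)["core_pattern"])
```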
Depending on the original entry in core_pattern, Dynatrace will write different patterns to core_pattern. The possible configurations and expected entries after installation are listed below:
- The -kp parameter is appended along with all kernel parameters needed for Dynatrace to substitute in the original filename.
- The -a argument is not appended to the output core_pattern entry if there are no parameters.
- The -a argument gets appended along with all of the parameters after the binary path to apport.
Core handling by OneAgent dumpproc
When a crash occurs:
- rdp is called to dump the core to OneAgent folders. This core is used by the Crash Reporting functionality.
- OneAgent reads original_core_pattern and generates a core dump based on its settings. Therefore, if the original configuration specified a particular location for writing the core dump file, this will continue even after OneAgent is installed.
- The core dump is analyzed to check if Dynatrace could have been the root cause of the crash.
  - If OneAgent determines that Dynatrace could have been at fault:
    - A support alert is generated. This is reported to our DevOps team.
    - The core dump is zipped and retained in addition to all involved libraries. This is needed for later offline analysis.
  - If OneAgent determines that Dynatrace is not at fault:
    - A crash is reported via the Dynatrace web UI to the user.
    - If it has any impact on the customer's application, a problem is opened and an appropriate event is generated for the involved processes as described above.
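The branching in the steps above can be summarized as a small decision function. This Python sketch only models the documented flow. It is not OneAgent code, and the function name crash_actions, its two boolean inputs, and the action strings are all hypothetical.

```python
from typing import List


def crash_actions(dynatrace_suspected: bool, impacts_application: bool) -> List[str]:
    """List the documented actions for a crash, in order (illustrative model)."""
    # These steps happen for every crash.
    actions = [
        "dump core to OneAgent folders",
        "write core dump according to original_core_pattern",
        "analyze core dump",
    ]
    if dynatrace_suspected:
        # Dynatrace could have been at fault.
        actions.append("generate support alert")
        actions.append("zip and retain core dump with involved libraries")
    else:
        # Dynatrace is not at fault.
        actions.append("report crash in the Dynatrace web UI")
        if impacts_application:
            actions.append("open problem and generate process event")
    return actions


if __name__ == "__main__":
    for step in crash_actions(dynatrace_suspected=False, impacts_application=True):
        print("-", step)
```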
Cleanup
The log and support alert directories are cleaned up automatically.
- For support alerts, we process the core dump, then zip it and keep it so that it can be sent to the cluster.
- For crashes (non-instrumented processes, or instrumented ones where we decide Dynatrace is not at fault), we process and then delete the copy of the core dump.