OpenTelemetry exceptions and error handling

No matter how complex or simple an application seems to be, there is always room for error, be it a connection issue, invalid input from a user, or a bug hidden deep within the code of the application itself.

Bad or even non-existent error handling makes an application annoying at best and downright awful or unusable at worst. So, as a software engineer, it makes sense to think through the “what if?” scenarios and handle them accordingly.

Adding instrumentation through a third-party library or framework calls for additional exception and error handling: it introduces new functionality into an application and, with it, new potential for errors.

Exceptions and error handling in OpenTelemetry

Exceptions and error handling pose unique requirements in OpenTelemetry.

Since the data generated by OpenTelemetry is used to monitor application code and behavior, the telemetry data is non-essential from the point of view of the application's business logic. This means that, in case of an error, it is more acceptable to lose the generated telemetry data than to lose functionality in the application being instrumented.
Additionally, OpenTelemetry can be enabled through platform extensibility mechanisms or loaded dynamically at runtime. This means its use might be not only non-obvious but also outside the developer's control.

Never throw unhandled exceptions at runtime

As the official OpenTelemetry documentation states, OpenTelemetry implementations MUST NOT throw unhandled exceptions at runtime.

- API methods must not throw unhandled exceptions when used incorrectly, so the API and SDK should provide safe defaults for missing or invalid arguments. For instance, if null is passed as the span name during span creation, a name like "empty" may be used instead.
- The API and SDK may fail fast and cause the application to fail on initialization, for instance because of a bad configuration or environment, but they must not cause the application to fail later at runtime, for instance because of dynamic configuration settings received from the Collector.
- The SDK must not throw unhandled exceptions for errors in its own operations. For instance, an exporter should not throw an exception when it cannot reach the endpoint it sends telemetry data to.
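The first two rules can be sketched in plain Java. The `SketchSdk` class below is hypothetical and not part of the OpenTelemetry API: it substitutes a safe default for a null or blank span name, rejects an invalid exporter endpoint when the SDK is created (failing fast on initialization), and logs and ignores an invalid sampling ratio received at runtime instead of throwing.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.logging.Logger;

// Hypothetical SDK facade illustrating the rules above; not an OpenTelemetry class.
final class SketchSdk {
    private static final Logger LOG = Logger.getLogger(SketchSdk.class.getName());

    private final URI endpoint;
    private volatile double samplingRatio = 1.0;

    private SketchSdk(URI endpoint) {
        this.endpoint = endpoint;
    }

    // Initialization: a bad configuration fails fast, before the app starts serving.
    static SketchSdk create(String endpoint) {
        URI uri = parseOrNull(endpoint);
        if (uri == null || uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("Invalid exporter endpoint: " + endpoint);
        }
        return new SketchSdk(uri);
    }

    private static URI parseOrNull(String s) {
        if (s == null) {
            return null;
        }
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // API method: a missing or invalid argument gets a safe default instead of an exception.
    static String safeSpanName(String name) {
        return (name == null || name.isBlank()) ? "empty" : name;
    }

    // Runtime: an invalid dynamic setting is logged and ignored; the old value stays in effect.
    void updateSamplingRatio(double ratio) {
        if (Double.isNaN(ratio) || ratio < 0.0 || ratio > 1.0) {
            LOG.warning("Ignoring invalid sampling ratio: " + ratio);
            return;
        }
        samplingRatio = ratio;
    }

    double samplingRatio() {
        return samplingRatio;
    }

    URI endpoint() {
        return endpoint;
    }
}
```

The asymmetry is deliberate: throwing from `create` stops a misconfigured application before it does any work, while `updateSamplingRatio` may be called long after startup, when an exception would take down a running application.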

When an exception occurs, record it as an event on the span during which it occurred. The name of that event must be "exception".

For more specific guidance on background tasks and internal error handling, see the official OpenTelemetry documentation.

Performance implications

Error handling and extensive input validation can degrade performance:

- Error handling adds complexity and extra steps to the application logic.
- In dynamically typed languages, where object types are not known until runtime, validating arguments requires type checks that are expensive and themselves error-prone.

We therefore recommend that, following the guiding principles above, you define global exception handling logic that keeps exceptions from leaking into user code, rather than relying on costly validation at every call site.
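One way to apply this in Java is a single guard that every public entry point of an instrumentation layer passes through. The `TelemetryGuard` class below is a hypothetical sketch, not an OpenTelemetry API: it runs a telemetry operation, logs anything it throws through java.util.logging, and returns a fallback so the exception never reaches the caller's business logic.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: one place where telemetry exceptions are caught and logged.
final class TelemetryGuard {
    private static final Logger LOG = Logger.getLogger(TelemetryGuard.class.getName());

    private TelemetryGuard() {}

    // Runs a telemetry action; any RuntimeException is logged and suppressed.
    // Errors such as OutOfMemoryError are deliberately not caught.
    static void run(Runnable action) {
        try {
            action.run();
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Telemetry operation failed; suppressing", e);
        }
    }

    // Like run, but returns a value; a failure or a null result yields the fallback.
    static <T> T call(Supplier<T> action, T fallback) {
        try {
            T result = action.get();
            return result != null ? result : fallback;
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Telemetry operation failed; returning fallback", e);
            return fallback;
        }
    }
}
```

Because the try/catch lives in one place, individual API methods can stay lean and skip defensive checks on every argument.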

OpenTelemetry also encourages self-diagnostics: all of its libraries, including the API, SDK, exporters, and instrumentation, can expose self-troubleshooting telemetry that is easy to enable and filtered out by default. For instance, a span exporter can report how much time exporters spend uploading telemetry. In addition, whenever a library suppresses an error that would otherwise have reached the user, it should log that error using the language's usual logging conventions.
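A rough Java sketch of both ideas, assuming a hypothetical `Exporter` interface rather than the real OpenTelemetry `SpanExporter`: the `DiagnosticExporter` wrapper (also hypothetical) measures how long uploads take and, when an upload fails, logs the suppressed error and reports failure instead of throwing.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical exporter contract; the real SpanExporter interface differs.
interface Exporter {
    void upload(List<String> batch) throws Exception;
}

// Hypothetical wrapper adding self-diagnostics: upload timing and logging of suppressed errors.
final class DiagnosticExporter {
    private static final Logger LOG = Logger.getLogger(DiagnosticExporter.class.getName());

    private final Exporter delegate;
    private final AtomicLong totalUploadNanos = new AtomicLong();
    private final AtomicLong failedUploads = new AtomicLong();

    DiagnosticExporter(Exporter delegate) {
        this.delegate = delegate;
    }

    // Returns true on success; never throws to the caller.
    boolean export(List<String> batch) {
        long start = System.nanoTime();
        try {
            delegate.upload(batch);
            return true;
        } catch (Exception e) {
            failedUploads.incrementAndGet();
            // The error is suppressed here, so it is logged rather than silently lost.
            LOG.log(Level.WARNING, "Span export failed; dropping batch", e);
            return false;
        } finally {
            // Time spent uploading is recorded whether or not the upload succeeded.
            totalUploadNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long totalUploadNanos() {
        return totalUploadNanos.get();
    }

    long failedUploads() {
        return failedUploads.get();
    }
}
```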

How to implement it in your code

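SDK implementations must allow end users to change the library's default error handling behavior for relevant errors. This lets developers, for example, run with strict error handling in a staging environment to catch invalid uses of the API or malformed configuration.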
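A minimal sketch of such a switch, using a hypothetical `ErrorHandler` interface and `ErrorHandling` registry (actual OpenTelemetry SDKs expose this hook differently in each language): the default handler logs and suppresses, while a strict handler rethrows so that invalid API use surfaces immediately.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical error-handling hook; not an actual OpenTelemetry Java API.
interface ErrorHandler {
    void handle(RuntimeException error);

    // Default behavior: log and suppress, so the application keeps running.
    static ErrorHandler logging() {
        Logger log = Logger.getLogger("telemetry.errors");
        return error -> log.log(Level.WARNING, "Suppressed telemetry error", error);
    }

    // Strict behavior: rethrow, useful in staging to catch invalid API usage early.
    static ErrorHandler strict() {
        return error -> { throw error; };
    }
}

// Hypothetical global registry that the rest of the library reports errors to.
final class ErrorHandling {
    private static volatile ErrorHandler handler = ErrorHandler.logging();

    private ErrorHandling() {}

    // Lets end users replace the default behavior; null restores the default.
    static void setHandler(ErrorHandler newHandler) {
        handler = (newHandler != null) ? newHandler : ErrorHandler.logging();
    }

    static void report(RuntimeException error) {
        handler.handle(error);
    }
}
```

Usage: a staging build might call `ErrorHandling.setHandler(ErrorHandler.strict())` at startup, while production keeps the logging default.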

An exception can be recorded as an event on a span at any point after the span has started and before it has ended.

Many language SDKs provide a ready-made method for recording an exception, such as recordException in Java or record_exception in Ruby.
Whatever the outcome of exception handling, always end the span, for example in a finally block in Java or an ensure block in Ruby.
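To make the span lifecycle concrete, here is a toy model in plain Java. `ToySpan` is hypothetical and only mimics the behavior described here and in the OpenTelemetry specification, where calls on an ended span are ignored: exceptions are recorded as events while the span is open, and the `finally` block in `traced` guarantees the span ends even when the work fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal span model; the real OpenTelemetry Span is far richer.
final class ToySpan {
    final String name;
    final List<String> events = new ArrayList<>();
    private boolean ended;

    ToySpan(String name) {
        this.name = name;
    }

    // Recording only takes effect between start and end; later calls are ignored.
    void recordException(Throwable t) {
        if (!ended) {
            events.add("exception: " + t.getClass().getName());
        }
    }

    void end() {
        ended = true;
    }

    boolean isEnded() {
        return ended;
    }

    // Runs work inside the span, records any exception, rethrows it, and always ends the span.
    static void traced(ToySpan span, Runnable work) {
        try {
            work.run();
        } catch (RuntimeException e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```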

Java example

The following Java snippet catches an exception, records it on a span, and rethrows it:

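import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.context.Scope;

Span span = myTracer.spanBuilder(/*...*/).startSpan();
try (Scope scope = span.makeCurrent()) {
  // Code that does the actual work that the span represents
} catch (Throwable e) {
  // Adds an "exception" event; exception.escaped marks it as leaving the span's scope
  span.recordException(e, Attributes.of(AttributeKey.booleanKey("exception.escaped"), true));
  span.setStatus(StatusCode.ERROR);
  throw e;
} finally {
  // End the span whatever the outcome
  span.end();
}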

In Java specifically, OpenTelemetry uses java.util.logging to report its own internal logs and errors. Custom handlers can be registered either in code or in the Java logging configuration file.
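As a sketch of the in-code option, the snippet below attaches a custom java.util.logging handler to the io.opentelemetry logger. It assumes that the SDK's internal loggers are named after classes under the io.opentelemetry package, so their records propagate up to this parent logger. `OtelDiagnosticsHandler` and `OtelLoggingSetup` are illustrative names, not part of OpenTelemetry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Illustrative handler that keeps OpenTelemetry's internal log records in memory.
class OtelDiagnosticsHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public synchronized void publish(LogRecord record) {
        // isLoggable applies this handler's level and filter
        if (isLoggable(record)) {
            records.add(record);
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered
    }

    @Override
    public void close() {
        // No resources to release
    }
}

final class OtelLoggingSetup {
    // Hold a strong reference: java.util.logging keeps loggers only weakly.
    static final Logger OTEL_LOGGER = Logger.getLogger("io.opentelemetry");

    private OtelLoggingSetup() {}

    // Routes records at or above `level` from io.opentelemetry.* loggers to a custom handler.
    static OtelDiagnosticsHandler install(Level level) {
        OtelDiagnosticsHandler handler = new OtelDiagnosticsHandler();
        handler.setLevel(level);
        OTEL_LOGGER.setLevel(level);
        OTEL_LOGGER.addHandler(handler);
        return handler;
    }
}
```

In a real application, `publish` would more likely forward records to the application's own logging backend than keep them in a list.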

Ruby example

The same pattern in Ruby, which also sets the span status to error:

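begin
  # Assumes `tracer` was obtained earlier, e.g. OpenTelemetry.tracer_provider.tracer('my-app')
  span = tracer.start_span('my-operation') # span name is a placeholder
  # Code that does the actual work that the span represents
rescue Exception => e
  span&.record_exception(e)
  span&.status = OpenTelemetry::Trace::Status.error("Unhandled exception of type: #{e.class}")
  raise e
ensure
  # &. guards against start_span itself having raised, leaving span nil
  span&.finish
end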

Semantic conventions for exceptions

The event that records an exception must always be named "exception".
The semantic conventions define four attributes for this event:
- exception.type (string)
- exception.message (string)
- exception.stacktrace (string)
- exception.escaped (boolean)
At least one of exception.type or exception.message is required. For details, see the official OpenTelemetry documentation.
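The helper below sketches how these four attributes can be derived from a Java Throwable using only the standard library. `ExceptionAttributes` is a hypothetical name, and a plain Map stands in for OpenTelemetry's Attributes type; in practice, the SDK's recordException typically fills in the type, message, and stack trace for you.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds "exception" event attributes per the semantic conventions.
final class ExceptionAttributes {
    static final String EVENT_NAME = "exception";

    private ExceptionAttributes() {}

    static Map<String, Object> from(Throwable t, boolean escaped) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        // exception.type: always set here, which satisfies the "type or message" requirement
        attrs.put("exception.type", t.getClass().getName());
        // exception.message may be null in Java; omit it rather than store null
        if (t.getMessage() != null) {
            attrs.put("exception.message", t.getMessage());
        }
        // exception.stacktrace: the stack trace rendered as a single string
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        t.printStackTrace(pw);
        pw.flush();
        attrs.put("exception.stacktrace", sw.toString());
        // exception.escaped: whether the exception leaves the span's scope
        attrs.put("exception.escaped", escaped);
        return attrs;
    }
}
```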