Best practices for OpenTelemetry metrics
As in most areas of software engineering, a few guidelines make working with OpenTelemetry easier and less error-prone.
For OpenTelemetry metrics, we recommend the following best practices.
Check our samples
We provide you with working instrumentation samples in selected languages as well as sample code in Java for all the OpenTelemetry instruments.
For more information, see OpenTelemetry instrument code samples.
Use descriptive names
Use descriptive names for instruments, keys, and dimensions. Descriptive names enhance observability, especially in case of an error, making root cause analysis more efficient.
For more information, check the OpenTelemetry documentation on metrics semantic conventions.
Use gzip compression
We recommend using gzip compression.
For more information, see Performance.
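To illustrate why compression pays off, here is a minimal sketch using only the Python standard library. The payload is a hypothetical stand-in: real OTLP exporters serialize data differently (typically as Protobuf) and usually enable gzip through an exporter setting rather than a manual call, so the names and the JSON encoding below are assumptions for illustration only.

```python
import gzip
import json


def compress_payload(payload: bytes) -> bytes:
    """Gzip-compress an already serialized metrics payload."""
    return gzip.compress(payload)


# Hypothetical payload: 1,000 similar data points serialized as JSON.
points = [
    {"name": "http.server.requests", "status": 200, "value": i}
    for i in range(1000)
]
raw = json.dumps(points).encode("utf-8")
compressed = compress_payload(raw)

# Repetitive metric data (same names, same keys) compresses very well.
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```

Metric payloads repeat the same names and attribute keys many times, which is the kind of redundancy gzip removes well.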
Limit payload size
We recommend a maximum payload size of 4 MB (the default limit).
If that limit is exceeded, the entire OTLP message is dropped.
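One way to stay under the limit is to split a large set of data points into several smaller messages before sending. The following is a rough sketch under stated assumptions: points are serialized as JSON (a stand-in for the real OTLP encoding), 4 MB is interpreted as 4 MiB, and the limit is checked against the uncompressed size. `split_into_payloads` is a hypothetical helper, not part of any SDK.

```python
import json

MAX_PAYLOAD_BYTES = 4 * 1024 * 1024  # assumption: 4 MB interpreted as 4 MiB


def split_into_payloads(points, max_bytes=MAX_PAYLOAD_BYTES):
    """Greedily group points so each serialized group stays within max_bytes.

    The size estimate is slightly conservative. A single point that is
    larger than max_bytes still ends up alone in its own group.
    """
    payloads, current, current_size = [], [], 2  # 2 bytes for the enclosing "[]"
    for point in points:
        # +2 accounts for the ", " separator json.dumps puts between items.
        size = len(json.dumps(point).encode("utf-8")) + 2
        if current and current_size + size > max_bytes:
            payloads.append(current)
            current, current_size = [], 2
        current.append(point)
        current_size += size
    if current:
        payloads.append(current)
    return payloads


# Usage with a deliberately small limit so the split is visible.
sample = [{"v": "x" * 100} for _ in range(50)]
groups = split_into_payloads(sample, max_bytes=1000)
print(len(groups), "payloads")
```

Because the whole message is dropped when the limit is exceeded, sending several smaller messages loses less data than one oversized message would.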
Refer to limits and limitations
Data points can be dropped if, for example, their keys do not match the expected syntax.
For more information, see OpenTelemetry Metrics Limitations.
Use the Collector
The Collector is a vendor-agnostic tool for handling telemetry data.
- It can receive data in numerous formats, process it, and send it out to various backends, including Dynatrace.
- Every Collector component can be defined and enabled with a single YAML file, which reduces the code that has to be maintained.
For more information about the Collector, see Dynatrace Collector.
Use a batch processor when using the Collector
We recommend using a batch processor. Batching helps to compress the data and reduce the number of outgoing connections required to transmit data, which helps you avoid being rate limited.
For more information, see Performance.
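The effect of batching on the number of outgoing requests can be sketched in a few lines. This is a conceptual illustration only, not the Collector's batch processor: the `Batcher` class and its `max_batch_size` parameter are hypothetical, and a real batch processor typically also flushes after a timeout.

```python
class Batcher:
    """Collect items and hand them to `send` in groups of `max_batch_size`."""

    def __init__(self, send, max_batch_size=3):
        self.send = send
        self.max_batch_size = max_batch_size
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        if len(self.pending) >= self.max_batch_size:
            self.flush()

    def flush(self):
        """Send whatever is pending as one request, if anything is pending."""
        if self.pending:
            self.send(list(self.pending))
            self.pending.clear()


sent = []  # each entry stands for one outgoing request
batcher = Batcher(sent.append, max_batch_size=3)
for value in range(7):
    batcher.add(value)
batcher.flush()  # send the remainder

print(len(sent))  # 3 requests instead of 7
```

Fewer, larger requests mean fewer connections, and larger payloads also give gzip more repetition to work with.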
Use dimensions
Dimensions are used in Dynatrace to help distinguish what is being measured in a specific data point.
For example, if you're measuring the number of requests an endpoint has received, you can use dimensions to split that metric into requests that went through (status code 200) and requests that failed (status code 500).
Your dimensions should be well-annotated (recognizable, readable, understandable), have a descriptive name, and provide good information.
In OpenTelemetry, dimensions are called attributes.
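The request-count example above can be sketched as follows, using plain Python data rather than an SDK. The request list is made-up sample data, and the attribute keys are chosen for illustration.

```python
from collections import Counter

# Hypothetical measurements: one entry per request, each with its attributes.
requests = [
    {"http.route": "/checkout", "http.response.status_code": 200},
    {"http.route": "/checkout", "http.response.status_code": 500},
    {"http.route": "/checkout", "http.response.status_code": 200},
    {"http.route": "/cart", "http.response.status_code": 200},
]

# One metric, split by its dimensions (route and status code).
totals = Counter()
for attrs in requests:
    key = (attrs["http.route"], attrs["http.response.status_code"])
    totals[key] += 1

for (route, status), count in sorted(totals.items()):
    print(f"requests route={route} status={status}: {count}")
```

Without the status code dimension, the three `/checkout` requests would collapse into a single number and the failed request would be invisible.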
Use the default aggregation
Not all aggregation temporalities are suitable for every instrument.
In most cases, the default aggregation recommended by OpenTelemetry will be the best match, so we recommend using the default aggregation. All OpenTelemetry default aggregations work with Dynatrace out of the box.
It is possible to adjust aggregation and aggregation temporality. However, we recommend using the delta aggregation temporality.
For more information on how OpenTelemetry instruments and metrics are received and mapped in Dynatrace, see Ingest OpenTelemetry metrics.
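To make the difference between the two temporalities concrete: a cumulative stream reports the running total since the start, while a delta stream reports only the change since the previous report. The sketch below converts one into the other. It assumes the series starts from zero and never resets; `cumulative_to_delta` is a hypothetical helper, not an SDK function.

```python
def cumulative_to_delta(cumulative):
    """Convert running totals into per-interval changes.

    Assumes the series starts at 0 and has no resets.
    """
    deltas = []
    previous = 0
    for value in cumulative:
        deltas.append(value - previous)
        previous = value
    return deltas


print(cumulative_to_delta([5, 12, 12, 20]))  # [5, 7, 0, 8]
```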
Semantic conventions
Apply the following guidelines when creating names for your metrics.
Consistency
Be consistent when naming metrics and attributes, which includes nesting associated metrics in a hierarchical structure and sticking to consistent naming for common attributes.
Name reuse
After you rename a metric, avoid reusing its old name for a different metric.
Units
Metrics that have their unit included in the OpenTelemetry metadata should not include the unit in their name.
Pluralization
The name of a metric should be pluralized only if the unit of the metric in question is a non-unit, that is, a countable thing rather than a unit of measurement (such as operations or packets).
For topic-specific semantic conventions, see the Metrics Semantic Conventions OpenTelemetry documentation.
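The guideline on units lends itself to a simple automated check. The following is a minimal sketch: the suffix list and the `name_includes_unit` helper are hypothetical and far from exhaustive, so treat it as a starting point for a lint rule rather than a complete validator.

```python
# Hypothetical, non-exhaustive list of unit-like name suffixes.
UNIT_SUFFIXES = ("_seconds", "_milliseconds", "_bytes", ".seconds", ".bytes")


def name_includes_unit(name, unit):
    """Return True if a metric that declares a unit also repeats it in its name."""
    return bool(unit) and name.lower().endswith(UNIT_SUFFIXES)


print(name_includes_unit("http.server.duration_seconds", "s"))  # True
print(name_includes_unit("http.server.duration", "s"))          # False
```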
Select a correct instrument
Choosing the correct instrument to report Measurements is critical to achieving better efficiency, easing consumption for the user, and maintaining clarity in the semantics of the metric stream.
The OpenTelemetry documentation provides guidance on choosing the correct instrument.
Based on your intention, you can apply the following guidelines.
I want to count something
To count is to record the delta value.
- If the value increases monotonically (the delta value is always non-negative), use a Counter.
- If the value does not increase monotonically (the delta value can be positive, negative, or zero), use an UpDownCounter.
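The difference between the two counting instruments can be sketched with two small stand-in classes. These are not the OpenTelemetry API: `SketchCounter` and `SketchUpDownCounter` are hypothetical, and a real SDK may handle a negative value on a Counter differently (for example, by ignoring it) instead of raising an error.

```python
class SketchCounter:
    """Monotonic stand-in: only non-negative deltas are accepted."""

    def __init__(self):
        self.total = 0

    def add(self, delta):
        if delta < 0:
            raise ValueError("a Counter only accepts non-negative deltas")
        self.total += delta


class SketchUpDownCounter:
    """Non-monotonic stand-in: deltas may be positive, negative, or zero."""

    def __init__(self):
        self.total = 0

    def add(self, delta):
        self.total += delta


requests_served = SketchCounter()     # only ever grows
requests_served.add(1)
requests_served.add(1)

queue_length = SketchUpDownCounter()  # grows and shrinks
queue_length.add(3)                   # three items enqueued
queue_length.add(-2)                  # two items dequeued

print(requests_served.total, queue_length.total)  # 2 1
```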
I want to measure something
To measure is to report an absolute value.
- If it makes no sense to add up the values across different sets of attributes, use an Asynchronous Gauge.
- If it makes sense to add up the values across different sets of attributes:
  - If the value increases monotonically, use an Asynchronous Counter.
  - If the value does not increase monotonically, use an Asynchronous UpDownCounter.
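The guidelines for counting and measuring can be summarized as a small decision helper. This is a sketch of the rules as stated in this section only; `choose_instrument` and its parameters are hypothetical names, and the function returns instrument names as plain strings rather than creating real instruments.

```python
def choose_instrument(reports_delta, additive=True, monotonic=True):
    """Pick an instrument name from the guidelines in this section.

    reports_delta: True when counting (recording deltas),
                   False when measuring (reporting absolute values).
    additive:      whether summing values across attribute sets makes sense
                   (only relevant when measuring).
    monotonic:     whether the value only ever increases.
    """
    if reports_delta:
        return "Counter" if monotonic else "UpDownCounter"
    if not additive:
        return "Asynchronous Gauge"
    return "Asynchronous Counter" if monotonic else "Asynchronous UpDownCounter"


# Counting items added to and removed from a queue: deltas, not monotonic.
print(choose_instrument(reports_delta=True, monotonic=False))
# Reporting a temperature reading: absolute value, not meaningful to sum.
print(choose_instrument(reports_delta=False, additive=False))
```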