Sample data
A distributed application under heavy load may generate a massive amount of observability data. This data incurs generation, processing, transmission, and storage costs. However, it's often possible to use sampling—where you use only a relatively small portion of the observability data and drop the rest—to reduce costs and still effectively monitor your application.
In OpenTelemetry, there are two main sampling methods:
- Head sampling is done within your application by the OpenTelemetry SDK and typically involves saving a random sample of transactions. Head sampling is simple and effective, but it has important limitations. For example, because the sampling decision needs to be made at the start of the transaction, it can't be affected by anything that happens after that point (see the configuration sketch below).
- Tail sampling is used to make sampling decisions based on information unknown at the start of the transaction. In OpenTelemetry, tail sampling is typically done with the Collector by temporarily storing the full set of monitoring data until a transaction is completed. The Collector then decides to either save or drop the transaction data based on a set of sampling policies.
Because tail sampling typically is not random, it's important to ensure that any calculated metrics are unbiased. This can be done by calculating metrics from the full set of transactions, as shown below, or from a separate, randomly sampled stream.
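Head sampling is typically enabled through the SDK's standard environment variables. The following sketch shows them set in a Kubernetes-style container spec; the variables themselves are part of the OpenTelemetry SDK specification, while the surrounding deployment shape is illustrative only.

```yaml
# Illustrative container spec excerpt: SDK head sampling that keeps roughly 20% of traces.
# OTEL_TRACES_SAMPLER and OTEL_TRACES_SAMPLER_ARG are the standard OpenTelemetry SDK
# environment variables; everything around them is hypothetical.
env:
  - name: OTEL_TRACES_SAMPLER
    value: parentbased_traceidratio  # follow the parent's decision; otherwise sample by trace ID ratio
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.2"                     # keep about 20% of root traces
```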
The following configuration example shows how to configure a Collector instance to sample trace data and import it as an OTLP request into Dynatrace. It uses the `spanmetrics` connector to compute service metrics from traces before sampling in order to ensure their accuracy.
Prerequisites
- One of the following Collector distributions with the `transform`, `filter`, and `tail_sampling` processors, and the `spanmetrics` connector:
  - The Dynatrace Collector
  - The OpenTelemetry Contrib distribution
  - A custom Builder version (a manifest sketch follows below)
- The Dynatrace API endpoint URL to which the data should be exported
- An API token with the relevant access scope (only required for SaaS and ActiveGate)
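If you build your own distribution, the Builder manifest needs to include these components. The following is a minimal, hypothetical manifest excerpt; the distribution name, output path, and module versions are placeholders that you should align with your Builder release.

```yaml
# Hypothetical OpenTelemetry Collector Builder (ocb) manifest excerpt.
# Pin the component versions to the release matching your Builder binary.
dist:
  name: otelcol-custom          # placeholder distribution name
  output_path: ./otelcol-custom # placeholder output path
receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.104.0
processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/transformprocessor v0.104.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/filterprocessor v0.104.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.104.0
connectors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/connector/spanmetricsconnector v0.104.0
exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.104.0
```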
Demo configuration
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  transform:
    metric_statements:
      - context: metric
        statements:
          # Get count from the histogram. The new metric name will be <histogram_name>_count
          - extract_count_metric(true) where type == METRIC_DATA_TYPE_HISTOGRAM
          # Get sum from the histogram. The new metric name will be <histogram_name>_sum
          - extract_sum_metric(true) where type == METRIC_DATA_TYPE_HISTOGRAM

  filter:
    metrics:
      metric:
        # The Dynatrace OTLP metrics ingest doesn't currently support histograms
        - type == METRIC_DATA_TYPE_HISTOGRAM

  transform/spanmetrics:
    metric_statements:
      - context: metric
        statements:
          # Map the units to something that explicitly counts them in Dynatrace.
          - set(unit, "{requests}") where IsMatch(name, "^requests.duration_count")
          - set(unit, "{requests}") where IsMatch(name, "^requests.calls")

  tail_sampling:
    # This configuration keeps errors, traces longer than 500ms, and 20% of all remaining traces.
    # Adjust with policies of your choice.
    policies:
      - name: policy1-keep-errors
        type: status_code
        status_code: {status_codes: [ERROR, UNSET]}
      - name: policy2-keep-slow-traces
        type: latency
        latency: {threshold_ms: 500}
      - name: policy3-keep-random-sample
        type: probabilistic
        probabilistic: {sampling_percentage: 20}
    decision_wait: 30s

connectors:
  spanmetrics:
    aggregation_temporality: "AGGREGATION_TEMPORALITY_DELTA"
    namespace: "requests"
    metrics_flush_interval: 15s

exporters:
  otlphttp:
    endpoint: ${env:DT_ENDPOINT}
    headers:
      Authorization: Api-Token ${env:DT_API_TOKEN}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlphttp]
    traces/spanmetrics:
      receivers: [otlp]
      processors: []
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      processors: [transform, filter, transform/spanmetrics]
      exporters: [otlphttp]
```
Validate your settings to avoid any configuration issues.
Components
Our configuration uses the following components.
Receivers
Under `receivers`, we specify the standard `otlp` receiver as the active receiver component for our Collector instance and configure it to accept OTLP requests on gRPC and HTTP.
Processors
- `transform` to compute the desired sum and count values of the histograms. For details, see Compute histogram summaries; the resulting metric names are sketched after this list.
- `filter` to drop the existing histogram metrics (based on `type`) and avoid histogram-related error messages.
- `transform/spanmetrics` to set appropriate units on span metrics.
- `tail_sampling` to sample distributed traces based on properties of the trace.
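Taken together, and given the `requests` namespace configured on the `spanmetrics` connector, the metrics pipeline exports roughly the following metric stream. This is a sketch; exact names and attributes depend on your `spanmetrics` connector version and dimension settings.

```yaml
# Sketch of the metrics exported by the metrics pipeline (attribute dimensions omitted):
#   requests.calls           # request count per service/span name, unit set to {requests}
#   requests.duration_count  # count extracted from the duration histogram, unit set to {requests}
#   requests.duration_sum    # sum extracted from the duration histogram
# The original requests.duration histogram itself is dropped by the filter processor.
```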
Connectors
Under `connectors`, we specify the `spanmetrics` connector to compute service metrics from spans.
Exporters
Under `exporters`, we specify the default `otlphttp` exporter and configure it with our Dynatrace API URL and the required authentication token.

For this purpose, we set the following two environment variables and reference them in the configuration values for `endpoint` and `Authorization`.

- `DT_ENDPOINT` contains the base URL of the Dynatrace API endpoint (for example, `https://{your-environment-id}.live.dynatrace.com/api/v2/otlp`)
- `DT_API_TOKEN` contains the API token
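One way to supply these two variables to a containerized Collector is sketched below; the secret name and the surrounding deployment shape are hypothetical.

```yaml
# Hypothetical Kubernetes-style container spec excerpt for the Collector.
env:
  - name: DT_ENDPOINT
    value: https://{your-environment-id}.live.dynatrace.com/api/v2/otlp
  - name: DT_API_TOKEN
    valueFrom:
      secretKeyRef:            # keep the token in a secret rather than in plain text
        name: dynatrace-otlp   # hypothetical secret name
        key: api-token         # hypothetical key within the secret
```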
Service pipelines
Under `service`, we assemble three pipelines:

- `traces` assembles the OTLP receiver, the tail sampling processor, and the `otlphttp` exporter to send sampled spans to Dynatrace.
- `traces/spanmetrics` uses the same OTLP receiver and the `spanmetrics` connector to compute service metrics from the received spans, without sampling, and forwards the computed metrics to `metrics`.
- `metrics` uses the `transform`, `filter`, and `transform/spanmetrics` processors to format the metrics for Dynatrace metric ingest before sending them to Dynatrace using the `otlphttp` exporter.
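The data flow implied by these pipeline definitions can be sketched as follows:

```yaml
# Data flow implied by the three pipelines (sketch):
#
#   otlp ──► traces ──────────────► tail_sampling ──► otlphttp   (sampled spans)
#   otlp ──► traces/spanmetrics ──► spanmetrics connector        (all spans, unsampled)
#                                        │
#                                        ▼
#   metrics: spanmetrics ──► transform ──► filter ──► transform/spanmetrics ──► otlphttp
```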
OpenTelemetry sampling considerations
Mixed-mode sampling
OpenTelemetry and OneAgent use incompatible approaches to sampling that should not be mixed. If a distributed trace, which may span multiple applications and services, uses either method only partially, the result is likely to be inconsistent data and incomplete distributed traces. Each distributed trace should be sampled by only one of the methods to ensure it's captured in its entirety.
Trace-derived service metrics
Dynatrace trace-derived metrics are calculated from trace data after it's ingested into Dynatrace.
If OpenTelemetry traces are sampled, the trace-derived metrics are calculated only from the sampled subset of trace data. This means that some trace-derived metrics might be biased or incorrect.
For example, a probabilistic sampler that saves 5% of traffic will result in a throughput metric that shows 5% of the actual throughput. If you use OpenTelemetry tail-based sampling to also capture 100% of slow or error traces, your service metrics will not only show incorrect throughput, but will also incorrectly bias error rates and response times.
To mitigate this, if you want to sample OpenTelemetry traces, you should calculate service metrics before sampling and use those metrics rather than the trace-derived metrics calculated by Dynatrace. If you're using the Collector for sampling, trace-derived metrics should be calculated by the Collector before applying sampling, or by the SDK. This can be done with the `spanmetrics` connector, as shown in the example above.
Limits and limitations
Data is ingested using the OpenTelemetry protocol (OTLP) via the Dynatrace OTLP APIs and is subject to the API's limits and restrictions. For more information see: