Adaptive Traffic Management with Dynatrace Platform Subscription (DPS)

Adaptive Traffic Management manages the sampling rate dynamically and targets a specific trace data volume. This volume scales according to the amount of Full-Stack GiB-hours of memory connected to your environment.

Before you begin

How does Adaptive Traffic Management work?

Dynatrace Full-Stack Monitoring packages a variety of features, including fully automatic distributed tracing. Each application or microservice is monitored continuously: the Dynatrace code module collects distributed traces, which contain code-level and business insights, and sends them to Dynatrace.

Full-Stack Monitoring includes a defined amount of trace data volume. This volume depends on the amount of Full-Stack GiB-hours of memory connected to Dynatrace. Every contributing gibibyte of host or application memory adds a certain amount of trace volume ingest rate to your environment.

In a dynamic cloud environment, the amount of connected memory can change all the time. Adaptive Traffic Management automatically adjusts the sampling rate of trace data collection so that the collected trace data doesn't exceed the included trace volume in 15-minute intervals. This way, Dynatrace guarantees that no overage of trace data ingest is produced without your explicit consent.

In many cases this results in a trace data capture rate of 100% of all possible traces. However, depending on the monitored applications and the configuration of certain features in your environment, your capture rate might be lower.

Differences from other sampling mechanisms

  • Sampling rate distribution

    Most sampling mechanisms manage traffic on the agent in an isolated and uncoordinated manner. In Dynatrace, the trace volume is dynamically shared between all monitored applications in the environment. In a sense, low-volume applications share their unused trace volume with high-volume applications that need it.

  • Sampling rate scenarios

    In static sampling systems, you configure a fixed sampling rate and distribute it across your deployment, applying different rates to different scenarios. Adaptive Traffic Management captures requests automatically and efficiently; you can adjust the default logic and configure the capture rate for specific requests via URL-based sampling.

  • Costs associated with captured data

    In static sampling systems, the amount of captured data depends on the number of transactions executed in your system, which is not known in advance, so the associated costs are often hard to predict. With Adaptive Traffic Management, the cost is determined by your license, and capturing scales with it.

Capturing logic

We recognize that the distribution of requests and their relevance to your observability goals is not even. It's rather a combination of: a large number of unique URLs, a medium number of important requests, and, finally, a few kinds of requests that make up the majority of the traffic (for example, image requests or status checks).

OneAgent first calculates a list of top requests starting each minute, from which it then captures:

  • Most traces of unique and rare requests.
  • A significant but lower volume of highly frequent requests.

The trace volume is dynamically shared on the environment level between all monitored applications. In a sense, low-volume applications share their unused trace volume with high-volume applications that need it. This ensures that you use your trace volume most effectively. Because the sampling is not random, important data is captured while maintaining a statistically valid sample set.

The following table represents a top-request calculation example, along with the respective capture rates.

| Request | Number of requests processed by the application (per minute) | Capture factor | Captured end-to-end distributed traces (per minute) |
|---|---|---|---|
| URI A | 900 | 1/2 | 450 |
| URI B | 440 | 1/2 | 220 |
| URI C | 250 | 1 | 250 |
| URI D | 60 | 1 | 60 |
| …50 other URIs | 100 | 1 | 100 |
| Total: | 1750 | | 1080 |

In this example, OneAgent can capture a bit more than 1,000 requests per minute, according to the amount of Full-Stack GiB-hours of memory connected to the environment. Adaptive Traffic Management adjusts and communicates to OneAgent the capture rate for each URI depending on:

  • The amount of trace data that is being ingested.
  • The included trace volume available at the time (according to the GiB-hours of connected memory).
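
The captured-trace column of the example can be reproduced with a few lines of arithmetic. The sketch below is only an illustration in Python: the names `TOP_REQUESTS` and `captured_per_minute` are made up for this example and are not part of any Dynatrace API, and the capture factors are simply the ones shown in the table.

```python
from fractions import Fraction

# Rows from the example table: (request, requests per minute, capture factor).
TOP_REQUESTS = [
    ("URI A", 900, Fraction(1, 2)),
    ("URI B", 440, Fraction(1, 2)),
    ("URI C", 250, Fraction(1)),
    ("URI D", 60, Fraction(1)),
    ("50 other URIs", 100, Fraction(1)),
]


def captured_per_minute(rows):
    """Return {request: captured traces per minute} for each table row."""
    return {name: int(count * factor) for name, count, factor in rows}


captured = captured_per_minute(TOP_REQUESTS)
total_captured = sum(captured.values())

print(captured["URI A"], captured["URI B"])  # 450 220
print(total_captured)                        # 1080
```

Only the two most frequent URIs are sampled at 1/2; every other request keeps a capture factor of 1, which is how the total stays close to the roughly 1,000 traces per minute available in this example.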

OneAgent continues to capture end-to-end transactions every minute. However, every 15 minutes:

  • If the used trace volume is below the included trace volume and the capture rate is not 100%, Adaptive Traffic Management increases the OneAgent capture rate to capture more requests per minute.
  • If the used trace volume is above the included trace volume, Adaptive Traffic Management reduces the OneAgent capture rate for less important URIs (URIs A and B) to capture fewer requests per minute, until the used trace volume no longer exceeds the included volume.
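
The two adjustment rules can be pictured as a simple feedback step. The following Python sketch is an assumption-laden illustration, not the actual Dynatrace algorithm (which is not described here and works per URI): the function name `adjust_capture_rate` is hypothetical, and it scales a single overall capture rate by the ratio of included to used volume.

```python
def adjust_capture_rate(capture_rate, used_volume, included_volume):
    """Return the capture rate (0 < rate <= 1) for the next 15-minute interval.

    Hypothetical proportional rule: above the included volume the rate
    shrinks, below it the rate grows, and it never exceeds 100%.
    """
    if used_volume <= 0:
        # Nothing was ingested, so there is no reason to sample.
        return 1.0
    return min(1.0, capture_rate * included_volume / used_volume)


print(adjust_capture_rate(0.5, 200, 100))  # 0.25  (over the limit: capture less)
print(adjust_capture_rate(0.5, 80, 100))   # 0.625 (under the limit: capture more)
print(adjust_capture_rate(1.0, 80, 100))   # 1.0   (already capturing everything)
```

Volumes can be passed in any unit as long as both arguments use the same one, for example bytes per minute.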

Trace volume calculation

The Full-Stack included trace volume is measured in bytes per minute and is calculated based on the number of gibibytes that contribute to your environment's GiB-hours.

Each environment can process a minimum trace volume. For each contributing gibibyte, the environment peak trace volume is increased by a number of kibibytes per minute.

Every 15 minutes, the peak trace volume is calculated and automatically adjusted based on the average of contributing gibibytes in the previous 15-minute interval.
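
As a rough illustration of this calculation, the Python sketch below averages the contributing gibibytes over the previous interval and derives a per-minute volume from it. It rests on assumptions that this page does not confirm: the per-gibibyte increments are added on top of the environment minimum, and the numeric values in the usage example are placeholders rather than licensed figures. The function and parameter names are hypothetical.

```python
KIB = 1024  # bytes per kibibyte


def included_trace_volume(gib_samples, kib_per_gib_per_min, minimum_kib_per_min):
    """Return the included trace volume in bytes per minute.

    gib_samples: contributing gibibytes observed during the previous
    15-minute interval (for example, one sample per minute).
    Assumption: minimum volume plus a fixed increment per contributing GiB.
    """
    average_gib = sum(gib_samples) / len(gib_samples) if gib_samples else 0.0
    volume_kib = minimum_kib_per_min + average_gib * kib_per_gib_per_min
    return int(volume_kib * KIB)


# Placeholder inputs: memory fluctuates between 60 and 68 GiB (average 64 GiB).
samples = [60] * 5 + [64] * 5 + [68] * 5
print(included_trace_volume(samples, kib_per_gib_per_min=200, minimum_kib_per_min=1000))
# 14131200
```

Averaging over the interval means that a short-lived spike or dip in connected memory shifts the volume only partially.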

With Dynatrace Platform Subscription (DPS), all features (especially data-heavy ones like bind variables capture) are available.

To learn more about the included trace volume, see Full-Stack Monitoring.

Effects

If OneAgent is sampling and not all requests are captured, the captured traces indicate that similar requests were not captured, with the message [number of traces] x. To see it, expand the trace in the Distributed Tracing list.

Monitoring

To monitor your environment trace capture rate and volume ingress, go to Dashboards and select the ready-made dashboard Full-Stack Adaptive Traffic Management and trace capture.

Adaptive Traffic Management dashboard with Dynatrace Platform Subscription (DPS)

The dashboard tiles and their descriptions are listed below.

Request capture rate

Captured requests, as a percentage of the total number of transactions processed by the OneAgent-monitored application or host.

Trace capture rate

Captured traces, as a percentage of the total number of observed end-to-end transactions processed by the OneAgent-monitored application or host. Note that the trace capture rate might be lower than the request capture rate because a single trace might consist of multiple requests.

Full stack trace data volume

Amount of trace data ingested from Full-Stack monitored applications or hosts. The chart includes:

  • The trace data volume captured by OneAgent and regulated by Adaptive Traffic Management (fullstack-adaptive-ingested_bytes_sum; green bar).
  • The included trace volume based on the contributing Full-Stack memory-gibibytes (included_limit; blue line).
  • The OpenTelemetry trace data volume ingested from Full-Stack monitored applications or hosts and not regulated by Adaptive Traffic Management. It can exceed the included limit, and the excess is charged.

Full-Stack trace volume used

Ingested trace volume, as a percentage of your licensed Full-Stack included trace volume. Adaptive Traffic Management keeps it around the Full-Stack included limit. Dynatrace's algorithm accounts for a degree of fluctuation, so the used trace volume can be above 100% without extra charges, unless you opted for Extended trace ingest on top of Full-Stack Monitoring.

Average size of Full-Stack spans

Average size of spans ingested from Full-Stack monitored applications or hosts. Typical values are in the 1.5-2 KiB range; if the span size is larger and the used trace volume is high (or the trace capture rate is low), you might be capturing a lot of data per span.

Adaptive trace volume per contributing memory-gibibytes per minute

Average trace volume every 15 minutes (trace_volume_per_gibh; green bar). Full-Stack Monitoring threshold starts from 200 KiB/min.

Full-Stack trace ingest and billable extended ingest

The relationship between the amount of ingested trace data (included_ingested_byte_sum; green bar) and the included trace volume (included_limit; blue line).

If you opted for Extended trace ingest on top of Full-Stack Monitoring:

  • Adaptive Traffic Management adjusts the trace ingest to the configured limit (configured_limit; red line) instead of the included limit.
  • The chart includes the extended trace volume charged via Trace Ingest & Process (billingAmount; orange bar).
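
Two of the values shown on the dashboard are simple ratios, and computing them by hand can help when reading the tiles. The Python sketch below is illustrative only: the function names are hypothetical, the input numbers are invented, and the thresholds in `spans_look_heavy` are assumptions loosely based on the 1.5-2 KiB range mentioned above.

```python
KIB = 1024  # bytes per kibibyte


def volume_used_percent(ingested_bytes, included_bytes):
    """Ingested trace volume as a percentage of the included trace volume."""
    return 100.0 * ingested_bytes / included_bytes


def average_span_kib(ingested_bytes, span_count):
    """Average span size in KiB."""
    return ingested_bytes / span_count / KIB


def spans_look_heavy(avg_span_kib, used_percent,
                     typical_max_kib=2.0, high_usage_percent=100.0):
    """True if spans exceed the typical size while the used volume is high.

    Both thresholds are assumptions chosen for this example.
    """
    return avg_span_kib > typical_max_kib and used_percent >= high_usage_percent


# Invented sample: 3 MiB ingested in 1,000 spans against a 3,000,000-byte limit.
ingested = 3 * 1024 * 1024
used = volume_used_percent(ingested, 3_000_000)
span_size = average_span_kib(ingested, 1000)
print(round(used, 1), round(span_size, 2), spans_look_heavy(span_size, used))
```

A result of True here would suggest looking at data-heavy features or request attributes before assuming that traffic volume alone is responsible.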

Frequently asked questions

Usually, Adaptive Traffic Management doesn't affect your charts or analyses at all.

The shaping of traffic is accounted for transparently and done in a way that ensures statistical validity while capturing rare requests with high probability. All charts show the total number of requests that your application processes; these values are either accurate or have very high statistical validity. The same is true for all ad-hoc analyses. You will not see a difference in charts or service call analysis data unless you're looking at a single distributed trace.

No, Adaptive Traffic Management focuses only on the number of traces. Neither service settings nor (global) request settings are modified by Adaptive Traffic Management. Depending on the capture rate and sampling, a low-volume or unique request might not be captured. Service settings such as request naming rules and key request settings will apply only to captured traces.

Yes, in a few cases, as service monitoring metrics are based on captured traces. The following are some known effects.

  • For low-frequency requests in high-volume environments, sampling and a low capture rate can impact the accuracy of metrics. Due to the low frequency of the requests, traces might be captured in a lower volume or not be captured at all. Consequently, some metric values can't be collected. Note that this is reflected in service metric calculations to avoid distortions in charts.
  • Because every single request is accounted for in charts with high resolution and in short timeframes, for high-volume services, sampling and a low capture rate might impact the accuracy of metrics such as request count or error count. Conversely, the accuracy will statistically be better in charts with low resolutions and long timeframes.
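
One way to picture why accuracy depends on request volume is to weight each captured trace by the inverse of its capture factor. The Python sketch below shows this generic extrapolation technique; it is not a description of how Dynatrace computes service metrics, and the function name `estimate_request_count` is hypothetical.

```python
from fractions import Fraction


def estimate_request_count(captured):
    """Estimate processed requests from (captured_traces, capture_factor) pairs.

    Each captured trace stands for 1 / capture_factor processed requests.
    """
    return sum(count / Fraction(factor) for count, factor in captured)


# High-volume request: 450 captured traces at a capture factor of 1/2.
print(estimate_request_count([(450, Fraction(1, 2))]))  # 900

# Low-frequency request processed 3 times at a capture factor of 1/2:
# depending on which requests are sampled, for example 1 or 2 traces are
# captured, so the estimate is 2 or 4 rather than the true value of 3.
print(estimate_request_count([(1, Fraction(1, 2))]))  # 2
print(estimate_request_count([(2, Fraction(1, 2))]))  # 4
```

The relative error of such an estimate shrinks as more traces fall into the evaluated window, which matches the observation that longer timeframes and lower chart resolutions give statistically better results.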

If the OneAgent capture rate is below 100%, sampling has been applied because the number of traces that OneAgent could capture has exceeded the Full-Stack included trace volume. There are several things you can do to increase the capture rate:

  • Verify what is currently being captured and reduce the rate for traces and requests of lower relevance. Start by looking at the following:

    • Excessive custom services

      Poorly configured custom services can lead to a high number of full-service calls and increase the trace ingress volume. If custom services are consuming a considerable amount of the trace volume, revisit the configuration to reduce the number of captured custom-service calls.

    • Background activity services

      In certain environments, background activity produces a lot of service calls but adds little value. To disable this feature on an environment-wide level, go to Settings > OneAgent features and turn off Background Requests for Services (HTTP/GRPC) and Background Requests for Services (Messaging).

    • High number of low-value traces

      In all environments, there are transactions for which traces are of lower value. You can exclude from capture:

      • Specific transactions such as ping and health check traces. To disable tracing for specific traces, go to Settings and select Server-side service monitoring > URL-based sampling.
      • Entire processes such as log forwarders written in a runtime for which deep monitoring by OneAgent is available. To disable tracing for specific processes, go to Settings > Server-side service monitoring > Deep monitoring.
    • Data-heavy features

      Data-heavy features can reduce the capture rate. You can:

      • Disable bind variables.
      • Reduce the span size by reducing the number of request attributes, or configure URL-based sampling to reduce or completely exclude capture of certain transactions.
  • Extend the trace ingest as a billed option on top of Full-Stack availability. To learn how, see Extended trace ingest on top of Full-Stack Monitoring.