Adaptive Traffic Management with Classic license

Adaptive Traffic Management manages the sampling rate dynamically and targets a specific peak trace data volume. This volume scales with the number of active host units in your Dynatrace Full-Stack Classic license.

Before you begin

Full-service call

A server-side call that starts a distributed trace, a service call at a deep-monitored tier, or a custom service call. A single distributed trace can contain multiple full-service calls. Full-service calls include all requests for web request services and web services (except for external ones), RMI services, messaging services, and custom services. External requests (such as database calls, external web requests, or generally any opaque service call) are not full-service calls and therefore aren't counted against your traffic limit. The minimum peak trace volume of an environment is 5,000 full-service calls per minute (the equivalent of 20 host units). Each process can start between 50 and 50,000 full-service calls per minute.

Active host units

Host units currently in use and connected to the environment (not the host units assigned to the environment).

Assigned host units

Host units currently assigned and connected to the environment, but not necessarily in use.
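The counting rule in the full-service call definition can be sketched as a simple classification. This is a minimal illustration that assumes a hypothetical `CallKind` enum whose members are named after the call types listed in the definition; OneAgent does not expose such an API.

```python
from enum import Enum, auto


class CallKind(Enum):
    """Hypothetical call categories, named after the types in the definition."""
    WEB_REQUEST = auto()           # web request services
    WEB_SERVICE = auto()           # internal (non-external) web services
    EXTERNAL_WEB_SERVICE = auto()  # external web services: not counted
    RMI = auto()
    MESSAGING = auto()
    CUSTOM_SERVICE = auto()
    DATABASE = auto()              # external request: not counted
    EXTERNAL_WEB_REQUEST = auto()  # external request: not counted
    OPAQUE = auto()                # any other opaque call: not counted


# Call kinds that count against the traffic limit.
FULL_SERVICE_KINDS = frozenset({
    CallKind.WEB_REQUEST,
    CallKind.WEB_SERVICE,
    CallKind.RMI,
    CallKind.MESSAGING,
    CallKind.CUSTOM_SERVICE,
})


def is_full_service_call(kind: CallKind) -> bool:
    return kind in FULL_SERVICE_KINDS


def count_full_service_calls(calls: list[CallKind]) -> int:
    """Count only the calls that consume trace volume."""
    return sum(1 for kind in calls if is_full_service_call(kind))
```

For example, a minute containing one web request, one messaging call, and two database calls counts as two full-service calls; the database calls are not counted.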

How does Adaptive Traffic Management work?

Dynatrace Full-Stack Monitoring packages a variety of features, including fully automatic distributed tracing. Each application or microservice is continuously monitored, and the Dynatrace code module collects distributed traces containing code-level and business insights and sends them to Dynatrace.

Full-Stack Monitoring includes a trace data volume. Depending on the number of application transactions, OneAgent captures end-to-end traces up to a peak trace volume, which your license defines per environment. When the transaction volume is high, the number of traces that OneAgent could capture might exceed the peak trace volume available in your environment. In other words, there are not enough active host units connected to your environment to capture all traces.

When this happens, OneAgent starts sampling new incoming traces that have a trace root span. It samples these traces as effectively as possible through the intelligent mechanism of Adaptive Traffic Management, which stops overages and consequently saves a lot of network bandwidth.

The resulting capture rate is defined as the OneAgent capture rate. While not all possible traces might be captured, any trace that is captured represents a full end-to-end transaction.
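Why every captured trace is complete can be illustrated with head-based sampling: the capture decision is made once, at the root span, and every downstream span inherits it. The sketch below assumes a fixed capture probability and hypothetical `TraceContext` and `Span` types; it shows the general technique, not OneAgent's internal implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TraceContext:
    """Hypothetical context that travels with a request across tiers."""
    trace_id: int
    captured: bool


@dataclass
class Span:
    name: str
    context: TraceContext
    children: list["Span"] = field(default_factory=list)


def start_root_span(name: str, capture_probability: float, rng: random.Random) -> Span:
    # The only sampling decision for the whole trace happens here.
    captured = rng.random() < capture_probability
    return Span(name, TraceContext(rng.getrandbits(64), captured))


def start_child_span(parent: Span, name: str) -> Span:
    # Downstream tiers reuse the parent's context instead of deciding again,
    # so a trace is either captured end-to-end or not at all.
    child = Span(name, parent.context)
    parent.children.append(child)
    return child


def walk(span: Span):
    """Yield a span and all of its descendants."""
    yield span
    for child in span.children:
        yield from walk(child)
```

Because child spans never re-sample, a lower capture rate reduces the number of traces, never the completeness of an individual trace.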

Differences from other sampling mechanisms

  • Sampling rate distribution

    Most sampling mechanisms manage traffic on the agent in an isolated and uncoordinated manner. In Dynatrace, the trace volume is dynamically shared between all monitored applications in the environment. In a sense, low-volume applications share their unused trace volume with high-volume applications that need it.

  • Sampling rate scenarios

    In static sampling systems, you configure fixed sampling rates and distribute them across your deployment to apply different rates to different scenarios. Adaptive Traffic Management captures requests automatically and efficiently; you can adjust the default logic and configure the capture rate for specific requests via URL-based sampling.

  • Costs associated with captured data

    In static sampling systems, the amount of captured data depends on the number of transactions executed in your system, which cannot be determined in advance, so the associated costs are often hard to predict. With Adaptive Traffic Management, the cost is determined by your license, and capturing scales with it.
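The difference between a fixed per-application rate and a shared environment volume can be sketched with max-min fair sharing, used here as a stand-in: applications that need less than an equal share keep their full demand, and the volume they leave unused goes to the busier ones. The application names and numbers are hypothetical, and Dynatrace does not document that it uses this exact algorithm.

```python
def share_trace_volume(demand: dict[str, int], budget: int) -> dict[str, float]:
    """Split an environment-wide budget across applications (max-min fair share).

    Stand-in algorithm: applications whose demand fits within the current
    equal share get everything they need; the rest is redistributed.
    """
    allocation: dict[str, float] = {}
    remaining = dict(demand)
    left = float(budget)
    while remaining:
        fair_share = left / len(remaining)
        # Applications whose demand fits within the current equal share.
        satisfied = {app: d for app, d in remaining.items() if d <= fair_share}
        if not satisfied:
            # Every remaining application wants more: split what is left evenly.
            for app in remaining:
                allocation[app] = fair_share
            break
        for app, d in satisfied.items():
            allocation[app] = float(d)
            left -= d
            del remaining[app]
    return allocation
```

With a budget of 10,000 calls per minute and demands of 9,000 (`checkout`), 4,000 (`search`), 200 (`admin`), and 800 (`reports`), the low-volume applications are fully served and `checkout` receives 5,000. An equal static split would have capped it at 2,500 while `admin` and `reports` left most of their share unused.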

Capturing logic

We recognize that requests are neither evenly distributed nor equally relevant to your observability goals. Traffic is rather a combination of a large number of unique URLs, a medium number of important requests, and a few kinds of requests that make up the majority of the traffic (for example, image requests or status checks).

Every minute, OneAgent first calculates a list of the top requests started during that minute. Based on this list, it then captures:

  • Most traces of unique and rare requests.
  • A significant but lower volume of highly frequent requests.

The trace volume is dynamically shared on the environment level between all monitored applications. In a sense, low-volume applications share their unused trace volume with high-volume applications that need it. This ensures that you use your trace volume most effectively. Because the sampling is not random, important data is captured while maintaining a statistically valid sample set.
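The first step of the capturing logic, ranking the requests of one minute by frequency, could look like the sketch below. The size of the top list (`n`) and the sample URIs are assumptions for illustration; how OneAgent sizes its top-request list is not described here.

```python
from collections import Counter


def top_requests(request_uris: list[str], n: int) -> list[tuple[str, int]]:
    """Return the n most frequent URIs started within one minute."""
    return Counter(request_uris).most_common(n)


def rare_requests(request_uris: list[str], n: int) -> dict[str, int]:
    """Return the URIs outside the top list; most of their traces are captured."""
    counts = Counter(request_uris)
    top = {uri for uri, _ in counts.most_common(n)}
    return {uri: count for uri, count in counts.items() if uri not in top}
```

In a minute dominated by image requests and health checks, those two URIs form the top list, while the checkout and order URIs fall into the rare group.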

The following table represents a top-request calculation example, along with the respective capture rates.

Request        | Requests processed per minute | Capture factor | Captured distributed traces per minute
URI A          | 900                           | 1/2            | 450
URI B          | 440                           | 1/2            | 220
URI C          | 250                           | 1              | 250
URI D          | 60                            | 1              | 60
…50 other URIs | 100                           | 1              | 100
Total          | 1,750                         |                | 1,080

In this example, OneAgent can capture a bit more than 1,000 requests per minute, based on the number of active host units connected to the license. Adaptive Traffic Management adjusts the capture rate for each URI to meet this target. Depending on the capture factor, URIs are captured every time the application processes them (URIs C and D and the 50 other URIs) or half of the time (URIs A and B). In both cases, the captured requests are traced end-to-end.
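One heuristic that reproduces the example table is to halve the capture factor of the most frequent URIs first until the expected captured volume fits the budget. This is a hypothetical sketch, not OneAgent's actual algorithm; it assumes a budget of 1,100 traces per minute (the text only says "a bit more than 1,000") and treats the 50 other URIs as a single bucket.

```python
from fractions import Fraction


def assign_capture_factors(counts: dict[str, int], budget: int) -> dict[str, Fraction]:
    """Halve capture factors, most frequent URI first, until the budget fits.

    Hypothetical heuristic for illustration only.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    factors = {uri: Fraction(1) for uri in counts}
    ranked = sorted(counts, key=counts.get, reverse=True)

    def captured() -> Fraction:
        # Expected captured traces per minute with the current factors.
        return sum(counts[uri] * factors[uri] for uri in counts)

    while captured() > budget:
        # One pass from the most to the least frequent URI.
        for uri in ranked:
            factors[uri] /= 2
            if captured() <= budget:
                break
    return factors
```

With the table's counts and a budget of 1,100, URIs A and B end up at 1/2 and the captured total is 1,080, as in the table. Because each pass goes from the most to the least frequent URI, rare URIs are reduced only after every more frequent URI has been halved in that pass.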

Peak trace volume calculation

The peak trace volume is measured in full-service calls per minute and is calculated based on the number of active host units in your environment.

Each environment has a minimum peak trace volume. For each active host unit, the environment's peak trace volume increases by a fixed value.

Every 15 minutes, the peak trace volume is calculated and automatically adjusted based on the current number of active host units.

To learn more about the trace volume included with Full-Stack Monitoring, see Application and Infrastructure Monitoring.
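The peak trace volume calculation can be sketched as follows. The value of 250 full-service calls per minute per host unit is an assumption inferred from the stated minimum (5,000 calls per minute, the equivalent of 20 host units), and combining the minimum and the per-host-unit value with `max` is one reading of the text, not a documented formula.

```python
MIN_PEAK_FSC_PER_MINUTE = 5_000  # stated minimum for an environment
FSC_PER_HOST_UNIT = 250          # assumption: 5,000 / 20 host units
RECALCULATION_INTERVAL_MIN = 15  # the peak volume is recalculated every 15 minutes


def peak_trace_volume(active_host_units: float) -> int:
    """Assumed peak trace volume in full-service calls per minute."""
    return max(MIN_PEAK_FSC_PER_MINUTE, round(FSC_PER_HOST_UNIT * active_host_units))


def peak_volume_per_minute(active_host_units_per_minute: list[float]) -> list[int]:
    """Recompute the limit every 15 minutes and keep it unchanged in between."""
    limits = []
    current = MIN_PEAK_FSC_PER_MINUTE
    for minute, host_units in enumerate(active_host_units_per_minute):
        if minute % RECALCULATION_INTERVAL_MIN == 0:
            current = peak_trace_volume(host_units)
        limits.append(current)
    return limits
```

Under these assumptions, an environment that grows from 40 to 100 active host units at minute 5 keeps its limit of 10,000 until the next recalculation at minute 15, when the limit rises to 25,000.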

Effects

If OneAgent is sampling and not all requests are captured, captured traces indicate how many similar requests were not captured with the message [number of traces] similar trace. To see this message, expand the trace in the Distributed Traces Classic list.

Monitoring

To monitor your environment trace capture rate and volume ingress, go to Dashboards or Dashboards Classic (latest Dynatrace) and select the OneAgent Traces - Adaptive traffic management (Classic License) dashboard.

Monitoring dashboard for ATMv2 Classic License

The dashboard contains the following tiles.

Dynatrace process rate

Percentage of full-service calls processed by OneAgent out of all full-service calls received by the environment. It represents the environment's health.

If the value is continuously below 90%, contact a Dynatrace product expert via live chat within your environment.

OneAgent capture rate

Percentage of traces captured end-to-end by OneAgent out of all traces received by the environment. Values below 100% indicate that sampling was applied to reduce the trace volume to within licensed limits.

Captured full service calls

Shows the number of received full-service calls (blue), the peak trace volume (red), and the overall potentially traceable service calls (green) processed by OneAgent over time.[1]

Size of full-service calls

Amount of data per service call over time.[2] Typical values are around 2 to 3 KiB per environment.

FSC/HU

Number of full-service calls per active host unit. Typical values are around 250.

[1] Values above the licensed limit indicate overages, which trigger sampling. After sampling is applied, values return to within licensed limits and the OneAgent capture rate drops below 100%.

[2] Service call data includes request attributes, span attributes, HTTP headers, and bind variables.
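The rates on the dashboard tiles are simple ratios. The sketch below assumes hypothetical per-minute counters (the field names are illustrative, not Dynatrace metric keys) and shows one way to check the "continuously below 90%" condition for the process rate over an observed window.

```python
from dataclasses import dataclass


@dataclass
class MinuteStats:
    """Hypothetical counters for one minute; not actual Dynatrace metric names."""
    received_fsc: int         # full-service calls received by the environment
    processed_fsc: int        # full-service calls processed by OneAgent
    received_traces: int      # traces received by the environment
    captured_traces: int      # traces captured end-to-end by OneAgent
    active_host_units: float


def process_rate(s: MinuteStats) -> float:
    return 100.0 * s.processed_fsc / s.received_fsc


def capture_rate(s: MinuteStats) -> float:
    # Below 100% means sampling was applied.
    return 100.0 * s.captured_traces / s.received_traces


def fsc_per_host_unit(s: MinuteStats) -> float:
    # Assumption: based on the full-service calls processed by OneAgent.
    return s.processed_fsc / s.active_host_units


def continuously_below(rates: list[float], threshold: float = 90.0) -> bool:
    """True if every value in a non-empty window is below the threshold."""
    return bool(rates) and all(r < threshold for r in rates)
```

For example, 9,500 processed out of 10,000 received full-service calls gives a process rate of 95%, and 3,000 captured out of 4,000 traces gives a capture rate of 75%.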

Frequently asked questions

How does sampling affect charts and analyses?

Usually not at all.

Traffic shaping is accounted for transparently and is done in a way that preserves statistical validity while capturing rare requests with high probability. All charts show the total number of requests that your application processes, and these numbers are either exact or have very high statistical validity. The same is true for all ad-hoc analyses. You will not see a difference in charts or service call analysis data unless you're looking at a single distributed trace.

Does Adaptive Traffic Management change service or request settings?

No, Adaptive Traffic Management focuses only on the number of traces. It modifies neither service settings nor (global) request settings. Depending on the capture rate and sampling, a low-volume or unique request might not be captured. Service settings such as request naming rules and key request settings apply only to captured traces.

Can sampling affect service monitoring metrics?

Yes, in a few cases, because service monitoring metrics are based on captured traces. The following are some known effects.

  • For low-frequency requests in high-volume environments, sampling and a low capture rate can impact the accuracy of metrics. Because these requests are rare, their traces might be captured in lower volume or not at all, so some metric values can't be collected. This is reflected in service metric calculations to avoid distortions in charts.
  • For high-volume services, sampling and a low capture rate might impact the accuracy of metrics such as request count or error count in charts with high resolution and short timeframes, because every single request is accounted for there. Conversely, accuracy is statistically better in charts with low resolution and long timeframes.
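The effect of timeframe and request frequency on accuracy can be illustrated with a simplified model. It assumes that each request is captured independently with a fixed probability, which is not how OneAgent samples (its sampling is not random), so treat it only as an intuition for why short, high-resolution buckets and rare requests are less accurate.

```python
import math


def estimated_count(captured: int, capture_probability: float) -> float:
    """Scale a sampled count back up to an estimate of the true count."""
    return captured / capture_probability


def relative_standard_error(true_count: int, capture_probability: float) -> float:
    """Relative error of that estimate if captured ~ Binomial(true_count, p)."""
    # Var(captured / p) = N * (1 - p) / p for independent sampling.
    variance = true_count * (1 - capture_probability) / capture_probability
    return math.sqrt(variance) / true_count


def probability_never_captured(true_count: int, capture_probability: float) -> float:
    """Chance that none of a rare request's occurrences is captured."""
    return (1 - capture_probability) ** true_count
```

At a capture probability of 0.5 in this model, a one-minute bucket with 100 requests has a relative error of about 10%, while a longer timeframe with 10,000 requests is down to about 1%. A request that occurs only twice is missed entirely 25% of the time.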

Can the peak trace volume be based on assigned host units instead of active host units?

In Adaptive Traffic Management with Classic license, the peak trace volume is calculated based on the currently active host units. To use all the host units assigned to the environment, contact a Dynatrace product expert via live chat and provide a rationale.

This is an environment-wide change, so you need Dynatrace administrator permissions to turn this feature on.

How can I increase the OneAgent capture rate?

If the OneAgent capture rate is below 100%, sampling has been applied because the number of traces that OneAgent could capture exceeded the trace volume included with Full-Stack. There are several things you can do to increase the capture rate:

  • Verify what is currently being captured and reduce the rate for traces and requests of lower relevance. Start by looking at the following:

    • Excessive custom services

      Custom services with poor configurations can lead to a high number of full-service calls and increase the trace ingress volume. If custom services consume a considerable share of the trace volume, revisit their configuration to reduce the number of captured custom-service calls.

    • Background activity services

      In certain environments, background activity produces a lot of service calls but adds little value. To disable background requests environment-wide, go to Settings > OneAgent features and turn off Background Requests for Services (HTTP/GRPC) and Background Requests for Services (Messaging).

    • High number of low-value traces

      In all environments, there are transactions for which traces are of lower value. You can exclude the following transactions from capture:

      • Specific transactions, such as ping and health check traces. To disable tracing for specific transactions, go to Settings > Server-side service monitoring > Deep monitoring.

      • Entire processes (such as log forwarders) written in a runtime for which OneAgent deep monitoring is available. To disable tracing for entire processes, go to the process-level Settings > Deep monitoring.
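Excluding low-value traffic helps because the peak trace volume is fixed by your license: fewer competing full-service calls leave a larger share for the rest. The arithmetic below is a rough sketch that assumes the capture rate is simply the peak volume divided by demand; all numbers are hypothetical.

```python
def expected_capture_rate(peak_volume: int, demand: int) -> float:
    """Fraction of full-service calls that fit into the peak trace volume."""
    if demand <= 0:
        return 1.0
    return min(1.0, peak_volume / demand)


peak = 10_000          # hypothetical peak trace volume (calls per minute)
demand = 20_000        # hypothetical calls per minute, including health checks
health_checks = 5_000  # hypothetical low-value calls excluded from capture

before = expected_capture_rate(peak, demand)
after = expected_capture_rate(peak, demand - health_checks)
```

In this example, excluding the 5,000 health-check calls raises the expected capture rate for the remaining traffic from 50% to about 67%.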