Properly configured resource limits ensure optimal performance and stability of Dynatrace Operator components while preventing resource contention in your Kubernetes cluster. This guide helps you understand how to set appropriate resource limits based on your environment size and usage patterns.
The provided default resource limits have been validated through performance testing. These defaults performed well in the following environment:
The following five key indicators influence resource consumption across different Dynatrace Operator components:
| Indicator | Dynatrace Operator | Webhook | CSI driver |
|---|---|---|---|
| Namespaces | | | |
| Nodes | | | |
| DynaKubes | | | |
| Pods | | | |
| Number of OneAgent versions | | | |
By default, Dynatrace Operator monitors host availability in your cluster to detect the expected removal of host OneAgent pods, especially in scaling scenarios. This monitoring is not necessary in serverless environments.
In Operator versions earlier than 1.6.0, this functionality is always active and can't be turned off.
It can be turned off in versions 1.6.3 and 1.7.3, and in all newer Operator versions.
You can reduce the Operator's resource consumption by disabling host availability detection if your DynaKubes use only `applicationMonitoring`.
To disable host availability detection, set the following Helm value:
```yaml
operator:
  hostAvailabilityDetection: false
```
This optimization can reduce CPU and memory usage of the Operator, especially in large clusters with many nodes.
Cluster-wide setting: operator.hostAvailabilityDetection affects all DynaKubes managed by the Operator. Only disable this if you are certain that none of your DynaKubes require host-based monitoring. Disabling it when host OneAgents are required can cause false-positive host missing warnings during node scaling or other node-related operations.
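If you do decide to disable it, the value can also be applied directly with Helm. The following is a minimal sketch assuming the Operator was installed as a Helm release named `dynatrace-operator` from the `dynatrace/dynatrace-operator` chart into the `dynatrace` namespace; adjust the release, chart, and namespace names to your installation:

```sh
# Release, chart, and namespace names are examples; adjust to your installation.
helm upgrade dynatrace-operator dynatrace/dynatrace-operator \
  --namespace dynatrace \
  --reuse-values \
  --set operator.hostAvailabilityDetection=false
```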
While the default resource limits should be sufficient for most use cases, you can customize them based on your specific needs.
Modify `values.yaml` to set resource limits for Dynatrace Operator, the webhook, or the Dynatrace Operator CSI driver.
Dynatrace Operator
```yaml
operator:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 100m
    memory: 128Mi
```
Webhook
```yaml
webhook:
  requests:
    cpu: 300m
    memory: 128Mi
  limits:
    cpu: 300m
    memory: 128Mi
```
CSI driver
```yaml
csidriver:
  csiInit:
    resources:
      requests:
        cpu: 50m
        memory: 100Mi
      limits:
        cpu: 50m
        memory: 100Mi
  server:
    resources:
      requests:
        cpu: 50m
        memory: 100Mi
      limits:
        cpu: 50m
        memory: 100Mi
  provisioner:
    resources:
      requests:
        cpu: 300m
        memory: 100Mi
  job:
    resources:
      requests:
        cpu: 200m
        memory: 30Mi
  registrar:
    resources:
      requests:
        cpu: 20m
        memory: 30Mi
      limits:
        cpu: 20m
        memory: 30Mi
  livenessprobe:
    resources:
      requests:
        cpu: 20m
        memory: 30Mi
      limits:
        cpu: 20m
        memory: 30Mi
```
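After upgrading the Helm release with your modified values, you can verify which requests and limits the pods actually received. A sketch using kubectl, assuming the components run in the `dynatrace` namespace (adjust as needed):

```sh
# List containers with their configured CPU requests and limits (namespace is an example).
kubectl -n dynatrace get pods \
  -o custom-columns='POD:.metadata.name,CONTAINERS:.spec.containers[*].name,REQ_CPU:.spec.containers[*].resources.requests.cpu,LIM_CPU:.spec.containers[*].resources.limits.cpu'
```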
The CSI driver provisioner and job components do not have default resource limits specified. This allows them to use additional resources when available, improving performance.
The job component is only used with node-image-pull.
If you set limits for these components, ensure they are high enough to avoid the following scenarios:
The default resource requests and limits are designed for medium-scale environments. Use the following guidelines to adjust limits based on your environment size. These are starting recommendations; always monitor actual resource usage in your environment and adjust accordingly.
Pod Quality of Service Classes: Some components have their limits and requests set to the same value to ensure a Guaranteed Pod Quality of Service. When scaling the limits of such components, always scale the requests proportionally as well.
Proportional fairness: Low CPU requests on a container can lead to heavier throttling because of the node's CPU management policy. The request serves as a minimum guarantee and as the weight used to distribute CPU time under contention. On a heavily utilized node, containers with smaller requests are therefore throttled more than containers with larger requests, regardless of their limits. When scaling the limits, always consider scaling the requests as well.
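For example, here is a minimal sketch of scaling the webhook, one of the components whose requests equal its limits, to roughly double the defaults while keeping the Guaranteed QoS class; the exact numbers are illustrative:

```yaml
# Keep requests equal to limits so the pod retains the Guaranteed QoS class.
webhook:
  requests:
    cpu: 600m
    memory: 256Mi
  limits:
    cpu: 600m
    memory: 256Mi
```

You can confirm the resulting class with `kubectl get pod <webhook-pod> -n dynatrace -o jsonpath='{.status.qosClass}'`.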
The CSI driver job resource requests and limits do not need to scale based on environment size. They are independent of node count, pod count, and DynaKube count. However, you can adjust the CPU request to control how quickly the job completes, which determines how soon the CSI driver is ready to mount volumes:
| CPU request | Approximate completion time |
|---|---|
| 100m | ~1 min |
| 200m (default) | ~30 sec |
| 300m | ~25 sec |
This only affects the job's Running duration. It doesn't impact ContainerCreation or PodScheduling times.
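For example, if you want the job to finish in roughly 25 seconds instead of the default 30, you could raise its CPU request. A minimal sketch of the corresponding values override (memory left at the default):

```yaml
# A higher CPU request shortens the job's Running phase; see the table above.
csidriver:
  job:
    resources:
      requests:
        cpu: 300m
        memory: 30Mi
```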
Increase the default requests/limits by 50–100%:
Increase the default requests/limits by 100–200%:
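As an illustration, a sketch of the Dynatrace Operator defaults scaled by roughly 100%; treat these numbers as a starting point and tune them against observed usage:

```yaml
# Defaults from above doubled (~100% increase); adjust based on monitoring data.
operator:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 200m
    memory: 256Mi
```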