Set resource limits for Dynatrace Operator components
Properly configured resource limits ensure optimal performance and stability of Dynatrace Operator components while preventing resource contention in your Kubernetes cluster. This guide helps you understand how to set appropriate resource limits based on your environment size and usage patterns.
Default resource limits baseline
The provided default resource limits have been validated through performance testing. These defaults performed well in the following environment:
- 25 nodes (e2-standard-32 node type on Google Kubernetes Engine)
- 20 DynaKubes
- 2,500 namespaces
- 5,000 pods
Resource consumption factors
The following five key indicators influence resource consumption across different Dynatrace Operator components:
| Indicator | Dynatrace Operator | Webhook | CSI driver |
|---|---|---|---|
| Namespaces | ✅ | ✅ | – |
| Nodes | ✅ | – | – |
| DynaKubes | ✅ | ✅ | ✅ |
| Pods | – | ✅ | ✅ |
| Number of OneAgent versions | – | – | ✅ |
Understanding the impact indicators
Namespaces: More namespaces increase the workload for the Operator and webhook, as they need to monitor and manage resources across all namespaces.
Impact:
- Increases CPU/memory usage of the Operator
- Increases CPU usage of the webhook

Nodes: Additional nodes require more resources, as the Operator keeps a list of all available nodes in the Kubernetes cluster and verifies that they match the available hosts on the Dynatrace server.
Impact:
- Increases CPU/memory usage of the Operator

DynaKubes: Each DynaKube resource represents a separate Dynatrace deployment that needs individual management.
Impact:
- Increases CPU/memory usage of the Operator
- Increases CPU/memory usage of the webhook
- Increases CPU/memory usage of the CSI driver provisioner

Pods: The webhook processes admission requests for every pod, while the CSI driver handles volume mounting for pods using OneAgent.
Impact:
- Increases CPU usage of the webhook
- Increases CPU/memory usage of the CSI driver server/liveness-probe/registrar

OneAgent versions: The CSI driver needs to manage and provide access to different OneAgent versions, which requires additional storage and processing resources.
Impact:
- Increases CPU/memory usage of the CSI driver provisioner
Minimize the impact of a large number of nodes
By default, Dynatrace Operator monitors host availability in your cluster to detect the expected removal of host OneAgent pods, especially in scaling scenarios. This monitoring is not necessary in serverless environments.
When to disable host availability detection
This functionality is always on and can't be turned off in Operator versions earlier than 1.6.0. It can be turned off in versions 1.6.3 and 1.7.3, and in all newer Operator versions.
You can reduce the Operator's resource consumption by disabling host availability detection if your DynaKubes meet both of the following conditions (a qualifying DynaKube is sketched below):
- only use application-only monitoring modes (applicationMonitoring), and
- do not use any of the host-based monitoring features.
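For orientation, a qualifying DynaKube could look like the following minimal sketch. The resource names, the API URL placeholder, and the CRD version are assumptions here; check the DynaKube reference for the version matching your Operator release.

```yaml
apiVersion: dynatrace.com/v1beta2   # assumed CRD version; may differ per Operator release
kind: DynaKube
metadata:
  name: dynakube                    # hypothetical name
  namespace: dynatrace
spec:
  apiUrl: https://ENVIRONMENTID.live.dynatrace.com/api   # placeholder
  oneAgent:
    # Application-only monitoring; none of the host-based modes
    # (classicFullStack, hostMonitoring, cloudNativeFullStack) are set.
    applicationMonitoring: {}
```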
How to disable host availability detection
To disable host availability detection, set the following Helm value:
```yaml
operator:
  hostAvailabilityDetection: false
```
This optimization can reduce CPU and memory usage of the Operator, especially in large clusters with many nodes.
Cluster-wide setting: operator.hostAvailabilityDetection affects all DynaKubes managed by the Operator. Disable it only if you are certain that none of your DynaKubes require host-based monitoring. Disabling it while host OneAgents are required can cause false-positive missing-host warnings during node scaling or other node-related operations.
Customize resource limits
While the default resource limits should be sufficient for most use cases, you can customize them based on your specific needs. If you installed Dynatrace Operator with Helm, override the following chart values.
Dynatrace Operator
```yaml
operator:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 100m
    memory: 128Mi
```
Webhook
```yaml
webhook:
  requests:
    cpu: 300m
    memory: 128Mi
  limits:
    cpu: 300m
    memory: 128Mi
```
CSI driver
```yaml
csidriver:
  csiInit:
    resources:
      requests:
        cpu: 50m
        memory: 100Mi
      limits:
        cpu: 50m
        memory: 100Mi
  server:
    resources:
      requests:
        cpu: 50m
        memory: 100Mi
      limits:
        cpu: 50m
        memory: 100Mi
  provisioner:
    resources:
      requests:
        cpu: 300m
        memory: 100Mi
      limits:
        cpu: 300m
        memory: 100Mi
  registrar:
    resources:
      requests:
        cpu: 20m
        memory: 30Mi
      limits:
        cpu: 20m
        memory: 30Mi
  livenessprobe:
    resources:
      requests:
        cpu: 20m
        memory: 30Mi
      limits:
        cpu: 20m
        memory: 30Mi
```
If you deployed Dynatrace Operator with manifests instead of Helm, set the same values directly in the pod templates of the corresponding workloads.

Dynatrace Operator
```yaml
spec:
  template:
    spec:
      containers:
        - name: dynatrace-operator
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
            limits:
              cpu: 100m
              memory: 128Mi
```
Webhook
```yaml
spec:
  template:
    spec:
      containers:
        - name: webhook
          resources:
            requests:
              cpu: 300m
              memory: 128Mi
            limits:
              cpu: 300m
              memory: 128Mi
```
CSI driver
csi-init
```yaml
spec:
  template:
    spec:
      initContainers:
        - name: csi-init
          resources:
            requests:
              cpu: 50m
              memory: 100Mi
            limits:
              cpu: 50m
              memory: 100Mi
```
server
```yaml
spec:
  template:
    spec:
      containers:
        - name: server
          resources:
            requests:
              cpu: 50m
              memory: 100Mi
            limits:
              cpu: 50m
              memory: 100Mi
```
provisioner
```yaml
spec:
  template:
    spec:
      containers:
        - name: provisioner
          resources:
            requests:
              cpu: 300m
              memory: 100Mi
            limits:
              cpu: 300m
              memory: 100Mi
```
registrar
```yaml
spec:
  template:
    spec:
      containers:
        - name: registrar
          resources:
            requests:
              cpu: 20m
              memory: 30Mi
            limits:
              cpu: 20m
              memory: 30Mi
```
liveness-probe
```yaml
spec:
  template:
    spec:
      containers:
        - name: liveness-probe
          resources:
            requests:
              cpu: 20m
              memory: 30Mi
            limits:
              cpu: 20m
              memory: 30Mi
```
Scaling resource limits for different environments
The default resource requests/limits are designed for medium-scale environments. Use the following guidelines to adjust limits based on your environment size:
These are starting recommendations. Always monitor actual resource usage in your environment and adjust accordingly.
Pod Quality of Service classes: Some components have their limits and requests set to the same value to ensure the Guaranteed Pod Quality of Service class. When scaling the limits of such components, always scale the requests proportionally as well.

Proportional fairness: Low CPU requests on a container can cause throttling due to the node's CPU management policy; the requests serve as a minimum guarantee. On a heavily utilized node, containers with smaller requests are therefore throttled more than containers with larger requests, regardless of their limits. When scaling limits, always consider scaling the requests as well.
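As an illustration only (the numbers are hypothetical, not an official sizing recommendation), scaling the webhook for a larger environment while keeping its Guaranteed QoS class could look like this:

```yaml
webhook:
  requests:
    cpu: 600m        # scaled 2x from the 300m default, together with the limit
    memory: 256Mi    # scaled 2x from the 128Mi default
  limits:
    cpu: 600m        # limits == requests preserves the Guaranteed QoS class
    memory: 256Mi
```

Because requests still equal limits for both CPU and memory, the pod keeps its Guaranteed QoS class, and the proportionally higher requests also protect it from throttling on heavily utilized nodes.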