Maximize your cluster resources and reduce costs by identifying and optimizing underutilized workloads. Leverage the Kubernetes app alongside advanced queries in Notebooks, powered by data from Grail, for precise resource allocation suggestions.
This guide is tailored for Ops, DevOps, and DevSecOps professionals managing Kubernetes clusters. A basic understanding of Kubernetes concepts such as resource requests/limits, pods, and nodes is assumed, though expertise in Dynatrace or cluster management is not required.
Strategies applicable for optimizing a specific resource type, such as CPU, can similarly be adapted for others, like memory.
The guide proceeds in four steps:

1. Identify clusters for optimization
2. Analyze cluster workloads
3. Optimize workload resources
4. Address workloads lacking requests or limits
Select one or more clusters for detailed analysis, focusing on those with the highest levels of unused CPU resources. This can be accomplished by organizing clusters based on their CPU slack.
Slack refers to the difference between the resources requested (CPU or memory) and those actually utilized: slack = requested resources - actual usage. Ideally, slack should be minimized while still leaving a buffer for usage fluctuations; approaches for striking that balance are discussed in step 3.
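If you prefer ranking clusters in a Notebook rather than sorting in the app, a rough DQL sketch such as the one below can approximate the average CPU slack per cluster. It is not the app's own query; the usage metric key dt.kubernetes.container.cpu_usage and the dt.entity.kubernetes_cluster dimension are assumptions and may need to be adapted to the metrics available in your environment.

```dql
// Approximate average CPU slack (requested minus used) per cluster over the last 24 hours.
// Assumption: dt.kubernetes.container.cpu_usage is the usage counterpart of the requests metric.
timeseries requested = sum(dt.kubernetes.container.requests_CPU),
           used = sum(dt.kubernetes.container.cpu_usage),
           by:{dt.entity.kubernetes_cluster},
           from: -24h,
           filter: dt.kubernetes.container.type == "app"
| fieldsAdd cpu_slack = arrayAvg(requested) - arrayAvg(used)
| sort cpu_slack desc
```

Clusters at the top of such a list are the ones where right-sizing promises the largest savings.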
Examine the specific workloads within the chosen cluster. Navigate to the workload overview via the cluster's page to observe the CPU slack visually.
Selecting View in workloads list applies a filter for the selected cluster, transitioning the display to an aggregated view of all workloads.
Prior to diving into our workload analysis, it's beneficial to apply another filter to ensure we're only examining healthy workloads. Given that slack is determined by current resource usage, metrics from workloads that are restarting or in an unhealthy or pending state may not accurately reflect typical usage patterns. To focus our analysis on reliable data, we'll add a filter to display only healthy workloads.
Select Add filter next to any existing filters and select Health with Healthy value.
The perspective and sorting preferences established at the cluster level persist, allowing you to quickly identify the workload exhibiting the greatest CPU slack, namely the cartservice workload.
Upon selecting this workload, it becomes apparent that there's a significant gap between the actual usage and the requested resources, and notably, the margin between the request and limit is minimal. Given that the Kubernetes app presents only the most recent data, we must verify these usage figures to ensure they accurately represent typical behavior, rather than being anomalies.
To verify the consistency of these usage patterns, let's examine the historical CPU usage data. This is easily accomplished by transitioning from the Overview tab to the Utilization tab.
From there, we access the detailed data by selecting More (…) > Open in Notebook, which opens the corresponding DQL query in Notebooks.
Now, we'll broaden our analysis window to encompass a larger timeframe, examining the workload's historical data. For this scenario, a seven-day period is selected, given the service's resemblance to a webshop, where usage patterns may vary significantly across the week.
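The exact query the app opens may differ; as an illustration, a minimal stand-in that charts seven days of usage against the configured requests for the cartservice workload could look like the following sketch (the usage metric key and the k8s.workload.name dimension are assumptions, not taken from the opened query).

```dql
// Seven days of CPU usage vs. configured CPU requests for a single workload.
// Assumptions: dt.kubernetes.container.cpu_usage exists and both metrics carry a k8s.workload.name dimension.
timeseries used = sum(dt.kubernetes.container.cpu_usage),
           requested = sum(dt.kubernetes.container.requests_CPU),
           from: -7d,
           filter: k8s.workload.name == "cartservice" and dt.kubernetes.container.type == "app"
```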
Upon review, we can see that our service's CPU usage is less than 0.1 CPUs (illustrated by the blue line at the chart's lower end), substantially below the 0.8 CPUs requested (represented by the green line at the top).
You can now fine-tune the workload's resource requests and limits effectively. However, there are critical considerations in this process that might not be immediately apparent:
A workload might consist of multiple pods, which in turn might consist of multiple containers. In this example, the workload consists of a single pod with a single container, so the workload's resources are identical to that container's resources. If there are multiple pods and/or containers, you have to work out how to adjust the respective requests or limits at the container level accordingly.
You can use Dynatrace to drill down to the pod/container level by navigating from the workload's overview page and selecting View in pods list under the Pods section. The overview page of a pod offers a corresponding View in containers list link.
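Alternatively, the breakdown can be queried directly in a Notebook. The sketch below assumes that k8s.pod.name and k8s.container.name are available as dimensions on the container metrics and that the usage metric key exists in your environment.

```dql
// Average CPU requests and usage per pod and container of the cartservice workload (last 24 hours).
// Assumptions: dt.kubernetes.container.cpu_usage exists; k8s.pod.name and k8s.container.name are valid dimensions.
timeseries requested = sum(dt.kubernetes.container.requests_CPU),
           used = sum(dt.kubernetes.container.cpu_usage),
           by:{k8s.pod.name, k8s.container.name},
           from: -24h,
           filter: k8s.workload.name == "cartservice" and dt.kubernetes.container.type == "app"
| fieldsAdd avg_requested = arrayAvg(requested), avg_used = arrayAvg(used)
| fields k8s.pod.name, k8s.container.name, avg_requested, avg_used
```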
With these factors in mind, it's evident that our current request and limit settings are likely excessive. Observing minor usage spikes, we opt to reduce the CPU request to 100m, a conservative figure that still ensures stable performance during sudden increases in demand.
For the limit, we choose 200m, a threshold unlikely to be reached under normal conditions. Even lower limits would be possible, since hitting a CPU limit only throttles the container rather than terminating it (as a memory limit would), but we prefer a cautious approach and leave room for future adjustments based on ongoing performance monitoring.
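In the workload's manifest, these values belong in the container's resources section. The snippet below is only an illustrative skeleton: the Deployment structure, labels, and image are placeholders, and only the resources block reflects the values chosen above.

```yaml
# Illustrative Deployment skeleton; names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cartservice
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cartservice
  template:
    metadata:
      labels:
        app: cartservice
    spec:
      containers:
        - name: server                                    # placeholder container name
          image: registry.example.com/cartservice:latest  # placeholder image
          resources:
            requests:
              cpu: 100m   # conservative request that still covers the observed spikes
            limits:
              cpu: 200m   # unlikely to be reached under normal conditions
```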
Excessively stringent limits may cause CPU throttling even when average usage stays below the configured threshold. This is due to the Completely Fair Scheduler (CFS) mechanism Kubernetes uses to enforce limits, which allocates CPU time in periods of 100 milliseconds. For instance, a limit of 0.4 CPU (400m) means the process may run for 40 milliseconds of every 100-millisecond period and must wait for the next period if it needs more; the 200m limit chosen above corresponds to 20 milliseconds per period.
Through this optimization, we've reclaimed more than half a CPU core by adjusting a single workload's resources.
Workloads without resource requests pose a significant challenge in Kubernetes clusters. Requests matter because they play a key role in Kubernetes scheduling decisions: when workloads lack specified requests, they can be placed on any node regardless of that node's available resources, which can strain nodes with limited capacity and potentially lead to workload evictions and disruptions.
In practice, however, not every workload has these requests configured. The good news is that identifying such workloads is straightforward with DQL.
We use the following query, which is based on the table shown in the Kubernetes app:
```dql
fetch dt.entity.cloud_application, from: -30m
| fields id, workload.name = entity.name, workload.type = arrayFirst(cloudApplicationDeploymentTypes), cluster.id = clustered_by[dt.entity.kubernetes_cluster], namespace.name = namespaceName
| lookup [fetch dt.entity.kubernetes_cluster, from: -30m
    | fields id, cluster.name = entity.name, cluster.distribution = kubernetesDistribution, cluster.cluster_id = kubernetesClusterId
    | limit 20000], sourceField:cluster.id, lookupField:id, fields:{cluster.name}
| fieldsRemove cluster.id
| filter cluster.name == "demo-aks-ger-westcentral-asc"
| filterOut namespace.name == "kube-system"
| lookup [timeseries values = sum(dt.kubernetes.container.requests_CPU), by:{dt.entity.cloud_application}, from: -2m, filter: dt.kubernetes.container.type == "app"
    | fieldsAdd requests_CPU = arrayFirst(values)
    | limit 20000], sourceField:id, lookupField:dt.entity.cloud_application, fields:{requests_CPU}
| lookup [timeseries values = sum(dt.kubernetes.container.requests_memory), by:{dt.entity.cloud_application}, from: -2m, filter: dt.kubernetes.container.type == "app"
    | fieldsAdd requests_memory = arrayFirst(values)
    | limit 20000], sourceField:id, lookupField:dt.entity.cloud_application, fields:{requests_memory}
| filter isNull(requests_CPU) or isNull(requests_memory)
```
This query generates a list of all workloads lacking either CPU or memory requests. It specifically targets our cluster and excludes any entities from the kube-system namespace, acknowledging that certain workloads must be scheduled regardless of a node's resource availability.
The result might look like this: