Connectivity issues between Dynatrace and Kubernetes cluster
This guide explores common issues that may arise when monitoring Kubernetes with Dynatrace. It provides troubleshooting steps for various scenarios, such as pods getting stuck in the Terminating state after an upgrade, being unable to retrieve the complete list of server APIs, and encountering a CrashLoopBackOff error when trying to downgrade OneAgent.
Problem with ActiveGate token
Example error on the ActiveGate deployment status page:
Problem with ActiveGate token (reason:Absent)
Example error in the Dynatrace Operator logs:
{"level":"info","ts":"2022-09-22T06:49:17.351Z","logger":"dynakube-controller","msg":"reconciling DynaKube","namespace":"dynatrace","name":"dynakube"}
{"level":"info","ts":"2022-09-22T06:49:17.502Z","logger":"dynakube-controller","msg":"problem with token detected","dynakube":"dynakube","token":"APIToken","msg":"Token on secret dynatrace:dynakube missing scopes [activeGateTokenManagement.create]"}
Example error in the DynaKube status:
status:
  ...
  conditions:
  - message: Token on secret dynatrace:dynakube missing scopes [activeGateTokenManagement.create]
    reason: TokenScopeMissing
    status: "False"
    type: APIToken
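If you check several clusters, a small script can pick out the failing conditions instead of reading the whole status by eye. The following is a hypothetical Python sketch, not a Dynatrace tool: it assumes you have saved the JSON output of kubectl get dynakube <name-of-your-DynaKube> -n dynatrace -o json, and that each condition carries the type, status, reason, and message fields shown in the example above. The names failing_conditions and describe are illustrative.

```python
import json
from typing import Any, Dict, List


def failing_conditions(dynakube: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return every status condition whose status field is the string "False".

    Hypothetical helper; expects the parsed JSON of a DynaKube resource.
    """
    conditions = dynakube.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") == "False"]


def describe(condition: Dict[str, Any]) -> str:
    """Format one condition as 'type: reason - message'."""
    return "{}: {} - {}".format(
        condition.get("type", "?"),
        condition.get("reason", "?"),
        condition.get("message", ""),
    )


def report(path: str) -> List[str]:
    """Load a saved 'kubectl get dynakube ... -o json' file and describe failures."""
    with open(path, encoding="utf-8") as handle:
        resource = json.load(handle)
    return [describe(c) for c in failing_conditions(resource)]
```

For the status shown above, report("dynakube.json") would return a single line starting with APIToken: TokenScopeMissing.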
Starting with Dynatrace Operator version 0.9.0, Dynatrace Operator handles the ActiveGate token by default. If you get one of these errors, follow the instructions for your Dynatrace Operator version:
- For Dynatrace Operator versions earlier than 0.7.0: you need to upgrade to the latest Dynatrace Operator version.
- For Dynatrace Operator version 0.7.0 or later, but earlier than version 0.9.0: you need to create a new access token. For instructions, see Tokens and permissions required: Dynatrace Operator token.
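The version rule above is a pair of thresholds. As a minimal sketch, it can be written in Python as follows; the function names are hypothetical, the returned strings only paraphrase this guide, and the parser assumes a plain numeric version such as 0.8.1 or v0.8.1 (pre-release suffixes such as -rc1 are not handled and raise ValueError).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Turn '0.8.1' or 'v0.8.1' into (0, 8, 1); missing parts count as 0."""
    parts = [int(p) for p in version.strip().lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.9' to '0.9.0'
    return parts[0], parts[1], parts[2]


def activegate_token_action(version: str) -> str:
    """Map a Dynatrace Operator version to the fix described in this guide."""
    v = parse_version(version)
    if v < (0, 7, 0):
        return "upgrade Dynatrace Operator to the latest version"
    if v < (0, 9, 0):
        return "create a new access token (Dynatrace Operator token)"
    return "operator manages the ActiveGate token by default"
```

Comparing integer tuples rather than strings matters here: as strings, "0.10.0" sorts before "0.9.0", which would send a newer Operator down the wrong branch.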
ImagePullBackOff error on OneAgent and ActiveGate pods
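The container runtime on the underlying host doesn't have the certificate presented by your endpoint in its trust store. The skipCertCheck field in the DynaKube YAML doesn't control this certificate check, because the images are pulled by the container runtime, not by Dynatrace Operator.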
Example error (the error message may vary):
desc = failed to pull and unpack image "<environment>/linux/activegate:latest": failed to resolve reference "<environment>/linux/activegate:latest": failed to do request: Head "<environment>/linux/activegate/manifests/latest": x509: certificate signed by unknown authority
Warning Failed ... Error: ErrImagePull
Normal BackOff ... Back-off pulling image "<environment>/linux/activegate:latest"
Warning Failed ... Error: ImagePullBackOff
If the description of your pod (for example, the output of kubectl describe pod) shows x509: certificate signed by unknown authority, as in this example, you must either fix the certificates on your Kubernetes hosts or use the private repository configuration to store the images.
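When many pods are affected, this check can be scripted. The following hypothetical Python sketch scans event text (for example, saved kubectl describe pod output) for the x509 message and collects the image references whose pull failed. It only recognizes the message wording shown in the example error; because that wording may vary between container runtimes, treat an empty result as "not recognized" rather than "no problem".

```python
import re
from typing import List

# Substring taken from the example error above.
X509_UNKNOWN_AUTHORITY = "x509: certificate signed by unknown authority"
# Matches the image reference in messages worded like the example above;
# other container runtimes may phrase the message differently.
PULL_IMAGE = re.compile(r'failed to pull and unpack image "([^"]+)"')


def images_with_untrusted_certs(events: str) -> List[str]:
    """Return image references whose pull failed with the unknown-authority error."""
    images: List[str] = []
    for line in events.splitlines():
        if X509_UNKNOWN_AUTHORITY not in line:
            continue
        match = PULL_IMAGE.search(line)
        if match and match.group(1) not in images:
            images.append(match.group(1))
    return images
```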
There was an error with the TLS handshake
The certificate used for the communication is invalid or has expired. If you're using a self-signed certificate, check the mitigation procedures for the ActiveGate.
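To tell an expired certificate from an untrusted one, you can attempt a verified TLS handshake yourself. This is a hypothetical Python sketch using only the standard ssl and socket modules; the host and port you pass are placeholders for the endpoint in question (for example, the Kubernetes API endpoint you configured). It uses the default CA store of the machine it runs on, which may differ from the one ActiveGate uses, so a successful check here doesn't prove ActiveGate trusts the certificate.

```python
import socket
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days until a certificate's notAfter timestamp; negative once expired.

    not_after uses the format returned by ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def check_endpoint(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try a verified TLS handshake against host:port and summarize the result."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                not_after = tls.getpeercert()["notAfter"]
    except ssl.SSLCertVerificationError as err:
        # OpenSSL's reason, for example "certificate has expired".
        return "verification failed: {}".format(err.verify_message)
    return "certificate valid for another {:.0f} days".format(
        days_until_expiry(not_after)
    )
```

A self-signed certificate fails verification in check_endpoint unless its CA is in the local trust store, which matches the case the mitigation procedures address.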
Invalid bearer token
The Kubernetes API rejected the request because the bearer token is invalid. Verify the bearer token and make sure it doesn't contain any whitespace. If you're connecting to a Kubernetes cluster API through a centralized external role-based access control (RBAC) system, consult the documentation of your Kubernetes cluster manager. For Rancher, see the guidelines on the official Rancher website.
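Stray whitespace is easy to miss when a token is pasted. The following hypothetical Python helper (not part of any Dynatrace or Kubernetes tooling) reports whitespace problems in a token string; it only checks for whitespace and can't tell whether the token itself is valid.

```python
from typing import List


def bearer_token_problems(token: str) -> List[str]:
    """Return whitespace problems found in a bearer token (empty list if none)."""
    problems: List[str] = []
    if not token:
        problems.append("token is empty")
    if token != token.strip():
        # A trailing newline is a common result of copying a token from a file.
        problems.append("leading or trailing whitespace")
    if any(ch.isspace() for ch in token.strip()):
        problems.append("whitespace inside the token")
    return problems
```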
Could not check credentials. Process is started by other user
There is already a pending request for this integration with an ActiveGate. Wait a couple of minutes and check again.
Internal error occurred: failed calling webhook (…) x509: certificate signed by unknown authority
If you get this error after applying the DynaKube custom resource, your Kubernetes API server may be configured to use a proxy. You need to exclude https://dynatrace-webhook.dynatrace.svc from that proxy.
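How you add the exclusion depends on how the proxy is configured for your API server. Assuming it's set through the common HTTPS_PROXY and NO_PROXY environment variables (an assumption; check your distribution), the webhook host has to match a NO_PROXY entry. The hypothetical Python sketch below uses simplified matching rules (exact host or domain suffix, plus "*"); real proxy clients differ in details such as CIDR ranges, ports, and leading dots. The host name follows the Kubernetes <service>.<namespace>.svc pattern, so if Dynatrace Operator runs in a namespace other than dynatrace, adjust it accordingly.

```python
from urllib.parse import urlsplit

# Webhook URL from the paragraph above (namespace "dynatrace").
WEBHOOK_URL = "https://dynatrace-webhook.dynatrace.svc"


def is_excluded_from_proxy(url: str, no_proxy: str) -> bool:
    """Simplified NO_PROXY check: exact host or domain-suffix match.

    Entries such as "svc" or ".svc" both match any host ending in ".svc";
    "*" matches everything. CIDR ranges and ports are not handled.
    """
    host = (urlsplit(url).hostname or "").lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return True
        entry = entry.lstrip(".")
        if host == entry or host.endswith("." + entry):
            return True
    return False
```

For example, a NO_PROXY value of localhost,.svc covers the webhook, while localhost,127.0.0.1 does not.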
OneAgent unable to connect when using Istio
Applies to the cloudNativeFullStack and applicationMonitoring deployment modes.
Example error in the logs of the OneAgent pods: Initial connect: not successful - retrying after xs.
You can fix this problem by increasing the OneAgent timeout. Add the following feature flag to DynaKube:
kubectl annotate dynakube <name-of-your-DynaKube> feature.dynatrace.com/oneagent-initial-connect-retry-ms=6000 -n dynatrace
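If you later need a different value, or want to apply it to several DynaKubes, building the command programmatically can help. This is a hypothetical Python sketch. Unlike the command above, it adds kubectl's --overwrite flag, because kubectl annotate refuses to change an annotation that already has a value unless --overwrite is set. The 6000 ms default is simply the value from the command above, not a recommendation for every cluster.

```python
import subprocess
from typing import List

# Annotation key from the kubectl command above.
RETRY_ANNOTATION = "feature.dynatrace.com/oneagent-initial-connect-retry-ms"


def annotate_command(dynakube: str, retry_ms: int = 6000,
                     namespace: str = "dynatrace") -> List[str]:
    """Build the kubectl annotate command as an argument list.

    --overwrite lets kubectl replace a value that is already set.
    """
    if retry_ms <= 0:
        raise ValueError("retry_ms must be a positive number of milliseconds")
    return [
        "kubectl", "annotate", "dynakube", dynakube,
        "{}={}".format(RETRY_ANNOTATION, retry_ms),
        "--overwrite", "-n", namespace,
    ]


def apply_annotation(dynakube: str, retry_ms: int = 6000,
                     namespace: str = "dynatrace") -> None:
    """Run the command; requires kubectl and access to the cluster."""
    subprocess.run(annotate_command(dynakube, retry_ms, namespace), check=True)
```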
Connectivity issues when using Calico
If you use Calico to handle or restrict network connections, you might experience connectivity issues, such as:
- The operator, webhook, and CSI driver pods are constantly restarting
- The operator cannot reach the API
- The CSI driver fails to download OneAgent
- Injection into pods doesn't work
If you experience these or similar problems, use our GitHub sample policies for common problems.
- For the activegate-policy.yaml and dynatrace-policies.yaml policies, if Dynatrace Operator isn't installed in the dynatrace namespace (Kubernetes) or project (OpenShift), you need to adapt the metadata and namespace properties in the YAML files accordingly.
- The purpose of the agent-policy.yaml and agent-policy-external-only.yaml policies is to let OneAgents that are injected into pods open external connections. Only agent-policy-external-only.yaml is required, while agent-policy.yaml additionally allows internal connections, such as pod-to-pod connections, where needed.
- Because these policies are needed for all pods into which OneAgent is injected, you also need to adapt the podSelector property of the YAML files.
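Adapting the sample files for several namespaces is a mechanical edit. The following hypothetical Python sketch assumes the sample files are standard Kubernetes NetworkPolicy objects already loaded into Python dicts (for example with PyYAML's yaml.safe_load_all); the minimal policy used in the usage note is illustrative and not the content of the sample files. Keep in mind that a NetworkPolicy only selects pods in its own namespace, so a policy for injected pods has to exist in every namespace that contains them.

```python
import copy
from typing import Any, Dict


def adapt_policy(policy: Dict[str, Any], namespace: str,
                 pod_labels: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of a NetworkPolicy with a new namespace and podSelector.

    Only metadata.namespace and spec.podSelector are changed; any other
    namespace references in the policy are left untouched and need review.
    """
    adapted = copy.deepcopy(policy)  # never modify the loaded sample in place
    adapted.setdefault("metadata", {})["namespace"] = namespace
    adapted.setdefault("spec", {})["podSelector"] = {"matchLabels": dict(pod_labels)}
    return adapted
```

For example, adapt_policy(policy, "shop", {"app": "shop"}) returns a copy that applies to pods labeled app=shop in the shop namespace. Passing an empty label dict produces an empty selector, which selects every pod in that namespace.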