To configure automatic backup for a Dynatrace Managed cluster
Go to Settings > Backup.
Enable cluster backup and choose the scope:
User sessions may contain sensitive information.
Exclude user sessions from the backup to remain compliant with GDPR.
Exclude timeseries metric data from the backup if your historical data isn't relevant and you only want to retain configuration data.
Include backup of Log Monitoring events.
(Optional) Select the data center. This step is required only if you have a multi-data center (Premium High Availability) deployment. For more information, see High availability for multi-data centers and Disaster recovery from backup.
Provide a full path to the mounted network file storage where backup archives are stored.
Configure start time.
You can automatically back up Dynatrace Managed configuration data (naming rules, tags, management zones, alerting profiles, and more), time series metric data, and user sessions. To maximize resilience, save your backup files to an off-site location.
Each node needs to be connected to the shared file system—for example, Network File Sharing (NFS)—and the shared file system needs to be mounted at the same shared directory on each node.
To test your user permissions, run the following command on each node:
su - dynatrace -s /bin/bash -c "touch /path/to/backup/$(uname -n)"
ls -ltr /path/to/backup/
The user running Dynatrace services needs read/write permissions for the shared file system.
The shared file system mount must be available at system restart.
You can add a mount point to fstab or use your disk management tool to make the shared file system mount persistent.
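As an illustration, a persistent NFS mount could look like the /etc/fstab entry below. The server name, export path, and mount point are assumptions; substitute your own values.

```shell
# Hypothetical /etc/fstab entry for the shared backup storage.
# nfs-server.example.com and both paths are placeholders.
nfs-server.example.com:/exports/dynatrace-backup  /mnt/backup  nfs4  rw,hard,noatime  0 0
```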
The protocol used to transmit data depends on your configuration. We recommend NFSv4. Because of its low performance and poor resilience, we don't recommend CIFS.
If the shared file system mount point isn't available on system boot, Dynatrace won't start on that node. This may lead to the cluster becoming unavailable. You must disable backups manually to allow Dynatrace to start.
Dynatrace keeps the previous backup until a new one is completed.
Elasticsearch files are stored in uncompressed binary format. While the data is replicated across nodes and there are two replicas in addition to the primary shard, the backup excludes the replicated data.
Elasticsearch backs up incrementally, so it needs to be able to append recent changes to the previous backup. Don't remove Elasticsearch backup files.
By default, the snapshot is incremental and is performed every 2 hours.
Initially, Dynatrace copies the entire data set and then creates snapshots of the differences. Older snapshots are removed gradually once they are five (5) days old.
Since Dynatrace keeps some of the older snapshots, backup size grows regardless of the current size on disk. The snapshots are incremental, but Elasticsearch merges data segments over time, which results in certain duplicates in the backup.
Transaction storage data isn't backed up, so when you restore backups you may see gaps in deep monitoring data (for example, distributed traces and code-level traces). By default, transaction storage data is only retained for 10 days. From a long-term perspective, it's not necessary to include transaction storage data in backups.
To restore a cluster, follow the steps below.
To restore a cluster on the same hosts as the source cluster, make sure to uninstall the source cluster first.
Make sure the machines prepared for the cluster restore have a similar hardware and disk layout as the original cluster and sufficient capacity to handle the load after the restore.
We recommend that you restore the cluster to the same number of nodes as the backed up cluster. In exceptional cases, it's possible to restore to a cluster with up to two nodes fewer than the backed up cluster. You risk losing the cluster configuration if you attempt to restore to a cluster that is more than two nodes short of the original backed up cluster.
Make sure the existing cluster is stopped to prevent two clusters with the same ID connecting to Dynatrace Mission Control. See Start/stop/restart a cluster.
Make sure that system users created for Dynatrace Managed have the same UID:GID identifiers on all nodes.
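One way to compare these identifiers across nodes is a small helper like the sketch below (the function name is made up); run it on every node and verify that the output matches.

```shell
#!/bin/sh
# Print a user's UID:GID pair so values can be compared across nodes.
# Run on each node, for example: uid_gid dynatrace
uid_gid() {
  printf '%s:%s\n' "$(id -u "$1")" "$(id -g "$1")"
}
```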
On each target node, mount the backup storage to, for example, /mnt/backup. This path is referred to as <path-to-backup> in the steps below.
Ensure the installer has read permissions to the NFS. For example:
sudo adduser dynatrace && sudo chown -R dynatrace:dynatrace <path-to-backup>
Create your cluster inventory: record the node IDs and the IP addresses of all target nodes. You'll need this information during the restore. Node IDs appear in the backup directory names as node_<node_id> (for example, node_1, node_5).
To restore a cluster, follow the steps below:
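The node IDs can be read straight from the backup directory layout. The helper below is a sketch under that assumption (the function name is hypothetical):

```shell
#!/bin/sh
# List the node IDs recorded in a backup, based on the
# <path-to-backup>/<UUID>/node_<node_id> directory layout.
# Pass the <path-to-backup>/<UUID> directory as $1.
list_node_ids() {
  for d in "$1"/node_*; do
    [ -d "$d" ] || continue
    printf '%s\n' "${d##*node_}"
  done
}
```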
Copy the installer to target nodes
To restore the cluster, you must use the exact same installer version as the original installation. Copy the installer from <path-to-backup>/<UUID>/node_<node_id>/ to a local disk on each target node. For example:
cp <path-to-backup>/<UUID>/node_<node_id>/files/backup-001-dynatrace-managed-installer.sh /tmp/
Launch Dynatrace restore on each node
In parallel, on each node, execute the Dynatrace Managed installer using the following parameters:
--restore - switches the installer into restore mode.
--cluster-ip - IPv4 address of the node on which you run the installer.
--cluster-nodes - comma-delimited list of IDs and IP addresses of all nodes in the cluster, including the one on which you run the installer, in the following format: <node_id>:<node_ip>,<node_id>:<node_ip>.
--seed-ip - IPv4 address of the seed node.
--backup-file - path to the backup *.tar file, which includes the path to the shared file storage mount, the cluster ID, the node ID, the backup version, and the backup *.tar file name, in the following format: <path-to-backup>/<UUID>/node_<node_id>/files/<backup_version_number>/<backup_file>
In this example path:
/mnt/backup/bckp/c9dd47f0-87d7-445e-bbeb-26429fac06c6/node_1/files/19/backup-001.tar
the parts of the path are as follows:
<path-to-backup> = /mnt/backup/bckp/
<UUID> = c9dd47f0-87d7-445e-bbeb-26429fac06c6
<node_id> = 1
<backup_version_number> = 19
While the backup is in progress, two backup directories may be present with different backup version numbers.
The backup version number is incremented with each backup execution.
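Since the version number grows with each run, the most recent backup is the directory with the highest number. A sketch of selecting it (the helper name is hypothetical):

```shell
#!/bin/sh
# Return the highest backup version number under node_<node_id>/files/,
# skipping non-numeric entries such as the copied installer script.
# Pass the node_<node_id> directory as $1.
latest_backup_version() {
  for d in "$1"/files/*/; do
    [ -d "$d" ] || continue
    v=${d%/}; v=${v##*/}
    case "$v" in ''|*[!0-9]*) continue ;; esac
    printf '%s\n' "$v"
  done | sort -n | tail -1
}
```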
Get the IDs and IP addresses from the inventory you created before you started.
For example:
10.176.41.168 - the IP address of the node to restore
1: 10.176.41.168, 3: 10.176.41.169, 5: 10.176.41.170 - node IDs and new IP addresses of all nodes in the cluster
sudo /tmp/backup-001-dynatrace-managed-installer.sh --restore --cluster-ip "10.176.41.168" --cluster-nodes "1:10.176.41.168,3:10.176.41.169,5:10.176.41.170" --seed-ip "10.176.41.169" --backup-file /mnt/backup/bckp/c9dd47f0-87d7-445e-bbeb-26429fac06c6/node_1/files/19/backup-001.tar
Start the firewall, Cassandra and Elasticsearch
On each node successively, start the firewall, Cassandra and Elasticsearch using the launcher script:
/opt/dynatrace-managed/launcher/firewall.sh start
/opt/dynatrace-managed/launcher/cassandra.sh start
/opt/dynatrace-managed/launcher/elasticsearch.sh start
Verify Cassandra state
On each node, check if Cassandra is running. Execute the command:
<dynatrace-install-dir>/utils/cassandra-nodetool.sh status
All the nodes of the restored cluster should be listed in the response with the following values:
Status = Up
State = Normal
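For a quick scripted check, nodetool's per-node lines begin with a two-letter status/state code (UN for Up/Normal). The sketch below assumes that output format; the function name is hypothetical.

```shell
#!/bin/sh
# Succeed only if every node line in `nodetool status` output is UN
# (Status=Up, State=Normal). Pass the captured output as $1.
all_nodes_up_normal() {
  not_un=$(printf '%s\n' "$1" | awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN"')
  [ -z "$not_un" ]
}
```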
Verify Elasticsearch state
On each node, check if Elasticsearch is running. Execute the command:
curl -s -N -XGET 'http://localhost:9200/_cluster/health?pretty' | grep status
You should get the following response:
"status" : "green"
or for one node setup:
"status" : "yellow"
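If you script this verification, it can help to poll until the expected status appears. A sketch, with the health query factored into its own function so the retry logic can be exercised without a live cluster (both function names are made up):

```shell
#!/bin/sh
# Query Elasticsearch cluster health and print the status line.
es_health() {
  curl -s -N -XGET 'http://localhost:9200/_cluster/health?pretty' | grep status
}

# Poll until the wanted status ("green", or "yellow" for a one-node
# setup) shows up, retrying a limited number of times.
wait_for_es_status() {
  want=$1; tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    case "$(es_health)" in
      *"\"$want\""*) return 0 ;;
    esac
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}
```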
Restore the Elasticsearch database
On the seed node, run the following command:
<dynatrace-install-dir>/utils/restore-elasticsearch-data.sh <path-to-backup>/<UUID>
Restore metrics and configuration data files
On each node successively, starting with the seed node, run the following command:
<dynatrace-install-dir>/utils/restore-cassandra-data.sh <path-to-backup>/<UUID>/node_<node_id>/files/backup-001.tar
Wait until Cassandra has its cluster fully set. Use the command:
<dynatrace-install-dir>/utils/cassandra-nodetool.sh status
Status = Up
State = Normal
optional Repair Cassandra
You can run the repair only on clusters with more than one node.
Sequentially on all nodes, initiate the Cassandra repair:
<dynatrace-install-dir>/utils/repair-cassandra-data.sh
This step ensures data consistency between the nodes. It may take several hours to complete.
To repair just the configuration data without metric timeseries, execute the following command:
<dynatrace-install-dir>/utils/repair-cassandra-data.sh 1
Start Dynatrace
On each node successively, starting with the seed node, run the following command:
<dynatrace-install-dir>/launcher/dynatrace.sh start
Wait until you can sign in to Cluster Management Console.
optional Remove remaining references to old nodes
If you restored fewer nodes than the original cluster had, remove the nodes marked as Offline in the Cluster Management Console. For more information, see Remove a cluster node.
Switch OneAgents to the new cluster address
If you originally configured the cluster with the DNS for OneAgents, you only need to update the DNS records as explained in the next step.
Otherwise, you need to configure Cluster ActiveGates (or OneAgents, if no ActiveGates are used) with the new target address and restart them. If there are no Cluster ActiveGates but there are Environment ActiveGates, this should be done on the Environment ActiveGates.
Update the seedServerUrl parameter in config.properties.
Update cluster.properties with the new URLs.
Delete the connectivity_history.properties file.
Execute the following cluster API call for each node, replacing <node-id> with the node identifier, <node-ip> with the node IPv4 address, and <Api-Token> with a valid Cluster API token.
curl -ikS -X PUT -d <node-ip> https://<node-ip>:8021/api/v1.0/onpremise/endpoint/publicIp/agents/<node-id>?Api-Token=<Api-Token> -H "accept: application/json" -H "Content-Type: application/json"
You should receive the 200
response as in the example below:
HTTP/1.1 200 OK
Date: Tue, 19 Feb 2019 17:49:06 GMT
X-Robots-Tag: noindex
Server: ruxit server
Content-Length: 0
optional Update cluster DNS records
If the cluster restore resulted in changing the IP addresses, update the DNS records.
Execute the following cluster API call for each node, replacing <node-id> with the node identifier, <node-ip> with the node IPv4 address, and <Api-Token> with a valid Cluster API token.
curl -ikS -X PUT -d <node-ip> https://<node-ip>:8021/api/v1.0/onpremise/endpoint/publicIp/domain/<node-id>?Api-Token=<Api-Token> -H "accept: application/json" -H "Content-Type: application/json"
You should receive the 200
response as in the example below:
HTTP/1.1 200 OK
Date: Tue, 19 Feb 2019 17:49:06 GMT
X-Robots-Tag: noindex
Server: ruxit server
Content-Length: 0
Enable the backup
To prevent overwriting the previous snapshot, the backup is automatically disabled after the restore. Once you have finished restoring, you should enable the backup again.
Go to Settings > Backup.
Turn on Enable cluster backup, confirm the full path to the backup archive, and schedule the daily backup time.
Certain situations require that you manually disable cluster backup. For example, if the shared file system mount point isn't available on system boot, Dynatrace won't start on that node. In this case, you need to disable backups manually to allow Dynatrace to start.
Edit the <install-dir>/elasticsearch/config/elasticsearch.yml file.
Remove the line with the path.repo: parameter.
For example:
network.host: [ _local_, "10.10.10.10" ]
network.publish_host: 10.10.10.10
path.data: /var/opt/dynatrace-managed/elasticsearch
path.repo: <REMOVE THIS LINE>
Save the file and restart the system. See Start/stop/restart a cluster.
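The manual edit above can also be scripted. The sketch below removes the path.repo: line in place, keeping a .bak copy (the function name is hypothetical):

```shell
#!/bin/sh
# Remove the path.repo: line from an elasticsearch.yml file in place,
# keeping a .bak copy of the original. Pass the file path as $1.
remove_path_repo() {
  sed -i.bak '/^path\.repo:/d' "$1"
}
```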