NVIDIA BCM extension

  • Latest Dynatrace
  • Extension
  • Published Oct 27, 2025

Monitor your NVIDIA Base Command Manager (BCM) cluster by enabling this ActiveGate extension

Get started

Overview

NVIDIA Base Command Manager (BCM) streamlines cluster provisioning, workload management, and infrastructure monitoring. It provides all the tools you need to deploy and manage an AI data center.

This extension provides real-time insights into your entire cluster, including nodes, disks, and GPUs, so you can correlate that data with the rest of your monitored environment and pinpoint issues and bottlenecks.

Requirements

  • Dynatrace version 1.309+
  • ActiveGate version 1.309+
  • ActiveGate with the Extensions 2.0 module enabled.
  • A certificate and its key to access the NVIDIA BCM API, typically found under /root/.cm on the head node. Copy both files to the ActiveGate's filesystem and make sure the dtuserag system user (Linux) or Local Service (Windows) can read them.
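Before configuring the extension, you can confirm that the copied files are readable by the service user. The following is a minimal Python sketch, not part of the extension; the function name `check_readable` is hypothetical. It relies on `os.access`, so it only gives a meaningful answer when run as the account that will use the files (running it as root reports almost everything as readable).

```python
import os


def check_readable(*paths: str) -> list:
    """Return a list of problems for paths the current user cannot read.

    An empty list means every path exists as a regular file and is readable
    by the user running this code.
    """
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: not found or not a regular file")
        elif not os.access(path, os.R_OK):
            problems.append(f"{path}: not readable by the current user")
    return problems
```

On Linux, one option is to save this with a call such as `print(check_readable("/path/to/cert", "/path/to/key"))` and run it with `sudo -u dtuserag python3 check_certs.py`; the paths and script name are placeholders for wherever you copied the files.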

Activation and setup

  1. Under Extensions in the left menu, select NVIDIA BCM.
  2. Select an ActiveGate group where the extension will run.
  3. Configure it as follows:
    • URL: Address to connect to the API of the head node, usually on port 8081.
    • Certificate path: Path on the ActiveGate host where you copied the certificate. It must be readable by the dtuserag system user (Linux) or Local Service (Windows).
    • Key path: Path on the ActiveGate host where you copied the certificate's key. It must be readable by the dtuserag system user (Linux) or Local Service (Windows).
    • HTTPS Proxy: Address for the proxy, if one is required.
    • Proxy username: Username to authenticate against the proxy.
    • Proxy password: Password for the above user.
    • Debug: Produces more verbose logs for troubleshooting.
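To illustrate how the URL, certificate path, key path, and proxy fields fit together, here is a minimal sketch of a mutual-TLS client built with Python's standard library. It is not the extension's implementation; the helper names `build_proxy_url` and `build_bcm_opener` are hypothetical, and it assumes the proxy address includes its scheme (for example `http://proxy.example.com:3128`). Which API path to request on the head node is outside the scope of this page, so the sketch stops at building the opener.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Optional


def build_proxy_url(proxy: str, username: Optional[str] = None,
                    password: Optional[str] = None) -> str:
    """Embed percent-encoded basic-auth credentials in a proxy address."""
    if not username or not password:
        # urllib only sends Proxy-Authorization when both parts are present.
        return proxy
    parts = urllib.parse.urlsplit(proxy)
    hostport = parts.netloc.rpartition("@")[2]  # drop any credentials already present
    creds = (urllib.parse.quote(username, safe="") + ":"
             + urllib.parse.quote(password, safe=""))
    return urllib.parse.urlunsplit(
        (parts.scheme, f"{creds}@{hostport}", parts.path, parts.query, parts.fragment)
    )


def build_bcm_opener(cert_path: str, key_path: str,
                     proxy: Optional[str] = None,
                     proxy_user: Optional[str] = None,
                     proxy_password: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build an opener that presents the BCM client certificate (mutual TLS)."""
    context = ssl.create_default_context()
    # Raises OSError (e.g. FileNotFoundError or ssl.SSLError) if the files
    # are missing, unreadable, or do not form a valid certificate/key pair.
    context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    handlers = [urllib.request.HTTPSHandler(context=context)]
    if proxy:
        proxied = build_proxy_url(proxy, proxy_user, proxy_password)
        handlers.append(urllib.request.ProxyHandler({"https": proxied}))
    return urllib.request.build_opener(*handlers)
```

Server-certificate verification is left at Python's defaults here. If the head node presents a certificate issued by a private CA, you would additionally need something like `context.load_verify_locations(cafile=...)` pointing at that CA certificate.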

Details

Licensing and cost

The metrics collected through this extension consume Dynatrace Davis Data Units (see DDUs for metrics).

You can roughly estimate the DDUs consumed by metric ingest with the following formula:

( (4 * number of clusters)
+ (10 * number of nodes (both head and worker nodes))
+ (1 * number of disks)
+ (2 * number of GPUs)
) * 525.6 DDUs/year
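The formula above can be expressed as a small Python helper for quick what-if calculations. This is a sketch; the function names are hypothetical. The coefficients are the per-entity metric counts from the formula, and 525.6 corresponds to one metric reported once per minute for a year (525,600 data points at 0.001 DDU each).

```python
DDU_PER_METRIC_PER_YEAR = 525.6  # one metric, one data point per minute, for a year


def estimate_metric_series(clusters: int, nodes: int, disks: int, gpus: int) -> int:
    """Number of metric series the formula assumes.

    `nodes` counts both head and worker nodes.
    """
    return 4 * clusters + 10 * nodes + 1 * disks + 2 * gpus


def estimate_ddus_per_year(clusters: int, nodes: int, disks: int, gpus: int) -> float:
    """Rough yearly DDU consumption for metric ingest."""
    return estimate_metric_series(clusters, nodes, disks, gpus) * DDU_PER_METRIC_PER_YEAR
```

For example, one cluster with two nodes (one head, one worker), four disks, and eight GPUs gives 4 + 20 + 4 + 16 = 44 metric series, or 44 × 525.6 = 23,126.4 DDUs per year. If your license is based on custom metrics, the series count itself is the figure to compare against.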

If your license is based on custom metrics, each custom metric is equivalent to 525.6 DDUs/year. For details, see Metric Cost Calculation.

Feature sets

When activating the extension through a monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension must collect at least one metric after activation.

In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.

All metrics that aren't categorized into any feature set are considered to be the default and are always reported.

A metric inherits the feature set of its subgroup, which in turn inherits the feature set of its group. A feature set defined at a more specific level takes precedence: the metric level overrides the subgroup level, which overrides the group level.
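The precedence and default rules described above can be sketched in a few lines of Python. This models the rules only; it is not how Dynatrace implements them, and the function names are hypothetical. `None` stands for "no feature set defined at this level".

```python
from typing import AbstractSet, Optional


def effective_feature_set(group: Optional[str], subgroup: Optional[str],
                          metric: Optional[str]) -> Optional[str]:
    """Return the feature set that applies to a metric.

    The most specific level that defines a feature set wins:
    metric over subgroup over group. None means the metric is a default.
    """
    for level in (metric, subgroup, group):
        if level is not None:
            return level
    return None


def is_reported(selected: AbstractSet[str], group: Optional[str],
                subgroup: Optional[str], metric: Optional[str]) -> bool:
    """Default metrics are always reported; others only if their set is selected."""
    feature_set = effective_feature_set(group, subgroup, metric)
    return feature_set is None or feature_set in selected
```

For instance, with only the CPU feature set selected, a metric whose group is tagged GPU is skipped, while a metric with no feature set at any level is still reported.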

GPU
Metric name | Metric key
GPU memory free | nvidia.bcm.gpu_mem_free
GPU memory utilization | nvidia.bcm.gpu_utilization
Disk
Metric name | Metric key
Disk free space | nvidia.bcm.free_space
CPU
Metric name | Metric key
CPU System | nvidia.bcm.cpu_system
CPU Usage | nvidia.bcm.cpu_usage
CPU User | nvidia.bcm.cpu_user
CPU Wait | nvidia.bcm.cpu_wait
Memory
Metric name | Metric key
Hardware corrupted memory | nvidia.bcm.hardware_corrupted_memory
Memory free | nvidia.bcm.memory_free
Page swap in | nvidia.bcm.page_swap_in
Page swap out | nvidia.bcm.page_swap_out
Swap free | nvidia.bcm.swap_free
Out of memory killer | nvidia.bcm.oomkiller
Total free memory | nvidia.bcm.total_memory_free
Total free swap | nvidia.bcm.total_swap_free
Related tags: Compute, Python, GPU, NVIDIA, Infrastructure Observability