NVIDIA GPU extension

  • Latest Dynatrace
  • Extension
  • Published Oct 27, 2025

Monitor base parameters of the NVIDIA GPU, including load, memory and temperature

Overview

This extension monitors base parameters of NVIDIA GPUs, tracking load, memory, and resource utilization. It uses Python bindings for the NVIDIA toolset to provide details on GPU utilization.

Use it to extend host monitoring to your GPUs and get an overview of their utilization.

This extension is executed by the OneAgent (local monitoring).

Use cases

This extension enables you to:

  • Monitor utilization of the GPU across your environment
  • Locate bottlenecks in GPU memory usage

Requirements

This extension relies on the following external libraries, which must be compatible with your GPU (card and driver):

  • gpustat
  • nvidia-ml-py
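
To check in advance whether a host's card and driver work with these libraries, a probe along the following lines may help. This is a minimal sketch, not part of the extension: it assumes you run it in your own Python environment with nvidia-ml-py installed (imported as `pynvml`), and `probe_nvml` is a hypothetical helper name.

```python
def probe_nvml():
    """Return (driver_version, gpu_count) if NVML is usable, else None."""
    try:
        import pynvml  # module provided by the nvidia-ml-py package
    except ImportError:
        return None  # library not installed in this environment
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # library present, but no usable driver or GPU
    try:
        version = pynvml.nvmlSystemGetDriverVersion()
        if isinstance(version, bytes):  # older releases return bytes
            version = version.decode()
        return version, pynvml.nvmlDeviceGetCount()
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    result = probe_nvml()
    if result is None:
        print("NVML is not usable on this host")
    else:
        print(f"driver {result[0]}, {result[1]} GPU(s) detected")
```

A `None` result suggests that the library, the driver, or the GPU is missing or unsupported, in which case the extension is unlikely to report data from that host.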

Activation and setup

To start, simply activate the extension in your environment using the in-product Hub.

Details

Metrics collected:

  • Number of processes running on GPU
  • Utilization percent
  • Memory usage
  • Total memory
  • GPU temperature
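
To illustrate how these five values relate to what gpustat reports, here is a hedged sketch that maps one GPU record to the metrics listed above. The input field names are assumed to follow the JSON output of `gpustat --json`; the output key names and the `summarize_gpu` helper are hypothetical and are not the extension's actual metric keys. The sample values are made up.

```python
def summarize_gpu(entry):
    """Map one gpustat-style GPU record to the five metrics listed above."""
    # gpustat may report None when process details are unavailable.
    processes = entry.get("processes") or []
    return {
        "process_count": len(processes),
        "utilization_percent": entry.get("utilization.gpu"),
        "memory_used_mib": entry.get("memory.used"),
        "memory_total_mib": entry.get("memory.total"),
        "temperature_celsius": entry.get("temperature.gpu"),
    }


# Illustrative record shaped like one item of the "gpus" list in gpustat's JSON.
sample = {
    "index": 0,
    "name": "Example GPU",
    "temperature.gpu": 61,
    "utilization.gpu": 87,
    "memory.used": 10240,
    "memory.total": 16384,
    "processes": [
        {"pid": 1201, "gpu_memory_usage": 8192},
        {"pid": 1377, "gpu_memory_usage": 2048},
    ],
}

if __name__ == "__main__":
    for name, value in summarize_gpu(sample).items():
        print(f"{name}: {value}")
```

Comparing used memory against total memory per GPU is one simple way to spot the memory bottlenecks mentioned in the use cases.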

This extension is built on top of the Extension Framework 2.0 and delivers:

  • Code to retrieve metrics from NVIDIA GPU
  • Unified analysis screens expanding the host overview

Feature sets

When you activate the extension using a monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension has to collect at least one metric after activation.

In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.

All metrics that aren't categorized into any feature set are considered to be the default and are always reported.

A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
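
The inheritance and override rules described above can be sketched as a simple lookup: the most specific level that defines a feature set wins, and a metric with no feature set at any level falls back to the default. This is an illustration of the rule only, not the Extension Framework's implementation; `resolve_feature_set` and the feature set names in the usage lines are hypothetical.

```python
def resolve_feature_set(metric_fs=None, subgroup_fs=None, group_fs=None):
    """Return the effective feature set for a metric.

    Precedence: metric level, then subgroup level, then group level.
    A metric with no feature set at any level is treated as "default".
    """
    for candidate in (metric_fs, subgroup_fs, group_fs):
        if candidate is not None:
            return candidate
    return "default"


if __name__ == "__main__":
    # Inherited from the group because nothing more specific is set.
    print(resolve_feature_set(group_fs="memory"))
    # The metric-level setting overrides subgroup and group.
    print(resolve_feature_set("temperature", "utilization", "memory"))
    # Uncategorized metric: always reported as part of the default set.
    print(resolve_feature_set())
```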

Related tags
Compute, Python, GPU, NVIDIA, Infrastructure Observability