Monitor base parameters of the NVIDIA GPU, including load, memory and temperature
This extension monitors base parameters of NVIDIA GPUs, tracking load, memory and resource utilization of the GPUs. The extension leverages Python access to the NVIDIA toolset to provide details on GPU utilization.
Use it to extend host monitoring to your GPUs and get an overview of their utilization.
This extension is executed by the OneAgent (local monitoring).
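As an illustration of the kind of data involved, the following is a minimal sketch of reading load, memory and temperature through Python. It assumes the pynvml bindings (the nvidia-ml-py package) on top of the NVIDIA Management Library; the source does not name the exact bindings the extension uses, and the function name and dictionary keys below are hypothetical, not the extension's metric keys.

```python
def read_gpu_metrics():
    """Return one dict per GPU, or an empty list if NVML is unavailable."""
    try:
        import pynvml  # assumption: bindings from the nvidia-ml-py package
    except ImportError:
        return []

    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # No NVIDIA driver or NVML library on this host.
        return []

    samples = []
    try:
        for index in range(pynvml.nvmlDeviceGetCount()):
            try:
                handle = pynvml.nvmlDeviceGetHandleByIndex(index)
                load = pynvml.nvmlDeviceGetUtilizationRates(handle)
                memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
                temperature = pynvml.nvmlDeviceGetTemperature(
                    handle, pynvml.NVML_TEMPERATURE_GPU
                )
            except pynvml.NVMLError:
                # Some cards or drivers do not support every query; skip them.
                continue
            samples.append(
                {
                    "gpu_index": index,              # hypothetical key names
                    "load_percent": load.gpu,
                    "memory_used_bytes": memory.used,
                    "memory_total_bytes": memory.total,
                    "temperature_celsius": temperature,
                }
            )
    finally:
        pynvml.nvmlShutdown()
    return samples


if __name__ == "__main__":
    for sample in read_gpu_metrics():
        print(sample)
```

On a host without an NVIDIA GPU or driver the function returns an empty list rather than raising, which makes it easy to check whether a card and driver support the required libraries.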
This extension enables you to:
This extension relies on the following external libraries, which need to be supported by your GPU (card and driver):
To start, simply activate the extension in your environment using the in-product Hub.
Metrics collected:
This extension is built on top of the Extension Framework 2.0 and delivers:
When activating your extension using a monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension has to collect at least one metric after activation.
In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.
All metrics that aren't categorized into any feature set are considered to be the default and are always reported.
A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
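The inheritance and override rules above can be sketched as a small resolution function. This is an illustrative model only, not the Extension Framework's implementation; the function name, the "default" label and the feature set names in the usage example are hypothetical.

```python
from typing import Optional

# Hypothetical label for metrics that belong to no feature set.
DEFAULT_FEATURE_SET = "default"


def resolve_feature_set(
    group: Optional[str] = None,
    subgroup: Optional[str] = None,
    metric: Optional[str] = None,
) -> str:
    """Pick the effective feature set for a metric.

    The most specific level that defines a feature set wins:
    metric, then subgroup, then group. If no level defines one,
    the metric is treated as default and is always reported.
    """
    for candidate in (metric, subgroup, group):
        if candidate is not None:
            return candidate
    return DEFAULT_FEATURE_SET


if __name__ == "__main__":
    # Inherited from the group through the subgroup.
    print(resolve_feature_set(group="utilization"))
    # Subgroup overrides the group.
    print(resolve_feature_set(group="utilization", subgroup="memory"))
    # Metric overrides both.
    print(resolve_feature_set(group="utilization", subgroup="memory", metric="thermal"))
    # Nothing defined anywhere: default, always reported.
    print(resolve_feature_set())
```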